We can use the uniq and comm commands provided by Linux to deduplicate or compare the contents of text files.

Let’s begin the second part of intermediate-level text processing.

Unique Result

Let’s prepare some repeated and unique contents for the uniq command:

echo 'I am duplicated' >> uniq_file.txt
echo 'I am duplicated' >> uniq_file.txt
echo 'I am duplicated' >> uniq_file.txt
echo 'I am Duplicated' >> uniq_file.txt
echo 'I am Duplicated' >> uniq_file.txt
echo 'I am Duplicated' >> uniq_file.txt
echo 'Line unique 1' >> uniq_file.txt
echo 'Line unique 2' >> uniq_file.txt
echo 'Line unique 3' >> uniq_file.txt

[Figure: uniq_file.txt populated with repeated and unique lines]

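By default, the uniq command collapses each run of adjacent identical lines into a single line. Duplicates that are not adjacent are kept, so sort the input first if those should be removed too: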

uniq uniq_file.txt

[Figure: uniq output with each run of adjacent repeated lines collapsed to one]
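
Because uniq only compares neighbouring lines, a duplicate that is separated from its twin survives. The following is a minimal sketch using made-up inline input (not uniq_file.txt) that shows the effect and the usual sort-first workaround; the variable names are only for illustration:

```shell
# Hypothetical inline input: 'a' appears twice, but not on adjacent lines.
unsorted_result=$(printf '%s\n' a b a | uniq)
echo "$unsorted_result"
# a
# b
# a

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sorted_result=$(printf '%s\n' a b a | sort | uniq)
echo "$sorted_result"
# a
# b
```

Sorting changes the line order, so this approach fits cases where the original order does not matter.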

Repeated Contents

We can add the -d or --repeated parameter to output only the file’s repeated lines, one copy per group:

uniq -d uniq_file.txt
uniq --repeated uniq_file.txt

[Figure: repeated lines of uniq_file.txt]

Unique Contents

We can add the -u or --unique parameter to output only the file’s lines that are not repeated:

uniq -u uniq_file.txt
uniq --unique uniq_file.txt

[Figure: unique lines of uniq_file.txt]
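
Seen side by side, -d and -u split the input into two disjoint parts: groups that repeat and lines that do not. The sketch below rebuilds the same nine lines inline (the `lines` variable is a stand-in for uniq_file.txt, introduced here only so the example runs on its own) and shows what each parameter should print:

```shell
# Inline stand-in for uniq_file.txt: the same nine lines prepared earlier.
lines=$(printf '%s\n' \
  'I am duplicated' 'I am duplicated' 'I am duplicated' \
  'I am Duplicated' 'I am Duplicated' 'I am Duplicated' \
  'Line unique 1' 'Line unique 2' 'Line unique 3')

# -d prints one copy of each group of adjacent repeated lines.
repeated=$(printf '%s\n' "$lines" | uniq -d)
echo "$repeated"
# I am duplicated
# I am Duplicated

# -u prints only the lines that have no adjacent duplicate.
unique=$(printf '%s\n' "$lines" | uniq -u)
echo "$unique"
# Line unique 1
# Line unique 2
# Line unique 3
```

The comparison is case-sensitive by default, which is why 'I am duplicated' and 'I am Duplicated' are reported as two separate groups.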

Repeated Contents Without Case Sensitivity

We can combine the -d or --repeated parameter with the -i or --ignore-case parameter to output the file’s repeated lines without case sensitivity:

uniq -d -i uniq_file.txt
uniq --repeated --ignore-case uniq_file.txt

[Figure: repeated lines of uniq_file.txt, ignoring case]
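
When case is ignored, lines that differ only in capitalisation fall into one group, and the first line of the group is the one printed. A small sketch with inline input (three lines borrowed from the file above; the variable names are illustrative), assuming a uniq that supports -i, as the GNU and BSD versions do:

```shell
# Case-sensitive: the first two lines differ, so no group repeats.
strict=$(printf '%s\n' 'I am duplicated' 'I am Duplicated' 'Line unique 1' | uniq -d)
echo "[$strict]"
# []

# Case-insensitive: the first two lines form one group; its first line is printed.
relaxed=$(printf '%s\n' 'I am duplicated' 'I am Duplicated' 'Line unique 1' | uniq -d -i)
echo "[$relaxed]"
# [I am duplicated]
```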

Contents With Their Occurrence Numbers

We can add the -c or --count parameter to prefix each output line with the number of times it occurred:

uniq -c uniq_file.txt
uniq --count uniq_file.txt

[Figure: lines of uniq_file.txt with their occurrence counts]
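
A common use of the count is a frequency table: sort the input so equal lines become adjacent, count them, then sort by the count. This is a sketch over a made-up word list, not the article's file; the variable names are assumptions for illustration:

```shell
# Hypothetical word list; sort groups equal lines so uniq -c can count them,
# and sort -rn then orders the groups by count, highest first.
counts=$(printf '%s\n' pear apple pear fig apple pear | sort | uniq -c | sort -rn)
echo "$counts"

# The count column is space-padded and its width varies between
# implementations, so split on whitespace with awk instead of fixed columns.
top=$(printf '%s\n' "$counts" | awk 'NR == 1 { print $1, $2 }')
echo "$top"
# 3 pear
```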

Compared Result

We need to prepare two files to demonstrate the comm command:

printf '%s\n' a b c d e     > file1
printf '%s\n'   b c d e f g > file2
cat file1
cat file2

[Figure: contents of file1 and file2]

The comm command expects both files to be sorted. By default, it outputs three columns of data: the first column holds the lines unique to the first file, the second column holds the lines unique to the second file, and the third column holds the lines common to both files:

comm file1 file2

[Figure: comparing file1 with file2]
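
The columns are separated by tab characters: a line in the second column is preceded by one tab and a line in the third column by two. The sketch below makes that visible by replacing each tab with '>'. It writes to a scratch directory (the `left` and `right` file names are hypothetical) with the same contents as file1 and file2:

```shell
# Work in a scratch directory so the existing file1/file2 are untouched.
workdir=$(mktemp -d)
printf '%s\n' a b c d e   > "$workdir/left"
printf '%s\n' b c d e f g > "$workdir/right"

# Replace each tab with '>' purely to make the indentation visible.
layout=$(comm "$workdir/left" "$workdir/right" | tr '\t' '>')
echo "$layout"
# a
# >>b
# >>c
# >>d
# >>e
# >f
# >g

rm -r "$workdir"
```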

Hide First Column

We can hide the first column with the -1 parameter:

comm -1 file1 file2

[Figure: first column hidden]

Hide Second Column

We can hide the second column with the -2 parameter:

comm -2 file1 file2

[Figure: second column hidden]

Hide Third Column

We can hide the third column with the -3 parameter:

comm -3 file1 file2

[Figure: third column hidden]

Show First Column

We can show only the first column (lines unique to file1) by hiding the other two with the -23 parameter:

comm -23 file1 file2

[Figure: only the first column shown]

Show Second Column

We can show only the second column (lines unique to file2) by hiding the other two with the -13 parameter:

comm -13 file1 file2

[Figure: only the second column shown]

Show Third Column

We can show only the third column (lines common to both files) by hiding the other two with the -12 parameter:

comm -12 file1 file2

[Figure: only the third column shown]
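
Taken together, the single-column forms behave like set operations on two lists: -12 gives the intersection, while -23 and -13 give the two differences. Here is a sketch with two made-up unsorted lists, sorted into scratch files first because comm needs sorted input; all file and variable names are hypothetical:

```shell
workdir=$(mktemp -d)

# Hypothetical unsorted lists; sort them before handing them to comm.
printf '%s\n' delta alpha charlie bravo  | sort > "$workdir/a.sorted"
printf '%s\n' echo charlie bravo foxtrot | sort > "$workdir/b.sorted"

both=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")     # in both lists
only_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")   # only in the first list
only_b=$(comm -13 "$workdir/a.sorted" "$workdir/b.sorted")   # only in the second list

echo "$both"
# bravo
# charlie
echo "$only_a"
# alpha
# delta
echo "$only_b"
# echo
# foxtrot

rm -r "$workdir"
```

When only one column is shown it is printed without leading tabs, so the result can be piped straight into other commands.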

References: 7.3 uniq: Uniquify files; 7.4 comm: Compare two sorted files line by line
