25. Bash Shell - Text Processing: uniq, comm
We can use the uniq and comm commands provided by Linux to deduplicate text or to compare the contents of text files.
Let’s begin the second part of the intermediate level of text processing.
Unique Result
Let’s prepare some repeated and unique contents for the uniq command:
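As a minimal sketch, a sample file could be created as follows. The file name fruits.txt and its contents are assumptions made for illustration, not the original data:

```shell
# Create a hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' apple apple banana Banana cherry > "$tmpdir/fruits.txt"

# Show the file: "apple" repeats on adjacent lines, while "banana" and
# "Banana" differ only in case.
sample=$(cat "$tmpdir/fruits.txt")
printf '%s\n' "$sample"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```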
By default, the uniq command collapses each run of adjacent repeated lines into a single line in its output:
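A small sketch of the default behavior, using hypothetical input lines fed through a pipe instead of a file:

```shell
# Hypothetical input: "apple" appears twice in a row.
out=$(printf '%s\n' apple apple banana Banana cherry | uniq)
printf '%s\n' "$out"
# Prints:
# apple
# banana
# Banana
# cherry
```

Because uniq only compares adjacent lines, unsorted input is usually passed through sort first.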
Repeated Contents
We can add the -d or --repeated parameter to output only the file’s repeated lines, one copy per group of adjacent duplicates:
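A minimal sketch with hypothetical input lines, where only "apple" is repeated on adjacent lines:

```shell
# -d keeps one copy of each group of adjacent duplicate lines.
out=$(printf '%s\n' apple apple banana Banana cherry | uniq -d)
printf '%s\n' "$out"
# Prints:
# apple
```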
Unique Contents
We can add the -u or --unique parameter to output only the lines that are not repeated:
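A minimal sketch with hypothetical input lines. Note that "banana" and "Banana" count as different lines because the comparison is case sensitive by default:

```shell
# -u keeps only the lines that have no adjacent duplicate.
out=$(printf '%s\n' apple apple banana Banana cherry | uniq -u)
printf '%s\n' "$out"
# Prints:
# banana
# Banana
# cherry
```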
Repeated Contents Without Case Sensitivity
We can combine -d (--repeated) with -i (--ignore-case) to output the file’s repeated lines while ignoring case:
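A minimal sketch with hypothetical input lines. With -i, "banana" and "Banana" are treated as the same line, so they now form a repeated group:

```shell
# -d -i reports groups of adjacent lines that match when case is ignored;
# the first line of each group is the one printed.
out=$(printf '%s\n' apple apple banana Banana cherry | uniq -d -i)
printf '%s\n' "$out"
# Prints:
# apple
# banana
```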
Contents With Their Occurrence Numbers
We can add the -c or --count parameter to prefix each output line with the number of times it occurs:
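A minimal sketch with hypothetical input lines. The exact amount of padding before each count can differ between uniq implementations, so no literal output is shown here:

```shell
# -c prefixes every output line with the size of its group of
# adjacent identical lines.
out=$(printf '%s\n' apple apple banana Banana cherry | uniq -c)
printf '%s\n' "$out"
```

With this input, "apple" is reported with a count of 2 and each of the other three lines with a count of 1.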
Compared Result
We need to prepare two files to demonstrate the comm command. Since comm compares sorted files line by line, both files should already be sorted:
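As a minimal sketch, two sorted sample files could be created as follows. The file names a.txt and b.txt and their contents are assumptions made for illustration, not the original data:

```shell
# Create two hypothetical, already sorted files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# "apple" exists only in a.txt, "date" only in b.txt,
# and "banana" and "cherry" exist in both.
file_a=$(cat "$tmpdir/a.txt")
file_b=$(cat "$tmpdir/b.txt")
printf '%s\n' "$file_a" "$file_b"

rm -rf "$tmpdir"
```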
By default, comm outputs three columns of data: the first column contains lines unique to the first file, the second column contains lines unique to the second file, and the third column contains lines common to both files:
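A minimal sketch of the default three-column output, using two hypothetical sorted files (a.txt and b.txt are illustrative names):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# Columns are separated by tab indentation:
# column 1 has no indentation, column 2 one tab, column 3 two tabs.
out=$(comm "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

Here "apple" appears without indentation (only in a.txt), "date" is indented by one tab (only in b.txt), and "banana" and "cherry" are indented by two tabs (in both files).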
Hide First Column
We can hide the first column with the -1 parameter:
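A minimal sketch with two hypothetical sorted files. Once the first column is hidden, the remaining columns shift left by one tab:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -1 suppresses lines unique to the first file ("apple").
out=$(comm -1 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The output keeps "date" without indentation (only in b.txt) and shows "banana" and "cherry" indented by one tab (in both files).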
Hide Second Column
We can hide the second column with the -2 parameter:
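A minimal sketch with two hypothetical sorted files:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -2 suppresses lines unique to the second file ("date").
out=$(comm -2 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The output keeps "apple" without indentation (only in a.txt) and shows "banana" and "cherry" indented by one tab (in both files).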
Hide Third Column
We can hide the third column with the -3 parameter:
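A minimal sketch with two hypothetical sorted files. Hiding the third column leaves only the lines that differ between the files:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -3 suppresses lines common to both files ("banana" and "cherry").
out=$(comm -3 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The output keeps "apple" without indentation (only in a.txt) and "date" indented by one tab (only in b.txt).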
Show First Column
We can show only the first column by hiding the other two with the -23 parameter:
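A minimal sketch with two hypothetical sorted files, listing the lines that exist only in the first file:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -23 hides columns 2 and 3, leaving lines unique to a.txt.
out=$(comm -23 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"
# Prints:
# apple

rm -rf "$tmpdir"
```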
Show Second Column
We can show only the second column by hiding the other two with the -13 parameter:
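A minimal sketch with two hypothetical sorted files, listing the lines that exist only in the second file:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -13 hides columns 1 and 3, leaving lines unique to b.txt.
out=$(comm -13 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"
# Prints:
# date

rm -rf "$tmpdir"
```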
Show Third Column
We can show only the third column by hiding the other two with the -12 parameter:
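A minimal sketch with two hypothetical sorted files, listing the lines that the two files have in common:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# -12 hides columns 1 and 2, leaving lines present in both files.
out=$(comm -12 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$out"
# Prints:
# banana
# cherry

rm -rf "$tmpdir"
```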
References 7.3 uniq: Uniquify files, 7.4 comm: Compare two sorted files line by line
Author Dong Chen
LastMod Tue Feb 26 2019