Plain Uniq
If this is the file test.txt:
00
00
01
01
00
00
02
02
Passing the file to the uniq command, either through a pipe or by redirecting it to standard input, produces this output:
Command: uniq < test.txt
00
01
00
02
The first two lines of the original file are the same, 00. The next two lines are 01, which are followed by two repetitions of 00 again and two repetitions of 02. The uniq command replaces each run of consecutive repetitions with a single line. Because only adjacent lines are compared, 00 appears twice in the output.
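To try this yourself, here is a minimal sketch that recreates test.txt with printf and runs uniq on it. The second command is an addition to the example above: it assumes you want each value exactly once regardless of position, and uses sort to bring equal lines together before uniq sees them. The variable names are only illustrative.

```shell
# Recreate the sample file from above, one value per line.
printf '%s\n' 00 00 01 01 00 00 02 02 > test.txt

# uniq collapses only adjacent repeats, so 00 shows up twice.
adjacent=$(uniq < test.txt)
printf '%s\n' "$adjacent"
# prints 00, 01, 00, 02 (one per line)

# Sorting first makes equal lines adjacent, so each value shows up once.
global=$(sort test.txt | uniq)
printf '%s\n' "$global"
# prints 00, 01, 02 (one per line)
```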
Uniq with counts
uniq -c < test.txt
The -c option prefixes each output line with the number of consecutive repetitions that were collapsed into it.
If this is the test file, testCounts.txt:
00
00
01
01
00
00
02
02
03
aa
aa
aa
Command: uniq -c < testCounts.txt
2 00
2 01
2 00
2 02
1 03
3 aa
The first number on each line is how many times that line occurred consecutively in the original file.
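The counting example can be reproduced with the sketch below. One caveat: many implementations, GNU coreutils among them, right-align the count with leading spaces, so the raw output may be indented compared with the listing above. The awk step is an assumption added here to strip that padding; it is not required by uniq itself.

```shell
# Recreate the sample file, one value per line.
printf '%s\n' 00 00 01 01 00 00 02 02 03 aa aa aa > testCounts.txt

# Count consecutive repeats. awk re-prints "count line" with a single
# space so any leading padding on the count is dropped.
counts=$(uniq -c < testCounts.txt | awk '{print $1, $2}')
printf '%s\n' "$counts"
# prints:
# 2 00
# 2 01
# 2 00
# 2 02
# 1 03
# 3 aa
```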
Printing only duplicate lines
The -d option prints one copy of each line that is immediately followed by one or more repetitions of itself, and omits lines that occur only once in a row:
uniq -d < testCounts.txt
OR
cat testCounts.txt | uniq -d
OR
uniq -d testCounts.txt
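As a quick check of what -d produces for the sample data, the sketch below recreates testCounts.txt and captures the output. Only 03 is missing from the result, because it is the one line that is not repeated.

```shell
# Recreate the sample file, one value per line.
printf '%s\n' 00 00 01 01 00 00 02 02 03 aa aa aa > testCounts.txt

# Print one copy of every run that contains at least two lines.
dups=$(uniq -d testCounts.txt)
printf '%s\n' "$dups"
# prints 00, 01, 00, 02, aa (one per line)
```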
Printing only unique lines
The -u option prints only the lines that are not repeated, that is, lines that differ from both the line before and the line after them:
uniq -u < testCounts.txt
OR
cat testCounts.txt | uniq -u
OR
uniq -u testCounts.txt
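For the same sample data, -u leaves a single line. The sketch below recreates testCounts.txt and captures the result; every other value in the file occurs at least twice in a row and is therefore dropped.

```shell
# Recreate the sample file, one value per line.
printf '%s\n' 00 00 01 01 00 00 02 02 03 aa aa aa > testCounts.txt

# Print only the lines that have no adjacent duplicate.
singles=$(uniq -u testCounts.txt)
printf '%s\n' "$singles"
# prints 03
```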
Other possible options:
- Limit comparison to the first N characters of each line (using the -w N option).
- Skip the first N characters of each line when comparing (using the -s N option).
- Ignore differences in case between lines (using the -i option).
- Skip the first N fields of each line when comparing (using the -f N option).
(This may be useful while processing TSV files when you'd like to ignore the first column if it has serial numbers.)
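The options listed above can be sketched with small inline inputs. The data here is invented purely for illustration. Note that -w is a GNU extension and -i is not part of POSIX, so this assumes GNU coreutils (as found on most Linux systems); other implementations may lack one or both.

```shell
# -i: ignore case when comparing; the first line of each run is kept.
ci=$(printf '%s\n' Apple apple APPLE pear | uniq -i)
printf '%s\n' "$ci"
# prints Apple, pear

# -f 1: skip the first field (a serial number column in tab-separated data).
by_field=$(printf '%s\t%s\n' 1 red 2 red 3 blue | uniq -f 1)
printf '%s\n' "$by_field"
# prints "1<TAB>red" and "3<TAB>blue"

# -s 2: skip the first 2 characters of each line before comparing.
skip_chars=$(printf '%s\n' a-x b-x c-y | uniq -s 2)
printf '%s\n' "$skip_chars"
# prints a-x, c-y

# -w 2 (GNU extension): compare only the first 2 characters.
first_chars=$(printf '%s\n' ab1 ab2 cd1 | uniq -w 2)
printf '%s\n' "$first_chars"
# prints ab1, cd1
```

In each case uniq keeps the first line of a run it considers equal, which is why the serial numbers 1 and 3 survive in the -f example.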