Set theory using the shell
What if you have two files, each containing unique lines, and you want to find the lines that exist in only one file, or the lines the two files have in common? The first thing that comes to mind may be to write a program in, say, Java or Perl, but the standard shell tools can be just as efficient, with no coding necessary.
To find the lines that are in either file (union):
sort file1 file2 | uniq > union.txt
To find the lines that are in both files (intersection):
sort file1 file2 | uniq -d > intersection.txt
To find the lines that are in file1 but not file2:
sort file1 file2 file2 | uniq -u > only_in_file1.txt
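As a quick sanity check, the three commands can be tried on two tiny files. This is only a sketch: the fruit names are made-up sample data, and the script overwrites file1, file2 and the three output files in the current directory, so it is best run in an empty scratch directory.

```shell
# Hypothetical sample data; each file has unique lines, as the post assumes.
# WARNING: overwrites file1 and file2 in the current directory.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# Union: every line that appears in either file, listed once.
sort file1 file2 | uniq > union.txt

# Intersection: lines that appear in both files.
sort file1 file2 | uniq -d > intersection.txt

# Difference: lines in file1 but not in file2.
# file2 is listed twice so that none of its lines can be unique.
sort file1 file2 file2 | uniq -u > only_in_file1.txt

cat union.txt          # apple, banana, cherry, date (one per line)
cat intersection.txt   # banana, cherry (one per line)
cat only_in_file1.txt  # apple
```

Swapping the roles of the files (sort file2 file1 file1 | uniq -u) should give the lines that are only in file2, which is date for this sample data.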
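uniq -d prints only duplicated lines and uniq -u prints only unique lines, while sort sorts its input file(s). Listing file2 twice in the last command ensures that every line of file2 appears at least twice, so only lines found solely in file1 remain unique. In my experience this is at least as fast as a program written in Python or Java with input files consisting of millions of lines.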
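The sort step matters because uniq only compares adjacent lines, and the recipes rely on each file having unique lines. The sketch below illustrates both points; the file names dup1, dup2, naive.txt and fixed.txt are hypothetical, and deduplicating each file with sort -u first is one possible workaround rather than part of the original recipe.

```shell
# uniq only compares adjacent lines, so unsorted input hides duplicates.
printf '%s\n' b a b | uniq -d          # prints nothing
printf '%s\n' b a b | sort | uniq -d   # prints: b
printf '%s\n' b a b | sort | uniq -u   # prints: a

# Hypothetical input where dup1 repeats a line, breaking the
# "unique lines per file" assumption.
printf '%s\n' x x y > dup1
printf '%s\n' y z > dup2

# Naive intersection: x is reported even though it is only in dup1.
sort dup1 dup2 | uniq -d > naive.txt

# Deduplicate each file first, then intersect as before.
sort -u dup1 > dup1.uniq
sort -u dup2 > dup2.uniq
sort dup1.uniq dup2.uniq | uniq -d > fixed.txt

cat naive.txt   # x, y (one per line)
cat fixed.txt   # y
```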
2 Comments:
Ahhh, Grasshopper, you have understood the true depth of the shell and are now a master yourself.
Beware, the power you now have is lethal, and may seem anti-social to all but the greatest of dweebs.
3:49 AM
Set theory using the shell
5:03 PM