Today, I had to find the difference between two huge lists of numbers.
The numbers are 17 digits long and each list has around 1 lakh (100,000) entries.
PS: I'm documenting both versions here for my future reference.
I used Python because diff didn't feel right to me: it prints the ins and outs of both files. I also ruled diff out because I didn't think it would work at that time.
"""
# A shorter ugly version
total = set([i.strip() for i in open("total.txt").readlines()])  # list comprehension to remove \r\n from lines
coupon = set([i.strip() for i in open("coupon.txt").readlines()])  # set to remove duplicates and do set difference
open("result.txt", "w").write('\n'.join(sorted(total-coupon)))  # use set difference and use sorted to sort then write in separate lines
"""
### Now see the same thing in a beautiful & readable way
# reading file contents into lists
with open("total.txt") as f_total:
    total = f_total.readlines()
with open("coupon.txt") as f_coupon:
    coupon = f_coupon.readlines()
# stripping "\r\n" and/or spaces at the ends of each line
total = [i.strip() for i in total]
coupon = [i.strip() for i in coupon]
# creating sets from the lists (this also removes duplicates)
total_set = set(total)
coupon_set = set(coupon)
# finding the set difference: numbers in total but not in coupon
difference = total_set - coupon_set
# sorting the result
sorted_difference = sorted(difference)
# writing the result back into a file, one number per line
with open("result.txt", "w") as f_result:
    f_result.write('\n'.join(sorted_difference))
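For future me: the same steps can be wrapped in a small function so they can be tried on in-memory lists before touching the real files. This is only a sketch. The name `missing_numbers`, the skipping of blank lines, and the `key=int` sort are my own assumptions and are not part of the script above (which sorts the numbers as strings, fine as long as they all have the same length).

```python
def missing_numbers(total_lines, coupon_lines):
    """Return the entries found in total_lines but not in coupon_lines, sorted numerically."""
    # strip "\r\n"/spaces and skip blank lines (assumption: blank lines carry no data)
    total = {line.strip() for line in total_lines if line.strip()}
    coupon = {line.strip() for line in coupon_lines if line.strip()}
    # key=int keeps the order numeric even if the numbers differ in length
    return sorted(total - coupon, key=int)


if __name__ == "__main__":
    # tiny made-up sample standing in for total.txt and coupon.txt
    total_lines = ["300\r\n", "100\n", " 200 \n", "100\n"]
    coupon_lines = ["200\n", "400\n"]
    print(missing_numbers(total_lines, coupon_lines))  # ['100', '300']
```

Since an open file is iterable line by line, the function should also accept file objects directly, e.g. `missing_numbers(open("total.txt"), open("coupon.txt"))`.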
Later, at home, I gave diff a try.
Yes, it's not as beautiful as the Python version; I had to cut and sed a bit.
But still, it's a "one-liner", and I like 'em a lot.
$ ## First see the one liner
$ diff -bBw total.txt coupon.txt | grep '<' | cut -d'<' -f2 | sort -nu | sed -e 's/^[ \t]*//' > result.txt
$ ## Now dissect it
$ diff -bBw total.txt coupon.txt  # gives us the diff, but with changes from both files
$ diff -bBw total.txt coupon.txt | grep '<'  # keep only the numbers which are in total but not in coupon
$ diff -bBw total.txt coupon.txt | grep '<' | cut -d'<' -f2  # remove the leading '<' printed by diff
$ diff -bBw total.txt coupon.txt | grep '<' | cut -d'<' -f2 | sort -nu  # numerically sort & remove duplicates
$ diff -bBw total.txt coupon.txt | grep '<' | cut -d'<' -f2 | sort -nu | sed -e 's/^[ \t]*//'  # remove leading space
$ diff -bBw total.txt coupon.txt | grep '<' | cut -d'<' -f2 | sort -nu | sed -e 's/^[ \t]*//' > result.txt  # write result into a file
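One caveat worth noting for the diff version: diff compares lines in order, so it probably only agrees with the set difference when both files list the numbers in the same order (for example, both sorted beforehand). The sketch below illustrates this with Python's `difflib.SequenceMatcher`. It is a stand-in, not the algorithm GNU diff uses, and the helper name `removed_by_line_diff` and the sample lists are made up for illustration.

```python
from difflib import SequenceMatcher


def removed_by_line_diff(a, b):
    """Collect the lines of `a` that an order-sensitive line diff reports as deleted or replaced."""
    removed = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(a[i1:i2])
    return removed


total = ["1", "2", "3"]
coupon = ["3", "1"]  # both numbers also exist in total, but in a different order

# the set difference ignores order: only "2" is missing from coupon
print(sorted(set(total) - set(coupon)))

# the line diff on the unsorted lists reports more than just "2"
print(removed_by_line_diff(total, coupon))

# after sorting both lists, the line diff agrees with the set difference
print(removed_by_line_diff(sorted(total), sorted(coupon)))
```

If the two files are not in the same order, sorting them first (or sticking with the Python set version) looks like the safer route.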