Every programmer wants to write code that is both elegant and readable: a masterpiece to refer back to.
And there is probably not a single programmer who hasn't said at least once in their lifetime, "If I had the time, I'd rewrite it," or something similar.
Today, let's look at how my approach to sorting a list of lists evolved.
I had to sort a list of lists that is the result of a search coming from a web service. The result is extracted from the XML document the service returns and formatted for the front end, so by then all the data is in the form of strings. That left me with the task of sorting a list of lists of strings, based on different items of the inner lists at different times. Obviously we can't compare the data as strings; we have to convert each value to its proper data type before comparing.
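To make that last point concrete, here is a small illustration with made-up values: compared as strings, numbers are ordered character by character, so "10" lands before "2".

```python
# Made-up values as they arrive from the web service: everything is a string.
numbers = ["9", "10", "2"]

# Compared as strings, the order is lexicographic (character by character).
as_strings = sorted(numbers)
# Converted to int first, the order is numeric.
as_numbers = sorted(numbers, key=int)

print(as_strings)  # ['10', '2', '9']
print(as_numbers)  # ['2', '9', '10']
```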
```python
sorted([[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]], key=lambda x: str(x[1]).lower())
```
This is a bad idea, because the subscripting (x[1]) is hard-coded inside the lambda itself, which makes it quite inflexible. A quick visit to the Python wiki showed me how to convert a cmp function into a key argument. That approach is more flexible: now I can specify which column to sort on, how to convert that particular column to its proper data type, and so on.
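The recipe on the Python wiki wraps a classic three-way cmp function, one that returns a negative number, zero, or a positive number, and the standard library has shipped the same recipe as functools.cmp_to_key since Python 2.7. A minimal sketch of that conventional usage, where by_column is a hypothetical helper (not from the post) that builds such a cmp function for one column:

```python
from functools import cmp_to_key

rows = [[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]]


def by_column(col, convert):
    """Hypothetical helper: build a three-way cmp function for one column."""
    def compare_rows(a, b):
        x, y = convert(a[col]), convert(b[col])
        # Classic cmp contract: negative, zero or positive.
        return (x > y) - (x < y)
    return compare_rows


by_name = sorted(rows, key=cmp_to_key(by_column(1, str.lower)))
by_last_desc = sorted(rows, key=cmp_to_key(by_column(2, int)), reverse=True)

print([r[1] for r in by_name])       # ['Arun', 'Bajsd', 'Dummy', 'Man', 'Naveen']
print([r[2] for r in by_last_desc])  # [7, 6, 6, 3, 1]
```

The compare() used next returns a Boolean rather than a three-way result, so it relies on a key class written to expect that.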
```python
def compare(x, y):
    # True when x should sort before y.
    return x < y


def cmp_to_key(mycmp, col, convert_to_proper_datatype):
    # Build a key class that compares rows on column `col`, converting
    # each cell with convert_to_proper_datatype before calling mycmp.
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def _values(self, other):
            return (convert_to_proper_datatype(self.obj[col]),
                    convert_to_proper_datatype(other.obj[col]))

        def __lt__(self, other):
            a, b = self._values(other)
            return mycmp(a, b)

        def __gt__(self, other):
            a, b = self._values(other)
            return mycmp(b, a)

        def __eq__(self, other):
            a, b = self._values(other)
            return not mycmp(a, b) and not mycmp(b, a)
    return K

'''
test cases
>>> sorted([[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]], key=cmp_to_key(compare, 0, int), reverse=True)
>>> sorted([[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]], key=cmp_to_key(compare, 1, str), reverse=True)  # case-sensitive
>>> sorted([[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]], key=cmp_to_key(compare, 1, str.lower), reverse=True)  # case-insensitive
>>> from datetime import datetime as dt
>>> sorted([[5, 'Man', "09/27/2011"], [2, 'Arun', "09/27/2011"], [4, 'Dummy', "09/27/2011"], [1, 'Naveen', "09/27/2011"], [3, 'Bajsd', "09/27/2011"]], key=cmp_to_key(compare, 2, lambda x: dt.strptime(x, "%m/%d/%Y")), reverse=True)
>>> sorted([[5, 'Man', "4343-343-4343"], [2, 'Arun', "4343-343-4343"], [4, 'Dummy', "4343-343-4343"], [1, 'Naveen', "4343-343-4343"], [3, 'Bajsd', "4343-343-4343"]], key=cmp_to_key(compare, 2, lambda x: int(x.replace('-', ''))))
>>> import xml.dom.minidom as m
>>> sorted([[5, 'Man', '<a href="#">12</a>'], [2, 'Arun', '<a href="#">1</a>'], [4, 'Dummy', '<a href="#">22</a>'], [1, 'Naveen', '<a href="#">342</a>'], [3, 'Bajsd', '<a href="#">45</a>']], key=cmp_to_key(compare, 2, lambda x: m.parseString(x).childNodes[0].childNodes[0].nodeValue), reverse=True)
'''
```
Now, compare() is such a simple function that I don't need to define it separately, so I moved it into cmp_to_key() as a default parameter: a lambda.
```python
def cmp_to_key(col, convert_to_proper_datatype, cmpr=lambda x, y: x < y):
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            # cmpr returns True when this row's converted cell sorts first.
            return cmpr(convert_to_proper_datatype(self.obj[col]),
                        convert_to_proper_datatype(other.obj[col]))
    return K
```
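This version defines only __lt__, and that is enough: the Python sorting documentation states that the sort routines use < when comparing two objects. A sketch using the same sample rows as the test cases (cmp_to_key is repeated so the block runs on its own), showing that the default cmpr gives ascending order and that passing a different comparison flips it:

```python
def cmp_to_key(col, convert_to_proper_datatype, cmpr=lambda x, y: x < y):
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            # sorted() only ever asks "is self < other?"
            return cmpr(convert_to_proper_datatype(self.obj[col]),
                        convert_to_proper_datatype(other.obj[col]))
    return K


rows = [[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]]

# Default cmpr: ascending.
ascending = sorted(rows, key=cmp_to_key(0, int))
# Overriding the default with ">" reverses the order without reverse=True.
descending = sorted(rows, key=cmp_to_key(0, int, cmpr=lambda x, y: x > y))

print([r[0] for r in ascending])   # [1, 2, 3, 4, 5]
print([r[0] for r in descending])  # [5, 4, 3, 2, 1]
```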
But hold on: why create a lambda at all, if we can do the comparison directly? So the lambda was removed, and the comparison is now performed directly in the __lt__ special method.
```python
def cmp_to_key(col, convert_to_proper_datatype):
    # Discard this approach once we have an option to sort via webmethods;
    # sorting in the DB engine is more efficient than doing it ourselves.
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            # Compare the converted values of column `col` directly.
            return (convert_to_proper_datatype(self.obj[col])
                    < convert_to_proper_datatype(other.obj[col]))
    return K
```
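Since __lt__ now just compares the converted values of one column, the same ordering can also be obtained from a plain key function that returns the converted value. A minimal sketch for comparison, where column_key is a hypothetical helper name; this is an alternative, not the approach the post settles on:

```python
from operator import itemgetter


def column_key(col, convert_to_proper_datatype):
    """Hypothetical helper: key function returning the converted cell."""
    return lambda row: convert_to_proper_datatype(row[col])


rows = [[5, 'Man', 1], [2, 'Arun', 6], [4, 'Dummy', 6], [1, 'Naveen', 3], [3, 'Bajsd', 7]]

by_name = sorted(rows, key=column_key(1, str.lower))
print([r[1] for r in by_name])  # ['Arun', 'Bajsd', 'Dummy', 'Man', 'Naveen']

# When no conversion is needed, operator.itemgetter does the same job.
by_id = sorted(rows, key=itemgetter(0))
print([r[0] for r in by_id])    # [1, 2, 3, 4, 5]
```

One practical difference: sorted() calls a key function once per row, whereas the K class converts both cells on every comparison, which adds up when a conversion such as XML parsing is expensive.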
OK, now we are in good shape. Next, we can shift our focus to how to specify the function that converts each string in the lists to its proper data type.
I'm using the DataTables jQuery plugin for the client-side display of a paginated table, so what comes back to the server to identify the sort column is a number. So I made a list, indexed by that column number, that the sorting code uses to look up the right conversion function.
```python
from datetime import datetime
import xml.dom.minidom as m

# One conversion function per DataTables column index.
mapping_tbl = []
mapping_tbl.insert(0, lambda x: int(m.parseString(x).childNodes[0].childNodes[0].nodeValue))  # number inside a link
mapping_tbl.insert(1, lambda x: x.lower())
mapping_tbl.insert(2, lambda x: x.lower())
mapping_tbl.insert(3, lambda x: int(x))
mapping_tbl.insert(4, lambda x: datetime.strptime(x, "%m/%d/%Y"))
mapping_tbl.insert(5, lambda x: int(x.replace('-', '')))
mapping_tbl.insert(6, lambda x: datetime.strptime(x, "%m/%d/%Y"))
mapping_tbl.insert(7, lambda x: datetime.strptime(x, "%m/%d/%Y"))
mapping_tbl.insert(8, lambda x: x.lower())
mapping_tbl.insert(9, lambda x: x.lower())

# search_data holds the rows from the web service; column_to_sort and
# sort_dir come from the DataTables request; cmp_to_key is the version above.
search_data = sorted(search_data, key=cmp_to_key(column_to_sort, mapping_tbl[column_to_sort]), reverse=(sort_dir != "asc"))
```
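Two of the entries in this table show why the conversion matters. A sketch with made-up values: dates in month/day/year format do not sort chronologically as text, and the number inside the link markup has to be extracted before it can be compared as an integer.

```python
from datetime import datetime
import xml.dom.minidom as m


def to_date(x):
    # Same conversion as the "%m/%d/%Y" entries in the table.
    return datetime.strptime(x, "%m/%d/%Y")


def link_number(x):
    # Same conversion as entry 0: the text inside <a>...</a>, as an int.
    return int(m.parseString(x).childNodes[0].childNodes[0].nodeValue)


dates = ["09/27/2011", "01/05/2012", "12/01/2010"]
links = ['<a href="#">12</a>', '<a href="#">1</a>', '<a href="#">342</a>']

dates_as_text = sorted(dates)
dates_as_dates = sorted(dates, key=to_date)
links_as_numbers = sorted(links, key=link_number)

print(dates_as_text)   # ['01/05/2012', '09/27/2011', '12/01/2010']
print(dates_as_dates)  # ['12/01/2010', '09/27/2011', '01/05/2012']
print([link_number(link) for link in links_as_numbers])  # [1, 12, 342]
```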
There is a lot of repetition here, so with the DRY principle in mind I decided to normalize it, much like normalizing a database. There is another reason for the decision, too: we have other types of searches whose returned records vary, and each of them would otherwise create more duplicate anonymous functions that do the same thing. So here is the solution:
```python
from datetime import datetime
import xml.dom.minidom as m

# Each distinct conversion is defined exactly once.
mapping_functions = {
    'str_lower': lambda x: x.lower(),
    'date_mdY': lambda x: datetime.strptime(x, "%m/%d/%Y"),
    'ssn': lambda x: int(x.replace('-', '')),
    'int': lambda x: int(x),
    'dom': lambda x: int(m.parseString(x).childNodes[0].childNodes[0].nodeValue),
}

# Column index -> conversion function for this search type.
mapping_tbl = []
mapping_tbl.insert(0, mapping_functions['dom'])
mapping_tbl.insert(1, mapping_functions['str_lower'])
mapping_tbl.insert(2, mapping_functions['str_lower'])
mapping_tbl.insert(3, mapping_functions['int'])
mapping_tbl.insert(4, mapping_functions['date_mdY'])
mapping_tbl.insert(5, mapping_functions['ssn'])
mapping_tbl.insert(6, mapping_functions['date_mdY'])
mapping_tbl.insert(7, mapping_functions['date_mdY'])
mapping_tbl.insert(8, mapping_functions['str_lower'])
mapping_tbl.insert(9, mapping_functions['str_lower'])

search_data = sorted(search_data, key=cmp_to_key(column_to_sort, mapping_tbl[column_to_sort]), reverse=(sort_dir != "asc"))
```
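To illustrate the reuse argument, here is a sketch for a hypothetical second search type (the people_tbl layout and the rows are invented): its table simply points at the shared entries in mapping_functions instead of defining new lambdas.

```python
from datetime import datetime


def cmp_to_key(col, convert_to_proper_datatype):
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            return (convert_to_proper_datatype(self.obj[col])
                    < convert_to_proper_datatype(other.obj[col]))
    return K


# Shared converters, defined once.
mapping_functions = {
    'str_lower': lambda x: x.lower(),
    'date_mdY': lambda x: datetime.strptime(x, "%m/%d/%Y"),
    'ssn': lambda x: int(x.replace('-', '')),
}

# Hypothetical second search type: name, date, SSN-like number.
people_tbl = [
    mapping_functions['str_lower'],
    mapping_functions['date_mdY'],
    mapping_functions['ssn'],
]

people = [
    ['Naveen', '03/15/1980', '123-45-6789'],
    ['arun', '11/02/1975', '987-65-4321'],
    ['Man', '07/30/1990', '555-12-3456'],
]

by_name = sorted(people, key=cmp_to_key(0, people_tbl[0]))
by_date = sorted(people, key=cmp_to_key(1, people_tbl[1]))

print([p[0] for p in by_name])  # ['arun', 'Man', 'Naveen']
print([p[0] for p in by_date])  # ['arun', 'Naveen', 'Man']
# No duplicate lambda: the table holds the very same function object.
print(people_tbl[0] is mapping_functions['str_lower'])  # True
```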
Here is the final solution. Bear in mind that this is a simplified version: I removed the complexities introduced by the classes, packages, etc. that exist in the real system. Still, I believe it shows how a piece of code evolved to solve a problem, and why it took time and knowledge to become perfect, or nearly so.
```python
from datetime import datetime
import xml.dom.minidom as m


def cmp_to_key(col, convert_to_proper_datatype):
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            # Compare the converted values of column `col`.
            return (convert_to_proper_datatype(self.obj[col])
                    < convert_to_proper_datatype(other.obj[col]))
    return K


# Each distinct conversion is defined exactly once.
mapping_functions = {
    'str_lower': lambda x: x.lower(),
    'date_mdY': lambda x: datetime.strptime(x, "%m/%d/%Y"),
    'ssn': lambda x: int(x.replace('-', '')),
    'int': lambda x: int(x),
    'dom': lambda x: int(m.parseString(x).childNodes[0].childNodes[0].nodeValue),
}

# Column index (as sent by DataTables) -> conversion function.
mapping_tbl = []
mapping_tbl.insert(0, mapping_functions['dom'])
mapping_tbl.insert(1, mapping_functions['str_lower'])
mapping_tbl.insert(2, mapping_functions['str_lower'])
mapping_tbl.insert(3, mapping_functions['int'])
mapping_tbl.insert(4, mapping_functions['date_mdY'])
mapping_tbl.insert(5, mapping_functions['ssn'])
mapping_tbl.insert(6, mapping_functions['date_mdY'])
mapping_tbl.insert(7, mapping_functions['date_mdY'])
mapping_tbl.insert(8, mapping_functions['str_lower'])
mapping_tbl.insert(9, mapping_functions['str_lower'])

# search_data holds the rows from the web service; column_to_sort and
# sort_dir come from the DataTables request.
search_data = sorted(search_data, key=cmp_to_key(column_to_sort, mapping_tbl[column_to_sort]), reverse=(sort_dir != "asc"))
```
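A minimal end-to-end run of the final solution, assuming made-up rows with the same ten-column layout. sort_rows is a hypothetical wrapper around the final sorted() call, with column_to_sort and sort_dir standing in for the values DataTables sends back; the table is built with a list comprehension here only for brevity.

```python
from datetime import datetime
import xml.dom.minidom as m


def cmp_to_key(col, convert_to_proper_datatype):
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            return (convert_to_proper_datatype(self.obj[col])
                    < convert_to_proper_datatype(other.obj[col]))
    return K


mapping_functions = {
    'str_lower': lambda x: x.lower(),
    'date_mdY': lambda x: datetime.strptime(x, "%m/%d/%Y"),
    'ssn': lambda x: int(x.replace('-', '')),
    'int': lambda x: int(x),
    'dom': lambda x: int(m.parseString(x).childNodes[0].childNodes[0].nodeValue),
}

# Same column layout as mapping_tbl in the post.
mapping_tbl = [mapping_functions[name] for name in (
    'dom', 'str_lower', 'str_lower', 'int', 'date_mdY',
    'ssn', 'date_mdY', 'date_mdY', 'str_lower', 'str_lower')]


def sort_rows(search_data, column_to_sort, sort_dir):
    """Hypothetical wrapper around the final sorted() call."""
    return sorted(search_data,
                  key=cmp_to_key(column_to_sort, mapping_tbl[column_to_sort]),
                  reverse=(sort_dir != "asc"))


# Made-up rows: ten string columns, as they come out of the XML.
search_data = [
    ['<a href="#">12</a>', 'Man', 'x', '2', '09/27/2011', '111-22-3333', '09/27/2011', '09/27/2011', 'a', 'b'],
    ['<a href="#">1</a>', 'Arun', 'x', '10', '01/05/2012', '222-33-4444', '01/05/2012', '01/05/2012', 'a', 'b'],
    ['<a href="#">342</a>', 'Dummy', 'x', '7', '12/01/2010', '333-44-5555', '12/01/2010', '12/01/2010', 'a', 'b'],
]

print([r[1] for r in sort_rows(search_data, 0, "desc")])  # ['Dummy', 'Man', 'Arun']
print([r[1] for r in sort_rows(search_data, 3, "asc")])   # ['Man', 'Dummy', 'Arun']
print([r[1] for r in sort_rows(search_data, 4, "desc")])  # ['Arun', 'Man', 'Dummy']
```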
NB: the sorted() call uses a condition at the end to compute the value of reverse, which evaluates to a Boolean. That single statement alone went through a lot of transformation along the way. Well, that's for another blog entry.