Sorting a dictionary by value when the values are large float numbers (tried lambda and itemgetter but didn't get proper result)
I have a csv file with people's names and averages as below:
mandana,7.5
hamid,6.066666666666666
sina,11.285714285714286
sara,9.75
soheila,7.833333333333333
ali,5.0
sarvin,11.375
I want to sort it by the averages and write it into another file. I've tried lambda and itemgetter but I didn't get the proper result. Here is my code:
def calculate_sorted_averages(file1, file2):
    with open (r'C:\Users\sony\Desktop\Python with Jadi\file1.csv', 'r') as f1:
        reader=csv.reader(f1)
        d={}
        for row in reader:
            name=row[0]
            average=row[1]
            d[name]=average
    sorted_dict=OrderedDict(sorted(d.items(), key=operator.itemgetter(1), reverse=True))
    with open (r'C:\Users\sony\Desktop\Python with Jadi\file2.csv', 'w', newline='') as f2:
        for key in sorted_dict.keys():
            writer=csv.writer(f2)
            writer.writerow([key,sorted_dict[key]])
And here is my output:
sara,9.75
soheila,7.833333333333333
mandana,7.5
hamid,6.066666666666666
ali,5.0
sarvin,11.375
sina,11.285714285714286
As you can see, it is not sorted. I've also tried lambda and it didn't work. I'm now frustrated and don't know what to do. Can anyone help me? Thanks.
Solution 1:[1]
You got this result because you're sorting lexicographically (comparing your floats as strings) instead of sorting by their numeric value.
All that's missing is converting each value to float before storing it. After that, sort as usual with key=operator.itemgetter(1).
import csv
import operator
from collections import OrderedDict

def calculate_sorted_averages(file1, file2):
    d = {}
    with open(file1, 'r', newline='') as f1:
        reader = csv.reader(f1)
        for row in reader:
            name = row[0]
            average = row[1]
            d[name] = float(average)  # convert so the sort compares numbers, not strings
    sorted_dict = OrderedDict(sorted(d.items(), key=operator.itemgetter(1), reverse=True))
    with open(file2, 'w', newline='') as f2:
        writer = csv.writer(f2)  # create the writer once, outside the loop
        for key in sorted_dict.keys():
            writer.writerow([key, sorted_dict[key]])
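To see why the original output came out in that order, here is a small sketch that sorts the averages from the question once as strings and once as numbers. The list is typed in by hand for illustration; in the real program the values come from csv.reader.

```python
# Averages as csv.reader yields them: every field is a str.
raw = ["7.5", "6.066666666666666", "11.285714285714286", "9.75",
       "7.833333333333333", "5.0", "11.375"]

# String comparison goes character by character, so "11.375" sorts
# below "5.0" because the character "1" comes before "5".
as_strings = sorted(raw, reverse=True)

# Converting in the key function compares the numeric values instead.
as_numbers = sorted(raw, key=float, reverse=True)

print(as_strings)  # starts with '9.75', ends with '11.285714285714286'
print(as_numbers)  # starts with '11.375', ends with '5.0'
```

The string-sorted list reproduces the order shown in the question, which confirms that the values were being compared as text.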
Solution 2:[2]
aaa = {'0': ['mandana', 7.5], '1': ['hamid', 6.066666666666666], '2': ['sina', 11.285714285714286], '3': ['sara', 9.75],
'4': ['soheila', 7.833333333333333], '5': ['ali', 5.0], '6': ['sarvin', 11.375]}
sorted_ = sorted(aaa.items(), key=lambda x: x[1][1])
sorted_ = dict(sorted_)
Output
{'5': ['ali', 5.0], '1': ['hamid', 6.066666666666666], '0': ['mandana', 7.5], '4': ['soheila', 7.833333333333333], '3': ['sara', 9.75], '2': ['sina', 11.285714285714286], '6': ['sarvin', 11.375]}
You didn't show the full dictionary with its keys, so I created my own, 'aaa'. The sort key x[1][1] is the second element of each value list, which is the average.
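For a flat name-to-average dictionary like the one built in the question, the same idea might look like the sketch below. The dictionary literal is written by hand for illustration and assumes the averages have already been converted to float; it also relies on plain dicts keeping insertion order, which holds in Python 3.7 and later.

```python
# Hand-written sample data; in the question these pairs come from the CSV file.
averages = {
    "mandana": 7.5,
    "hamid": 6.066666666666666,
    "sina": 11.285714285714286,
    "sara": 9.75,
    "soheila": 7.833333333333333,
    "ali": 5.0,
    "sarvin": 11.375,
}

# Sort the (name, average) pairs by the average, highest first,
# then rebuild a dict from the ordered pairs.
ranked = dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))

print(list(ranked))
```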
Solution 3:[3]
By default, text read from a file, with or without csv.reader, is returned as strings. You need to call float on the second element of each row to interpret it as a floating-point number.
I think using an OrderedDict is a bit overkill here. One call to sorted is enough.
import csv

def calculate_sorted_averages(filename_input, filename_output):
    with open(filename_input, 'r', newline='') as f1:
        reader = csv.reader(f1)
        # Convert the second column to float so rows compare numerically
        sorted_rows = sorted(reader, key=lambda x: float(x[1]))
    with open(filename_output, 'w', newline='') as f2:
        writer = csv.writer(f2)
        writer.writerows(sorted_rows)

calculate_sorted_averages('file1.csv', 'file2.csv')
Results:
$ cat file1.csv
mandana,7.5
hamid,6.066666666666666
sina,11.285714285714286
sara,9.75
soheila,7.833333333333333
ali,5.0
sarvin,11.375
$ cat file2.csv
ali,5.0
hamid,6.066666666666666
mandana,7.5
soheila,7.833333333333333
sara,9.75
sina,11.285714285714286
sarvin,11.375
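The question asked for the highest average first, whereas the result above is ascending. A minimal sketch of a descending variant follows. The helper name write_sorted and its descending parameter are made up for this example, and io.StringIO objects stand in for the real files so the snippet runs without touching the disk.

```python
import csv
import io

def write_sorted(src, dst, descending=True):
    """Copy CSV rows from src to dst, ordered by the float in column 1."""
    rows = sorted(csv.reader(src), key=lambda row: float(row[1]), reverse=descending)
    csv.writer(dst).writerows(rows)

# In-memory stand-ins for file1.csv and file2.csv (a subset of the sample data).
src = io.StringIO("mandana,7.5\nhamid,6.066666666666666\nsarvin,11.375\nali,5.0\n")
dst = io.StringIO()

write_sorted(src, dst)
print(dst.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with newline='', as the csv module documentation recommends.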
Solution 4:[4]
You can try the pandas module for this.
The pandas.read_csv() function reads the CSV file at the path you pass in and returns a pandas DataFrame, which is essentially a table held in memory.
import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes.
# header=None because the file has no header row; names= supplies the column names.
df = pd.read_csv(r"C:\Users\sony\Desktop\Python with Jadi\file1.csv", header=None, names=["Name", "Value"])
sorted_df = df.sort_values(by="Value")  # sort the dataframe by the values in the "Value" column
Output:
| | Name | Value |
|---|---|---|
| 5 | ali | 5.0 |
| 1 | hamid | 6.066666666666666 |
| 0 | mandana | 7.5 |
| 4 | soheila | 7.833333333333333 |
| 3 | sara | 9.75 |
| 2 | sina | 11.285714285714286 |
| 6 | sarvin | 11.375 |
You can convert this dataframe back to a csv file using to_csv(). Pass in the file path as the parameter and set index = False if you don't want the index to be added as a column.
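As a rough sketch of that last step, the snippet below sorts a few of the sample rows and writes them back out as CSV. It uses io.StringIO in place of real file paths so it can run anywhere, and it passes header=False as well, on the assumption that the output file should have no header row, like the input.

```python
import io
import pandas as pd

# A few rows from the question, held in memory instead of a file on disk.
csv_text = "mandana,7.5\nsara,9.75\nsarvin,11.375\nali,5.0\n"

# header=None: the data has no header row; names= supplies the column names.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["Name", "Value"])

# Highest average first, as requested in the question.
sorted_df = df.sort_values(by="Value", ascending=False)

# index=False leaves out the row index; header=False leaves out the column names.
out = io.StringIO()
sorted_df.to_csv(out, index=False, header=False)
print(out.getvalue())
```

To write to disk instead, pass a file path to to_csv() in place of the StringIO object.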
Solution 5:[5]
Pandas can be used for this - you can install it with pip install pandas
import pandas as pd

# header=None keeps the first data row from being consumed as a header
df = pd.read_csv('filename.csv', header=None, names=['name', 'value'])
df.sort_values('value', inplace=True, ascending=True)
print(df)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Soof Golan |
| Solution 2 | |
| Solution 3 | Stef |
| Solution 4 | Zero |
| Solution 5 | aquaplane |
