How to keep the maximum value and remove the other values of a list, taking other lists into account
There are 3 lists. The first 2 lists hold the ids and the third list holds the values. How can I keep the maximum value in the third column for each id and remove the other values? For example:
| list1 | list2 | list3 |
|---|---|---|
| 1 | 4 | 17 |
| 2 | 32 | 44 |
| 1 | 5 | 7 |
| 2 | 32 | 5 |
The result should be like:
| list1 | list2 | list3 |
|---|---|---|
| 1 | 4 | 17 |
| 2 | 32 | 44 |
| 1 | 5 | 7 |
These lists have more than 10 thousand values, so it would be great to avoid loops.
Solution 1:[1]
import pandas as pd

df = pd.DataFrame({
    'list1': [1, 2, 1, 2],
    'list2': [4, 32, 5, 32],
    'list3': [17, 44, 7, 3],
})
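You can do it in either of two ways:
1. Sort by list3 in descending order, keep the first row of each (list1, list2) pair, then restore the original row order: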
df.sort_values('list3', ascending=False).drop_duplicates(subset=['list1', 'list2'], keep='first').sort_index()
2. Or group by the two id columns and take the maximum of list3:
df.groupby(['list1', 'list2'])['list3'].max().reset_index()
Update for 2.: the same result without the separate reset_index() call:
out = df.groupby(['list1', 'list2'], as_index=False)['list3'].max()
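As a further illustration of the grouping idea, here is a minimal sketch that uses `idxmax` to pick whole rows instead of only the aggregated value, which keeps the original index and row order. It is not part of the original answer; the sample data is taken from the table in the question, and the names `idx` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "list1": [1, 2, 1, 2],
    "list2": [4, 32, 5, 32],
    "list3": [17, 44, 7, 5],
})

# For each (list1, list2) pair, idxmax gives the index label of the row
# holding the largest list3 value.
idx = df.groupby(["list1", "list2"])["list3"].idxmax()

# Select those rows; sorting the labels restores the original row order.
result = df.loc[idx.sort_values()]
print(result)
```

This variant may be useful when the frame has extra columns that should be carried along with the winning row.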
Solution 2:[2]
With a loop, you can do something like this:
def get_max(t):
    res = []
    for row in t:
        # Keep the row if its last column is greater than both other columns.
        if row[2] > row[1] and row[2] > row[0]:
            res.append(row)
    return res
You loop over each row and keep it if the value in the last column is greater than the values in the other two columns. You can also write it in one line:
def get_max(t):
    return [row for row in t if row[2] > row[1] and row[2] > row[0]]
PS: Since you have a lot of data, note that this algorithm runs in O(n) time, i.e. linear in the number of rows.
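The filter above looks at one row at a time. If the goal is the maximum value per (list1, list2) pair, as in the question, a loop can track the best value seen for each pair in a dictionary. The following is only a sketch under that assumption; the function name `keep_max_per_id` and the tuple layout of the rows are hypothetical and not from the original answer.

```python
def keep_max_per_id(rows):
    """Return one (id1, id2, value) tuple per id pair, keeping the largest value."""
    best = {}  # maps (id1, id2) -> largest value seen so far
    for id1, id2, value in rows:
        key = (id1, id2)
        if key not in best or value > best[key]:
            best[key] = value
    # Dictionaries preserve insertion order, so pairs appear in first-seen order.
    return [(id1, id2, value) for (id1, id2), value in best.items()]


# Sample rows copied from the question's table.
rows = [(1, 4, 17), (2, 32, 44), (1, 5, 7), (2, 32, 5)]
result = keep_max_per_id(rows)
print(result)
```

This is still a single pass over the data, so it stays linear in the number of rows.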
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Lukas Laudrain |
