'Group rows based on +- threshold on high dimensional object
I have a large df with coordinates in multiple dimensions. I am trying to create classes (Objects) based on threshold difference between the coordinates. An example df is as below:
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [10, 14, 5, 14, 3, 12],'z': [7, 6, 2, 43, 1, 40]})
So based on this df I want to group each row to a class based on -+ 2 across all coordinates. So the df will have a unique group name added to each row. So the output for this threshold function is:
'x' 'y' 'z' 'group'
1 10 7 -
2 14 6 -
3 5 2 G1
4 14 43 -
5 3 1 G1
6 12 40 -
It is similar to clustering but I want to work on my own threshold functions. How can this done in python.
EDIT To clarify the threshold is based on the similar coordinates. All rows with -+ threshold across all coordinates will be grouped as a single object. It can also be taken as grouping rows based on a threshold across all columns and assigning unique labels to each group.
Solution 1:[1]
As far as I understood, what you need is a function apply
. It was not very clear from your statement, whether you need all the differences between the coordinates, or just the neighbouring differences (x-y and y-z). The row 5
has the difference between x and z coordinate 4, but is still assigned to the class G1.
That's why I wrote it for the two possibilities and you can just choose which one you need more:
import pandas as pd
import numpy as np
def your_specific_function(row):
'''
For all differences use this:
diffs = np.array([abs(row.x-row.y), abs(row.y-row.z), abs(row.x-row.z)])
'''
# for only x - y, y - z use this:
diffs = np.diff(row)
statement = all(diffs <= 2)
if statement:
return 'G1'
else:
return '-'
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [10, 14, 5, 14, 3, 12],'z': [7, 6, 2, 43, 1, 40]})
df['group'] = df.apply(your_specific_function, axis = 1)
print(df.head())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mackostya |