Group rows based on +- threshold on high dimensional object

I have a large df with coordinates in multiple dimensions. I am trying to create classes (objects) based on a threshold difference between the coordinates. An example df is below:

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [10, 14, 5, 14, 3, 12],'z': [7, 6, 2, 43, 1, 40]})

Based on this df, I want to assign each row to a class using a threshold of +-2 across all coordinates, so that a group name is added to each row. The output for this threshold function should be:

x   y   z   group
1   10  7   -
2   14  6   -
3   5   2   G1
4   14  43  -
5   3   1   G1
6   12  40  -

It is similar to clustering, but I want to work with my own threshold functions. How can this be done in Python?

EDIT: To clarify, the threshold is based on similar coordinates. All rows within the +- threshold across all coordinates will be grouped as a single object. This can also be seen as grouping rows based on a threshold across all columns and assigning a unique label to each group.



Solution 1:[1]

As far as I understand, what you need is the apply function. It was not clear from your question whether you need all the differences between the coordinates or just the neighbouring differences (x-y and y-z). Row 5 has a difference of 4 between its x and z coordinates, but is still assigned to class G1.

That's why I wrote it for both possibilities; you can choose whichever one you need:

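import numpy as np
import pandas as pd


def your_specific_function(row):
    # For all pairwise differences (x-y, y-z, x-z), use this instead:
    #     diffs = np.array([abs(row.x - row.y), abs(row.y - row.z), abs(row.x - row.z)])
    # For only the neighbouring differences use this. Note that np.diff returns
    # signed values (y - x, z - y), so negative differences always pass the
    # check below; wrap it in np.abs(...) to compare magnitudes instead.
    diffs = np.diff(row)
    if all(diffs <= 2):
        return 'G1'
    return '-'


df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [10, 14, 5, 14, 3, 12], 'z': [7, 6, 2, 43, 1, 40]})
# Apply the function to every row (axis=1) and store the label in a new column.
df['group'] = df.apply(your_specific_function, axis=1)
print(df)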
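
The function above compares coordinates within a single row. If the threshold is instead meant between rows, as the EDIT in the question suggests, a rough sketch could look like the following. The function name group_by_threshold and its threshold parameter are made up for illustration. It assumes that grouping is transitive (if row A is close to B and B is close to C, all three share one label) and that rows without any close neighbour get '-'. It builds a full row-by-row comparison matrix, so it is only practical for moderately sized frames.

```python
import numpy as np
import pandas as pd


def group_by_threshold(df, threshold=2):
    """Label rows lying within +-threshold of each other in every column.

    Hypothetical helper: grouping is assumed to be transitive, and rows
    with no close neighbour are labelled '-'.
    """
    values = df.to_numpy()
    n = len(values)
    # close[i, j] is True when rows i and j differ by at most `threshold`
    # in all columns (n x n matrix, built via broadcasting).
    close = (np.abs(values[:, None, :] - values[None, :, :]) <= threshold).all(axis=2)

    labels = ['-'] * n
    visited = np.zeros(n, dtype=bool)
    group_id = 0
    for start in range(n):
        if visited[start]:
            continue
        # Collect every row reachable from `start` through close neighbours.
        stack, members = [start], []
        visited[start] = True
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(close[i] & ~visited):
                visited[j] = True
                stack.append(j)
        # Only groups with at least two rows receive a name.
        if len(members) > 1:
            group_id += 1
            for i in members:
                labels[i] = f'G{group_id}'
    return pd.Series(labels, index=df.index, name='group')


df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [10, 14, 5, 14, 3, 12],
                   'z': [7, 6, 2, 43, 1, 40]})
df['group'] = group_by_threshold(df[['x', 'y', 'z']], threshold=2)
print(df)
```

On the example df this labels the rows with x=3 and x=5 as G1 and leaves the rest as '-', because those two rows differ by at most 2 in x, y and z. For a very large df the pairwise matrix may not fit in memory, in which case a spatial index or a clustering library would be the more realistic route.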

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mackostya