'pair rows if conditions of multiple columns met

Store	Sales Amount	Profit
27	75474	9253
30	367852	84463
55	79416	15401

The resulting output should contain pairs of rows which has sales amount +- 3% OR Profit +- 1.5 % of each other

like if store 55's Sales amount fall within the range of +- 3% of sales amount OR +- 1.5 % of profit of any store(lets say 27) the output should be :

| Output|

27-55

df['Lower_range_sales'] = df['Sales_amount'] - df['Sales_amount']*0.03
df['Upper_range_sales'] = df['Sales_amount'] + df['Sales_amount']*0.03
df['Lower_range_Profit'] = df['Profit'] - df['Profit']*0.015
df['Upper_range_Profit'] = df['Profit'] + df['Profit']*0.015

Solution 1:^[1]

Let's call stores that meet your condition "within range" of each other. There are no such store in your sample data set so we need to mock it with some randomized data:

# Generate sample data
import string
from itertools import permutations

store_names = ["".join(p) for p in permutations(list(string.ascii_uppercase), 2)]

n = 100
np.random.seed(42)
df = pd.DataFrame({
    "Store": np.random.choice(store_names, n, replace=False),
    "Sales Amount": np.random.randint(10_000, 1_000_000, n),
    "Profit": np.random.randint(0, 100_000, n)
})

# ---------------------------------------------------------
# Code
# ---------------------------------------------------------

# Use numpy broadcast to calculate the lower and upper
# limit of each metric
sales = df["Sales Amount"].to_numpy()[:, None]
sales_lower, sales_upper = (sales * [0.97, 1.03]).T

profit = df["Profit"].to_numpy()[:, None]
profit_lower, profit_upper = (profit * [0.985, 1.015]).T

# mask is an n*n matrix, comparing every store against every
# other store to see if they are within range. If mask[i,j]
# is True, the two stores are within range of each other.
mask = (
    ((sales_lower <= sales) & (sales <= sales_upper))
    | ((profit_lower <= profit) & (profit <= profit_upper)) 
)

# If mask[i,j] is True, then mask[j,i] is also True. Hence
# we only need the upper triangle of the matrix (np.triu =
# triangle upper). And since mask[i,i] is always True, we
# don't need the diagonal either. Hence k=1.
# nonzero() returns the indices of all True elements.
s1, s2 = np.triu(mask, k=1).nonzero()

# Assemble the result
store = df["Store"].to_numpy()
result = pd.DataFrame({
    "Store1": store[s1],
    "Store2": store[s2]
})

Result:

Store1   Store2
     IV      AL   # store IV is within range of store AL
     IV      CG   # store IV is within range of store CG
     IV      ZU
     IV      DQ
     IV      RP
    ...     ...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Code Different

'pair rows if conditions of multiple columns met

| Output|

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]