'pair rows if conditions of multiple columns met
| Store | Sales Amount | Profit |
|---|---|---|
| 27 | 75474 | 9253 |
| 30 | 367852 | 84463 |
| 55 | 79416 | 15401 |
The resulting output should contain pairs of rows which has sales amount +- 3% OR Profit +- 1.5 % of each other
like if store 55's Sales amount fall within the range of +- 3% of sales amount OR +- 1.5 % of profit of any store(lets say 27) the output should be :
| Output|
27-55
df['Lower_range_sales'] = df['Sales_amount'] - df['Sales_amount']*0.03
df['Upper_range_sales'] = df['Sales_amount'] + df['Sales_amount']*0.03
df['Lower_range_Profit'] = df['Profit'] - df['Profit']*0.015
df['Upper_range_Profit'] = df['Profit'] + df['Profit']*0.015
Solution 1:[1]
Let's call stores that meet your condition "within range" of each other. There are no such store in your sample data set so we need to mock it with some randomized data:
# Generate sample data
import string
from itertools import permutations
store_names = ["".join(p) for p in permutations(list(string.ascii_uppercase), 2)]
n = 100
np.random.seed(42)
df = pd.DataFrame({
"Store": np.random.choice(store_names, n, replace=False),
"Sales Amount": np.random.randint(10_000, 1_000_000, n),
"Profit": np.random.randint(0, 100_000, n)
})
# ---------------------------------------------------------
# Code
# ---------------------------------------------------------
# Use numpy broadcast to calculate the lower and upper
# limit of each metric
sales = df["Sales Amount"].to_numpy()[:, None]
sales_lower, sales_upper = (sales * [0.97, 1.03]).T
profit = df["Profit"].to_numpy()[:, None]
profit_lower, profit_upper = (profit * [0.985, 1.015]).T
# mask is an n*n matrix, comparing every store against every
# other store to see if they are within range. If mask[i,j]
# is True, the two stores are within range of each other.
mask = (
((sales_lower <= sales) & (sales <= sales_upper))
| ((profit_lower <= profit) & (profit <= profit_upper))
)
# If mask[i,j] is True, then mask[j,i] is also True. Hence
# we only need the upper triangle of the matrix (np.triu =
# triangle upper). And since mask[i,i] is always True, we
# don't need the diagonal either. Hence k=1.
# nonzero() returns the indices of all True elements.
s1, s2 = np.triu(mask, k=1).nonzero()
# Assemble the result
store = df["Store"].to_numpy()
result = pd.DataFrame({
"Store1": store[s1],
"Store2": store[s2]
})
Result:
Store1 Store2
IV AL # store IV is within range of store AL
IV CG # store IV is within range of store CG
IV ZU
IV DQ
IV RP
... ...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |
