How to group by two columns without considering the order of their values?
I have a dataframe:
val1   val2   val3
a       b      10
a       b      2
b       a      3
f       k      5
f       k      2
When I do df.groupby(["val1", "val2"])["val3"].mean().reset_index() I get:
val1   val2   val3
a       b      6
b       a      3
f       k      3.5
But I don't want to take the order of val1 and val2 into account, so the desired result is:
val1   val2   val3
a       b      5
f       k      3.5
How can I do that?
Solution 1:[1]
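import pandas as pd

nm = ["val1", "val2"]
# Build an order-independent key: sort each row's (val1, val2) pair into a tuple
grp = df[nm].apply(lambda x: tuple(sorted(x)), axis=1)
# Group val3 by that key and take the mean
s = df.val3.groupby(grp).mean()
# Turn the tuple keys back into val1/val2 index levels
s.index = pd.MultiIndex.from_tuples(s.index, names=nm)
s.reset_index()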
#   val1 val2  val3
# 0    a    b   5.0
# 1    f    k   3.5
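The same idea (sort each pair, then group) can also be sketched without a row-wise apply, which may be faster on larger frames. This is only a sketch, not part of the original answer: it assumes both columns hold mutually comparable values such as strings, and the names cols, sorted_pairs, keys and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "val1": ["a", "a", "b", "f", "f"],
    "val2": ["b", "b", "a", "k", "k"],
    "val3": [10, 2, 3, 5, 2],
})

cols = ["val1", "val2"]

# Sort the two values within each row, so ("b", "a") becomes ("a", "b")
sorted_pairs = np.sort(df[cols].to_numpy(), axis=1)
keys = pd.DataFrame(sorted_pairs, columns=cols, index=df.index)

# Group val3 by the sorted pair and take the mean
result = (
    df["val3"]
    .groupby([keys["val1"], keys["val2"]])
    .mean()
    .reset_index()
)
print(result)
```

Because the grouping keys are the sorted pairs, the val1/val2 columns of the result always come out in sorted order, regardless of how the pairs appear in the input.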
Solution 2:[2]
Another solution, with frozenset:
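x = (
    # frozenset ignores order, so (a, b) and (b, a) produce the same group key
    df.groupby(df[["val1", "val2"]].apply(frozenset, axis=1))
    # keep the first val1/val2 seen in each group and average val3
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)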
print(x.to_markdown())
Prints:
|    | val1 | val2 | val3 |
|---|---|---|---|
| 0 | a | b | 5 |
| 1 | f | k | 3.5 |
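One thing to keep in mind with Solution 2: since val1 and val2 are aggregated with "first", the labels in the output are taken from whichever row appears first in each group, not from a sorted pair. A small sketch of that behaviour, using a hypothetical reordering of the question's rows (the names swapped, key and out are illustrative only):

```python
import pandas as pd

# Same a/b rows as in the question, but with the (b, a) row first
swapped = pd.DataFrame({
    "val1": ["b", "a", "a"],
    "val2": ["a", "b", "b"],
    "val3": [3, 10, 2],
})

# Order-independent key, as in Solution 2
key = swapped[["val1", "val2"]].apply(frozenset, axis=1)

out = (
    swapped.groupby(key)
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)
print(out)
```

Here the single group should be labelled b, a rather than a, b, while the mean of val3 is unchanged. If a consistent label order matters, sorting each pair first (as in Solution 1) avoids this.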
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | |
| Solution 2 | Andrej Kesely | 
