For Dask, is there something equivalent to ngroup() that is not cumcount()?
I am trying to assign a value for each group in dask:
print(df)
| Col1 |
|---|
| a |
| a |
| a |
| c |
| c |
| c |
| c |
| b |
| b |
| b |
| y |
| u |
| i |
df['Col2'] = df.groupby('Col1').ngroup()
print(df)
| Col1 | Col2 |
|---|---|
| a | 1 |
| a | 1 |
| a | 1 |
| c | 2 |
| c | 2 |
| c | 2 |
| c | 2 |
| b | 3 |
| b | 3 |
| b | 3 |
| y | 4 |
| u | 5 |
| i | 6 |
But dask does not recognize ngroup(). Is there an alternative?
# all the different ways I tried to get this going
df['tariff'] = str(np.random.randint(1 , 4, size=len(df), dtype=int))
df
df.groupby(by=["b"]).sum()
df['tariff'] = df.groupby('uid')
df['tariff'] = df.groupby(['uid']).rank()
df['tariff'] =str(np.random.randint(1 , 4, size=len(df), dtype=int))
df=df.sort_values('uid')
df['account'] = df.groupby(['uid']).ngroup()
df['account'] = df.groupby(['uid'])['value'].transform('nunique')
df['account'] = df.groupby(['uid']).transform('nunique')
df['account'] = df.groupby('uid').transform('ngroup')
df['account'] = df.groupby('uid').ngroup()
df['account'] = df.groupby(['uid']).cumcount()+1
df['account'] = df.groupby('uid')['value'].nunique()
df['account'] = df.groupby(['uid']).transform('nunique')
df['account'] = df.map_partitions(pd.rank(), axis="uid")
df['account'] = df.groupby(['uid'], sort=False).ngroup()
df['account'] ='1000000' + df['account'].astype(str)
Solution 1:[1]
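Here's one non-ideal option. It hands out labels by popping values from a shared list inside groupby().apply(), so the numbering depends on the order in which the groups are processed and on every task seeing the same list: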
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({
'x': list('aabbcd'),
})
ddf = dd.from_pandas(df, npartitions=2)
nuniq = ddf['x'].nunique().compute()
c = list(range(nuniq+1))
# meta describes the output dtypes: x holds strings, y holds integer labels
ddf.groupby("x").apply(lambda g: g.assign(y = lambda x: c.pop(0)), meta={'x': 'object', 'y': 'i8'}).compute()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pavithraes |
