Insert dictionary of lists as column into a sliced dataframe
As a follow-up to my previous question, I am trying to add another column to the following sliced dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame({'value': [1, 1, 1, 1, 2, 2, 2], 'cluster': [0, 0, 0, 1, 0, 0, 1], 'text': ['some text', 'other text', 'more text', 'new text', 'text sample', 'sample', 'sample text'], 'keywords': ['kw1, kw2', 'kw1, kw2, kw3', 'kw1', 'kw1, kw2, kw3, kw4', 'kw1', 'kw1, kw2, kw3', 'kw1, kw2']})
>>> result = df.groupby(['value', 'cluster', 'text']).keywords.sum().to_frame()
>>> result
value  cluster  text         keywords
1      0        some text    kw1, kw2
                other text   kw1, kw2, kw3
                more text    kw1
       1        new text     kw1, kw2, kw3, kw4
2      0        text sample  kw1
                sample       kw1, kw2, kw3
       1        sample text  kw1, kw2
Based on the last question, the content of the column I want to add should be based on a dictionary like this:
>>> summary2 = {0: ['some, summary', 'this, too, summ'], 1: ['kws, of, summ', 'summ, based, kw']}
EDIT: My plan is to match the keys of the dictionary with the column "value" and the items in each list with the column "cluster". Both are matched by position: key 0 corresponds to value 1, and within that key the first item, "some, summary", corresponds to cluster 0. The expected output is:
value  cluster  summary          text         keywords
1      0        some, summary    some text    kw1, kw2
                                 other text   kw1, kw2, kw3
                                 more text    kw1
       1        this, too, summ  new text     kw1, kw2, kw3, kw4
2      0        kws, of, summ    text sample  kw1
                                 sample       kw1, kw2, kw3
       1        summ, based, kw  sample text  kw1, kw2
What I've tried so far is the following:
result['summary2'] = result.groupby(['value','cluster']).ngroup().map({item: k for k, v in summary2.items() for item in v})
However, the resulting column contains only NaNs.
Solution 1:[1]
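IIUC, you can use apply on the rows. Each row's name is its index tuple (value, cluster, text), so row.name[0]-1 selects the dictionary key and row.name[1] selects the item within that key's list: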
result = df.groupby(['value', 'cluster', 'text']).keywords.sum().to_frame()
summary2 = {0: ['some, summary', 'this, too, summ'], 1: ['kws, of, summ', 'summ, based, kw']}
# Look up the summary for each row, then move it into the index between cluster and text
result = (result.assign(summary=result.apply(lambda row: summary2[row.name[0]-1][row.name[1]], axis=1))
          .set_index('summary', append=True)
          .reorder_levels(["value", "cluster", "summary", "text"]))
print(result)
                                                     keywords
value cluster summary         text
1     0       some, summary   more text                     kw1
                              other text        kw1, kw2, kw3
                              some text              kw1, kw2
      1       this, too, summ new text     kw1, kw2, kw3, kw4
2     0       kws, of, summ   sample            kw1, kw2, kw3
                              text sample                 kw1
      1       summ, based, kw sample text            kw1, kw2
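A variant closer to the attempt in the question, as a minimal sketch. The NaNs there appear to come from the mapping being built as {item: key}: its keys are the summary strings, while ngroup() yields the integers 0 to 3, so nothing matches. Assuming value and cluster are integer columns, every (value, cluster) pair is present, and the sorted group order lines up with the order of the dictionary's keys and list items, the dictionary can be flattened and keyed by group number instead. The names flat and group_to_summary are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [1, 1, 1, 1, 2, 2, 2],
    "cluster": [0, 0, 0, 1, 0, 0, 1],
    "text": ["some text", "other text", "more text", "new text",
             "text sample", "sample", "sample text"],
    "keywords": ["kw1, kw2", "kw1, kw2, kw3", "kw1", "kw1, kw2, kw3, kw4",
                 "kw1", "kw1, kw2, kw3", "kw1, kw2"],
})
result = df.groupby(["value", "cluster", "text"]).keywords.sum().to_frame()
summary2 = {0: ["some, summary", "this, too, summ"],
            1: ["kws, of, summ", "summ, based, kw"]}

# Flatten the dictionary in key order and number the items 0..n-1,
# so the mapping keys are group numbers rather than summary strings.
flat = [item for key in sorted(summary2) for item in summary2[key]]
group_to_summary = dict(enumerate(flat))

# ngroup() numbers the (value, cluster) groups in sorted order: 0, 1, 2, 3.
group_no = result.groupby(["value", "cluster"]).ngroup()
result["summary"] = group_no.map(group_to_summary)
```

This leaves summary as a regular column; chaining set_index('summary', append=True) and reorder_levels as in the answer above would move it into the index.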
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
