Compare 2 DataFrames and modify columns
Here is my problem: I have a DataFrame (named combin) with various information about 9 745 books, including their publisher.
My aim is to assign a code to each of the 1 435 unique publishers, and create a new column (CODE_EDITEUR) holding the right code next to the corresponding publisher.
I would also like to add a column with their frequency (the number of books published by each publisher).
Here is a sample of my df combin :
    Titre            EDITEUR
0  Book 1  Hachette Jeunesse
1  Book 2    Québec Amérique
2  Book 3         Subjectile
3  Book 4  Hachette Jeunesse
4  Book 5              Belin
...
And here is what my desired df combin would look like:
    Titre            EDITEUR  CODE_EDITEUR  FREQUENCE_EDITEUR
0  Book 1  Hachette Jeunesse             1                  2
1  Book 2    Québec Amérique             2                  1
2  Book 3         Subjectile             3                  1
3  Book 4  Hachette Jeunesse             1                  2
4  Book 5              Belin             4                  1
...
(NB : Names are in French, but éditeur just means publisher, and fréquence means frequency).
I have tried various methods, including creating a dictionary for the publishers, but then I can't manage to assign the values to my df (whether with assign or map)...
I have also tried for loops, but with no success.
Here is the last code I tried to write:
combin = combin.assign(CODE_EDITEUR = "")
dfEditeurs = pd.DataFrame(combin, columns=["EDITEUR"])
dfEditeurs = dfEditeurs.groupby('EDITEUR', as_index = False).nunique()
indexEd = dfEditeurs.index.tolist()
dfEditeurs = dfEditeurs.assign(Numéro = indexEd)
for i, r in combin.iterrows() :
    if r['EDITEUR'] == dfEdit['EDITEUR'] :
        combin.at[i, 'CODE_EDITEUR'] = dfEdit.at[i,'Numéro']
Solution 1:[1]
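Check this, for example:
import pandas as pd

df = pd.DataFrame({'title':['book1','book2','book3','book4','book5'],
                   'publisher':['pub1','pub2','pub3','pub1','pub4']})
# The lambda receives the frame after astype, so the .cat accessor is available;
# calling df.publisher.cat directly would hit the original object-dtype column.
df = df.astype({'publisher':'category'}).assign(
    code=lambda d: d.publisher.cat.codes + 1,
    freq=df.groupby('publisher')['title'].transform('count'))
'''
   title  publisher  code  freq
0  book1       pub1     1     2
1  book2       pub2     2     1
2  book3       pub3     3     1
3  book4       pub1     1     2
4  book5       pub4     4     1
'''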
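One caveat with category codes: pandas sorts the categories, so the codes follow alphabetical order rather than order of first appearance. With the question's sample, Belin would get code 1 instead of 4. Below is a minimal sketch, assuming the codes should follow order of appearance as in the desired output. It uses pd.factorize for the codes and value_counts with map for the frequencies. The data is the five-row sample from the question, not the full 9 745-row frame.

```python
import pandas as pd

# Sample rows copied from the question (the real frame has 9 745 books)
combin = pd.DataFrame({
    "Titre": ["Book 1", "Book 2", "Book 3", "Book 4", "Book 5"],
    "EDITEUR": ["Hachette Jeunesse", "Québec Amérique", "Subjectile",
                "Hachette Jeunesse", "Belin"],
})

# factorize numbers the values in order of first appearance, starting at 0,
# so add 1 to get codes starting at 1
codes, uniques = pd.factorize(combin["EDITEUR"])
combin["CODE_EDITEUR"] = codes + 1

# value_counts returns the number of books per publisher;
# map looks up each row's publisher in that Series
combin["FREQUENCE_EDITEUR"] = combin["EDITEUR"].map(combin["EDITEUR"].value_counts())

print(combin)
```

The uniques array keeps the publisher names in code order, so it can serve as a lookup table from code back to publisher if one is needed.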
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SergFSM |
