Compare 2 DataFrames and modify columns

Here is my problem: I have a DataFrame (named combin) with various information about 9 745 books, including their publisher. My aim is to assign a code to each of the 1 435 unique publishers, and create a new column (CODE_EDITEUR) holding the right code next to the corresponding publisher.

I would also like to create an index of their frequency (the count of the number of books published by each publisher).

Here is a sample of my df combin :

    Titre   EDITEUR
0   Book 1  Hachette Jeunesse
1   Book 2  Québec Amérique
2   Book 3  Subjectile
3   Book 4  Hachette Jeunesse
4   Book 5  Belin
...

And here is what my desired df combin would look like:

    Titre   EDITEUR             CODE_EDITEUR    FREQUENCE_EDITEUR
0   Book 1  Hachette Jeunesse   1               2
1   Book 2  Québec Amérique     2               1
2   Book 3  Subjectile          3               1
3   Book 4  Hachette Jeunesse   1               2
4   Book 5  Belin               4               1
...

(NB : Names are in French, but éditeur just means publisher, and fréquence means frequency).


I have tried various methods, including creating a dictionary for the publishers, but then I can't manage to assign the values to my df (whether with assign or map)...

I also have tried for loops, but with no success.

Here is the last code I tried to write :

combin = combin.assign(CODE_EDITEUR = "")
dfEditeurs = pd.DataFrame(combin, columns=["EDITEUR"])
dfEditeurs = dfEditeurs.groupby('EDITEUR', as_index = False).nunique()
indexEd = dfEditeurs.index.tolist()
dfEditeurs = dfEditeurs.assign(Numéro = indexEd)

for i, r in combin.iterrows() :
    if r['EDITEUR'] == dfEdit['EDITEUR'] :
        combin.at[i, 'CODE_EDITEUR'] = dfEdit.at[i,'Numéro']


Solution 1:[1]

Check this, for example:

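import pandas as pd

df = pd.DataFrame({'title':['book1','book2','book3','book4','book5'],
                   'publisher':['pub1','pub2','pub3','pub1','pub4']})

# The lambda lets assign() read the converted categorical column;
# df.publisher.cat on the original object column would raise AttributeError.
df = df.astype({'publisher':'category'}).assign(code=lambda d: d.publisher.cat.codes+1,
                                                freq=df.groupby('publisher')['title'].transform('count'))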
Output:
   title publisher  code  freq
0  book1      pub1     1     2
1  book2      pub2     2     1
2  book3      pub3     3     1
3  book4      pub1     1     2
4  book5      pub4     4     1
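
Note that category codes follow the sorted order of the publisher names, so on the real data "Belin" would get code 1 instead of 4. If the codes should follow the order of first appearance, as in the desired output from the question, a minimal sketch using pd.factorize could look like the one below. The sample rows are copied from the question, and I am assuming EDITEUR has no missing values.

```python
import pandas as pd

# Sample rows taken from the question (the real frame has 9 745 rows).
combin = pd.DataFrame({
    "Titre": ["Book 1", "Book 2", "Book 3", "Book 4", "Book 5"],
    "EDITEUR": ["Hachette Jeunesse", "Québec Amérique", "Subjectile",
                "Hachette Jeunesse", "Belin"],
})

# factorize() numbers each distinct value in order of first appearance,
# starting at 0, so add 1 to start the codes at 1.
codes, uniques = pd.factorize(combin["EDITEUR"])
combin["CODE_EDITEUR"] = codes + 1

# value_counts() counts the books per publisher; map() looks up that
# count for the publisher on each row.
combin["FREQUENCE_EDITEUR"] = combin["EDITEUR"].map(combin["EDITEUR"].value_counts())

print(combin)
```

With missing publishers, factorize() returns -1 for those rows (so the code would be 0) and map() leaves the frequency as NaN, so those rows may need separate handling.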

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 SergFSM