Dict comprehension with pandas iterrows

I'm confused by the output I get from these two dict comprehensions:

birthday_dict = {data.name: (data.month, data.day) for (index, data) in df_birthday.iterrows()}

gives me

{0: (3, 24), 1: (6, 20), 2: (1, 20)}
birthday_dict = {data["name"]: (data.month, data.day) for (index, data) in df_birthday.iterrows()}

gives me

{'John': (3, 24), 'Alex': (6, 20), 'Dave': (1, 20)}

I thought that data.name was the same as data["name"], but they give different results when used as the dictionary key. I definitely prefer the latter, which makes it clearer whose birthday each entry holds, but I'd like to understand the reasoning behind the different keys.

Thanks for your time!



Solution 1:

That's because df.iterrows() returns (index, Series) pairs, and each such Series carries its row label as its name attribute:

print(df.iloc[0])

name     John
month       3
day        24
Name: 0, dtype: object

You can see there's a Name: 0 at the bottom. When you write data.name, what you get back is not the content of the Series (i.e. "John") but that metadata: the row label 0.
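To see the distinction in isolation, here is a minimal sketch with a hand-built Series standing in for one row of the DataFrame (values taken from the question); the row label passed as name= is exactly what iterrows() sets on each row it yields:

```python
import pandas as pd

# A Series shaped like one row from df_birthday.iterrows():
# the row label (0) becomes the Series' `name` metadata
row = pd.Series({"name": "John", "month": 3, "day": 24}, name=0)

print(row.name)     # attribute access hits the Series metadata -> 0
print(row["name"])  # indexing looks up the "name" entry -> 'John'
```

So the first comprehension keyed the dict on the row labels 0, 1, 2, while the second keyed it on the "name" column values.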

Note that pandas resolves attribute access against the Series' own attributes and metadata before falling back to a lookup of the contents. Let's double-check with another built-in attribute, dtype, by adding a column with that name:

   name  month  day dtype
0  John      3   24  aaaa
1  Alex      6   20  aaaa
2  Dave      1   20  aaaa

{data.name: (data.month, data.day, data.dtype) for (index, data) in df.iterrows()}
# {0: (3, 24, dtype('O')), 1: (6, 20, dtype('O')), 2: (1, 20, dtype('O'))}

That being said, it is risky to fetch an item by attribute access (i.e. data.something), because any label that collides with a Series attribute gets shadowed.

Instead, you should try to use indexing like data["something"]:

{data["name"]: (data["month"], data["day"], data["dtype"]) for (index, data) in df.iterrows()}

Output:

{'John': (3, 24, 'aaaa'), 'Alex': (6, 20, 'aaaa'), 'Dave': (1, 20, 'aaaa')}

You can see that both name and dtype now come from the content, not the metadata.
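As a side note, one way to sidestep the ambiguity entirely is itertuples(), which yields plain namedtuples: those have no name or dtype metadata to shadow your columns, so attribute access is safe here (with the caveat that column names that are not valid Python identifiers get renamed positionally). A sketch using the data from the question:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex", "Dave"],
                   "month": [3, 6, 1],
                   "day": [24, 20, 20]})

# itertuples() yields namedtuples; with index=False there is no extra
# Index field, and row.name unambiguously means the "name" column
birthday_dict = {row.name: (row.month, row.day)
                 for row in df.itertuples(index=False)}
print(birthday_dict)
```

This gives {'John': (3, 24), 'Alex': (6, 20), 'Dave': (1, 20)}, the same as the indexing version, and itertuples() is also considerably faster than iterrows().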

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
