Why does `list(<pd.DataFrame>)` return a list of column names?
Say df is a typical pandas.DataFrame instance. I am trying to understand why list(df) returns a list of column names.
My goal is to trace this through the source code and find the mechanism that makes list(<pd.DataFrame>) return the column names.
So far, the best resources I've found are the following:
- Get a list from Pandas DataFrame column headers
  - Summary: There are multiple ways of getting a list of DataFrame column names, and they differ in performance or in how idiomatic they are.
- SO Answer
  - Summary: DataFrame follows a dict-like convention, thus coercing with list() would return a list of the keys of this dict-like structure.
- pandas.DataFrame source code:
  - I can't find anything in the source code that points to how list() would create a list of column header names.
Solution 1:[1]
DataFrames are iterable. That's why you can pass them to the list constructor.
list(df) is equivalent to [c for c in df]. In both cases, DataFrame.__iter__ is called.
When you iterate over a DataFrame, you get the column names.
Why? Because the developers probably thought this is a nice thing to have.
Looking at the source, __iter__ returns an iterator over the attribute _info_axis, which seems to be the internal name of the columns.
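To make the mechanism concrete, here is a minimal sketch using a toy class. ToyFrame is a hypothetical stand-in and not pandas code; it only borrows the attribute name _info_axis from the answer above to show that whatever __iter__ yields is what list() collects.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, data):
        self._data = dict(data)
        # Mimic the idea of an "info axis": here, simply the column labels.
        self._info_axis = list(self._data)

    def __iter__(self):
        # list(), for-loops and comprehensions all end up calling this method.
        return iter(self._info_axis)


toy = ToyFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

from_list = list(toy)                  # goes through ToyFrame.__iter__
from_comprehension = [c for c in toy]  # same iteration protocol

print(from_list)
print(from_comprehension)
```

Both calls produce the column labels because both rely on the same iteration protocol. The real DataFrame involves more machinery, but the principle should be the same.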
Solution 2:[2]
As you correctly stated in your question, one can think of a pandas DataFrame as a list of lists or, more correctly, as a dict-like object.
Take a look at this code, which builds a DataFrame from a dict.
import pandas as pd

# create a dataframe from a dict
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
print(df)

# list() on the dataframe yields its column names
x = list(df)
print(x)

# list() on the dict yields its keys
x = list(d)
print(x)
The result in both cases (for the dataframe df and the dict d) is this:
['col1', 'col2']
['col1', 'col2']
This result confirms your thinking that a "DataFrame follows a dict-like convention".
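The dict-like behaviour goes a bit further than list(). The sketch below reuses the same d and df and assumes a reasonably recent pandas version; the variable names are only for illustration.

```python
import pandas as pd

d = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
df = pd.DataFrame(d)

# keys() exists on both objects and returns the column labels for the DataFrame.
keys_match = list(df.keys()) == list(d.keys())

# Membership tests look at column labels, just as `in` looks at dict keys.
has_col1 = "col1" in df
has_value = 1 in df  # cell values are not searched

# The explicit way to get the column names as a plain list.
columns = df.columns.tolist()

print(keys_match, has_col1, has_value, columns)
```

If readability matters more than brevity, df.columns.tolist() states the intent more clearly than list(df).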
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | D.L |
