'How to deal with multi-level column names downloaded with yfinance

I have a list of tickers (tickerStrings) that I have to download all at once. When I try to use Pandas' read_csv it doesn't read the CSV file in the way it does when I download the data from yfinance.

I usually access my data by ticker like this: data['AAPL'] or data['AAPL'].Close, but when I read the data from the CSV file it does not let me do that.

if path.exists(data_file):
    data = pd.read_csv(data_file, low_memory=False)
    data = pd.DataFrame(data)
    print(data.head())
else:
    data = yf.download(tickerStrings, group_by="Ticker", period=prd, interval=intv)
    data.to_csv(data_file)

Here's the print output:

                  Unnamed: 0                 OLN               OLN.1               OLN.2               OLN.3  ...                 W.1                 W.2                 W.3                 W.4     W.5
0                        NaN                Open                High                 Low               Close  ...                High                 Low               Close           Adj Close  Volume
1                   Datetime                 NaN                 NaN                 NaN                 NaN  ...                 NaN                 NaN                 NaN                 NaN     NaN
2  2020-06-25 09:30:00-04:00    11.1899995803833  11.220000267028809  11.010000228881836  11.079999923706055  ...   201.2899932861328   197.3000030517578  197.36000061035156  197.36000061035156  112156
3  2020-06-25 09:45:00-04:00  11.130000114440918  11.260000228881836  11.100000381469727   11.15999984741211  ...  200.48570251464844  196.47999572753906  199.74000549316406  199.74000549316406   83943
4  2020-06-25 10:00:00-04:00  11.170000076293945  11.220000267028809  11.119999885559082  11.170000076293945  ...  200.49000549316406  198.19000244140625   200.4149932861328   200.4149932861328   88771

The error I get when trying to access the data:

Traceback (most recent call last):
File "getdata.py", line 49, in processData
    avg = data[x].Close.mean()
AttributeError: 'Series' object has no attribute 'Close'


Solution 1:[1]

To turn it into a dict of d[ticker]=df:

df = yf.download(tickers, group_by="ticker")
d = {idx: gp.xs(idx, level=0, axis=1) for idx, gp in df.groupby(level=0, axis=1)}

Solution 2:[2]

Another option which maintains the pandas dataframe but drops the data you don't need is to change the column index from a multiindex to a single index. Since you only care about the 'Close' column, the first step will be throwing the other ones out:

df = yf.download(...)
df = df[['Close']]

This is great but leaves each column with a multiindex which looks like (Close/AAPL) or (Close/MSFT) etc. What you really want is just the ticker.

df.columns = [col[1] for col in df.columns]

Now if you want to split the dataframe into separate ones for each column you can do this with list comprehension.

separated = [df.iloc[:,i] for i in range(len(df.columns))]

Solution 3:[3]

Use the below line to write and read the CSV file. They will be in the exact format as you downloaded from the yfinance API.

To write to a file

data.to_csv('file_loc')

To read the file

data = pd.read_csv('file_loc', header=[0, 1], index_col=[0])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user1019288
Solution 2 zfj3ub94rf576hc4eegm
Solution 3 Peter Mortensen