'How to deal with multi-level column names downloaded with yfinance
I have a list of tickers (tickerStrings) that I have to download all at once. When I try to use Pandas' read_csv it doesn't read the CSV file in the way it does when I download the data from yfinance.
I usually access my data by ticker like this: data['AAPL'] or data['AAPL'].Close, but when I read the data from the CSV file it does not let me do that.
if path.exists(data_file):
data = pd.read_csv(data_file, low_memory=False)
data = pd.DataFrame(data)
print(data.head())
else:
data = yf.download(tickerStrings, group_by="Ticker", period=prd, interval=intv)
data.to_csv(data_file)
Here's the print output:
Unnamed: 0 OLN OLN.1 OLN.2 OLN.3 ... W.1 W.2 W.3 W.4 W.5
0 NaN Open High Low Close ... High Low Close Adj Close Volume
1 Datetime NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2 2020-06-25 09:30:00-04:00 11.1899995803833 11.220000267028809 11.010000228881836 11.079999923706055 ... 201.2899932861328 197.3000030517578 197.36000061035156 197.36000061035156 112156
3 2020-06-25 09:45:00-04:00 11.130000114440918 11.260000228881836 11.100000381469727 11.15999984741211 ... 200.48570251464844 196.47999572753906 199.74000549316406 199.74000549316406 83943
4 2020-06-25 10:00:00-04:00 11.170000076293945 11.220000267028809 11.119999885559082 11.170000076293945 ... 200.49000549316406 198.19000244140625 200.4149932861328 200.4149932861328 88771
The error I get when trying to access the data:
Traceback (most recent call last):
File "getdata.py", line 49, in processData
avg = data[x].Close.mean()
AttributeError: 'Series' object has no attribute 'Close'
Solution 1:[1]
To turn it into a dict of d[ticker]=df:
df = yf.download(tickers, group_by="ticker")
d = {idx: gp.xs(idx, level=0, axis=1) for idx, gp in df.groupby(level=0, axis=1)}
Solution 2:[2]
Another option which maintains the pandas dataframe but drops the data you don't need is to change the column index from a multiindex to a single index. Since you only care about the 'Close' column, the first step will be throwing the other ones out:
df = yf.download(...)
df = df[['Close']]
This is great but leaves each column with a multiindex which looks like (Close/AAPL) or (Close/MSFT) etc. What you really want is just the ticker.
df.columns = [col[1] for col in df.columns]
Now if you want to split the dataframe into separate ones for each column you can do this with list comprehension.
separated = [df.iloc[:,i] for i in range(len(df.columns))]
Solution 3:[3]
Use the below line to write and read the CSV file. They will be in the exact format as you downloaded from the yfinance API.
To write to a file
data.to_csv('file_loc')
To read the file
data = pd.read_csv('file_loc', header=[0, 1], index_col=[0])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user1019288 |
| Solution 2 | zfj3ub94rf576hc4eegm |
| Solution 3 | Peter Mortensen |
