"TypeError: unsupported operand type(s) for /: 'str' and 'str'" thrown in pct_change
I have some code that reads stock data with the pandas DataReader. That works perfectly. But I also need to read from CSV files. When I process the CSV data with the same code I use on the DataReader data, I get "TypeError: unsupported operand type(s) for /: 'str' and 'str'" in pct_change. I thought maybe the CSV had some corrupt numbers in it, but it happens even on a small file like this:
1979-01-01  226.0
1979-01-02  226.8
1979-01-03  218.6
1979-01-04  223.2
1979-01-05  225.5
1979-01-08  223.1
1979-01-09  224.0
Here's the code that throws the error:
def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim
The proxy argument is a DataFrame returned from DataReader (works) or read_csv() (doesn't work). I have no clue where / why pct_change is accessing strings...!?
Here's the code that reads the data:
if base_sym is None:  # Read base symbol data from file? Filename in base_start
    base = pd.read_csv(base_start)
else:
    base = web.DataReader(base_sym, "yahoo", base_start, end_date)["Adj Close"].rename(base_sym)
Python 3.8.13, pandas 1.3.1.
Solution 1:
Here's a test of read_csv() using your file contents (columns are separated by two spaces, as in the question text):
import pandas as pd
base = pd.read_csv('base_start.txt')
print(f"columns\n{base.columns}")
print(base)
Results:
columns
Index(['1979-01-01 226.0'], dtype='object')
1979-01-01 226.0
0 1979-01-02 226.8
1 1979-01-03 218.6
2 1979-01-04 223.2
3 1979-01-05 225.5
4 1979-01-08 223.1
5 1979-01-09 224.0
It looks like read_csv() isn't detecting any separators. With the default comma separator, it reads strings like '1979-01-09 224.0' as values in a single column, and it also infers that the first row is a column heading, '1979-01-01 226.0'. So the error "TypeError: unsupported operand type(s) for /: 'str' and 'str'" raised by pct_change() apparently comes from dividing successive string values in this lone column.
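A quick way to confirm this diagnosis before calling pct_change() is to inspect the shape and dtypes of what read_csv() returned. The following is only a minimal sketch: the RAW_TEXT string is a hypothetical in-memory stand-in for the file (assuming two-space separators and no header row), and io.StringIO is used so the snippet runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: two-space separated, no header row.
RAW_TEXT = "1979-01-01  226.0\n1979-01-02  226.8\n1979-01-03  218.6\n"

# Default sep=',' finds no commas, so each whole line becomes a single field.
base = pd.read_csv(io.StringIO(RAW_TEXT))

print(base.shape)   # a single column means no separator was detected
print(base.dtypes)  # a non-numeric dtype means pct_change() would see strings

# True only if the first column holds numbers that pct_change() can divide.
is_numeric = pd.api.types.is_numeric_dtype(base.iloc[:, 0])
print(f"numeric column? {is_numeric}")
```

If the check reports a single non-numeric column, the problem is in how the file is parsed rather than in sim_leverage() itself.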
You can try calling read_csv() and sim_leverage() like this:
import pandas as pd
def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim
base = pd.read_csv('base_start.txt', sep='  ', header=None, engine='python')
print(f"base:\n{base}")
sim = sim_leverage(base[1])
print(f"sim:\n{sim}")
Results:
base:
0 1
0 1979-01-01 226.0
1 1979-01-02 226.8
2 1979-01-03 218.6
3 1979-01-04 223.2
4 1979-01-05 225.5
5 1979-01-08 223.1
6 1979-01-09 224.0
sim:
0 1.000000
1 1.003540
2 0.967257
3 0.987611
4 0.997788
5 0.987168
6 0.991150
Name: 1, dtype: float64
Note that if we don't pass the engine='python' argument, this code raises the following warning:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
base = pd.read_csv('base_start.txt', sep='  ', header=None)
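As the warning text itself hints, the '\s+' separator is handled by the default 'c' engine, so one possible alternative is to split on any run of whitespace instead of exactly two spaces. This is a sketch under assumptions, not the only way to do it: RAW_TEXT is a hypothetical in-memory copy of the file, and the column names "Date" and "Close" are labels chosen here for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the whitespace-separated file (no header row).
RAW_TEXT = """1979-01-01  226.0
1979-01-02  226.8
1979-01-03  218.6
"""

base = pd.read_csv(
    io.StringIO(RAW_TEXT),
    sep=r"\s+",               # one or more whitespace characters
    header=None,              # the first line is data, not column names
    names=["Date", "Close"],  # assumed labels, not present in the file
    parse_dates=["Date"],     # convert the first column to datetime64
)

print(base.dtypes)
print(base["Close"].pct_change(1))
```

With this call the "Close" column is parsed as floats, so pct_change() operates on numbers rather than strings.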
UPDATED:
Based on OP comments, here's an update to what I am seeing.
Contents of .csv input file:
Date,Close
1979-01-01,226.0
1979-01-02,226.8
1979-01-03,218.6
1979-01-04,223.2
1979-01-05,225.5
1979-01-08,223.1
1979-01-09,224.0
Python code:
import pandas as pd
def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim
base = pd.read_csv('base_start.txt')
print(f"base:\n{base}")
sim = sim_leverage(base['Close'])
print(f"sim:\n{sim}")
Alternative code for calling sim_leverage() (gives the same output):
sim = sim_leverage(base.iloc[:,1])
Output:
base:
Date Close
0 1979-01-01 226.0
1 1979-01-02 226.8
2 1979-01-03 218.6
3 1979-01-04 223.2
4 1979-01-05 225.5
5 1979-01-08 223.1
6 1979-01-09 224.0
sim:
0 1.000000
1 1.003540
2 0.967257
3 0.987611
4 0.997788
5 0.987168
6 0.991150
Name: Close, dtype: float64
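If the goal is for the CSV path to produce the same kind of object as the DataReader path (a single price column indexed by date), one option is to let read_csv() build the date index and then select the column. The sketch below assumes the comma-separated layout with a Date,Close header shown above; CSV_TEXT is a hypothetical in-memory copy of the file. It also swaps sim[0] for sim.iloc[0], an adjustment made here because integer keys on a date-indexed Series are not reliably treated as positions across pandas versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the comma-separated file with a header row.
CSV_TEXT = """Date,Close
1979-01-01,226.0
1979-01-02,226.8
1979-01-03,218.6
"""


def sim_leverage(proxy, leverage=1, expense_ratio=0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim.iloc[0] = initial_value  # positional assignment, safe for a date index
    return sim


# index_col + parse_dates=True gives a DatetimeIndex; ["Close"] selects a Series.
base = pd.read_csv(io.StringIO(CSV_TEXT), index_col="Date", parse_dates=True)["Close"]
sim = sim_leverage(base)
print(sim)
```

Because base is now a date-indexed Series, the same sim_leverage() call can serve both branches of the reading code.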
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
