"TypeError: unsupported operand type(s) for /: 'str' and 'str'" thrown in pct_change

I have some code that reads stock data with the pandas DataReader. That works perfectly. But I also need to read from CSV files. When I attempt to process it (with the same code I used on the DataReader data), I get "TypeError: unsupported operand type(s) for /: 'str' and 'str'" in pct_change. I thought maybe the CSV had some corrupt numbers in it, but it happens even on a small file like this:

1979-01-01  226.0
1979-01-02  226.8
1979-01-03  218.6
1979-01-04  223.2
1979-01-05  225.5
1979-01-08  223.1
1979-01-09  224.0

Here's the code that throws the error:

def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim

The proxy argument is a DataFrame returned from DataReader (works) or read_csv() (doesn't work). I have no clue where / why pct_change is accessing strings...!?

Here's the code that reads the data:

    if base_sym is None:    # Read base symbol data from file?  Filename in base_start
        base = pd.read_csv(base_start)
    else:
        base = web.DataReader(base_sym, "yahoo", base_start, end_date)["Adj Close"].rename(base_sym)

Python 3.8.13, pandas 1.3.1.



Solution 1:[1]

Here's a test of read_csv() using your file contents (columns are separated by two spaces, as in the question text):

import pandas as pd
base = pd.read_csv('base_start.txt')
print(f"columns\n{base.columns}")
print(base)

Results:

columns
Index(['1979-01-01  226.0'], dtype='object')
   1979-01-01  226.0
0  1979-01-02  226.8
1  1979-01-03  218.6
2  1979-01-04  223.2
3  1979-01-05  225.5
4  1979-01-08  223.1
5  1979-01-09  224.0

read_csv() splits on commas by default, so it finds no separators here. Each line, such as '1979-01-09  224.0', is read as one string value in a single column, and the first row is taken as the column heading '1979-01-01  226.0'. The error "TypeError: unsupported operand type(s) for /: 'str' and 'str'" raised by pct_change() therefore comes from dividing successive string values in this lone column.
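A quick way to confirm this diagnosis is to inspect the shape and dtypes of what read_csv() returned before calling pct_change(). The sketch below rebuilds the sample rows in memory with io.StringIO, which is an assumption made only to keep the snippet self-contained (the question reads from a file):

```python
import io
import pandas as pd

# Sample rows from the question, separated by two spaces.
# io.StringIO stands in for the real file (assumption for illustration).
raw = (
    "1979-01-01  226.0\n"
    "1979-01-02  226.8\n"
    "1979-01-03  218.6\n"
    "1979-01-04  223.2\n"
    "1979-01-05  225.5\n"
    "1979-01-08  223.1\n"
    "1979-01-09  224.0\n"
)

base = pd.read_csv(io.StringIO(raw))  # default separator is a comma

print(base.shape)   # a single column; the first row was consumed as the header
print(base.dtypes)  # a text dtype rather than float64

# False here means pct_change() would be asked to divide strings.
is_numeric = pd.api.types.is_numeric_dtype(base.iloc[:, 0])
print(is_numeric)
```

If the column you pass to sim_leverage() is not numeric at this point, the problem is in how the file was parsed, not in pct_change().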

You can try calling read_csv() and sim_leverage() like this:

import pandas as pd
def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim

base = pd.read_csv('base_start.txt', sep='  ', header=None, engine='python')
print(f"base:\n{base}")
sim = sim_leverage(base[1])
print(f"sim:\n{sim}")

Results:

base:
            0      1
0  1979-01-01  226.0
1  1979-01-02  226.8
2  1979-01-03  218.6
3  1979-01-04  223.2
4  1979-01-05  225.5
5  1979-01-08  223.1
6  1979-01-09  224.0
sim:
0    1.000000
1    1.003540
2    0.967257
3    0.987611
4    0.997788
5    0.987168
6    0.991150
Name: 1, dtype: float64

Note that if we don't pass the engine='python' argument, this code raises the following warning:

 ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  base = pd.read_csv('base_start.txt', sep='  ', header=None)

UPDATED:

Based on OP comments, here's an update to what I am seeing.

Contents of .csv input file:

Date,Close
1979-01-01,226.0
1979-01-02,226.8
1979-01-03,218.6
1979-01-04,223.2
1979-01-05,225.5
1979-01-08,223.1
1979-01-09,224.0

Python code:

import pandas as pd
def sim_leverage(proxy, leverage=1, expense_ratio = 0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim[0] = initial_value
    return sim

base = pd.read_csv('base_start.txt')
print(f"base:\n{base}")
sim = sim_leverage(base['Close'])
print(f"sim:\n{sim}")

Alternative code for calling sim_leverage() (gives the same output):

sim = sim_leverage(base.iloc[:,1])

Output:

base:
         Date  Close
0  1979-01-01  226.0
1  1979-01-02  226.8
2  1979-01-03  218.6
3  1979-01-04  223.2
4  1979-01-05  225.5
5  1979-01-08  223.1
6  1979-01-09  224.0
sim:
0    1.000000
1    1.003540
2    0.967257
3    0.987611
4    0.997788
5    0.987168
6    0.991150
Name: Close, dtype: float64
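In the question, the DataReader branch yields a date-indexed Series (the "Adj Close" column), while the read_csv() branch yields a whole DataFrame. One possible way to make the two branches interchangeable is to load the CSV as a date-indexed Series as well. This is only a sketch: load_close() is a hypothetical helper, the "Date" and "Close" headers are taken from the sample file above, and sim[0] is changed to sim.iloc[0] so the assignment is positional regardless of the index type:

```python
import io
import pandas as pd

def sim_leverage(proxy, leverage=1, expense_ratio=0.0, initial_value=1.0):
    pct_chg = proxy.pct_change(1)
    pct_chg = (pct_chg - expense_ratio / 252) * leverage
    sim = (1 + pct_chg).cumprod() * initial_value
    sim.iloc[0] = initial_value  # positional, so it also works with a date index
    return sim

def load_close(source):
    """Hypothetical helper: read a Date,Close CSV into a date-indexed Series."""
    frame = pd.read_csv(source, index_col="Date", parse_dates=True)
    return frame["Close"]

# In-memory copy of the sample file (stand-in for a real path).
csv_text = (
    "Date,Close\n"
    "1979-01-01,226.0\n"
    "1979-01-02,226.8\n"
    "1979-01-03,218.6\n"
    "1979-01-04,223.2\n"
    "1979-01-05,225.5\n"
    "1979-01-08,223.1\n"
    "1979-01-09,224.0\n"
)

base = load_close(io.StringIO(csv_text))
sim = sim_leverage(base)
print(sim)
```

The values should match the output above, with dates in place of the integer row labels.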

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1