TypeError: '<' not supported between instances of 'NoneType' and 'float'

I am following a YouTube tutorial and wrote this code from it:

import numpy as np
import pandas as pd
from scipy.stats import percentileofscore as score

my_columns = [
  'Ticker', 
  'Price', 
  'Number of Shares to Buy', 
  'One-Year Price Return',
  'One-Year Percentile Return',
  'Six-Month Price Return',
  'Six-Month Percentile Return',
  'Three-Month Price Return',
  'Three-Month Percentile Return',
  'One-Month Price Return',
  'One-Month Percentile Return'
  ]
final_df = pd.DataFrame(columns = my_columns)
# populate final_df here....
pd.set_option('display.max_columns', None)
print(final_df[:1])
time_periods = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']    
for row in final_df.index:
  for time_period in time_periods:
    change_col = f'{time_period} Price Return'
    print(type(final_df[change_col])) 
    percentile_col = f'{time_period} Percentile Return'
    print(final_df.loc[row, change_col])
    final_df.loc[row, percentile_col] = score(final_df[change_col], final_df.loc[row, change_col])
print(final_df)

It prints my data frame as

| Ticker |  Price  | Number of Shares to Buy | One-Year Price Return  | One-Year Percentile Return | Six-Month Price Return | Six-Month Percentile Return | Three-Month Price Return | Three-Month Percentile Return | One-Month Price Return  | One-Month Percentile Return  |
|--------|---------|-------------------------|------------------------|----------------------------|------------------------|-----------------------------|--------------------------|-------------------------------|-------------------------|------------------------------|
| A      |  120.38 | N/A                     | 0.437579               | N/A                        | 0.280969               | N/A                         | 0.198355                 | N/A                           | 0.0455988               |             N/A              |

But when I call the score function I get this error

<class 'pandas.core.series.Series'>
0.4320217937551543
Traceback (most recent call last):
  File "program.py", line 72, in <module>
    final_df.loc[row, percentile_col] = score(final_df[change_col], final_df.loc[row, change_col])
  File "/Users/abhisheksrivastava/Library/Python/3.7/lib/python/site-packages/scipy/stats/stats.py", line 2017, in percentileofscore
    left = np.count_nonzero(a < score)
TypeError: '<' not supported between instances of 'NoneType' and 'float'

What is going wrong? I see the same code work in the YouTube video. I have next to no experience with Python.

Edit:

I also tried

print(type(final_df['One-Year Price Return'])) 
print(type(final_df['Six-Month Price Return'])) 
print(type(final_df['Three-Month Price Return'])) 
print(type(final_df['One-Month Price Return'])) 
for row in final_df.index:
  final_df.loc[row, 'One-Year Percentile Return'] = score(final_df['One-Year Price Return'], final_df.loc[row, 'One-Year Price Return'])
  final_df.loc[row, 'Six-Month Percentile Return'] = score(final_df['Six-Month Price Return'], final_df.loc[row, 'Six-Month Price Return'])
  final_df.loc[row, 'Three-Month Percentile Return'] = score(final_df['Three-Month Price Return'], final_df.loc[row, 'Three-Month Price Return'])
  final_df.loc[row, 'One-Month Percentile Return'] = score(final_df['One-Month Price Return'], final_df.loc[row, 'One-Month Price Return'])
print(final_df)

but I still get the same error:

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
Traceback (most recent call last):
  File "program.py", line 71, in <module>
    final_df.loc[row, 'One-Year Percentile Return'] = score(final_df['One-Year Price Return'], final_df.loc[row, 'OneYear Price Return'])
  File "/Users/abhisheksrivastava/Library/Python/3.7/lib/python/site-packages/scipy/stats/stats.py", line 2017, in percentileofscore
    left = np.count_nonzero(a < score)
TypeError: '<' not supported between instances of 'NoneType' and 'float'


Solution 1:[1]

What @Taras Mogetich wrote was mostly correct; however, you might need to put the if-statement in its own for-loop, like so:

for row in hqm_dataframe.index:
    for time_period in time_periods:
    
        change_col = f'{time_period} Price Return'
        percentile_col = f'{time_period} Return Percentile'
        if hqm_dataframe.loc[row, change_col] is None:
            hqm_dataframe.loc[row, change_col] = 0.0

And then separately:

for row in hqm_dataframe.index:
    for time_period in time_periods:
    
        change_col = f'{time_period} Price Return'
        percentile_col = f'{time_period} Return Percentile'

        hqm_dataframe.loc[row, percentile_col] = score(hqm_dataframe[change_col], hqm_dataframe.loc[row, change_col])
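
The same clean-up can be done one column at a time instead of one cell at a time. A minimal sketch on a toy frame, assuming (as in the loops above) that a missing return should count as 0.0; the tickers and numbers are invented for illustration:

```python
import pandas as pd

time_periods = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']

# Invented sample data. dtype=object keeps the None values as None,
# which is how they arrive from the API in the question.
hqm_dataframe = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC'],
    'One-Year Price Return': pd.Series([0.40, None, 0.10], dtype=object),
    'Six-Month Price Return': pd.Series([0.20, 0.05, None], dtype=object),
    'Three-Month Price Return': pd.Series([None, 0.02, 0.07], dtype=object),
    'One-Month Price Return': pd.Series([0.01, 0.03, 0.02], dtype=object),
})

for time_period in time_periods:
    change_col = f'{time_period} Price Return'
    # to_numeric turns None into NaN; fillna then replaces NaN with 0.0
    hqm_dataframe[change_col] = pd.to_numeric(hqm_dataframe[change_col]).fillna(0.0)

print(hqm_dataframe.dtypes)
```

After this, every return column is float64 with no missing entries, so the scoring loop has only numbers to compare.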

Solution 2:[2]

Funny to google the problem I'm having and it's literally the exact same tutorial you're working through!

As mentioned, some data from the API call has a value of None, which causes an error in the percentileofscore function. My solution is to convert every None value to the integer 0 when the hqm_dataframe is first created.

hqm_columns = [
    'Ticker',
    'Price',
    'Number of Shares to Buy',
    'One-Year Price Return',
    'One-Year Return Percentile',
    'Six-Month Price Return',
    'Six-Month Return Percentile',
    'Three-Month Price Return',
    'Three-Month Return Percentile',
    'One-Month Price Return',
    'One-Month Return Percentile'
]

hqm_dataframe = pd.DataFrame(columns=hqm_columns)
convert_none = lambda x : 0 if x is None else x

for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
    
    for symbol in symbol_string.split(','):
        hqm_dataframe = hqm_dataframe.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    'N/A',
                    convert_none(data[symbol]['stats']['year1ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month6ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month3ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month1ChangePercent']),
                    'N/A'
                ],
                index = hqm_columns
            ),
            ignore_index=True
        )
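
Note that DataFrame.append, used above, was deprecated in pandas 1.4 and removed in pandas 2.0, so on a current install the loop needs a different way to add rows. A minimal sketch of one option: collect the rows in a plain list and build the frame once. The `data` dict below is a hypothetical stand-in for the JSON returned by the API call, the column list is shortened, and the tickers and numbers are invented.

```python
import pandas as pd

# Shortened column list, for illustration only
hqm_columns = ['Ticker', 'Price', 'One-Year Price Return']

# Hypothetical stand-in for requests.get(batch_api_call_url).json()
data = {
    'AAA': {'price': 10.0, 'stats': {'year1ChangePercent': 0.25}},
    'BBB': {'price': 20.0, 'stats': {'year1ChangePercent': None}},
}

def convert_none(x):
    """Return 0 for None, otherwise the value unchanged."""
    return 0 if x is None else x

rows = []
for symbol in data:
    rows.append([
        symbol,
        data[symbol]['price'],
        convert_none(data[symbol]['stats']['year1ChangePercent']),
    ])

# Build the frame in one step instead of appending row by row
hqm_dataframe = pd.DataFrame(rows, columns=hqm_columns)
print(hqm_dataframe)
```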

Solution 3:[3]

Simply replace the None values with 0, as follows:

hqm_dataframe.fillna(0,inplace=True)

Solution 4:[4]

After populating final_df, it's also possible to do:

final_df.fillna(value=0, inplace=True)

This works if you just want to replace each NaN with 0.

Solution 5:[5]

Are you sure that this is the whole code? It returns an empty dataframe in my case. Please provide more details.

Solution 6:[6]

Most of the other replies are correct: the issue is that there are None values in the dataframe, and scipy.stats' percentileofscore function doesn't know how to compare them with a float. I have a different solution that doesn't involve looping through every entry in the dataframe.

I used the dataframe's .replace method to replace all the None entries with 0. The inplace = True is there so that the changes are saved to the dataframe itself instead of having to assign the result back.

hqm_dataframe.replace([None], 0, inplace = True)
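
To see why None triggers the error, it helps to repeat the comparison shown in the traceback (`a < score`) on a small object-dtype series. A minimal sketch with invented values; the clean-up step here uses pd.to_numeric plus fillna as one possible alternative, not the .replace call above:

```python
import numpy as np
import pandas as pd

# Invented returns; dtype=object keeps None as None instead of NaN
returns = pd.Series([0.43, None, 0.12], dtype=object)

try:
    # Element-wise comparison ends up evaluating None < 0.2
    np.asarray(returns) < 0.2
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Once None is replaced by a number, the comparison works
cleaned = pd.to_numeric(returns).fillna(0)
mask = np.asarray(cleaned) < 0.2
print(mask)
```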

Solution 7:[7]

Use np.nan instead of 'N/A' and cast the price return columns to float.

final_df = pd.DataFrame(columns = my_columns)

for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
#    print(symbol_string.split(','))
#    print(data['AAPL']['stats'])
    for symbol in symbol_string.split(','):
        final_df = final_df.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    data[symbol]['stats']['year1ChangePercent'],
                    np.nan
                ],
                index = my_columns
            ),
            ignore_index=True
        )

hqm_df = pd.DataFrame(columns = hqm_columns)

for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
    for symbol in symbol_string.split(','):
        hqm_df = hqm_df.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    np.nan,
                    data[symbol]['stats']['year1ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month6ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month3ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month1ChangePercent'],
                    np.nan
                ],
                index = hqm_columns
            ),
            ignore_index=True
        )

hqm_df['One-Year Price Return'] = hqm_df['One-Year Price Return'].astype('float')
hqm_df['Six-Month Price Return'] = hqm_df['Six-Month Price Return'].astype('float')
hqm_df['Three-Month Price Return'] = hqm_df['Three-Month Price Return'].astype('float')
hqm_df['One-Month Price Return'] = hqm_df['One-Month Price Return'].astype('float')
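
The four casts can also be written as a single astype call with a dict. A minimal sketch on a toy frame with invented tickers and values; the NaN entries remain after the cast, so the sketch also counts them per column as a check:

```python
import pandas as pd

time_periods = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']
price_cols = [f'{p} Price Return' for p in time_periods]

# Invented sample data with None where the API had no value
hqm_df = pd.DataFrame({
    'Ticker': ['AAA', 'BBB'],
    'One-Year Price Return': pd.Series([0.40, None], dtype=object),
    'Six-Month Price Return': pd.Series([0.20, 0.05], dtype=object),
    'Three-Month Price Return': pd.Series([None, 0.02], dtype=object),
    'One-Month Price Return': pd.Series([0.01, 0.03], dtype=object),
})

# Cast all return columns at once; None becomes NaN
hqm_df = hqm_df.astype({col: 'float' for col in price_cols})

# Count how many values are still missing in each return column
missing_per_column = hqm_df[price_cols].isna().sum()
print(hqm_df.dtypes)
print(missing_per_column)
```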

Solution 8:[8]

Basically, I converted each series to float, which turns None into NaN, and then filled the missing values with 0, as follows:

from scipy import stats

mementum = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']

# Cast each return column to float (None becomes NaN), then fill NaN with 0.0
for period in mementum:
    hq_df[f'{period} Price Return'] = hq_df[f'{period} Price Return'].astype(float).fillna(0.0)

# Score every row's return against the whole column
for row in hq_df.index:
    for period in mementum:
        hq_df.loc[row, f'{period} Return Percentile'] = stats.percentileofscore(hq_df[f'{period} Price Return'], hq_df.loc[row, f'{period} Price Return'])
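
As an alternative to the row-by-row loop, pandas can compute a percentile rank for a whole column at once. This is a different technique from the one in the answer: a sketch using Series.rank(pct=True), which for scores that are themselves in the column should agree with percentileofscore's default kind='rank' up to floating-point rounding, though it is worth verifying on your own data. The frame below is invented for illustration.

```python
import pandas as pd

mementum = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']

# Invented sample data; None becomes NaN when the frame is built
hq_df = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'One-Year Price Return': [0.40, 0.10, 0.25, 0.10],
    'Six-Month Price Return': [0.20, None, 0.05, 0.15],
    'Three-Month Price Return': [0.03, 0.02, 0.07, 0.01],
    'One-Month Price Return': [0.01, 0.03, 0.02, 0.04],
})

for period in mementum:
    # Same clean-up as above: cast to float and treat missing as 0.0
    returns = hq_df[f'{period} Price Return'].astype(float).fillna(0.0)
    hq_df[f'{period} Price Return'] = returns
    # Average rank as a fraction of the column length, scaled to 0-100
    hq_df[f'{period} Return Percentile'] = returns.rank(pct=True) * 100

print(hq_df[['Ticker', 'One-Year Return Percentile', 'Six-Month Return Percentile']])
```

Tied returns share the average of their ranks, which is why the two 0.10 values in the One-Year column receive the same percentile.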

Solution 9:[9]

No, you don't have to worry about these errors: they are expected. It has to do with the fact that the NAB dataset can have duplicate timestamps. Since in any time series the timestamp -> value mapping has to be a 1-1 relationship, the dataset loader drops duplicate values for any particular timestamp and only keeps the first.

You can see what's happening in the code here: https://github.com/salesforce/Merlion/blob/4a448de1676e62b6305578e3a21dd92d4d2ac245/ts_datasets/ts_datasets/anomaly/nab.py#L85

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

| Solution   | Source          |
|------------|-----------------|
| Solution 1 | Yohnn           |
| Solution 2 | dborski         |
| Solution 3 | Simas Joneliunas |
| Solution 4 | gleniosp        |
| Solution 5 | shekhar chander |
| Solution 6 | aoa             |
| Solution 7 | David Camppos   |
| Solution 8 | sayed saad      |
| Solution 9 | SalmonKiller    |