'Ignore "NaN" in pandas calculations?

I made a simple script that goes to the ECF website and searches for players (API calls) and then downloads their latest results. However, one calculation just shows "nan" if any of the column values are "nan".

import requests
import pandas as pd
from tabulate import tabulate

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 600)

playerurl = "https://www.ecfrating.org.uk/v2/new/api.php?v2/players/name/"
playerfname = input("Enter Player's (approx.) First Name (e.g Joh for John): ")
playerlname = input("Enter Player's (approx.) Last Name (e.g Smi for Smith): ")
URL2 = playerurl+playerfname+"%20"+playerlname

response2 = requests.get(URL2)
data2 = response2.json()
mydata2 = pd.DataFrame.from_dict(data2['players'])
mydata2 = mydata2.drop(['member_no','due_date','club_code','nation','nation2','flag','category'], axis=1)
mydata2.rename(columns = {'full_name':'Full_Name', 'club_name':'Club','gender':'Gender','date_last_game':'LastGame'}, inplace = True)
mydata2.index.name = "Row"

print(tabulate(mydata2, headers='keys', tablefmt='psql'))

chooseplayer = input("Choose Row Number: ")
playerID = mydata2.at[int(chooseplayer),'ECF_code']
baseurl = "https://www.ecfrating.org.uk/v2/new/api.php?v2/games/Standard"
howmanygames = input("Enter how many past games you want to see: ")
URL = baseurl+"/player/"+playerID+"/limit/"+howmanygames
response = requests.get(URL)
data = response.json()

mydata = pd.DataFrame.from_dict(data['games'])
mydata.loc[mydata.score==5,'score']='0.5'
mydata.loc[mydata.colour=="w",'colour']='W'
mydata.loc[mydata.colour=="b",'colour']='B'
mydata = mydata[mydata.opponent_no != 0]
mydata['score'] = pd.to_numeric(mydata['score'],errors='coerce')
mydata['increment'] = pd.to_numeric(mydata['increment'],errors='coerce')
mydata['opponent_rating'] = pd.to_numeric(mydata['opponent_rating'], errors='coerce')
mydata['player_rating'] = pd.to_numeric(mydata['player_rating'], errors='coerce')
mydata['diff_rating'] = mydata['opponent_rating']-mydata['player_rating']
mydata = mydata[mydata.score >= 0]

avgopprating = mydata['opponent_rating'].mean()
totalpercent = mydata['score'].sum()
averageopponent = mydata['diff_rating'].mean()
myrating = mydata['player_rating'].iloc[0]
lastrating = mydata['player_rating'].iloc[-1]
ratingdiff = myrating - lastrating

mydata = mydata.drop(['event_code', 'event_name', 'club_code', 'org_name', 'section_title', 'opponent_no', 'increment'], axis=1)
mydata.rename(columns = {'opponent_name':'Versus', 'opponent_rating':'Rated', 'player_rating':'NewRating', 'game_date':'Date','colour':'Colour','score':'Score', 'diff_rating':'Diff.'}, inplace = True)
mydata.round(0)
mydata.index.name = "Row"

print(f"Percentage Score: {(100*int(totalpercent)/int(howmanygames)):.0f} %")
print(f"Current Rating: {myrating:.0f}")
print(f"Rating change: {ratingdiff:.0f}")
print(f"Average Opponent Rating: {avgopprating:.0f}")

print(tabulate(mydata, headers='keys', tablefmt='psql'))

Sample Output:

Enter Player's (approx.) First Name (e.g Joh for John): jas
Enter Player's (approx.) Last Name (e.g Smi for Smith): ler
+-------+--------------+------------+-----------+---------------+----------+------------+
|   Row | Full_Name    | ECF_code   |   FIDE_no | Club          | Gender   | LastGame   |
|-------+--------------+------------+-----------+---------------+----------+------------|
|     0 | LeRoy, Jason | 233382L    |    461512 | North Bristol | M        | 2022-03-17 |
+-------+--------------+------------+-----------+---------------+----------+------------+
Choose Row Number: 0
Enter how many past games you want to see: 10
Percentage Score: 20 %
Current Rating: 1382
Rating change: nan
Average Opponent Rating: 1591
+-------+------------+----------+---------+----------------------+---------+-------------+---------+
|   Row | Date       | Colour   |   Score | Versus               |   Rated |   NewRating |   Diff. |
|-------+------------+----------+---------+----------------------+---------+-------------+---------|
|     0 | 2022-03-17 | B        |     0   | Marston, Alastair JD |    1756 |        1382 |     374 |
|     1 | 2022-03-01 | B        |     1   | Shipp, Arthur        |    1171 |        1384 |    -213 |
|     2 | 2022-02-17 | W        |     0   | Bartram, James       |    1674 |        1379 |     295 |
|     3 | 2022-02-02 | B        |     0.5 | Paines, John L       |    1390 |        1382 |       8 |
|     4 | 2022-01-20 | W        |     0   | Cullen, Gareth       |    1707 |        1382 |     325 |
|     5 | 2022-01-18 | B        |     0.5 | Paton, John G        |    1571 |        1385 |     186 |
|     6 | 2021-12-09 | B        |     0   | Zielinski, Piotr     |    1587 |        1380 |     207 |
|     7 | 2021-10-12 | B        |     0   | Smith, William       |    1787 |        1385 |     402 |
|     8 | 2021-09-30 | B        |     0   | Macarthur, Duncan M  |    1675 |        1387 |     288 |
|     9 | 2020-02-16 | W        |     0   | Daly, Grant          |     nan |         nan |     nan |
+-------+------------+----------+---------+----------------------+---------+-------------+---------+

The "Rating Change" calculation returns "nan" - I want it to ignore the "nan"s and compute the calculation as usual for the numbers. Tried many things - nothing seems to work!!



Solution 1:[1]

I want it to ignore the "nan"s and compute the calculation as usual for the numbers.

That sentence is strange...there is a reason for NaN ("Not a Number").

Your problem is somewhere between your data request and the pandas.to_numeric() calls. For instance, if nan is non-acceptable in your algorithm, you should avoid using to_numeric(..., errors='coerce'). See the documentation for available options.

Also, double-check the answer you're getting from the api (ie., data).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1