Python: Pandas - remove ONLY the all-NaN rows and move the data up, without moving data in rows with partial NaNs

Here is the code I'm drafting to pull fielding stats for all National League players. It runs fine, but I'd like to know how to drop ONLY the rows that are entirely NaN from a dataframe without disturbing any of the other data:

# import libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# create a url object
url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'

# create list of the stats that we care about
standardFieldingStats = [
    'player',
    'team_ID',
    'G',
    'GS',
    'CG',
    'Inn_def',
    'chances',
    'PO',
    'A',
    'E_def',
    'DP_def',
    'fielding_perc',
    'tz_runs_total',
    'tz_runs_total_per_season',
    'bis_runs_total',
    'bis_runs_total_per_season',
    'bis_runs_good_plays',
    'range_factor_per_nine',
    'range_factor_per_game',
    'pos_summary'
]

# Create object page
page = requests.get(url)

# parser-lxml = Change html to Python friendly format
# Obtain page's information
soup = BeautifulSoup(page.text, 'lxml')

# grab the table of NL fielding stats for the current year
tableNLFielding = soup.find('table', id='players_players_standard_fielding_fielding')

# grab player UID
puidList = []
rows = tableNLFielding.select('tr')
for row in rows:
    playerUID = row.select_one('td[data-append-csv]')
    playerUID = playerUID.get('data-append-csv') if playerUID else None
    if playerUID is None:
        continue
    puidList.append(playerUID)

# grab each player's stats, one list of values per table row
compList = []
for row in rows:
    thingList = []
    for stat in range(len(standardFieldingStats)):
        thing = row.find("td", attrs={"data-stat" : standardFieldingStats[stat]})
        if thing == None:
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Team Totals':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 NL teams':
            continue
        elif row.find("td", attrs={"data-stat" : 'player'}).text == 'Rank in 15 AL teams':
            continue
        elif thing.text == '':
            continue
        elif thing.text == 'NaN':
            continue
        else:
            thingList.append(thing.text)
    compList.append(thingList)

# build a dataframe that uses the fielding stat names as column headers
NLFieldingDf = pd.DataFrame(data=compList, columns=standardFieldingStats)

#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.dropna().values))

#NLFieldingDf = NLFieldingDf.apply(lambda x: pd.Series(x.fillna('').values))

# make all NaNs blank for aesthetic reasons
#NLFieldingDf = NLFieldingDf.fillna('')

#NLFieldingDf.insert(loc=0, column='pUID', value=puidList)

For example, this is the dataframe I want to remove NaNs from:

player             team   pos_summary
NaN                NaN    NaN
Brandon Woodruff   NaN    P   
William Woods      ATL    NaN
Kyle Wright        ATL    P

When I try, the result looks like this, with the data moved out of place:

player             team   pos_summary
Brandon Woodruff   ATL    P   
William Woods      ATL    P
Kyle Wright

Ideally, I want this: no all-NaN rows, and rows with partial NaNs kept in place:

player             team   pos_summary
Brandon Woodruff          P   
William Woods      ATL    
Kyle Wright        ATL    P

My attempts are the commented-out lines at the end of the code above.



Solution 1:[1]

Try this to remove the rows in which every value is NaN. Note that dropna returns a new dataframe, so assign the result back:

df = df.dropna(how="all")

If you also need to replace the remaining NaN values with '', use

df.fillna('', inplace=True)
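As a quick check, here is a minimal sketch of both calls applied to the sample frame from the question. The column names are taken from the question's example, and the `cleaned` variable and the `reset_index` step are additions for illustration only:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's example table.
df = pd.DataFrame(
    {
        "player": [np.nan, "Brandon Woodruff", "William Woods", "Kyle Wright"],
        "team": [np.nan, np.nan, "ATL", "ATL"],
        "pos_summary": [np.nan, "P", np.nan, "P"],
    }
)

# Drop only the rows where every column is NaN; partially filled rows are kept.
cleaned = df.dropna(how="all")

# Blank out the remaining NaNs and renumber the rows from 0.
cleaned = cleaned.fillna("").reset_index(drop=True)

print(cleaned)
```

Because `dropna(how="all")` works on whole rows, the values in the remaining rows stay in their original columns, unlike the per-column `apply` attempts in the question.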

Solution 2:[2]

You could do it that way; however, your data isn't accurate. You shouldn't be getting nulls in player position or team.
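A likely cause, judging from the scraping loop in the question: each `continue` skips a cell instead of recording a blank, so a row with an empty cell ends up shorter and its later values land in the wrong columns, while rows with no `<td>` cells become empty lists. The sketch below shows the alternative of emitting exactly one value per column. It uses plain dictionaries as a stand-in for the parsed cells; the `rows` data and the `build_row` helper are hypothetical and not part of the original code:

```python
import pandas as pd

columns = ["player", "team_ID", "pos_summary"]

# Hypothetical stand-in for scraped rows: each dict maps a data-stat name to its cell text.
rows = [
    {},  # e.g. a header row that has no <td> cells
    {"player": "Brandon Woodruff", "team_ID": "", "pos_summary": "P"},
    {"player": "William Woods", "team_ID": "ATL", "pos_summary": ""},
    {"player": "Kyle Wright", "team_ID": "ATL", "pos_summary": "P"},
]


def build_row(cells, columns):
    """Return one value per column, using '' for a missing or empty cell."""
    return [cells.get(name, "") for name in columns]


# Skip rows that have no cells at all, but never skip an individual cell.
records = [build_row(cells, columns) for cells in rows if cells]

aligned = pd.DataFrame(records, columns=columns)
print(aligned)
```

With every row the same length, there is nothing left for pandas to pad, so no all-NaN rows appear and nothing shifts between columns.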

Secondly, if you need to parse <table> tags (and you don't need to pull out any attributes such as an href), let pandas parse the table for you. It uses lxml or BeautifulSoup under the hood.

import pandas as pd

url = r'https://www.baseball-reference.com/leagues/NL/2022-standard-fielding.shtml'
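df = pd.read_html(url)[-1]    # take the last table that pandas finds on the page
df = df[df['Rk'].ne('Rk')]    # keep only rows whose 'Rk' value is not the header text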

Output:

print(df[['Name', 'Tm', 'Pos Summary']])
                 Name   Tm Pos Summary
0         C.J. Abrams  SDP    SS-2B-OF
1    Ronald Acuna Jr.  ATL          OF
2        Willy Adames  MIL          SS
3        Austin Adams  SDP           P
4         Riley Adams  WSN        C-1B
..                ...  ...         ...
509     Miguel Yajure  PIT           P
510  Mike Yastrzemski  SFG          OF
511  Christian Yelich  MIL          OF
512        Juan Yepez  STL          OF
513      Huascar Ynoa  ATL           P

[495 rows x 3 columns]
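The `df['Rk'].ne('Rk')` filter above presumably exists because the page repeats its header row inside the table body, and `read_html` returns those repeats as ordinary data rows. A minimal sketch of the same filter on a hand-built frame (the `raw` data is a hypothetical miniature, not the real table):

```python
import pandas as pd

# Hypothetical miniature of a parsed table in which the header row repeats mid-table.
raw = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Name": ["C.J. Abrams", "Ronald Acuna Jr.", "Name", "Willy Adames"],
        "Tm": ["SDP", "ATL", "Tm", "MIL"],
    }
)

# Keep only the rows whose 'Rk' value is not the literal header text.
players = raw[raw["Rk"].ne("Rk")]

print(players)
```

The filtered frame keeps its original index labels, which would explain why the output above ends at index 513 while reporting only 495 rows. Calling `reset_index(drop=True)` afterwards renumbers the rows if that matters.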

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 chitown88