Remove white space from entire DataFrame

I have a DataFrame with 22 columns and 65 rows. The data comes from a CSV file. Every value in the DataFrame has an extra, unwanted whitespace character. So if I loop over the 'Year' column and call len() on each value, I get:

2019  5
2019  5
2018  5
...

This one extra whitespace character appears in every value throughout the DataFrame. I tried calling .strip() on the DataFrame, but no such attribute exists.

I also tried looping over the columns with df[column].str.strip(), but the columns have various data types (dtypes: float64(6), int64(4), object(14)), so this raises an error.

Any ideas on how to apply a function to the entire DataFrame, and if so, which function or method? If not, what is the best way to handle this?



Solution 1:[1]

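You can use apply() to do this for a single string column:

df['Year'] = df['Year'].apply(lambda x: x.strip())

To cover the whole DataFrame, apply the function to each column in turn. Because x.strip() raises AttributeError on non-string values such as ints and floats, strip only the values that are strings:

for column in df.columns:
    # Strip strings; leave numbers and other non-string values untouched
    df[column] = df[column].apply(lambda x: x.strip() if isinstance(x, str) else x)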

Solution 2:[2]

Handle the error:

for col in df.columns:
    try:
        df[col] = df[col].str.strip()
    except AttributeError:
        pass

Normally, I'd say select the object dtypes, but that can still be problematic if the data are messy enough to store numeric data in an object container.

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven ']*3})
df['foo2'] = df.foo.astype(object)

for col in df.select_dtypes('object'):
    df[col] = df[col].str.strip()
#AttributeError: Can only use .str accessor with string values!
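
One way around the messy object column problem is to test each value rather than each column. The sketch below is illustrative only: strip_strings is a hypothetical helper name, not a pandas function, and it assumes that only str values need cleaning.

```python
import pandas as pd


def strip_strings(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with whitespace stripped from every string value."""
    result = frame.copy()
    for col in result.columns:
        # Check each value individually, so numbers stored in an
        # object column pass through unchanged instead of raising.
        result[col] = result[col].map(
            lambda v: v.strip() if isinstance(v, str) else v
        )
    return result


# Same example as above: 'foo2' holds integers in an object column.
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven '] * 3})
df['foo2'] = df.foo.astype(object)

cleaned = strip_strings(df)
print(cleaned)
```

Working on a copy keeps the original frame intact, which makes it easy to compare values before and after cleaning.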

Solution 3:[3]

Try this:

for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).replace('  ', ' '))

Solution 4:[4]

Why not try this?

for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())
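
Note that str(x) turns every value into a string, so numeric columns stop being numeric. A minimal sketch of that side effect, using made-up sample data (the column names and values are assumptions, not taken from the question), with one possible way to restore numeric columns through pd.to_numeric:

```python
import pandas as pd

# Hypothetical sample data: two string columns with trailing spaces, one float column.
df = pd.DataFrame({
    'Year': ['2019 ', '2019 ', '2018 '],
    'Name': ['a ', 'b ', 'c '],
    'Value': [1.5, 2.0, 3.25],
})

for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())

# Every value is now a Python str, including the former floats.
all_strings = all(isinstance(v, str) for v in df['Value'])

# Convert columns back to numbers where every value parses cleanly.
for column in df.columns:
    try:
        df[column] = pd.to_numeric(df[column])
    except (ValueError, TypeError):
        pass  # not purely numeric, so keep the stripped strings

print(df.dtypes)
```

Whether converting back is desirable depends on the data; a column such as a zero-padded ID would lose its padding if it were parsed as a number.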

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mohsen_Fatemi
Solution 2 ALollz
Solution 3 Emad
Solution 4 MING JUN LIM