'Pandas dataframe.read_csv with a comma in the column reads incorectly

I'm trying to read a csv file with the following values [23, " 5,000.00 ", A]

I need to read the number with a comma, but I'm not able to do it even if I pass a decimal and a quotechar.

I'm reading a CSV that df.read_csv(path, delimiter=',', quotechar='"', decimal=',', thousands='.', dtype=str)

I'm getting the following output if I run df.values.tolist() => ['23', ' " 5', '000.00 "', 'A']

What might be the problem?

The column with this type of data is always different. So I cannot transform it based on column name. dtype=str should be left as a string, as it's necessary for the future.



Solution 1:[1]

You have to set skipinitialspace's value.

I have the following code:

import pandas as pd

result = pd.read_csv("data.csv", sep=',', quotechar='"', skipinitialspace=True)
print(result.columns)

And my data.csv looks like this: 23; " 5,000.0 "; A

Running the code above give this result: Index([u'23', u' 5,000.0 ', u'A'], dtype='object')

skipinitialspace will remove strip off the extra spaces from " 5,000.0 ". If you try to remove the extra spaces by hand, then you can leave out this parameter and it will work just fine as well.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Anti