Remove non-ASCII characters from CSV using pandas
I'm querying a table in a SQL Server database and exporting the result to a CSV file using pandas:
import pandas as pd
df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)
Is there a way to remove non-ASCII characters when exporting the CSV?
Solution 1:[1]
I ran into the same problem. Here's what worked for me: post-process the exported file and strip the unwanted characters line by line.
import re
# Matches any run of characters outside the 7-bit ASCII range
regex = re.compile(r'[^\x00-\x7F]+')
# pandas writes CSV files as UTF-8 by default, so read the export with that encoding
with open(csvFile, 'r', encoding='utf-8') as infile, open('myfile.csv', 'w', encoding='utf-8') as outfile:
    for line in infile:  # iterate until EOF
        # Replace every non-ASCII run with an empty string, then write the cleaned line
        outfile.write(regex.sub('', line))
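The same cleanup can also be done on the DataFrame before it is written, which avoids the second pass over the file. The following is a minimal sketch, not part of the original answer: the helper name `strip_non_ascii` and the sample data are hypothetical, and it assumes that only text columns need cleaning.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_non_ascii(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with non-ASCII characters removed from text columns."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Only touch columns that can hold text; numeric columns are left alone.
        if is_object_dtype(cleaned[col]) or is_string_dtype(cleaned[col]):
            # Encode to ASCII, dropping anything that does not fit, then decode
            # back to str. Non-string values (None, NaN) pass through unchanged.
            cleaned[col] = cleaned[col].map(
                lambda v: v.encode("ascii", "ignore").decode("ascii")
                if isinstance(v, str)
                else v
            )
    return cleaned


# Hypothetical sample data standing in for the result of pd.read_sql_query
df = pd.DataFrame({"name": ["café", "naïve", None], "n": [1, 2, 3]})
clean = strip_non_ascii(df)
```

After this step, `clean.to_csv(csvFile, index=False)` writes only ASCII text. Recent pandas versions also accept `encoding` and `errors` arguments on `to_csv`, so `df.to_csv(csvFile, index=False, encoding='ascii', errors='ignore')` may be enough on its own if your installed version supports it.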
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PJMan0300 |
