How to write UTF-8 in a CSV file
I am trying to create a text file in CSV format from a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:
import codecs
...
myfile = codecs.open(filename, 'w','utf-8')
...
f = result.table.item(i,c).text()
myfile.write(f+";")
It works until a cell contains a special character. I also tried:
myfile = open(filename, 'w')
...
f = unicode(result.table.item(i,c).text(), "utf-8")
But it also stops when a special character appears. I have no idea what I am doing wrong.
Solution 1:[1]
From your shell run:
pip2 install unicodecsv
Then, assuming you're using Python's built-in csv module (unlike the original question), change import csv to import unicodecsv as csv in your code.
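A minimal sketch of the drop-in replacement (the file name, delimiter, and sample values are assumed for illustration; unicodecsv mirrors the stdlib csv interface and accepts an encoding argument):

import unicodecsv as csv  # drop-in replacement for the stdlib csv module

# Open in binary mode, as the Python 2 csv documentation recommends
with open('output.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=';', encoding='utf-8')
    # unicode values are encoded to UTF-8 on the way out
    writer.writerow([u'caf\xe9', u'na\xefve'])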
Solution 2:[2]
Use this package; it just works: https://github.com/jdunck/python-unicodecsv.
Solution 3:[3]
For me, the UnicodeWriter class from the Python 2 csv module documentation didn't really work, as it breaks the csv.writer.writerow() interface.
For example:
csv_writer = csv.writer(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)
works, while:
csv_writer = UnicodeWriter(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)
will throw AttributeError: 'int' object has no attribute 'encode'.
As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:
def to_utf8(lst):
    return [unicode(elem).encode('utf-8') for elem in lst]
...
csv_writer.writerow(to_utf8(row))
Or we can even monkey-patch csv_writer to add a write_utf8_row function - the exercise is left to the reader.
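As a starting point for that exercise: csv.writer instances are C objects that don't accept new attributes, so one way to get a write_utf8_row method is a thin wrapper (a sketch; the class name is invented here):

import csv

class Utf8WriterWrapper(object):
    """Wraps a csv.writer and adds the write_utf8_row method
    suggested above (a sketch, not part of the csv API)."""

    def __init__(self, writer):
        self._writer = writer

    def __getattr__(self, name):
        # Delegate writerow, writerows, etc. to the wrapped csv.writer
        return getattr(self._writer, name)

    def write_utf8_row(self, row):
        self._writer.writerow(
            [unicode(elem).encode('utf-8') for elem in row])

Hypothetical usage: csv_writer = Utf8WriterWrapper(csv.writer(csv_file)), after which csv_writer.write_utf8_row(['The meaning', 42]) works for mixed types.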
Solution 4:[4]
The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples
(can't copy the code here because it's protected by copyright)
Solution 5:[5]
For Python 2 you can use this code before calling csv_writer.writerows(rows). Note that it does NOT convert integers (or other non-string values) to UTF-8 strings; only string values are encoded:
def encode_rows_to_utf8(rows):
    encoded_rows = []
    for row in rows:
        encoded_row = []
        for value in row:
            # Only encode strings; leave ints, floats, None, etc. untouched
            if isinstance(value, basestring):
                value = unicode(value).encode("utf-8")
            encoded_row.append(value)
        encoded_rows.append(encoded_row)
    return encoded_rows
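Hypothetical usage (the file name and sample rows are assumed for illustration):

import csv

with open('output.csv', 'wb') as csv_file:
    csv_writer = csv.writer(csv_file)
    rows = [[u'caf\xe9', 42], [u'na\xefve', None]]
    csv_writer.writerows(encode_rows_to_utf8(rows))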
Solution 6:[6]
I tried Bojan's suggestion, but it turned all the None cells into the word None rather than leaving them blank, and rendered floats as 1.231111111111111e+11, among other annoyances. Plus, I want my program to run under both Python 3 and Python 2. So I ended up putting this at the top of the program:
import csv, logging, os

try:
    # Probe: can this csv module write a unicode value directly?
    csv.writer(open(os.devnull, 'w')).writerow([u'\u03bc'])
    PREPROCESS = lambda array: array
except UnicodeEncodeError:
    logging.warning('csv module cannot handle unicode, patching...')
    PREPROCESS = lambda array: [
        item.encode('utf8')
        if hasattr(item, 'encode') else item
        for item in array
    ]
Then I changed all csvout.writerow(row) statements to csvout.writerow(PREPROCESS(row)).
I could have used the test if sys.version_info < (3,): instead of the try statement (see the sketch below), but that violates "duck typing". I may revisit it and write that first one-liner properly with with statements to get rid of the dangling open file and writer, but then I'd have to use ALL_CAPS variable names or pylint would complain. It should get garbage collected anyway, and in any case it only lasts while the script is running.
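For comparison, the version-check alternative mentioned above would look something like this (a sketch; it keys off the interpreter version rather than observed behavior):

import sys

if sys.version_info < (3,):
    # Python 2: the csv module chokes on non-ASCII unicode, so encode first
    PREPROCESS = lambda array: [
        item.encode('utf8') if hasattr(item, 'encode') else item
        for item in array
    ]
else:
    # Python 3: csv handles str (which is unicode) natively
    PREPROCESS = lambda array: array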
Solution 7:[7]
A very simple hack is to use the json module instead of csv. For example, instead of csv.writer, do the following:
import codecs, json

fd = codecs.open(tempfilename, 'wb', 'utf-8')
for c in whatever:
    # json.dumps writes ["a", ...]; strip the brackets for a CSV-like line.
    # ensure_ascii=False keeps special characters as real UTF-8, not \uXXXX.
    fd.write(json.dumps(c, ensure_ascii=False)[1:-1])
    fd.write('\n')
fd.close()
Basically, given the list of fields in the correct order, the JSON-formatted string is nearly identical to a CSV line, apart from the [ and ] at the start and end (JSON always quotes strings, which CSV readers accept). And json seems to be robust to UTF-8 in Python 2.x.
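A quick illustration of why ensure_ascii=False was added above (without it, json.dumps escapes non-ASCII characters rather than emitting real UTF-8):

import json

row = [u'caf\xe9', 42]
print json.dumps(row)[1:-1]                      # "caf\u00e9", 42
print json.dumps(row, ensure_ascii=False)[1:-1]  # "café", 42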
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Gijs |
| Solution 3 | Bojan Bogdanovic |
| Solution 4 | Aaron Digulla |
| Solution 5 | pymen |
| Solution 6 | jcomeau_ictx |
| Solution 7 | vpathak |
