How to write UTF-8 in a CSV file
I am trying to create a text file in CSV format from a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:
import codecs
...
myfile = codecs.open(filename, 'w','utf-8')
...
f = result.table.item(i,c).text()
myfile.write(f+";")
It works until a cell contains a special character. I also tried:
myfile = open(filename, 'w')
...
f = unicode(result.table.item(i,c).text(), "utf-8")
But it also stops when a special character appears. I have no idea what I am doing wrong.
Solution 1:[1]
From your shell run:
pip2 install unicodecsv
Then, assuming you're using Python's built-in csv module (unlike the original question), change import csv to import unicodecsv as csv in your code.
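A minimal sketch of the drop-in replacement (the file name, delimiter, and sample values are assumed for illustration; unicodecsv mirrors the stdlib csv interface and accepts an encoding argument):

import unicodecsv as csv  # drop-in replacement for the stdlib csv module

# Open in binary mode, as the Python 2 csv documentation recommends
with open('output.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=';', encoding='utf-8')
    # unicode values are encoded to UTF-8 on the way out
    writer.writerow([u'caf\xe9', u'na\xefve'])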
Solution 2:[2]
Use this package; it just works: https://github.com/jdunck/python-unicodecsv.
Solution 3:[3]
For me, the UnicodeWriter class from the Python 2 csv module documentation didn't really work, as it breaks the csv.writer.writerow() interface.
For example:
csv_writer = csv.writer(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)
works, while:
csv_writer = UnicodeWriter(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)
will throw AttributeError: 'int' object has no attribute 'encode'.
As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:
def to_utf8(lst):
    return [unicode(elem).encode('utf-8') for elem in lst]
...
csv_writer.writerow(to_utf8(row))
Or we can even monkey-patch csv_writer to add a write_utf8_row function - the exercise is left to the reader.
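As a starting point for that exercise: csv.writer instances are C objects that don't accept new attributes, so one way to get a write_utf8_row method is a thin wrapper (a sketch; the class name is invented here):

import csv

class Utf8WriterWrapper(object):
    """Wraps a csv.writer and adds the write_utf8_row method
    suggested above (a sketch, not part of the csv API)."""

    def __init__(self, writer):
        self._writer = writer

    def __getattr__(self, name):
        # Delegate writerow, writerows, etc. to the wrapped csv.writer
        return getattr(self._writer, name)

    def write_utf8_row(self, row):
        self._writer.writerow(
            [unicode(elem).encode('utf-8') for elem in row])

Hypothetical usage: csv_writer = Utf8WriterWrapper(csv.writer(csv_file)), after which csv_writer.write_utf8_row(['The meaning', 42]) works for mixed types.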
Solution 4:[4]
The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples
(can't copy the code here because it's protected by copyright)
Solution 5:[5]
For Python 2 you can use this code before calling csv_writer.writerows(rows). Note that it does NOT convert integers (or other non-string values) to UTF-8 strings; only string values are encoded:
def encode_rows_to_utf8(rows):
    encoded_rows = []
    for row in rows:
        encoded_row = []
        for value in row:
            # Only encode strings; leave ints, floats, None, etc. untouched
            if isinstance(value, basestring):
                value = unicode(value).encode("utf-8")
            encoded_row.append(value)
        encoded_rows.append(encoded_row)
    return encoded_rows
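Hypothetical usage (the file name and sample rows are assumed for illustration):

import csv

with open('output.csv', 'wb') as csv_file:
    csv_writer = csv.writer(csv_file)
    rows = [[u'caf\xe9', 42], [u'na\xefve', None]]
    csv_writer.writerows(encode_rows_to_utf8(rows))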
Solution 6:[6]
I tried Bojan's suggestion, but it turned all the None cells into the word None rather than leaving them blank, and rendered floats as 1.231111111111111e+11, among other annoyances. Plus, I want my program to run under both Python 3 and Python 2. So I ended up putting this at the top of the program:
import csv, logging, os

try:
    # Probe: can this csv module write a unicode value directly?
    csv.writer(open(os.devnull, 'w')).writerow([u'\u03bc'])
    PREPROCESS = lambda array: array
except UnicodeEncodeError:
    logging.warning('csv module cannot handle unicode, patching...')
    PREPROCESS = lambda array: [
        item.encode('utf8')
        if hasattr(item, 'encode') else item
        for item in array
    ]
Then I changed all csvout.writerow(row) statements to csvout.writerow(PREPROCESS(row)).
I could have used the test if sys.version_info < (3,): instead of the try statement (see the sketch below), but that violates "duck typing". I may revisit it and write that first one-liner properly with with statements to get rid of the dangling open file and writer, but then I'd have to use ALL_CAPS variable names or pylint would complain. It should get garbage collected anyway, and in any case it only lasts while the script is running.
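For comparison, the version-check alternative mentioned above would look something like this (a sketch; it keys off the interpreter version rather than observed behavior):

import sys

if sys.version_info < (3,):
    # Python 2: the csv module chokes on non-ASCII unicode, so encode first
    PREPROCESS = lambda array: [
        item.encode('utf8') if hasattr(item, 'encode') else item
        for item in array
    ]
else:
    # Python 3: csv handles str (which is unicode) natively
    PREPROCESS = lambda array: array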
Solution 7:[7]
A very simple hack is to use the json module instead of csv. For example, instead of csv.writer, do the following:
import codecs, json

fd = codecs.open(tempfilename, 'wb', 'utf-8')
for c in whatever:
    # json.dumps writes ["a", ...]; strip the brackets for a CSV-like line.
    # ensure_ascii=False keeps special characters as real UTF-8, not \uXXXX.
    fd.write(json.dumps(c, ensure_ascii=False)[1:-1])
    fd.write('\n')
fd.close()
Basically, given the list of fields in the correct order, the JSON-formatted string is nearly identical to a CSV line, apart from the [ and ] at the start and end (JSON always quotes strings, which CSV readers accept). And json seems to be robust to UTF-8 in Python 2.x.
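A quick illustration of why ensure_ascii=False was added above (without it, json.dumps escapes non-ASCII characters rather than emitting real UTF-8):

import json

row = [u'caf\xe9', 42]
print json.dumps(row)[1:-1]                      # "caf\u00e9", 42
print json.dumps(row, ensure_ascii=False)[1:-1]  # "café", 42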
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Gijs |
| Solution 3 | Bojan Bogdanovic |
| Solution 4 | Aaron Digulla |
| Solution 5 | pymen |
| Solution 6 | jcomeau_ictx |
| Solution 7 | vpathak |
