csv reader opening files differently in Django and Apache
I need to parse a CSV file inside my Django application. The file may contain non-ASCII characters that I need to remove before processing. Here's what my code looks like:
    with open(inputFile, newline='') as f:
        reader = csv.reader(f)
        row1 = next(reader)
        for element in row1:
            columnHeader = element.encode("ascii","ignore").decode("ascii").strip()
It works perfectly fine in Django standalone. But I get
"'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"
when I run it in production (Apache, mod_wsgi, Django). I have tried a slightly different formulation, but no luck.
    columnHeader = element.encode("ascii","ignore").decode()
I am new to Apache, Django and Python - so kind of running out of ideas.
(Both environments are on the same machine - Ubuntu).
Update 1 (3 work hours later): I checked whether a different Python or csv module was somehow being loaded under Apache than in Django standalone, by printing sys.version and csv.__version__. Negative: same versions in both contexts.
I looked at the logs. The failure is actually a couple of lines earlier than I initially suspected:
    row1 = next(reader)
Solution 1:[1]
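It turns out that the Apache + mod_wsgi environment was opening files with a different default encoding. When the encoding parameter is omitted, open() falls back on the locale's preferred encoding, and Apache processes often run under the C/POSIX locale, where that encoding is ASCII rather than UTF-8. Since the csv reader pulls lines from the file object, the decode error surfaces at next(reader).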
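To compare the two environments, it can help to log which encoding Python will fall back on, and to reproduce the error outside Apache. The following is a minimal sketch using only the standard library. The sample bytes are made up for illustration, and it assumes (the question does not confirm this) that the uploaded file starts with a UTF-8 byte order mark, which would match the 0xef byte at position 0 in the error message.

```python
import locale

# Encoding that open() uses when no `encoding` argument is given.
# Log this from both the standalone server and the mod_wsgi process to compare.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample: the first bytes of a CSV saved as UTF-8 with a BOM.
sample = b"\xef\xbb\xbfname,age\n"


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


ascii_result = try_decode(sample, "ascii")
utf8_result = try_decode(sample, "utf-8")

print(ascii_result)       # same message as the production error
print(repr(utf8_result))  # decodes, with the BOM kept as '\ufeff'
```

If the value printed in the first step differs between the two environments, that would explain why the same code behaves differently.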
Explicitly adding the encoding parameter like this solved my problem:
    with open(inputFile, newline='', encoding='utf-8') as f:
In my line of work, I realistically only expect UTF-8 or ASCII encoded CSV files (the users who upload these files use Microsoft Excel to generate them). The above solution works for both encodings, because every ASCII file is also valid UTF-8.
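For files exported from Excel as UTF-8, which may start with a byte order mark, a variation worth considering is the standard "utf-8-sig" codec. It reads plain UTF-8 and ASCII files as well, and drops a leading BOM if one is present. The sketch below is one possible way to wrap the header parsing; read_header is a hypothetical helper name, not part of the original code.

```python
import csv


def read_header(path):
    """Return the stripped column headers from the first row of a CSV file.

    `read_header` is a hypothetical helper used for illustration only.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present,
    # so it also reads plain UTF-8 and ASCII files.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        first_row = next(reader, [])  # empty list if the file has no rows
    # Drop any remaining non-ASCII characters, as in the question.
    return [col.encode("ascii", "ignore").decode("ascii").strip() for col in first_row]
```

With encoding='utf-8' the BOM would instead arrive as '\ufeff' at the start of the first header, where the encode("ascii", "ignore") step from the question removes it, so both variants should give the same headers.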
If anyone has a need to support other encodings, I think the topic gets complicated fairly quickly.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dr Phil |
