Keep Getting a UnicodeDecodeError When Trying to Read CSV with Pandas [duplicate]
I am trying to read a csv in python, and keep getting the below error. I tried other csv files that I worked with previously without issue on my other computer, and I get the same error message with those as well. I recently switched computers, but what is also bizarre is that yesterday I read a different csv saved in the same network location without any problems. I have no idea what is causing this but would like to be able to load my previous files if anyone has any ideas.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
Input In [17], in <module>
1 import pandas as pd
----> 3 df = pd.read_csv(r"C:\Users\nabecker\OneDrive - McDermott Will & Emery LLP\Documents\Parent Data for Analysis.csv")
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:586, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
571 kwds_defaults = _refine_defaults_read(
572 dialect,
573 delimiter,
(...)
582 defaults={"delimiter": ","},
583 )
584 kwds.update(kwds_defaults)
--> 586 return _read(filepath_or_buffer, kwds)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:482, in _read(filepath_or_buffer, kwds)
479 _validate_names(kwds.get("names", None))
481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
484 if chunksize or iterator:
485 return parser
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:811, in TextFileReader.__init__(self, f, engine, **kwds)
808 if "has_index_names" in kwds:
809 self.options["has_index_names"] = kwds["has_index_names"]
--> 811 self._engine = self._make_engine(self.engine)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1040, in TextFileReader._make_engine(self, engine)
1036 raise ValueError(
1037 f"Unknown engine: {engine} (valid options are {mapping.keys()})"
1038 )
1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:69, in CParserWrapper.__init__(self, src, **kwds)
67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
68 try:
---> 69 self._reader = parsers.TextReader(self.handles.handle, **kwds)
70 except Exception:
71 self.handles.close()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:542, in pandas._libs.parsers.TextReader.__cinit__()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:642, in pandas._libs.parsers.TextReader._get_header()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:843, in pandas._libs.parsers.TextReader._tokenize_rows()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:1917, in pandas._libs.parsers.raise_parser_error()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 95538: invalid continuation byte
Solution 1:[1]
It seems that you stored your files on OneDrive.
Sometimes a network or cloud drive seems to change a file's encoding. For example, whenever I save a file to Dropbox on Windows I run into this kind of issue; something gets changed, so I have to be careful when I use the file on a Mac.
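Before changing anything, it may help to look at the raw bytes around the position named in the error message. The sketch below is only an illustration: `bytes_around` is a hypothetical helper and the demo file is made-up data, not the file from the question. Byte 0xe4 is "ä" in ISO-8859-1 and Windows-1252, so seeing it inside an otherwise readable word would suggest the file was saved in one of those encodings rather than UTF-8.

```python
import tempfile
from pathlib import Path


def bytes_around(path, position, width=10):
    """Return the raw bytes on either side of a byte offset (hypothetical helper)."""
    data = Path(path).read_bytes()
    start = max(0, position - width)
    return data[start:position + width]


# Demo on a throwaway file with made-up data: "Käse" saved as cp1252, not UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_bytes("name,food\nAnna,Käse\n".encode("cp1252"))
    try:
        demo.read_bytes().decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start plays the role of "position 95538" in the traceback above.
        offset = exc.start
        snippet = bytes_around(demo, offset)
        # The bytes repr shows \xe4 sitting between ordinary ASCII letters.
        print(offset, snippet)
```

For the file in the question, the call would be `bytes_around(path, 95538)` with `path` set to the CSV that fails to load.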
There are several ways to deal with this kind of encoding issues:
# Way 1. use "ISO-8859-1" (or "latin-1") encoding when you open the file
f = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")
# Way 2. ignore error when you open the file
f = open('u.item', encoding='utf8', errors='ignore')
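Please note that loading the file without an exception does not by itself mean the characters are correct: ISO-8859-1 can decode any byte sequence, and errors='ignore' silently drops bytes that cannot be decoded. Check that accented and other special characters look right after you load the file.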
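Since the question uses pandas rather than `open`, here is a minimal sketch of one way to pick an encoding before calling `read_csv`. The helper name `first_working_encoding`, the candidate list, and the demo data are assumptions made for illustration; the order of the candidates matters, because "latin-1" accepts any bytes and so only makes sense as a last resort.

```python
import tempfile
from pathlib import Path


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes the whole file without error."""
    data = Path(path).read_bytes()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the file, try the next one
        return name
    raise ValueError(f"none of {candidates} could decode {path}")


# Demo on a throwaway file with made-up data saved as cp1252.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_bytes("name,food\nAnna,Käse\n".encode("cp1252"))
    encoding = first_working_encoding(demo)
    text = demo.read_text(encoding=encoding)
    print(encoding)
```

The result can then be passed to pandas as `pd.read_csv(path, encoding=encoding)`. The `read_csv` signature in the traceback above also lists `encoding_errors`, so `pd.read_csv(path, encoding="utf-8", encoding_errors="replace")` would be the pandas counterpart of Way 2, with undecodable bytes turned into replacement characters instead of being dropped.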
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Park |
