'Keep Getting a UnicodeDecodeError When Trying to Read CSV with Pandas [duplicate]

I am trying to read a csv in python, and keep getting the below error. I tried other csv files that I worked with previously without issue on my other computer, and I get the same error message with those as well. I recently switched computers, but what is also bizarre is that yesterday I read a different csv saved in the same network location without any problems. I have no idea what is causing this but would like to be able to load my previous files if anyone has any ideas.

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Input In [17], in <module>
      1 import pandas as pd
----> 3 df = pd.read_csv(r"C:\Users\nabecker\OneDrive - McDermott Will & Emery LLP\Documents\Parent Data for Analysis.csv")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:586, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    571 kwds_defaults = _refine_defaults_read(
    572     dialect,
    573     delimiter,
   (...)
    582     defaults={"delimiter": ","},
    583 )
    584 kwds.update(kwds_defaults)
--> 586 return _read(filepath_or_buffer, kwds)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:482, in _read(filepath_or_buffer, kwds)
    479 _validate_names(kwds.get("names", None))
    481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
    484 if chunksize or iterator:
    485     return parser

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:811, in TextFileReader.__init__(self, f, engine, **kwds)
    808 if "has_index_names" in kwds:
    809     self.options["has_index_names"] = kwds["has_index_names"]
--> 811 self._engine = self._make_engine(self.engine)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1040, in TextFileReader._make_engine(self, engine)
   1036     raise ValueError(
   1037         f"Unknown engine: {engine} (valid options are {mapping.keys()})"
   1038     )
   1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:69, in CParserWrapper.__init__(self, src, **kwds)
     67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
     68 try:
---> 69     self._reader = parsers.TextReader(self.handles.handle, **kwds)
     70 except Exception:
     71     self.handles.close()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:542, in pandas._libs.parsers.TextReader.__cinit__()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:642, in pandas._libs.parsers.TextReader._get_header()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:843, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\_libs\parsers.pyx:1917, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 95538: invalid continuation byte


Solution 1:[1]

It seems that you stored your files on OneDrive.

Somethine the network drive change file encoding. For example, whenever I save my file in Dropbox on Window, I face this kind of issues; something get changed so I have to be care of using it on Mac.

There are several ways to deal with this kind of encoding issues:

# Way 1. use "ISO-8859-1" (or "latin-1") encoding when you open the file
f = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")
# Way 2. ignore error when you open the file
f = open('u.item', encoding='utf8', errors='ignore')

Please note that the file are correctly opened and all the characters are clear when you successfully (without an exception) loaded the file.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Park