How to deal with "_csv.Error: line contains NULL byte"?

I am trying to fix an issue I'm having with null bytes in a CSV file.

The csv_file object is being passed in from a different function in my Flask application:

    stream = codecs.iterdecode(csv_file.stream, "utf-8-sig", errors="strict")
    dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")

    for row in dict_reader:  # Error is thrown here
        ...

The error thrown in the console is _csv.Error: line contains NULL byte.

So far, I have tried:

  • different encoding types (I checked the encoding type and it is utf-8-sig)
  • using .replace('\x00', '')

but I can't seem to get these null bytes to be removed.

I would like to strip the null bytes (replacing them with empty strings), but I would also be okay with skipping any row that contains them. Unfortunately, I am unable to share my CSV file.
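For the skip-the-row option, here is a minimal sketch of what I have in mind. The helper name `iter_rows_skipping_nulls` and the sample bytes are made up for illustration, and it assumes the input is a binary stream that yields one line at a time. It filters physical lines, so a quoted field that spans several lines could end up half-dropped, and a null byte in the header line would drop the header.

```python
import codecs
import csv
import io


def iter_rows_skipping_nulls(binary_stream):
    """Yield parsed rows, dropping any raw line that contains a null byte."""
    # Filter on the raw bytes, before decoding and before the csv module sees them.
    kept = (line for line in binary_stream if b"\x00" not in line)
    text = codecs.iterdecode(kept, "utf-8-sig", errors="strict")
    yield from csv.DictReader(text, skipinitialspace=True, restkey="INVALID")


# Made-up sample data: the second line contains a null byte.
sample = io.BytesIO(b"name,city\r\nAl\x00ice,Paris\r\nBob,Berlin\r\n")
kept_rows = list(iter_rows_skipping_nulls(sample))
```

With this sample, only the "Bob" row should survive, because the whole "Alice" line is discarded rather than repaired.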

EDIT: The solution I reached:

    import codecs
    import csv
    import io

    # Read the uploaded file's raw bytes
    content = csv_file.read()

    # Wrap the bytes in an in-memory byte stream so they can be iterated line by line
    csv_stream = io.BytesIO(content)

    # Strip null bytes from each line before decoding
    fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)

    # The rest is unchanged, except that fixed_lines is passed in instead of csv_file.stream
    stream = codecs.iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
    dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")
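To try the same idea outside of Flask, here is a self-contained sketch. The function name `read_rows_without_nulls` and the sample bytes are invented for the demo, and an `io.BytesIO` object stands in for the uploaded file. The point is that the null bytes are stripped from the raw bytes, before decoding and before the csv module reads anything.

```python
import codecs
import csv
import io

# Made-up upload: a UTF-8 BOM, a header row, and one data row polluted with a null byte.
raw = b"\xef\xbb\xbfname,city\r\nAl\x00ice,Paris\r\nBob,Berlin\r\n"


def read_rows_without_nulls(binary_stream):
    """Strip null bytes from each raw line, then decode and parse as CSV."""
    cleaned = (line.replace(b"\x00", b"") for line in binary_stream)
    # "utf-8-sig" drops the leading BOM so it does not end up in the first header name.
    text = codecs.iterdecode(cleaned, "utf-8-sig", errors="strict")
    return list(csv.DictReader(text, skipinitialspace=True, restkey="INVALID"))


rows = read_rows_without_nulls(io.BytesIO(raw))
```

Both data rows are kept here, with the polluted value coming back as "Alice". Reading everything with `csv_file.read()` first, as in my solution, loads the whole upload into memory, which seems fine for small files but may not suit large ones.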


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
