TypeError: cannot use a string pattern on a bytes-like object (Python 3)

I have updated my project to Python 3.7 and Django 3.0

Here is the code from models.py:

def get_fields(self):
        
        fields = []
        
        html_text = self.html_file.read()
        self.html_file.seek(0)
        
        # for now just find singleline, multiline, img editable
        # may put repeater in there later (!!)
        for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
            # m is ('<img editable="true" label="Image" class="w300" width="300" border="0">', 'img editable')
            # or similar
            # first is full tag, second is tag type
            # append as a list
            # MUST also save value in here
            data = {'tag':m[0], 'type':m[1], 'label':'', 'value':None}
            title_list = re.findall("label\s*=\s*\"([^\"]*)", m[0])
            if(len(title_list) == 1):
                data['label'] = title_list[0]
            # store the data
            fields.append(data)
        
        return fields

Here is my error traceback

 File "/home/harika/krishna test/dev-1.8/mcam/server/mcam/emails/models.py", line 91, in get_fields
    for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
  File "/usr/lib/python3.7/re.py", line 225, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object

How can I solve my issue?



Solution 1:[1]

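In Python 3, read() on a file opened in binary mode returns bytes (the raw representation), not a string. The traceback shows that html_text is such a bytes object, and re cannot apply a string pattern to bytes. You can convert between bytes and strings by specifying an encoding, i.e. how characters are converted to bytes: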

>>> '☺'.encode('utf8')
b'\xe2\x98\xba'

>>> '☺'.encode('utf16')
b'\xff\xfe:&'

The b prefix before a string literal signifies that the value is bytes rather than a string. You can also write bytes literals directly with that prefix:

>>> bytes_x = b'x'
>>> string_x = 'x'
>>> bytes_x == string_x
False
>>> bytes_x.decode('ascii') == string_x
True
>>> bytes_x == string_x.encode('ascii')
True

Note that a literal with the b prefix may contain only basic (ASCII) characters:

>>> b'☺'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

So to fix your problem, you can either convert the input to a string using the appropriate encoding:

html_text = self.html_file.read().decode('utf-8')  # or 'ascii' or something else
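
As a rough illustration of this first option, the sketch below reproduces the error and then applies the decode fix. It assumes the file content is UTF-8; `io.BytesIO` and the sample markup are hypothetical stand-ins for `self.html_file`, not part of the original project.

```python
import io
import re

# Hypothetical stand-in for self.html_file: a file object opened in binary mode.
html_file = io.BytesIO(b'<singleline label="Title">Hello</singleline>')

raw = html_file.read()  # bytes, because the file is binary
html_file.seek(0)

pattern = r"(<(singleline|multiline|img editable)[^>]*>)"

# A str pattern applied to bytes raises the TypeError from the question.
try:
    re.findall(pattern, raw)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding first (assuming UTF-8 content) makes the str pattern usable.
html_text = raw.decode("utf-8")
matches = re.findall(pattern, html_text)
```

After decoding, `matches` holds ordinary strings, so the rest of `get_fields` can stay unchanged.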

Or, probably the better option, use bytes patterns in the findall calls instead of strings:

        for m in re.findall(b"(<(singleline|multiline|img editable)[^>]*>)", html_text):
...
            title_list = re.findall(b"label\s*=\s*\"([^\"]*)", m[0])

(note the b in front of each pattern; the matches, and therefore data['tag'], data['type'] and data['label'], will then be bytes as well)
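
For the bytes-pattern option, a minimal standalone sketch might look like the following. The function takes the file object as a parameter instead of being a model method, and `TAG_RE`, `LABEL_RE`, and the `io.BytesIO` sample are names introduced here for illustration only.

```python
import io
import re

# Bytes patterns (rb"...") can be applied directly to the bytes from read().
TAG_RE = re.compile(rb"(<(singleline|multiline|img editable)[^>]*>)")
LABEL_RE = re.compile(rb'label\s*=\s*"([^"]*)')


def get_fields(html_file):
    """Collect editable tags from a file object opened in binary mode."""
    html_bytes = html_file.read()
    html_file.seek(0)

    fields = []
    for tag, tag_type in TAG_RE.findall(html_bytes):
        # Every captured value is bytes here, not str.
        data = {'tag': tag, 'type': tag_type, 'label': b'', 'value': None}
        labels = LABEL_RE.findall(tag)
        if len(labels) == 1:
            data['label'] = labels[0]
        fields.append(data)
    return fields


# Hypothetical sample input standing in for self.html_file.
sample = io.BytesIO(
    b'<img editable="true" label="Image" class="w300" width="300" border="0">'
)
fields = get_fields(sample)
```

If other code expects strings, each value can be decoded afterwards, for example `fields[0]['label'].decode('utf-8')`, assuming the file is UTF-8.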

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Drecker