TypeError: cannot use a string pattern on a bytes-like object python3
I have updated my project to Python 3.7 and Django 3.0.
Here is the code from models.py:
def get_fields(self):
    fields = []
    html_text = self.html_file.read()
    self.html_file.seek(0)
    # for now just find singleline, multiline, img editable
    # may put repeater in there later (!!)
    for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
        # m is ('<img editable="true" label="Image" class="w300" width="300" border="0">', 'img editable')
        # or similar
        # first is full tag, second is tag type
        # append as a list
        # MUST also save value in here
        data = {'tag':m[0], 'type':m[1], 'label':'', 'value':None}
        title_list = re.findall("label\s*=\s*\"([^\"]*)", m[0])
        if(len(title_list) == 1):
            data['label'] = title_list[0]
        # store the data
        fields.append(data)
    return fields
Here is my error traceback:
  File "/home/harika/krishna test/dev-1.8/mcam/server/mcam/emails/models.py", line 91, in get_fields
    for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
  File "/usr/lib/python3.7/re.py", line 225, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
How can I solve my issue?
Solution 1:[1]
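The issue is this: in Python 3, read() on a file opened in binary mode returns bytes (the "raw" representation), not str. Django opens a FileField's file in binary mode ('rb') by default. You can convert between bytes and str if you specify an encoding, which defines how characters are converted to bytes: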
>>> '☺'.encode('utf8')
b'\xe2\x98\xba'
>>> '☺'.encode('utf16')
b'\xff\xfe:&'
The b before the literal signifies that the value is not a str but bytes. You can also write bytes literals directly with that prefix:
>>> bytes_x = b'x'
>>> string_x = 'x'
>>> bytes_x == string_x
False
>>> bytes_x.decode('ascii') == string_x
True
>>> bytes_x == string_x.encode('ascii')
True
Note that a literal with the b prefix can only contain basic (ASCII) characters:
>>> b'☺'
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
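To see the mismatch outside Django, here is a minimal reproduction. It uses io.BytesIO as a stand-in for self.html_file, which is an assumption: any file opened in binary mode returns bytes from read() in the same way. The sample HTML is made up:

```python
import io
import re

# Hypothetical stand-in for self.html_file: a binary file-like object
html_file = io.BytesIO(b'<singleline label="Title"><img editable="true" label="Image">')

html_text = html_file.read()
print(type(html_text))  # <class 'bytes'>

error_message = None
try:
    # A str pattern applied to bytes input fails, as in the traceback
    re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # cannot use a string pattern on a bytes-like object
```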
So to fix your problem, you need to either decode the input to a str with the appropriate encoding:
html_text = self.html_file.read().decode('utf-8') # or 'ascii' or something else
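Here is a self-contained sketch of the decode approach, assuming the file is UTF-8 encoded. Several details are simplifications for the example. get_fields is written as a plain function that takes the file object, rather than as the model method. io.BytesIO stands in for the Django file, and the sample markup is hypothetical. The r prefix on the patterns keeps \s from being treated as a string escape:

```python
import io
import re

def get_fields(html_file):
    """Sketch: decode the file contents to str, then match with str patterns.

    html_file is any binary file-like object (hypothetical stand-in for
    the model's self.html_file); UTF-8 is an assumed encoding.
    """
    html_text = html_file.read().decode("utf-8")
    html_file.seek(0)
    fields = []
    for m in re.findall(r"(<(singleline|multiline|img editable)[^>]*>)", html_text):
        # m[0] is the full tag, m[1] the tag type; both are str now
        data = {"tag": m[0], "type": m[1], "label": "", "value": None}
        title_list = re.findall(r'label\s*=\s*"([^"]*)', m[0])
        if len(title_list) == 1:
            data["label"] = title_list[0]
        fields.append(data)
    return fields

sample = io.BytesIO('<singleline label="Überschrift"><img editable="true" label="Image" width="300">'.encode("utf-8"))
fields = get_fields(sample)
print(fields[0]["label"])  # Überschrift
print(fields[1]["type"])   # img editable
```

The non-ASCII label shows why the encoding matters. Decoding with the wrong codec can garble the text or raise UnicodeDecodeError.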
Or, probably the better option, use bytes patterns in the findall calls instead of str patterns:
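for m in re.findall(b"(<(singleline|multiline|img editable)[^>]*>)", html_text):
    ...
    title_list = re.findall(rb"label\s*=\s*\"([^\"]*)", m[0])
(note the b in front of each pattern; the r in rb keeps \s from being treated as a string escape. Because both the pattern and the input are bytes, the matched tags, types and labels are bytes too, so decode them wherever you need text.)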
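Here is a matching sketch of the bytes-pattern approach, under the same assumptions: a plain function (get_fields_bytes is a hypothetical name), an io.BytesIO stand-in for the file, and made-up sample markup. The rb prefix makes each pattern a raw bytes literal:

```python
import io
import re

def get_fields_bytes(html_file):
    """Sketch: keep the data as bytes and match it with bytes patterns.

    html_file is a hypothetical binary file-like stand-in for self.html_file.
    """
    html_text = html_file.read()
    html_file.seek(0)
    fields = []
    for m in re.findall(rb"(<(singleline|multiline|img editable)[^>]*>)", html_text):
        # Both groups come back as bytes because pattern and input are bytes
        data = {"tag": m[0], "type": m[1], "label": b"", "value": None}
        title_list = re.findall(rb'label\s*=\s*"([^"]*)', m[0])
        if len(title_list) == 1:
            data["label"] = title_list[0]
        fields.append(data)
    return fields

sample = io.BytesIO(b'<multiline label="Body text"><singleline>')
fields = get_fields_bytes(sample)
print(fields[0]["label"])                  # b'Body text'
print(fields[0]["label"].decode("utf-8"))  # Body text
```

The b"" default label is a change from the original '', made so that every label has the same type. Decode a label (e.g. with .decode('utf-8')) only at the point where you need text.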
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Drecker |
