'How do I convert a unicode text to a text that python can read so that I could find that specific word in webscraping results?
I am trying to scrape text in instagram and check if I could find some keywords in the bio but the user use a special fonts, so I cannot identify the specific word, how can I remove the fonts or formot of a text such that I can search the word?
import re
test="𝙄𝙣𝙝𝙖𝙡𝙚 𝙩𝙝𝙚 𝙛𝙪𝙩𝙪𝙧𝙚 𝙩𝙝𝙚𝙣 𝙚𝙭𝙝𝙖𝙡𝙚 𝙩𝙝𝙚 𝙥𝙖𝙨𝙩. "
x = re.findall(re.compile('past'), test)
if x:
print("TEXT FOUND")
else:
print("TEXT NOT FOUND")
TEXT NOT FOUND
Another example:
import re
test="ғʀᴇᴇʟᴀɴᴄᴇ ɢʀᴀᴘʜɪᴄ ᴅᴇsɪɢɴᴇʀ"
test=test.lower()
x = re.findall(re.compile('graphic'), test)
if x:
print("TEXT FOUND")
else:
print("TEXT NOT FOUND")
TEXT NOT FOUND
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
