Encoding / can't see letters
I'm opening links from a list and scraping data from each page into an array, which I then write to a CSV file. When I open the finished CSV file in the PyCharm IDE, I see the letters of my language, such as ą, ę, ė, į, š, ų, ū, correctly, but when I open the same file from the desktop the text is corrupted, like this: "< Grįžti atgal". I tried saving the CSV file with UTF-8 encoding, but it didn't help.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import csv
import urllib.request
from bs4 import BeautifulSoup
import numpy as np

# Read the list of links, one per CSV row
with open('neduplikuotas.csv', newline='') as csvfile:
    data1 = list(csv.reader(csvfile))
data = [''.join(ele) for ele in data1]

i = 1
test = data[3550:]
array = []
for element in test:
    # Store the running index, then every paragraph of the page
    array.append(i)
    html = urllib.request.urlopen(element)
    htmlParse = BeautifulSoup(html, 'html.parser')
    for paragraph in htmlParse.find_all("p"):
        array.append(paragraph.get_text())
    i = i + 1

# Write everything that was collected to a CSV file
df = pd.DataFrame(array)
df.to_csv('viskas5.csv')
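One possible cause, assuming the desktop program is a spreadsheet application such as Excel (the question does not say which program opens the file), is that the file is valid UTF-8 but has no byte order mark, so the program guesses a legacy code page. A minimal sketch of writing the file with Python's `utf-8-sig` codec, which prepends the BOM, might look like this. The sample rows and the temporary output path are made up for illustration.

```python
import csv
import os
import tempfile

# Sample rows standing in for the scraped paragraphs (illustrative data only).
rows = [[1, "< Grįžti atgal"], [2, "ąęėįšųū"]]

# Hypothetical output path in a temporary directory, not the asker's real file.
path = os.path.join(tempfile.mkdtemp(), "viskas5_bom.csv")

# 'utf-8-sig' writes the UTF-8 byte order mark (EF BB BF) before the data,
# which some desktop programs use to detect that the file is UTF-8.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Check the raw bytes: the file should start with the BOM.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(b"\xef\xbb\xbf")

# Reading back with the same codec strips the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    round_trip = list(csv.reader(f))
```

With pandas, the same codec name can be passed through the `encoding` parameter, for example `df.to_csv('viskas5.csv', encoding='utf-8-sig')`. Whether this fixes the display depends on the program used to open the file.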
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow