How to load a large collection of long text files into Python without splitting them by lines?
I hope someone can help with this problem. I have a folder of articles in plain-text form (.txt), and I need to load them into Python with two columns: the name of the file and the whole text of the file in a single cell, for further work with them.
So I need something like this:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
But for 11,500 articles, collected in one DataFrame or a similar structure.
Solution 1:[1]
You can just read the files and add them as rows. An example could look like this:
import pathlib
from pandas import DataFrame
SRC_DIR = pathlib.Path("./test-files")  # Use your dir!
# Collect one row per file, then build the frame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for file in SRC_DIR.iterdir():
    # read_text() opens, reads, and closes the file
    rows.append(dict(name=file.name, content=file.read_text()))
data_set = DataFrame(rows)
print(data_set)
This generates a data frame like the one below:
    name                                            content
0  a.txt  Lorem ipsum dolor sit amet, consectetur adipis...
1  b.txt  Lorem ipsum dolor sit amet, consectetur adipis...
2  c.txt  Lorem ipsum dolor sit amet, consectetur adipis...
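For a folder of around 11,500 articles it may help to wrap the same idea in a small function. The sketch below is only one possible variant, not part of the original answer: the function name `load_text_files` and its parameters `pattern`, `encoding`, and `strip_newlines` are hypothetical, and it assumes the files are UTF-8 encoded. It keeps only files matching `*.txt`, reads each file as a single string, and builds the DataFrame once at the end instead of growing it row by row.

```python
import pathlib

import pandas as pd


def load_text_files(src_dir, pattern="*.txt", encoding="utf-8", strip_newlines=False):
    """Return a DataFrame with one row per matching file: name and full content.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    rows = []
    # sorted() gives a stable, alphabetical row order
    for path in sorted(pathlib.Path(src_dir).glob(pattern)):
        # read_text() returns the whole file as one string, not a list of lines
        text = path.read_text(encoding=encoding)
        if strip_newlines:
            # Mirror the question's replace('\n', ''), but use a space so that
            # words at line ends are not glued together
            text = text.replace("\n", " ")
        rows.append({"name": path.name, "content": text})
    # Build the frame in a single step; explicit columns keep the layout
    # even when no file matches
    return pd.DataFrame(rows, columns=["name", "content"])
```

It could then be called as `data_set = load_text_files("./test-files")`, with `strip_newlines=True` if each article should end up on a single line.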
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kris |
