How to load a large set of long text files into Python without splitting by lines?

I hope someone can help with this problem. I have a folder of articles in plain-text form (.txt), and I need to load them into Python as a table with two columns: the file name, and the whole text of the file in a single cell, for further work with them.

So I need something like this:

    with open('data.txt', 'r') as file:
        data = file.read().replace('\n', '')

But I need this for 11,500 articles, all in one DataFrame or a similar structure.



Solution 1:[1]

You can read each file and add its contents as a row. For example:

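    import pathlib
    from pandas import DataFrame

    SRC_DIR = pathlib.Path("./test-files")  # Use your dir!
    rows = []
    for file in SRC_DIR.iterdir():
        # read_text() opens, reads and closes the file in one call
        rows.append(dict(name=file.name, content=file.read_text()))

    # DataFrame.append() was removed in pandas 2.0, so build the frame once from the list
    data_set = DataFrame(rows, columns=["name", "content"])
    print(data_set)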

This generates a data frame like the one below:

    name                                            content
0  a.txt  Lorem ipsum dolor sit amet, consectetur adipis...
1  b.txt  Lorem ipsum dolor sit amet, consectetur adipis...
2  c.txt  Lorem ipsum dolor sit amet, consectetur adipis...
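
For a folder of roughly 11,500 files it may help to wrap the loop in a small function. The following is only a sketch under some assumptions that are not in the original answer: the helper name `read_articles` is hypothetical, the files are assumed to be UTF-8, and line breaks are collapsed into single spaces (the question's `replace('\n', '')` would glue the last word of one line to the first word of the next).

```python
from pathlib import Path


def read_articles(src_dir, pattern="*.txt", encoding="utf-8"):
    """Return one {"name", "content"} record per matching file in src_dir.

    Hypothetical helper: the name, the default pattern and the UTF-8
    default are assumptions, not part of the original answer.
    """
    records = []
    # sorted() makes the row order reproducible; the pattern skips other file types.
    for path in sorted(Path(src_dir).glob(pattern)):
        if not path.is_file():
            continue  # ignore directories that happen to match the pattern
        # errors="replace" keeps one badly encoded file from aborting the whole run.
        text = path.read_text(encoding=encoding, errors="replace")
        # split() + join() collapses line breaks and repeated whitespace into
        # single spaces, so the whole article ends up as one line of text.
        records.append({"name": path.name, "content": " ".join(text.split())})
    return records
```

With pandas installed, `pandas.DataFrame(read_articles("./test-files"))` should then give the same two columns, `name` and `content`, with one row per article.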

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kris