Vectorizing text data from a dictionary's values loaded from a pickle file

I'm new to NLP and am teaching myself by working on a text classification task.

I have a pickle file containing data like this:

{'food' : {'f1.txt', 'f2.txt', 'f3.txt', 'f4.txt'}, 'sports' : {'s1.txt', 's2.txt', 's3.txt', 's4.txt'}, 'politics' : {'p1.txt', 'p2.txt', 'p3.txt', 'p4.txt'}}

I need to read these text files, split them into training and test sets, vectorize the text, and classify it.

I have loaded the pickle file. Here is the code:

import pickle

topic_file = 'topic_dict.pickle'

# Load the dictionary that maps each topic to a set of file names
with open(topic_file, 'rb') as f:
    topic_folders = pickle.load(f)
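A possible next step, sketched under assumptions: most classifiers expect two parallel lists, one of document texts and one of labels, so the loaded dictionary can be flattened into that shape. The helper name `load_corpus`, the `base_dir` parameter, and the UTF-8 encoding are all hypothetical here; the post does not say where the `.txt` files live or how they are encoded.

```python
from pathlib import Path


def load_corpus(topic_folders, base_dir="."):
    """Return parallel lists (texts, labels) from a {label: {filename, ...}} dict.

    Assumption: every file name is relative to base_dir and is UTF-8 text.
    """
    texts, labels = [], []
    for label in sorted(topic_folders):
        # The values are sets, so sort them to get a reproducible order.
        for name in sorted(topic_folders[label]):
            path = Path(base_dir) / name
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

Calling `texts, labels = load_corpus(topic_folders)` would then give one entry per file, with `labels[i]` holding the topic of `texts[i]`.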

I am stuck at this point. How can I split the files into training and test sets and vectorize them? I have no experience with this, so any help is appreciated.

Thanks in advance.
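One way to approach the split and the vectorization is sketched below with scikit-learn. That library is an assumption, since the question does not name one. The sketch also assumes the file contents have already been read into a list `texts` with a parallel list `labels`. The function names `split_and_vectorize` and `train_classifier` are hypothetical, and TF-IDF with multinomial naive Bayes is just one reasonable choice, not the only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def split_and_vectorize(texts, labels, test_size=0.25, seed=0):
    """Split first, then fit the vectorizer on the training texts only."""
    # stratify=labels keeps every topic present on both sides of the split.
    train_texts, test_texts, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights from the training texts.
    X_train = vectorizer.fit_transform(train_texts)
    # Reuse the same vocabulary for the test texts; unseen words are ignored.
    X_test = vectorizer.transform(test_texts)
    return vectorizer, X_train, X_test, y_train, y_test


def train_classifier(X_train, y_train):
    """Fit a multinomial naive Bayes model on the TF-IDF features."""
    clf = MultinomialNB()
    clf.fit(X_train, y_train)
    return clf
```

Fitting the vectorizer on the training texts only keeps vocabulary and IDF statistics from leaking in from the test set. After `clf = train_classifier(X_train, y_train)`, `clf.predict(X_test)` returns one predicted topic per test document. With only four files per topic, any accuracy measured on the handful of held-out files will be very noisy.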



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
