Out of memory error when converting dictionary to dataframe
I am building a large dictionary in Python in which each key is a string and each value is a list of lists of strings. I then need to convert this dictionary into a pandas DataFrame (I cannot build the DataFrame directly, long story), but I keep running out of memory during the conversion. I am not running the job on my own machine; I submit it remotely to a supercomputer with allocated memory. The error message simply says Out Of Memory after sents = extract(araneum).

Any ideas on how to avoid this? Is there a way to pickle the dictionary and then load it item by item, removing the previous item from RAM, so that I can (maybe) write to CSV line by line and hold only one item in memory at a time?

Below is the code. The implementation of the function that creates the dictionary is omitted because it is not relevant and cannot be modified, but I have included an example of its output.
import pandas as pd

sents = extract(araneum)  # sents is the dictionary created by extracting from a corpus
print(sents['I was eating when the phone rang'])
# [[was eating, rang], [2, 7], [imperf.past, perf.past], [eat, ring]]

sentences_dict = dict()
for i, (sent, features) in enumerate(sents.items()):
    num_tags = len(features[0])
    if num_tags < 18:
        sentences_dict[i] = dict()
        sentences_dict[i]["Sentence"] = sent
        sentences_dict[i]["num_verbs"] = int(num_tags)
        # features holds four parallel lists (verbs, positions, tags, lemmas),
        # so zip(*features) yields one (verb, pos, tag, lemma) tuple per verb
        for j, (verb, pos, tag, lemma) in enumerate(zip(*features)):
            sentences_dict[i]["verbs_{}".format(j + 1)] = verb
            sentences_dict[i]["verbs_lemmas_{}".format(j + 1)] = lemma
            sentences_dict[i]["verbs_pos_{}".format(j + 1)] = pos
            sentences_dict[i]["verbs_tags_{}".format(j + 1)] = tag
# {sentences[1] : {"verbs_1" : "was eating", "verbs_lemmas_1" : "eat", "verbs_pos_1" : 2, "verbs_tags_1": "imperf.past", "verbs_2" : "rang", "verbs_lemmas_2" : "ring", "verbs_pos_2" : 7, "verbs_tags_2": "perf.past"}}

sentences_df = pd.DataFrame.from_dict(sentences_dict, orient="index")
with open(output, "w") as res_csv:
    sentences_df.to_csv(res_csv, sep=",", index=False)
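
One possible way to do the line-by-line writing asked about above is to skip both the intermediate sentences_dict and the DataFrame and stream each row straight to the CSV file with the standard library's csv.DictWriter. The following is a minimal sketch, not a tested fix for this job. The helper names build_header, iter_rows and write_rows and the constant MAX_VERBS are hypothetical. It assumes each value in sents holds four parallel lists as in the example, and it assumes sents itself fits in memory (it only avoids the extra copies). Because the header is fixed up front, the output always has columns for 17 verbs, which may differ from the column set and order that pd.DataFrame.from_dict would produce.

```python
import csv
import sys

MAX_VERBS = 18  # hypothetical name for the cutoff used in `num_tags < 18`


def build_header(max_verbs=MAX_VERBS):
    """Return a fixed column list so the header is known before any row is read."""
    header = ["Sentence", "num_verbs"]
    for j in range(1, max_verbs):  # at most max_verbs - 1 verbs per kept sentence
        header += [
            "verbs_{}".format(j),
            "verbs_lemmas_{}".format(j),
            "verbs_pos_{}".format(j),
            "verbs_tags_{}".format(j),
        ]
    return header


def iter_rows(sents, max_verbs=MAX_VERBS):
    """Yield one row dict per kept sentence; only one row exists at a time."""
    for sent, features in sents.items():
        # Assumption: features is [verbs, positions, tags, lemmas]
        verbs, positions, tags, lemmas = features
        if len(verbs) >= max_verbs:
            continue
        row = {"Sentence": sent, "num_verbs": len(verbs)}
        per_verb = zip(verbs, positions, tags, lemmas)
        for j, (verb, pos, tag, lemma) in enumerate(per_verb, start=1):
            row["verbs_{}".format(j)] = verb
            row["verbs_lemmas_{}".format(j)] = lemma
            row["verbs_pos_{}".format(j)] = pos
            row["verbs_tags_{}".format(j)] = tag
        yield row


def write_rows(sents, res_csv):
    """Write all kept sentences to an open text file and return the row count."""
    # restval="" leaves the columns of missing verbs empty
    writer = csv.DictWriter(res_csv, fieldnames=build_header(), restval="")
    writer.writeheader()
    count = 0
    for row in iter_rows(sents):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    example = {
        "I was eating when the phone rang": [
            ["was eating", "rang"],
            [2, 7],
            ["imperf.past", "perf.past"],
            ["eat", "ring"],
        ]
    }
    write_rows(example, sys.stdout)
```

In the real job the call would be something like write_rows(sents, res_csv) inside with open(output, "w", newline="") as res_csv:. If the CSV is needed as a DataFrame later, pd.read_csv accepts a chunksize argument, so the file can be read back in pieces instead of all at once.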
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
