Sentences are splitting into letters
I am creating a chatbot and I am new to NLP. I am trying to extract the Action and Sentence columns from a CSV file, but the sentences are being split into letters.
Here is the code. In the output, each sentence is split into letters instead of being on its own row.
import pandas as pd
data = pd.read_csv('dataset.csv')
dataset = pd.DataFrame(columns=['Action', 'Sentence', 'Category'])
for index, item in data.iterrows():
    intent = item.Action
    for t, r in zip(item['Sentence'], item['Category']):
        # print(t,r)
        row = {'Action': intent, 'Sentence': t, 'Category':r}
        dataset = dataset.append(row, ignore_index=True)
dataset
Any help is greatly appreciated.
Solution 1:[1]
If I understand correctly, you want one sentence per row, right?
Here is your problem:
for index, item in data.iterrows():
You iterate over the rows, so each item is a Series holding a single row, and the values in that row are accessible by column name.
for t, r in zip(item['Sentence'], item['Category']):
item['Sentence'] and item['Category'] are both strings. Passing them to zip and iterating over the result pairs them up character by character, so the loop runs over the letters of those strings instead of over whole sentences.
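To see the effect in isolation, here is a small sketch. The two strings are made-up values standing in for one row's Sentence and Category cells, not data from your CSV.

```python
# Hypothetical cell values standing in for one row of the CSV.
sentence = "Hi there"
category = "greet"

# zip() walks both strings one character at a time
# and stops as soon as the shorter string is exhausted.
pairs = list(zip(sentence, category))
print(pairs)
# [('H', 'g'), ('i', 'r'), (' ', 'e'), ('t', 'e'), ('h', 't')]
```

Each of those pairs became one row in your dataset, which is why you see letters instead of sentences.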
How to solve it: Just remove the inner iteration.
for index, item in data.iterrows():
    row = {'Action': item['Action'], 'Sentence': item['Sentence'], 'Category':item['Category']}
    dataset = dataset.append(row, ignore_index=True)
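Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. A minimal sketch of the same fix that should also work on current pandas is to collect the rows in a list and build the DataFrame once. The sample data here is hypothetical and only stands in for dataset.csv, whose contents are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset.csv'); the real file is not shown.
data = pd.DataFrame({
    "Action": ["greet", "bye"],
    "Sentence": ["Hello there", "See you later"],
    "Category": ["smalltalk", "smalltalk"],
})

# Collect one dict per input row, with no inner loop over the strings.
rows = []
for index, item in data.iterrows():
    rows.append({
        "Action": item["Action"],
        "Sentence": item["Sentence"],
        "Category": item["Category"],
    })

# Build the result in one step instead of calling DataFrame.append per row.
dataset = pd.DataFrame(rows, columns=["Action", "Sentence", "Category"])
print(dataset)
```

If you do not need any per-row processing, selecting the columns directly with data[["Action", "Sentence", "Category"]].copy() should give you the same rows without a loop.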
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewz93 |
