Sentences are splitting into letters

I am creating a chatbot and I am new to NLP. I am trying to extract the Action and Sentence columns from a CSV file, but the sentences are being split into individual letters.

Here is the code and a screenshot of the sentences being split into letters rather than each sentence being on its own row.

import pandas as pd

data = pd.read_csv('dataset.csv')

dataset = pd.DataFrame(columns=['Action', 'Sentence', 'Category'])
for index, item in data.iterrows():
    intent = item.Action
    for t, r in zip(item['Sentence'], item['Category']):
        # print(t,r)
        row = {'Action': intent, 'Sentence': t, 'Category':r}
        dataset = dataset.append(row, ignore_index=True)
dataset


Any help is greatly appreciated.



Solution 1:[1]

If I understand correctly, you want one sentence per row, right?

Here is your problem:

for index, item in data.iterrows():

You iterate over rows, so each item is a Series holding a single row, and the values in that row are accessible by their column names.

    for t, r in zip(item['Sentence'], item['Category']):

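You take item['Sentence'] and item['Category'], which are both strings, and pass them to zip. Iterating over a string yields its characters, so the loop walks over pairs of letters from the two strings (and stops at the end of the shorter one) instead of handling each sentence as a whole.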
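To see the effect in isolation, here is a minimal sketch without pandas. The two string values are made up for illustration and stand in for one row's Sentence and Category cells.

```python
# Hypothetical cell values standing in for one row of the CSV.
sentence = "Hi there"
category = "greet"

# zip() treats each string as a sequence of characters
# and stops when the shorter string runs out.
pairs = list(zip(sentence, category))
print(pairs)

# Using the values directly keeps each string whole.
row = {"Sentence": sentence, "Category": category}
print(row)
```

Each tuple in pairs holds one character from each string, which is exactly what ends up in the rows of your output.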

How to solve it: Just remove the inner iteration.

for index, item in data.iterrows():
    row = {'Action': item['Action'], 'Sentence': item['Sentence'], 'Category':item['Category']}
    dataset = dataset.append(row, ignore_index=True)
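One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above raises an AttributeError on current versions. A sketch of two alternatives follows. The inline sample data is hypothetical and only stands in for the result of pd.read_csv('dataset.csv'), since I don't know what your file contains.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset.csv');
# the real file may contain more rows and columns.
data = pd.DataFrame({
    "Action": ["greet", "bye"],
    "Sentence": ["Hello there", "See you later"],
    "Category": ["smalltalk", "smalltalk"],
})

# Option 1: collect plain dicts in a list and build the DataFrame once.
rows = []
for index, item in data.iterrows():
    rows.append({
        "Action": item["Action"],
        "Sentence": item["Sentence"],
        "Category": item["Category"],
    })
dataset = pd.DataFrame(rows, columns=["Action", "Sentence", "Category"])

# Option 2: skip the loop and just select the three columns.
dataset_direct = data[["Action", "Sentence", "Category"]].copy()

print(dataset)
```

Both options give one sentence per row. If you do not need to change the values while copying, the column selection is shorter and avoids the row-by-row loop.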

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ewz93