How to parse a CSV with Python when one column has multiple lines

I have a CSV file with the columns "name, place, thing". The thing column often contains "word\nanotherword\nanotherword\n". I'm trying to figure out how to parse this into individual rows instead of multiline entries in a single column, i.e.

name, place, word

name, place, anotherword

name, place, anotherword

I'm certain this is simple, but I'm having a hard time grasping what I need to do.



Solution 1:[1]

Without going into the code, essentially what you want to do is check to see if there are any newline characters in your 'thing'. If there are, you need to split them on the newline characters. This will give you a list of tokens (the lines in the 'thing') and since this is essentially an inner loop, you can use the original name and place along with your new thing_token. A generator function lends itself well to this.

This brings me to kroolik's answer, which has a slight error:

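If you want to go with the column_wrapper generator, you will need to account for the fact that the 'thing' values here come back from the csv reader containing the two literal characters backslash and n rather than real newline characters. The reader passes them through unchanged, so in Python source they have to be written as '\\n' instead of '\n'. Also, you need to skip blank 'things'.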

def column_wrapper(reader):
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\\n'):
            if split_thing:
                yield name, place, split_thing

Then you can obtain the data like this:

# newline='' is what the csv module docs recommend on Python 3
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    data = [[name, place, thing] for name, place, thing in column_wrapper(reader)]

OR (without column_wrapper):

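data = []
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        name, place, thing = row
        # split() returns the whole string when there is no '\\n',
        # so single-line rows are kept as well
        for item in thing.strip().split('\\n'):
            if item:  # skip empty pieces, e.g. after a trailing '\\n'
                data.append([name, place, item])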

I recommend using column_wrapper, as generators are more generic and Pythonic.

Be sure to add import csv to the top of your file (although I'm sure you knew that already). Hope that helps!
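If you are not sure which case your file falls into, a quick check along these lines may help. This is only a sketch: the sample rows are made up for illustration and are not from the original file. The first row holds the literal characters backslash and n, the second holds real newlines inside a quoted cell.

```python
import csv
import io

# Made-up sample data, not taken from the question's file.
# Row 1: the thing column contains the two characters backslash + n.
# Row 2: the thing column is quoted and contains real newline characters.
sample = (
    'bob,paris,word\\nanotherword\\n\n'
    'amy,rome,"word\nanotherword\n"\n'
)

rows = list(csv.reader(io.StringIO(sample)))
literal_thing = rows[0][2]
real_thing = rows[1][2]

print(repr(literal_thing))  # 'word\\nanotherword\\n'
print(repr(real_thing))     # 'word\nanotherword\n'

# Each kind needs its own separator when splitting.
literal_parts = [p for p in literal_thing.split('\\n') if p]
real_parts = [p for p in real_thing.split('\n') if p]
```

Printing repr() of a cell is usually the fastest way to tell the two apart: a doubled backslash means the file contains the literal characters, a single one means a real newline.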

Solution 2:[2]

Wrap your csv reader with this column_wrapper:

def column_wrapper(reader):
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\n'):
            yield name, place, split_thing

And you will be golden.
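For what it's worth, here is a rough end-to-end sketch of how the wrapper could be used to write the flattened rows back out. The input string is made up, the empty-piece check is an addition that is not in the answer above, and it assumes the cell contains real newlines inside double quotes.

```python
import csv
import io

def column_wrapper(reader):
    # Same idea as the generator above, plus a check that skips empty pieces.
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\n'):
            if split_thing:
                yield name, place, split_thing

# Made-up input: one quoted cell containing real newlines.
source = io.StringIO('alice,london,"word\nanotherword\nanotherword\n"\n')
target = io.StringIO()

# lineterminator='\n' just keeps the output easy to read here;
# the csv module's default is '\r\n'.
writer = csv.writer(target, lineterminator='\n')
writer.writerows(column_wrapper(csv.reader(source)))

print(target.getvalue())
```

With a real file you would swap the two StringIO objects for files opened with newline=''.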

Solution 3:[3]

You could always read the file line by line:

#!/usr/bin/env python3
with open("demo.csv", "r") as file:
    for line in file:
        # split each line on commas; assumes no quoted or multiline fields
        words = line.strip().split(",")
        if len(words) < 3:
            continue  # skip blank or incomplete lines
        print(words[0])
        print(words[1])
        print(words[2])

Assuming the file content is:

name1,place1,word1
name2,place2,anotherword2
name3,place3,anotherword3

Solution 4:[4]

In case someone runs into this with the same issue I had: if you have multiline strings in one of your cells, use the quotechar parameter as specified in this answer:

how to read a csv file that has multiple lines within the same cell?
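The linked answer is not reproduced here, so the following is only a sketch of what quotechar does. It assumes a made-up file whose multiline cells are wrapped in single quotes instead of the default double quotes.

```python
import csv
import io

# Made-up sample: cells are wrapped in single quotes, and the first
# row's thing cell spans two physical lines.
sample = "carol,rome,'word\nanotherword'\ndave,oslo,'thing'\n"

# quotechar tells the reader which character encloses a cell, so a
# newline inside the quotes stays part of that cell.
reader = csv.reader(io.StringIO(sample), quotechar="'")
rows = list(reader)

for row in rows:
    print(row)
```

If the cells use ordinary double quotes, the default quotechar already handles this and no extra argument is needed.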

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jcomo
Solution 2 Maciej Gol
Solution 3 ELavicount
Solution 4 Makogan