'Working with csv files in python - sort and fill values
I would like to populate the missing info: MissingAuthor and MissingLanguage based on BookID - if the BookID is the same copy and fill in the info from where you find it.
input CSV file:
input.csv
BookName,BookID,BookLocation,BookAuthor,BookLanguage
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
BookC,15351,C126,Eric Matthes,LanguageMissing
BookD,15551,C446,AuthorMissing,English
BookE,15351,C126,AuthorMissing,English
BookF,15551,C446,Al Sweigart,LanguageMissing
BookG,21221,C556,AuthorMissing,English
BookH,21221,C556,Mark Lutz,English
BookI,14421,C656,Eric Matthes,English
BookJ,14421,C656,Eric Matthes,LanguageMissing
BookK,55521,C776,AuthorMissing,English
BookL,55521,C776,Mark Lutz,LanguageMissing
BookM,16721,C886,Al Sweigart,English
BookN,16721,C886,AuthorMissing,English
BookO,16721,C996,Al Sweigart,LanguageMissing
As an example:
input:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
output:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,Guido van Rossum,English <--- AuthorMissing
Code:
import csv
with open('input.csv', 'r') as incsv:
reader = csv.DictReader(incsv, delimiter=',')
header = next(reader)
sortedbyid = sorted(reader, key=lambda row: (row['BookID']), reverse=False) # sort by BookID
for line in sortedbyid:
print(line)
{'BookName': 'BookI', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'English'}
{'BookName': 'BookJ', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookC', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookE', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookD', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookF', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookM', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'English'}
{'BookName': 'BookN', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookO', 'BookID': '16721', 'BookLocation': 'C996', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookB', 'BookID': '19821', 'BookLocation': 'C216', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookG', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookH', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'English'}
{'BookName': 'BookK', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookL', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'LanguageMissing'}
I need to loop here, match the BookID, store the info if present, and fill it in somehow.
# create a new file
with open('output.csv', 'w') as outcsv:
fieldnames = header
writer = csv.DictWriter(outcsv, fieldnames=fieldnames, delimiter=',', lineterminator='\n')
writer.writeheader()
for row in sortedbyid:
write.writerow(row)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
