Assign many values to one key value - Python For Loop
I am practicing with a dataset with customers. Each customer has a first name, last name, city, age, gender and invoice number.
I want to create a dictionary that uses each customer's first and last name as the key and stores the rest of the information as the value. A customer can have many invoices, so each customer should be counted only once but keep all of their invoice numbers.
| City | FirstName | LastName | Gender | Age | InvoiceNum |
|---|---|---|---|---|---|
| NYC | Jane | Doe | Female | 35 | 1023 |
| NYC | Jane | Doe | Female | 35 | 6523 |
| Jersey City | John | Smith | Male | 54 | 6985 |
| Houston | Kay | Johnson | Female | 45 | 2357 |
To do so, I want to create a for loop.
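import csv

class Customers:
    # Caution: these are class attributes, so one invoices list
    # would be shared by every Customers instance.
    city = ""
    age = 0
    invoices = []

f = open("customers.csv", newline="")
reader = csv.reader(f)
next(reader)  # skip the header row
customers = {}
for row in reader:
    ...  # this is where I am stuck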
This is where I am stuck. For every row in reader, I want to check whether the customer already exists. If so, I want to add that row's invoice number to the customer's invoices. If not, it is a new customer, and I need to store the other values (city, gender, age, and the first invoice number).
Desired Output:
There are 3 customers. 2 are female, 1 is male. Their average age is xxxx.
Jane Doe is counted only once in the customer count and only once in the female count, and her age is not added to the average twice.
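One way to fill in the stuck loop and produce that summary is sketched below. It assumes the file is comma-separated with the header row shown above (the sample rows are inlined through io.StringIO so the sketch runs on its own); the names SAMPLE_CSV and summarize are hypothetical. Each customer's details are stored once under a (first name, last name) tuple, every row adds one invoice number, and the counts and average are taken over unique customers rather than rows.

```python
import csv
import io

# Assumed file contents: comma-separated, with the question's header row.
SAMPLE_CSV = """City,FirstName,LastName,Gender,Age,InvoiceNum
NYC,Jane,Doe,Female,35,1023
NYC,Jane,Doe,Female,35,6523
Jersey City,John,Smith,Male,54,6985
Houston,Kay,Johnson,Female,45,2357
"""


def summarize(csv_file):
    """Group rows by (first, last) name; return the dict plus summary numbers."""
    customers = {}
    for row in csv.DictReader(csv_file):  # DictReader reads the header row itself
        key = (row["FirstName"], row["LastName"])
        if key not in customers:
            # New customer: store the per-customer details exactly once.
            customers[key] = {
                "city": row["City"],
                "gender": row["Gender"],
                "age": int(row["Age"]),
                "invoices": [],
            }
        # Every row, new or repeat customer, adds one invoice number.
        customers[key]["invoices"].append(int(row["InvoiceNum"]))

    # Counts and the average are taken over unique customers, not rows.
    total = len(customers)
    females = sum(1 for c in customers.values() if c["gender"] == "Female")
    males = sum(1 for c in customers.values() if c["gender"] == "Male")
    average_age = sum(c["age"] for c in customers.values()) / total if total else 0.0
    return customers, total, females, males, average_age


customers, total, females, males, average_age = summarize(io.StringIO(SAMPLE_CSV))
print(f"There are {total} customers ({females} female, {males} male). "
      f"Their average age is {average_age:.2f}.")
# There are 3 customers (2 female, 1 male). Their average age is 44.67.
```

With the real file, pass an open file object instead of the StringIO, for example by calling summarize(f) inside `with open("customers.csv", newline="") as f:`.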
Solution 1:[1]
I came up with this:
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List

@dataclass
class Customer:
    first_name: str = ''
    last_name: str = ''
    city: str = ''
    age: int = 0
    invoices: List = field(init=False, default_factory=list)

    def process_entry(self, **row):
        # Overwrite the details from the row and collect its invoice number.
        self.first_name = row['FirstName']
        self.last_name = row['LastName']
        self.city = row['City']
        self.age = row['Age']
        self.invoices.append(row['InvoiceNum'])

fake_reader = [
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 1
    },
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 2
    },
    {
        'FirstName': 'Clark',
        'LastName': 'Kent',
        'City': 'Metropolis',
        'Age': 35,
        'InvoiceNum': 3
    }
]

# A missing (first, last) key creates an empty Customer() automatically.
customers = defaultdict(Customer)
for row in fake_reader:
    customers[(row['FirstName'], row['LastName'])].process_entry(**row)

print(customers)
Output:
defaultdict(<class '__main__.Customer'>, {('John', 'Doe'): Customer(first_name='John', last_name='Doe', city='New York', age=30, invoices=[1, 2]), ('Clark', 'Kent'): Customer(first_name='Clark', last_name='Kent', city='Metropolis', age=35, invoices=[3])})
The "trick" here is to give every Customer field a default value. That lets defaultdict(Customer) create an empty Customer the first time a name appears, and process_entry then fills in the real values and appends each invoice.
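Solution 1 can be fed from the question's actual file with csv.DictReader, whose rows are dicts keyed by the header names, so the row['FirstName']-style lookups in process_entry work unchanged. DictReader returns every value as a string, though, so this sketch converts the age and invoice number with int(). It assumes a comma-separated file with the question's header (inlined through io.StringIO so it runs standalone); the gender field, the int() conversions, and the load_customers helper are hypothetical additions, not part of the original answer.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List

# Assumed file contents: comma-separated, header as in the question.
SAMPLE_CSV = """City,FirstName,LastName,Gender,Age,InvoiceNum
NYC,Jane,Doe,Female,35,1023
NYC,Jane,Doe,Female,35,6523
Jersey City,John,Smith,Male,54,6985
Houston,Kay,Johnson,Female,45,2357
"""


@dataclass
class Customer:
    # Defaults let defaultdict(Customer) build an empty instance for a new key.
    first_name: str = ''
    last_name: str = ''
    city: str = ''
    gender: str = ''  # added field, not in the original answer
    age: int = 0
    invoices: List[int] = field(init=False, default_factory=list)

    def process_entry(self, **row):
        self.first_name = row['FirstName']
        self.last_name = row['LastName']
        self.city = row['City']
        self.gender = row['Gender']
        self.age = int(row['Age'])  # csv values arrive as strings
        self.invoices.append(int(row['InvoiceNum']))


def load_customers(csv_file):
    """Group the rows of csv_file into Customer objects keyed by (first, last)."""
    customers = defaultdict(Customer)
    for row in csv.DictReader(csv_file):
        customers[(row['FirstName'], row['LastName'])].process_entry(**row)
    return customers


customers = load_customers(io.StringIO(SAMPLE_CSV))
ages = [c.age for c in customers.values()]  # one age per unique customer
print(len(customers), sum(ages) / len(ages))
```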
Solution 2:[2]
I think you're looking for something of the sort:
if name not in customers:
    customers[name] = [invoice]
else:
    customers[name].append(invoice)
This creates a key-value pair whose value is a list, which the loop then appends to every time it finds another invoice for that name.
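A self-contained run of that pattern, using hypothetical (name, invoice) pairs in place of CSV rows. The second loop uses dict.setdefault, a standard dict method that performs the same check-then-append in a single call; it is shown only as an alternative, not as the answer's code.

```python
# Hypothetical (name, invoice) pairs standing in for rows read from the CSV.
rows = [("Jane Doe", 1023), ("Jane Doe", 6523), ("John Smith", 6985)]

customers = {}
for name, invoice in rows:
    if name not in customers:
        customers[name] = [invoice]      # first invoice starts a new list
    else:
        customers[name].append(invoice)  # later invoices extend that list

# Equivalent shorthand: setdefault inserts [] only when the key is missing.
customers_short = {}
for name, invoice in rows:
    customers_short.setdefault(name, []).append(invoice)

print(customers)  # {'Jane Doe': [1023, 6523], 'John Smith': [6985]}
```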
Edit: updated to match your CSV file
customers = {}
# Skip the file header (a csv.reader cannot be sliced like reader[1:];
# leave this out if next(reader) has already been called)
next(reader)
for row in reader:
    # csv.reader already splits each line into a list of fields
    City, FirstName, LastName, Gender, Age, InvoiceNum = row
    newEntry = {'InvoiceNum': int(InvoiceNum), 'City': City, 'Gender': Gender, 'Age': int(Age)}
    if (FirstName, LastName) not in customers:
        customers[(FirstName, LastName)] = [newEntry]
    else:
        customers[(FirstName, LastName)].append(newEntry)
Dictionary keys must be hashable (immutable built-in types such as strings, and tuples of strings, qualify), so I chose a tuple of the first and last name as the key.
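A quick check of that key choice, using names from the question's sample data: a tuple of strings is hashable and works as a key, while a list with the same contents is rejected with a TypeError.

```python
customers = {}
customers[("Jane", "Doe")] = [1023]  # a tuple of strings is hashable: accepted

error_message = None
try:
    customers[["Jane", "Doe"]] = [1023]  # a list is mutable and unhashable
except TypeError as err:
    error_message = str(err)

print(error_message)               # mentions "unhashable type: 'list'"
print(customers[("Jane", "Doe")])  # [1023]
```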
Edit: I hope this points you in the right direction. I left the CSV details to you, since your rows may not match the layout I assumed.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | DevLounge |
| Solution 2 | marc_s |
