Pandas: lookup values in DataFrame, where source column has multiple members
I have one DataFrame with records and a second DataFrame that acts as a lookup database. I can't use a plain merge because some rows hold several comma-separated lookup keys in a single field.
Data:
import pandas as pd

df_data = pd.DataFrame([[1000, 'Jerry', 'BR1001, BR1003, BR9009','',''],
[1001, 'Buck', 'BR1010, BR1011','',''],
[1002, 'Melanie', 'BR3009','','DPT2002'],
[1003, 'Perry','BR4009','',''],
[1004, 'Perry2','','DIST1000',''],
[1005, 'Eloise','','','DPT9009'],
[1005, 'Sharon','','','DPT9009']],
columns=['ID', 'Name', 'School Number','District Number','Dept. Number'])
Given the School Number, I need to be able to pull all associated District Numbers and Dept. Numbers. I'd like to just focus on pulling the District Numbers. The issue is how to iterate over members in a field where there is more than one.
Data to query:
df_DB = pd.DataFrame([['DIST1000', 'BR1001', 'DPT9009','Physics'],
['DIST1000', 'BR1003', 'DPT1010','Biology'],
['DIST1000', 'BR1003', 'DPT1011','Sociology'],
['DIST1000', 'BR1010', 'DPT1012','Philosophy'],
['DIST1000', 'BR1011', 'DPT1013','Pre-K'],
['DIST1000', 'BR1012', 'DPT1014','Geology'],
['DIST1001', 'BR9009', 'DPT2001', 'Math'],
['DIST1001', 'BR3009', 'DPT2002', 'Physics'],
['DIST1001', 'BR9009', 'DPT2003', 'Pre-K'],
['DIST1001', 'BR4009', 'DPT2004', 'Economics']],
columns=['District Number', 'School Number', 'Dept. Number','Name'])
For example, the first record in the data above, Jerry, has three School Numbers assigned to it.
Desired output (example):
1000, 'Jerry', 'BR1001, BR1003, BR9009','DIST1001, DIST1000','DPT9009, DPT1010, DPT1011, DPT2001, DPT2003'
Do I need a function for this? I think I can figure out Department if I can land the District Numbers.
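One possible approach, sketched here rather than taken from an accepted answer: split the School Number field into one row per school, merge those rows against the lookup frame, then join the matches back into one comma-separated string per record. The helper names `lookup_by_school` and `join_unique` and the `row` column are hypothetical, and the frames below are trimmed copies of the ones in the question.

```python
import pandas as pd

# Trimmed copies of the question's frames (a subset of rows, for brevity).
df_data = pd.DataFrame(
    [[1000, 'Jerry', 'BR1001, BR1003, BR9009', '', ''],
     [1002, 'Melanie', 'BR3009', '', 'DPT2002'],
     [1004, 'Perry2', '', 'DIST1000', '']],
    columns=['ID', 'Name', 'School Number', 'District Number', 'Dept. Number'])

df_DB = pd.DataFrame(
    [['DIST1000', 'BR1001', 'DPT9009', 'Physics'],
     ['DIST1000', 'BR1003', 'DPT1010', 'Biology'],
     ['DIST1000', 'BR1003', 'DPT1011', 'Sociology'],
     ['DIST1001', 'BR9009', 'DPT2001', 'Math'],
     ['DIST1001', 'BR3009', 'DPT2002', 'Physics'],
     ['DIST1001', 'BR9009', 'DPT2003', 'Pre-K']],
    columns=['District Number', 'School Number', 'Dept. Number', 'Name'])

LOOKUP_COLS = ['District Number', 'Dept. Number']


def join_unique(values):
    # Drop duplicates while keeping first-seen order, then rejoin as text.
    return ', '.join(dict.fromkeys(values))


def lookup_by_school(data, db):
    """Hypothetical helper: fill LOOKUP_COLS from db via 'School Number'."""
    # One row per (record, school number); 'row' holds the original row label.
    schools = (data['School Number']
               .str.split(',')
               .explode()
               .str.strip()
               .rename_axis('row')
               .reset_index())
    schools = schools[schools['School Number'] != '']

    # Each school number is now a single key, so an ordinary merge works.
    matched = schools.merge(db[['School Number'] + LOOKUP_COLS],
                            on='School Number', how='inner')

    # Collapse the matches back to one row per original record.
    found = matched.groupby('row')[LOOKUP_COLS].agg(join_unique)

    result = data.copy()
    for col in LOOKUP_COLS:
        # Rows with no school number (or no match) keep their existing value.
        result[col] = found[col].reindex(result.index).fillna(result[col])
    return result


result = lookup_by_school(df_data, df_DB)
print(result.to_string(index=False))
```

With this sketch, Jerry's row should pick up both DIST1000 and DIST1001, plus the five department numbers from the desired output. Values are listed in the order they are first matched, so the order may differ from the example above. Rows without a School Number, such as Perry2, are left unchanged. The sketch assumes the frame has a unique index, as the default one is.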
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
