'Pandas - combine series with unique values, matching across rows
I'll start by dropping in my code and then explain what I'm trying to accomplish:
names = [
'ABX-B767-200BDSF (767-3A)',
'ABX-B767-200BDSF (DAR 767-3A)',
'ABX-B767-200BDSF (DAR 767-4)',
]
i1 = pd.read_csv(f'{path}/{files[0]}', skiprows=4, nrows=1)
i2 = pd.read_csv(f'{path}/{files[1]}', skiprows=4, nrows=1)
i3 = pd.read_csv(f'{path}/{files[2]}', skiprows=4, nrows=1)
df_list = [i1, i2, i3]
cl1 = pd.Series(df_list[0].columns.values, name=names[0])
cl2 = pd.Series(df_list[1].columns.values, name=names[1])
cl3 = pd.Series(df_list[2].columns.values, name=names[2])
col_list = [cl1, cl2, cl3]
final_frame = pd.concat([df for df in col_list], axis=1)
final_frame.fillna('', inplace=True)
Yes I know this is inefficient and long, please work with me.
My actual data has more than these 3 series, I've got 11 but for the sake of simplicity I'm only showing the code as if I had three.
Right now, my output looks like this:
ABX-B767-200BDSF (767-3A) ABX-B767-200BDSF (DAR 767-3A) ABX-B767-200BDSF (DAR 767-4)
0 Offset Offset Offset
1 A/C NUMBER A/C NUMBER 10 MI RANGE SEL CAPT
2 A/C TYPE A/C TYPE 10 MI RANGE SEL F/O
3 A/P CAUTION FCC-C A/P CAUTION FCC-C 160 MI RANGE SEL CAPT
4 A/P CAUTION FCC-L A/P CAUTION FCC-L 160 MI RANGE SEL F/O

and I need it to look like this:
ABX-B767-200BDSF (767-3A) ABX-B767-200BDSF (DAR 767-3A) ABX-B767-200BDSF (DAR 767-4)
0 OFFSET OFFSET OFFSET
1 10 MI RANGE SEL CAPT
2 10 MI RANGE SEL F/O
3 160 MI RANGE SEL CAPT
…
13 A/C NUMBER A/C NUMBER
14 A/C TYPE A/C TYPE

Where values are matched across the the rows, and rows that don't have a match are filled with white space. I've tried using a merge with wide variance of arguments and I've simply had no luck. Any ideas how to get my desired result here?
As requested, .head(10)s of my cl1, 2, 3:
cl1.head(10).to_dict()
{0: 'Offset',
1: 'A/C NUMBER',
2: 'A/C TYPE',
3: 'A/P CAUTION FCC-C',
4: 'A/P CAUTION FCC-L',
5: 'A/P CAUTION FCC-R',
6: 'A/P CMD C ENGA FCC-C',
7: 'A/P CMD C ENGA FCC-L',
8: 'A/P CMD C ENGA FCC-R',
9: 'A/P CMD C ENGA MCP'}
cl2.head(10).to_dict()
{0: 'Offset',
1: 'A/C NUMBER',
2: 'A/C TYPE',
3: 'A/P CAUTION FCC-C',
4: 'A/P CAUTION FCC-L',
5: 'A/P CAUTION FCC-R',
6: 'A/P CMD C ENGA FCC-C',
7: 'A/P CMD C ENGA FCC-L',
8: 'A/P CMD C ENGA FCC-R',
9: 'A/P CMD C ENGA MCP'}
cl3.head(10).to_dict()
{0: 'Offset',
1: '10 MI RANGE SEL CAPT',
2: '10 MI RANGE SEL F/O',
3: '160 MI RANGE SEL CAPT',
4: '160 MI RANGE SEL F/O',
5: '20 MI RANGE SEL CAPT',
6: '20 MI RANGE SEL F/O',
7: '40 MI RANGE SEL CAPT',
8: '40 MI RANGE SEL F/O',
9: '5 MI RANGE SEL CAPT'}
Solution 1:[1]
- Create a master list that consists of all the value from your series.
reindexeach series to the master listconcatthe reindexed series
srs_list = [cl1, cl2, cl3]
master = pd.concat(srs_list).drop_duplicates()
parsed_list = [srs.to_frame().set_index(srs).reindex(master) for srs in srs_list]
output = pd.concat(parsed_list,axis=1).reset_index(drop=True)
>>> output
ABX-B767-200BDSF (767-3A) ABX-B767-200BDSF (DAR 767-3A) \
0 Offset Offset
1 A/C NUMBER A/C NUMBER
2 A/C TYPE A/C TYPE
3 A/P CAUTION FCC-C A/P CAUTION FCC-C
4 A/P CAUTION FCC-L A/P CAUTION FCC-L
5 A/P CAUTION FCC-R A/P CAUTION FCC-R
6 A/P CMD C ENGA FCC-C A/P CMD C ENGA FCC-C
7 A/P CMD C ENGA FCC-L A/P CMD C ENGA FCC-L
8 A/P CMD C ENGA FCC-R A/P CMD C ENGA FCC-R
9 A/P CMD C ENGA MCP A/P CMD C ENGA MCP
10 NaN NaN
11 NaN NaN
12 NaN NaN
13 NaN NaN
14 NaN NaN
15 NaN NaN
16 NaN NaN
17 NaN NaN
18 NaN NaN
ABX-B767-200BDSF (DAR 767-4)
0 Offset
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 10 MI RANGE SEL CAPT
11 10 MI RANGE SEL F/O
12 160 MI RANGE SEL CAPT
13 160 MI RANGE SEL F/O
14 20 MI RANGE SEL CAPT
15 20 MI RANGE SEL F/O
16 40 MI RANGE SEL CAPT
17 40 MI RANGE SEL F/O
18 5 MI RANGE SEL CAPT
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | not_speshal |
