Handle missing data when flattening a nested array field in a pandas DataFrame

We need to flatten this into a standard 2D DataFrame:

import pandas as pd
arr = [
    [{ 'id': 3, 'abbr': 'ORL', 'record': { 'win': 3, 'loss': 7 }}, 
     { 'id': 5, 'abbr': 'ATL', 'record': { 'win': 3, 'loss': 7 }}],
    [{ 'id': 7, 'abbr': 'NYK', 'record': { 'win': 3, 'loss': 7 }}, 
     { 'id': 9, 'abbr': 'BOS', 'record': { 'win': 3, 'loss': 7 }}]
]
mydf = pd.DataFrame(data = {'name': ['nick', 'tom'], 'arr': arr })

Here's our code, which works fine for this dummy example:

# collect one {'t1': ..., 't2': ...} dict per row
output_list = []
for i in range(len(mydf)):
    team1 = mydf['arr'][i][0]   # first team dict in the row's list
    team2 = mydf['arr'][i][1]   # second team dict in the row's list
    zed = { 't1': team1['abbr'], 't2': team2['abbr'] }
    output_list.append(zed)

output_df = pd.DataFrame(output_list)
    
final_df = pd.concat([mydf, output_df], axis=1)
final_df.pop('arr')
final_df

    name    t1  t2
0   nick    ORL ATL
1   tom     NYK BOS

Our data source is not reliable and may have missing values, and our code seems structurally fragile. In particular, errors are raised when either of the following is the raw data (missing dict, missing field):

# missing dict
arr = [
    [{ 'id': 3, 'abbr': 'ORL', 'record': { 'win': 3, 'loss': 7 }}],
    [{ 'id': 7, 'abbr': 'NYK', 'record': { 'win': 3, 'loss': 7 }}, 
     { 'id': 9, 'abbr': 'BOS', 'record': { 'win': 3, 'loss': 7 }}]
]
mydf = pd.DataFrame(data = {'name': ['nick', 'tom'], 'arr': arr })

# missing "abbr" field
arr = [
    [{ 'id': 3, 'record': { 'win': 3, 'loss': 7 }}, 
     { 'id': 5, 'abbr': 'ATL', 'record': { 'win': 3, 'loss': 7 }}],
    [{ 'id': 7, 'abbr': 'NYK', 'record': { 'win': 3, 'loss': 7 }}, 
     { 'id': 9, 'abbr': 'BOS', 'record': { 'win': 3, 'loss': 7 }}]
]
mydf = pd.DataFrame(data = {'name': ['nick', 'tom'], 'arr': arr })
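
For reference, the two failure modes can be reproduced without pandas. This is only a minimal sketch: `abbr_pair` is a hypothetical helper that mirrors the body of the loop above, applied to the first row of each problem dataset. A row with a missing dict should fail on the `[1]` index, and a dict without an 'abbr' key should fail on the key lookup.

```python
def abbr_pair(teams):
    # Hypothetical helper mirroring the loop body:
    # assumes exactly two dicts, each with an 'abbr' key.
    return {'t1': teams[0]['abbr'], 't2': teams[1]['abbr']}

# First row of each problem dataset from the question
missing_dict = [{'id': 3, 'abbr': 'ORL', 'record': {'win': 3, 'loss': 7}}]
missing_field = [{'id': 3, 'record': {'win': 3, 'loss': 7}},
                 {'id': 5, 'abbr': 'ATL', 'record': {'win': 3, 'loss': 7}}]

errors = {}
for label, row in [('missing dict', missing_dict), ('missing field', missing_field)]:
    try:
        abbr_pair(row)
    except (IndexError, KeyError) as exc:
        # record which exception type each case raises
        errors[label] = type(exc).__name__

print(errors)
```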

Is it possible to (a) replace the for-loop with a more structurally sound approach (apply), and (b) handle the missing data concerns?
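
One possible direction, offered as a sketch rather than a definitive answer: move the per-row logic into a small function and call it through `Series.apply`, using a length check and `dict.get` so that missing pieces become `None` instead of raising. The helper name `team_abbrs`, the fixed slot count `n_slots`, and `None` as the fill value are all assumptions made for illustration.

```python
import pandas as pd

def team_abbrs(teams, n_slots=2):
    # Hypothetical helper: one abbreviation per slot, None where data is missing.
    teams = teams if isinstance(teams, list) else []   # tolerate None/NaN cells
    out = {}
    for k in range(n_slots):
        team = teams[k] if k < len(teams) else None    # missing dict
        # missing 'abbr' field -> dict.get returns None
        out[f't{k + 1}'] = team.get('abbr') if isinstance(team, dict) else None
    return out

arr = [
    [{'id': 3, 'abbr': 'ORL', 'record': {'win': 3, 'loss': 7}}],   # missing dict
    [{'id': 7, 'record': {'win': 3, 'loss': 7}},                   # missing 'abbr'
     {'id': 9, 'abbr': 'BOS', 'record': {'win': 3, 'loss': 7}}],
]
mydf = pd.DataFrame({'name': ['nick', 'tom'], 'arr': arr})

# apply yields a Series of dicts; expand it into columns on the same index
abbr_df = pd.DataFrame(mydf['arr'].apply(team_abbrs).tolist(), index=mydf.index)
final_df = pd.concat([mydf.drop(columns='arr'), abbr_df], axis=1)
print(final_df)
```

Returning a plain dict and expanding it once at the end avoids building a Series per row, and keeping the original index on `abbr_df` means the `concat` still lines up if `mydf` has a non-default index.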



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
