Deduplicating arrays in columns with mixed data types (Python)

I have a dataframe whose columns hold mixed data types: strings, ints, and NumPy arrays. Every column that contains an array has dtype=object.

>>> import numpy as np
>>> import pandas as pd
>>> test = pd.DataFrame({'id': ['a','b','c', 'd'],
...             'state': ['Arizona', np.array(['Texas', 'Texas', 'Texas']), 'Texas', np.array(['Texas', 'California'])],
...             'zip': [91239, 21939, np.array([12941,13511]), np.array([11111, 11111, 11111])]})
    
>>> test
      id                  state                    zip
    0  a                Arizona                  91239
    1  b  [Texas, Texas, Texas]                  21939
    2  c                  Texas         [12941, 13511]
    3  d    [Texas, California]  [11111, 11111, 11111]

My desired output deduplicates the arrays wherever they exist: an array whose items are all the same should collapse to that single item, and an array with more than one distinct item should be replaced with the string 'Multiple'.

desired_output
  id     state       zip
0  a   Arizona     91239
1  b     Texas     21939
2  c     Texas  Multiple
3  d  Multiple     11111

I've tried some weird logic where I first create temp columns that count the number of unique items in each cell, or that check whether all() items in an array match the first item, but these attempts all break. Thanks for any help!
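
For reference, here is a minimal sketch of one way the desired output could be produced without temp columns. It is not a confirmed answer, just an element-wise approach under a few assumptions: the helper name `collapse` is hypothetical, array-like cells are taken to be NumPy arrays, lists, or tuples, and an empty array (which the question does not cover) is passed through unchanged.

```python
import numpy as np
import pandas as pd


def collapse(value):
    """Collapse an array-like cell to its single distinct item, or 'Multiple'.

    Hypothetical helper, not part of pandas or NumPy.
    """
    # Scalars such as str or int are returned untouched.
    if not isinstance(value, (np.ndarray, list, tuple)):
        return value
    # Casting to object keeps the items as plain Python str/int values.
    unique = pd.unique(np.asarray(value, dtype=object))
    if len(unique) == 0:
        # Assumption: leave empty arrays as they are.
        return value
    if len(unique) == 1:
        return unique[0]
    return 'Multiple'


test = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'state': ['Arizona', np.array(['Texas', 'Texas', 'Texas']),
              'Texas', np.array(['Texas', 'California'])],
    'zip': [91239, 21939, np.array([12941, 13511]),
            np.array([11111, 11111, 11111])],
})

# Apply the helper to every cell, one column at a time.
result = test.apply(lambda col: col.map(collapse))
print(result)
```

Mapping cell by cell avoids vectorized comparisons, which tend to break when a column mixes scalars with arrays of different lengths.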

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
