Creating a new DataFrame column based on operations applied to nested arrays in another column?

Let me start by saying that, unfortunately, this cannot be solved with something as simple as df['A'] = df['B'] - df['C'].

I have a column containing arrays (let's call it df['A']). I want to z-score the items in each array (with respect to only the values in that array), then store the resulting array of z-scored values in the corresponding row of a new column.

To hopefully make it a bit clearer, each entry in df[A] looks like [[1, 2, 3, ..., 4170945]] and is of length 4170945. (The nesting is due to how the arrays are loaded into the dataframe, and not important.) I have 69 rows of such entries (example image below).

I then want each row of df['zscores'] to contain the corresponding array (row['A'][0] - row['A'][0].mean()) / row['A'][0].std().
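For a single entry, the formula above can be checked on a small stand-in array. This is a minimal sketch assuming each entry is a NumPy array of shape (1, n); the helper name `zscore_nested` is hypothetical. Because each array has only one row, taking the mean and standard deviation over the whole 2-D array gives the same result as using `row['A'][0]`, and it keeps the [[...]] nesting. Note that `ndarray.std()` defaults to the population standard deviation (`ddof=0`), whereas pandas' `Series.std()` defaults to `ddof=1`.

```python
import numpy as np

def zscore_nested(arr):
    """Z-score a (1, n) array using only its own mean and std (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    # mean()/std() without an axis reduce over every element; with a single
    # row this matches using arr[0], and the (1, n) shape is preserved.
    return (arr - arr.mean()) / arr.std()  # ndarray.std() uses ddof=0

example = np.array([[1.0, 2.0, 3.0]])
print(zscore_nested(example))  # approximately [[-1.2247, 0.0, 1.2247]]
```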


I have tried the following:

1.

df['zscores'] = (df['A'] - df['A'].mean()) / df['A'].std()

This gives the following error:

ValueError: operands could not be broadcast together with shapes (69,) (1,4170945) 

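My suspicion is that df['A'].mean() is not computed per row: it seems to average the arrays position by position across all 69 rows (the first item of every row together, then the second, and so on), returning a single (1, 4170945) array that cannot then be broadcast against the 69-element column.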
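One way to get a per-row computation is to apply a function to each cell of the column, so that every array is normalized against its own statistics rather than against statistics combined across rows. A minimal sketch using `Series.apply`, assuming each cell holds a NumPy array of shape (1, n); the small random frame and the helper name `zscore_row` are stand-ins, not part of the original question.

```python
import numpy as np
import pandas as pd

def zscore_row(arr):
    # Normalize one cell against its own mean and population std (ddof=0).
    arr = np.asarray(arr, dtype=float)
    return (arr - arr.mean()) / arr.std()

# Stand-in data: 3 rows, each a (1, 5) array instead of (1, 4170945).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": [rng.normal(size=(1, 5)) for _ in range(3)]})

# apply calls zscore_row once per cell, so each row uses only its own values.
df["zscores"] = df["A"].apply(zscore_row)

for z in df["zscores"]:
    # Each result keeps the (1, n) nesting and has mean ~0 and std ~1.
    print(z.shape, float(z.std()))
```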

2.

for idx, row in df.iterrows():
    if idx == 1:
        _series = pd.Series((row['A'][0] - row['A'][0].mean()) / row['A'][0].std())
    else:
        _ = pd.Series((row['A'][0] - row['A'][0].mean()) / row['A'][0].std())
        _series.append(_)

My aim was to extract each array, operate on it, and append the result to a Series. I then wanted to do something like df['zscores'] = _series.
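The loop in attempt 2 runs into a few problems. With the default RangeIndex, row labels start at 0, so the `idx == 1` branch skips the first row. `Series.append` returns a new Series rather than modifying `_series` in place (and it was removed in pandas 2.0). Wrapping a whole 1-D array in `pd.Series` spreads its values over many rows instead of storing one array per row. A minimal sketch that collects one result per row and assigns them all at once, assuming the same (1, n) layout; the toy frame is a stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Stand-in data: 4 rows, each a (1, 6) array (the real arrays are (1, 4170945)).
rng = np.random.default_rng(1)
df = pd.DataFrame({"A": [rng.uniform(0, 60, size=(1, 6)) for _ in range(4)]})

zscores = []
for _, row in df.iterrows():
    arr = row["A"][0]  # strip the outer nesting, as in the original loop
    z = (arr - arr.mean()) / arr.std()
    zscores.append(z[np.newaxis, :])  # restore the [[...]] nesting, shape (1, n)

# One list element per row, aligned to df's index, stored as an object column.
df["zscores"] = pd.Series(zscores, index=df.index)
```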

My ideal result looks like this:

    A                                               zscores
0   [[43.7916, 10.7261, 30.9748, ...    [[2.5077,  2.1846,  2.2108, ...
1   [[53.8916, 16.7261, 3.5668, ...     [[1.0177,  5.1846,  0.2108, ...
...
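Because every entry has the same length (4170945), another option is to stack the arrays into one 2-D NumPy array and compute each row's mean and standard deviation in a single vectorized call with `axis=1` and `keepdims=True`. This is a sketch under that equal-length assumption. For 69 float64 rows of 4170945 values, the stacked array alone takes about 2.3 GB, so memory may be the limiting factor. The toy frame stands in for the real data.

```python
import numpy as np
import pandas as pd

# Stand-in data: 5 rows, each a (1, 8) array.
rng = np.random.default_rng(2)
df = pd.DataFrame({"A": [rng.normal(10, 3, size=(1, 8)) for _ in range(5)]})

# Stack the (1, n) cells into one (rows, n) array; requires equal lengths.
stacked = np.vstack(list(df["A"]))

# Per-row statistics; keepdims=True gives shape (rows, 1), which broadcasts
# against (rows, n) so each row is normalized by its own mean and std.
means = stacked.mean(axis=1, keepdims=True)
stds = stacked.std(axis=1, keepdims=True)
z = (stacked - means) / stds

# Split back into one (1, n) array per row to match the [[...]] layout.
df["zscores"] = pd.Series(list(z[:, np.newaxis, :]), index=df.index)
```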


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
