How to randomly create null values in a pandas DataFrame and store the original values of the replaced cells?

Let's say I have a dataframe df:

A B C
1 2 6
3 4 5
5 6 2
2 3 3

and I want to replace a random share of each column with null values, say 25% per column, something like this:

A B C
1 null null
null 4 5
5 null 2
null 3 null

Now I want to save the original values of these nulled cells, maybe as an array or dict, so that I have the original values of only the replaced cells.

Original ={'A2': 3,'A4': 2, 'B1':2, 'B3':6}

Replaced ={'A2': null,'A4': null, 'B1':null, 'B3':null}

Ideally I want to have original values for the replaced cells in an array.



Solution 1:[1]

Starting data: df

   A  B  C
0  1  2  6
1  3  4  5
2  5  6  2
3  2  3  3

Get a random sample of cells, drawn independently for each column:

df_nulls = df.apply(lambda x: x.sample(frac=0.25))

     A    B    C
0  NaN  2.0  NaN
2  5.0  NaN  NaN
3  NaN  NaN  3.0

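Note: NaN marks the cells that were not selected; the non-null cells are the ones that will be replaced. Rows that were not sampled in any column (row 1 here) do not appear in df_nulls at all.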
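To make the sampling step concrete, here is a minimal sketch that checks how many cells each column contributes. The seeded generator `rng` and the name `selected` are illustrative assumptions and not part of the answer; the seed is only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 2], "B": [2, 4, 6, 3], "C": [6, 5, 2, 3]})

# One shared generator, so each column draws a different random sample.
rng = np.random.default_rng(0)  # arbitrary seed, assumption for repeatability

# Sample 25% of each column independently (1 of 4 rows per column).
df_nulls = df.apply(lambda x: x.sample(frac=0.25, random_state=rng))

# Number of selected (non-null) cells in each column.
selected = df_nulls.notna().sum()
print(selected.to_dict())
```

With four rows and frac=0.25, each column selects exactly one cell, so df_nulls can have anywhere from one to three rows depending on how the samples overlap.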

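Using numpy, get the positional coordinates of the selected (non-null) cells. Reindex df_nulls to df.index first: df_nulls only keeps rows that were sampled at least once, so without the reindex the positions would refer to df_nulls rather than df.

import numpy as np
rows, cols = np.where(df_nulls.reindex(df.index).notnull())

print(rows)
# [0 2 3]
print(cols)
# [1 0 2]

Build your Original dict:

Original = {f"{df.columns[c]}{r}": df.iloc[r, c] for r, c in zip(rows, cols)}

print(Original)
# {'B0': 2, 'A2': 5, 'C3': 3}

Build your Replaced dict:

Replaced = {k: float("nan") for k in Original}

print(Replaced)
# {'B0': nan, 'A2': nan, 'C3': nan}

Complete the replacement:

# NaN needs a float column, so cast first instead of relying on implicit upcasting
df = df.astype(float)
for r, c in zip(rows, cols):
    df.iloc[r, c] = float("nan")

print(df)
     A    B    C
0  1.0  NaN  6.0
1  3.0  4.0  5.0
2  NaN  6.0  2.0
3  2.0  3.0  NaN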
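
Since the question asks for the original values as an array, the same idea can also be sketched end to end with a boolean mask. This is a variant under stated assumptions, not the answer's own method: the names `mask`, `k`, `original_values`, `original` and `df_with_nulls`, and the seed, are all illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 2], "B": [2, 4, 6, 3], "C": [6, 5, 2, 3]})
rng = np.random.default_rng(0)  # arbitrary seed, assumption for repeatability

n_rows, n_cols = df.shape
k = round(0.25 * n_rows)  # number of cells to blank in each column

# Boolean mask with exactly k True values per column.
mask = np.zeros((n_rows, n_cols), dtype=bool)
for c in range(n_cols):
    mask[rng.choice(n_rows, size=k, replace=False), c] = True

# Positions of the cells to replace, and their original values as a flat array.
rows, cols = np.where(mask)
original_values = df.to_numpy()[rows, cols]

# Optional dict keyed by column name + row label, as in the question.
original = {
    f"{df.columns[c]}{df.index[r]}": v
    for r, c, v in zip(rows, cols, original_values.tolist())
}

# mask() returns a new frame with NaN where the mask is True; df stays intact.
df_with_nulls = df.mask(mask)
```

Keeping `rows`, `cols` and `original_values` together makes it possible to restore the cells later, and because `df.mask` returns a new frame, the original data is never overwritten.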

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1