Predicting values using a dataframe and model.predict()

I have this simple dataframe:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

d = {'x': [7,8, 10,15], 'y': [15,17,20,24], 'z': [15,np.nan,20,np.nan]}
df = pd.DataFrame(data=d)
df

With it, I set up this simple model:

df_m=df.dropna()
X = df_m.loc[:, df_m.columns != 'z']
y=df_m['z']

X_train, X_test, y_train, y_test = train_test_split(X, y)

LR=LinearRegression()
LR.fit(X_train,y_train)
LR.predict(X_test)

Now I want to write a function that goes through the dataframe and replaces each NaN in column z with the model's predicted value:

def fill_z(df,LR):
    for i, row in df.iterrows():
        if pd.isnull(row['z']):
            print(row['x'],row['y'])
            df.at(i,'z') = LR.predict(row['x'],row['y'])

I get an error message:

  File "<ipython-input-243-7de7d76520a1>", line 24
    df.at(i,'z')=LR.predict(row['x'],row['y'])
    ^
SyntaxError: can't assign to function call
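The SyntaxError comes from df.at(i,'z'). .at is an indexer, so it takes square brackets (df.at[i, 'z']). With parentheses, the left-hand side becomes a function call, and Python cannot assign to a function call. Once that is fixed, the next problem is LR.predict(row['x'], row['y']). predict() takes a single 2D input of shape (n_samples, n_features), not one argument per feature, so that line would raise a TypeError. Below is a minimal corrected sketch of the row-wise function. It assumes the model is fitted on all complete rows without train_test_split. That simplification keeps the result deterministic; it is not the asker's exact setup.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fill_z(df, model):
    """Fill NaNs in column 'z' row by row using a fitted model (modifies df in place)."""
    for i, row in df.iterrows():
        if pd.isnull(row['z']):
            # predict() expects a 2D input, so wrap this row's features
            # in a one-row DataFrame with the same column names used in fit()
            features = pd.DataFrame([[row['x'], row['y']]], columns=['x', 'y'])
            # .at uses square brackets: df.at[row_label, column_label]
            df.at[i, 'z'] = model.predict(features)[0]
    return df


d = {'x': [7, 8, 10, 15], 'y': [15, 17, 20, 24], 'z': [15, np.nan, 20, np.nan]}
df = pd.DataFrame(data=d)

# Assumption: fit on every complete row (no train/test split) to keep the sketch deterministic
complete = df.dropna()
LR = LinearRegression().fit(complete[['x', 'y']], complete['z'])

fill_z(df, LR)
print(df)
```

This works, but it calls predict() once per missing row. On large frames, selecting all missing rows and predicting them in one call is much faster.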


Solution 1:[1]

There is no need to iterate through each row, find the NaN, and then predict one value per iteration.

Instead, select only the rows where z is missing, pass their features to predict() in a single call, and assign the predictions to the z column of that subset.

Then update your dataframe.
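The update step relies on how DataFrame.update() behaves. It modifies the calling frame in place and returns None. It aligns the other frame on index and columns, and it copies only non-NA values. Rows missing from the other frame, and NaN values in it, are left untouched. Here is a small illustration with made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'z': [15.0, np.nan, 20.0, np.nan]})

# 'other' only has labels 1 and 3; update() matches rows by index label
other = pd.DataFrame({'z': [16.0, np.nan]}, index=[1, 3])

df.update(other)  # in place; the NaN at label 3 in 'other' does not overwrite anything
print(df['z'].tolist())  # [15.0, 16.0, 20.0, nan]
```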

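import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

d = {'x': [7, 8, 10, 15], 'y': [15, 17, 20, 24], 'z': [15, np.nan, 20, np.nan]}
df = pd.DataFrame(data=d)

print(df)

# Train only on the rows where the target 'z' is known
df_m = df.dropna()
X = df_m.loc[:, df_m.columns != 'z']
y = df_m['z']

# Note: with only two complete rows, the split leaves a single training row,
# so the fitted model predicts the same value for every input
X_train, X_test, y_train, y_test = train_test_split(X, y)

LR = LinearRegression()
LR.fit(X_train, y_train)

# Rows whose 'z' is missing; .copy() avoids a SettingWithCopyWarning below
to_predict = df[df['z'].isna()].copy()
print('Predict:')
print(to_predict)

print('\nGet Predictions:')
# Pass a DataFrame with the same column names the model was fitted on
predictions = LR.predict(to_predict[['x', 'y']])
print(predictions)

print('\nMerge it back:')
to_predict['z'] = predictions
print(to_predict)

print('\nUpdate df:')
# update() aligns on the index and overwrites df in place with non-NA values
df.update(to_predict)
print(df)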

Output:

print(df)
    x   y     z
0   7  15  15.0
1   8  17   NaN
2  10  20  20.0
3  15  24   NaN

Predict:
    x   y   z
1   8  17 NaN
3  15  24 NaN

Get Predictions:
[15. 15.]

Merge it back:
    x   y     z
1   8  17  15.0
3  15  24  15.0

Update df:
      x     y     z
0   7.0  15.0  15.0
1   8.0  17.0  15.0
2  10.0  20.0  20.0
3  15.0  24.0  15.0
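If you would rather skip the intermediate to_predict frame and the update() call, a boolean mask with .loc can fill the gaps in one assignment. This is an alternative, not the approach used in the answer above. It also assumes the model is fitted on all complete rows instead of using train_test_split, to keep the result deterministic. Because only column z is written, x and y keep their integer dtype. In the output above, update() turned them into floats.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

d = {'x': [7, 8, 10, 15], 'y': [15, 17, 20, 24], 'z': [15, np.nan, 20, np.nan]}
df = pd.DataFrame(data=d)

features = ['x', 'y']
missing = df['z'].isna()  # boolean mask of the rows to fill

# Assumption: fit on every complete row (no train/test split)
LR = LinearRegression().fit(df.loc[~missing, features], df.loc[~missing, 'z'])

# Predict all missing rows in one call and write the results back in place
df.loc[missing, 'z'] = LR.predict(df.loc[missing, features])
print(df)
```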

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1