Issues with caching a dataframe in Streamlit
I am loading a pandas dataframe from a .csv file and caching it with the @st.cache decorator. I want to predict a class for each row using a predefined classification model (RandomForest, XGBoost).
Essentially, the prediction is added as a new column to the original dataframe, and the result is stored in a new variable.
However, I am having issues caching this new dataframe.
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
import streamlit as st

def main():
    st.set_page_config(layout="wide")
    st.title('Classification Problem on Home Equity dataset')

if __name__ == '__main__':
    main()

# Load prediction data
@st.cache
def load_predict():
    data = pd.read_csv("hmeq_Predict_2.csv")  # Currently on my local machine
    return data

df_predict = load_predict()

# Predict on data
@st.cache
def predictor_func():
    y_pred_nd = pd.Series(model.predict(df_predict), name='BAD')
    Predicted_X = pd.concat([df_predict, y_pred_nd], axis=1)
    # This is the dataframe that I want to cache
    return Predicted_X

# Run XGBoost classification. I have loaded X_train and y_train also, not shown in this example
if classifier == "XGBoost":
    if st.sidebar.button("Run Classification", key="Classification"):
        model = XGBClassifier()
        model.fit(X_train, y_train)
        # I want this function to return the cached dataframe.
        Predicted_X = predictor_func()
        # This command correctly displays the dataframe, meaning that predictor_func() ran correctly
        st.write(Predicted_X)

# However, I want to display the dataframe, Predicted_X, only when I click this button
if st.sidebar.button("Run Prediction on new Data", key="Prediction"):
    st.subheader('Check last column for prediction. ')
    st.write(Predicted_X)
This is the error I get:
NameError: name 'Predicted_X' is not defined
Traceback:
File "C:\Users\vchaubal\Anaconda3\envs\Jupyter_Project_2\lib\site-packages\streamlit\script_runner.py", line 379, in _run_script
    exec(code, module.__dict__)
File "C:\Users\vchaubal\Downloads\Streamlit_project.py", line 328, in <module>
    st.write(Predicted_X)
Am I missing a key concept here?
Also, is there a way to cache a model from sklearn?
Solution 1:[1]
NameError: name 'Predicted_X' is not defined means that you are using a variable, Predicted_X, that has not been assigned yet (that is, no Predicted_X = ... statement has run before that line).
In your code, at
if st.sidebar.button("Run Prediction on new Data", key="Prediction"):
    st.subheader('Check last column for prediction. ')
    st.write(Predicted_X)
there is no guarantee that the Predicted_X = ... assignment from the previous lines has been executed, because that assignment only runs inside the "Run Classification" button branch.
Your code should look like this:
Predicted_X = None  # Initialize Predicted_X so the name always exists

if classifier == "XGBoost":
    if st.sidebar.button("Run Classification", key="Classification"):
        model = XGBClassifier()
        model.fit(X_train, y_train)
        Predicted_X = predictor_func()
        st.write(Predicted_X)

if st.sidebar.button("Run Prediction on new Data", key="Prediction"):
    st.subheader('Check last column for prediction. ')
    # Show Predicted_X only if it has been computed
    if Predicted_X is None:
        st.write("Predicted_X has not been computed yet")
    else:
        st.write(Predicted_X)
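To see why the guard matters, here is a plain-Python sketch that needs no Streamlit. It assumes Streamlit's usual behavior: the script reruns from top to bottom on every interaction, and a button returns True only in the run triggered by its own click. The run_script function and the state dict are hypothetical stand-ins introduced for illustration; state plays the role that st.session_state plays in a real app, and the list of strings is a placeholder for the dataframe returned by predictor_func().

```python
def run_script(classification_clicked, prediction_clicked, state):
    """Simulate one top-to-bottom run of the script.

    `state` is a hypothetical stand-in for a store that survives reruns
    (st.session_state plays that role in a real Streamlit app).
    """
    shown = []
    predicted = None  # reset on every run, like any module-level variable

    if classification_clicked:
        # Placeholder for the dataframe returned by predictor_func()
        predicted = ["row 0: BAD=0", "row 1: BAD=1"]
        state["predicted"] = predicted  # keep the result for later runs
        shown.append(predicted)

    if prediction_clicked:
        # `predicted` is None in this run unless both buttons fired together,
        # so fall back to whatever an earlier run stored
        result = predicted if predicted is not None else state.get("predicted")
        shown.append(result if result is not None else "not computed yet")

    return shown


state = {}
first = run_script(False, True, state)   # prediction clicked before classification
run_script(True, False, state)           # classification clicked
later = run_script(False, True, state)   # prediction clicked in a later run
```

Under these assumptions, a value computed in the run triggered by one button is gone in the run triggered by the other unless it is stored somewhere that persists between runs, which is what the state dict imitates here.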
As for your other question ("Also, is there a way to cache a model from sklearn?"), there is a way:
@st.cache()
def load_xgboost_model():
    # Fit the model once; later calls reuse the cached result
    model = XGBClassifier()
    model.fit(X_train, y_train)
    return model

@st.cache()
def load_sklearn_model(path_to_sklearn_model):
    import pickle
    # Load a previously pickled model from disk
    with open(path_to_sklearn_model, "rb") as f:
        model = pickle.load(f)
    return model
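The idea behind caching a loader can be sketched without Streamlit. The example below uses functools.lru_cache from the standard library as a stand-in for @st.cache; it is not Streamlit's implementation, only an illustration of the same "compute once, reuse afterwards" behavior. The fake_model dict, the load_model function, and the load_count counter are hypothetical names chosen for the demonstration.

```python
import functools
import os
import pickle
import tempfile

load_count = 0  # counts how often the file is actually read from disk


@functools.lru_cache(maxsize=None)
def load_model(path):
    """Unpickle a model once per path; repeat calls return the cached object."""
    global load_count
    load_count += 1
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted model (assumption for this demo)
fake_model = {"name": "demo", "n_estimators": 100}

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(fake_model, f)

    first = load_model(model_path)   # reads and unpickles the file
    second = load_model(model_path)  # served from the cache, no file access
```

The second call returns the very same object without touching the file. Newer Streamlit releases deprecate st.cache in favor of st.cache_data and st.cache_resource, so it is worth checking the documentation for the version you have installed.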
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | vinzee |
