Docker - how to use a saved file created in the container

Objective: train a machine learning model in one Python script (train_model.py), save the model to a .joblib file (Inference_xgb.joblib), then load it in a second script (Inference.py), use it to make predictions, and save the output.

Issue: Inference.py cannot find the Inference_xgb.joblib file.

Relevant code snippets:

Training (train_model.py):

#!/usr/bin/python3

import pandas as pd
from xgboost import XGBClassifier
from joblib import dump

def train():
    # load in and read training data
    training = './train.csv'
    data_train = pd.read_csv(training)
    label = data_train['2020 Failure']  # target we want to predict
    features = data_train.drop(['2020 Failure', 'FACILITYID'], axis=1)  # input features the model learns from
    features = features.drop('Unnamed: 0', axis=1)
    x_train = features
    y_train = label

    # XGBoost model training
    xgb_model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
    xgb_model.fit(x_train, y_train)
    # save model
    dump(xgb_model, 'Inference_xgb.joblib')

if __name__ == '__main__':
    train()
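Because dump() is given a bare filename, the file lands in whatever the process's current working directory is at run time, not necessarily next to train_model.py. A minimal, stdlib-only sketch of that resolution rule (the two temporary directories merely stand in for two different working directories):

```python
import os
import tempfile

MODEL_NAME = "Inference_xgb.joblib"

def resolved_location(filename):
    # os.path.abspath joins a relative filename onto os.getcwd(), which is
    # how the OS resolves relative paths passed to open(), dump() and load().
    return os.path.abspath(filename)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    try:
        os.chdir(dir_a)
        path_in_a = resolved_location(MODEL_NAME)
        os.chdir(dir_b)
        path_in_b = resolved_location(MODEL_NAME)
    finally:
        # Restore the working directory before the temp dirs are deleted.
        os.chdir(original_cwd)

print(path_in_a)
print(path_in_b)
print(path_in_a == path_in_b)  # False: same name, different working directories
```

In the Dockerfile, WORKDIR /scaleable-model sets that working directory for both RUN python3 train_model.py and a later docker run, so the two scripts agree as long as docker run does not override the working directory (for example with -w).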

Testing (Inference.py):

#!/usr/bin/python3

import pandas as pd
from joblib import load
from sklearn.metrics import confusion_matrix
import os

def inference():
    # load and read in test data
    testing = './test.csv'
    data_test = pd.read_csv(testing)

    label = data_test['2020 Failure']  # true labels, used to score the predictions
    features = data_test.drop(['2020 Failure', 'FACILITYID'], axis=1)  # input features for prediction
    features = features.drop('Unnamed: 0', axis=1)
    IDS = data_test['FACILITYID']
    x_test = features
    y_test = label

    # run model
    xgb_model = load('Inference_xgb.joblib')
    y_label = xgb_model.predict(x_test)
    cm = confusion_matrix(y_test,y_label)
    print("Confusion Matrix: ")
    print(cm)

    # write results
    dirpath = os.getcwd()
    print('CURRENT PATH: ', dirpath)
    output_path = os.path.join(dirpath, 'output.csv')
    output_df = pd.DataFrame(y_label, columns=['Prediction'])
    output_df.insert(0, "FACILITYID", IDS.values)
    output_df.to_csv(output_path)
    print('OUTPUT DF')
    print(output_df)

if __name__ == "__main__":
    inference()
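Inference.py passes the same bare filename to load(), so it depends on the working directory in exactly the same way. A hedged sketch of a helper (locate_model is a hypothetical name, and the two search locations are assumptions) that resolves the model path explicitly and, when the file is missing, reports what each searched directory actually contains:

```python
import os
from pathlib import Path

def locate_model(filename="Inference_xgb.joblib", search_dirs=None):
    """Return the first existing path to `filename`, or raise an error that
    lists what each searched directory actually contains."""
    if search_dirs is None:
        # Assumption: the model is in the working directory (WORKDIR) or
        # next to this script. __file__ is undefined in some interactive runs.
        search_dirs = [Path.cwd()]
        if "__file__" in globals():
            search_dirs.append(Path(__file__).resolve().parent)
    report = []
    for directory in map(Path, search_dirs):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
        if directory.is_dir():
            contents = sorted(os.listdir(directory))
        else:
            contents = "directory does not exist"
        report.append(f"{candidate} not found; {directory} contains: {contents}")
    raise FileNotFoundError("\n".join(report))
```

In Inference.py the load line would then read xgb_model = load(locate_model()), and a failure would show the directory listing the script really saw.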

Dockerfile:

FROM jupyter/scipy-notebook 

RUN pip install joblib
RUN pip install xgboost==1.5.0

USER root

WORKDIR /scaleable-model

COPY train.csv ./train.csv
COPY test.csv ./test.csv

COPY train_model.py ./train_model.py
COPY inference.py ./inference.py

RUN python3 train_model.py

Comments, observations, and what I've tried:

I've noticed that removing WORKDIR /scaleable-model fixes the issue, but I want to keep WORKDIR set to /scaleable-model so I can mount the .csv output to my host machine.
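One plausible cause, assuming the container is started with a bind mount onto the WORKDIR itself (for example docker run -v "$(pwd)":/scaleable-model ...): a bind mount over a non-empty container directory obscures the files the image already had there, including the Inference_xgb.joblib written during docker build. If that is the case, mounting only a dedicated output subdirectory keeps the model visible. A sketch of how Inference.py could choose that directory; the OUTPUT_DIR environment variable and the /scaleable-model/output default are hypothetical choices, not part of the original setup:

```python
import os
from pathlib import Path

def output_path(filename="output.csv"):
    # Hypothetical convention: write results to $OUTPUT_DIR (default
    # /scaleable-model/output) so only that subdirectory needs a bind mount,
    # leaving Inference_xgb.joblib in /scaleable-model untouched.
    out_dir = Path(os.environ.get("OUTPUT_DIR", "/scaleable-model/output"))
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename
```

With that change, output_df.to_csv(output_path()) plus a command along the lines of docker run -v "$(pwd)/output":/scaleable-model/output scaleable-model python3 inference.py would expose output.csv on the host without hiding the model.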

I am running docker build in the scaleable-model directory on my host machine. That is, I cd to /home/user/pathto/scaleable-model and run docker build -t scaleable-model -f Dockerfile .

I then call docker run and tell it to run Inference.py; this is where the error occurs.

I've also tried hardcoded paths, but that did not help. I also created an Inference_xgb.joblib file on my host machine in the same directory where I build the image, but that had no effect either.

I suspect that either:

  • the Inference_xgb.joblib file is not being created properly in the container
  • I am messing up the directory structure somehow inside the container and thus Inference.py cannot find the file.
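The first suspicion can be tested during the build itself. A minimal sketch of a hypothetical check script, check_model.py, that reports a missing or empty model file and returns a non-zero status; after COPY check_model.py ./check_model.py, a line such as RUN python3 -c "import sys, check_model; sys.exit(check_model.main())" placed after RUN python3 train_model.py would make docker build fail instead of silently producing an image without the model:

```python
import sys
from pathlib import Path

def check_model(path="Inference_xgb.joblib"):
    """Return a list of problems with the saved model file (empty if none)."""
    model = Path(path)
    if not model.is_file():
        return [f"{model.resolve()} does not exist"]
    if model.stat().st_size == 0:
        return [f"{model.resolve()} is empty"]
    return []

def main(argv=None):
    # Optional first argument overrides the default model path.
    args = sys.argv[1:] if argv is None else argv
    problems = check_model(*args[:1])
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero status makes the RUN step, and therefore docker build, fail.
    return 1 if problems else 0
```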

To quote Michael Burry, "I guess when someone's wrong, they never know how". I'd like to try to understand the how here.

EDIT: After checking the contents of the container, I can confirm that the file (Inference_xgb.joblib) IS being created in the directory I want (/scaleable-model). So the problem must be that Inference.py is not detecting the file for some reason.



Solution 1:[1]

To verify whether the model file is being created in the container, you can:

  • Create a container and start a bash terminal

    docker run -it <image_name> bash

  • Check the current directory - this should be /scaleable-model (the WORKDIR)

    pwd

  • List the contents of the directory - this should show the model file

    ls
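The same checks can be run from Python inside the container, which helps when the failure only appears under the exact docker run command (including any -v mounts) that launches inference.py. A sketch; describe_workdir is a hypothetical helper, and os.path.ismount is only a heuristic for spotting a mount over the directory:

```python
import os
from pathlib import Path

def describe_workdir(directory=".", expected=("Inference_xgb.joblib", "test.csv")):
    """Summarise what a script actually sees in `directory` at run time."""
    d = Path(directory).resolve()
    return {
        "cwd": os.getcwd(),                 # equivalent of pwd
        "directory": str(d),
        "contents": sorted(os.listdir(d)),  # equivalent of ls
        # Usually True when `d` is the target of a mount such as docker run -v.
        "is_mount_point": os.path.ismount(d),
        "missing": [name for name in expected if not (d / name).exists()],
    }

if __name__ == "__main__":
    for key, value in describe_workdir().items():
        print(f"{key}: {value}")
```

If the report shows is_mount_point: True together with Inference_xgb.joblib under missing, a volume is probably covering the directory the model was saved to.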

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 krskara