Use AWS Lambda to execute a Jupyter notebook on AWS SageMaker

I made a classifier in Python that uses a lot of libraries. I have uploaded the model to Amazon S3 as a pickle (my_model.pkl). Ideally, every time someone uploads a file to a specific S3 bucket, it should trigger an AWS Lambda function that loads the classifier, returns predictions and saves a few files to an Amazon S3 bucket.

I want to know if it is possible to use a Lambda function to execute a Jupyter notebook in AWS SageMaker. This way I would not have to worry about the dependencies, and the classification would generally be more straightforward.

So, is there a way to use an AWS Lambda to execute a Jupyter Notebook?



Solution 1:[1]

Scheduling notebook execution is a bit of a SageMaker anti-pattern, because (1) you would need to manage data I/O (training set, trained model) yourself, (2) you would need to manage metadata tracking yourself, (3) you cannot run on distributed hardware and (4) you cannot use Spot instances. Instead, for scheduled tasks it is recommended to use one of SageMaker's long-running background job APIs: SageMaker Training, SageMaker Processing or SageMaker Batch Transform (for batch inference).

That being said, if you still want to schedule a notebook to run, you can do it in a variety of ways:

  • In the SageMaker CI/CD re:Invent 2018 video, notebooks are launched as CloudFormation templates, and their execution is automated via a SageMaker lifecycle configuration.
  • AWS published a blog post documenting how to launch notebooks from within Processing jobs.

But again, my recommendation for scheduled tasks would be to take them out of Jupyter, turn them into scripts and run them in SageMaker Training.
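As a rough illustration of "turning the notebook into a script", here is a minimal sketch of a batch prediction script. It assumes the uploaded files are header-less CSVs of numeric features, that the pickled model exposes a `predict` method, and that the job mounts data under the `/opt/ml/processing/...` directories used by SageMaker Processing containers. The `model` sub-directory and the output file naming are hypothetical choices, not something from the question.

```python
import csv
import pickle
from pathlib import Path

# Assumed container paths (hypothetical layout for a Processing job).
INPUT_DIR = Path("/opt/ml/processing/input")
MODEL_DIR = Path("/opt/ml/processing/model")
OUTPUT_DIR = Path("/opt/ml/processing/output")


def predict_file(model, input_csv, output_csv):
    """Read numeric rows from input_csv, write one prediction per row."""
    with open(input_csv, newline="") as f:
        rows = [[float(value) for value in row] for row in csv.reader(f) if row]
    predictions = model.predict(rows)
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for prediction in predictions:
            writer.writerow([prediction])
    return len(rows)


def main():
    """Load the pickled classifier and score every CSV in INPUT_DIR."""
    with open(MODEL_DIR / "my_model.pkl", "rb") as f:
        model = pickle.load(f)
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for input_csv in sorted(INPUT_DIR.glob("*.csv")):
        output_csv = OUTPUT_DIR / f"{input_csv.stem}_predictions.csv"
        predict_file(model, input_csv, output_csv)
```

The container entrypoint would call `main()`. The libraries the classifier depends on must be installed in the container image, since unpickling needs them to be importable.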

Whichever option you choose, all of these tasks can be launched through API calls from within a Lambda function, as long as the function's role has the appropriate permissions.
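To make the "API call from a Lambda function" idea concrete, here is a minimal sketch of a handler that reacts to an S3 upload event and starts a SageMaker Processing job through boto3's `create_processing_job`. The role ARN, image URI, output bucket, instance type and job-name prefix are all hypothetical placeholders. The optional `sagemaker_client` argument exists only so the handler can be exercised without AWS access.

```python
import time
import urllib.parse

# Hypothetical placeholders: replace with your own role, image and bucket.
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-classifier:latest"
OUTPUT_S3_URI = "s3://example-output-bucket/predictions/"


def build_processing_request(event, job_name):
    """Translate an S3 upload event into create_processing_job arguments."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])
    return {
        "ProcessingJobName": job_name,
        "RoleArn": ROLE_ARN,
        "AppSpecification": {"ImageUri": IMAGE_URI},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.large",
                "VolumeSizeInGB": 10,
            }
        },
        "ProcessingInputs": [{
            "InputName": "uploaded-file",
            "S3Input": {
                "S3Uri": f"s3://{bucket}/{key}",
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        "ProcessingOutputConfig": {
            "Outputs": [{
                "OutputName": "predictions",
                "S3Output": {
                    "S3Uri": OUTPUT_S3_URI,
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }]
        },
    }


def lambda_handler(event, context, sagemaker_client=None):
    """Start one Processing job per uploaded object."""
    if sagemaker_client is None:
        import boto3  # available by default in the Lambda Python runtime
        sagemaker_client = boto3.client("sagemaker")
    job_name = f"classify-{int(time.time())}"
    sagemaker_client.create_processing_job(
        **build_processing_request(event, job_name)
    )
    return {"job_name": job_name}
```

For this to work, the Lambda execution role would need at least `sagemaker:CreateProcessingJob` and `iam:PassRole` on the role passed to the job. Further inputs, such as the pickled model, could be added as extra entries in `ProcessingInputs`.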

Solution 2:[2]

I agree with Olivier. SageMaker might not be the right tool for executing notebooks.

Papermill is a framework for running Jupyter notebooks in this fashion.

You can consider trying this. It lets you deploy your Jupyter notebook directly as a serverless cloud function and uses Papermill behind the scenes.
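For reference, a minimal sketch of running a notebook with Papermill's `execute_notebook` might look like the following. The notebook name, the run identifier and the `input_key` parameter are hypothetical, and the notebook is assumed to contain a cell tagged `parameters` so that Papermill can inject values into it.

```python
import os


def output_path_for(input_path, run_id):
    """Derive a per-run output notebook path next to the input notebook."""
    stem, ext = os.path.splitext(input_path)
    return f"{stem}-{run_id}{ext}"


def run_notebook(input_path, run_id, parameters):
    """Execute the notebook with injected parameters and return the output path."""
    import papermill as pm  # requires `pip install papermill`

    output_path = output_path_for(input_path, run_id)
    # The executed copy, including cell outputs, is written to output_path.
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path
```

A call such as `run_notebook("classify.ipynb", "run1", {"input_key": "uploads/file.csv"})` would leave the executed copy in `classify-run1.ipynb`, which can be handy for auditing each run.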

Disclaimer: I work for Clouderizer.

Solution 3:[3]

It is totally possible, and not an anti-pattern at all. It really depends on your use case. AWS actually published a great article describing it, which includes a Lambda function.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Olivier Cruchant
Solution 2 Prakash Gupta
Solution 3 Dolf Andringa