Automate the execution of a .ipynb file in SageMaker

I want to automate Jupyter's work.

I created an AWS Lambda function that starts a specific Jupyter notebook instance whenever the S3 bucket receives a .csv file, and that part works fine.

Now I want to execute the .ipynb file that does all the work.

I have tried using a notebook instance lifecycle configuration, but it always fails. Would it be possible to do it in the same Lambda function?

jupyter nbconvert --execute --to notebook \
                  --inplace /home/ec2-user/SageMaker/Scikit.ipynb \
                  --ExecutePreprocessor.kernel_name=python3 \
                  --ExecutePreprocessor.timeout=1500

When run this way, the .ipynb file does not show up as running in Jupyter; it is executed in the terminal.

I would like it to run in online mode.

In the .ipynb file I import sagemaker to get the execution role, and one of the errors that AWS CloudWatch shows is the following:

ModuleNotFoundError: No module named 'sagemaker' <-- Appears in CloudWatch


Solution 1:[1]

Thank you for using Amazon SageMaker.

There is no official way to execute code on a Notebook Instance from Lambda, but below is a somewhat scrappy workaround.

On a side note, if using Lambda is not a hard requirement, you can use some kind of cron job on your Notebook Instance to execute Jupyter notebooks periodically.
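
As a rough sketch of the cron approach, a small Python script along these lines could be called from a crontab entry on the Notebook Instance. The function names are made up for illustration; the nbconvert flags and the notebook path are the ones from the question.

```python
import subprocess


def build_nbconvert_command(notebook_path, kernel_name="python3", timeout=1500):
    """Return the nbconvert command from the question as an argument list."""
    return [
        "jupyter", "nbconvert",
        "--execute",
        "--to", "notebook",
        "--inplace", notebook_path,
        "--ExecutePreprocessor.kernel_name={}".format(kernel_name),
        "--ExecutePreprocessor.timeout={}".format(timeout),
    ]


def run_notebook(notebook_path):
    """Execute the notebook in place and return the process exit code."""
    completed = subprocess.run(build_nbconvert_command(notebook_path))
    return completed.returncode


# Example call (path taken from the question):
# run_notebook("/home/ec2-user/SageMaker/Scikit.ipynb")
```

If this were saved as a script (say, a hypothetical run_notebook.py that ends with the example call), a crontab line such as `0 * * * * python3 /home/ec2-user/SageMaker/run_notebook.py` would run it at the top of every hour.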

Since you already figured out a way to start your Notebook Instance from Lambda, you can use the following code (replace notebook_instance_name with your Notebook Instance name) to connect to your InService Notebook Instance and execute commands on it, including the one you provided to run the Jupyter notebook.

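import boto3
import time
from botocore.vendored import requests
import websocket  # provided by the websocket-client package

def lambda_handler(event, context):
    sm_client = boto3.client('sagemaker')
    notebook_instance_name = 'test'
    # Presigned URL that opens an authenticated session on the Notebook Instance.
    url = sm_client.create_presigned_notebook_instance_url(NotebookInstanceName=notebook_instance_name)['AuthorizedUrl']

    # Extract the protocol ("https:") and the host name from the URL.
    url_tokens = url.split('/')
    http_proto = url_tokens[0]
    http_hn = url_tokens[2].split('?')[0].split('#')[0]

    # Open the URL once so the session collects the authentication cookies.
    s = requests.Session()
    r = s.get(url)
    cookies = "; ".join(key + "=" + value for key, value in s.cookies.items())

    # Connect to the Jupyter terminal named 1 over a websocket.
    ws = websocket.create_connection(
        "wss://{}/terminals/websocket/1".format(http_hn),
        cookie=cookies,
        host=http_hn,
        origin=http_proto + "//" + http_hn
    )

    # Type the command into the terminal; the trailing \r acts as Enter.
    ws.send("""[ "stdin", "jupyter nbconvert --execute --to notebook --inplace /home/ec2-user/SageMaker/Scikit.ipynb --ExecutePreprocessor.kernel_name=python3 --ExecutePreprocessor.timeout=1500\\r" ]""")
    # Give the message a moment to go out before closing the connection.
    time.sleep(1)
    ws.close()
    return None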

Please note that the code above uses the Python websocket module from the websocket-client package, which is not available in Lambda by default, so you'll need to package it with your Lambda code and upload it to Lambda. I followed the Lambda documentation for packaging dependencies.
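
The handler above assumes the Notebook Instance is already InService. If the same Lambda function also starts the instance, it may need to wait first. A minimal polling sketch, assuming a boto3 SageMaker client is passed in; the function name and the default intervals are made up, and the total wait has to fit inside the Lambda timeout.

```python
import time


def wait_until_in_service(sm_client, notebook_instance_name,
                          poll_seconds=15, max_attempts=40):
    """Poll describe_notebook_instance until the status is InService.

    Returns True once the instance is InService, and False if it reports
    Failed or the attempts run out.
    """
    for _ in range(max_attempts):
        status = sm_client.describe_notebook_instance(
            NotebookInstanceName=notebook_instance_name
        )["NotebookInstanceStatus"]
        if status == "InService":
            return True
        if status == "Failed":
            return False
        time.sleep(poll_seconds)
    return False


# Example use inside the handler:
# if wait_until_in_service(sm_client, notebook_instance_name):
#     ...create the presigned URL and open the websocket...
```

boto3 also provides a notebook_instance_in_service waiter on the SageMaker client, which may be a simpler option than hand-rolled polling.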

Edit:

The vendored version of requests was removed from botocore, so from botocore.vendored import requests should be replaced by import requests.

For some reason, the websocket server returns websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error when the "User-Agent" header is not present.

To send that header, pass the header parameter to the create_connection call:

    ws = websocket.create_connection(
        "wss://{}/terminals/websocket/5".format(http_hn),
        cookie=cookies,
        host=http_hn,
        origin=http_proto + "//" + http_hn,
        header=[
            "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
        ]
    )
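
The hand-escaped message string and the manual URL splitting in the handler can also be written with the standard library. This is only a sketch: the two function names are made up for illustration, and the URL in the example comment is a placeholder, not a real presigned URL.

```python
import json
from urllib.parse import urlsplit


def terminal_stdin_message(command):
    """Build the JSON message that types `command` plus Enter into the terminal."""
    # json.dumps takes care of escaping quotes and the carriage return.
    return json.dumps(["stdin", command + "\r"])


def origin_and_host(presigned_url):
    """Split a presigned URL into the origin and host used for the websocket."""
    parts = urlsplit(presigned_url)
    return "{}://{}".format(parts.scheme, parts.netloc), parts.netloc


# Example (placeholder URL):
# origin, host = origin_and_host("https://example.invalid?authToken=abc")
# ws.send(terminal_stdin_message("jupyter nbconvert --execute ..."))
```

Building the message with json.dumps avoids having to double the backslash in \r by hand, which is easy to get wrong when the command itself contains quotes.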

Solution 2:[2]

A few things, just to clarify. Your questions are:

- How can you run a Jupyter notebook online instead of via the CLI?
- Why are you getting the error you displayed?

To get at the first question: what is the point of running the .ipynb file in Lambda, as opposed to on an EC2 instance? If you just deploy the file there and install Anaconda/Jupyter, you can execute it in a cloud environment and still use the GUI, all on AWS resources.

Second question: it doesn't look like a role error, but I assume the Lambda can access SageMaker? Also, how are you accessing SageMaker, through boto?
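
To narrow down the ModuleNotFoundError, one thing worth checking is whether the interpreter behind the python3 kernel can see the sagemaker package at all. A small diagnostic sketch (the helper name is made up), meant to be run in the same environment that nbconvert uses:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


# Show which Python is running and whether it can find the package.
print("interpreter:", sys.executable)
print("sagemaker importable:", module_available("sagemaker"))
```

If this reports that sagemaker is not importable, the kernel is probably running in an environment where the package is not installed; installing it there (for example with pip install sagemaker) or choosing a kernel that already has it are two things to try.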

Edit: this link might help https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-root-access.html?sc_channel=sm&sc_campaign=Docs&sc_publisher=LINKEDIN&sc_country=Global&sc_geo=GLOBAL&sc_outcome=awareness&trk=Docs_LINKEDIN&sc_content=Docs&linkId=66519000

Solution 3:[3]

It is not clear what you want to achieve with your question. At the risk of getting it wrong, I'll describe how I use the Jupyter notebooks that I develop in a SageMaker notebook instance. After all, the notebook service is designed for development, not for production or for automating Python execution.

First, you want to have your Python and .ipynb files in Git or another source control system. This is easy to achieve using the Git integration of SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-git-repo.html

Second, you want to check out the open source project of Netflix, papermill. You can read about the full stack of jupyter support in Netflix in this blog post: https://medium.com/netflix-techblog/notebook-innovation-591ee3221233

I'm using papermill to schedule the notebook, pass parameters to it, execute it, and monitor its output in S3, using something like:

$ papermill s3://bkt/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
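
The same run can be started from Python through papermill's execute_notebook function. A minimal sketch, assuming papermill is installed with S3 support; the wrapper name run_with_papermill and its execute argument are made up for illustration (the latter only so the call can be exercised without papermill installed).

```python
def run_with_papermill(input_path, output_path, parameters, execute=None):
    """Execute a notebook with papermill and return whatever it returns.

    `execute` defaults to papermill.execute_notebook; passing another
    callable is a hypothetical hook for trying this out without papermill.
    """
    if execute is None:
        import papermill  # third-party package, imported only when needed
        execute = papermill.execute_notebook
    return execute(input_path, output_path, parameters=parameters)


# Equivalent of the command line above:
# run_with_papermill(
#     "s3://bkt/input.ipynb",
#     "s3://bkt/output.ipynb",
#     {"alpha": 0.6, "l1_ratio": 0.1},
# )
```

The parameters are injected into the notebook cell tagged "parameters", so the notebook needs such a cell for the values to take effect.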

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rodrigo Torres
Solution 2
Solution 3 Biranchi