Bring your own deep learning framework to Amazon SageMaker with Model Server for Apache MXNet
Deep learning (DL) frameworks enable machine learning (ML) practitioners to build and train ML models. However, the process of deploying ML models in production to serve predictions (also known as inferences) in real time is more complex. It requires that ML practitioners build a scalable and performant model server, which can host these models and handle inference requests at scale.
Model Server for Apache MXNet (MMS) was developed to address this hurdle. MMS is a highly scalable, production-ready inference server that was designed in an ML/DL framework-agnostic way, so it can host models trained in any ML/DL framework.
In this post, we showcase how you can use MMS to host a model trained with any ML/DL framework or toolkit in production. We chose Amazon SageMaker for production hosting. This PaaS solution does the heavy lifting of provisioning and managing infrastructure, so you can focus on your use case.
For this solution, we use the approach outlined in Bring your own inference code with Amazon SageMaker hosting. That post explains how you can bring your models together with all necessary dependencies, libraries, frameworks, and other components, package them in a single custom-built Docker container, and then host them on Amazon SageMaker.
To showcase the ML/DL framework-agnostic architecture of MMS, we chose to launch a model trained with the PaddlePaddle framework into production. The steps for taking a model trained on any ML/DL framework to Amazon SageMaker using an MMS bring your own (BYO) container are illustrated in the following diagram:
As this diagram shows, you need two main components to bring your ML/DL framework to Amazon SageMaker using an MMS BYO container:
- Model artifacts/model archive: These are all the artifacts required to run your model on a given host.
  - Model files: Usually symbols and weights. They are the artifacts of training a model.
  - Custom service file: Contains the entry point that is called every time an inference request is received and served by MMS. This file contains the logic to initialize the model in a particular ML/DL framework, preprocess the incoming request, and run inference. It also contains the post-processing logic that takes the data coming out of the framework's inference method and converts it into end-user consumable data.
  - MANIFEST: The interface between the custom service file and MMS. This file is generated by running a tool called the model-archiver, which comes as part of the MMS distribution.
- Container artifact: To load and run a model written in a custom DL framework on Amazon SageMaker, you bring a container that Amazon SageMaker runs. In this post, we show you how to use the MMS base container and extend it to support custom DL frameworks and other model dependencies. The MMS base container is a Docker container that comes with a highly scalable and performant model server, which is readily launchable in Amazon SageMaker.
In the following sections, we describe each of the components in detail.
Preparing a model
The MMS container is ML/DL framework agnostic. Write your model in an ML/DL framework of your choice and bring it to Amazon SageMaker with an MMS BYO container to get scalable, performant hosting. We show you how to prepare a PaddlePaddle model in the following sections.
Preparing model artifacts
Use the Understand Sentiment example published in the examples section of the PaddlePaddle repository.
First, create a model following the instructions provided in the PaddlePaddle/book repository. Download the container and run the training using the notebook provided as part of the example. We used the Stacked Bidirectional LSTM network for training, and trained the model for 100 epochs. At the end of this training exercise, we got the following list of trained model artifacts.
$ ls
embedding_0.w_0 fc_2.w_0 fc_5.w_0 learning_rate_0 lstm_3.b_0 moment_10 moment_18 moment_25 moment_32 moment_8
embedding_1.w_0 fc_2.w_1 fc_5.w_1 learning_rate_1 lstm_3.w_0 moment_11 moment_19 moment_26 moment_33 moment_9
fc_0.b_0 fc_3.b_0 fc_6.b_0 lstm_0.b_0 lstm_4.b_0 moment_12 moment_2 moment_27 moment_34
fc_0.w_0 fc_3.w_0 fc_6.w_0 lstm_0.w_0 lstm_4.w_0 moment_13 moment_20 moment_28 moment_35
fc_1.b_0 fc_3.w_1 fc_6.w_1 lstm_1.b_0 lstm_5.b_0 moment_14 moment_21 moment_29 moment_4
fc_1.w_0 fc_4.b_0 fc_7.b_0 lstm_1.w_0 lstm_5.w_0 moment_15 moment_22 moment_3 moment_5
fc_1.w_1 fc_4.w_0 fc_7.w_0 lstm_2.b_0 moment_0 moment_16 moment_23 moment_30 moment_6
fc_2.b_0 fc_5.b_0 fc_7.w_1 lstm_2.w_0 moment_1 moment_17 moment_24 moment_31 moment_7
These artifacts constitute a PaddlePaddle model.
Writing custom service code
You now have the model files required to host the model in production. To take this model into production with MMS, provide a custom service script that knows how to use these files. This script must also know how to pre-process the raw request coming into the server and how to post-process the responses coming out of the PaddlePaddle framework’s infer method.
Create a custom service file called paddle_sentiment_analysis.py. Here, define a class called PaddleSentimentAnalysis that contains methods to initialize the model, as well as pre-processing, post-processing, and inference methods. The skeleton of this file is as follows:
$ cat paddle_sentiment_analysis.py
import ...

class PaddleSentimentAnalysis(object):
    def __init__(self):
        ...

    def initialize(self, context):
        """
        This method is used to initialize the network and read other artifacts.
        """
        ...

    def preprocess(self, data):
        """
        This method is used to convert the string requests coming from the client
        into tensors.
        """
        ...

    def inference(self, input):
        """
        This method runs the tensors created in the preprocess method through the
        DL framework's infer method.
        """
        ...

    def postprocess(self, output, data):
        """
        Here the values returned from the inference method are converted into a
        human-understandable response.
        """
        ...

_service = PaddleSentimentAnalysis()

def handle(data, context):
    """
    This method is the entrypoint "handler" method that is used by MMS.
    Any request coming in for this model will be sent to this method.
    """
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None

    pre = _service.preprocess(data)
    inf = _service.inference(pre)
    ret = _service.postprocess(inf, data)
    return ret
To understand the details of this custom service file, see paddle_sentiment_analysis.py. This custom service file allows you to tell MMS what the lifecycle of each inference request should look like. It also defines how the trained model artifacts are used to initialize the PaddlePaddle framework.
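As a concrete illustration of one lifecycle stage, the following is a minimal sketch of what a preprocess step for this sentiment model could look like. This is not the exact code from paddle_sentiment_analysis.py; it assumes the vocabulary from word_dict.pickle has already been loaded into a word_dict mapping, and that requests arrive in the MMS convention of a list of dicts with the raw bytes under a 'body' or 'data' key:

import numpy as np

def preprocess(data, word_dict):
    """
    Convert a raw request payload (a UTF-8 sentence) into the array of
    word indices that PaddlePaddle's infer method expects.
    Assumes `word_dict` maps words to integer indices, as loaded from
    word_dict.pickle in the initialize step.
    """
    # MMS passes a list of requests; the raw bytes are under 'body' or 'data'.
    body = data[0].get('body') or data[0].get('data')
    sentence = body.decode('utf-8').lower()

    # Map each word to its vocabulary index, falling back to the
    # unknown-word index for out-of-vocabulary tokens.
    unk = word_dict.get('<unk>', 0)
    word_ids = [word_dict.get(word, unk) for word in sentence.split()]
    return np.array(word_ids, dtype='int64').reshape(-1, 1)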
Now that you have the trained model artifacts and the custom service file, create a model-archive that can be used to create your endpoint on Amazon SageMaker.
Creating a model-artifact file to be hosted on Amazon SageMaker
To load this model in Amazon SageMaker with an MMS BYO container, do the following:
- Create a MANIFEST file, which is used by MMS as a model’s metadata to load and run the model.
- Add the custom service script created earlier and the trained model artifacts, along with the MANIFEST file, to a .tar.gz file.
Use the model-archiver tool to do this. Before you use the tool to create a .tar.gz artifact, put all the model artifacts in a separate folder, including the custom service script mentioned earlier. To ease this process, we have made all the artifacts available for you. Run the following commands:
$ curl https://s3.amazonaws.com/model-server/blog_artifacts/PaddlePaddle_blog/artifacts.tgz | tar zxvf -
$ ls -R artifacts/sentiment
paddle_artifacts paddle_sentiment_analysis.py word_dict.pickle
artifacts/sentiment/paddle_artifacts:
embedding_0.w_0 fc_2.b_0 fc_4.w_0 fc_7.b_0 lstm_1.b_0 lstm_4.w_0 moment_12 moment_19 moment_25 moment_31 moment_6
embedding_1.w_0 fc_2.w_0 fc_5.b_0 fc_7.w_0 lstm_1.w_0 lstm_5.b_0 moment_13 moment_2 moment_26 moment_32 moment_7
fc_0.b_0 fc_2.w_1 fc_5.w_0 fc_7.w_1 lstm_2.b_0 lstm_5.w_0 moment_14 moment_20 moment_27 moment_33 moment_8
fc_0.w_0 fc_3.b_0 fc_5.w_1 learning_rate_0 lstm_2.w_0 moment_0 moment_15 moment_21 moment_28 moment_34 moment_9
fc_1.b_0 fc_3.w_0 fc_6.b_0 learning_rate_1 lstm_3.b_0 moment_1 moment_16 moment_22 moment_29 moment_35
fc_1.w_0 fc_3.w_1 fc_6.w_0 lstm_0.b_0 lstm_3.w_0 moment_10 moment_17 moment_23 moment_3 moment_4
fc_1.w_1 fc_4.b_0 fc_6.w_1 lstm_0.w_0 lstm_4.b_0 moment_11 moment_18 moment_24 moment_30 moment_5
Now you are ready to create the artifact required for hosting in Amazon SageMaker, using the model-archiver tool. The model-archiver tool is a part of the MMS toolkit. Run these commands in a Python virtual environment, which isolates the tool from the rest of your working environment.
The model-archiver tool comes preinstalled when you install mxnet-model-server.
# Create python virtual environment
$ virtualenv py
$ source py/bin/activate
# Let's install the model-archiver tool in the python virtual environment
(py) $ pip install model-archiver
# Run the model-archiver tool to generate a model .tar.gz, which can be readily hosted
# on SageMaker
(py) $ mkdir model-store
(py) $ model-archiver -f --model-name paddle_sentiment \
--handler paddle_sentiment_analysis:handle \
--model-path artifacts/sentiment --export-path model-store --archive-format tgz
This generates a file called paddle_sentiment.tar.gz in the model-store directory. This file contains all the model artifacts and the MANIFEST file.
(py) $ ls model-store
paddle_sentiment.tar.gz
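If you want to confirm that the MANIFEST and model files were packaged together, a quick check with Python's tarfile module lists the archive contents (a convenience sketch, not part of the original workflow):

import tarfile

# List the contents of the generated archive; you should see the model
# artifacts, the custom service file, and the generated MANIFEST.
with tarfile.open('model-store/paddle_sentiment.tar.gz', 'r:gz') as tar:
    for name in tar.getnames():
        print(name)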
You now have all the model artifacts that can be hosted on Amazon SageMaker. Next, look at how to build a container and bring it into Amazon SageMaker.
Building your own BYO container with MMS
In this section, you build your own MMS-based container (also known as a BYO container) that can be hosted in Amazon SageMaker.
To help with this process, every released version of MMS comes with corresponding MMS base CPU and GPU container images hosted on Docker Hub, which can be hosted on Amazon SageMaker.
For this example, use the container tagged awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6. To host the model created in the earlier section, install the PaddlePaddle and numpy packages in the container. To do so, create a Dockerfile that extends from the base MMS image and installs the Python packages. The artifacts that you downloaded earlier include a sample Dockerfile that installs the required packages:
(py) $ cat artifacts/Dockerfile.paddle.mms
FROM awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6
RUN pip install --user -U paddlepaddle \
 && pip install --user -U numpy
Now that you have the Dockerfile that describes your BYO container, build it:
(py) $ cd artifacts && docker build -t paddle-mms -f Dockerfile.paddle.mms .
# Verify that the image is built
(py) $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
paddle-mms latest 864796166b63 1 minute ago 1.62GB
You have the BYO container with all of the model artifacts in it, and you’re ready to launch it in Amazon SageMaker.
Creating an Amazon SageMaker endpoint with the PaddlePaddle model
In this section, you create an Amazon SageMaker endpoint in the console using the artifacts created earlier. We also provide an interactive Jupyter Notebook example of creating an endpoint using the Amazon SageMaker Python SDK and AWS SDK for Python (Boto3). The notebook is available on the mxnet-model-server GitHub repository.
Before you create an Amazon SageMaker endpoint for your model, do some preparation:
- Upload the model archive paddle_sentiment.tar.gz created earlier to an Amazon S3 bucket. For this post, we uploaded it to an S3 bucket called paddle_paddle.
- Upload the container image created earlier, paddle-mms, to an Amazon ECR repository. For this post, we created an ECR repository called paddle-mms and uploaded the image there. (If you prefer to script these uploads, see the Boto3 sketch after this list.)
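As a minimal Boto3 sketch of this preparation, the following uploads the archive and creates the ECR repository, using the bucket and repository names from this post. Note that pushing the image itself requires the Docker CLI (docker push) after authenticating to ECR, which Boto3 alone doesn't do:

import boto3

# Upload the model archive to S3 (substitute your own bucket name).
s3 = boto3.client('s3')
s3.upload_file('model-store/paddle_sentiment.tar.gz',
               'paddle_paddle',
               'paddle_sentiment.tar.gz')

# Create the ECR repository that will hold the paddle-mms image.
# The image is then tagged and pushed with the Docker CLI.
ecr = boto3.client('ecr')
ecr.create_repository(repositoryName='paddle-mms')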
Creating the Amazon SageMaker endpoint
Now that the model and container artifacts are uploaded to S3 and ECR, you can create the Amazon SageMaker endpoint. Complete the following steps:
- Create a model configuration.
- Create an endpoint configuration.
- Create a user endpoint.
- Test the endpoint.
Create a model configuration
First, create a model configuration.
- On the Amazon SageMaker console, choose Models, Create model.
- Provide values for Model name, IAM role, location of inference code image (or the ECR repository), and Location of model artifacts (which is the S3 bucket where the model artifact was uploaded).
- Choose Create model.
Create endpoint configuration
After you create the model configuration, create an endpoint configuration.
- In the left navigation pane, choose Endpoint Configurations, Create endpoint configuration.
- Enter an endpoint configuration name, choose Add model, and add the model that you created earlier. Then choose Create endpoint configuration.
Now move to the final step: creating an endpoint for users to send inference requests to.
Create user endpoint
- In the left navigation pane, choose Endpoints, Create endpoint.
- For Endpoint name, enter a value such as sentiment and select the endpoint configuration that you created earlier.
- Choose Select endpoint configuration, Create endpoint.
You have created an endpoint called “sentiment” on Amazon SageMaker with an MMS BYO container to host a model built with the PaddlePaddle DL framework.
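If you'd rather automate these console steps, the same flow can be scripted with Boto3, as the interactive notebook mentioned earlier demonstrates in full. Here is a minimal sketch; the role ARN, image URI, and S3 path are placeholders you must substitute with your own values:

import boto3

sm = boto3.client('sagemaker', region_name='us-east-1')

# Step 1: model configuration (placeholders: account ID, role, bucket).
sm.create_model(
    ModelName='paddle-sentiment',
    ExecutionRoleArn='arn:aws:iam::<account-id>:role/<sagemaker-role>',
    PrimaryContainer={
        'Image': '<account-id>.dkr.ecr.us-east-1.amazonaws.com/paddle-mms:latest',
        'ModelDataUrl': 's3://<bucket>/paddle_sentiment.tar.gz',
    },
)

# Step 2: endpoint configuration.
sm.create_endpoint_config(
    EndpointConfigName='sentiment-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'paddle-sentiment',
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
    }],
)

# Step 3: the user-facing endpoint.
sm.create_endpoint(EndpointName='sentiment', EndpointConfigName='sentiment-config')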
Now test this endpoint and make sure that it can indeed serve inference requests.
Testing the endpoint
Create a simple test client using the Boto3 library. Here is a small test script that sends a payload to the Amazon SageMaker endpoint and retrieves its response:
$ cat paddle_test_client.py
import boto3

runtime = boto3.Session().client(service_name='sagemaker-runtime', region_name='us-east-1')

endpoint_name = "sentiment"
content_type = "application/text"
payload = "This is an amazing movie."

response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType=content_type,
                                   Body=payload)

print(response['Body'].read())
The corresponding output from running this script is as follows:
b'Prediction : This is a Positive review'
Conclusion
In this post, we showed you how to build and host a PaddlePaddle model on Amazon SageMaker using an MMS BYO container. With minor modifications, this flow can be reused to build BYO containers that serve inference traffic on Amazon SageMaker endpoints with MMS for models built with many ML/DL frameworks, not just PaddlePaddle.
For a more interactive example of deploying the PaddlePaddle model above to Amazon SageMaker using MMS, see Amazon SageMaker Examples. To learn more about the MMS project, see the mxnet-model-server GitHub repository.
About the Authors
Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoys spending time with family and playing badminton.
Denis Davydenko is an Engineering Manager with AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time, he enjoys spending time with his family, playing poker and video games.