Learn About Our Meetup

5000+ Members



Join our meetup, learn, connect, share, and get to know your Toronto AI community. 



Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.



Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

Bring your own deep learning framework to Amazon SageMaker with Model Server for Apache MXNet

Deep learning (DL) frameworks enable machine learning (ML) practitioners to build and train ML models. However, the process of deploying ML models in production to serve predictions (also known as inferences) in real time is more complex. It requires that ML practitioners build a scalable and performant model server, which can host these models and handle inference requests at scale.

Model Server for Apache MXNet (MMS) was developed to address this hurdle. MMS is a highly scalable, production-ready inference server. MMS was designed in a ML/DL framework agnostic way to host models trained in any ML/DL framework.

In this post, we showcase how you can use MMS to host a model trained using any ML/DL framework or toolkit in production. We chose Amazon SageMaker for production hosting. This PaaS solution does a lot of heavy lifting to provide infrastructure and allows you to focus on your use cases.

For this solution, we use the approach outlined in Bring your own inference code with Amazon SageMaker hosting. This post explains how you can bring your models together with all necessary dependencies, libraries, frameworks, and other components. Compile them in a single custom-built Docker container and then host them on Amazon SageMaker.

To showcase the ML/DL framework-agnostic architecture of MMS, we chose to launch a model trained with the PaddlePaddle framework into production. The steps for taking a model trained on any ML/DL framework to Amazon SageMaker using an MMS bring your own (BYO) container are illustrated in the following diagram:

As this diagram shows, you need two main components to bring your ML/DL framework to Amazon SageMaker using an MMS BYO container:

  1. Model artifacts/model archive: These are all the artifacts required to run your model on a given host.
    • Model files: Usually symbols and weights. They are the artifacts of training a model.
    • Custom service file: Contains the entry point that is called every time an inference request is received and served by MMS. This file contains the logic to initialize the model in a particular ML/DL framework, preprocess the incoming request, and run inference. It also post-processes the logic that takes the data coming out of the framework’s inference method and converts it to end-user consumable data.
    • MANIFEST : The interface between the custom service file and MMS. This file is generated by running a tool called the model-archiver, which comes as a part of MMS distribution.
  1. Container artifact: To load and run a model written in a custom DL framework on Amazon SageMaker, bring a container to be run on Amazon SageMaker. In this post, we show you how to use the MMS base container and extend it to support custom DL frameworks and other model dependencies. The MMS base container is a Docker container that comes with a highly scalable and performant model-server, which is readily launchable in Amazon SageMaker.

In the following sections, we describe each of the components in detail.

Preparing a model

The MMS container is ML/DL framework agnostic. Write models in a ML/DL framework of your choice and bring it to Amazon SageMaker with an MMS BYO container to get the features of scalability and performance. We show you how to prepare a PaddlePaddle model in the following sections.

Preparing model artifacts

Use the Understand Sentiment example that is available and published in the examples section of the PaddlePaddle repository.

First, create a model following the instructions provided in the PaddlePaddle/book repository. Download the container and run the training using the notebook provided as part of the example. We used the Stacked Bidirectional LSTM network for training, and trained the model for 100 epochs. At the end of this training exercise, we got the following list of trained model artifacts.

$ ls
embedding_0.w_0    fc_2.w_0    fc_5.w_0    learning_rate_0    lstm_3.b_0    moment_10    moment_18    moment_25    moment_32    moment_8
embedding_1.w_0    fc_2.w_1    fc_5.w_1    learning_rate_1    lstm_3.w_0    moment_11    moment_19    moment_26    moment_33    moment_9
fc_0.b_0    fc_3.b_0    fc_6.b_0    lstm_0.b_0    lstm_4.b_0    moment_12    moment_2    moment_27    moment_34
fc_0.w_0    fc_3.w_0    fc_6.w_0    lstm_0.w_0    lstm_4.w_0    moment_13    moment_20    moment_28    moment_35
fc_1.b_0    fc_3.w_1    fc_6.w_1    lstm_1.b_0    lstm_5.b_0    moment_14    moment_21    moment_29    moment_4
fc_1.w_0    fc_4.b_0    fc_7.b_0    lstm_1.w_0    lstm_5.w_0    moment_15    moment_22    moment_3    moment_5
fc_1.w_1    fc_4.w_0    fc_7.w_0    lstm_2.b_0    moment_0    moment_16    moment_23    moment_30    moment_6
fc_2.b_0    fc_5.b_0    fc_7.w_1    lstm_2.w_0    moment_1    moment_17    moment_24    moment_31    moment_7

These artifacts constitute a PaddlePaddle model.

Writing custom service code

You now have the model files required to host the model in production. To take this model into production with MMS, provide a custom service script that knows how to use these files. This script must also know how to pre-process the raw request coming into the server and how to post-process the responses coming out of the PaddlePaddle framework’s infer method.

Create a custom service file called Here, define a class called PaddleSentimentAnalysis that contains methods to initialize the model and also defines pre-processing, post-processing, and inference methods. The skeleton of this file is as follows:

$ cat

import ...
class PaddleSentimentAnalysis(object):
    def __init__(self):

    def initialize(self, context):
    This method is used to initialize the network and read other artifacts.
    def preprocess(self, data):
    This method is used to convert the string requests coming from client 
    into tensors. 

    def inference(self, input):
    This method runs the tensors created in preprocess method through the 
    DL framework's infer method.

    def postprocess(self, output, data):
    Here the values returned from the inference method is converted to a 
    human understandable response.

_service = PaddleSentimentAnalysis()

def handle(data, context):
This method is the entrypoint "handler" method that is used by MMS.
Any request coming in for this model will be sent to this method.
    if not _service.initialized:

    if data is None:
        return None

    pre = _service.preprocess(data)
    inf = _service.inference(pre)
    ret = _service.postprocess(inf, data)
    return ret

To understand the details of this custom service file, see This custom service code file allows you to tell MMS what the lifecycle of each inference request should look like. It also defines how a trained model-artifact can initialize the PaddlePaddle framework.

Now that you have the trained model artifacts and the custom service file, create a model-archive that can be used to create your endpoint on Amazon SageMaker.

Creating a model-artifact file to be hosted on Amazon SageMaker

To load this model in Amazon SageMaker with an MMS BYO container, do the following:

  1. Create a MANIFEST file, which is used by MMS as a model’s metadata to load and run the model.
  2. Add the custom service script created earlier and the trained model-artifacts, along with the MANIFEST file, to a .tar.gz file.

Use the model-archiver tool to do this. Before you use the tool to create a .tar.gz artifact, put all the model artifacts in a separate folder, including the custom service script mentioned earlier. To ease this process, we have made all the artifacts available for you. Run the following commands:

$ curl | tar zxvf -
$ ls -R artifacts/sentiment
paddle_artifacts    word_dict.pickle
embedding_0.w_0    fc_2.b_0    fc_4.w_0    fc_7.b_0    lstm_1.b_0    lstm_4.w_0    moment_12    moment_19    moment_25    moment_31    moment_6
embedding_1.w_0    fc_2.w_0    fc_5.b_0    fc_7.w_0    lstm_1.w_0    lstm_5.b_0    moment_13    moment_2    moment_26    moment_32    moment_7
fc_0.b_0    fc_2.w_1    fc_5.w_0    fc_7.w_1    lstm_2.b_0    lstm_5.w_0    moment_14    moment_20    moment_27    moment_33    moment_8
fc_0.w_0    fc_3.b_0    fc_5.w_1    learning_rate_0    lstm_2.w_0    moment_0    moment_15    moment_21    moment_28    moment_34    moment_9
fc_1.b_0    fc_3.w_0    fc_6.b_0    learning_rate_1    lstm_3.b_0    moment_1    moment_16    moment_22    moment_29    moment_35
fc_1.w_0    fc_3.w_1    fc_6.w_0    lstm_0.b_0    lstm_3.w_0    moment_10    moment_17    moment_23    moment_3    moment_4
fc_1.w_1    fc_4.b_0    fc_6.w_1    lstm_0.w_0    lstm_4.b_0    moment_11    moment_18    moment_24    moment_30    moment_5

Now you are ready to create the artifact required for hosting in Amazon SageMaker, using the model-archiver tool. The model-archiver tool is a part of the MMS toolkit. To get this tool, run these commands in a Python virtual environment because it provides isolation from the rest of the working environment.

The model-archiver tool comes preinstalled when you install mxnet-model-server.

# Create python virtual environment
$ virtualenv py
$ source py/bin/activate
# Lets install model-archiver tool in the python virtual environment
(py) $ pip install model-archiver
# Run the model-archiver tool to generate a model .tar.gz, which can be readily hosted
# on Sagemaker
(py) $ mkdir model-store
(py) $ model-archiver -f --model-name paddle_sentiment 
--handler paddle_sentiment_analysis:handle 
--model-path artifacts/sentiment --export-path model-store --archive-format tgz

This generates a file called sentiment.tar.gz in the /model-store directory. This file contains all the artifacts of the models and the manifest file.

(py) $ ls model-store

You now have all the model artifacts that can be hosted on Amazon SageMaker. Next, look at how to build a container and bring it into Amazon SageMaker.

Building your own BYO container with MMS

In this section, you build your own MMS-based container (also known as a BYO container) that can be hosted in Amazon SageMaker.

To help with this process, every released version of MMS comes with a corresponding MMS base CPU and GPU containers hosted on DockerHub, which can be hosted on Amazon SageMaker.

For this example, use a container tagged awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6. To host the model created in the earlier section, install the PaddlePaddle and numpy packages in the container. Create a Dockerfile that extends from the base MMS image and installs the Python packages. The artifacts that you downloaded earlier come with the sample Dockerfile necessary to install required packages:

(py) $ cat artifacts/Dockerfile.paddle.mms
FROM awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6

RUN pip install --user -U paddlepaddle 
    && pip install --user -U numpy

Now that you have the Dockerfile that describes your BYO container, build it:

(py) $ cd artifacts && docker build -t paddle-mms -f Dockerfile.paddle.mms .
# Verify that the image is built
(py) $ docker images
REPOSITORY      TAG        IMAGE ID            CREATED             SIZE
paddle-mms     latest     864796166b63        1 minute ago        1.62GB

You have the BYO container with all of the model artifacts in it, and you’re ready to launch it in Amazon SageMaker.

Creating an Amazon SageMaker endpoint with the PaddlePaddle model

In this section, you create an Amazon SageMaker endpoint in the console using the artifacts created earlier. We also provide an interactive Jupyter Notebook example of creating an endpoint using the Amazon SageMaker Python SDK and AWS SDK for Python (Boto3). The notebook is available on the mxnet-model-server GitHub repository.

Before you create an Amazon SageMaker endpoint for your model, do some preparation:


  1. Upload the model archive sentiment.tar.gz created earlier to an Amazon S3 bucket. For this post, we uploaded it to an S3 bucket called paddle_paddle.
  2. Upload the container image created earlier, paddle-mms, to an Amazon ECR repository. For this post, we created an ECR repository called “paddle-mms” and uploaded image there.

Creating the Amazon SageMaker endpoint

Now that the model and container artifacts are uploaded to S3 and ECR, you can create the Amazon SageMaker endpoint. Complete the following steps:

  1. Create a model configuration.
  2. Create an endpoint configuration.
  3. Create a user endpoint.
  4. Test the endpoint.

Create a model configuration

First, create a model configuration.

  1. On the Amazon SageMaker console, choose Models, Create model.
  2. Provide values for Model name, IAM role, location of inference code image (or the ECR repository), and Location of model artifacts (which is the S3 bucket where the model artifact was uploaded).

  3. Choose Create Model.

Create endpoint configuration

After you create the model configuration, create an endpoint configuration.

  1. In the left navigation pane, choose Endpoint Configurations, Create endpoint configuration.
  2. Give an endpoint configuration name, choose Add model, and add the model that we created earlier. Then choose create endpoint configuration.

Now we go to the final step, which is creating endpoint for users to send the inference requests to.

Create user endpoint

  1. In the left navigation pane, choose Endpoints, Create endpoint.
  2. For Endpoint name, enter a value such as sentiment and select the endpoint configuration that you created earlier.
  3. Choose Select endpoint configuration, Create endpoint.

You have created an endpoint called “sentiment” on Amazon SageMaker with an MMS BYO container to host a model built with the PaddlePaddle DL framework.

Now test this endpoint and make sure that it can indeed serve inference requests.

Testing the endpoint

Create a simple test client using the Boto3 library. Here is a small test script that sends a payload to the Amazon SageMaker endpoint and retrieves its response:

$ cat

import boto3

runtime = boto3.Session().client(service_name='sagemaker-runtime',region_name='us-east-1')

payload="This is an amazing movie."
response = runtime.invoke_endpoint(EndpointName=endpoint_name,


The corresponding output from running this script is as follows:

b'Prediction : This is a Positive review'


In this post, we showed you how to build and host a PaddlePaddle model on Amazon SageMaker using an MMS BYO container. This flow can be reused with minor modifications to build BYO containers serving inference traffic on Amazon SageMaker endpoints with MMS for models built using many ML/DL frameworks, not just PaddlePaddle.

For a more interactive example to deploy the above PaddlePaddle model into Amazon SageMaker using MMS, see Amazon SageMaker Examples. To learn more about the MMS project, see the mxnet-model-server GitHub repository.

About the Authors

Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoy spending time with family and playing badminton.




Denis Davydenko is an Engineering Manager with AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time he enjoys spending time with his family, playing poker and video games.





Next Meetup




Plug yourself into AI and don't miss a beat


Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.