Deploy TensorFlow models with Amazon Elastic Inference using a flexible new Python API available in EI-enabled TensorFlow 1.12
Amazon Elastic Inference (EI) now supports the latest version of TensorFlow–1.12. It provides EIPredictor, a new easy-to-use Python API function for deploying TensorFlow models using EI accelerators. You can now use this new Python API function within your inference scripts as an alternative to using TensorFlow Serving when running TensorFlow models with EI. EIPredictor allows for easy experimentation and lets you compare performance with and without EI. This blog post shows you how to use EIPredictor to deploy your models on EI.
Let me start with some background. Amazon Elastic Inference is a new capability we launched at re:Invent 2018. EI provides a new, significantly more cost-effective way to apply acceleration to your deep learning inference workloads than using standalone GPU instances. EI lets you attach accelerators to any Amazon SageMaker or Amazon EC2 instance type and provides you the low latency, high throughput benefits of GPU acceleration at a much lower cost (up to 75%). You can use EI to deploy TensorFlow, Apache MXNet, and ONNX models for inference.
Using TensorFlow Serving to run models on EI
At the launch of Amazon EI we introduced EI-enabled TensorFlow Serving, which provides an easy way to run your TensorFlow models with EI accelerators without having to make any code changes. Just start a model server with EI-enabled TensorFlow Serving with your trained TensorFlow SavedModel, and make calls to it. EI-enabled TensorFlow Serving uses the same API as normal TensorFlow Serving. The only difference is that the entry point is a different binary named AmazonEI_TensorFlow_Serving_v1.12_v1. Here is an example command that you can use to launch the server:
You can find EI-enabled TensorFlow Serving in the AWS Deep Learning AMIs (here’s a tutorial), or you can download the package from this Amazon S3 bucket so you can build it into your own custom Amazon Machine Image (AMI) or Docker container. EI-enabled TensorFlow Serving extends TensorFlow’s high performance model serving system to work seamlessly with EI. It automates accelerator discovery, secures your inference requests over the network with TLS encryption, and restricts access with AWS Identity and Access Management (IAM) policies.
Using EIPredictor to run models on EI
EIPredictor is a simple Python function for performing inference on a pretrained model. It is a new API function available within EI-enabled TensorFlow. It’s also available in the Deep Learning AMI and for download using Amazon S3. You can use EIPredictor in the following ways:
- You can use EIPredictor with a saved model or a frozen graph. It’s similar to TF predictor. Please see EI’s documentation for using EIPredictor with these model formats.
- You can disable usage of EI by using the
use_eiflag which is defaulted to
True. This is useful to see how your model performs with and without EI acceleration.
- EIPredictor can also be created from a TensorFlow Estimator. Given a trained Estimator, you first export a SavedModel. Refer to the SavedModel documentationfor more details. Example usage:
The following code sample shows the available parameters for this function:
Example for running a model with EI Predictor
Here’s an example you can try for serving a ResNet using a Single Shot Detector (SSD) model using EI Predictor. This example assumes that you’ve launched an EC2 instance with an EI accelerator. We’re going to use the latest Deep Learning AMI here for this example.
- The first step is to activate the TensorFlow Elastic Inference Note that this is specific to the Deep Learning AMI. You don’t need this step if you built the EI-enabled TensorFlow library with your own custom AMI. You can choose between the Python 2 and Python 3 TensorFlow EI environments. I’ll use Python 2 for this example:
- Download the ResNet SSD model example from Amazon S3.
- Unzip the model. Again, you may skip this step if you already have the model.
- Download a picture of three dogs to your current directory.
- Now open a text editor, such as vim, and paste the following inference script. Save the file as ssd_resnet_predictor.py
- Run the inference script.
You now have two convenient ways, depending on your preference, to run your TensorFlow models on cost efficient accelerators. Give it a try and let us know what you think at firstname.lastname@example.org
About the Author
Dominic Divakaruni is the Product Manager for Amazon Elastic Inference. He builds services that help customers scale production machine learning applications. In this spare time he enjoys drumming with his son and working on cars.