Month: October 2018

Now use Pipe mode with CSV datasets for faster training on Amazon SageMaker built-in algorithms

Amazon SageMaker built-in algorithms now support Pipe mode for fetching datasets in CSV format from Amazon Simple Storage Service (S3) into Amazon SageMaker while training machine learning (ML) models.

With Pipe input mode, the data is streamed directly to the algorithm container while model training is in progress. This is unlike File mode, which downloads data to the local Amazon Elastic Block Store (EBS) volume before training starts. With Pipe mode, your training jobs start faster, use significantly less disk space, and finish sooner, which reduces your overall cost to train machine learning models. In our internal benchmarks that trained a regression model with the Amazon SageMaker Linear Learner algorithm on a 3.9 GB CSV dataset, using Pipe mode instead of File mode reduced the overall time to train the model by up to 40 percent. You can read more about Pipe mode and its benefits in this blog post.

Using Pipe mode with Amazon SageMaker Built-in Algorithms

Earlier this year when we first released the Pipe input mode for the built-in Amazon SageMaker algorithms, it supported data only in protobuf recordIO format. This is a special format designed specifically for high-throughput training jobs. With today’s release we are extending the performance benefits of the Pipe input mode to your training datasets in CSV format as well. The following Amazon SageMaker built-in algorithms now have full support for training with datasets in CSV format using Pipe input mode:

  • Principal Component Analysis (PCA)
  • K-Means Clustering
  • K-Nearest Neighbors
  • Linear Learner (Classification and Regression)
  • Neural Topic Model (NTM)
  • Random Cut Forest

To start benefiting from this new feature in your training jobs, specify the Amazon S3 location of your CSV dataset as usual and pick “Pipe” instead of “File” as your input mode. Your CSV datasets are streamed with no data formatting or code changes required on your end.
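As a rough sketch of what this switch looks like outside the console, the request below is shaped for the boto3 SageMaker `create_training_job` call. The job name, image URI, role ARN, S3 locations, and instance type are placeholders rather than values from this post, and the algorithm-specific `HyperParameters` are left out. The two settings that matter here are `TrainingInputMode` and `ContentType`.

```python
def build_training_request(job_name, image_uri, role_arn,
                           train_s3_uri, output_s3_uri, input_mode="Pipe"):
    """Build a request dict for boto3's SageMaker create_training_job call."""
    if input_mode not in ("Pipe", "File"):
        raise ValueError("input_mode must be 'Pipe' or 'File'")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # "Pipe" streams from S3; "File" downloads to the EBS volume first.
            "TrainingInputMode": input_mode,
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Placeholder resources; HyperParameters for the algorithm are omitted.
        "ResourceConfig": {"InstanceType": "ml.c5.xlarge",
                           "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = build_training_request(
    job_name="linear-learner-csv-pipe",                      # placeholder
    image_uri="<linear-learner-image-uri>",                  # placeholder
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",   # placeholder
    train_s3_uri="s3://example-bucket/train/",               # placeholder
    output_s3_uri="s3://example-bucket/output/",             # placeholder
)
# To submit: boto3.client("sagemaker").create_training_job(**request)
```

Switching back to File mode for a comparison run only requires passing `input_mode="File"`; nothing else in the request changes.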

Faster Training using CSV optimized Pipe Mode

The new Pipe mode implementation for datasets in CSV format is a highly optimized, high-throughput process. To demonstrate performance gains from using Pipe input mode, we trained the Amazon SageMaker Linear Learner algorithm over two synthetic CSV datasets.

The first dataset, a 3.9 GB CSV file, contained 2 million records, each with 100 comma-separated, single-precision floating-point values. The following is a comparison of the overall training job execution time and model training time between Pipe mode and File mode while training the Amazon SageMaker Linear Learner algorithm with a batch size of 1000.

As you can see, using Pipe input mode with CSV datasets reduces the total time to train the model by up to 40 percent across several of the instance types supported by Amazon SageMaker.

Our second dataset, a 1 GB CSV file, had only 400 records, but each record had 100,000 comma-separated, single-precision floating-point values. We repeated the earlier training benchmarks with a batch size of 10.

This time the performance gain from using Pipe mode is even more significant: a reduction on the order of 75 percent in the total time to train the model.

Both experiments clearly show that using Pipe input mode brings a dramatic performance improvement. Your training jobs can avoid any startup delays caused by downloading datasets to the training instances, and they can have a much higher data read throughput.

Get started with Amazon SageMaker

You can easily get started with Amazon SageMaker using our sample notebooks. You can also look at our developer guide for more resources and subscribe to our discussion forum for new launch announcements.


About the Authors

Can Balioglu is a Software Development Engineer on the AWS AI Algorithms team, where he specializes in high-performance computing. In his spare time he loves to play with his homemade GPU cluster.




Sumit Thakur is a Senior Product Manager for AWS Machine Learning Platforms where he loves working on products that make it easy for customers to get started with machine learning on cloud. He is product manager for Amazon SageMaker and AWS Deep Learning AMI. In his spare time, he likes connecting with nature and watching sci-fi TV series.




Model Server for Apache MXNet v1.0 released

AWS recently released Model Server for Apache MXNet (MMS) v1.0. It features a new API for managing the state of the service, including the ability to dynamically load models at runtime, along with lower latency and higher throughput. In this post, we will explore the new features and showcase the performance gains of MMS v1.0.

What is Model Server for Apache MXNet (MMS)?

MMS is an open-source model serving framework, designed to simplify the task of serving deep learning models for inference at scale. The following is an architectural diagram of the MMS scalable deployment.

Here are some key features of MMS v1.0:

  • Designed to serve MXNet, Gluon, and ONNX neural network models.
  • Gives you the ability to customize every step in the inference execution pipeline using custom code packaged into the model archive.
  • Comes with a preconfigured stack of services that is light and scalable, including REST API endpoints.
  • Exposes a management API that allows model loading, unloading, and scaling at runtime.
  • Provides prebuilt and optimized container images for serving inference at scale.
  • Includes real-time operational metrics to monitor health, performance, and load of the system and APIs.

Quick start with MMS

MMS 1.0 requires Java 8 or higher. Here’s how to install it on the supported platforms:

# Ubuntu/Debian based distributions
sudo apt-get install openjdk-8-jre
# Fedora, Redhat based distributions
sudo yum install java-1.8.0-openjdk
# On Mac devices
brew tap caskroom/versions
brew cask install java8

MMS is currently not supported on Windows.

To install the MMS v1.0 package:

pip install mxnet-model-server==1.0

MMS v1.0 doesn’t depend on any specific deep learning engine, but in this blog post we will focus on doing inference using the MXNet engine.

# Install mxnet
pip install mxnet

To verify the installation:

mxnet-model-server --start

This should produce the output:

[INFO ] main -
MMS Home: pip_directory/mxnet-model-server
Current directory: <your-current-directory>
Temp directory: /temp/directory
Log dir: cur_dir/logs
Metrics dir: cur_dir/logs
[INFO ] main - Initialize servers with: KQueueServerSocketChannel.
[INFO ] main - Inference API listening on port: 8080
[INFO ] main - Management API listening on port: 8081
Model server started.

The previous step will fail if no Java runtime is found, or if MMS is not properly installed by pip.

To stop the server, run:

mxnet-model-server --stop

Now that we have verified the MMS installation, let’s go infer some cat breeds.

Running inference

To allow you to get started quickly, we’ll show how you can start MMS, load a pre-trained model for inference, and scale the model at runtime.

We’ll start by running a model server serving SqueezeNet, a light-weight image classification model:

# Start and load squeezenet while doing so
mxnet-model-server --models squeezenet=

Let’s download a cat image and send it to MMS to get an inference result that identifies the cat breed.

# Download the cat image
curl -O

To send requests to the prediction API we use port 8080.

# Predict on squeezenet
$ curl -X POST -T kitten.jpg

# Response
      "class":"n02124075 Egyptian cat"
      "class":"n02123045 tabby, tabby cat"
      "class":"n02123159 tiger cat"
      "class":"n02128385 leopard, Panthera pardus"
      "class":"n02127052 lynx, catamount"

As you can see, the model has correctly identified the little Egyptian cat, and MMS has delivered the result. Now the SqueezeNet worker is up, running, and able to predict.
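The same request can also be issued from Python. The sketch below only builds the request object so that it can be inspected without a running server. It assumes the server listens on 127.0.0.1 and exposes predictions under `/predictions/<model_name>` on port 8080; the host and the image bytes are placeholders.

```python
import urllib.request


def build_prediction_request(model_name, image_bytes,
                             host="127.0.0.1", port=8080):
    """Build a POST request that uploads raw image bytes for inference."""
    # Assumed endpoint layout: /predictions/<model_name> on the inference port.
    url = "http://{}:{}/predictions/{}".format(host, port, model_name)
    return urllib.request.Request(url, data=image_bytes, method="POST")


# Placeholder bytes; in practice, read them from kitten.jpg.
req = build_prediction_request("squeezenet", b"placeholder image bytes")
print(req.get_method(), req.full_url)
# prints: POST http://127.0.0.1:8080/predictions/squeezenet

# With a server running, send it with:
#   body = urllib.request.urlopen(req).read()
```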

Model Management API

MMS v1.0 features a new management API enabling registration, loading, and unloading of models at runtime. This is especially useful in production environments where deep learning (DL) and machine learning (ML) models are often consumed from external model-building pipelines. The MMS model management API provides a convenient REST interface so that inference can be served without the downtime that would otherwise be needed to load new models into the running model server. The API consists of resources to register a new model, load it into a running MMS instance (scale the model up), and unload it from the MMS instance when it’s no longer needed (scale the model down). All of these resources are available at runtime and don’t cause MMS downtime because they don’t require restarting the MMS instance.

For security reasons, there is a separate port to access the management API that can only be accessed from the local host. By default, port 8081 is for the management API and port 8080 is for the prediction API. Let’s register and load the Network in Network (NIN) image classification model at runtime.

# To register and load a model, NiN
$ curl -X POST ""
# Response
    "status": "Workers scaled"

This makes MMS aware of a model and where to load it from. Then it starts a single worker process, in a synchronous fashion. In order to spawn more workers for a registered model:

# To spawn more workers
$ curl -X PUT ""
# Response
    "status": "Workers scaled"

We now have two workers for the NIN model. The min_workers parameter defines the minimum number of workers that should be up for the model. If this is set to 0, existing workers will be killed. In general, if it is set to N (where N >= 0), MMS spawns N – current_workers additional workers, or removes workers if that result is negative. The synchronous parameter ensures the request-response cycle is synchronous. For details on parameters see the MMS REST API specification.
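The worker arithmetic described above can be written out as a few lines of Python. This is only an illustration of the rule as stated in the text, not MMS code; the function names are made up for the example.

```python
def worker_delta(min_workers, current_workers):
    """Return how many workers to add (positive) or remove (negative)."""
    if min_workers < 0 or current_workers < 0:
        raise ValueError("worker counts must be non-negative")
    return min_workers - current_workers


def describe_scaling(min_workers, current_workers):
    """Describe the scaling action implied by a min_workers request."""
    delta = worker_delta(min_workers, current_workers)
    if delta > 0:
        return "spawn {} worker(s)".format(delta)
    if delta < 0:
        return "remove {} worker(s)".format(-delta)
    return "no change"


print(describe_scaling(2, 1))  # one worker running, two requested
print(describe_scaling(0, 2))  # scaling to zero removes both workers
```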

Now MMS is ready to take inference requests for the NIN model as well. We’ll download an image of a tabby cat.

# Download tabby cat image
curl -O

# Prediction on tabby cat
curl -X POST -T tabby.jpg
# Response
    "probability": 0.8571610450744629,
    "class": "n02123045 tabby, tabby cat"
    "probability": 0.1162034347653389,
    "class": "n02123159 tiger cat"
    "probability": 0.025708956643939018,
    "class": "n02124075 Egyptian cat"
    "probability": 0.0009223946835845709,
    "class": "n02127052 lynx, catamount"
    "probability": 3.3365624858561205e-06,
    "class": "n03958227 plastic bag"

The NIN model identified the tabby cat with a probability of about 86%.

For more details on available APIs for management and prediction see the API documentation.

In the following section we’ll cover performance gains with MMS v1.0.

Performance improvements

MMS 1.0 introduces improved scalability and performance, the result of a newly designed architecture. To measure performance on CPU, we use an EC2 c5.4xlarge instance with the mxnet-mkl 1.3.0 package installed. To measure performance on GPU, we use an EC2 p3.16xlarge instance with the mxnet-cu90mkl 1.3.0 package installed.

In addition to considering time spent in inference, you should also be aware of the overhead incurred by request handling. This overhead affects latency as the number of concurrent requests increases. Inference latency has two components: the latency of running a forward pass on the model, and the latency of the infrastructure steps around it. The infrastructure steps consist of preparing data for the forward pass and handling its results, collecting metrics, passing data between frontend and backend, and switching between contexts while dealing with a greater number of concurrent users.

To measure the latency of the infrastructure steps, we ran a test on a CPU machine using a specially devised no-operation (no-op) model. This model doesn’t run a forward pass, so the inference latency captured includes only the cost of the infrastructure steps. This test demonstrated that MMS v1.0 has 4x lower latency overhead with 100 concurrent requests and 7x lower latency overhead with 200 concurrent requests. Latency on bigger models like ResNet-18 (where the actual inference on the engine is the hotspot) improved as well. With a single concurrent request, inference latency improved 1.15x running ResNet-18 on a GPU machine with a 128×128 image. With 100 concurrent requests, GPU tests show up to a 2.2x improvement in MMS v1.0 performance.

For throughput there is a 1.11x gain on a CPU machine, while running on GPU machine results in a 1.35x gain.

Another important performance metric is success rate, as the number of concurrent requests increases. As load increases and pushes towards hardware limits, the service starts to error on requests. The following graph shows that MMS v1.0 makes improvements in the request success rate:

On a GPU machine, the load is shared with CPU. Here MMS v0.4 holds up until 200 concurrent requests, but starts showing errors beyond that. On a CPU machine, the success rate of MMS v0.4 starts to drop at a lower number of concurrent requests.

The tests for success rates of requests show that the load handling capacity on a single node has significantly improved.

Learn more and contribute

This is a preview of improvements and updates introduced in MMS v1.0. To learn more about MMS v1.0, start with our examples and documentation in the repository’s model zoo and documentation folder.

We welcome community participation including questions, requests, and contributions, as we continue to improve MMS. If you are using MMS already, we welcome your feedback via the repository’s GitHub issues. Head over to awslabs/mxnet-model-server to get started!


About the Authors

Denis Davydenko is an Engineering Manager with AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time he enjoys spending time with his family, playing poker and video games.




Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.




Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoys spending time with family and playing badminton.




Rakesh Vasudevan is a Software Development Engineer with AWS Deep Learning. He is passionate about building scalable deep learning systems. In his spare time, he enjoys gaming, cricket, and hanging out with friends and family.





Using deep learning on AWS to lower property damage losses from natural disasters

Natural disasters like the 2017 Santa Rosa fires and Hurricane Harvey cost hundreds of billions of dollars in property damages every year, wreaking economic havoc in the lives of homeowners. Insurance companies do their best to evaluate affected homes, but it could take weeks before assessments are available and salvaging and protecting the homes can begin. EagleView, a property data analytics company, is tackling this challenge with deep learning on AWS.

“Traditionally, the insurance companies would send out adjusters for property damage evaluation, but that could take several weeks because the area is flooded or otherwise not accessible,” explains Shay Strong, director of data science and machine learning at EagleView. “Using satellite, aerial, and drone images, EagleView runs deep learning models on the AWS Cloud to make accurate assessments of property damage within 24 hours. We provide this data to both large national insurance carriers and small regional carriers alike, to inform the homeowners and prepare next steps much more rapidly.”

Often, this quick turnaround can save millions of dollars in property damages. “During the flooding in Florida from Hurricane Irma, our clients used this timely data to learn where to deploy tarps so they could cover some of the homes to prevent additional water damage,” elaborates Strong.

Matching the accuracy of human adjusters in property assessments requires EagleView to use a rich set of images covering the entire multi-dimensional space (spatial, temporal, and spectral) of a storm-affected region. To solve this challenge, EagleView captures ultra-high resolution aerial images across the U.S. at sub-1” pixel resolution using a fleet of 120+ aircraft. The imagery is then broken down into small image tiles—often parcel-specific tiles or generic 256×256 TMS tiles—to run through deep learning image classifiers, object detectors, and semantic segmentation architectures. Each image tile can be associated with the corresponding geospatial and time coordinates, which are kept as additional metadata and maintained throughout the learning and inference processes. Post-inference, tiles can be stitched together using the geospatial data to form a geo-registered map of information of the area of interest, including the neural network predictions. The predictions can also be aggregated to a property-level database for persistent storage, maintained in the AWS Cloud.
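The tile-and-stitch flow described above can be sketched in plain Python. This is a toy illustration, not EagleView’s pipeline: the image is a small grid of numbers, the geospatial coordinates are a simple linear offset from a hypothetical origin, and the inference step is left out so the stitched result can be compared with the input.

```python
def split_into_tiles(image, tile_size, origin=(0.0, 0.0), pixel_size=1.0):
    """Split a 2D grid into tiles, each carrying its position as metadata."""
    height, width = len(image), len(image[0])
    tiles = []
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            pixels = [r[col:col + tile_size]
                      for r in image[row:row + tile_size]]
            tiles.append({
                "row": row,
                "col": col,
                # Hypothetical geo coordinates: linear offset from the origin.
                "geo_x": origin[0] + col * pixel_size,
                "geo_y": origin[1] + row * pixel_size,
                "pixels": pixels,
            })
    return tiles


def stitch_tiles(tiles, height, width):
    """Reassemble tiles (or per-tile predictions) into one full-size grid."""
    canvas = [[None] * width for _ in range(height)]
    for tile in tiles:
        for dy, tile_row in enumerate(tile["pixels"]):
            for dx, value in enumerate(tile_row):
                canvas[tile["row"] + dy][tile["col"] + dx] = value
    return canvas


image = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6 toy image
tiles = split_into_tiles(image, tile_size=2)
restored = stitch_tiles(tiles, height=4, width=6)
```

Because each tile keeps its own coordinates, tiles can be processed in any order, or in parallel, and still land in the right place when stitched.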

The following image demonstrates the accuracy of damage predictions by EagleView’s deep learning model for a portion of Rockport, Texas, after Hurricane Harvey in 2017. The green blobs in the image on the left are the properties where catastrophic structural damage occurred, based on human analysis. The pink blobs in the image on the right are segmented damage predictions that the model made. For this data, the model has a per-address accuracy of 96% compared to human analysis.

“We also use deep learning for interim pre-processing capabilities to determine such things as whether the image is of good quality (e.g., not cloudy or blurry) prior to generating address-level attributes and whether the image even contains the correct property of interest. We daisy-chain together intermediate neural nets to pre-process the imagery to improve the efficiency and accuracy of the neural nets generating the property attributes,” adds Strong.

EagleView built its deep learning models using the Apache MXNet framework. The models are trained using Amazon EC2 P2, P3, and G3 GPU instances on AWS. Once ready, the models are deployed onto a massive fleet of Amazon ECS containers to process the terabytes of aerial imagery that EagleView collects daily. The company has accumulated petabytes of property-focused aerial imagery in total, all of which is stored on Amazon S3 and processed by the deep learning models. The results are stored in a combination of Amazon Redshift, Amazon Aurora, and Amazon S3, based on data type. For example, deep learning imagery products like segmented raster maps are stored on S3 and referenced as a function of street address in Amazon Redshift databases. The resulting information is served to EagleView’s clients using APIs or custom user interfaces.

As to why EagleView chose MXNet over other deep learning frameworks, Strong says, “It’s the flexibility, scalability, and the pace of innovation that led us to adopt MXNet. With MXNet, we can train the models on the powerful P3 GPU instances, allowing us to quickly iterate and build the model. We can then deploy them to low-cost CPU instances for inference. MXNet can also handle the kind of scale that we require for operation, which includes petabytes of image storage and associated data. Lastly, the pace of innovation around MXNet makes it easy for us to keep up with the advances in the deep learning space.”

One of EagleView’s next steps is to use Gluon, an open-source deep learning interface, to translate R&D models developed natively in TensorFlow, PyTorch, or other frameworks into MXNet. Then EagleView can bring machine learning models developed by either its data scientists or other open-source authors in these other frameworks into MXNet for running inference in production at a large scale.

“The affordability and scalability of AWS makes it possible these days to run deep learning models to the level of accuracy that humans can achieve for many tasks, such as insurance assessments, but with a level of consistency never seen before. For EagleView’s insurance clients, consistency, accuracy, and scale is imperative,” concludes Strong. “This has the potential to disrupt traditional industries like insurance, adding tremendous value.”

To get started with deep learning, you can try MXNet as a fully managed experience in the Amazon SageMaker ML service.


About the Author

Chander Matrubhutam is a principal product marketing manager at AWS, helping customers understand and adopt deep learning.







Amazon Translate now offers 113 new language pairs

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Today, we are launching 113 new language pairs. Customers can now translate directly between currently supported languages, such as French to Spanish, with a single API request. With this update, we are expanding the number of supported language pairs from 24 to 137. All supported language pairs are based on state-of-the-art neural machine translation algorithms. See the full list of supported language pairs on this documentation page.

Previously, if you were translating between X<>Y pairs (where neither X nor Y is English), you had to make two consecutive translation calls: one from X to English, and another from the English output into Y. This meant you had to perform two translations to receive the one you actually needed. With this launch, we are removing this extra step, effectively reducing the cost of X<>Y translations by 50 percent and making them faster.
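The difference between the two approaches can be sketched against the `translate_text` operation of the boto3 Translate client. The stub client below is a hypothetical stand-in that only counts calls, so the comparison runs offline; with real credentials you would pass `boto3.client("translate")` instead.

```python
def translate_direct(client, text, source, target):
    """Translate with a single API call."""
    response = client.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target)
    return response["TranslatedText"]


def translate_via_english(client, text, source, target):
    """Older workaround: pivot through English with two API calls."""
    english = translate_direct(client, text, source, "en")
    return translate_direct(client, english, "en", target)


class CountingStubClient:
    """Offline stand-in for the Translate client that counts requests."""

    def __init__(self):
        self.calls = 0

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        self.calls += 1
        tagged = "[{}->{}] {}".format(
            SourceLanguageCode, TargetLanguageCode, Text)
        return {"TranslatedText": tagged}


pivot = CountingStubClient()
translate_via_english(pivot, "Bonjour", "fr", "es")
direct = CountingStubClient()
translate_direct(direct, "Bonjour", "fr", "es")
print(pivot.calls, direct.calls)
# prints: 2 1
```

Halving the number of requests is where both the cost reduction and the latency improvement come from.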

Clevy, a Paris-based startup, offers a platform that enables organizations to create, deploy, and maintain chatbots that automatically answer their employees’ most frequently asked questions. These questions are often about internal subjects like HR, IT support, and change management. Through its bots, Clevy serves over 1 million employees worldwide. Clevy has been using Amazon Translate behind the scenes to make its chatbots multilingual. Bots are created in a single language, but users can ask questions in as many as ten other languages currently. When a question comes in, Clevy detects the source language with Amazon Comprehend. Next, Clevy uses Amazon Translate to translate the question to the bot’s language for handling by its proprietary Natural Language Processing (NLP) algorithms. Finally, the answer is returned to the user in their original language. For example, a customer that creates a bot in English can still enable their employees worldwide to ask questions and receive answers in ten other languages.

Adding multilingual processing to its bots has been a game changer for Clevy. It opens opportunities to automatically manage requests from more people in more places using more languages, at a very low cost and with no human effort involved. Many of Clevy’s customers are large, global companies, and this was by far the feature that they requested most.

For example, one of Clevy’s customers is headquartered in Portugal, with additional offices in Italy, France, and Spain. Their knowledge base is in Portuguese, but many employees search for answers in French, Italian, or Spanish. For this customer, using X <> Y language translations will have two main benefits: It cuts the cost of the feature in half, and it provides a better user experience by reducing latency. Without X <> Y language translations, all requests need to be translated from the source language (e.g., Italian) to English first, then from English to the target language (in this case Portuguese). This implies extra networking time with two round-trips between the application and Amazon Translate, twice the cost with two API calls instead of one, and extra code for handling those cases.

Clevy expects to rely more and more on this feature to expand its customer base in non-English speaking countries, especially in Europe and South America. In 2019, Clevy plans to expand to new countries and expects over 30 percent of its bots to be multilingual. Half of these bots will have a base language other than English and directly benefit from the X <> Y language translations. “These companies represent a very important customer base for us because we do not primarily target the North American market,” said François Falala-Sechet, Clevy’s CTO. “Combined with the growing number of languages available in Amazon Translate, this new feature helps us grow and serve more customers in more countries.”

To use the new language pairs, simply select any supported language pair in your API request or on the AWS Management Console Try Amazon Translate page. To get started with Amazon Translate, go to Getting Started with Amazon Translate or check out this 10-minute video tutorial.


About the Author

Yoni Friedman is a Sr. Technical Product Manager in the AWS Artificial Intelligence team where he leads product management for Amazon Translate. He spends his free time reading, running, playing ball, and doing other stuff his two toddlers ask him to.







Understanding Amazon SageMaker notebook instance networking configurations and advanced routing options

An Amazon SageMaker notebook instance provides a Jupyter notebook app through a fully managed machine learning (ML) Amazon EC2 instance. Amazon SageMaker Jupyter notebooks are used to perform advanced data exploration, create training jobs, deploy models to Amazon SageMaker hosting, and test or validate your models.

The notebook instance has a variety of networking configurations available to it. In this blog post we’ll outline the different options and discuss a common scenario for customers.

The basics

Amazon SageMaker notebook instances can be launched with or without your Virtual Private Cloud (VPC) attached. When launched with your VPC attached, the notebook can be configured either with or without direct internet access.

IMPORTANT NOTE: Direct internet access means that the Amazon SageMaker service is providing a network interface that allows for the notebook to talk to the internet through a VPC managed by the service.

Using the Amazon SageMaker console, these are the three options:

  1. No customer VPC is attached.
  2. Customer VPC is attached with direct internet access.
  3. Customer VPC is attached without direct internet access.

What does it really mean?

Each of the three options automatically configures the network interfaces on the managed EC2 instance with a set of routing configurations. In certain situations, you might want to modify these settings to route specific IP address ranges to a different network interface. Next, we’ll step through each of these default configurations:

  1. No attached customer VPC (1 network interface)

    In this configuration, all the traffic goes through the single network interface.  The notebook instance is running in an Amazon SageMaker managed VPC as shown in the above diagram.
  2. Customer attached VPC with direct internet access (2 network interfaces)

    In this configuration, the notebook instance needs to decide which of the two network interfaces each kind of network traffic should use.
    If we look at an example where we launched into a VPC with a CIDR range and look at the OS-level route information, we see this.

    Looking at this route table, we can see:

    • Traffic destined for the VPC CIDR range will use the eth2 interface.
    • Some Docker and metadata routes.
    • All other traffic will use the eth0 interface.

    For simplicity, we’ll focus on the eth0 and eth2 configurations and not the other Docker/ec2 metadata entries. This shows us the following configuration:

    This default setting uses the internet network interface (eth0) for all traffic except for the CIDR range of the customer attached VPC (eth2). This setting sometimes needs to be overridden when interacting with either on-premises or peered-VPC resources.

  3. Customer attached VPC without direct internet access.
    IMPORTANT NOTE: In this configuration, the notebook instance can still be configured to access the internet. The network interface that gets launched only has a private IP address. This means it must either be in a private subnet with a NAT or reach the internet back through a virtual private gateway. If launched into a public subnet, it won’t be able to speak to the internet through an internet gateway (IGW).

Common customer patterns

Accessing on-premises resources from an Amazon SageMaker instance with direct internet access:

Suppose we have the following configuration:

If we try to access the on-premises resource in the CIDR range, it will get routed by the OS through the eth0 internet interface. This interface doesn’t have the connection back on-premises and won’t allow us to communicate with on-premises resources.

To route back on premises, we’ll want to update the route table to have the following:

To do this, we can perform the following commands from a terminal on the Amazon SageMaker notebook instance:

Now if we look at the route table by entering “route -n” we see the route:

We see a route for with mask (which is the same as going through the VPC routing IP address (
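The effect of the extra route can be modelled offline with Python’s `ipaddress` module. The sketch below mimics the longest-prefix-match rule the OS applies when choosing an interface. All CIDR ranges and addresses are hypothetical stand-ins, not the values from this walkthrough.

```python
import ipaddress


def pick_interface(destination, routes):
    """Pick the interface of the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in routes if address in net]
    if not matches:
        raise LookupError("no route to {}".format(destination))
    # Longest prefix wins, as in a kernel routing table.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical default setup: VPC range on eth2, everything else on eth0.
default_routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("172.16.0.0/16"), "eth2"),
]
# Same table with an added route for a hypothetical on-premises range.
patched_routes = default_routes + [
    (ipaddress.ip_network("10.0.0.0/8"), "eth2"),
]

on_prem_host = "10.1.2.3"  # hypothetical on-premises address
before = pick_interface(on_prem_host, default_routes)  # internet interface
after = pick_interface(on_prem_host, patched_routes)   # VPC interface
```

Traffic to addresses outside both ranges still falls through to the default route on eth0, so internet access is unaffected by the added entry.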

There is still one issue.

If we restart the notebook, the changes won’t persist. Only changes made to the ML storage volume are persisted across a stop/start. Generally, for packages and files to be persisted, they need to be under “/home/ec2-user/SageMaker”. In this case, we’ll use a different feature, a lifecycle configuration, to add the route any time the notebook starts.

We can create a lifecycle configuration as shown in the following diagram:
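For readers who prefer the API to the console, the sketch below builds a request shaped for the boto3 SageMaker `create_notebook_instance_lifecycle_config` call, which expects the script content to be base64-encoded. The configuration name, CIDR range, and gateway address in the script are hypothetical placeholders and need to be replaced with the values for your own network.

```python
import base64


def build_lifecycle_config_request(name, on_start_script):
    """Build a request dict with a base64-encoded OnStart script."""
    encoded = base64.b64encode(on_start_script.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": name,
        # OnStart scripts run every time the notebook instance starts.
        "OnStart": [{"Content": encoded}],
    }


# Hypothetical on-premises range and VPC gateway; substitute your own values.
ON_START = """#!/bin/bash
set -e
# Send the on-premises range through the VPC-attached interface.
sudo ip route add 10.0.0.0/8 via 172.16.0.1 dev eth2
"""

request = build_lifecycle_config_request("add-onprem-route", ON_START)
# To submit:
#   boto3.client("sagemaker").create_notebook_instance_lifecycle_config(**request)
```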

Now we can create our notebooks with this lifecycle configuration:

With this setup when the notebook gets created or when it’s stopped and restarted, we will have the networking configuration we expect:


Amazon SageMaker Jupyter notebooks are used to perform advanced data exploration, create model training jobs, deploy models to Amazon SageMaker hosting, and test or validate your models. This drives the need for notebook instances to have various networking configurations available to them. Knowing how these configurations can be adapted allows you to integrate with existing resources in your organization and enterprise.

About the Author

Ben Snively is an AWS Public Sector Specialist Solutions Architect. He works with government, non-profit, and education customers on big data/analytical and AI/ML projects, helping them build solutions using AWS.




Amazon SageMaker Batch Transform now supports Amazon VPC and AWS KMS-based encryption

Amazon SageMaker now supports running Batch Transform jobs in Amazon Virtual Private Cloud (Amazon VPC) and using AWS Key Management Service (AWS KMS). Amazon VPC allows you to control access to your machine learning (ML) model containers and data so that they are private and aren’t accessible over the internet. AWS KMS enables you to encrypt data on the storage volume attached to the ML compute instances that run the Batch Transform job. This ensures that the model artifacts, logs, and temporary files used in your Batch Transform jobs are always secure. In this blog, I show you how to apply these features to Batch Transform jobs.

Amazon SageMaker Batch Transform is ideal for scenarios where you have large batches of data, need to pre-process and transform training data, or don’t need sub-second latency. You can use Batch Transform on a range of data sets, from petabytes of data to very small data sets. Existing machine learning models work seamlessly with this new capability, without any changes. Amazon SageMaker manages the provisioning of resources at the start of Batch Transform jobs. It then releases them when the jobs are complete, so you pay only for the resources used during the execution of the jobs.

Using a VPC helps to protect your model containers and the AWS resources they access, such as the Amazon S3 buckets where you store your data and model artifacts, because you can configure your VPC so that it is private and not connected to the internet. When you use a VPC, you can also monitor all network traffic in and out of your model containers by using VPC flow logs. If you don’t specify a VPC, Amazon SageMaker will run Batch Transform jobs in a VPC by default.

Amazon SageMaker Batch Transform already supports Amazon S3 SSE encryption of input and output data. Now, using AWS KMS, you can protect the storage volumes used during Batch Transform jobs with the encryption keys that you control. You can take advantage of AWS KMS features, such as centralized key management, key usage audit logging, and master key rotation, when you are running inferences or transforming batches of data. You can create, import, rotate, disable, delete, define usage policies for, and audit the use of encryption keys used to encrypt your data. AWS KMS is also integrated with AWS CloudTrail to provide you with logs of all key usage to help meet your regulatory and compliance needs.

How to create a Batch Transform job

Let’s take a look at how we run a Batch Transform job for the built-in Object Detection algorithm. I followed this example notebook to train my object detection model.

Now I’ll go to the Amazon SageMaker console and open the Models page to create a model and associate my private subnets and security groups.

From there I can create a model, using the object detection image and the model artifact generated from my training.

Here I name my model and specify the IAM role that provides the required permissions. I also specify the subnets and security groups in my private VPC to control access to my container and Amazon S3 data. As shown in the following screenshot, after I choose the VPC, Amazon SageMaker auto-populates the subnets and security groups in that VPC. I have provided multiple subnets and security groups from my private VPC in this example. Amazon SageMaker Batch Transform chooses a single subnet in which to create elastic network interfaces (ENIs) in my VPC and attaches them to the model containers. These ENIs provide my model containers with a network connection so that my Batch Transform jobs can connect to resources in my private VPC without going over the internet.

Additionally, Amazon SageMaker will associate all security groups that I provide with the network interfaces created in my private VPC.

Separately, I have created an S3 VPC endpoint that allows my model containers to access the Amazon S3 bucket where my input data is stored, even if internet access is disabled. I recommend that you also create a custom policy that allows only requests from your private VPC to access your S3 buckets. For more information, see Endpoints for Amazon S3.
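
As a rough illustration of such a custom policy, the following Python sketch builds a bucket policy that denies any S3 request that does not arrive through a given VPC endpoint. It relies on the `aws:SourceVpce` condition key. The bucket name and endpoint ID are placeholders, and the helper name is hypothetical. Note that a blanket deny like this also blocks console access to the bucket from outside the VPC, so treat it as a starting point rather than a finished policy.

```python
import json


def build_vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Return a bucket policy that denies S3 requests not sent through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessViaVpcEndpointOnly",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    "arn:aws:s3:::" + bucket_name,
                    "arn:aws:s3:::" + bucket_name + "/*",
                ],
                # The deny applies whenever the request did NOT use this endpoint.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder values; substitute your own bucket and endpoint ID.
policy = build_vpce_only_bucket_policy("my-batch-input-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the Amazon S3 console.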

I now proceed with specifying the Amazon Elastic Container Registry (Amazon ECR) location of the inference container image and the location of my model artifact from my object detection training job.

That’s it! We now have a model that is configured to run securely within a VPC.

Now, I will create a new Batch Transform job using the previously trained object detection model on the Selfies Dataset. Let’s go to the Batch Transform page in the Amazon SageMaker console.

First, open the Amazon SageMaker console, choose Batch Transform, and choose Create Batch Transform job.

In the Batch Transform job configuration window, complete the form as shown in the following example. For more information about completing this form, see Using Batch Transform (Console).

Note that I have also specified the encryption key, using the new AWS KMS feature. Amazon SageMaker uses it to encrypt data on the storage volumes attached to the ML compute instances that run the Batch Transform job.

Next specify the input location. I can use either a manifest file or load all the files in an S3 location. Because I’m dealing with images, I specified my input content-type.

As described earlier, the S3 VPC endpoint lets my model containers reach the Amazon S3 bucket where my input data is stored, even when internet access is disabled.

Finally, I’ll configure the output location and start the job, as shown in the following example. As mentioned before, you can specify a KMS encryption key for output, so that your Batch Transform outputs are encrypted server side with S3 KMS SSE.

After you start your transform job, you can open the job details page and follow the links to the metrics and the logs in Amazon CloudWatch.

Using Amazon CloudWatch, I can see that the transform job is running. If I look at my results in Amazon S3 I can see the predicted labels for each image, as shown in the following example:

The transform job creates a JSON file for each input file that contains the detected objects.

To analyze the results in your S3 bucket efficiently, you can create a table for the bucket in AWS Glue. You can then either query the results with Amazon Athena or visualize them using Amazon QuickSight.

It’s also possible to start transform jobs programmatically using the high-level Python library or the low-level SDK (boto3). For more information about how to use Batch Transform using your own containers, see Getting Inferences by Using Amazon SageMaker Batch Transform.
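
As a minimal sketch of the low-level route, the following Python code builds the request bodies for the boto3 `create_model` and `create_transform_job` calls with the VPC and AWS KMS settings discussed in this post. The helper names, the instance type, the content type, and all ARNs, IDs, and S3 paths are placeholders chosen for illustration. boto3 is imported only inside `submit`, so the request builders can be inspected without AWS credentials.

```python
def build_model_request(model_name, image_uri, model_data_url, role_arn,
                        subnet_ids, security_group_ids):
    """Request body for create_model, with the model attached to a private VPC."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        # Subnets and security groups used for the ENIs of the model containers.
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            kms_key_id, content_type="image/jpeg",
                            instance_type="ml.p2.xlarge"):
    """Request body for create_transform_job with KMS-encrypted volume and output."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": content_type,
        },
        # KmsKeyId encrypts the job output in Amazon S3 (server-side, SSE-KMS).
        "TransformOutput": {"S3OutputPath": output_s3_uri, "KmsKeyId": kms_key_id},
        # VolumeKmsKeyId encrypts the storage volume attached to the instances.
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeKmsKeyId": kms_key_id,
        },
    }


def submit(model_request, transform_request, region_name="us-east-1"):
    """Create the model and start the transform job (requires AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    sagemaker = boto3.client("sagemaker", region_name=region_name)
    sagemaker.create_model(**model_request)
    sagemaker.create_transform_job(**transform_request)
```

Keeping the request construction separate from the API call makes it easy to review the VPC and encryption settings before anything is created.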

Batch Transform is ideal for use cases when you have large batches of data that need predictions, don’t need sub-second latency, or need to pre-process and transform training data. In this blog, we discussed how you can use Amazon VPC and AWS KMS to increase the security of your Batch Transform jobs. Amazon VPC and AWS KMS for Batch Transform are available in all AWS Regions where Amazon SageMaker is available. For more information, see the Amazon SageMaker developer guide.

About the Authors

Urvashi Chowdhary is a Senior Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.




Jeffrey Geevarghese is a Senior Engineer in Amazon AI where he’s passionate about building scalable infrastructure for deep learning. Prior to this he was working on machine learning algorithms and platforms and was part of the launch teams for both Amazon SageMaker and Amazon Machine Learning.

Use AWS DeepLens to give Amazon Alexa the power to detect objects via Alexa skills

People are using Alexa for all types of activities in their homes, such as checking their bank balances, ordering pizza, or simply listening to music from their favorite artists. For the most part, the primary way to interact with the Echo has been your voice. In this blog post, we’ll show you how to build a new Alexa skill that will integrate with AWS DeepLens so when you ask “Alexa, what do you see?” Alexa returns objects detected by the AWS DeepLens device.

Object detection is an important topic in the AI deep learning world. For example, in autonomous driving, the camera on the vehicle needs to be able to detect objects (people, cars, signs, etc.) on the road first before making any decisions to turn, slow down, or stop.

AWS DeepLens was developed to put deep learning in the hands of developers. It ships with a fully programmable video camera, tutorials, code, and pre-trained models. It was designed so that you can have your first Deep Learning model running on the device within about 10 minutes after opening the box. For this blog post we’ll use one of the built-in object detection models included with AWS DeepLens. This enables AWS DeepLens to perform real-time object detection using the built-in camera. After the device detects objects, it sends information about the objects detected to the AWS IoT platform.

We’ll also show you how to push this data into an Amazon Kinesis data stream, and use Amazon Kinesis Data Analytics to aggregate duplicate objects detected in the stream and push them into another Kinesis data stream. Finally, you’ll create a custom Alexa skill with AWS Lambda to retrieve the detected objects from the Kinesis stream and have Alexa verbalize the result back to the user.

Solution overview

The following diagram depicts a high-level overview of this solution.

Amazon Kinesis Data Streams

You can use Amazon Kinesis Data Streams to build your own streaming application. This application can process and analyze real-time, streaming data by continuously capturing and storing terabytes of data per hour from hundreds of thousands of sources.

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics provides an easy and familiar standard SQL language to analyze streaming data in real time. One of its most powerful features is that there are no new languages, processing frameworks, or complex machine learning algorithms that you need to learn.

AWS Lambda

AWS Lambda lets you run code without provisioning or managing servers. With AWS Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and AWS Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web app, mobile app, or, in this case, an Alexa skill.

Amazon Alexa

Alexa is the Amazon cloud-based voice service available on tens of millions of devices from Amazon and third-party device manufacturers. With Alexa, you can build natural voice experiences that offer customers a more intuitive way to interact with the technology they use every day. Our collection of tools, APIs, reference solutions, and documentation make it easy for anyone to build with Alexa.

Solution summary

The following is a quick walkthrough of the solution that’s illustrated in the diagram:

  1. First set up the AWS DeepLens device and download the object detection model onto the device. Then it will load the model, perform local inference, and send detected objects as MQTT messages to the AWS IoT platform.
  2. The MQTT messages are then sent to a Kinesis data stream by configuring an IoT rule.
  3. By using Kinesis Data Analytics on the Kinesis data stream, detected objects are aggregated and put into another Kinesis data stream for the Alexa custom skill to query.
  4. Upon the user’s request, the custom Alexa skill will invoke an AWS Lambda function, which will query the final Kinesis data stream and return a list of objects detected by the AWS DeepLens device.

Implementation steps

The following sections walk through the implementation steps in detail.

Setting up DeepLens and deploying the built-in object detection model

  1. Open the AWS DeepLens console.
  2. Register your AWS DeepLens device if it’s not registered. You can follow this link for a step by step guide to register the device.
  3. Choose Create new project on the Projects page. On the Choose project type page, choose the Use a project template option, and select Object detection in the Project templates section.
  4. Choose Next to move to the Review page.
  5. In the Project detail page, give the project a name and then choose Create.
  6. Back on the Projects page, select the project that you created earlier, and click Deploy to device.
  7. Make sure the AWS DeepLens device is online. Then select the device to deploy, and choose Review. Review the deployment summary and then choose Deploy.
  8. The page redirects to the device detail page, and the deployment status is displayed at the top of the page. This process takes a few minutes. Wait until the deployment is successful.
  9. After the deployment is complete, on the same page, scroll to the device details section and copy the MQTT topic from AWS IoT that the device is registered to. As mentioned, the AWS DeepLens device sends information about every object it detects to this topic.
  10. In the same section, choose the AWS IoT console link.
  11. On the AWS IoT MQTT Client test page, paste the MQTT topic copied earlier and choose Subscribe to topic.


  12. You should now see detected object messages flowing on the screen.

Setting up Kinesis Streams and Kinesis Analytics

  1. Open the Amazon Kinesis Data Streams console.
  2. Create a new Kinesis Data Stream. Give it a name that indicates it’s for raw incoming stream data—for example, RawStreamData. For Number of shards, type 1.
  3. Go back to the AWS IoT console and in the left navigation pane choose Act. Then choose Create to set up a rule to push MQTT messages from the AWS DeepLens device to the newly created Kinesis data stream.
  4. On the create rule page, give it a name. In the Message source section, for Attribute enter *. For Topic filter, enter the DeepLens device MQTT topic. Choose Add action.
  5. Choose Sends message to an Amazon Kinesis Stream, then click Configure action. Select the Kinesis data stream created earlier, and for Partition key type ${newuuid()} in the text box. Then choose Create a new role or Update role and choose Add action. Choose Create rule to finish the setup.
  6. Now that the rule is set up, messages will be loaded into the Kinesis data stream. Now we can use Kinesis Data Analytics to aggregate the data and load the result to the final Kinesis data stream.
  7. Open the Amazon Kinesis Data Streams console.
  8. Create another new Kinesis Data Stream (follow instruction steps 1 and 2). Give it a name that indicates that it’s for aggregated incoming stream data—for example, AggregatedDataOutput. For Number of shards, type 1.
  9. In the Amazon Kinesis Data Analytics console, create a new application.
  10. Give it a name and choose Create application. In the source section, choose Connect stream data.
  11. Select Kinesis stream as Source and select the source stream created in step 2. Choose Discover schema to let Kinesis Data Analytics auto-discover the data schema.
  12. The Object Detection model deployment can detect up to 20 objects. However, AWS DeepLens might only detect a few of them depending on what’s in front of the camera. For example, in the screenshot below, the device only detects a chair and a person. Therefore, Kinesis auto discovery only detects the two objects as columns. You can add the other 18 objects manually to the schema by choosing the Edit schema button.
  13. On the schema page, add the rest of the objects then choose Save schema and update stream. Wait for the update to complete then click Exit (done).
  14. Scroll down to the bottom of the page then choose Save and continue.
  15. Back on the application configuration page, choose Go to SQL editor.
  16. Copy and paste the following SQL statement into the Analytics SQL window, and then choose Save and run SQL. After the SQL finishes saving and running, choose Close at the bottom of the page. The SQL script aggregates each object detected in a 10-second tumbling window and stores the results in the destination stream.
    CREATE OR REPLACE STREAM "TEMP_STREAM" ("objectName" varchar (40), "objectCount" integer);
    -- Creates an output stream and defines a schema
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ("objectName" varchar (40), "objectCount" integer);
    SELECT STREAM 'person', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'tvmonitor', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'sofa', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'dog', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'chair', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'cat', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM 'bottle', COUNT(*) AS "objectCount" FROM "SOURCE_SQL_STREAM_001"
    -- Uses a 10-second tumbling time window
    SELECT STREAM "objectName", count(*) as "objectCount" FROM "TEMP_STREAM"
    having count(*) >1;

  17. Back on the application configuration page, choose Connect to a destination. In the Analytics destination section, make sure the Kinesis data stream (for example, AggregatedDataOutput) created in step 8 is selected, enter DESTINATION_SQL_STREAM as the In-application stream name, and select JSON as the output format. Note that you can also have Kinesis Data Analytics send data directly to a Lambda function. That Lambda function could write to other data stores, such as Amazon DynamoDB, and the Alexa skill could then read from DynamoDB upon user request.
  18. Messages are now aggregated and loaded into the final Kinesis data stream (for example, AggregatedDataOutput) that was created in step 8. Next, you will create an Alexa custom skill with AWS Lambda.
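
The console steps for the two data streams and the AWS IoT rule can also be scripted. The following is a minimal Python sketch using the boto3 `create_stream` and `create_topic_rule` calls. The rule name, the example topic, and the role ARN are placeholders, and the role is assumed to already allow AWS IoT to put records into the stream. The Kinesis Data Analytics application itself is not covered here.

```python
def build_topic_rule_payload(mqtt_topic, stream_name, role_arn):
    """Rule payload that forwards every message on the topic to a Kinesis data stream."""
    return {
        # Same as entering * for Attribute and the device topic for Topic filter.
        "sql": "SELECT * FROM '{}'".format(mqtt_topic),
        "ruleDisabled": False,
        "actions": [
            {
                "kinesis": {
                    "roleArn": role_arn,
                    "streamName": stream_name,
                    # AWS IoT evaluates this substitution template per message.
                    "partitionKey": "${newuuid()}",
                }
            }
        ],
    }


def create_streams_and_rule(mqtt_topic, role_arn, region_name="us-east-1",
                            raw_stream="RawStreamData",
                            aggregated_stream="AggregatedDataOutput",
                            rule_name="DeepLensToKinesis"):
    """Create both single-shard streams and the IoT rule (requires AWS credentials)."""
    import boto3  # imported lazily so the payload builder works without boto3

    kinesis = boto3.client("kinesis", region_name=region_name)
    kinesis.create_stream(StreamName=raw_stream, ShardCount=1)
    kinesis.create_stream(StreamName=aggregated_stream, ShardCount=1)

    iot = boto3.client("iot", region_name=region_name)
    iot.create_topic_rule(
        ruleName=rule_name,
        topicRulePayload=build_topic_rule_payload(mqtt_topic, raw_stream, role_arn),
    )


# Hypothetical topic name, shown only to illustrate the payload shape.
example = build_topic_rule_payload(
    "$aws/things/deeplens_EXAMPLE/infer", "RawStreamData",
    "arn:aws:iam::123456789012:role/example-iot-kinesis-role")
print(example["sql"])
```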

Creating Alexa custom skill with AWS Lambda

  1. Open the AWS Lambda console and create a new function.
  2. The easiest way to create an Alexa skill is to create the function from the existing blueprint provided by AWS Lambda and then overwrite the code with your own.
  3. It is a security best practice to enable Alexa Skill ID verification when using a Lambda function. If you have not created a skill for this yet, you can disable it for now and then re-enable it later by adding another Alexa trigger to the Lambda function.
  4. Copy the following Python code and replace the sample code provided by the blueprint. This code reads data from the Kinesis data stream and returns the result back to Alexa. Note: Change the default Northern Virginia AWS Region (us-east-1) in the code if you are following along in another Region. Look for the following line to change the Region: kinesis = boto3.client('kinesis', region_name='us-east-1')
    from __future__ import print_function
    import boto3
    import json
    import datetime
    from datetime import timedelta

    # Name of the aggregated Kinesis data stream created earlier; change it if
    # you used a different name.
    STREAM_NAME = 'AggregatedDataOutput'
    # Assumed upper bound on get_records calls per request, so the skill answers promptly.
    MAX_POLLS = 10

    # --------------- Helpers that build the responses ------------------
    def build_speechlet_response(title, output, reprompt_text, should_end_session):
        return {
            'outputSpeech': {
                'type': 'PlainText',
                'text': output
            },
            'card': {
                'type': 'Simple',
                'title': "SessionSpeechlet - " + title,
                'content': "SessionSpeechlet - " + output
            },
            'reprompt': {
                'outputSpeech': {
                    'type': 'PlainText',
                    'text': reprompt_text
                }
            },
            'shouldEndSession': should_end_session
        }

    def build_response(session_attributes, speechlet_response):
        return {
            'version': '1.0',
            'sessionAttributes': session_attributes,
            'response': speechlet_response
        }

    # --------------- Functions that control the skill's behavior ------------------
    def get_welcome_response():
        """ If we wanted to initialize the session to have some attributes we could
        add those here
        """
        session_attributes = {}
        card_title = "Welcome"
        speech_output = "Hello, I can see now. You can ask me what I see around you right now."
        reprompt_text = "You can ask me what I see around you right now."
        should_end_session = False
        return build_response(session_attributes, build_speechlet_response(
            card_title, speech_output, reprompt_text, should_end_session))

    def handle_session_end_request():
        card_title = "Session Ended"
        speech_output = "Ok, I will close my eyes now. Have a nice day! "
        should_end_session = True
        return build_response({}, build_speechlet_response(
            card_title, speech_output, None, should_end_session))

    def create_object_names_attributes(objectNames, sequenceNumber):
        return {"objectNames": objectNames, "sequenceNumber": sequenceNumber}

    def get_objects(intent, session):
        session_attributes = {}
        reprompt_text = None
        should_end_session = False
        # Default answer, replaced below if at least one object is available
        speech_output = "I currently do not see anything, is my deeplens on?"
        kinesis = boto3.client('kinesis', region_name='us-east-1')
        shard_id = 'shardId-000000000000'
        # Restore the state that an earlier request saved in the session, if any
        attributes = session.get('attributes') or {}
        objectNames = attributes.get('objectNames', [])
        sequenceNumber = attributes.get('sequenceNumber', '')
        if len(objectNames) == 0:
            if len(sequenceNumber) == 0:
                # No position saved yet: start reading 40 minutes in the past
                delta = timedelta(minutes=40)
                timestamp = datetime.datetime.now() - delta
                shard_it = kinesis.get_shard_iterator(
                    StreamName=STREAM_NAME, ShardId=shard_id,
                    ShardIteratorType='AT_TIMESTAMP',
                    Timestamp=timestamp)["ShardIterator"]
            else:
                # Continue right after the last record that was already read
                print("GOT SEQUENCE NUMBER, GETTING OBJECTS DETECTED - " + sequenceNumber)
                shard_it = kinesis.get_shard_iterator(
                    StreamName=STREAM_NAME, ShardId=shard_id,
                    ShardIteratorType='AFTER_SEQUENCE_NUMBER',
                    StartingSequenceNumber=sequenceNumber)["ShardIterator"]
            polls = 0
            while len(objectNames) < 5 and shard_it and polls < MAX_POLLS:
                out = kinesis.get_records(ShardIterator=shard_it, Limit=2)
                shard_it = out.get("NextShardIterator")
                polls += 1
                for record in out["Records"]:
                    theObject = json.loads(record["Data"])
                    sequenceNumber = record["SequenceNumber"]
                    objectName = theObject["objectName"]
                    if objectName not in objectNames:
                        objectNames.append(objectName)
                if not out["Records"] and out.get("MillisBehindLatest", 0) == 0:
                    # Caught up with the tip of the stream; nothing more to read
                    break
        if len(objectNames) > 0:
            # Report one object per request and keep the rest in the session,
            # so that follow-up questions such as "what else" get a new answer
            objectName = objectNames.pop(0)
            session_attributes = create_object_names_attributes(objectNames, sequenceNumber)
            speech_output = "I see " + objectName + "."
        # Setting reprompt_text to None signifies that we do not want to reprompt
        # the user. If the user does not respond or says something that is not
        # understood, the session will end.
        return build_response(session_attributes, build_speechlet_response(
            intent['name'], speech_output, reprompt_text, should_end_session))

    # --------------- Events ------------------
    def on_session_started(session_started_request, session):
        """ Called when the session starts """
        print("on_session_started requestId=" + session_started_request['requestId']
              + ", sessionId=" + session['sessionId'])

    def on_launch(launch_request, session):
        """ Called when the user launches the skill without specifying what they
        want
        """
        print("on_launch requestId=" + launch_request['requestId'] +
              ", sessionId=" + session['sessionId'])
        # Dispatch to your skill's launch
        return get_welcome_response()

    def on_intent(intent_request, session):
        """ Called when the user specifies an intent for this skill """
        print("on_intent requestId=" + intent_request['requestId'] +
              ", sessionId=" + session['sessionId'])
        intent = intent_request['intent']
        intent_name = intent_request['intent']['name']
        # Dispatch to your skill's intent handlers
        if intent_name == "WhatDoYouSee":
            return get_objects(intent, session)
        elif intent_name == "AMAZON.HelpIntent":
            return get_welcome_response()
        elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":
            return handle_session_end_request()
        else:
            raise ValueError("Invalid intent")

    def on_session_ended(session_ended_request, session):
        """ Called when the user ends the session.
        Is not called when the skill returns should_end_session=true
        """
        print("on_session_ended requestId=" + session_ended_request['requestId'] +
              ", sessionId=" + session['sessionId'])
        # add cleanup logic here

    # --------------- Main handler ------------------
    def lambda_handler(event, context):
        """ Route the incoming request based on type (LaunchRequest, IntentRequest,
        etc.) The JSON body of the request is provided in the event parameter.
        """
        print("event.session.application.applicationId=" +
              event['session']['application']['applicationId'])
        # Uncomment this if statement and populate with your skill's application ID to
        # prevent someone else from configuring a skill that sends requests to this
        # function.
        # if (event['session']['application']['applicationId'] !=
        #         "[unique-value-here]"):
        #     raise ValueError("Invalid Application ID")
        if event['session']['new']:
            on_session_started({'requestId': event['request']['requestId']},
                               event['session'])
        if event['request']['type'] == "LaunchRequest":
            return on_launch(event['request'], event['session'])
        elif event['request']['type'] == "IntentRequest":
            return on_intent(event['request'], event['session'])
        elif event['request']['type'] == "SessionEndedRequest":
            return on_session_ended(event['request'], event['session'])
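
Before wiring up the skill, it can help to exercise the function with a hand-built test event in the Lambda console. The following sketch builds a stripped-down IntentRequest event containing only the fields that the handler code above reads. The helper name and all IDs are placeholders, and real Alexa requests carry additional fields.

```python
import json


def build_intent_event(intent_name="WhatDoYouSee", attributes=None, new_session=False):
    """Return a minimal Alexa-style event for testing the Lambda function by hand."""
    return {
        "version": "1.0",
        "session": {
            "new": new_session,
            "sessionId": "SessionId.example",                            # placeholder
            "application": {"applicationId": "amzn1.ask.skill.example"},  # placeholder
            # Session attributes carry objectNames and sequenceNumber between turns.
            "attributes": attributes or {},
        },
        "request": {
            "type": "IntentRequest",
            "requestId": "EdwRequestId.example",                          # placeholder
            "intent": {"name": intent_name, "slots": {}},
        },
    }


# Print an event that can be pasted into a Lambda console test configuration.
print(json.dumps(build_intent_event(new_session=True), indent=2))
```

Passing `attributes={"objectNames": [...], "sequenceNumber": "..."}` simulates a follow-up question within the same session.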


Setting up an Alexa custom skill with AWS Lambda

  1. Open the Amazon Alexa Developer Portal, and choose Create a new custom skill.
  2. You can upload the following JSON document in the Alexa Skill JSON Editor to automatically configure intents and sample utterances for the custom skill. Be sure to click Save Model to apply the changes.
    {
        "interactionModel": {
            "languageModel": {
                "invocationName": "deep lens demo",
                "intents": [
                    {
                        "name": "WhatDoYouSee",
                        "slots": [],
                        "samples": [
                            "anything else",
                            "do you see anything else",
                            "what else do you see",
                            "what else",
                            "how about now",
                            "what do you see around me",
                            "deep lens",
                            "what do you see",
                            "tell me what you see",
                            "what are you seeing",
                            "yes please"
                        ]
                    },
                    {
                        "name": "AMAZON.HelpIntent",
                        "samples": []
                    },
                    {
                        "name": "AMAZON.StopIntent",
                        "samples": []
                    }
                ],
                "types": []
            }
        }
    }

  3. Finally, in the endpoint section, enter the Amazon Resource Name (ARN) of the AWS Lambda function that you created. Your Alexa skill is now set up and ready to tell you the objects detected by the AWS DeepLens device. You can wake Alexa up by saying “Alexa, open Deep Lens demo” and then ask Alexa questions such as “What do you see?” Alexa should return an answer such as “I see a person” or “I see a chair.”


In this blog post, you learned how to set up the AWS DeepLens device and deploy the built-in object detection model from the DeepLens console. You then used AWS DeepLens to perform real-time object detection. Objects detected on the AWS DeepLens device were sent to the AWS IoT platform, which then forwarded them to an Amazon Kinesis data stream. You also learned how to use Amazon Kinesis Data Analytics to aggregate duplicate objects detected in the stream and push them into another Kinesis data stream. Finally, you created a custom Alexa skill with AWS Lambda to retrieve the detected objects from the Kinesis data stream and return the result to users through Alexa.

About the authors

Tristan Li is a Solutions Architect with Amazon Web Services. He works with enterprise customers in the US, helping them adopt cloud technology to build scalable and secure solutions on AWS.





Grant McCarthy is an Enterprise Solutions Architect with Amazon Web Services, based in Charlotte, NC. His primary role is helping his customers move workloads to the cloud securely and ensuring that the workloads are architected in a way that aligns with AWS best practices.

Amazon Comprehend introduces new Region availability and language support for French, German, Italian, and Portuguese

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. The service does the following for you:

  • Identifies the language of the text.
  • Extracts key phrases, places, people, brands, or locations.
  • Understands how positive or negative the text is.
  • Analyzes text using tokenization and parts of speech.
  • Automatically organizes a collection of text files by topic.

In today’s global marketplace, many companies require the ability to engage with their customers in a variety of languages, and Amazon Comprehend makes it easy to analyze those interactions. Today, we are pleased to announce that Amazon Comprehend supports four additional languages: French, German, Italian, and Portuguese. AWS customers can now analyze feedback and articles, and they can even organize information with native support for these languages.

Starting today, the service is also available in the AWS Europe (Frankfurt) Region. We also recently announced availability in the Asia Pacific (Sydney) Region. Amazon Comprehend continues to expand its regional availability, enabling more AWS customers to analyze text in their native language, co-located with their data. To view the full list of Regions in which Amazon Comprehend is available, see our Region Table.

To get started, Comprehend provides a simple, no-code portal that allows customers to easily try the APIs with their own text, and now in additional languages. The following example shows the analysis of text containing named entities, in Italian, in the Amazon Comprehend console:

Amazon Comprehend provides synchronous and asynchronous APIs that allow you to analyze text based on the use cases that are best for your application. The following example shows a CLI call analyzing Italian text for the named entities:
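
The same kind of request can be made from Python. The following minimal sketch uses the synchronous boto3 `detect_entities` call with the language code `it`. The helper names, the score threshold, and the default Region are choices made for illustration. The sample response is hand-written and trimmed to the fields used here; it is not real service output.

```python
def detect_entities_italian(text, region_name="eu-central-1"):
    """Call Amazon Comprehend for Italian text (requires AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    comprehend = boto3.client("comprehend", region_name=region_name)
    return comprehend.detect_entities(Text=text, LanguageCode="it")


def summarize_entities(response, min_score=0.5):
    """Return (type, text) pairs for entities at or above the score threshold."""
    return [
        (entity["Type"], entity["Text"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]


# Hand-written stand-in for a response, used to show the shape of the data.
sample_response = {
    "Entities": [
        {"Type": "LOCATION", "Text": "Roma", "Score": 0.99},
        {"Type": "ORGANIZATION", "Text": "Amazon", "Score": 0.97},
        {"Type": "OTHER", "Text": "lavoro", "Score": 0.21},
    ]
}
print(summarize_entities(sample_response))
```

With credentials configured, `summarize_entities(detect_entities_italian("Vivo a Roma e lavoro per Amazon."))` runs the same summary against the live service.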

Visit the AWS Management Console today to get started. Find out more information about free tier and pricing from this page.

About the Author

Woo Kim is a Product Marketing Manager for AWS machine learning services. He spent his childhood in South Korea and now lives in Seattle, WA. In his spare time, he enjoys playing volleyball and tennis.






Track the number of coffees consumed using AWS DeepLens

AWS DeepLens is a deep-learning-enabled video camera for developers. It enables you to expand your deep learning skillsets through the use of a fully programmable video camera, tutorials, code, and pre-trained models.

The goal of this blog post is to show you how to get started with AWS DeepLens and how this device facilitates the introduction of IoT and deep learning by putting them in the hands of developers. In this blog post, we’ll show you how to build a simple face detection application that counts the number of cups of coffee that people drink and displays the tally on a leaderboard.

We will go through the following steps:

  • Step 1: Deploy a sample project
  • Step 2: Change the inference AWS Lambda function
  • Step 3: Create a coffee detection backend
  • Step 4: Deploy the app to AWS Elastic Beanstalk

Project Overview

Let’s review the following architectural diagram for the project. The AWS DeepLens device enables you to run deep learning on the edge. It detects a scene and runs it against a face detection model.

When the model detects a face, it uploads a frame to Amazon S3. An AWS Lambda function then runs the frame against Amazon Rekognition to detect a mug in the scene and to check whether the face has been detected before or is a new face. After a face is registered or recognized, it’s stored in Amazon DynamoDB, which is used as an incremental counter for a web application.
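
To make the mug check concrete, the following is a rough Python sketch that uses the boto3 Rekognition `detect_labels` call on a frame stored in Amazon S3. It is not the function used in this project. The label names in `MUG_LABELS`, the confidence threshold, and the helper names are assumptions made for illustration.

```python
# Assumed label names that indicate a mug; adjust after inspecting real results.
MUG_LABELS = {"Coffee Cup", "Cup", "Mug"}


def detect_labels_for_frame(bucket, key, region_name="us-east-1"):
    """Run label detection on an uploaded frame (requires AWS credentials)."""
    import boto3  # imported lazily so contains_mug can be tried without boto3

    rekognition = boto3.client("rekognition", region_name=region_name)
    return rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=20,
    )


def contains_mug(labels_response, min_confidence=80.0):
    """Return True if any mug-like label meets the confidence threshold."""
    return any(
        label["Name"] in MUG_LABELS and label["Confidence"] >= min_confidence
        for label in labels_response["Labels"]
    )


# Hand-written stand-in for a detect_labels response, trimmed to the fields used.
sample = {"Labels": [{"Name": "Person", "Confidence": 99.1},
                     {"Name": "Coffee Cup", "Confidence": 91.4}]}
print(contains_mug(sample))
```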

Following this post, you’ll be able to replicate the architecture and get the necessary information to build an application like this.

Step 1: Deploy a sample project

1) To deploy the project, you first need to register the AWS DeepLens device, if you haven’t already. See Register Your AWS DeepLens Device for more information.

For the project type, make sure Use a project template is highlighted and select Face detection from the project templates.

From there you can specify the project name and add a description, leave everything else at the default, and choose Create.

2) After the project is created, we need to deploy it to the AWS DeepLens device. On the Projects page, select your project name, and then choose Deploy to device. On the target device page, select your registered AWS DeepLens device.

Choose Review.

After you have reviewed, finalize by choosing Deploy.

3) Navigate to the IAM console. Add permissions for AmazonS3FullAccess to AWSDeepLensGreengrassGroupRole.

Next you need to make sure that the project was successfully deployed. Connect the AWS DeepLens device to a monitor, mouse, and keyboard. Sign in to the device using the password that you set when you registered the device.

Start your terminal and run the following command:

mplayer -demuxer lavf -lavfdopts format=mjpeg:probesize=32 /tmp/results.mjpeg

This command shows the project stream. You will now see each frame being run against the model as the inference Lambda function is processed.

Step 2: Change the inference Lambda function

After you deploy the sample project, you need to change the inference Lambda function running on the device to upload images to Amazon S3. For this use case, we also added some messages on the screen to make the process more intuitive.

1) Create an Amazon S3 bucket where the images will be uploaded. Use the default settings when setting up the bucket and choose the same AWS Region as the rest of your infrastructure.

2) Go to the AWS Lambda console and open the deeplens-face-detection function. Remove the function code and replace it with the lambda_inference code here. (Replace the bucket_name variable with your bucket name.)

With this step, we are changing the code to upload images to Amazon S3 when a face is detected. Plus we are adding features such as a cooldown period between uploads and a countdown before taking a picture.
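The cooldown idea can be sketched as a small helper. This is a minimal illustration, not the code from the linked lambda_inference file: the class name `UploadCooldown`, its `ready` method, and the injectable `clock` argument are all assumptions made for this example.

```python
import time


class UploadCooldown:
    """Allow an upload only if enough time has passed since the last one."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock        # injectable so the logic can be tested
        self.last_upload = None   # no upload has happened yet

    def ready(self):
        """Return True and start a new cooldown window if an upload is allowed."""
        now = self.clock()
        if self.last_upload is None or now - self.last_upload >= self.seconds:
            self.last_upload = now
            return True
        return False
```

In an inference loop, a frame would be uploaded only when a face is detected and `ready()` returns True, which keeps one person from producing a burst of uploads.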

3) Save the AWS Lambda function and publish a new version of the code. This allows you to go to your AWS DeepLens project and update the function on the device.

Access your project in the AWS DeepLens console and edit the project, updating the version of your AWS Lambda function and setting the timeout to 600 seconds.

4) Redeploy the project to the device by selecting the project and choosing Deploy.

You should also install botocore on the AWS DeepLens device by using this command:

sudo -H pip install botocore

This allows the frame to be uploaded to S3.

Step 3: Create a coffee detection backend

To recognize faces, we will use Amazon Rekognition collections. A collection is a container for persisting faces detected by the IndexFaces API action. Amazon Rekognition doesn’t store copies of the analyzed images. Instead, it stores face feature vectors as the mathematical representation of a face within the collection.

You can use the facial information in a collection to search for known faces in images, stored videos, and streaming videos.

To create a collection you will first need to configure the AWS CLI:

  1. Install the AWS CLI.
  2. Configure the AWS CLI.

1) Create an empty Amazon Rekognition collection using this CLI command:

aws rekognition create-collection --collection-id "Faces" --region us-east-1 

2) Now, we are going to create an Amazon DynamoDB table for storing unique face feature vectors generated by Amazon Rekognition and the number of coffees each person had.

DynamoDB works well for this use case. Because it is a fully managed service, you don’t need to worry about the elasticity and scalability of the database: there is no limit to the amount of data that can be stored in a DynamoDB table. As the size of the data set grows, DynamoDB automatically spreads the data over sufficient machine resources to meet storage requirements. With a self-managed database, you would have to scale capacity yourself as the counts accumulate. As for pricing, with DynamoDB you only pay for the resources you provision. For this use case, it is quite possible to remain within the AWS Free Tier or to run the project at a low DynamoDB price point.

To create the table in DynamoDB, in the AWS Management Console, navigate to the DynamoDB console and create a table. Use Faces as the table name and faceID as the primary key, and leave the other settings as defaults.

We’ll also create a table named logs for storing the logs of your Lambda function. For this table use unixtime as the primary key.
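If you prefer to script the two tables rather than use the console, the request could look like the sketch below. The table and key names come from the text above; the key types (string for faceID, number for unixtime), the capacity values, and the function names are assumptions. The `dynamodb` argument is expected to be a `boto3.client("dynamodb")` object.

```python
def table_definition(table_name, key_name, key_type):
    """Build create_table arguments for a table with a single partition key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_name, "AttributeType": key_type}
        ],
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        # Assumed capacity values; adjust them to your own settings.
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


def create_tables(dynamodb):
    """Create both tables. `dynamodb` is expected to be a boto3 DynamoDB client."""
    # Key types are assumptions: faceID as a string, unixtime as a number.
    for name, key, key_type in (("Faces", "faceID", "S"), ("logs", "unixtime", "N")):
        dynamodb.create_table(**table_definition(name, key, key_type))
```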

3) Create an AWS Lambda function that calls Amazon Rekognition. First, go to the IAM console and create a role for the AWS Lambda function. Apply the following managed policies to this role:

  • AmazonRekognitionFullAccess
  • AmazonDynamoDBFullAccess
  • AmazonS3FullAccess
  • AWSLambdaExecute

You should follow AWS IAM best practices for production implementations.

4) Finally, navigate to the Lambda console. Create a Lambda function with Python 3.6 as a runtime. Set the name of the S3 bucket that you configured before as your Lambda trigger. Configure the Event type, Prefix, and Suffix as shown in the following screenshot. This ensures that your Lambda function is triggered only when new .jpg objects that start with a key matching the images/ pattern are created within the bucket.
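The effect of the prefix and suffix settings can be illustrated with a short sketch. The first function returns a key filter in the shape that S3 bucket notification configurations use; the second mimics the match locally. Both function names are hypothetical, and the values `images/` and `.jpg` are taken from the description above.

```python
def notification_filter(prefix="images/", suffix=".jpg"):
    """Key filter in the shape used by S3 bucket notification configurations."""
    return {
        "Key": {
            "FilterRules": [
                {"Name": "prefix", "Value": prefix},
                {"Name": "suffix", "Value": suffix},
            ]
        }
    }


def would_trigger(key, prefix="images/", suffix=".jpg"):
    """Mimic the filter locally: both the prefix and the suffix must match."""
    return key.startswith(prefix) and key.endswith(suffix)
```

The match is case-sensitive, so an object named `images/frame.JPG` would not invoke the function with this filter.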

Replace the template Lambda code with the code from the GitHub repository, and change the Lambda timeout to 1 minute.

Let’s inspect the Lambda code to understand what it’s doing:

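# Ask Amazon Rekognition which labels it sees in the uploaded frame.
response = rekognition.detect_labels(Image=image, MaxLabels=123, MinConfidence=50)
for label in response["Labels"]:
    # Only look for faces when a cup is present in the frame.
    if label["Name"] == "Coffee Cup" or label["Name"] == "Cup":
        coffee_cup_detected = True
        message = detect_faces(image, bucket, key)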

This part of the code uses Amazon Rekognition to detect the labels in the image. It checks whether “Cup” or “Coffee Cup” appears in the response. If it finds either of these labels, it calls a face detection function, which searches the face collection for a matching face:

faces = rekognition.search_faces_by_image(CollectionId=face_collection, Image=image,
                                              FaceMatchThreshold=face_match_threshold, MaxFaces=1)

If no matching faces are found in the collection, the face is indexed and added to the collection:

faces = rekognition.index_faces(Image=image, CollectionId=face_collection)
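Putting the two calls above together with the DynamoDB counter, the flow might look like the following sketch. It is not the code from the GitHub repository: the function name `record_coffee`, the attribute name `coffee_count`, and the default threshold of 70 are assumptions. The `rekognition` and `dynamodb` arguments are expected to be boto3 clients, and `image` is a dictionary such as `{"S3Object": {"Bucket": ..., "Name": ...}}`.

```python
def record_coffee(rekognition, dynamodb, image, collection="Faces",
                  table="Faces", threshold=70):
    """Find or register the face in `image`, then add one coffee to its tally."""
    result = rekognition.search_faces_by_image(
        CollectionId=collection, Image=image,
        FaceMatchThreshold=threshold, MaxFaces=1)
    matches = result["FaceMatches"]
    if matches:
        # A known face: reuse its identifier.
        face_id = matches[0]["Face"]["FaceId"]
    else:
        # A new face: add it to the collection and use the new identifier.
        indexed = rekognition.index_faces(Image=image, CollectionId=collection)
        face_id = indexed["FaceRecords"][0]["Face"]["FaceId"]
    # ADD creates the attribute on first use and increments it afterwards.
    # "coffee_count" is a hypothetical attribute name.
    dynamodb.update_item(
        TableName=table,
        Key={"faceID": {"S": face_id}},
        UpdateExpression="ADD coffee_count :one",
        ExpressionAttributeValues={":one": {"N": "1"}})
    return face_id
```

A production version would also need to handle frames in which Rekognition finds no face at all, which this sketch leaves out.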

To test the function, you can upload an image to your S3 bucket and check your DynamoDB table to see the result.

Step 4: Deploy the app to AWS Elastic Beanstalk

Now it’s time to deploy the leaderboard application using AWS Elastic Beanstalk. Elastic Beanstalk automatically orchestrates the required resources needed to deploy the web application. All you have to do is upload the code.

1) Go to the IAM console, and on the IAM roles page, attach the AmazonS3FullAccess and AmazonDynamoDBFullAccess managed policies to the aws-elasticbeanstalk-ec2-role. This allows Amazon EC2 instances provisioned by Elastic Beanstalk to access Amazon S3 and Amazon DynamoDB.

2) Go to the AWS Elastic Beanstalk console, and create a new application. Create a new web server environment. Enter a domain name, select Python as a platform, and upload a ZIP file from GitHub that includes the Flask application and requirements.txt file. Wait until Elastic Beanstalk provisions your environment.

The URL of your application should be visible on the top of your screen. Click the URL to view your coffee leaderboard!
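The ordering behind a leaderboard like this one can be sketched in a few lines. This is an illustration only, not the Flask application from the repository; the function name, the attribute name `coffee_count`, and the item shape (plain dictionaries read from the Faces table) are assumptions.

```python
def leaderboard(items, top=10):
    """Return (faceID, count) pairs sorted from most to fewest coffees."""
    rows = [(item["faceID"], int(item.get("coffee_count", 0))) for item in items]
    # Sort by count descending, then by faceID so ties have a stable order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:top]
```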

Important: Deploying a project incurs costs for the various AWS services that are used.


You are now able to track the number of coffees each individual person drinks. While this project focuses on a simple coffee leaderboard, the backbone of this architecture can be used for any application.

This project showcases the power of the AWS DeepLens device in introducing developers to machine learning and IoT. Using a combination of AWS services, we were able to build this app in a short amount of time, and so can you!

About the Authors

João Coelho is a Solutions Architect at Amazon Web Services in London. He helps customers leverage the AWS platform to build scalable and resilient architectures on the cloud and is especially interested in serverless technologies. Outside of work, he enjoys playing tennis and traveling.




Laurynas Tumosa is a Technical Researcher at AWS in London. He enjoys building on the platform using AWS Machine Learning Services. He is passionate about making AI technologies accessible for everyone. Outside of work, Laurynas enjoys finding new interesting podcasts, playing guitar, and reading.




Lalit Dayalani is a Solutions Architect at Amazon Web Services based in London. He provides AWS customers with guidance and technical assistance, helping them understand and improve the value of their solutions on AWS. In his spare time, he loves spending time with family, going on hikes, and indulging in too much television.





Shopper Sentiment: Analyzing in-store customer experience

Retailers have been using in-store video to analyze customer behaviors and demographics for many years. Separate systems are commonly used for different tasks. For example, one system counts the number of customers moving through a store and records where in the store those customers linger and near which products. Another system holds the store layout, while yet another may record transactions. Historically, joining these data sources to gain insights that could drive more sales has required complex algorithms and data structures that need significant investment to deliver and incur ongoing maintenance costs.

In this blog post, we will demonstrate how to simplify this process using AWS services (Amazon Rekognition, AWS Lambda, Amazon S3, AWS Glue, Amazon Athena, and Amazon QuickSight) to build an end-to-end solution for in-store video analytics. We will focus on the analysis of still images, using an existing loss prevention store camera to produce data about the retail in-store experience.

The following diagram shows the overall architecture and the AWS services involved.

Using machine learning services on AWS such as Amazon Rekognition and applying them to motion video or still images from your store, it is possible to derive insights from customer behavior (for example, which areas of the store are frequently visited) and demographic segmentation of store traffic (for example, gender or approximate age), while also analyzing patterns of customer sentiment. This practice is already common in the industry, and our proposed solution makes it easier, faster, and more accurate. Sentiment analysis can be used, for example, to get insights into how customers respond to brand content and signage, end cap displays, or advertising campaigns. These insights can then be presented in dashboards similar to the examples shown below.

The overall solution can be decomposed into four main steps: collect, store, process, and analyze. Let’s describe each of these components:


Collect

The purpose of this stage is to collect images or motion video of your customers’ in-store experience from the camera. You can use various cameras, such as an existing CCTV or IP camera system, a (configured) Raspberry Pi with an attached camera module, an AWS DeepLens, or any other similar camera. These still images or motion video files are stored in an Amazon S3 bucket for further processing.

For this example, we used a Raspberry Pi with the motion package installed. This package detects motion and saves still images to a local folder only when there is an interesting event, which limits the amount of data that needs to be processed. The folder can then be synced (in real time or in batches) to the input S3 bucket. After installing the AWS Command Line Interface (instructions here), you can sync the motion folder to an S3 bucket and delete the local files after a successful synchronization with a command like the following (adapt the destination to your own bucket).

aws s3 sync /var/lib/motion/ s3://retail-instore-demo-source/`hostname` && sudo find /var/lib/motion/ -type f -mmin +1 -delete


Store

We propose using the Amazon S3 object store so we can benefit from its virtually unlimited storage, high availability, and event triggering capabilities. After creating the bucket, we enable Amazon S3 event notifications to publish an event to AWS Lambda for every new file in the input folder. The invoked Lambda function receives the event data (that is, a description of the incoming file) as a parameter to be processed.
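As a rough sketch of what the invoked function receives, the handler below pulls the bucket name and object key out of an S3 event. The function names are hypothetical and this is not the code deployed by the template; the event layout (`Records`, then `s3.bucket.name` and `s3.object.key`) is the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key) for each object described in an S3 event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def lambda_handler(event, context):
    """Entry point sketch: collect the objects that need processing."""
    return list(uploaded_objects(event))
```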


Process

To process the incoming images, we use AWS Lambda to read each image and call the Amazon Rekognition APIs to gather all of the relevant information provided by Rekognition (such as facial landmarks, which include the coordinates of the eyes and mouth, as well as gender, age, and the presence of a beard or sunglasses). The function then puts the resulting information into an Amazon Kinesis Data Firehose delivery stream, which publishes the data to an Amazon S3 bucket. Amazon Kinesis Data Firehose simplifies data management because it automatically handles the encryption, the folder structure (year/month/day/hour), optional data transformation, and compression.
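A minimal sketch of this processing step follows. It is not the Lambda code from the template: the function names and the output fields `gender`, `agelow`, and `agehigh` are hypothetical, while `emotion` and `retailtimestamp` mirror field names used in the QuickSight steps of this post. The `rekognition` and `firehose` arguments are expected to be boto3 clients.

```python
import json


def summarize_face(face, timestamp):
    """Flatten one Rekognition FaceDetail into a small record."""
    emotions = face.get("Emotions", [])
    # Keep the emotion Rekognition is most confident about, if any.
    top = max(emotions, key=lambda e: e["Confidence"])["Type"] if emotions else None
    return {
        "retailtimestamp": timestamp,
        "emotion": top,
        "gender": face.get("Gender", {}).get("Value"),
        "agelow": face.get("AgeRange", {}).get("Low"),
        "agehigh": face.get("AgeRange", {}).get("High"),
    }


def process_image(rekognition, firehose, bucket, key, stream, timestamp):
    """Analyze one image and send a JSON line per detected face to Firehose."""
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"])  # request emotions, age range, gender, and so on
    records = [summarize_face(face, timestamp) for face in response["FaceDetails"]]
    for record in records:
        firehose.put_record(
            DeliveryStreamName=stream,
            Record={"Data": (json.dumps(record) + "\n").encode("utf-8")})
    return records
```

Writing one JSON object per line keeps the delivered files easy to crawl and query later.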

The resulting dataset is a set of JSON files that contain the output from Rekognition, representing the customers captured in these images. For efficient querying from S3, we recommend storing the files in a columnar format. One option is to use the Amazon Kinesis Data Firehose data transformation feature; another is to convert the JSON using AWS Lambda or AWS Glue. Querying small datasets of JSON files is fine, but as the table grows to thousands of files, queries become less efficient. In this demo, we use the JSON format to keep things simple.


Analyze

All the resulting information is then stored in a new Amazon S3 bucket. As described in the process step, the information is stored in JSON format and can therefore be queried with Amazon Athena. We use AWS Glue crawlers to automatically infer the data schema from the data sitting in S3, and Athena uses the shared AWS Glue Data Catalog to query the data. Amazon Athena is a service that allows you to query data directly in S3 using standard ANSI SQL, without needing to spin up any infrastructure. This allows any data visualization or dashboard tool (for example, Tableau, Superset, or Amazon QuickSight) to connect to Athena and visualize the data. For our example, we will show how to use Amazon QuickSight to create a dashboard for this data.
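To give a feel for querying the processed data, here is a hedged sketch that starts an Athena query counting faces per emotion. The database and table names are the ones used in the QuickSight steps of this post; the `emotion` column is assumed to exist in the crawled schema, and `output_location` is a hypothetical S3 path for query results that you must supply. The `athena` argument is expected to be a `boto3.client("athena")` object.

```python
def emotion_counts_query(database="retail-instore-demo-db",
                         table="retail_instore_demo_processed"):
    """Build a SQL statement that counts detected faces per emotion."""
    return (
        'SELECT emotion, COUNT(*) AS faces '
        f'FROM "{database}"."{table}" '
        'GROUP BY emotion ORDER BY faces DESC'
    )


def run_query(athena, output_location, database="retail-instore-demo-db"):
    """Start the query and return its execution ID."""
    response = athena.start_query_execution(
        QueryString=emotion_counts_query(database=database),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location})
    return response["QueryExecutionId"]
```

The query runs asynchronously, so a caller would poll for completion before reading the results.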

Build this solution yourself

Now that we have described the components of this solution, let’s bring it all together.

We have provided a CloudFormation template that deploys all the necessary components shown in the architecture diagram, except Amazon QuickSight and the camera devices (IP camera or Raspberry Pi). In the following section, we explain key parts of the solution and show how to make sense of the analyzed data by building an Amazon QuickSight dashboard manually.

Note: The CloudFormation template has been tested only in the eu-west-1 (Ireland) Region and may not work in other Regions. Some of the resources deployed by the stack incur costs as long as they’re in use.

To get started deploying the CloudFormation template, take these steps:

  1. Choose the Launch Stack button. It automatically launches the AWS CloudFormation service in eu-west-1 region on your AWS account with a template. You’re prompted to sign in if needed.

    Input Value
    Region eu-west-1
    CFN Template
  2. Choose Next to proceed with the following settings.
  3. Specify the name of the CloudFormation stack and the required parameters. The default bucket names in the template may already be taken, so change each bucket name to a unique name and choose Next.

    Parameter            Description
    SourceBucketName     Unique bucket name where the image files will be uploaded for processing
    ProcessedBucketName  Unique bucket name where the processed files will be stored
    ArchivedBucketName   Unique bucket name where the image files will be archived after processing

  4. On the Review page, acknowledge that CloudFormation creates AWS Identity and Access Management (IAM) resources with custom names as a result of launching the stack.
  5. Custom names are required because the template uses serverless transforms. Choose Create Change Set to check the resources that the transforms add, then choose Execute.
  6. After the CloudFormation stack is launched, wait until the status changes from CREATE_IN_PROGRESS to CREATE_COMPLETE. Usually it takes around 7 to 9 minutes to provision all the required resources.
  7. When the launch is finished, you’ll see a set of resources that we’ll use throughout this blog post:

Test the functionality

Once the AWS CloudFormation template has created the required resources, let’s process some pre-captured sample images and use AWS Glue crawlers to automatically discover the schema of our processed image data.

  1. Download the pre-captured images from the S3 bucket
  2. Open the source bucket (retail-instore-demo-source), and upload one or more images with multiple people in them. For this example, use a few of the images downloaded in step 1; you can upload the remaining images later, for example one batch each hour, to get graphs over different time intervals.
  3. Lambda is triggered, the image is analyzed using Amazon Rekognition, and the results are put in the S3 processed bucket (retail-instore-demo-processed) for further processing by Athena and QuickSight. You can monitor progress either by watching the processed files arrive in the processed bucket or by monitoring the AWS Lambda executions in the Lambda monitoring console.
  4. In order to query using Athena, first we need to create the tables. We will leverage the AWS Glue crawlers which will automatically discover the schema of our data and create the appropriate table definition in AWS Glue Data Catalog. More details can be found in the documentation for Crawlers with AWS Glue.

By launching a stack from the CloudFormation template, we have only created a crawler and configured it to “run on demand” because crawlers are charged at an hourly rate. Therefore, we need to run the crawler manually when we have new data sources so that it can construct the data catalog using pre-built classifiers. To do this, go to AWS Glue and, under Crawlers, run the crawler (retail-instore-demo-glue-s3-crawler).

The Crawler connects to the S3 bucket, progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in AWS Glue Data Catalog which will be used to query the processed content in S3 using Athena on top of which we will build a QuickSight dashboard.
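If you would rather start the crawler from code than from the console, a sketch along these lines could work. The crawler name is the one given above; the function name, the polling interval, and the injectable `sleep` argument are assumptions. The `glue` argument is expected to be a `boto3.client("glue")` object.

```python
import time


def run_crawler(glue, name="retail-instore-demo-glue-s3-crawler",
                poll_seconds=15, sleep=time.sleep):
    """Start a crawler and block until it returns to the READY state."""
    glue.start_crawler(Name=name)
    while True:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return
        sleep(poll_seconds)  # still RUNNING or STOPPING
```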

Note: Make sure you upload pictures at different times and repeat steps 2 to 4, so that you can see the graphs across different time periods.

Building QuickSight dashboards

First, let’s configure QuickSight with the dataset.

  1. Log in to the QuickSight console.
  2. If this is the first time you access QuickSight, follow the instructions below. If not, go to step 3.
    1. Click “Sign up for QuickSight”
    2. Select the “Standard” edition
    3. Enter a QuickSight account name
    4. Enter a valid email
    5. Select the EU (Ireland) Region
    6. Tick the checkbox next to “Amazon Athena”
    7. Tick the checkbox next to “Amazon S3”
    8. Click “Choose S3 buckets” and select the retail-instore-demo-processed bucket
  3. Choose Manage data from the top right.
  4. Choose New Data Set.
  5. Create a Data Set from Athena Data Source as “retail-instore-analysis-blog”
  6. Choose the “retail-instore-demo-db” database, select the “retail_instore_demo_processed” table and click on Select
  7. Leave it on default Import to SPICE for quicker analytics and then click Visualize.
  8. “Import complete” appears with the total number of rows imported to SPICE (the in-memory storage component used by QuickSight). Now you can start to build some dashboards. We will step through creating one.
  9. We will first add a custom date field, so that we can have graphs with time axis. Click on “Add” and Add calculated field.
  10. Provide a name for the Calculated field name like “DateCalculated”
  11. Select “epochDate” from the Function list, select “retailtimestamp” from the Field list, and choose Create.

  12. This should create the calculated field.
  13. Resize the visual windows as necessary, and select “Vertical stacked bar chart” under Visual types.
  14. From Fields list, drag and drop the “DateCalculated” to X axis, “emotion” to Value and “emotion” to Group/Color on the Field wells. Graph will be populated based on the emotions captured from the images.
  15. Click the drop-down arrow on “DateCalculated” in the X axis field well, hover over “Aggregate: Day”, and select “Hour”.
  16. This graph displays the emotions of people at different times. For example, this can be a dashboard to display emotions captured at aisle 23 while tracking the response to a new product on that aisle. As another example, it lets you better count and qualify customers in front of a specific store endcap.
  17. Let’s add a few more visuals. Click “Add” and then Add visual.
  18. Select “pie chart” from Visual types, and drag and drop “emotion” from the Fields list to Group/Color in the Field wells.
  19. You can further enrich your dashboard by adding a meaningful heading and by adding other visual types with fields as follows:
  • Add a new visual, select “Vertical bar chart”, and set the X axis to DateCalculated (DAY) and Value to emotion (count) in the Field wells.


The QuickSight dashboard above shows an analysis of image capture events in a store. For example, you can see an analysis of overall customer sentiment. From these metrics, you can also understand the customer age range and how many people visited each day. Once you’re done with your analysis, you can share these insights with others in your team or organization by publishing it as a dashboard.
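The grouping behind the stacked bar chart (the epochDate calculated field followed by the Hour aggregation) can be reproduced outside QuickSight to sanity-check the numbers. The sketch below assumes that `retailtimestamp` holds epoch seconds and that hours are bucketed in UTC; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_emotions(records):
    """Count emotions per hour from records with epoch-second timestamps."""
    counts = Counter()
    for record in records:
        moment = datetime.fromtimestamp(record["retailtimestamp"], tz=timezone.utc)
        # Truncate to the hour, like choosing the "Hour" aggregation.
        hour = moment.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), record["emotion"])] += 1
    return counts
```

Each key in the result is an (hour, emotion) pair, which corresponds to one colored segment of one bar in the chart.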

Using Amazon Rekognition you can better segment your customers.  You can obtain feedback on customer experience for a specific area of the store.

Shutting down

When you have finished experimenting, you can remove all the resources by following these steps.

  1. Delete the contents inside all the S3 buckets that the CloudFormation template created.
  2. Delete the CloudFormation stack.
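The two steps above can also be scripted. The following is a minimal sketch, assuming non-versioned buckets; the function names are hypothetical, and the bucket and stack names are whatever you chose when launching the stack. The `s3` and `cloudformation` arguments are expected to be boto3 clients.

```python
def empty_bucket(s3, bucket):
    """Delete every object in `bucket`; returns the number of deleted keys."""
    deleted = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A page holds at most 1,000 keys, the limit for one delete call.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def tear_down(s3, cloudformation, buckets, stack_name):
    """Empty the buckets, then ask CloudFormation to delete the stack."""
    for bucket in buckets:
        empty_bucket(s3, bucket)
    cloudformation.delete_stack(StackName=stack_name)
```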

Conclusions and next steps

This post showed you how simple it is to gain insights into customers’ in-store behaviour using AWS. Using your existing CCTV system or any other camera, you can quickly build this solution and adapt it to your needs.

Some additional concepts that leverage this solution are:

  • In-store video stream analysis: Using Amazon Kinesis Video Streams, it’s possible to stream your existing video, which allows for additional use cases such as capturing the paths customers take through the store (and visualizing them with heatmaps) and where they spend time.
  • Customer loyalty and engagement programs utilizing facial recognition: Another very interesting possibility is the ability to recognize customers who have opted-in and shared a profile photo of themselves as part of a loyalty program, incentive program, or other customer benefit. Using the volunteered data, those customers could be recognized and offered a more elevated and personalized customer experience at a retail location.

If you have questions or suggestions, please comment below.

About the Authors

Bastien Leblanc is a Solutions Architect with AWS. He helps retail customers adopt the AWS platform, focusing on data and analytics workloads. He likes working with customers to solve retail problems and drive innovation.





Imran Dawood is a Solutions Architect with AWS. He works with Retail customers helping them build solutions on AWS with architectural guidance to achieve success in the AWS cloud. In his spare time, Imran enjoys playing table tennis and spending time with family.









Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.