Category: Amazon

DXC Technology automates triage of support tickets using AWS machine learning

Written on February 4, 2019. Posted in Amazon.

DXC Technology is a global IT service leader providing end-to-end services on Digital Transformation to businesses and governments. They also provide service management to their clients on-premises and in the cloud. The incident tickets raised as part of the process need to be resolved quickly to meet their service level agreements (SLA). DXC has goals to reduce human effort, reduce incident resolution time, enhance knowledge management, and enhance consistency of incident resolution. With these goals in mind, DXC developed a knowledge management (KM) article prediction mechanism.

In this blog post, we’ll discuss how DXC uses machine learning on AWS to automatically identify a KM article, which in turn can be automated with the orchestration runbook for ticket resolution to make IT support more efficient.

The DXC solution on AWS

First: Build a data lake on Amazon S3

DXC customers submit incident tickets to IT Service Management Tools (ITSM). Tickets can be user generated or machine generated. Then data is pushed or pulled to Amazon S3 buckets. Amazon S3 provides low cost, highly durable object storage that can store any form or format of data.

Second: Choose the right machine learning tool and algorithm

Typically, the problem is how to classify text. AWS offers a variety of choices for customers to do text classification. DXC evaluated the following AWS services.

Amazon SageMaker with its built-in algorithm called BlazingText.
Amazon Comprehend custom classification.

The Amazon Comprehend custom classification API was a good choice because it is built from the ground up for text classification. With Amazon Comprehend, we didn’t have to pick an algorithm, tune it, and re-train our model looking for the highest accuracy; the API did this automatically. We plan to re-evaluate it when it supports synchronous calls (today it provides batch-mode classification).

Amazon SageMaker BlazingText implements the fastText algorithm and keeps the right balance between scalability and accuracy.

Third: Train the model

Training data preparations:

Training the model is the most important part of the ML process. Training of supervised models requires labeled data. The DXC team wanted to label a significant amount of historical data for this purpose. In the pre-processing step, the text data was tokenized using NLTK (Python library) and stored in CSV format in Amazon S3 for the training. The training is done once a month with the historical data.

The tokenized training data looks like this. It is used as input to the training job.
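As a minimal sketch of this pre-processing step, the following assumes the BlazingText supervised-mode text format, where each line starts with a `__label__` prefix followed by space-separated tokens. The tickets and KM article IDs are hypothetical, and a simple regular-expression tokenizer stands in for NLTK so the example runs without extra dependencies.

```python
import re

def to_blazingtext_line(label, text):
    """Return one training line: '__label__<label>' followed by space-separated tokens."""
    # Stand-in tokenizer (not NLTK): lowercase, then split into word runs
    # and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return "__label__{} {}".format(label, " ".join(tokens))

# Hypothetical tickets: (KM article ID, ticket description)
tickets = [
    ("KM0001", "Unable to connect to VPN, error 809."),
    ("KM0002", "Disk usage above 90% on server app01"),
]

for label, description in tickets:
    print(to_blazingtext_line(label, description))
```

Whatever tokenization is used for training should also be applied to ticket descriptions at inference time, so the model sees the same kind of input in both cases.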

Training job with hyperparameter optimization (HPO)

We use the automatic model tuning feature of Amazon SageMaker to automate and accelerate the search of hyperparameters for the BlazingText algorithm.

Initially, we set static hyperparameters that we don’t need to change across training jobs, and we also define ranges for the hyperparameters that need optimizations.

Note: All the parameter values mentioned in the code below are sample values. You need to test and use your own values based on your requirements.

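import sagemaker
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Set static hyperparameters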
hyperparameters = dict(mode="supervised",
                            early_stopping=True,
                            patience=5,
                            min_epochs=30) 

#Set ranges for hyperparameters
hyperparameter_ranges = {
                         'epochs': IntegerParameter(50, 300),
                         'learning_rate': ContinuousParameter(0.005, 0.05),
                         'min_count': IntegerParameter(10, 300),
                         'vector_dim': IntegerParameter(64, 500),
                         'buckets': IntegerParameter(1000000, 10000000),
                         'word_ngrams': IntegerParameter(2, 5)
                        }
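To make the idea of a search range concrete, here is a small sketch that draws one candidate configuration from ranges like the ones above. It uses plain uniform random sampling from the Python standard library purely for illustration; it is not how SageMaker automatic model tuning chooses candidates (its default strategy is Bayesian optimization), and the `ranges` dictionary is a hypothetical stand-in for the SDK parameter objects.

```python
import random

# Hypothetical ranges mirroring some of the sample values above: (kind, low, high)
ranges = {
    "epochs": ("int", 50, 300),
    "learning_rate": ("float", 0.005, 0.05),
    "word_ngrams": ("int", 2, 5),
}

def sample_candidate(ranges, rng):
    """Draw one hyperparameter configuration uniformly from the given ranges."""
    candidate = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "int":
            candidate[name] = rng.randint(low, high)  # both bounds inclusive
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate

rng = random.Random(0)  # fixed seed so the draw is repeatable
candidate = sample_candidate(ranges, rng)
print(candidate)
```

Each tuning job trains one such candidate and reports the objective metric, which the tuner uses to decide what to try next.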

Next, we instantiated the estimator and the HPO tuner. Then we triggered the training job using training data available on Amazon S3.

# Instantiating Estimator
bt_model = sagemaker.estimator.Estimator(container,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.XXX',
                                         train_volume_size = 20,
                                         train_max_run = 360000,
                                         input_mode= 'File',
                                         output_path=s3_output_location,
                                         hyperparameters=hyperparameters,
                                         sagemaker_session=sess)


#Setting objective of HPO on maximizing validation accuracy
objective_metric_name = 'validation:accuracy'
objective_type = 'Maximize'

# Setting HPO tuner
tuner = HyperparameterTuner(bt_model,
                            objective_metric_name,
                            hyperparameter_ranges,
                            max_jobs=100,
                            max_parallel_jobs=2,
                            objective_type=objective_type)


# Triggering training using S3 training and validation data

train_data = sagemaker.session.s3_input(s3_train_data, distribution='FullyReplicated', 
                        content_type='text/plain', s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3_validation_data, distribution='FullyReplicated', 
                             content_type='text/plain', s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}

tuner.fit(inputs=data_channels)

Fourth: Orchestrate data preparation, model training, and model deployment on Amazon SageMaker using AWS Step Functions

We orchestrated this ML workflow using AWS Step Functions, and we scheduled it using an Amazon CloudWatch Events rule.

AWS Step Functions performs the following steps:

It checks that the Amazon S3 bucket exists where input data for training is present.
It pre-processes the data set for model training.
It starts the training job in Amazon SageMaker with the required parameters.
It keeps checking the status of training job.
After the training is successful, it validates the model.
After the model is validated, it deploys the model as Amazon SageMaker endpoints. (If the model endpoint exists, then it updates the model endpoint.)

All of these steps are developed as AWS Lambda functions.
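The workflow above could be expressed as an Amazon States Language definition along the following lines. This is a sketch under stated assumptions, not DXC's actual state machine: the state names, the Lambda function names, the ARN prefix, the `$.status` field, and the 300-second polling interval are all hypothetical.

```python
import json

# Hypothetical Lambda ARN prefix; replace with your own account, Region, and function names.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"

def task(function_name, next_state=None):
    """Build a Task state that invokes a Lambda function."""
    state = {"Type": "Task", "Resource": ARN + function_name}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state

state_machine = {
    "StartAt": "CheckInputBucket",
    "States": {
        "CheckInputBucket": task("check-input-bucket", "PreprocessData"),
        "PreprocessData": task("preprocess-data", "StartTraining"),
        "StartTraining": task("start-training", "WaitForTraining"),
        # Poll the training job: wait, check the status, then branch on the result.
        "WaitForTraining": {"Type": "Wait", "Seconds": 300, "Next": "CheckTrainingStatus"},
        "CheckTrainingStatus": task("check-training-status", "IsTrainingDone"),
        "IsTrainingDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "Completed", "Next": "ValidateModel"},
                {"Variable": "$.status", "StringEquals": "Failed", "Next": "TrainingFailed"},
            ],
            "Default": "WaitForTraining",
        },
        "TrainingFailed": {"Type": "Fail", "Error": "TrainingFailed"},
        "ValidateModel": task("validate-model", "DeployEndpoint"),
        "DeployEndpoint": task("deploy-endpoint"),
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```

The Wait and Choice states implement the "keeps checking the status" step as a polling loop, so no Lambda function has to stay running while the training job is in progress.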

Note: During AWS re:Invent 2018, a new feature was released that allowed Step Functions to be directly integrated with Amazon SageMaker. This feature can be used to develop some of the steps described earlier without writing Lambda functions. However, the feature was not available when DXC developed this solution.

Fifth: Call the inference

As soon as new ITSM tickets get ingested to an Amazon S3 bucket, an AWS Lambda function is triggered to call the inference using Amazon SageMaker endpoints.

The Lambda function reads the ticket number and description from incoming files and creates a payload like the following:
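A minimal sketch of building such a payload, assuming the JSON request format accepted by BlazingText text-classification endpoints (an `instances` list of sentences plus an optional `configuration` with `k`, the number of predictions to return). The ticket number, description, and helper name are hypothetical.

```python
import json

def build_payload(description, top_k=1):
    """Build a JSON request body for one ticket description."""
    # Lowercase and normalize whitespace; mirror whatever pre-processing
    # was applied to the training data.
    sentence = " ".join(description.lower().split())
    return json.dumps({"instances": [sentence], "configuration": {"k": top_k}})

# Hypothetical ticket
ticket_number = "INC0012345"
payload = build_payload("Unable to connect to VPN,  error 809")
print(ticket_number, payload)
```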

Then, it calls the Amazon SageMaker model endpoint with payload information:

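import boto3
import json

# SageMaker endpoint name, passed as a Lambda parameter
ENDPOINT_NAME = '<SageMaker Model Endpoint>'

# Create the SageMaker runtime client used to invoke the endpoint
runtime = boto3.client('runtime.sagemaker')

# Call the endpoint; payload is the JSON request body created in the previous step
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='application/json',
                                   Body=payload)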

It creates a CSV output and stores it on Amazon S3. The output looks like the following example. It stores the ticket number, the predicted KB document, and confidence level.
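As a rough sketch of this step, the following parses a response body and writes one CSV row. It assumes the BlazingText response shape (a JSON list with one `label`/`prob` entry per input sentence); the response body, ticket number, column names, and helper name are hypothetical, and an in-memory buffer stands in for the file that would be written to Amazon S3.

```python
import csv
import io
import json

def prediction_row(ticket_number, response_body):
    """Turn one endpoint response into [ticket number, KM article, confidence]."""
    result = json.loads(response_body)[0]  # one entry per input sentence
    article = result["label"][0].replace("__label__", "", 1)
    confidence = result["prob"][0]
    return [ticket_number, article, confidence]

# Hypothetical response body for a single ticket
body = '[{"label": ["__label__KM0001"], "prob": [0.97]}]'

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ticket_number", "predicted_km_article", "confidence"])
writer.writerow(prediction_row("INC0012345", body))
print(buffer.getvalue())
```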

Sixth: Build a CI/CD pipeline to automate the solution deployment

DXC developed a CI/CD pipeline using Ansible, Jenkins, and AWS CloudFormation templates to automate the deployment of the whole solution.

Seventh: Enable it for the support team

After the predictions are generated, they can be accessed using API endpoints based on Incident Identifiers or Incident Descriptions. Incident Descriptions are more suitable for real-time resolution of issues. It’s possible that you don’t even need to create a ticket. The description of an issue when checked against the Amazon SageMaker endpoint results in the output of a KM article identifier that can be referred offline, which might lead to the resolution of the issue. In this scenario, no ticket had to be created.

In the case where a ticket has been created, a Service Desk Agent can use a chatbot that makes a call to the API, or use the API directly by providing the Incident Identifier. The output for the Incident Identifier is a KM article identifier. This can be quickly referred to offline for incident resolution, which reduces the incident resolution time.

And further integration with runbook automation will result in the automation of ticket resolution with little or zero human effort.

The end-to-end solution

The overall architecture looks like this.

Conclusion – What did DXC achieve?

To summarize, the KM article prediction mechanism realized the following benefits:

Improved the support team’s efficiency. The support team can almost instantly know which KM article to look at to solve the ticket.
This prediction mechanism also can be used as a self-service tool where users can enter ticket descriptions and get back the KM article to solve their own issue. This will also reduce the number of tickets.
Integration of this mechanism with runbook automation will help automate resolution of tickets too.

About the Authors

Sougata Biswas is a big data architect at AWS Professional Services. He helps AWS customers in architecting and implementing solutions on AWS to get business value out of data.

Sofian Hamiti is a data scientist at Amazon ML Solutions Lab. He helps AWS customers across different industries accelerate their AI and cloud adoption.

Thanks to the DXC team who worked on the project. Special thanks to the following leaders from DXC who encouraged and reviewed the blog post.

Niladri Chowdhury, Manager of Data Engineering and Analytics Mgr Operations Engineering and Excellence (OE&E) at DXC Tech. He leads a team of Analysts, Data Engineers, and Data Scientists to design, build, and deploy best-in-class Business Intelligence delivery solutions in the cloud.

William Giotto, Global Product Owner at DXC Tech. He aligns efforts towards a vision of Intelligent Automation. Full-time father, data science enthusiast, and amateur astronomer (www.astrogiotto.com).

Deploy trained Keras or TensorFlow models using Amazon SageMaker

Written on January 29, 2019. Posted in Amazon.

Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to alleviate the undifferentiated heavy lifting from the full life cycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another; that is, models trained in Amazon SageMaker can be optimized and deployed outside of Amazon SageMaker (or even out of the cloud on mobile or IoT devices at the edge). Conversely, Amazon SageMaker can deploy and host pre-trained models from model zoos, or other members of your team.

In this blog post, we’ll demonstrate how to deploy a trained Keras (TensorFlow or MXNet backend) or TensorFlow model using Amazon SageMaker, taking advantage of Amazon SageMaker deployment capabilities, such as selecting the type and number of instances, performing A/B testing, and Auto Scaling. Auto Scaling clusters are spread across multiple Availability Zones to deliver high performance and high availability.

Your trained model will need to be saved in either the Keras (JSON and weights hdf5) format or the TensorFlow Protobuf format. If you’d like to begin from a sample notebook that supports this blog post, download it here.

For more on training the model on SageMaker and deploying, refer to this notebook on GitHub.

Step 1. Set up

In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to conda_tensorflow_p36.

The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role you created at the time of creating your notebook instance.

import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()

Step 2. Load the Keras model using the JSON and weights file

If you saved your model in the TensorFlow ProtoBuf format, skip to “Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format.”

import keras
from keras.models import model_from_json

!mkdir keras_model

Navigate to keras_model from the Jupyter notebook home, and upload your model.json and model-weights.h5 files (using the “Upload” menu on the Jupyter notebook home). To use a sample model for this exercise download and unzip the files found here, then upload them to keras_model.

!ls keras_model

# Read the model architecture; the context manager closes the file automatically
with open('/home/ec2-user/SageMaker/keras_model/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)

loaded_model.load_weights('/home/ec2-user/SageMaker/keras_model/model-weights.h5')

print("Loaded model from disk")

Step 3. Export the Keras model to the TensorFlow ProtoBuf format

from tensorflow.python.saved_model import builder
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
from tensorflow.python.saved_model import tag_constants

# Note: This directory structure will need to be followed - see notes for the next section
model_version = '1'
export_dir = 'export/Servo/' + model_version

# Build the Protocol Buffer SavedModel at 'export_dir'
builder = builder.SavedModelBuilder(export_dir)

# Create prediction signature to be used by TensorFlow Serving Predict API
signature = predict_signature_def(
    inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

from keras import backend as K

with K.get_session() as sess:
    # Save the meta graph and variables
    builder.add_meta_graph_and_variables(
        sess=sess, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
    builder.save()

Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format

Move the TensorFlow exported model into the directory export/Servo. Amazon SageMaker will recognize this as a loadable TensorFlow model. Your directory and file structure should look like this:

!ls export

!ls export/Servo

!ls export/Servo/1

!ls export/Servo/1/variables

Tar the entire directory and upload to Amazon S3

import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)
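Because Amazon SageMaker relies on the export/Servo/<version> layout, it can be worth listing the archive members before uploading. The sketch below builds a throwaway directory tree with empty placeholder files (not a real model) and checks the resulting member names; the helper names and the temporary directory are illustrative assumptions.

```python
import os
import tarfile
import tempfile

def build_archive(root, archive_path):
    """Tar the 'export' directory under root so member names start with 'export/'."""
    with tarfile.open(archive_path, mode="w:gz") as archive:
        archive.add(os.path.join(root, "export"), arcname="export", recursive=True)

def list_members(archive_path):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()

# Build a throwaway tree that mimics the exported model layout (empty placeholder files).
workdir = tempfile.mkdtemp()
variables_dir = os.path.join(workdir, "export", "Servo", "1", "variables")
os.makedirs(variables_dir)
open(os.path.join(workdir, "export", "Servo", "1", "saved_model.pb"), "wb").close()
open(os.path.join(variables_dir, "variables.index"), "wb").close()

archive_path = os.path.join(workdir, "model.tar.gz")
build_archive(workdir, archive_path)
members = list_members(archive_path)
print(sorted(members))
```

For a real model, calling `list_members('model.tar.gz')` on the archive created above should show `export/Servo/1/saved_model.pb` and the files under `export/Servo/1/variables`.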

import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

Step 5. Deploy the trained model

The entry_point file train.py can be an empty Python file. The requirement will be removed at a later date.

!touch train.py

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = '1.12',
                                  entry_point = 'train.py')

%%time
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

Note: You need to update the endpoint in the following command with the endpoint name from the output of the previous cell (INFO:sagemaker:Creating endpoint with name sagemaker-tensorflow-2019-01-29-17-36-55-987).

endpoint_name = 'sagemaker-tensorflow-2019-01-29-17-36-55-987'

import sagemaker
from sagemaker.tensorflow.model import TensorFlowModel
predictor=sagemaker.tensorflow.model.TensorFlowPredictor(endpoint_name, sagemaker_session)

Step 6. Invoke the endpoint

Invoke the Amazon SageMaker endpoint from the notebook

import numpy as np

# The sample model expects an input of shape [1,50]
data = np.random.randn(1, 50)
predictor.predict(data)

Invoke the Amazon SageMaker endpoint using a boto3 client

import json
import boto3
import numpy as np
import io
 
client = boto3.client('runtime.sagemaker')
# The sample model expects an input of shape [1,50]
data = np.random.randn(1, 50).tolist()
response = client.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(data))
response_body = response['Body']
print(response_body.read())

Step 7. Clean up

To avoid incurring unnecessary charges, use the AWS Management Console to delete the resources that you created for this exercise: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html

Conclusion

In this blog post, we demonstrated deploying a trained Keras or TensorFlow model at scale using Amazon SageMaker, independent of the computing resource used for model training. This gives you the flexibility to use your existing workflows for model training, while easily deploying the trained models to production with all the benefits offered by a managed platform. These benefits include the ability to select the optimal type and number of deployment instances, perform A/B testing, and auto scale. The Auto Scaling clusters of Amazon SageMaker ML instances can be spread across multiple Availability Zones to deliver both high performance and high availability.

About the author

Priya Ponnapalli is a principal data scientist at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Thoughts on Recent Research Paper and Associated Article on Amazon Rekognition

Written on January 25, 2019. Posted in Amazon.

A research paper and associated article published yesterday made claims about the accuracy of Amazon Rekognition. We welcome feedback, and indeed get feedback from folks all the time, but this research paper and article are misleading and draw false conclusions. This blog post shares details which we hope will help clarify several misperceptions and inaccuracies.

People often think of accuracy as an absolute measure, such as a percentage score on a math exam, where each answer is either right or wrong. To understand, interpret, and compare the accuracy of machine learning systems, it’s important to understand what is being predicted, the confidence of the prediction, and how the prediction is to be used, which is impossible to glean from a single absolute number or score.

What is being predicted: Amazon Rekognition provides two distinct face capabilities using a type of machine learning called computer vision. The first capability is facial analysis: for a particular image or video, the service can tell you where a face appears, and certain characteristics of the image (such as if the image contains a smile, glasses, mustache, or the gender of a face). These attributes are usually used to help search a catalog of photographs. The second capability of Amazon Rekognition is commonly known as facial recognition. It is a distinct and different feature from facial analysis and attempts to match faces that appear similar. This is the same approach used to unlock some phones, or authenticate somebody entering a building, or by law enforcement to narrow the field when attempting to identify a person of interest. In the latter, it’s the modern equivalent of detectives in old movies flicking through books of photos, but much faster.

Facial analysis and facial recognition are completely different in terms of the underlying technology and the data used to train them. Trying to use facial analysis to gauge the accuracy of facial recognition is ill-advised, as it’s not the intended algorithm for that purpose (as we state in our documentation).

Confidence: For both facial analysis and facial recognition, Amazon Rekognition also tells you how confident the service is in a specific result. Since all machine learning systems are probabilistic by nature, the confidence score can be thought of as a measure of how much trust the systems place in their results; the higher the confidence number, the more the results can be trusted. It is not possible to interpret the quality of either facial analysis or facial recognition without being transparent and thoughtful about the confidence threshold used to interpret the results. We are not yet aware of the threshold used in this research, but as you will see below, the results are much different when run with the recommended confidence level.

Use case for predictions: Combined with confidence, the intended use of a machine learning prediction is important, as it helps put the accuracy in context. For example, when using facial analysis to search for images containing ‘sunglasses’ in a photo catalog, showing more images in the search results is often desirable, even if there are some that aren’t perfect matches. Because the cost of an imperfect result in this use case is low, people often accept a lower confidence level in exchange for more results and less manual inspection of those results. However, when using facial recognition to identify persons of interest in an investigation, law enforcement should use our recommended 99% confidence threshold (as documented), and only use those predictions as one element of the investigation (not the sole determinant).
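The effect of the threshold can be illustrated with a few lines of code. This is only a sketch: the candidate matches and their scores are made up, the field names are hypothetical rather than the service's response format, and the 80% value for the catalog case is an arbitrary example of a lower threshold.

```python
# Hypothetical candidate matches with confidence scores in percent
matches = [
    {"id": "face-a", "confidence": 99.6},
    {"id": "face-b", "confidence": 97.2},
    {"id": "face-c", "confidence": 80.1},
]

def filter_matches(matches, threshold):
    """Keep only matches whose confidence score meets the threshold (in percent)."""
    return [m for m in matches if m["confidence"] >= threshold]

# Low-stakes catalog search: accept more results at a lower threshold.
catalog_results = filter_matches(matches, 80.0)
# Investigative use: apply the recommended 99% threshold.
investigation_results = filter_matches(matches, 99.0)

print(len(catalog_results), len(investigation_results))
```

The same set of predictions yields very different result lists depending on the threshold, which is why a reported accuracy figure is hard to interpret without knowing the threshold behind it.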

With the above context for how to think about ‘tests’ of Amazon Rekognition, we can get to this latest report and its erroneous claims.

The research paper seeks to “expose performance vulnerabilities in commercial facial recognition products,” but uses facial analysis as a proxy.

As stated above, facial analysis and facial recognition are two separate tools; it is not possible to use facial analysis to match faces in the same way as you would in facial recognition. This is not just an issue of semantics or definitions; they are two different features with two different purposes. Facial analysis can only find generic features (such as facial hair, smiles, frowns, gender, and so forth), which are primarily used to help filter and organize images. It has no knowledge of features which make a face unique (and cannot reverse engineer this from the image). In contrast, facial recognition focuses on unique facial features to match faces, and is used to match faces in datasets that customers bring to the service. Using facial analysis to do facial recognition is an inaccurate and unadvised way to identify unique individuals. We explain this in our documentation, and haven’t received a report from a customer who’s been confused on this issue.

The research paper states that Amazon Rekognition provides low quality facial analysis results. This does not reflect our own extensive testing and what we’ve heard from customers using the service.

First, the researchers used an outdated version of Amazon Rekognition. We made a significant set of improvements in November. Second, in a test run by AWS using the latest version of Amazon Rekognition, we ran facial analysis to perform gender classification on more than 12,000 images: a random selection of 1,000 men and 1,000 women across six ethnicities (South Asian, Hispanic, East Asian, Caucasian, African American, and Middle Eastern). Across all ethnicities, we found no significant difference in accuracy with respect to gender classification. In a broader test of facial recognition (which, as we explained earlier, is the logical and recommended way to do facial recognition), we evaluated photos from parliamentary websites with the Megaface dataset of 1 million images using Amazon Rekognition, and found exactly zero false positive matches at the recommended 99% confidence threshold. The research paper in question does not use the recommended facial recognition capabilities, does not share the confidence levels used in their research, and we have not been able to reproduce the results of the study. We’d love to collaborate with these researchers on helping with this research, and more importantly, to help continue improving the state of the art in facial recognition.

Beyond our internal tests or single ‘point in time’ results, we are very interested in working with academics in establishing a series of standardized tests for facial analysis and facial recognition and in working with policy makers on guidance and/or legislation of its use. One existing standardized test is from the National Institute of Standards and Technology (NIST). Amazon Rekognition’s Face API is a large-scale system which runs on a broad set of Amazon EC2 instance types using multiple deep learning models and proprietary data processing, storage, and search systems. Amazon Rekognition can’t be ‘downloaded’ for testing outside of AWS, and components cannot be tested in isolation while replicating how customers would use the service in the real world. We welcome the opportunity to work with NIST on improving their tests against this API objectively, and to establish datasets and benchmarks with the broader academic community.

The research paper implies that Amazon Rekognition is not improving, and that AWS is not interested in discussing issues around facial recognition.

This is false. We are now on our fourth significant version update of Amazon Rekognition. We are acutely aware of the concerns around facial recognition, and remain highly motivated and committed to continuous improvement, just as we are with all of our services. We make funding available for research projects and staff through the AWS Machine Learning Research Grants and have made significant investments to continuously improve Amazon Rekognition. Those improvements are made available to customers in all geographic regions, as soon as our improvements are validated – and just like all AWS services – we will continue to update and improve Amazon Rekognition. So far, our direct offers to discuss, update, and collaborate on these results have not been acknowledged or accepted by the researchers in this case.

We know that facial recognition technology, when used irresponsibly, has risks. This is true of a lot of technologies, computers included. And, people are concerned about this. We are, too. It’s why we suspend people’s use of our services if we find they’re using them irresponsibly or to infringe on people’s civil rights. It’s also why we clearly recommend in our documentation that facial recognition results should only be used in law enforcement when the results have confidence levels of at least 99%, and even then, only as one artifact of many in a human-driven decision. But, we remain optimistic about the good this technology will provide in society, and are already seeing meaningful proof points with facial recognition helping thwart child trafficking, reuniting missing kids with parents, providing better payment authentication, or diminishing credit card fraud. And, to date (over two years after releasing the service), we have had no reported law enforcement misuses of Amazon Rekognition.

The answer to anxieties over new technology is not to run ‘tests’ inconsistent with how the service is designed to be used, and to amplify the test’s false and misleading conclusions through the news media. We are eager to continue to work with researchers, academics, and customers, to continuously improve as we evolve this important technology.

-Dr. Matt Wood, general manager of artificial intelligence at AWS

Updated (February 1): This post was updated to accurately reflect the current state of testing with NIST.

Deploy TensorFlow models with Amazon Elastic Inference using a flexible new Python API available in EI-enabled TensorFlow 1.12

Written on January 23, 2019. Posted in Amazon.

Amazon Elastic Inference (EI) now supports the latest version of TensorFlow, 1.12. It provides EIPredictor, a new easy-to-use Python API function for deploying TensorFlow models using EI accelerators. You can now use this new Python API function within your inference scripts as an alternative to using TensorFlow Serving when running TensorFlow models with EI. EIPredictor allows for easy experimentation and lets you compare performance with and without EI. This blog post shows you how to use EIPredictor to deploy your models on EI.

Let me start with some background. Amazon Elastic Inference is a new capability we launched at re:Invent 2018. EI provides a new, significantly more cost-effective way to apply acceleration to your deep learning inference workloads than using standalone GPU instances. EI lets you attach accelerators to any Amazon SageMaker or Amazon EC2 instance type and provides you the low latency, high throughput benefits of GPU acceleration at a much lower cost (up to 75% lower). You can use EI to deploy TensorFlow, Apache MXNet, and ONNX models for inference.

Using TensorFlow Serving to run models on EI

At the launch of Amazon EI we introduced EI-enabled TensorFlow Serving, which provides an easy way to run your TensorFlow models with EI accelerators without having to make any code changes. Just start a model server with EI-enabled TensorFlow Serving with your trained TensorFlow SavedModel, and make calls to it. EI-enabled TensorFlow Serving uses the same API as normal TensorFlow Serving. The only difference is that the entry point is a different binary named AmazonEI_TensorFlow_Serving_v1.12_v1. Here is an example command that you can use to launch the server:

$ AmazonEI_TensorFlow_Serving_v1.12_v1 --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000

You can find EI-enabled TensorFlow Serving in the AWS Deep Learning AMIs (here’s a tutorial), or you can download the package from this Amazon S3 bucket so you can build it into your own custom Amazon Machine Image (AMI) or Docker container. EI-enabled TensorFlow Serving extends TensorFlow’s high performance model serving system to work seamlessly with EI. It automates accelerator discovery, secures your inference requests over the network with TLS encryption, and restricts access with AWS Identity and Access Management (IAM) policies.

Using EIPredictor to run models on EI

EIPredictor is a simple Python function for performing inference on a pretrained model. It is a new API function available within EI-enabled TensorFlow. It’s also available in the Deep Learning AMI and for download using Amazon S3. You can use EIPredictor in the following ways:

You can use EIPredictor with a saved model or a frozen graph. It’s similar to TF predictor. Please see EI’s documentation for using EIPredictor with these model formats.
You can disable usage of EI by using the use_ei flag, which defaults to True. This is useful to see how your model performs with and without EI acceleration.

EIPredictor can also be created from a TensorFlow Estimator. Given a trained Estimator, you first export a SavedModel. Refer to the SavedModel documentation for more details. Example usage:

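from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor

# Export the trained Estimator as a SavedModel, then load it with EIPredictor
saved_model_dir = estimator.export_savedmodel(my_export_dir, serving_input_fn)
ei_predictor = EIPredictor(saved_model_dir)
# Once the EIPredictor is created, inference is done using the following:
output_dict = ei_predictor(feed_dict)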

The following code sample shows the available parameters for this function:

ei_predictor = EIPredictor(model_dir,
           signature_def_key=None,
           signature_def=None,
           input_names=None,
           output_names=None,
           tags=None,
           graph=None,
           config=None,
           use_ei=True)

output_dict = ei_predictor(feed_dict)
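Since a predictor is called like a function, a small timing helper is one way to compare runs with and without EI. The sketch below is an illustration rather than part of the EI API: `average_latency` and the two stand-in predictors are hypothetical, and they exist only so the example runs without an accelerator. In practice you would pass predictors created with use_ei=True and use_ei=False.

```python
import time

def average_latency(predict_fn, feed_dict, runs=10, warmup=1):
    """Return the mean wall-clock seconds per call of predict_fn(feed_dict)."""
    for _ in range(warmup):
        predict_fn(feed_dict)  # first calls often include one-time setup cost
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(feed_dict)
    return (time.perf_counter() - start) / runs

# Stand-in predictors so the sketch runs anywhere; swap in real predictor objects.
def fake_accelerated(feed_dict):
    return {"score": sum(feed_dict["inputs"])}

def fake_local(feed_dict):
    time.sleep(0.001)  # simulate slower inference
    return {"score": sum(feed_dict["inputs"])}

feed = {"inputs": [1.0, 2.0, 3.0]}
print("accelerated:", average_latency(fake_accelerated, feed))
print("local:", average_latency(fake_local, feed))
```

The warm-up call keeps one-time initialization out of the measurement, which matters when the first inference is much slower than the rest.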

Example for running a model with EI Predictor

Here’s an example you can try for serving a ResNet using a Single Shot Detector (SSD) model using EI Predictor. This example assumes that you’ve launched an EC2 instance with an EI accelerator. We’re going to use the latest Deep Learning AMI here for this example.

The first step is to activate the TensorFlow Elastic Inference environment. Note that this is specific to the Deep Learning AMI. You don’t need this step if you built the EI-enabled TensorFlow library with your own custom AMI. You can choose between the Python 2 and Python 3 TensorFlow EI environments. I’ll use Python 2 for this example:
```
$ source activate amazonei_tensorflow_p27
```

Download the ResNet SSD model example from Amazon S3.

$ curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

Unzip the model. Again, you may skip this step if you already have the model.
```
$ unzip ssd_resnet.zip -d /tmp
```

Download a picture of three dogs to your current directory.

$ curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

Now open a text editor, such as vim, and paste the following inference script. Save the file as ssd_resnet_predictor.py.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import numpy as np
import tensorflow as tf
import matplotlib.image as mpimg
import time
from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor

tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS
if(FLAGS.image == ''):
  print("Supply an Image using '--image [path/to/image]'")
  exit(1)
coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
# Downloading coco labels
os.system("curl -o %s %s" % (local_coco_classes_txt, coco_classes_txt))
# Setting default number of predictions
NUM_PREDICTIONS = 20
# Reading coco labels to a list
with open(local_coco_classes_txt) as f:
  classes = ["No Class"] + [line.strip() for line in f.readlines()]


def main(_):
  # Reading the test image given by the user
  img = mpimg.imread(FLAGS.image)
  # Setting batch size to 1
  img = np.expand_dims(img, axis=0)
  # Setting up EIPredictor Input
  ssd_resnet_input = {'inputs': img}

  print('Running SSD Resnet on EIPredictor using specified input and outputs')
  # This is the EIPredictor interface, using specified input and outputs
  eia_predictor = EIPredictor(
      # Model directory where the saved model is located
      model_dir='/tmp/ssd_resnet50_v1_coco/1/',
      # Specifying the inputs to the Predictor
      input_names={"inputs": "image_tensor:0"},
      # Specifying the output names to tensor for Predictor
      output_names={"detection_classes": "detection_classes:0", "num_detections": "num_detections:0",
                    "detection_boxes": "detection_boxes:0"},
  )

  pred = None
  # Iterating over the predictions. The first inference request can take several seconds to complete
  for curpred in range(NUM_PREDICTIONS):
    if(curpred == 0):
      print("The first inference request loads the model into the accelerator and can take several seconds to complete. Please standby!")
    # Start the timer
    start = time.time()
    # This is where the inference actually happens
    pred = eia_predictor(ssd_resnet_input)
    print("Inference %d took %f seconds" % (curpred, time.time()-start))

  # Getting the number of objects detected in the input image from the output of the predictor
  num_detections = int(pred["num_detections"])
  print("%d detection[s]" % (num_detections))
  # Getting the class ids from the output
  detection_classes = pred["detection_classes"][0][:num_detections]
  # Mapping the class ids to class names from the coco labels
  print([classes[int(i)] for i in detection_classes])

  print('Running SSD Resnet on EIPredictor using default Signature Def')
  # This is the EIPredictor interface using the default Signature Def
  eia_predictor = EIPredictor(
      # Model directory where the saved model is located
      model_dir='/tmp/ssd_resnet50_v1_coco/1/',
  )

  # Iterating over the predictions. The first inference request can take several seconds to complete
  for curpred in range(NUM_PREDICTIONS):
    if(curpred == 0):
      print("The first inference request loads the model into the accelerator and can take several seconds to complete. Please standby!")
    # Start the timer
    start = time.time()
    # This is where the inference actually happens
    pred = eia_predictor(ssd_resnet_input)
    print("Inference %d took %f seconds" % (curpred, time.time()-start))

  # Getting the number of objects detected in the input image from the output of the predictor
  num_detections = int(pred["num_detections"])
  print("%d detection[s]" % (num_detections))
  # Getting the class ids from the output
  detection_classes = pred["detection_classes"][0][:num_detections]
  # Mapping the class ids to class names from the coco labels
  print([classes[int(i)] for i in detection_classes])


if __name__ == "__main__":
  tf.app.run()

Run the inference script.

$ python ssd_resnet_predictor.py --image 3dogs.jpg

Conclusion

You now have two convenient ways, depending on your preference, to run your TensorFlow models on cost-efficient accelerators. Give it a try and let us know what you think at amazon-ei-feedback@amazon.com.

You can learn more about Elastic Inference here and see our documentation user guide. For instructions on using the Deep Learning AMI for EI check out the AWS Deep Learning AMI documentation.

About the Author

Dominic Divakaruni is the Product Manager for Amazon Elastic Inference. He builds services that help customers scale production machine learning applications. In his spare time he enjoys drumming with his son and working on cars.

AWS launches open source Neo-AI project to accelerate ML deployments on edge devices

Written on January 22, 2019. Posted in Amazon.

At re:Invent 2018, we announced Amazon SageMaker Neo, a new machine learning feature that you can use to train a machine learning model once and then run it anywhere in the cloud and at the edge. Today, we are releasing the code as the open source Neo-AI project under the Apache Software License. This release enables processor vendors, device makers, and deep learning developers to rapidly bring new and independent innovations in machine learning to a wide variety of hardware platforms.

Ordinarily, optimizing a machine learning model for multiple hardware platforms is difficult because developers need to tune models manually for each platform’s hardware and software configuration. This is especially challenging for edge devices, which tend to be constrained in compute power and storage. These constraints limit the size and complexity of the models that they can run. Therefore, developers spend weeks or months manually tuning a model to get the best performance. The tuning process requires rare expertise in optimization techniques and deep knowledge of the hardware. Even then, it typically requires considerable trial and error to get good performance because good tools aren’t readily available.

Differences in software further complicate this effort. If the software on the device isn’t the same version as the model, the model will be incompatible with the device. This leads developers to limit themselves to only the devices that exactly match their model’s software requirements.

All of this makes it very difficult to quickly build, scale, and maintain machine learning applications.

Neo-AI eliminates the time and effort needed to tune machine learning models for deployment on multiple platforms by automatically optimizing TensorFlow, MXNet, PyTorch, ONNX, and XGBoost models to perform at up to twice the speed of the original model with no loss in accuracy. Additionally, it converts models into an efficient common format to eliminate software compatibility problems. On the target platform, a compact runtime uses a small fraction of the resources that a framework would typically consume. By making optimization easier, Neo-AI allows sophisticated models to run on resource-constrained devices, where they can unlock innovation in areas such as autonomous vehicles, home security, and anomaly detection. Neo-AI currently supports platforms from Intel, NVIDIA, and ARM, with support for Xilinx, Cadence, and Qualcomm coming soon.

At its core, Neo-AI is a machine learning compiler and a runtime built on decades of research on traditional compiler technologies, such as LLVM and Halide. It uses TVM and Treelite, which started as open source research projects at the University of Washington. The Neo-AI project uses TVM to compile deep learning models, Treelite to compile decision tree models, platform-specific optimizations from various contributors, and a common runtime for compiled models. AWS is an active contributor to the open source TVM and Treelite projects, and supports the growing TVM and LLVM communities.

Today’s release of AWS code back to open source through the Neo-AI project allows any developer to innovate on the production-grade Neo compiler and runtime. The Neo-AI project will be steered by the contributions of several organizations, including AWS, ARM, Intel, NVIDIA, Qualcomm, Xilinx, Cadence, and others.

By working with the Neo-AI project, processor vendors can quickly integrate their custom code into the compiler at the point at which it has the greatest effect on improving model performance. The project also enables device makers to customize the Neo-AI runtime for the particular software and hardware configuration of their devices. The Neo-AI runtime is currently deployed on devices from ADLINK, Lenovo, Leopard Imaging, Panasonic, and others. The Neo-AI project will absorb innovations from diverse sources into a common compiler and runtime for machine learning to deliver the best available performance for models.

“Intel’s vision of Artificial Intelligence is motivated by the opportunity for researchers, data scientists, developers, and organizations to obtain real value from advances in deep learning,” said Naveen Rao, General Manager of the Artificial Intelligence Products Group at Intel. “To derive value from AI, we must ensure that deep learning models can be deployed just as easily in the data center and in the cloud as on devices at the edge. By supporting Neo through Intel’s software efforts including nGraph and OpenVINO, device makers and system vendors can get better performance for models developed in almost any framework on platforms based on all Intel compute platforms.”

“NVIDIA Jetson with TensorRT is the best performing platform for AI at the edge,” said Ian Buck, Vice President and General Manager, Accelerated Computing, NVIDIA. “Neo simplifies the deployment of deep learning models in production by optimizing them for both NVIDIA Tensor Core GPUs and NVIDIA Jetson GPUs to provide higher throughput and low-latency. Our collaboration with AWS and Neo will bring the full capability of NVIDIA Inferencing from the edge to the cloud to a broader set of developers.”

Sudip Nag, Corporate Vice President at Xilinx, said, “Xilinx provides the FPGA hardware and software capabilities that accelerate machine learning inference applications in the cloud and at the edge. We are pleased to support developers using Neo to optimize models for deployment on Xilinx FPGAs. We look forward to enabling Neo-AI to use Xilinx ML Suite to deliver optimal inference performance per watt.”

“ARM’s vision of a trillion connected devices by 2035 is driven by the additional consumer value derived from innovations like machine learning,” said Jem Davies, fellow, General Manager and Vice President for the Machine Learning Group at ARM. “The combination of Neo and the ARM NN SDK will help developers optimize machine learning models to run efficiently on a wide variety of connected edge devices.”

To learn more, see the Neo-AI repository on GitHub.

About the Authors

Sukwon Kim is a Senior Product Manager for AWS Deep Learning. He works on products that make it easier for customers to use deep learning engines. In his spare time, he enjoys hiking and traveling.

Vin Sharma is an Engineering Leader for AWS Deep Learning. He leads the team building Neo, which helps ML models train once and run anywhere in the cloud and at the edge.

Identifying and working with sensitive healthcare data with Amazon Comprehend Medical

Written on January 22, 2019. Posted in Amazon.

At AWS, I regularly speak with AWS customers and AWS Partner Network (APN) partners about how they are using technology to transform human health. These companies often generate large amounts of health data that they use in a variety of applications, such as population health management and electronic health records. Developers need to find ways to use the valuable medical information in these applications while meeting their compliance obligations with regard to sensitive data, such as protected health information (PHI). Some applications where our customers and APN partners are doing this today are clinical decision support, revenue cycle management, and clinical trial management.

There are multiple methods to mask data, and each organization has its own approach based on internal risk assessments. We recommend that you consult risk assessment specialists for your organization’s specific implementation process. Typically, data is masked in two steps. First, PHI must be identified. Then, an algorithm is used that either anonymizes or de-identifies the data, usually in accordance with Safe Harbor or expert determination. This approach lends itself to using a state machine to apply the business logic your organization requires for each step independently and pass the information between states.

In this blog post, I’ll demonstrate how you can use a combination of Amazon Comprehend Medical, AWS Step Functions, and Amazon DynamoDB to identify sensitive health data and help support your compliance objectives. I’ll then discuss some potential extensions of the architecture that are patterns customers often adopt.

The architecture

This architecture uses the following services:

Amazon Comprehend Medical to identify entities within a body of text
AWS Step Functions and AWS Lambda to coordinate and execute the workflow
Amazon DynamoDB to store the de-identified mapping

This architecture and the code that follows are available as an AWS CloudFormation template.

The individual components

Like many modern applications being built on AWS, the individual components within this architecture are represented as Lambda functions. In this blog post, I’ll show you how to build three Lambda functions:

IdentifyPHI: Uses the Amazon Comprehend Medical API to detect and identify PHI entities from a body of text, such as a medical note.
MaskEntities: Takes the entities from IdentifyPHI as input and masks them in the body of text.
DeidentifyEntities: Takes the entities from IdentifyPHI and applies a hash to each entity and stores that mapping in DynamoDB.

Let’s walk through each in turn.

Identify PHI

The following code reads in a JSON body, extracts PHI entities from the message, and returns a list of extracted entities.

import logging
import threading

import boto3

# Create the client once so that warm invocations reuse it
client = boto3.client(service_name='comprehendmedical')

def timeout(event, context):
    raise Exception('Execution is about to time out, exiting...')

def extract_entities_from_message(message):
    # DetectPHI returns a dict whose 'Entities' key lists the PHI that was found
    return client.detect_phi(Text=message)

def handler(event, context):
    # Add in context for Lambda to exit if needed
    timer = threading.Timer((context.get_remaining_time_in_millis() / 1000.00) - 1, timeout, args=[event, context])
    timer.start()
    print('Received message payload. Will extract PHI')
    try:
        # Extract the message from the event
        message = event['body']['message']
        # Extract all entities from the message
        entities_response = extract_entities_from_message(message)
        entity_list = entities_response['Entities']
        print('PHI entity extraction completed')
        return entity_list
    except Exception as e:
        logging.error('Exception: %s. Unable to extract PHI entities from message' % e)
        raise e
    finally:
        # Stop the timer so it cannot fire after the handler has returned
        timer.cancel()

The workhorse in this Lambda function is the Amazon Comprehend Medical DetectPHI API call, which returns a list of entities that Amazon Comprehend Medical identifies. Note that a confidence score is provided with each identified entity. These scores indicate the level of confidence in the accuracy of the identified entities. You should take these confidence scores into account and review the identified entities to make sure they are correct. For more information on the returned data structure, see the DetectPHI documentation.
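
One way to act on the confidence scores is to separate entities that clear a threshold from those that should be reviewed by a person. The following is a minimal sketch under stated assumptions: split_by_confidence is a hypothetical helper, and the 0.8 threshold is an arbitrary placeholder, not a recommended value. It relies only on the Score field that DetectPHI returns with each entity.

```python
def split_by_confidence(entity_list, min_score=0.8):
    """Separate entities into accepted and needs-review lists by Score."""
    accepted, needs_review = [], []
    for entity in entity_list:
        # 'Score' is the confidence value returned with each entity
        if entity['Score'] >= min_score:
            accepted.append(entity)
        else:
            needs_review.append(entity)
    return accepted, needs_review
```

The right threshold, and what to do with entities below it, depends on your organization’s risk assessment.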

Mask entities

There are multiple approaches to masking a message. In this example, we take each entity and replace it with a series of pound signs (#) matching the length of the entity. The output is the input message with each entity masked. You could choose whichever method is most meaningful to and appropriate for your business. For example, if there are multiple NAME PHI entities, you could label them NAME1, NAME2, and so on.

Here’s the Lambda function:

import logging
import threading

def timeout(event, context):
    raise Exception('Execution is about to time out, exiting...')

def mask_entities_in_message(message, entity_list):
    # Replace every occurrence of each entity's text with pound signs of the same length
    for entity in entity_list:
        message = message.replace(entity['Text'], '#' * len(entity['Text']))
    return message

def handler(event, context):
    # Add in context for Lambda to exit if needed
    timer = threading.Timer((context.get_remaining_time_in_millis() / 1000.00) - 1, timeout, args=[event, context])
    timer.start()
    print('Received message payload')
    try:
        # Extract the entities and message from the event
        message = event['body']['message']
        entity_list = event['body']['entities']
        # Mask entities
        masked_message = mask_entities_in_message(message, entity_list)
        print(masked_message)
        return masked_message
    except Exception as e:
        logging.error('Exception: %s. Unable to mask entities in message' % e)
        raise e
    finally:
        # Stop the timer so it cannot fire after the handler has returned
        timer.cancel()
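
The function above masks by text replacement, which also changes any other occurrence of the same characters (an AGE entity of "50" would also alter "500 mg"). A possible alternative, sketched here as an assumption rather than as part of the deployed solution, is to use the BeginOffset and EndOffset values that DetectPHI returns with each entity. The same idea supports the numbered labels mentioned earlier. Both function names are hypothetical.

```python
from collections import defaultdict


def mask_by_offsets(message, entity_list):
    """Mask each entity in place using its character offsets."""
    # Work from the end of the message so earlier offsets stay valid
    for entity in sorted(entity_list, key=lambda e: e['BeginOffset'], reverse=True):
        begin, end = entity['BeginOffset'], entity['EndOffset']
        message = message[:begin] + '#' * (end - begin) + message[end:]
    return message


def label_by_type(message, entity_list):
    """Replace each entity with a numbered label such as NAME1 or AGE1."""
    counters = defaultdict(int)  # how many labels were issued per entity type
    labels = {}                  # (type, text) -> label, so repeats share a label
    ordered = sorted(entity_list, key=lambda e: e['BeginOffset'])
    for entity in ordered:
        key = (entity['Type'], entity['Text'])
        if key not in labels:
            counters[entity['Type']] += 1
            labels[key] = '%s%d' % (entity['Type'], counters[entity['Type']])
    # Substitute from the end so earlier offsets stay valid
    for entity in reversed(ordered):
        begin, end = entity['BeginOffset'], entity['EndOffset']
        label = labels[(entity['Type'], entity['Text'])]
        message = message[:begin] + label + message[end:]
    return message
```

Both functions assume that the entity offsets refer to the exact message string passed in and that entities do not overlap.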

De-identify entities

There are multiple methods for de-identification. The example described in this blog post is meant to demonstrate one way you can de-identify sensitive entities so that they can be reidentified later on by a user with the appropriate permissions. Here, we do several steps:

1. Apply a salt to the entity.
2. For each entity, generate a SHA3-256 hash of the salted entity. Store the mapping from hash to entity in a dictionary.
3. Replace each entity in the message with the hash generated in step 2.
4. Generate a SHA3-256 hash of the de-identified message.
5. Store the entities in DynamoDB with the hashed message as the hash key and the entity hash as the range key.

Here is the Lambda function for this step. The name of the DynamoDB table is read from the EntityMap environment variable:

import hashlib
import logging
import os
import threading
import uuid

import boto3

ddb = boto3.client('dynamodb')

def timeout(event, context):
    raise Exception('Execution is about to time out, exiting...')

def store_deidentified_message(message, entity_map, ddb_table):
    # Hash the de-identified message; this hash is the table's hash key
    hashed_message = hashlib.sha3_256(message.encode()).hexdigest()
    for entity_hash in entity_map:
        ddb.put_item(
            TableName=ddb_table,
            Item={
                'MessageHash': {
                    'S': hashed_message
                },
                'EntityHash': {
                    'S': entity_hash
                },
                'Entity': {
                    'S': entity_map[entity_hash]
                }
            }
        )
    return hashed_message

def deidentify_entities_in_message(message, entity_list):
    entity_map = dict()
    for entity in entity_list:
        # Salt each entity with a random UUID before hashing it
        salted_entity = entity['Text'] + str(uuid.uuid4())
        hashkey = hashlib.sha3_256(salted_entity.encode()).hexdigest()
        # Keep the hash-to-entity mapping so the entity can be re-identified later
        entity_map[hashkey] = entity['Text']
        message = message.replace(entity['Text'], hashkey)
    return message, entity_map

def handler(event, context):
    # Add in context for Lambda to exit if needed
    timer = threading.Timer((context.get_remaining_time_in_millis() / 1000.00) - 1, timeout, args=[event, context])
    timer.start()
    print('Received message payload')
    try:
        # Extract the entities and message from the event
        message = event['body']['message']
        entity_list = event['body']['entities']
        # De-identify the entities and store the mapping in DynamoDB
        deidentified_message, entity_map = deidentify_entities_in_message(message, entity_list)
        hashed_message = store_deidentified_message(deidentified_message, entity_map, os.environ['EntityMap'])
        return {
            "deid_message": deidentified_message,
            "hashed_message": hashed_message
        }
    except Exception as e:
        logging.error('Exception: %s. Unable to de-identify entities in message' % e)
        raise e
    finally:
        # Stop the timer so it cannot fire after the handler has returned
        timer.cancel()
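
The stored mapping is what makes re-identification possible for a user with the appropriate permissions. The following sketch shows one way that reverse step could look. It is not part of the CloudFormation template: both function names are hypothetical, the table layout is the one written by the function above (MessageHash, EntityHash, Entity), and the simple reverse replacement assumes that no entity text also occurs inside one of the generated hashes.

```python
def reidentify_message(deid_message, entity_map):
    """Replace each entity hash in the message with its original text."""
    for entity_hash, entity_text in entity_map.items():
        deid_message = deid_message.replace(entity_hash, entity_text)
    return deid_message


def load_entity_map(message_hash, table_name):
    """Fetch the hash-to-entity mapping for one message from DynamoDB."""
    import boto3  # imported here so reidentify_message works without AWS access
    ddb = boto3.client('dynamodb')
    entity_map = {}
    # Query all items that share the message hash, following pagination
    pages = ddb.get_paginator('query').paginate(
        TableName=table_name,
        KeyConditionExpression='MessageHash = :h',
        ExpressionAttributeValues={':h': {'S': message_hash}},
    )
    for page in pages:
        for item in page['Items']:
            entity_map[item['EntityHash']['S']] = item['Entity']['S']
    return entity_map
```

A caller would pass the hashed_message and deid_message values returned by the de-identify function, and access to the table would be restricted with IAM.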

Building the Boto3 Lambda layer

Next, we’ll create a Lambda layer containing Boto3. This is a common best practice when deploying Lambda functions in production.

Copy and paste the following commands into a terminal. They install the packages into a folder named python, which is the folder layout that Lambda expects for Python layers. The following example uses Python 3.6.

pip install boto3 --target python/.
 
# install botocore
pip install botocore --target python/.
 
# zip the python folder to create the layer package
zip boto3layer.zip -r python/

aws lambda publish-layer-version --layer-name boto3-layer --zip-file fileb://boto3layer.zip

Note the LayerVersionArn in the output. We’ll use this shortly.

Building the state machine

The multiple steps within this workflow, such as data passed between steps and forking paths based on user input, can be best represented as a state machine. We’ll use AWS Step Functions to define the state machine and execute the individual Lambda functions.

The state machine reads in a JSON blob containing the message text to process as well as whether to mask or de-identify the message. The overall steps are:

1. Identify PHI entities using the Amazon Comprehend Medical API.
2. Determine whether to mask or de-identify the entities.
3. Based on the result of step 2, act accordingly.

Here is the Amazon States Language code defining this state machine:

{
  "Comment": "State Machine that anonymizes or deidentifies PHI",
  "StartAt": "Identify PHI",
  "States": {
    "Identify PHI": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:IdentifyPHILambda",
      "InputPath": "$",
      "ResultPath": "$.body.entities",
      "Next": "Anonymize Or De-identify"
    },
    "Anonymize Or De-identify": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.body.anonymizeOrDeidentify",
          "StringEquals": "anonymize",
          "Next": "Anonymize"
        },
        {
          "Variable": "$.body.anonymizeOrDeidentify",
          "StringEquals": "deidentify",
          "Next": "De-identify"
        }
      ],
      "Default": "Anonymize"
    },
    "Anonymize": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:MaskEntitiesLambda",
      "InputPath": "$",
      "ResultPath": "$.maskedMessage",
      "OutputPath": "$.maskedMessage",
      "End": true
    },
    "De-identify": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:DeidentifyLambda",
      "InputPath": "$",
      "ResultPath": "$.maskedMessage",
      "OutputPath": "$.maskedMessage",
      "End": true
    }
  }
}
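
To make the data flow in this definition easier to follow, here is a small local sketch that mimics it with plain function calls. It is an illustration only and does not call AWS Step Functions: run_workflow is a hypothetical helper, and the three callables are stand-ins for the Lambda handlers that take just the event (the real handlers also receive a context argument).

```python
import copy


def run_workflow(state_input, identify_phi, mask_entities, deidentify_entities):
    """Mimic the state machine's data flow with plain function calls."""
    state = copy.deepcopy(state_input)
    # "Identify PHI": ResultPath "$.body.entities" adds the task result to the input
    state['body']['entities'] = identify_phi(state)
    # "Anonymize Or De-identify": any value other than "deidentify"
    # ends up in "Anonymize", either by rule or through the Default
    choice = state['body']['anonymizeOrDeidentify']
    task = deidentify_entities if choice == 'deidentify' else mask_entities
    # ResultPath "$.maskedMessage" stores the result; OutputPath returns only that value
    state['maskedMessage'] = task(state)
    return state['maskedMessage']
```

Note that both final states receive the full input, including the entities added by the first state, and that only the task result is returned as the execution output.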

Testing the state machine

As mentioned in the introduction, you can deploy the entire architecture using AWS CloudFormation. Launch the CloudFormation template now:

Use the LayerVersionArn output that you noted previously in the Boto3LayerArn CloudFormation parameter.

After the CloudFormation stack deploys, you should have the following resources:

The three Lambda functions
A DynamoDB table containing mappings to the re-identified entities
A Step Functions state machine
AWS Identity and Access Management (IAM) resources

Let’s take a fictional medical note, or rather a combination of what would be several notes, which was provided by the Amazon Comprehend Medical team. Notice that it’s filled with typos, which would present challenges for rules-based approaches for entity identification.

Stay Free Medical Center
Emergency Department
Clinical Summary
12341 W. Bohannon Rd, Grantville, GA
Phone: (770) 922-9800

PERSON INFORMATION
Name: SALAZAR, CARLOS
MRN: RQ36114734
ED Arrival Time: 11/12/2011 18:15

Sex: Male
DOB: 2/11/1961
Age: 50 Years
Visit Reason: New onset A Fib, SOB
Acuity: 2 Emergent Disposition: Home/Self-Care
Address: 186 VALETINE, NE 69201
Phone: 402 213-2221

SUBJECTIVE:
Carlos came to the ED via ambulance accompanied by son, Jorge. He is a 50 yo male who was working at Food Corp when he had sudden onset of palpitations. Carlos stated his fater, Diego, also had palpitations through his life.

Provider Contact Time: 11/12/2011 19:00
Decision to Admit: Not entered
ED Departure Time: 11/23/2011 00:07

DIAGNOSIS: Hyperthyroidism
Attending Provider:
Saanvi Sarkar, MD

Primary Nurse(s):
Jackson; Mateo

Fill New Prescriptions:
nepafenac (nepafenac 1 mg / 1mL Ophthalmic Suspension) 1 drop left eye every 12 hours 14 day(s)
zofran (Ondansetron 4 mg oral tablet) 4 mg ORAL PRN
atropine sulfate 0.05 mcg / hyopscyamine sulfate 3.1 mcg / phenobartbital 48.6 MG / scopolamine hydrobromide 0.0195 mg ( Donnata ER oral tablet) 1 table PO PRN
acetaminophen – hydrocodone ( Vicodin 5 mg – 500 mg oral tablet ) 2 tablet(s) by Mouth every 6 hours as needed for pain
docusate sodium 100 mg oral capsule 100 mg by Mouth twice daily as needed for constipation

Allergies:
penicillins
ibuprofen
bee pollen

Patient Education and Follow-up Information
Instructions:
ED, Nausea (Custom)
Follow up:

With:
Address:
When:

Return to Emergency Department

Comments:

Nausea Vomiting

Nausea persists without control from anti-nausea medications Projectile vomiting Uncontrolled , consistent nausea & vomiting Blood or “coffee grounds” appearing material in vomit Medicine not kept down because of vomiting Weakness or dizziness along with nausea/vomiting Severe stomach pain while vomiting

Pain
Severe Chest / Arm pain Severe squeezing or pressure in chest Severe sudden headache
New or uncontrolled pain New headache Chest discomfort Pounding heart Heart “flip – flop” feeling Painful Central Line site or area of “tunnel” Burning in chest or stomach Pain or burning while urinating Pain with infusion of medications or fluids into Central Line

Diarrhea

Constant or uncontrolled diarrhea New onset diarrhea Diarrhea with fever and abdominal cramping Whole pills passed in stool Greater than 5 times each day Stool which is bloody , burgundy or black Abdominal cramping

Fatigue
Unable to wake
Dizziness Fatigue is getting worse Too tired to get out of bed or walk to the bathroom Staying in bed all day

Fever / Chills

Shaking chills , temperature may be normal Temperature greater than 38.3° C or 100.9° F by mouth Fever greater than 1 degree above usual if on steroids 24 Cold symptoms ( runny nose , watery eyes , sneezing , coughing )

With:
Address:
When:

Follow up with primary care provider

Comments:

Call tomorrow to make an appointment for the next 1-2 days and to start arranging PCP follow-up

Thank you for visiting the Stay Free Medical Center.

The input to the state machine takes two values. First, the note. Second, a choice of whether to anonymize the note or de-identify it. In this example we’ll de-identify the message. Here’s what that looks like:

{
	"body": {
		"message": " Stay Free Medical Center \nEmergency Department \nClinical Summary \n12341 W. Bohannon Rd, Grantville, GA\nPhone: (770) 922-9800 \n\n\nPERSON INFORMATION\nName:  SALAZAR, CARLOS\nMRN:  RQ36114734 \nED Arrival Time:  11/12/2011 18:15\n \nSex:  Male \nDOB:  2/11/1961\n Age:   50 Years \nVisit Reason:  New onset A Fib, SOB\n Acuity:  2   Emergent Disposition:  Home/Self-Care \nAddress:  186 VALETINE, NE 69201\nPhone:  402 213-2221 \n \nSUBJECTIVE:\nCarlos came to the ED via ambulance accompanied by son, Jorge. He is a 50 yo male who was working at Food Corp when he had sudden onset of palpitations. Carlos stated his fater, Diego, also had palpitations through his life.\n \nProvider Contact Time:  11/12/2011 19:00\n Decision to Admit:  Not entered\n ED Departure Time:  11/23/2011 00:07\n \nDIAGNOSIS:  Hyperthyroidism \n Attending Provider: \nSaanvi Sarkar, MD\n \n Primary Nurse(s): \nJackson; Mateo\n \n\n Fill New Prescriptions:\nnepafenac (nepafenac 1 mg / 1mL Ophthalmic Suspension) 1 drop left eye every 12 hours 14 day(s)\nzofran (Ondansetron 4 mg oral tablet) 4 mg ORAL PRN\natropine sulfate 0.05 mcg / hyopscyamine sulfate 3.1 mcg / phenobartbital 48.6 MG / scopolamine hydrobromide 0.0195 mg ( Donnata ER oral tablet) 1 table PO PRN\nacetaminophen - hydrocodone ( Vicodin 5 mg - 500 mg oral tablet )  2 tablet(s) by Mouth every 6 hours as needed for pain\ndocusate sodium 100 mg oral capsule 100 mg by Mouth twice daily as needed for constipation\n\n \nAllergies:\n penicillins\n ibuprofen\n bee pollen\n \nPatient Education and Follow-up Information\n Instructions:\n   ED, Nausea (Custom) \n Follow up:\n  \n With:\nAddress:\nWhen:\n\nReturn to Emergency Department\n\n\n\nComments:\n\nNausea Vomiting\n\nNausea persists without control from anti-nausea medications  Projectile vomiting  Uncontrolled , consistent nausea & vomiting  Blood or “coffee grounds” appearing material in vomit Medicine not kept down because of vomiting Weakness or dizziness along with nausea/vomiting Severe stomach pain while vomiting\n\nPain \nSevere Chest / Arm pain Severe squeezing or pressure in chest Severe sudden headache\nNew or uncontrolled pain New headache Chest discomfort Pounding heart Heart “flip - flop” feeling Painful Central Line site or area of “tunnel” Burning in chest or stomach Pain or burning while urinating Pain with infusion of medications or fluids into Central Line\n\n\nDiarrhea \n\nConstant or uncontrolled diarrhea New onset diarrhea Diarrhea with fever and abdominal cramping Whole pills passed in stool Greater than 5 times each day Stool which is bloody , burgundy or black Abdominal cramping\n\nFatigue\nUnable to wake\nDizziness Fatigue is getting worse Too tired to get out of bed or walk to the bathroom Staying in bed all day\n\nFever / Chills \n\nShaking chills , temperature may be normal Temperature greater than 38.3° C or 100.9° F by mouth Fever greater than 1 degree above usual if on steroids 24 Cold symptoms ( runny nose , watery eyes , sneezing , coughing ) \n\n\n\nWith:\nAddress:\nWhen:\n\nFollow up with primary care provider\n\n\n\nComments:\n\nCall tomorrow to make an appointment for the next 1-2 days and to start arranging PCP follow-up\n\n\nThank you for visiting the Stay Free Medical Center.\n \n",
		"anonymizeOrDeidentify": "deidentify"
	}
}

In the AWS CloudFormation console, navigate to the output page and note the state machine Amazon Resource Name (ARN). You will use it later to invoke a state machine execution.

You can test using the AWS CLI, your AWS SDK of choice, or the AWS Step Functions console. The following command shows what it would be like if you used the CLI. However, before you type the following command, copy the previous JSON and save it to example_note.json. Also replace the AWS Step Functions state machine ARN with the ARN in the CloudFormation output.

aws stepfunctions start-execution --state-machine-arn YOUR_STATEMACHINE_ARN --input file://example_note.json
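
If you prefer an SDK over the CLI, the same test can be sketched with Boto3. The helper below is a hypothetical convenience function, not part of the CloudFormation template. It uses the Step Functions start_execution and describe_execution calls, and it takes the client as a parameter so that it can be exercised without AWS access.

```python
import json
import time


def run_execution(sfn_client, state_machine_arn, payload, poll_seconds=1.0):
    """Start an execution, wait for it to finish, and return (status, output)."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    while True:
        result = sfn_client.describe_execution(executionArn=started['executionArn'])
        if result['status'] != 'RUNNING':
            break
        time.sleep(poll_seconds)
    # 'output' is only present when the execution succeeded
    output = json.loads(result['output']) if 'output' in result else None
    return result['status'], output


# Usage (requires AWS credentials; not executed here):
#   import boto3
#   with open('example_note.json') as f:
#       payload = json.load(f)
#   status, output = run_execution(
#       boto3.client('stepfunctions'), 'YOUR_STATEMACHINE_ARN', payload)
```

Polling is the simplest option for a quick test; the one-second interval is an arbitrary choice.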

The overall execution should take only a couple of seconds. Let’s navigate to the AWS Step Functions console to see what happened.

When you ran the previous command, several things happened.

A Lambda function identified potential PHI entities within the note.
These entities were salted and the resulting combination was hashed using SHA3-256.
The hashes replaced the original entities in the message and the updated message was then hashed.
The mappings were stored in DynamoDB.
The hashed message was returned as the output of the execution.
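
The salting and hashing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code deployed by the CloudFormation template: the function name deidentify is hypothetical, the salt is assumed to be prepended to the entity text, and a plain dictionary stands in for the DynamoDB mapping table. Replacement is done in a single regular-expression pass so that text inside an already inserted hash is never matched again.

```python
import hashlib
import re
import secrets


def deidentify(message, entities):
    """Replace each entity in the message with a salted SHA3-256 hash.

    Returns the masked message and a mapping of hash -> salt and original
    entity (a stand-in for the DynamoDB mapping table in this sketch).
    """
    digests = {}
    mapping = {}
    for entity in set(entities):
        salt = secrets.token_hex(16)  # fresh random salt for every entity
        digest = hashlib.sha3_256((salt + entity).encode("utf-8")).hexdigest()
        digests[entity] = digest
        mapping[digest] = {"salt": salt, "entity": entity}

    if not digests:
        return message, mapping

    # Longest alternatives first, so a long entity wins over a shorter one it contains.
    ordered = sorted(digests, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    masked = pattern.sub(lambda match: digests[match.group(0)], message)
    return masked, mapping
```

Because the salt is random, running the function twice on the same message yields different hashes, and only the stored mapping can link a hash back to its entity.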

You can view the output from the steps in the AWS Step Functions console. The previous message should now look like the following (formatted for ease of reading). The de-identified message still contains valuable information that can be used, but the sensitive data has been replaced with salted hashes.

8db49f8fdfc0a003402dd68439d2a848635d6c60a2719020c7b922916aafbdf0 
c027ee7d7992ea804c589c2c2777fc646e2f394d5db900177246f9d7bd8d762d 
Clinical Summary 
5d0276605f49fa2c8e010b9781cb348d9efca84dd7a49e0ce6fb845e156f3331
Phone: 988c20b763f3b60b83aa64f48ce3184642dcf15707eeaead9d24c266e8967680 

PERSON INFORMATION
Name:  ba1a8b9ba885c1b867b0fda332845d2c4435a921cdbf849a86b4e5768b00972395cbf6c9eda8afecd3a8a454a28774512f78cd9d03ae7f2670433bc0217379
MRN:  45dd4310f18cddb1f37c4e11b36b12e77fc64001229a2632333d1e0f379f5847 
2e0fbaa0c9008457c15f9306a9cd588ec09402f8db12194ec705b3f058e3eff6 Arrival Time:  712caefd59fd2015172ef9cb560dad2852c652368a618d446b472958db6a288b 18:15
 
Sex:  Male 
DOB:  88d76b85ad3e7cc2b1d06ea99a8a13df842fdd7ab0986ae3c747a3993944f91d
 Age:   b9ba885c1b867b0fda332845d2c4435a921cdbf849a86b4e5768b00972395cbf Years 
Visit Reason:  New onset A Fib, SOB
 Acuity:  2   Emergent Disposition:  Home/Self-Care 
Address:  aff3537058e53a9de01a4689cf1c3109584370e98ec31241a3ae4c07eceb0cbb
Phone:  35cda8ec6c456bdf120843e0a1302f0aef1bab003a51353a02fe41e56baa92f03a465fe2bac1c23d18cacdb3576a84aa5c0aeee3fb8aafb61bd18a6970610d 
 
SUBJECTIVE:
1093369cc39bcae926a41719947e202ba749ff91691777321dcec52d34eb9296 came to the 2e0fbaa0c9008457c15f9306a9cd588ec09402f8db12194ec705b3f058e3eff6 via ambulance accompanied by son, abaefc3557e1c7577a16c658126d74cf8ae36857737c22eb587bc414bd926936. He is a b9ba885c1b867b0fda332845d2c4435a921cdbf849a86b4e5768b00972395cbf yo male who was working at f64136468ffae173d7eb43e4735e0bb9940d1718723dc0f42e0ffeb9053756cf when he had sudden onset of palpitations. 1093369cc39bcae926a41719947e202ba749ff91691777321dcec52d34eb9296 stated his fater, 23f255a3e4ec38a0fd094f3d96f30cb1a4787f269913aa890fb3a68058bd44fb, also had palpitations through his life.
 
Provider Contact Time:  712caefd59fd2015172ef9cb560dad2852c652368a618d446b472958db6a288b 19:00
 Decision to Admit:  Not entered
 2e0fbaa0c9008457c15f9306a9cd588ec09402f8db12194ec705b3f058e3eff6 Departure Time:  c078acc1b42e5eda560ec66cbabcddc16fde9ba0758ef73f095a11b87cda87b5 00:07
 
DIAGNOSIS:  Hyperthyroidism 
 Attending Provider: 
9437ca325df16c59a18c57c52194cc344ea3a3e4155a9b8decb7caf453b93c10, MD
 
 Primary Nurse(s): 
30ed768bd50007158ddd6ca6e71bc3e5d8bf411cb7597692c7aa729b53a13527

 Fill New Prescriptions:
nepafenac (nepafenac 1 mg / 1mL Ophthalmic Suspension) 1 drop left eye every 12 hours 14 day(s)
zofran (Ondansetron 4 mg oral tablet) 4 mg ORAL PRN
atropine sulfate 0.05 mcg / hyopscyamine sulfate 3.1 mcg / phenobartbital 48.6 MG / scopolamine hydrobromide 0.0195 mg ( Donnata ER oral tablet) 1 table PO PRN
acetaminophen - hydrocodone ( Vicodin 5 mg - b9ba885c1b867b0fda332845d2c4435a921cdbf849a86b4e5768b00972395cbf0 mg oral tablet )  2 tablet(s) by Mouth every 6 hours as needed for pain
docusate sodium 100 mg oral capsule 100 mg by Mouth twice daily as needed for constipation

Allergies:
 penicillins
 ibuprofen
 bee pollen
 
Patient Education and Follow-up Information
 Instructions:
   2e0fbaa0c9008457c15f9306a9cd588ec09402f8db12194ec705b3f058e3eff6, Nausea (Custom) 
 Follow up:
  
 With:
Address:
When:

Return to c027ee7d7992ea804c589c2c2777fc646e2f394d5db900177246f9d7bd8d762d

Comments:

Nausea Vomiting

Nausea persists without control from anti-nausea medications  Projectile vomiting  Uncontrolled , consistent nausea & vomiting  Blood or “coffee grounds” appearing material in vomit Medicine not kept down because of vomiting Weakness or dizziness along with nausea/vomiting Severe stomach pain while vomiting

Pain 
Severe Chest / Arm pain Severe squeezing or pressure in chest Severe sudden headache
New or uncontrolled pain New headache Chest discomfort Pounding heart Heart “flip - flop” feeling Painful Central Line site or area of “tunnel” Burning in chest or stomach Pain or burning while urinating Pain with infusion of medications or fluids into Central Line

Diarrhea 

Constant or uncontrolled diarrhea New onset diarrhea Diarrhea with fever and abdominal cramping Whole pills passed in stool Greater than 5 times each day Stool which is bloody , burgundy or black Abdominal cramping

Fatigue
Unable to wake
Dizziness Fatigue is getting worse Too tired to get out of bed or walk to the bathroom Staying in bed all day

Fever / Chills 

Shaking chills , temperature may be normal Temperature greater than 38.3° C or 100.9° F by mouth Fever greater than 1 degree above usual if on steroids 24 Cold symptoms ( runny nose , watery eyes , sneezing , coughing ) 

With:
Address:
When:

Follow up with primary care provider

Comments:

Call tomorrow to make an appointment for the next 1-2 days and to start arranging PCP follow-up

Thank you for visiting the 8db49f8fdfc0a003402dd68439d2a848635d6c60a2719020c7b922916aafbdf0.

Here’s what the table looks like after two runs with the same message.

Because each entity is salted, there’s no way of mapping a hash back to the original entity without using the DynamoDB mapping table. You can see the effect of salting in the table, where repeated entities have different hashes. Additionally, since you can manage DynamoDB access using IAM, you can control who has access to the items in your table. You can then use AWS CloudTrail to audit reads from your table containing sensitive information.

Conclusion and next steps

Protecting sensitive data is always job zero for healthcare organizations. In this blog post, I demonstrated how you can use Amazon Comprehend Medical to work with and identify protected health information. While organizations have different approaches to protect sensitive data, they follow the same architectural pattern: (1) identify the sensitive entities, and (2) apply the appropriate protection strategy for the sensitive entities as defined by your organization. A state machine is well-suited to orchestrate the two steps.

There are additional modifications you can make to this architecture to suit your needs. Here are a few ideas:

Put the state machine behind Amazon API Gateway to add an authorization layer to process your text, as well as a gateway to the individual Lambda functions.
Filter by the confidence of the DetectPHI results. Amazon Comprehend Medical entities have a Score field in addition to Text. You can apply a threshold to filter the detected entities, depending on your business requirements.
Use DetectPHI in conjunction with DetectEntities to help you detect and identify PHI, and also extract non-PHI entity relationships, which can be used for downstream analytics.
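
As a rough illustration of the confidence filter idea, the following sketch keeps only the entities whose Score meets a threshold. The function name filter_phi_entities and the default threshold of 0.8 are assumptions made for this example; the right cutoff depends on your own requirements.

```python
def filter_phi_entities(response, min_score=0.8):
    """Keep only detected PHI entities whose confidence meets the threshold."""
    return [
        entity
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_score
    ]


# Usage (requires AWS credentials):
#   import boto3
#   response = boto3.client("comprehendmedical").detect_phi(Text=note_text)
#   confident_entities = filter_phi_entities(response, min_score=0.9)
```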

Interested in learning more about Amazon Comprehend Medical?

Check out the documentation
Explore related blog posts discussing Amazon Comprehend Medical
- Amazon Comprehend Medical – Natural Language Processing for Healthcare Customers
- Extract and visualize clinical entities using Amazon Comprehend Medical

Coming to HIMSS? Meet the AWS Healthcare team live at HIMSS19 Booth #5058!

We welcome your questions and comments. We look forward to hearing from you!

About the Author

Dr. Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at Amazon Web Services. He works with ISVs and SIs to architect healthcare solutions on AWS, and bring the best possible experience to their customers. His passion is working at the intersection of science, big data, and software. In his spare time, he’s exploring the outdoors, learning a new thing to cook, or spending time with his wife, son, and his dog, Macaroon.

Extract and visualize clinical entities using Amazon Comprehend Medical

Written on January 22, 2019. Posted in Amazon.

Amazon Comprehend Medical is a new HIPAA-eligible service that uses machine learning (ML) to extract medical information with high accuracy. This reduces the cost, time, and effort of processing large amounts of unstructured medical text. You can extract entities and relationships like medication, diagnosis, and dosage, and you can also extract protected health information (PHI). Using Amazon Comprehend Medical allows end users to get value from raw clinical notes that are otherwise largely unused for analytical purposes because they’re difficult to parse. There is immense value associated with extracting information from these notes and integrating it with other medical systems like an Electronic Health Record (EHR) and a Clinical Trial Management System (CTMS). This allows you to generate a longitudinal view of the patient that takes into account information in raw notes that would otherwise be discarded.

As with all our API-level services, the focus of Amazon Comprehend Medical is ease of use for developers. We provide a pre-trained model that can be invoked using an API call or the console. The results are returned as a structured JSON file that can be parsed and integrated with other structured clinical datasets. To learn more about Amazon Comprehend Medical, see the product documentation.

In this example, I demonstrate how you can use Amazon Comprehend Medical to extract clinical entities and visualize them on a Kibana dashboard. The solution is provided as an AWS CloudFormation template so you can deploy it easily in your own environments.

Solution architecture

The architecture diagram showcases the various components of the solution. Here are the details of each component:

You can use Amazon S3 as a platform to store your raw clinical notes.
Use the Amazon Comprehend Medical API to loop through your notes and extract various clinical entities and relationships from the notes. You can also filter the extracted elements to exclude all protected health information (PHI) from the notes. This is useful for use cases that require non-identifiable elements in a note for downstream analysis.
The extracted entities JSON file is parsed and inserted into an Amazon DynamoDB table. This table can serve as a repository of all clinical entities over time and can be used for downstream integration by developers.
The DynamoDB table has a stream attached to it. This stream is parsed using an AWS Lambda function that is triggered by an event on the stream.
The Lambda function inserts the records into an Amazon Elasticsearch Service (Amazon ES) domain. This domain can be kept up to date with all clinical entity information.
A Kibana dashboard is built on top of Amazon ES to visualize the clinical entities. This will serve as an entry point for end users looking for analytical information and search capabilities on the notes.
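
To make the extraction and storage steps more concrete, here is a minimal sketch of turning a DetectEntities response into DynamoDB-ready items while dropping PHI. It is not the code shipped with this post: the function name entities_to_items and the item attribute names NoteId and EntityId are hypothetical. Scores are converted to Decimal because the boto3 DynamoDB resource does not accept Python floats.

```python
from decimal import Decimal

PHI_CATEGORY = "PROTECTED_HEALTH_INFORMATION"


def entities_to_items(note_id, response, exclude_phi=True):
    """Flatten a DetectEntities response into one item per entity."""
    items = []
    for entity in response.get("Entities", []):
        if exclude_phi and entity["Category"] == PHI_CATEGORY:
            continue  # keep only non-identifiable elements
        items.append({
            "NoteId": note_id,          # hypothetical attribute name
            "EntityId": entity["Id"],   # hypothetical attribute name
            "Text": entity["Text"],
            "Category": entity["Category"],
            "Type": entity["Type"],
            # Convert via str() so the Decimal matches the printed float.
            "Score": Decimal(str(entity["Score"])),
            "Traits": [trait["Name"] for trait in entity.get("Traits", [])],
        })
    return items


# Usage (requires AWS credentials; the table name is a placeholder):
#   import boto3
#   response = boto3.client("comprehendmedical").detect_entities(Text=note_text)
#   table = boto3.resource("dynamodb").Table("your_table_name_here")
#   for item in entities_to_items("file1.txt", response):
#       table.put_item(Item=item)
```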

Instructions for deploying the solution

We will use an AWS CloudFormation stack to deploy the solution. The CloudFormation stack creates the resources needed by the solution. These include:

An S3 bucket
A DynamoDB table
A Lambda function
Necessary AWS Identity and Access Management (IAM) roles

This example uses the us-east-1 (N. Virginia) AWS Region.

Log in to the AWS Management Console with your IAM username and password. Right-click the “Launch Stack” icon below and open it in a new tab.

On the Select Template page, choose Next.

On the next page, enter a name for the stack and choose Next.

On the options page, leave everything as the default and choose Next.

On the Review page, scroll down and select the checkbox “I acknowledge that AWS CloudFormation might create IAM resources with custom names.” Choose Create.

Wait for the stack to complete executing. You can examine various events from the stack creation process in the Events tab. After the stack creation is complete, look at the Resources tab to see all the resources created by the CloudFormation template. Open the Outputs tab to look at the output of the CloudFormation stack.

Setting up Amazon Elasticsearch Service and AWS Lambda

Now we’ll use Amazon Comprehend Medical to extract entities from a collection of clinical notes and visualize them on a Kibana dashboard.

Log into the AWS Management Console and follow these steps to complete this part of the workshop:

Open the Amazon Elasticsearch Service. You should see the domain that you created for this example. Choose it.
On the Overview tab, copy the Endpoint URL and paste it into a text editor. You will use it in step 10 below.
Choose Modify access policy.
On the Select a template dropdown, select Allow or deny access to one or more AWS accounts or IAM users.
In the pop-up window, paste your Account ID in the Account ID or ARN textbox. Choose OK.
An access policy will be generated for you automatically. Review it and choose Submit.
Navigate to the AWS Lambda console.
On the list of functions, select the function created for the workshop.
Scroll down to the section that contains the function code.
On line 13, you will see a variable named host set to the Elasticsearch Host Name. Replace that with the hostname that you copied in step 2. Make sure to put it between single quotes.
Choose Save.
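
The source of the Lambda function is not reproduced in this post. As a rough sketch of what a stream-triggered function has to do before indexing, the following code converts the DynamoDB-typed attributes in each stream record into plain documents. The function names are hypothetical, only a handful of DynamoDB types are handled, and the call that sends each document to the Elasticsearch domain is left out.

```python
def from_dynamodb_json(value):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return float(raw)  # numbers arrive as strings in stream records
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamodb_json(item) for key, item in raw.items()}
    raise ValueError(f"Unsupported DynamoDB type: {type_tag}")


def documents_from_stream_event(event):
    """Return one plain document per inserted or modified item in the event."""
    documents = []
    for record in event.get("Records", []):
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # REMOVE events carry no new image
        image = record["dynamodb"].get("NewImage")
        if image is None:
            continue  # the stream view type may not include new images
        documents.append({key: from_dynamodb_json(val) for key, val in image.items()})
    return documents
```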

Setting up the local environment

You will run a Python program to extract entities from raw notes using Amazon Comprehend Medical and then insert those entities into a DynamoDB table. Before you run this program, you will have to set up your environment. Make sure you have completed the following setup tasks before executing the program:

Have the AWS command line interface (CLI) installed and configured with an identity and access management (IAM) user with Administrator privileges. Click on this link to see the steps to configure your AWS CLI.
Create a folder on your computer and download this zip file into it. Unzip the file.
This will create a new folder, Blog_Code. Navigate to the notes folder inside the Blog_Code folder and open one of the note files to see what the unstructured notes look like. Here is a screenshot of file1.txt.
Go back to the <> directory, open the Python file, and paste the name of your DynamoDB table between the single quotes, replacing “your_table_name_here” on line 7 as shown in the screenshot below:
You are now ready to execute the python program. Execute the program by typing:
```
python Entity_Extraction.py
```
The program will extract entities from the downloaded notes and insert them into DynamoDB. Once completed, you will see the following message:

Visualizing entities in Kibana

On the AWS Management Console, navigate to the DynamoDB console.
Choose Tables on the left navigation pane.
Choose the table name created for you by the CloudFormation stack. You can get the name of the table in the Outputs tab of the CloudFormation stack, in the CloudFormation console.
On the Overview tab, you can see the value Latest stream ARN, which denotes that this DynamoDB table has a stream associated with it.
Choose the Items tab.
You can see the extracted entities from the notes. Each entity has attributes such as Category and Type, along with a confidence score. In addition, we also get a list of attributes and traits associated with the entities.
Now, let’s visualize these entities in Kibana. To do this, we will use an open source proxy called aws-es-kibana. Please follow the steps on the GitHub repo to install the proxy on your computer.
Once installed, run the following command:
```
aws-es-kibana your_Elasticsearch_domain_endpoint 
```
You can find the domain endpoint in the outputs tab of the CloudFormation stack. You should see the following output:
Copy the URL for Kibana and paste it in your browser window. This will open up the Kibana dashboard. In the Index pattern text box, type lambda-index and choose Create.
You will see the field and attribute names in Kibana on which we will build some visualizations.
Kibana provides multiple options to build visualizations that can be integrated into a dashboard. You can experiment with those options in the Visualize and Dashboard links on the left navigation pane of Kibana. To get you started, we have pre-built a dashboard for you as a basic example. Follow these steps to import the dashboard file and visualize the results.
On the left navigation pane, click Management and then click Saved Objects.
Click Import on the top right corner and navigate to the Entity_Dashboard.json file under the folder Blog_Code you downloaded and extracted earlier.
You will see a pop-up message with a question asking if you want to overwrite. Choose Yes, overwrite all.
You will see another pop-up window saying some index patterns do not exist. Make sure lambda-index is selected in the New index pattern drop-down list and click Confirm all changes. You should see a new dashboard called EntityDashboard.
On the left navigation pane, click Dashboard and then on the EntityDashboard Link. You will see the dashboard with visualizations generated from the extracted entities.

There are three visualizations in the dashboard. The top left visualization aggregates the counts of different categories. As you can see, our sample notes had Medical Condition as the most frequent category. The top right visualization is a pie chart capturing the distribution of entity types. The bottom visualization is a term cloud that shows the most common terms extracted from the notes. You can experiment with different visualizations and options to build your own visual dashboards.

Conclusion

In the example in this blog post, you saw how to use Amazon Comprehend Medical to extract clinical entities and visualize them on a Kibana dashboard. We foresee many use cases being enabled by this ability to extract entities. Some examples include:

Patient and Population Health Analytics: Unstructured data is difficult to mine.
Example: A clinical team in the ICU makes over 120 decisions about care per day. How do you keep up?
Revenue Cycle Management: Medical Coding: Process of coding or classifying patient records according to the International Classification of Diseases (ICD) is one of the most complex transactions.
Pharmacovigilance: Multiple avenues of reporting adverse drug reactions or adverse events.
PHI Compliance: Difficult to maintain HIPAA compliance and technical requirements for PHI.
Clinical Trial Management: Identify the right patients for clinical trials quickly.

You can also combine Amazon Comprehend Medical with upstream services like Optical Character Recognition (OCR) systems to extract information from medical forms and pass it to Amazon Comprehend Medical for analysis. For downstream analysis, customers can integrate the output into a clinical data warehouse to improve reporting on Centers for Medicare and Medicaid Services (CMS) quality measures.

Amazon Comprehend Medical also enables you to build machine learning models on raw clinical data in EHR systems for common problems like mortality risk prediction and predicting readmissions. These models are largely built using structured clinical data, and adding attributes from raw clinical notes can improve the results.

There are many possibilities, and we are excited to see how you use Amazon Comprehend Medical for your use cases.

Disclaimer: Please keep in mind the following guidelines and limits for Amazon Comprehend Medical. https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits-med.html

The notes used in this blog post are borrowed from https://www.mtsamples.com/

About the Author

Ujjwal is a Principal Machine Learning Specialist Solution Architect in the Global Healthcare and Lifesciences team at Amazon Web Services. He works on the application of machine learning and deep learning to real world industry problems like medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials and quality of care improvement. He has expertise in scaling machine learning/deep learning algorithms on the AWS cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.

Ensure consistency in data processing code between training and inference in Amazon SageMaker

Written on January 10, 2019. Posted in Amazon.

In this blog post, we’ll introduce Inference Pipelines, a new feature in Amazon SageMaker that enables you to specify a sequence of steps that are executed in order for each inference request. Using this feature, you can reuse the data processing steps applied in training during inference without the need to maintain two separate copies of the same code. This ensures accuracy of your predictions and reduces development overhead. In our example, we’ll pre-process input data for training and inference using transformers in Apache Spark MLlib and train a machine learning model to predict the condition of a car using Amazon SageMaker’s XGBoost algorithm.

Introduction

Data scientists and developers spend a large portion of their time cleaning and preparing data before training machine learning (ML) models. This is because real-world data usually cannot be used directly. There may be missing values, duplicate information, or multiple variations of the same information that need to be standardized. Additionally, data often needs to be transformed from one format to another so it can be used by machine learning algorithms. For example, the XGBoost algorithm can only accept numerical data, so if the input data is in string or categorical format, it needs to be converted to numerical format before it can be used. In other cases, combining multiple input features into a single feature can result in more accurate machine learning models. For example, using a combination of temperature and humidity to predict flight delays can produce more accurate models.

When you deploy machine learning models into production to make predictions on new data (a process called inference), you need to ensure that the same data processing steps that were used in training are also applied to each inference request. Otherwise, you can get incorrect prediction results. Until now, you had to maintain two copies of the same data processing steps for use in training and inference and ensure that they were always in sync. Also, the data processing steps had to be coupled either with the application code making requests to the machine learning models or baked into the inference logic. As a result, development overhead and complexity were higher than they needed to be, and your ability to iterate quickly was limited.

Now, you can reuse the same data processing steps from training during inference by creating an inference pipeline in Amazon SageMaker. You can use an inference pipeline to specify up to five data processing and inference steps. These steps are executed for every prediction request. You can reuse the data processing steps from training, so you only manage one copy of the data processing code, and you can independently update the data processing steps without the need to update your client application or inference logic.
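
To give a feel for how such a pipeline is described to the service, here is a minimal sketch that builds the request for SageMaker’s CreateModel call with an ordered list of containers. The helper name build_pipeline_model_request is hypothetical, the five-step limit is simply the one stated in this post, and the image URIs and model artifact locations are placeholders you would supply.

```python
MAX_PIPELINE_STEPS = 5  # limit stated in this post; check current service limits


def build_pipeline_model_request(model_name, role_arn, steps):
    """Build keyword arguments for CreateModel from ordered pipeline steps.

    `steps` is a list of (image_uri, model_data_url) pairs. Containers run in
    list order for every request: pre-processing, inference, post-processing.
    """
    if not steps:
        raise ValueError("An inference pipeline needs at least one step.")
    if len(steps) > MAX_PIPELINE_STEPS:
        raise ValueError(f"At most {MAX_PIPELINE_STEPS} steps are supported.")

    containers = [
        {"Image": image_uri, "ModelDataUrl": model_data_url}
        for image_uri, model_data_url in steps
    ]
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": containers,
    }


# Usage (requires AWS credentials and real image URIs and S3 locations):
#   import boto3
#   request = build_pipeline_model_request(name, role_arn, steps)
#   boto3.client("sagemaker").create_model(**request)
```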

Amazon SageMaker provides flexibility in how you compose your inference pipelines. For data processing steps, you can use built-in data transformers available in Scikit-Learn and Apache Spark MLlib to process and convert data from one format to another for common use cases, or you can write your own custom transformers. For inference, you can use the built-in machine learning algorithms and frameworks available in Amazon SageMaker, or use your custom trained models. The same inference pipeline can be used for real-time and batch inferences. All steps in the inference pipelines execute on the same instance, so there is minimal latency impact.

Example

In this example, we’ll use Apache Spark MLlib for data processing using AWS Glue and reuse the data processing code during inference. We’ll use the Car Evaluation Data Set from UCI’s Machine Learning Repository. Our goal is to predict the acceptability of a specific car, amongst the values of unacc, acc, good, and vgood. At the core, it is a classification problem, and we will train a machine learning model using Amazon SageMaker’s built-in XGBoost algorithm. However, the dataset only contains six categorical string features (buying, maint, doors, persons, lug_boot, and safety), and XGBoost can only process data that is in numerical format. Therefore, we will pre-process the input data using SparkML StringIndexer followed by OneHotEncoder to convert it to a numerical format. We will also apply a post-processing step on the prediction result using IndexToString to convert our inference output back to the original labels that correspond to the predicted condition of the car.

We’ll write our pre-processing and post-processing scripts once, and apply them for processing training data using AWS Glue. Then, we will serialize and capture these artifacts produced by AWS Glue to Amazon S3 using MLeap, a common serialization format and execution engine for machine learning pipelines. This is so the pre-processing steps can be reused during inference for real-time requests using the SparkML Serving container that Amazon SageMaker provides. Finally, we will deploy the pre-processing, inference, and post-processing steps in an inference pipeline and will execute these steps for each real-time inference request.

The following figure summarizes the steps we will follow:

The following figure shows how the inference pipeline will be deployed on an endpoint for real-time inferences. The same inference pipeline can also be used in batch transform jobs for processing batch requests.

Start a notebook instance and download the notebook

For this example, we will show two complementary workflows within the AWS ecosystem: The first uses the AWS Management Console, and the second uses Boto3 and a Jupyter notebook in an Amazon SageMaker notebook instance. Both workflows will start within Jupyter notebooks to help speed up some of the setup. This will help us place the necessary files in your account’s Amazon S3 bucket and set up the necessary AWS Identity and Access Management (IAM) roles so that Amazon SageMaker and AWS Glue have the necessary access to the data. You can also use the high-level Python SDK for deploying inference pipelines and can refer to this example. If you want to use Scikit-Learn instead of SparkML, you can refer to this example.

Start by going to Amazon SageMaker in the console by selecting Services, and Amazon SageMaker under Machine Learning. While this feature is available in any Region with Amazon SageMaker, for this example, make sure that your Region is set to Oregon in the upper right. We need to make sure that both our Amazon S3 bucket and the services we are using are in the same Region. In the Amazon SageMaker console, under Notebook, choose Notebook instances. Now choose Create notebook instance.

We need to give our new notebook instance a name. Let’s name it processing example. The default instance size will be sufficient for this exercise, as will most of the other settings. However, we still need to create an IAM role for Amazon SageMaker to execute its functions under. Under IAM role, choose Create a new role.

When creating a new IAM role, we can select None under S3 buckets you specify. This is because we are going to create an S3 bucket during this example with sagemaker as part of the name, and the default role will have access to this bucket. Select Create role.

Your notebook instance settings should now look like this:

Choose Create notebook instance.

After a few minutes, your notebook instance will be ready. After its status is set to InService, select the Open Jupyter link.

Once the notebook has been loaded, open the tab labeled SageMaker examples and select the Advanced Functionality header. Choose the folder titled inference_pipeline_sparkml_xgboost_car_evaluation and choose the Use option next to the .ipynb notebook. This will create a copy of the notebook and open it in the Jupyter notebook interface.

Preparing files and roles

Whether you are going to follow our example in the notebook or on the console, there is some initial setup. This is done more conveniently within the notebook. After your AWS environment is properly set up, feel free to follow along either in the notebook or on the console.

First, we need to set up an S3 bucket within your account and upload the necessary files to this bucket. To set up the bucket, we will run the first code block, labeled Setup S3 bucket. To run the cell while the code cell is selected, you can either press Shift and Return at the same time or choose the Run button at the top of the Jupyter notebook.

Make a note of the S3 bucket name that was created here. If you are planning to follow along in the console, you will need this name later.

Now we need to upload the raw data and the AWS Glue processing script to Amazon S3. We can do that by running the code blocks in the notebook labeled Upload files to S3. The first downloads the files to your notebook instance, while the second uploads them to the relevant bucket in S3.

Your S3 bucket is now set up for our example.

Pre-processing using Apache Spark in AWS Glue

If you take a look at the data we downloaded, you’ll notice all of the fields are categorical data in string format, which XGBoost can’t natively handle. To use the Amazon SageMaker XGBoost algorithm, we need to pre-process our data into a series of one hot encoded columns. Apache Spark provides pre-processing pipeline capabilities that we will use.

Furthermore, to make our endpoint particularly useful, we also generate a post-processor in this script, which can convert our label indexes back to their original labels. All of these processor artifacts will be saved to S3 for use in Amazon SageMaker later.

In this example, you download our pre-processor.py script, and we recommend that you take the time to explore how Spark pipelines are handled. Let’s take a look at the relevant part of the code where we define and fit our Spark pipeline:

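    # Imports this excerpt relies on
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, IndexToString

    # Target label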
    catIndexer = StringIndexer(inputCol="cat", outputCol="label")
    
    labelIndexModel = catIndexer.fit(train)
    train = labelIndexModel.transform(train)
    
    converter = IndexToString(inputCol="label", outputCol="cat")

    # Index each categorical feature column (string value -> numeric index).
    # The indexers are fit on the training data when the pipeline is fit below.
    buyingIndexer = StringIndexer(inputCol="buying", outputCol="indexedBuying")
    maintIndexer = StringIndexer(inputCol="maint", outputCol="indexedMaint")
    doorsIndexer = StringIndexer(inputCol="doors", outputCol="indexedDoors")
    personsIndexer = StringIndexer(inputCol="persons", outputCol="indexedPersons")
    lug_bootIndexer = StringIndexer(inputCol="lug_boot", outputCol="indexedLug_boot")
    safetyIndexer = StringIndexer(inputCol="safety", outputCol="indexedSafety")
    

    # One Hot Encoder on indexed features
    buyingEncoder = OneHotEncoder(inputCol="indexedBuying", outputCol="buyingVec")
    maintEncoder = OneHotEncoder(inputCol="indexedMaint", outputCol="maintVec")
    doorsEncoder = OneHotEncoder(inputCol="indexedDoors", outputCol="doorsVec")
    personsEncoder = OneHotEncoder(inputCol="indexedPersons", outputCol="personsVec")
    lug_bootEncoder = OneHotEncoder(inputCol="indexedLug_boot", outputCol="lug_bootVec")
    safetyEncoder = OneHotEncoder(inputCol="indexedSafety", outputCol="safetyVec")

    # Create the vector structured data (label,features(vector))
    assembler = VectorAssembler(inputCols=["buyingVec", "maintVec", "doorsVec", "personsVec", "lug_bootVec", "safetyVec"], outputCol="features")

    # Chain featurizers in a Pipeline
    pipeline = Pipeline(stages=[buyingIndexer, maintIndexer, doorsIndexer, personsIndexer, lug_bootIndexer, safetyIndexer, buyingEncoder, maintEncoder, doorsEncoder, personsEncoder, lug_bootEncoder, safetyEncoder, assembler])

    # Train model.  This also runs the indexers.
    model = pipeline.fit(train)

This snippet defines both our pre-processor and post-processor. The pre-processor converts all the training columns from categorical labels into a vector of one hot encoded columns, while the post-processor converts our label index back to a human-readable string.
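
To make the two roles concrete, here is a minimal pure-Python sketch of what the indexer, the one hot encoder, and the reverse indexer do to a single column. It is not Spark code: the helper names are made up for illustration, and it assumes Spark's default behavior (labels indexed by descending frequency, last category dropped from the one hot vector). The alphabetical tie-breaking is also an assumption.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an index, most frequent label first."""
    counts = Counter(values)
    # Assumption: ties are broken alphabetically.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: idx for idx, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Encode an index as a one hot vector; the last category maps to all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def index_to_string(index, mapping):
    """Reverse lookup, the job of the post-processor."""
    inverse = {idx: label for label, idx in mapping.items()}
    return inverse[int(index)]


# Made-up sample values for the "safety" column.
safety = ["low", "med", "high", "low", "med", "low"]
safety_index = fit_string_index(safety)
encoded = [one_hot(safety_index[v], len(safety_index)) for v in safety]
```

The pre-processor corresponds to `fit_string_index` plus `one_hot` applied to every feature column, and the post-processor corresponds to `index_to_string` applied to the predicted label index.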

Also, it may be helpful to examine the code that allows us to serialize and store our Spark pipeline artifacts in the MLeap format. Because the Spark framework was designed around batch use cases, we need to use MLeap here. MLeap serializes Spark ML Pipelines and provides a runtime for deploying them in real-time, low-latency use cases. Amazon SageMaker has launched a SparkML Serving container that uses MLeap to make these pipelines easy to use for inference. Let’s look at the following code:

    # Serialize and store via MLeap  
    SimpleSparkSerializer().serializeToBundle(model, "jar:file:/tmp/model.zip", predictions)
    
    # Unzipping as SageMaker expects a .tar.gz file but MLeap produces a .zip file.
    import zipfile
    with zipfile.ZipFile("/tmp/model.zip") as zf:
        zf.extractall("/tmp/model")

    # Writing back the content as a .tar.gz file
    import tarfile
    with tarfile.open("/tmp/model.tar.gz", "w:gz") as tar:
        tar.add("/tmp/model/bundle.json", arcname='bundle.json')
        tar.add("/tmp/model/root", arcname='root')

    s3 = boto3.resource('s3')
    file_name = args['s3_model_bucket_prefix'] + '/' + 'model.tar.gz'
    s3.Bucket(args['s3_model_bucket']).upload_file('/tmp/model.tar.gz', file_name)

    os.remove('/tmp/model.zip')
    os.remove('/tmp/model.tar.gz')
    shutil.rmtree('/tmp/model')
    
    # Save postprocessor
    SimpleSparkSerializer().serializeToBundle(converter, "jar:file:/tmp/postprocess.zip", predictions)

    with zipfile.ZipFile("/tmp/postprocess.zip") as zf:
        zf.extractall("/tmp/postprocess")

    # Writing back the content as a .tar.gz file
    import tarfile
    with tarfile.open("/tmp/postprocess.tar.gz", "w:gz") as tar:
        tar.add("/tmp/postprocess/bundle.json", arcname='bundle.json')
        tar.add("/tmp/postprocess/root", arcname='root')

    file_name = args['s3_model_bucket_prefix'] + '/' + 'postprocess.tar.gz'
    s3.Bucket(args['s3_model_bucket']).upload_file('/tmp/postprocess.tar.gz', file_name)

    os.remove('/tmp/postprocess.zip')
    os.remove('/tmp/postprocess.tar.gz')
    shutil.rmtree('/tmp/postprocess')

You’ll notice that we unzip this archive and re-archive it into a tar.gz file that Amazon SageMaker recognizes.
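
Because the same unzip and re-archive steps are repeated for the model and for the post-processor, they could be factored into one helper. The following is a sketch using only the Python standard library; the function name `zip_to_tar_gz` is hypothetical and is not part of the original script.

```python
import os
import tarfile
import tempfile
import zipfile


def zip_to_tar_gz(zip_path, tar_path):
    """Re-archive the contents of a .zip file as a gzip-compressed tarball."""
    with tempfile.TemporaryDirectory() as workdir:
        # Unpack the MLeap bundle into a scratch directory.
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(workdir)
        # Add every top-level entry (for example bundle.json and root)
        # at the root of the tarball.
        with tarfile.open(tar_path, "w:gz") as tar:
            for entry in sorted(os.listdir(workdir)):
                tar.add(os.path.join(workdir, entry), arcname=entry)
    return tar_path
```

With such a helper, each artifact would need a single call, for example `zip_to_tar_gz("/tmp/model.zip", "/tmp/model.tar.gz")`, before the upload to Amazon S3.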

To run our Spark pipelines in Amazon SageMaker, we are going to utilize our notebook instance. In the Amazon SageMaker notebook, you can run the cell labeled Create and run AWS Glue Preprocessing Job, which looks like this:

### Create and run AWS Glue Preprocessing Job

# Define the Job in AWS Glue
glue = boto3.client('glue')

try:
    glue.get_job(JobName='preprocessing-cars')
    print("Job already exists, continuing...")
except glue.exceptions.EntityNotFoundException:
    response = glue.create_job(
        Name='preprocessing-cars',
        Role=role,
        Command={
            'Name': 'glueetl',
            'ScriptLocation': 's3://{}/scripts/preprocessor.py'.format(bucket_name)
        },
        DefaultArguments={
            '--s3_input_data_location': 's3://{}/data/car.data'.format(bucket_name),
            '--s3_model_bucket_prefix': 'model',
            '--s3_model_bucket': bucket_name,
            '--s3_output_bucket': bucket_name,
            '--s3_output_bucket_prefix': 'output',
            '--extra-py-files': 's3://{}/scripts/python.zip'.format(bucket_name),
            '--extra-jars': 's3://{}/scripts/mleap_spark_assembly.jar'.format(bucket_name)
        }
    )

    print('{}\n'.format(response))

# Run the job in AWS Glue
try:
    job_name='preprocessing-cars'
    response = glue.start_job_run(JobName=job_name)
    job_run_id = response['JobRunId']
    print('{}\n'.format(response))
except glue.exceptions.ConcurrentRunsExceededException:
    print("Job run already in progress, continuing...")

    
# Check on the job status
import time

job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
while job_run_status not in ('FAILED', 'SUCCEEDED', 'STOPPED'):
    job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
    print (job_run_status)
    time.sleep(30)

This cell will define the job in AWS Glue, run the job, and monitor the status until the job has completed.
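
The monitoring loop in this cell runs until the job reaches a final state, with no upper bound on the wait. A possible variation, sketched below, wraps the same polling pattern in a helper with a timeout. The name `wait_for_status` and its parameters are assumptions for illustration and do not come from the notebook.

```python
import time


def wait_for_status(fetch_status, terminal_states, poll_seconds=30,
                    timeout_seconds=3600, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or the timeout passes."""
    waited = 0
    status = fetch_status()
    while status not in terminal_states:
        if waited >= timeout_seconds:
            raise TimeoutError('Still {} after {} seconds'.format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds
        status = fetch_status()
    return status


# Possible use with the Glue job above (not executed here):
# wait_for_status(
#     lambda: glue.get_job_run(JobName=job_name, RunId=job_run_id)['JobRun']['JobRunState'],
#     ('FAILED', 'SUCCEEDED', 'STOPPED'))
```

The same helper would fit the training job and endpoint status loops later in the notebook, since they follow the same pattern with different status calls.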

In summary, we have now pre-processed our data into a training and validation set, with one hot encoding for all of the string values. We have also serialized a pre-processor and post-processor into the MLeap format so that we can reuse these pipelines in our endpoint later. The next step is to train a machine learning model. We will be using the Amazon SageMaker built-in XGBoost for this.

Training an Amazon SageMaker XGBoost model

Now that we have our data pre-processed in a format that XGBoost recognizes, we can run a simple training job to train a classifier model on our data. We can do this from the console with the following settings: Set the Job name to xgboost-cars (you may need to append unique characters to this if you’ve run an identical job name previously). Select the IAM role you created above for your Notebook instance. For Algorithm source, choose Amazon SageMaker built-in algorithm, and under Algorithm choose XGBoost.

Under Hyperparameters set early_stopping_rounds to 5, num_round to 10, change the objective to multi:softmax, num_class to 4, and eval_metric to mlogloss. This will configure XGBoost to run a classification model that works with the data that was pre-processed in AWS Glue.
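
For reference, mlogloss is the multiclass logarithmic loss: the mean negative log of the probability the model assigns to the true class. The following is a small sketch of that formula in plain Python, not the XGBoost implementation. The clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import math


def mlogloss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log probability assigned to the true class of each row."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability so that log() is always defined.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# A model that guesses uniformly over num_class = 4 classes scores log(4).
uniform_loss = mlogloss([0, 3], [[0.25] * 4, [0.25] * 4])
```

Lower values are better, so a validation mlogloss well below log(4), roughly 1.386, indicates the model is doing better than uniform guessing across the four classes.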

For the Input data configuration, leave the Channel name as train, for Content type put csv, Compression type as None, Record wrapper as None, S3 data type as S3Prefix, and S3 data distribution type as FullyReplicated. Finally, your S3 location should be s3://<your-bucket-name>/output/train .

Select Add channel, and repeat this input for the validation set. Set the Channel name as validation, for Content type put csv, Compression type as None, Record wrapper as None, S3 data type as S3Prefix, and S3 data distribution type as FullyReplicated. Finally, your S3 location should be s3://<your-bucket-name>/output/validation .

Finally, for the Output data configuration, set the S3 output path to s3://<your-bucket-name>/xgb.

Choose Create training job.

Alternatively, we can run this entire process in our Jupyter notebook. Run the following cell, labeled Run Amazon SageMaker XGBoost Training Job:

### Run Amazon SageMaker XGBoost Training Job

from sagemaker.amazon.amazon_estimator import get_image_uri

import random
import string

# Get XGBoost container image for current region
training_image = get_image_uri(region, 'xgboost', repo_version="latest")

# Create a unique training job name
training_job_name = 'xgboost-cars-'+''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(8))

# Create the training job in Amazon SageMaker
sagemaker = boto3.client('sagemaker')
response = sagemaker.create_training_job(
    TrainingJobName=training_job_name,
    HyperParameters={
        'early_stopping_rounds': '5',
        'num_round': '10',
        'objective': 'multi:softmax',
        'num_class': '4',
        'eval_metric': 'mlogloss'

    },
    AlgorithmSpecification={
        'TrainingImage': training_image,
        'TrainingInputMode': 'File',
    },
    RoleArn=role,
    InputDataConfig=[
        {
            'ChannelName': 'train',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://{}/output/train'.format(bucket_name),
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'ContentType': 'text/csv',
            'CompressionType': 'None',
            'RecordWrapperType': 'None',
            'InputMode': 'File'
        },
        {
            'ChannelName': 'validation',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://{}/output/validation'.format(bucket_name),
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'ContentType': 'text/csv',
            'CompressionType': 'None',
            'RecordWrapperType': 'None',
            'InputMode': 'File'
        },
    ],
    OutputDataConfig={
        'S3OutputPath': 's3://{}/xgb'.format(bucket_name)
    },
    ResourceConfig={
        'InstanceType': 'ml.m4.xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 1
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    },)

print('{}\n'.format(response))

# Monitor the status until completed
job_run_status = sagemaker.describe_training_job(TrainingJobName=training_job_name)['TrainingJobStatus']
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    job_run_status = sagemaker.describe_training_job(TrainingJobName=training_job_name)['TrainingJobStatus']
    print (job_run_status)
    time.sleep(30)

This will run our XGBoost training job in Amazon SageMaker, and monitor the progress of the job. Once the job status is ‘Completed,’ you can move on to the next cell.

This will train the model on the preprocessed data we created earlier. After a few minutes, usually less than 5, the job should be completed successfully, and it should output our model artifacts to the S3 location we specified. After this is done, we can deploy an inference pipeline that consists of pre-processing, inference, and post-processing steps.

Deploying an Amazon SageMaker endpoint using your pre-processing artifacts

Now that we have a set of model artifacts, we can set up an inference pipeline that executes sequentially in Amazon SageMaker. We start by setting up a model, which will point to all of our model artifacts, then we set up an endpoint configuration to specify our hardware, and finally we can stand up an endpoint. With this endpoint, we will pass the raw data and no longer need to write pre-processing logic in our application code. The same pre-processing steps that ran for training can be applied to inference input data for better consistency and ease of management.

From the Amazon SageMaker console, choose Models under Inference on the left. Choose Create model. This will bring you to the model settings. For the Model name, enter pipeline-xgboost. For the IAM role, select the SageMaker execution role you created earlier for your Notebook instance.

For Container definition 1, under Container input options, choose Provide model artifacts and inference image location. Under Location of inference image enter 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2. This is the SparkML serving image provided by Amazon SageMaker. The full list of SparkML images provided for every region is published at https://github.com/aws/sagemaker-sparkml-serving-container. Under Location of model artifacts, enter s3://<your-bucket-name>/model/model.tar.gz. These are the pre-processor artifacts created by the AWS Glue job we ran earlier.

Next, we need to define a schema for our SparkML serving container via an Environment variable. For the Key enter SAGEMAKER_SPARKML_SCHEMA, and for Value enter:

{"input":[{"type":"string","name":"buying"},{"type":"string","name":"maint"},{"type":"string","name":"doors"},{"type":"string","name":"persons"},{"type":"string","name":"lug_boot"},{"type":"string","name":"safety"}],"output":{"type":"double","name":"features","struct":"vector"}}

Select Add container.

For Container definition 2, under Container input options, select Provide model artifacts and inference image location.

Under Location of inference image enter 433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest. This is the XGBoost serving container provided by Amazon SageMaker. Under Location of model artifacts, enter s3://<your-bucket-name>/xgb/<your-training-job-name>/output/model.tar.gz, where the training job name is the one you used earlier (for example, xgboost-cars). This archive contains the serialized XGBoost model artifacts from our earlier training job.

No Environment variables are needed for Container definition 2.

Choose Add container.

Finally, for Container definition 3, under Container input options, select Provide model artifacts and inference image location. Under Location of inference image enter 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2. This is the same SparkML serving image provided by Amazon SageMaker that we used for Container definition 1. Under Location of model artifacts, enter s3://<your-bucket-name>/model/postprocess.tar.gz. This is the reverse indexer that allows us to go from the indexed value output by XGBoost back to the original label.

Next we need to define a schema for our SparkML serving container using an Environment variable. For the Key enter SAGEMAKER_SPARKML_SCHEMA, and for Value enter:

{"input": [{"type": "double", "name": "label"}], "output": {"type": "string", "name": "cat"}}

After all three container definitions are in place, choose Create model.
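
Typing the two SAGEMAKER_SPARKML_SCHEMA values by hand is error-prone. One option is to generate them from the column list, as in the following sketch. The helper names are hypothetical; the column names and types are the ones used in the schemas above.

```python
import json

FEATURE_COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]


def preprocess_schema(columns):
    """Schema for the pre-processing container: string inputs, vector output."""
    return json.dumps({
        "input": [{"type": "string", "name": name} for name in columns],
        "output": {"type": "double", "name": "features", "struct": "vector"},
    })


def postprocess_schema():
    """Schema for the post-processing container: label index in, label string out."""
    return json.dumps({
        "input": [{"type": "double", "name": "label"}],
        "output": {"type": "string", "name": "cat"},
    })


print(preprocess_schema(FEATURE_COLUMNS))
print(postprocess_schema())
```

The printed strings can be pasted into the Value field in the console, or passed as the environment variable value when creating the model from code.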

You can now find your models underneath Inference, Models in the Amazon SageMaker console. Select the pipeline-xgboost model from the list to bring up the model details. Now choose the Create endpoint button.

Under Endpoint, Endpoint name, input pipeline-xgboost.

Under New endpoint configuration provide the Endpoint configuration name of pipeline-xgboost. Choose Create endpoint configuration.

Finally, choose Create endpoint at the bottom.

Alternatively, all of these steps can be run in the notebook by running the cell labeled Create SageMaker endpoint with pipeline:

### Create SageMaker endpoint with pipeline
from botocore.exceptions import ClientError

# Image locations are published at: https://github.com/aws/sagemaker-sparkml-serving-container
sparkml_images = {
    'us-west-1': '746614075791.dkr.ecr.us-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'us-west-2': '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'us-east-1': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'us-east-2': '257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ap-northeast-1': '354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ap-northeast-2': '366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ap-southeast-1': '121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ap-southeast-2': '783357654285.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ap-south-1': '720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'eu-west-1': '141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'eu-west-2': '764974769150.dkr.ecr.eu-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'eu-central-1': '492215442770.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'ca-central-1': '341280168497.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2',
    'us-gov-west-1': '414596584902.dkr.ecr.us-gov-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2'
}



try:
    sparkml_image = sparkml_images[region]

    response = sagemaker.create_model(
        ModelName='pipeline-xgboost',
        Containers=[
            {
                'Image': sparkml_image,
                'ModelDataUrl': 's3://{}/model/model.tar.gz'.format(bucket_name),
                'Environment': {
                    'SAGEMAKER_SPARKML_SCHEMA': '{"input":[{"type":"string","name":"buying"},{"type":"string","name":"maint"},{"type":"string","name":"doors"},{"type":"string","name":"persons"},{"type":"string","name":"lug_boot"},{"type":"string","name":"safety"}],"output":{"type":"double","name":"features","struct":"vector"}}'
                }
            },
            {
                'Image': training_image,
                'ModelDataUrl': 's3://{}/xgb/{}/output/model.tar.gz'.format(bucket_name, training_job_name)
            },
            {
                'Image': sparkml_image,
                'ModelDataUrl': 's3://{}/model/postprocess.tar.gz'.format(bucket_name),
                'Environment': {
                    'SAGEMAKER_SPARKML_SCHEMA': '{"input": [{"type": "double", "name": "label"}], "output": {"type": "string", "name": "cat"}}'
                }

            },
        ],
        ExecutionRoleArn=role
    )

    print('{}\n'.format(response))
    
except ClientError:
    print('Model already exists, continuing...')


try:
    response = sagemaker.create_endpoint_config(
        EndpointConfigName='pipeline-xgboost',
        ProductionVariants=[
            {
                'VariantName': 'DefaultVariant',
                'ModelName': 'pipeline-xgboost',
                'InitialInstanceCount': 1,
                'InstanceType': 'ml.m4.xlarge',
            },
        ],
    )
    print('{}\n'.format(response))

except ClientError:
    print('Endpoint config already exists, continuing...')


try:
    response = sagemaker.create_endpoint(
        EndpointName='pipeline-xgboost',
        EndpointConfigName='pipeline-xgboost',
    )
    print('{}\n'.format(response))

except ClientError:
    print("Endpoint already exists, continuing...")


# Monitor the status until completed
endpoint_status = sagemaker.describe_endpoint(EndpointName='pipeline-xgboost')['EndpointStatus']
while endpoint_status not in ('OutOfService','InService','Failed'):
    endpoint_status = sagemaker.describe_endpoint(EndpointName='pipeline-xgboost')['EndpointStatus']
    print(endpoint_status)
    time.sleep(30)

After a few minutes, Amazon SageMaker creates an endpoint using all three of the provided containers on a single instance. When the endpoint is invoked with a payload, the output of the earlier containers is passed as the input to the later containers, until the payload reaches its final output.

In this example, the raw string categories are sent to our preprocessing MLeap container and run through a Spark pipeline to one hot encode the features. Then the one hot encoded data is sent to our XGBoost container, where our model makes a prediction to an index. The index is then fed to our post-processing MLeap container, with a Spark model artifact, which converts the index back to its original label string, which is returned to the client. These are the same steps you used for preprocessing training data, and it was only necessary to write the code once.
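
Conceptually, the endpoint behaves like a chain of functions in which each container's output becomes the next container's input. The sketch below illustrates only that chaining idea. The three stage functions are simplified stand-ins with made-up logic, not the real containers, and the label order in `postprocess` is hypothetical.

```python
def run_pipeline(payload, stages):
    """Pass the payload through each stage in order, like containers in a pipeline."""
    for stage in stages:
        payload = stage(payload)
    return payload


def preprocess(csv_row):
    # Stand-in for the SparkML pre-processing container.
    return csv_row.split(",")


def predict(features):
    # Stand-in for the XGBoost container: returns a label index (made-up rule).
    return 3.0 if features[-1] == "high" else 0.0


def postprocess(label_index):
    # Stand-in for the reverse indexer; this label order is hypothetical.
    return ["unacc", "acc", "good", "vgood"][int(label_index)]


result = run_pipeline("low,low,5more,more,big,high", [preprocess, predict, postprocess])
```

The client only sees the first input and the last output, which is why the application code no longer needs its own pre-processing or label decoding.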

Testing the endpoint, monitoring, and metrics

After the Amazon SageMaker endpoint is InService, we can test it by calling the invoke-endpoint command from the AWS CLI. For example, we can use the following command:

aws sagemaker-runtime invoke-endpoint --endpoint-name pipeline-xgboost --content-type text/csv --body low,low,5more,more,big,high out

If successful, you should see a message like this:

{
    "ContentType": "text/csv",
    "InvokedProductionVariant": "default-variant-name"
}

The output of the invocation appears in the file out, and you can see it with the following command:

cat out

If successful, this should return one of the following values: unacc, acc, good, vgood.

Alternatively, this can be done in the notebook by running the cell labeled Invoke the Endpoint:

### Invoke the Endpoint
client = boto3.client('sagemaker-runtime')

sample_payload=b'low,low,5more,more,big,high'

response = client.invoke_endpoint(
    EndpointName='pipeline-xgboost',
    Body=sample_payload,
    ContentType='text/csv'
)

print('Our result for this payload is: {}'.format(response['Body'].read().decode('ascii')))
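
The endpoint expects the values in the same column order as the schema given to the pre-processing container. If your application holds records as dictionaries, a small helper like the following sketch can build the payload in that order. The function name `build_payload` is hypothetical, and the sketch assumes the values contain no commas.

```python
FEATURE_COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]


def build_payload(record, columns=FEATURE_COLUMNS):
    """Serialize a record as one CSV row in schema order."""
    missing = [name for name in columns if name not in record]
    if missing:
        raise KeyError('Missing columns: {}'.format(', '.join(missing)))
    # Assumption: values are plain category strings without commas.
    return ",".join(str(record[name]) for name in columns).encode("ascii")


sample_record = {
    "safety": "high", "buying": "low", "maint": "low",
    "doors": "5more", "persons": "more", "lug_boot": "big",
}
payload = build_payload(sample_record)
```

The resulting bytes can be passed as the Body argument of invoke_endpoint in place of the hard-coded sample payload.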

Metrics for your inference pipelines

When building your deployments, you may find you need to monitor or debug your endpoint, and the new inference pipelines change how the logs appear in Amazon CloudWatch. You can now see logs and metrics for each of your containers within a single endpoint. To see these logs, return to the AWS Management Console, and go to Services, Amazon SageMaker, Inference, and then Endpoints. Locate your pipeline-xgboost endpoint in the list, and select it by the name to see the endpoint details.

Locate the Monitor section, and you will find a View logs link. Select it, and you will be taken to a CloudWatch Logs interface. For our example endpoint, there are three sets of log streams, one for each container.

If an invocation fails, the error output appears in the log stream of the container that raised it. Whatever each container writes to stdout ends up in its log stream.
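
The log streams can also be listed from the notebook. The following sketch assumes the endpoint logs live in the CloudWatch log group /aws/sagemaker/Endpoints/<endpoint-name> and takes a CloudWatch Logs client as a parameter. The helper names are made up for illustration, and only the first page of results is read.

```python
def endpoint_log_group(endpoint_name):
    """CloudWatch log group used for a SageMaker endpoint (assumed naming)."""
    return '/aws/sagemaker/Endpoints/{}'.format(endpoint_name)


def list_log_streams(logs_client, endpoint_name):
    """Return the log stream names for the endpoint (first page only)."""
    response = logs_client.describe_log_streams(
        logGroupName=endpoint_log_group(endpoint_name)
    )
    return [stream['logStreamName'] for stream in response['logStreams']]


# Possible use in the notebook (not executed here):
# import boto3
# for name in list_log_streams(boto3.client('logs'), 'pipeline-xgboost'):
#     print(name)
```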

Cleaning up your AWS environment

When you are done with this experiment, make sure to delete your Amazon SageMaker endpoint to avoid incurring unexpected costs. You can do this from the console by going to Services, Amazon SageMaker, Inference, and Endpoints. Choose pipeline-xgboost under Endpoints. In the upper-right, choose Delete. This will remove the endpoint from your AWS account. You will also want to make sure to stop your Notebook instance.

A more extensive cleanup can be done from your Notebook instance by running the code cell labeled Environment cleanup, as follows:

### Environment cleanup

print('Deleting SageMaker endpoint...')
result = sagemaker.delete_endpoint(
    EndpointName='pipeline-xgboost'
)
print(result)

print('Deleting SageMaker endpoint config...')
result = sagemaker.delete_endpoint_config(
    EndpointConfigName='pipeline-xgboost'
)
print(result)

print('Deleting SageMaker model...')
result = sagemaker.delete_model(
    ModelName='pipeline-xgboost'
)
print(result)

print('Deleting Glue job...')
result = glue.delete_job(
    JobName='preprocessing-cars'
)
print(result)
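
If one of these resources was already deleted, the corresponding call raises an error and the remaining steps are skipped. A more forgiving variant, sketched below, runs every step and collects the failures. The helper `run_cleanup` is hypothetical and deliberately catches any exception so that cleanup continues.

```python
def run_cleanup(steps):
    """Run each (description, action) pair; keep going when a step fails."""
    failures = []
    for description, action in steps:
        print('Deleting {}...'.format(description))
        try:
            print(action())
        except Exception as error:  # broad on purpose: cleanup should continue
            failures.append((description, error))
            print('Skipped {}: {}'.format(description, error))
    return failures


# Possible use with the clients from this notebook (not executed here):
# run_cleanup([
#     ('SageMaker endpoint', lambda: sagemaker.delete_endpoint(EndpointName='pipeline-xgboost')),
#     ('SageMaker endpoint config', lambda: sagemaker.delete_endpoint_config(EndpointConfigName='pipeline-xgboost')),
#     ('SageMaker model', lambda: sagemaker.delete_model(ModelName='pipeline-xgboost')),
#     ('Glue job', lambda: glue.delete_job(JobName='preprocessing-cars')),
# ])
```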

Conclusion

Congratulations! You have now learned how to do pre-processing and post-processing using Apache Spark in AWS Glue as part of your Amazon SageMaker ML workflow. You can now deploy a sequence of two to five data processing and inference steps that are executed on each inference request in Amazon SageMaker. With this new feature, you can write your pre-processing code once, and use it for both training and inference (real-time or batch). This will improve consistency between the training and deployment of your ML models. Furthermore, with the new SparkML Serving container provided by Amazon SageMaker, you can make use of Spark pipelines for real-time data. Feel free to adapt this process to different data sets or different models.

Citations

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

About the Authors

Thomas Hughes is a Data Scientist with AWS Professional Services. He has a PhD from UC Santa Barbara and has tackled problems in the social sciences, education, and advertising fields. He is currently working to solve some of the trickiest problems that arise when machine learning meets big data.

Urvashi Chowdhary is a Senior Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.

How simpleshow uses Amazon Polly to voice stories in their explainer videos

Written on January 10, 2019. Posted in Amazon.

More than ten years ago, simpleshow started to help their customers explain materials, ideas, and products by using three-minute animated explainer videos. These explainer videos use two hands and simple, black and white illustration to lead viewers through a story. Today, the company also provides mysimpleshow.com, a platform that allows anyone to produce high-quality explainer videos about virtually any topic. This platform is integrated with Amazon Polly, so anyone can use natural sounding voices for explainer videos, as long as transcripts are provided.

First I’ll tell you a bit more about simpleshow, and then I’ll show you how mysimpleshow is integrated with Amazon Polly.

Over the past ten years, simpleshow has scientifically proven the effectiveness of the explainer video format. simpleshow experts have helped customers present their topics in a simple and entertaining way in thousands of explainer videos.

The production of these videos requires many talents in the team:

Storytelling: Certified simpleshow concept writers create stories around basic facts.
Illustration: Talented artists illustrate objects and concepts at the right abstraction level.
Visualization: Storyboard artists and motion designers visualize the stories and animate them.
Voice: A network of professional speakers ensures the right tone.

The simpleshow team realized that explainer videos are a very versatile format, so they wanted to make the resource available to even more users in even more subject areas. Therefore, the simpleshow team created mysimpleshow.com, a platform that allows anyone to produce high-quality explainer videos about virtually any topic. mysimpleshow uses artificial intelligence (AI) and has an easy-to-use user interface.

The process at mysimpleshow is very simple:

First, users write their story. mysimpleshow provides guidance with templates and inspiration with sample stories that cover a broad selection of topics.
The text of the story is then analyzed by the artificial intelligence at the core of mysimpleshow, the Explainer Engine. The Explainer Engine uses natural language processing (NLP) to identify meaningful keywords, people, and places. Using Wikidata, the knowledge base behind Wikipedia, keyword terms are then generalized. For example, if the name of a tennis player or basketball player is present in the story, the Explainer Engine uses Wikidata to identify the profession of the person. As a result, a tennis racket or a basketball is suggested as a suitable illustration.

This means that even if an illustration hasn’t been created for the person in the story, the story is visualized in a highly fitting way. For location names in the story, the Explainer Engine finds out the number of inhabitants and offers a suitable skyline as an illustration.
At the click of a button, the Explainer Engine searches for the right image in the simpleshow database of all illustrations. All illustrations are tagged using a multi-tiered system.
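
The generalization step can be pictured as a fallback lookup: use an illustration tagged for the keyword itself if one exists, otherwise generalize the keyword and look again. The following toy sketch illustrates that idea only. Every name, table, and threshold in it is hypothetical; it does not query Wikidata and is not the Explainer Engine's implementation.

```python
# Hypothetical stand-ins for Wikidata lookups and the illustration database.
OCCUPATIONS = {"Alice Example": "tennis player", "Bob Example": "basketball player"}
OCCUPATION_ILLUSTRATIONS = {"tennis player": "tennis racket", "basketball player": "basketball"}
# (minimum inhabitants, illustration), largest first; thresholds are made up.
SKYLINES = [(1000000, "big city skyline"), (100000, "town skyline"), (0, "village skyline")]


def illustration_for_person(name, tagged_illustrations):
    """Prefer a dedicated illustration, else fall back to the person's profession."""
    if name in tagged_illustrations:
        return tagged_illustrations[name]
    occupation = OCCUPATIONS.get(name)
    return OCCUPATION_ILLUSTRATIONS.get(occupation)


def skyline_for_population(inhabitants):
    """Pick the first skyline whose population threshold is met."""
    for minimum, skyline in SKYLINES:
        if inhabitants >= minimum:
            return skyline
    return None


suggestion = illustration_for_person("Alice Example", tagged_illustrations={})
```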

simpleshow teams up with Amazon Polly

Spoken words are crucial for the transfer of knowledge in explainer videos. Most of the information is transmitted by voice. The illustrations and animations draw the user’s attention and support the storytelling for better understanding. As a result, the multisensory content is better retained than just a voice or animation alone.

mysimpleshow supports its users with another important component of an explainer video—it provides a computer-generated voice that reads the users’ stories. For reading the story, mysimpleshow uses Amazon Polly.

Why simpleshow uses Amazon Polly

simpleshow uses Amazon Polly as the automated voice for explainer videos for several reasons.

mysimpleshow is an AWS-based software as a service (SaaS), making extensive use of AWS Elastic Beanstalk, Amazon DynamoDB, Amazon Simple Workflow Service (SWF), Amazon Simple Queue Service (SQS), and other AWS services. The integration of mysimpleshow and Amazon Polly was straightforward.
With Amazon Polly, simpleshow was able to optimize the costs for text-to-speech. The team was able to significantly simplify maintenance and operations and improve scalability.
Amazon Polly supports many languages. mysimpleshow already exists in English and German. Amazon Polly voices are available for possible extension to many other languages.
The Amazon Polly voices are of high quality.
Amazon Polly allows customized pronunciation of words.

All of these reasons are important, but the last one especially stood out. The Amazon Polly pronunciation lexicons enable mysimpleshow to customize the pronunciation of words.

For example, in the German language, new words are often formed by putting together existing words. Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz (literally “beef labeling supervision duties delegation law”)[1] is a very long German compound word. Amazon Polly can handle these linguistic compounds very well on its own. However, in German, this type of word formation is also extended to words imported from other languages. For example, the name of the product—mysimpleshow— is a combination of the words my-simple-show. German-speaking users are advised to divide compounds derived from English words into the individual words. This usually significantly improves the pronunciation of these words.
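
Amazon Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format, where an alias entry replaces a written word with the text that should be spoken. The following sketch generates such a lexicon for the mysimpleshow example. The helper name and the chosen alias are assumptions for illustration, not simpleshow's actual lexicon.

```python
from xml.sax.saxutils import escape

PLS_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<lexicon version="1.0"\n'
    '      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
    '      alphabet="ipa" xml:lang="{lang}">\n'
    '{lexemes}'
    '</lexicon>\n'
)


def build_lexicon(aliases, lang="de-DE"):
    """Build a PLS document that maps each written word to a spoken alias."""
    lexemes = "".join(
        '  <lexeme>\n'
        '    <grapheme>{}</grapheme>\n'
        '    <alias>{}</alias>\n'
        '  </lexeme>\n'.format(escape(word), escape(spoken))
        for word, spoken in sorted(aliases.items())
    )
    return PLS_TEMPLATE.format(lang=lang, lexemes=lexemes)


lexicon_xml = build_lexicon({"mysimpleshow": "my simple show"})

# Possible upload with boto3 (not executed here; the lexicon name is made up):
# boto3.client('polly').put_lexicon(Name='simpleshowDe', Content=lexicon_xml)
```

Once a lexicon is stored, its name can be passed with each synthesis request so that the alias is applied whenever the word appears in the text.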

Some code examples

The following code samples illustrate how mysimpleshow uses Amazon Polly.

mysimpleshow uses Speech Synthesis Markup Language (SSML) for requests to Amazon Polly. SSML gives the highest level of control over how the voices are rendered. In addition, the SSML representation is very useful for debugging purposes.

SSML

<speak><prosody volume="+20dB" rate="100%"><break time="500.0ms"/>This is Tom. He wants to buy a used car. So he starts browsing the internet.</prosody></speak>
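
When story text comes from users, characters such as & or < have to be escaped before they are placed inside SSML. The sketch below builds a document of the same shape as the sample above. It is written in Python for brevity, and the function name and default values are assumptions rather than mysimpleshow code.

```python
from xml.sax.saxutils import escape


def build_ssml(text, volume="+20dB", rate="100%", lead_in_ms=500.0):
    """Wrap story text in speak/prosody tags with a short pause at the start."""
    return (
        '<speak><prosody volume="{volume}" rate="{rate}">'
        '<break time="{pause}ms"/>{text}</prosody></speak>'
    ).format(volume=volume, rate=rate, pause=lead_in_ms, text=escape(text))


ssml = build_ssml(
    "This is Tom. He wants to buy a used car. So he starts browsing the internet."
)
```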

In the first step, timings for the spoken words are requested from Amazon Polly. Timings for the keywords define when the illustration related to the keyword is placed in the scene. The spoken words basically define the timeline of the video. This is somewhat specific to Amazon Polly. Other TTS services may provide MP3 and timings together.

TTS Call Timings

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.TextType;
import com.mysimpleshow.backend.common.enums.Voice;
import com.mysimpleshow.backend.common.service.DictionaryService;

class TTSService {

    private DictionaryService dictionaryService;

    private AmazonPolly polly;

    public SynthesizeSpeechResult synthesizePolly(final String text, final Voice voice) {
        final SynthesizeSpeechRequest request = new SynthesizeSpeechRequest()
                .withOutputFormat(OutputFormat.Json)
                .withText(text)
                .withTextType(TextType.Ssml)
                .withVoiceId(voice.getVoiceId())
                .withLexiconNames(dictionaryService.getDictionaryNameForLocale(voice.getLocale()));

        return polly.synthesizeSpeech(request);
    }
}
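
The JSON output of this call is a stream of speech marks, one JSON object per line. Assuming word-level speech marks were requested, the keyword timings can be extracted as in the following Python sketch. The function name and the sample marks are made up for illustration.

```python
import json


def keyword_timings(speech_marks_text, keywords):
    """Return the first time offset (in ms) at which each keyword is spoken."""
    wanted = {keyword.lower() for keyword in keywords}
    timings = {}
    for line in speech_marks_text.splitlines():
        if not line.strip():
            continue
        mark = json.loads(line)
        word = mark.get("value", "").lower()
        if mark.get("type") == "word" and word in wanted and word not in timings:
            timings[word] = mark["time"]
    return timings


# Made-up sample in the line-delimited speech mark format.
sample_marks = "\n".join([
    '{"time": 500, "type": "word", "start": 0, "end": 4, "value": "This"}',
    '{"time": 900, "type": "word", "start": 8, "end": 11, "value": "Tom"}',
    '{"time": 2300, "type": "word", "start": 36, "end": 39, "value": "car"}',
])
timings = keyword_timings(sample_marks, ["Tom", "car"])
```

Each offset can then be used to decide when the illustration for that keyword should appear in the scene.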

In the second step, the MP3 for the spoken words is generated. This is pretty much the same call as before – only the result is now MP3 instead of the JSON with the earlier timings.

TTS Call MP3

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.TextType;
import com.mysimpleshow.backend.common.enums.Voice;
import com.mysimpleshow.backend.common.service.DictionaryService;

class TTSService {

    private DictionaryService dictionaryService;

    private AmazonPolly polly;

    public SynthesizeSpeechResult synthesizePolly(final String text, final Voice voice) {
        final SynthesizeSpeechRequest request = new SynthesizeSpeechRequest()
            .withOutputFormat(OutputFormat.Mp3)
            .withText(text)
            .withTextType(TextType.Ssml)
            .withVoiceId(voice.getVoiceId())
            .withLexiconNames(dictionaryService.getDictionaryNameForLocale(voice.getLocale()));

        return polly.synthesizeSpeech(request);
    }
}

In the third and final step, the Amazon Polly voice, background music, more sounds, and the animation are combined into a finished video using ffmpeg.

ffmpeg -y -i <video-path> -i <sound-path> -shortest <output-path>

mysimpleshow can provide a great experience for our customers by using the variety of human voices Amazon Polly provides. In addition, using Amazon Polly is low cost; it provides an easy integration through its API; and it gives us the ability to customize voices. Amazon Polly has been a crucial AI service for mysimpleshow, and we look forward to innovating more with it.

About the Author

Hans-Christian Pahlig is Head of IT at simpleshow. He leads the development and the IT infrastructure teams. He is an internationally experienced IT expert in the media industry. As a math graduate, Hans-Christian comes from the school of thought where neural networks were conceived. If not on the job, he is inspired by the art and zen of landscape photography.

Automated and continuous deployment of Amazon SageMaker models with AWS Step Functions

Written on January 9, 2019. Posted in Amazon.

Amazon SageMaker is a complete machine learning (ML) workflow service for developing, training, and deploying models, lowering the cost of building solutions, and increasing the productivity of data science teams. Amazon SageMaker comes with many predefined algorithms. You can also create your own algorithms by supplying Docker images: a training image to train your model and an inference image to deploy to a REST endpoint.

Automating the build and deployment of machine learning models is an important step in creating production machine learning services. Models need to be retrained and deployed when code and/or data are updated. In this blog post we will discuss a technique for Amazon SageMaker automation using AWS Step Functions. We’ll demonstrate it through a new open source project, aws-sagemaker-build. This project provides a full implementation of our workflow. It includes Jupyter notebooks showing how to create, launch, stop, and track the progress of the build using Python and Amazon Alexa! The goal of aws-sagemaker-build is to provide a repository of common and useful pipelines that use Amazon SageMaker and AWS Step Functions that can be shared with the community and grown by the community.

The code is open source, and it is hosted on GitHub here.

Custom models

This blog post won’t discuss the details of how to write and design your Dockerfiles for training or inference. For more details you can dive deep into our documentation here:

What AWS services do we need?

We focus on serverless technologies and managed services to keep this solution simple. It’s important for our solution to be scalable and cost effective even when training takes a long time. Training large neural networks can sometimes take days to complete!

AWS Step Functions

There are several AWS services for workflow orchestration, such as AWS CloudFormation, AWS Step Functions, AWS CodePipeline, AWS Glue, and others. For our application, AWS Step Functions provides the right tools to implement our workflow. A Step Functions workflow acts like a state machine: it begins with an initial state and uses AWS Lambda functions to transform that state, changing, branching, or looping through states as needed. This abstraction makes Step Functions very flexible. An execution can also run for up to one year and is charged only per state transition, which makes Step Functions a scalable and cost-efficient tool for our use case.
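
To make the state-machine idea concrete, here is a minimal sketch (not the actual aws-sagemaker-build definition) that builds an Amazon States Language document in Python. It starts a training task, waits, checks the status, and loops back while training is still in progress. The state names, the `$.status` field, and the Lambda ARNs are placeholders assumed for illustration.

```python
import json

# Placeholder ARNs: assumptions for illustration, not values from aws-sagemaker-build.
START_TRAINING_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start-training"
CHECK_TRAINING_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-training"


def build_definition(poll_seconds=60):
    """Return an Amazon States Language definition that polls until training ends."""
    return {
        "StartAt": "StartTraining",
        "States": {
            "StartTraining": {
                "Type": "Task",
                "Resource": START_TRAINING_ARN,
                "Next": "WaitForTraining",
            },
            "WaitForTraining": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "CheckTraining",
            },
            "CheckTraining": {
                "Type": "Task",
                "Resource": CHECK_TRAINING_ARN,
                "Next": "IsTrainingDone",
            },
            "IsTrainingDone": {
                "Type": "Choice",
                "Choices": [
                    # Loop back to the wait state while the job is still running.
                    {"Variable": "$.status", "StringEquals": "InProgress", "Next": "WaitForTraining"},
                    {"Variable": "$.status", "StringEquals": "Completed", "Next": "Done"},
                ],
                "Default": "TrainingFailed",
            },
            "TrainingFailed": {
                "Type": "Fail",
                "Error": "TrainingFailed",
                "Cause": "Training did not complete",
            },
            "Done": {"Type": "Succeed"},
        },
    }


definition_json = json.dumps(build_definition(), indent=2)
```

The resulting JSON string is the kind of value you would pass as the `definition` argument when creating a state machine with the Step Functions API.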

AWS CodeBuild

AWS CodeBuild is an on-demand code building service. We will use it to build our Docker images and push them to an Amazon Elastic Container Registry (Amazon ECR) repository. For more information, see the documentation.

AWS Lambda

Step Functions use Lambda functions to do the work of the build. There are functions for starting training, checking on training status, starting CodeBuild, checking on CodeBuild, and so on.
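
As an illustration of one such function, the following is a minimal sketch of a status-check Lambda. It assumes the state carries a `training_job_name` key (a hypothetical name, not necessarily the one aws-sagemaker-build uses) and adds the job status to the state it returns.

```python
def check_training_status(event, sagemaker_client):
    """Look up the training job named in the state and record its status."""
    response = sagemaker_client.describe_training_job(
        TrainingJobName=event["training_job_name"]  # hypothetical state key
    )
    # Return a copy of the state with the status added, so that a later
    # step in the workflow can branch on it.
    return {**event, "status": response["TrainingJobStatus"]}


def lambda_handler(event, context):
    """Lambda entry point; boto3 is imported here so the helper is easy to test."""
    import boto3

    return check_training_status(event, boto3.client("sagemaker"))
```

Passing the client in as an argument keeps the logic testable with a stub; only the handler touches the real SageMaker API.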

One challenge was to figure out how to provide configuration parameters to different stages of the build, given that some parameters would be static, others would depend on previous build steps, and others would be specific to a customer’s needs. For example, the training and inference image IDs need to be passed on to the training and deployment steps, the Amazon S3 bucket name is static to the pipeline, and the ML instances used for training and inference need to be chosen by the individual user. The solution was to also use Lambda functions. There are two Lambda functions that take as input the current state of the build and output the training job and endpoint configurations. You can edit or overwrite the code of these functions to suit your needs. For example, the Lambda function could query a data catalog to get the Amazon S3 location of a data set.
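
A configuration function of this kind could look roughly like the sketch below. It maps the build state to the keyword arguments of the SageMaker `create_training_job` call. The state keys (`stack_name`, `build_id`, `training_image`, `role_arn`, `data_bucket`), the S3 prefixes, and the default instance type are assumptions made for this example, not the names used by the project.

```python
def training_job_config(state, instance_type="ml.m5.xlarge"):
    """Map the current build state to keyword arguments for create_training_job."""
    job_name = "{}-{}".format(state["stack_name"], state["build_id"])
    bucket = state["data_bucket"]  # static for the whole pipeline
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # The image ID comes from an earlier build step.
            "TrainingImage": state["training_image"],
            "TrainingInputMode": "File",
        },
        "RoleArn": state["role_arn"],
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://{}/train/".format(bucket),
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://{}/output/".format(bucket)},
        # The instance type is the part an individual user would choose.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def lambda_handler(event, context):
    """Lambda entry point: the event is the current state of the build."""
    return training_job_config(event)
```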

Lambda functions are also used for various custom resources needed in setting up and tearing down the CloudFormation stack. Custom resource Lambda functions include: clearing out an S3 bucket on stack delete, uploading a Jupyter notebook to the Amazon SageMaker notebook instance, and clearing SageMaker resources.

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides a durable, centralized, and scalable data store. We will store the parameters of our training jobs and deployment here and the Step Functions’ Lambda functions will query the parameters from this store. To change the parameters you just change the JSON string in the store. The example notebooks included with aws-sagemaker-build show you how to do this.
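
Reading and changing such a JSON parameter can be sketched as follows. The helper names are ours, and the parameter name and keys in the usage note are hypothetical; the example notebooks remain the reference for the real names.

```python
import json


def read_params(ssm_client, name):
    """Fetch the parameter and decode its JSON string into a dict."""
    response = ssm_client.get_parameter(Name=name)
    return json.loads(response["Parameter"]["Value"])


def update_params(ssm_client, name, **changes):
    """Merge the given changes into the stored JSON and write it back."""
    params = read_params(ssm_client, name)
    params.update(changes)
    ssm_client.put_parameter(
        Name=name,
        Value=json.dumps(params),
        Type="String",
        Overwrite=True,  # replace the existing value instead of failing
    )
    return params
```

With a real client this might be called as `update_params(boto3.client("ssm"), "my-build-params", traininstancetype="ml.p3.2xlarge")`, where both the parameter name and the key are placeholders.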

Amazon SNS

Amazon Simple Notification Service (Amazon SNS) is used for starting builds and for notifications. AWS CodeCommit, GitHub, and Amazon S3 can publish to a start-build SNS topic when a change is made. We also publish to a notifications SNS topic when the build has started, finished, or failed. You can use these topics to connect aws-sagemaker-build to other systems.

Deployment steps

To deploy a model using Amazon SageMaker, you need to complete the following steps.

If using custom algorithms, build the Docker images and upload them to Amazon ECR.
Create an Amazon SageMaker training job and wait for it to complete.
Create an Amazon SageMaker model.
Create an Amazon SageMaker endpoint configuration.
Create/update a SageMaker endpoint and wait for it to finish.

Those are the steps that aws-sagemaker-build will automate using Step Functions.
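
Done by hand with the SageMaker API, the steps after the Docker build could be sketched as below. This is a simplified illustration rather than the project's implementation: `sm` is assumed to be a boto3 SageMaker client, and the keys of `cfg` are names invented for this example.

```python
def deploy(sm, cfg, endpoint_exists=False):
    """Train, register, and deploy a model using a SageMaker client `sm`."""
    job = cfg["training_job"]["TrainingJobName"]

    # Create the training job and wait for it to complete or stop.
    sm.create_training_job(**cfg["training_job"])
    sm.get_waiter("training_job_completed_or_stopped").wait(TrainingJobName=job)
    result = sm.describe_training_job(TrainingJobName=job)
    if result["TrainingJobStatus"] != "Completed":
        raise RuntimeError("training ended with status " + result["TrainingJobStatus"])

    # Create the model from the inference image and the trained artifacts.
    sm.create_model(
        ModelName=job,
        PrimaryContainer={
            "Image": cfg["inference_image"],
            "ModelDataUrl": result["ModelArtifacts"]["S3ModelArtifacts"],
        },
        ExecutionRoleArn=cfg["role_arn"],
    )

    # Create an endpoint configuration that points at the new model.
    sm.create_endpoint_config(
        EndpointConfigName=job,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": job,
            "InitialInstanceCount": 1,
            "InstanceType": cfg["endpoint_instance_type"],
        }],
    )

    # Create or update the endpoint, then wait until it is in service.
    change = sm.update_endpoint if endpoint_exists else sm.create_endpoint
    change(EndpointName=cfg["endpoint_name"], EndpointConfigName=job)
    sm.get_waiter("endpoint_in_service").wait(EndpointName=cfg["endpoint_name"])
    return cfg["endpoint_name"]
```

In a Lambda-driven state machine the blocking waiter calls would be replaced by a polling loop, because a single Lambda invocation cannot stay alive for the duration of a long training job.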

Architecture

The following diagram describes the flow of the Step Functions state machine. There are several points where the state machine has to poll and wait for a task to be completed.
The following diagram shows how the services work together.

Launch

The following CloudFormation template will create resources in your account. These include an Amazon SageMaker notebook instance and an Amazon SageMaker Endpoint, both resources you pay for by the hour.

Note: To prevent unnecessary charges, please tear down this stack when you are done!

Click the “Launch Stack” button below to launch the aws-sagemaker-build CloudFormation template. Choose a name for your CloudFormation stack and leave all the other parameters at their defaults.

Once your stack has been created, follow these instructions:

In the outputs of your stack, choose the link next to NoteBookUrl.
In the Jupyter browser, choose the SageBuild folder to see the example notebooks for how to use aws-sagemaker-build.

Set up events and notifications

The CloudFormation stack can automatically create a CodeCommit repo and an S3 bucket that launch a build when any updates happen. Do this by setting the “BucketTriggerBuild” or “BucketTriggerBuild” stack parameters to non-default values. You can have other events trigger rebuilds by publishing to the LaunchTopic SNS topic in the outputs of the CloudFormation template. To set up a GitHub repo to trigger rebuilds on changes, follow the instructions in this blog post. You can also have the TrainStatusTopic send you email or text updates by subscribing to it.
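
Both interactions can be sketched with the SNS API as shown below. The topic ARNs would come from the stack outputs. The message body used here is an assumption for illustration only, since the payload that aws-sagemaker-build expects on the LaunchTopic is documented in the project, not in this post.

```python
import json


def trigger_build(sns_client, launch_topic_arn, reason):
    """Publish a message to the LaunchTopic to request a rebuild."""
    # The message body is an assumption; check the project notebooks
    # for the payload the stack actually expects.
    message = json.dumps({"reason": reason})
    return sns_client.publish(TopicArn=launch_topic_arn, Message=message)


def subscribe_email(sns_client, status_topic_arn, address):
    """Subscribe an email address to the TrainStatusTopic."""
    return sns_client.subscribe(
        TopicArn=status_topic_arn,
        Protocol="email",
        Endpoint=address,
    )
```

An email subscription stays pending until the recipient confirms it from the message that Amazon SNS sends to the address.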

Alexa skill

The CloudFormation stack has an output named AlexaLambdaArn. You can use this Lambda function to create an Alexa skill to manage aws-sagemaker-build:

Download the model definition (JSON).
The Lambda function is already configured with permissions to be called by Alexa.
Create an Amazon Developer account if you don’t have one. This is different from your AWS account.
Create the Alexa skill following these instructions:
1. Log in to the Amazon developer console and choose the “Alexa Skills Kit” tab.
2. In the next screen, choose “custom” for your skill type and give your skill a name.
3. In the menu on the left, choose “Invocation” and give your skill an invocation name like “sagebuild”.
4. In the menu on the left, choose “Endpoint”, copy the AlexaLambdaArn output from your aws-sagemaker-build stack, and paste it into the default region field under “AWS Lambda Arn”.
5. In the menu on the left, choose “JSON Editor”, copy the model definition you downloaded, and paste it into the editor.
6. Choose “Save Model” and then “Build Model”.

You can now have a workflow where you push code changes to a repository (or upload new data), make some dinner, and periodically ask Alexa, “Alexa, ask SageBuild, ‘Is my build done?’.” I have done this and it is very awesome!

Validation

aws-sagemaker-build does not do any validation on your training. This means that if your training job does not fail then the model is deployed to the endpoint, even if that model does not perform better than the current model. Your training job should contain logic to validate your model and cause the training to fail if necessary.
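
One way to add such logic to a custom training script is sketched below. The scoring and baseline values are left to you; the function names are ours. For custom containers, SageMaker treats a non-zero exit code as a failed job and reads the failure reason from `/opt/ml/output/failure`.

```python
import sys


class ValidationError(Exception):
    """Raised when a new model does not beat the required score."""


def validate(new_score, baseline_score, min_gain=0.0):
    """Return new_score if it beats baseline_score by more than min_gain, else raise."""
    if new_score <= baseline_score + min_gain:
        raise ValidationError(
            "score {:.3f} does not beat baseline {:.3f}".format(new_score, baseline_score)
        )
    return new_score


def main(new_score, baseline_score, failure_path="/opt/ml/output/failure"):
    """Exit non-zero on failed validation so the training job is marked as failed."""
    try:
        validate(new_score, baseline_score)
    except ValidationError as err:
        # SageMaker reads this file to populate the job's failure reason.
        with open(failure_path, "w") as handle:
            handle.write(str(err))
        sys.exit(1)
```

Because a failed training job stops the build, the existing endpoint keeps serving the current model.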

Frameworks

aws-sagemaker-build supports four different configurations: Bring-Your-Own-Docker (BYOD), Amazon SageMaker algorithms, TensorFlow, and MXNet. The configuration is set as a parameter of the CloudFormation template but can be changed after deployment. For the TensorFlow and MXNet configurations, the user scripts are copied and saved with version names so that rollbacks or redeployments of old versions work correctly. The notebook that is launched in the aws-sagemaker-build stack has examples of each configuration.

Advanced

Dev/Prod deployments

First, create a CodeCommit repo and an Amazon S3 data bucket. Then launch two aws-sagemaker-build stacks, both using the repo and the S3 bucket you just created. Set one stack to use the “master” branch and the other to use the “dev” branch.

Here is a diagram of what that architecture would look like:

Amazon CloudWatch Events

With Amazon CloudWatch Events, you can publish to your stack’s LaunchTopic topic on a regular schedule (for example, every day at 5 PM or once a week on Friday at 9 PM). You can use this in a workflow in which you have a smaller development dataset that you develop with during the week. You push your tested changes to your code branch, and you redeploy this branch only at the end of the week. This way you’re not constantly training large models and replacing them, which can be very expensive.
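
A scheduled rule of this kind can be sketched with the CloudWatch Events API as follows. The rule name and target ID are placeholders we chose. Schedule expressions are evaluated in UTC, so the default below fires at 21:00 UTC every Friday.

```python
def schedule_build(events_client, launch_topic_arn,
                   rule_name="weekly-sagebuild",
                   schedule="cron(0 21 ? * FRI *)"):
    """Create a scheduled rule whose target is the LaunchTopic SNS topic."""
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "launch-topic", "Arn": launch_topic_arn}],
    )
    return rule_name
```

For a daily build at 17:00 UTC, the expression would be `cron(0 17 * * ? *)`. The SNS topic's access policy also has to allow `events.amazonaws.com` to publish to it, otherwise the rule fires but no message arrives.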

Conclusion and let us know what you think

If this blog post helps you or inspires you to solve a problem we would love to hear about it! We also have the code up on GitHub for you to use and extend. Contributions are always welcome!

Acknowledgements

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

About the Author

John Calhoun is a machine learning specialist for AWS Public Sector. He works with our customers and partners to provide leadership on machine learning, helping them shorten their time to value when using AWS.

