Long-Range Robotic Navigation via Automated Reinforcement Learning

Written on February 27, 2019. Posted in Google.

Aleksandra Faust, Senior Research Scientist and Anthony Francis, Senior Software Engineer, Robotics at Google

In the United States alone, there are 3 million people with a mobility impairment that prevents them from ever leaving their homes. Service robots that can autonomously navigate long distances can improve the independence of people with limited mobility, for example, by bringing them groceries, medicine, and packages. Research has demonstrated that deep reinforcement learning (RL) is good at mapping raw sensory input to actions, e.g. learning to grasp objects and for robot locomotion, but RL agents usually lack the understanding of large physical spaces needed to safely navigate long distances without human help and to easily adapt to new spaces.

In three recent papers, “Learning Navigation Behaviors End-to-End with AutoRL,” “PRM-RL: Long-Range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning”, and “Long-Range Indoor Navigation with PRM-RL”, we investigate easy-to-adapt robotic autonomy by combining deep RL with long-range planning. We train local planner agents to perform basic navigation behaviors, traversing short distances safely without collisions with moving obstacles. The local planners take noisy sensor observations, such as a 1D lidar that provides distances to obstacles, and output linear and angular velocities for robot control. We train the local planner in simulation with AutoRL, a method that automates the search for RL reward and neural network architecture. Despite their limited range of 10 – 15 meters, the local planners transfer well to both real robots and to new, previously unseen environments. This enables us to use them as building blocks for navigation in large spaces. We then build a roadmap, a graph where nodes are locations and edges connect the nodes only if local planners, which mimic real robots well with their noisy sensors and control, can traverse between them reliably.

Automating Reinforcement Learning (AutoRL)
In our first paper, we train the local planners in small, static environments. However, training with standard deep RL algorithms, such as Deep Deterministic Policy Gradient (DDPG), poses several challenges. For example, the true objective of the local planners is to reach the goal, which represents a sparse reward. In practice, this requires researchers to spend significant time iterating and hand-tuning the rewards. Researchers must also make decisions about the neural network architecture, without clear accepted best practices. And finally, algorithms like DDPG are unstable learners and often exhibit catastrophic forgetting.

To overcome those challenges, we automate the deep reinforcement learning (RL) training. AutoRL is an evolutionary automation layer around deep RL that searches for a reward and neural network architecture using large-scale hyperparameter optimization. It works in two phases: reward search and neural network architecture search. During the reward search, AutoRL trains a population of DDPG agents concurrently over several generations, each with a slightly different reward function, optimizing for the local planner’s true objective: reaching the destination. At the end of the reward search phase, we select the reward that leads the agents to their destination most often. In the neural network architecture search phase, we repeat the process, this time using the selected reward and tuning the network layers, optimizing for the cumulative reward.
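
The two-phase structure can be illustrated with a toy sketch. Everything in it is hypothetical: `success_rate` and `cumulative_reward` are cheap stand-in scoring functions rather than DDPG training runs, the mutation operators are invented for illustration, and the simple keep-the-best loop is only an assumption about the general shape of such a search, not the authors' implementation.

```python
import random


def evolve(initial, mutate, score, generations, population_size, rng):
    """Elitist search: each generation mutates the current best candidate
    and keeps any candidate that scores higher."""
    best, best_score = initial, score(initial)
    history = [best_score]  # best score seen after each generation
    for _ in range(generations):
        population = [mutate(best, rng) for _ in range(population_size)]
        for candidate in population:
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


def success_rate(reward_weights):
    # Stand-in for "train an agent with this reward, measure goal-reaching rate".
    target = (1.0, -0.5, 0.2)  # arbitrary optimum, for illustration only
    error = sum((w - t) ** 2 for w, t in zip(reward_weights, target))
    return 1.0 / (1.0 + error)


def cumulative_reward(layer_widths):
    # Stand-in for "train with the selected reward, measure cumulative reward".
    # A real run would depend on the reward chosen in phase 1; this toy does not.
    return -sum((w - 64) ** 2 for w in layer_widths)


def mutate_reward(weights, rng):
    return tuple(w + rng.gauss(0.0, 0.1) for w in weights)


def mutate_layers(widths, rng):
    return tuple(max(1, w + rng.choice((-8, 0, 8))) for w in widths)


rng = random.Random(0)

# Phase 1: search over reward parameters, scored by the true objective.
best_reward, reward_history = evolve(
    (0.0, 0.0, 0.0), mutate_reward, success_rate, 10, 100, rng)

# Phase 2: fix the reward, search over network layers, scored by cumulative reward.
best_layers, layer_history = evolve(
    (32, 32), mutate_layers, cumulative_reward, 10, 100, rng)

print(best_reward, best_layers)
```

The point of the sketch is the ordering: the first search is scored by how often the goal is reached, and only the second search, run with the reward held fixed, is scored by cumulative reward.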

Automating reinforcement learning with reward and neural network architecture search.

However, this iterative process means AutoRL is not sample efficient. Training one agent takes 5 million samples; AutoRL training over 10 generations of 100 agents requires 5 billion samples, equivalent to 32 years of training! The benefit is that after AutoRL the manual training process is automated, and DDPG does not experience catastrophic forgetting. Most importantly, the resulting policies are higher quality: AutoRL policies are robust to sensor, actuator and localization noise, and generalize well to new environments. Our best policy is 26% more successful than other navigation methods across our test environments.
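
The sample budget quoted above is a straightforward product, which the short calculation below reproduces. The implied sampling rate is an inference from the two published figures (5 billion samples, 32 years) and is not stated in the post.

```python
SAMPLES_PER_AGENT = 5_000_000
GENERATIONS = 10
AGENTS_PER_GENERATION = 100

total_samples = SAMPLES_PER_AGENT * GENERATIONS * AGENTS_PER_GENERATION

# Inferred, not stated in the post: the rate at which 5 billion samples
# would take 32 years to collect.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
implied_rate_hz = total_samples / (32 * SECONDS_PER_YEAR)

print(total_samples)              # 5000000000
print(round(implied_rate_hz, 1))  # roughly 5 samples per second
```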

AutoRL (red) success over short distances (up to 10 meters) in several unseen buildings. Compared to hand-tuned DDPG (dark-red), artificial potential fields (light blue), dynamic window approach (blue), and behavior cloning (green).

AutoRL local planner policy transfer to robots in real, unstructured environments

While these policies only perform local navigation, they are robust to moving obstacles and transfer well to real robots, even in unstructured environments. Though they were trained in simulation with only static obstacles, they can also handle moving objects effectively. The next step is to combine the AutoRL policies with sampling-based planning to extend their reach and enable long-range navigation.

Achieving Long Range Navigation with PRM-RL
Sampling-based planners tackle long-range navigation by approximating robot motions. For example, probabilistic roadmaps (PRMs) sample robot poses and connect them with feasible transitions, creating roadmaps that capture valid movements of a robot across large spaces. In our second paper, which won Best Paper in Service Robotics at ICRA 2018, we combine PRMs with hand-tuned RL-based local planners (without AutoRL) to train robots once locally and then adapt them to different environments.

First, for each robot we train a local planner policy in a generic simulated training environment. Next, we build a PRM with respect to that policy, called a PRM-RL, over a floor plan for the deployment environment. The same floor plan can be used for any robot we wish to deploy in the building in a one time per robot+environment setup.

To build a PRM-RL we connect sampled nodes only if the RL-based local planner, which represents robot noise well, can reliably and consistently navigate between them. This is done via Monte Carlo simulation. The resulting roadmap is tuned to both the abilities and geometry of the particular robot. Roadmaps for robots with the same geometry but different sensors and actuators will have different connectivity. Since the agent can navigate around corners, nodes without clear line of sight can be included, whereas nodes near walls and obstacles are less likely to be connected into the roadmap because of sensor noise. At execution time, the RL agent navigates from roadmap waypoint to waypoint.
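
As a rough illustration of the edge test, the sketch below connects two nodes only when a noisy planner succeeds often enough over repeated trials. The planner here is a hypothetical stand-in (`toy_local_planner`, with an invented distance-based success probability), and treating edges as undirected is an assumption. The defaults of 20 trials and a 90% threshold are the values used in the evaluation described later in this post.

```python
import math
import random


def toy_local_planner(start, goal, rng):
    """Hypothetical stand-in for one noisy rollout of the RL local planner.

    Always succeeds within 5 m, never beyond 15 m, and succeeds with
    linearly decreasing probability in between.
    """
    distance = math.dist(start, goal)
    p_success = min(1.0, max(0.0, (15.0 - distance) / 10.0))
    return rng.random() < p_success


def build_roadmap(nodes, planner, trials=20, min_success=0.9, rng=None):
    """Return the set of (i, j) edges the planner traverses reliably."""
    rng = rng or random.Random(0)
    edges = set()
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            # Monte Carlo estimate of the success rate for this node pair.
            successes = sum(planner(nodes[i], nodes[j], rng) for _ in range(trials))
            if successes / trials >= min_success:
                edges.add((i, j))
    return edges


nodes = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (30.0, 0.0)]
edges = build_roadmap(nodes, toy_local_planner)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
```

The distant node stays disconnected because the stand-in planner never reaches it. A planner with different noise characteristics would produce different connectivity over the same nodes, which is the property the paragraph above describes.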

Roadmap being built with 3 Monte Carlo simulations per randomly selected node pair.

The largest map was 288 meters by 163 meters and contained almost 700,000 edges; it was collected over 4 days using 300 workers in a cluster and required 1.1 billion collision checks.

The third paper makes several improvements over the original PRM-RL. First, we replace the hand-tuned DDPG with AutoRL-trained local planners, which results in improved long-range navigation. Second, we add Simultaneous Localization and Mapping (SLAM) maps, which robots use at execution time, as a source for building the roadmaps. Because SLAM maps are noisy, this change closes the “sim2real gap”, a phenomenon in robotics where simulation-trained agents significantly underperform when transferred to real robots. Our simulated success rates are the same as in on-robot experiments. Last, we add distributed roadmap building, resulting in very large scale roadmaps containing up to 700,000 nodes.

We evaluated the method using our AutoRL agent, building roadmaps using the floor maps of offices up to 200x larger than the training environments, accepting edges with at least 90% success over 20 trials. We compared PRM-RL to a variety of different methods over distances up to 100m, well beyond the local planner range. PRM-RL had 2 to 3 times the rate of success over baseline because the nodes were connected appropriately for the robot’s capabilities.

Success rates for navigation over 100 meters in several buildings. First paper – AutoRL local planner only (blue); original PRMs (red); path-guided artificial potential fields (yellow); second paper (green); third paper – PRMs with AutoRL (orange).

We tested PRM-RL on multiple real robots and real building sites. One set of tests is shown below; the robot is very robust except near cluttered areas and off the edge of the SLAM map.

On-robot experiments

Conclusion
Autonomous robot navigation can significantly improve the independence of people with limited mobility. We can achieve this by developing easy-to-adapt robotic autonomy, including methods that can be deployed in new environments using information that is already available. This is done by automating the learning of basic, short-range navigation behaviors with AutoRL and using these learned policies in conjunction with SLAM maps to build roadmaps. These roadmaps consist of nodes connected by edges that robots can traverse consistently. The result is a policy that once trained can be used across different environments and can produce a roadmap custom-tailored to the particular robot.

Acknowledgements
The research was done by, in alphabetical order, Hao-Tien Lewis Chiang, James Davidson, Aleksandra Faust, Marek Fiser, Anthony Francis, Jasmine Hsu, J. Chase Kew, Tsang-Wei Edward Lee, Ken Oslund, Oscar Ramirez from Robotics at Google and Lydia Tapia from University of New Mexico. We thank Alexander Toshev, Brian Ichter, Chris Harris, and Vincent Vanhoucke for helpful discussions.

Use two additional data labeling services for your Amazon SageMaker Ground Truth labeling jobs

Written on February 26, 2019. Posted in Amazon.

We’re excited to announce the availability of two more data labeling services that you can use for your Amazon SageMaker Ground Truth labeling jobs:

Data Labeling Services by iMerit’s US-based workforce
Data Labeling Services by Startek, Inc.

These new listings on the AWS Marketplace supplement the existing iMerit India-based workforce listing to provide you a total of three options.

iMerit now provides access to their full-time, US-based staff of data labeling specialists. Their image labeling capabilities include classification, bounding boxes, image segmentation, key points, polygons, and polylines. Their text labeling capabilities include entity extraction and classification in both English and Spanish.

StarTek is a business process outsourcing company that offers data labeling services. StarTek is a publicly traded company (NYSE: SRT), and their workforces are spread across the Philippines, Honduras, India, Brazil, and Jamaica. Their image labeling capabilities include classification, bounding boxes, image segmentation, key points, polygons, and polylines. Their text labeling capabilities include entity extraction and classification in English.

We launched Amazon SageMaker Ground Truth at re:Invent 2018. It’s a service that helps you build highly accurate training datasets for machine learning. You can learn more from our launch blog. When you set up a Ground Truth labeling job, you can send labeling tasks to your own workers, Amazon Mechanical Turk public workers, or one of the vendors with listings on the AWS Marketplace.

You can assign data labeling tasks to one of the pre-approved vendors, who are vetted by Amazon for confidentiality, service guarantees, or special skills. The vendors are approved based on meeting specific requirements for data security, restricted access to physical facilities, and secure data transmission. We perform regular security audits to ensure the vendors continue to meet requirements.

Typically, finding and then contracting the right vendor is a time-consuming and tedious process. With Ground Truth, working with vendors is simple and involves just a few clicks through AWS Marketplace. All vendor-related charges appear directly on your AWS bill through the AWS Marketplace listing. The steps here show how easy it is to work with vendors to complete your Ground Truth labeling jobs.

Step 1: Navigate to the Vendor tab for Labeling Workforces

After signing into your AWS account, navigate to the Amazon SageMaker console. On the left-hand side navigation panel, select Labeling workforces. Then, choose Vendor on the right pane.

Step 2: Subscribe to the labeling services of a vendor through AWS Marketplace

After you choose Find data labeling services, you’re directed to the AWS Marketplace.

From here, you can select any of the available vendors to learn more about their company, labeling services, pricing, and much more. After you have selected the vendor that meets your needs, choose Continue to Subscribe on the listing page and complete the subscription process. Now you can use this vendor for a Ground Truth labeling job. You can be subscribed to any number of vendors at any time.

Step 3: Select a vendor when setting up your labeling job

When you create a labeling job, you see a list of all your subscribed vendors in the Subscribed data labeling services dropdown list. Choose one to kick off your labeling job. You have the flexibility to use different data labeling services for any of your labeling jobs.

Now Available

If you want to learn more about each of the data labeling services, visit the AWS Marketplace listings page. Now it’s your turn to work with them, and let us know what you think.

About the Author

Vikram Madan is the Product Manager for Amazon SageMaker Ground Truth. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys running long distances and watching documentaries.

Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn

Written on February 25, 2019. Posted in Amazon.

Amazon SageMaker enables developers and data scientists to build, train, tune, and deploy machine learning (ML) models at scale. You can deploy trained ML models for real-time or batch predictions on unseen data, a process known as inference. However, in most cases, the raw input data must be preprocessed and can’t be used directly for making predictions. This is because most ML models expect the data in a predefined format, so the raw data needs to be first cleaned and formatted in order for the ML model to process the data.

In this blog post, we’ll show how you can use the Amazon SageMaker built-in Scikit-learn library for preprocessing input data and then use the Amazon SageMaker built-in Linear Learner algorithm for predictions. We’ll deploy both the library and the algorithm on the same endpoint using the Amazon SageMaker Inference Pipelines feature so you can pass raw input data directly to Amazon SageMaker. We’ll also show how you can make your ML workflow modular and reuse the preprocessing code between training and inference to reduce development overhead and errors.

In our example (which is also published on GitHub), we’ll use the abalone dataset from the UCI machine learning repository. The dataset includes various data on abalones (a type of shellfish), including sex, length, diameter, height, shell weight, shucked weight, whole weight, viscera weight, and age. Since measuring the age of abalones is a time-consuming task, building a model to predict the age of the abalone enables us to estimate the abalone’s age based on physical measurements alone, removing the need to manually measure the abalone’s age.

To accomplish this, we’ll first do some simple preprocessing with the Amazon SageMaker built-in Scikit-learn library. We will use the SimpleImputer, StandardScaler, and OneHotEncoder transformers on the raw abalone data; these are commonly used data transformers included in Scikit-learn that process the data into a format required by ML models. Then, we’ll use our processed data to train the Amazon SageMaker Linear Learner algorithm to predict the age of abalones. Finally, we’ll create a pipeline that combines the data processing and model prediction steps using Amazon SageMaker Inference Pipelines. Using this pipeline, we can pass raw input data to a single endpoint, where it is first preprocessed and then used to make a prediction for a given abalone.

Step 1: Launch SageMaker notebook instance and set up exercise code

From the SageMaker landing page, choose Notebook instances in the left panel and choose Create notebook Instance.

Give your notebook instance a name and make sure you choose an AWS Identity and Access Management (IAM) role that has access to Amazon S3. We’ll need to upload data to an Amazon S3 bucket for this project, so make sure you have a bucket you can access. If you don’t have an Amazon S3 bucket, follow this guide to create one.

Leave all other fields as the default and choose Create notebook Instance to launch your notebook instance (the notebook will take a few minutes to spin up). After the status reads “InService” your notebook instance is ready. Click Open Jupyter to open the notebook environment.

When the notebook environment opens, choose New and then choose conda_python3 in the upper right corner to create a new Python notebook. The subsequent steps will be fully contained within the notebook.

Step 2: Set up Amazon SageMaker role and download data

First we need to set up an Amazon S3 bucket to store our training data and model outputs. Replace the ENTER BUCKET NAME HERE placeholder with the name of the bucket from Step 1.

# S3 prefix
s3_bucket = '< ENTER BUCKET NAME HERE >'
prefix = 'Scikit-LinearLearner-pipeline-abalone-example'

Now we need to set up the Amazon SageMaker execution role, so that Amazon SageMaker can communicate with other parts of AWS.

import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

Finally, we download the abalone dataset to the Amazon SageMaker notebook instance. This is the dataset used in this example:

!wget --directory-prefix=./abalone_data https://s3-us-west-2.amazonaws.com/sparkml-mleap/data/abalone/abalone.csv

Step 3: Upload input data for Amazon SageMaker

We need to upload our dataset to Amazon S3 so Amazon SageMaker can access it later on:

WORK_DIRECTORY = 'abalone_data'

train_input = sagemaker_session.upload_data(
    path='{}/{}'.format(WORK_DIRECTORY, 'abalone.csv'), 
    bucket=s3_bucket,
    key_prefix='{}/{}'.format(prefix, 'train'))

Step 4: Create preprocessing script

The code described in this step already exists on the SageMaker instance, so you do not need to run the code in this section; you will simply call the existing script in the next step. However, we recommend that you take the time to explore how the pipeline is handled by reading through the code.

Now we are ready to create the container that will preprocess our data before it’s sent to the trained Linear Learner model. This container will run the sklearn_abalone_featurizer.py script, which Amazon SageMaker will import for both training and prediction. Training is executed using the main method as the entry point, which parses arguments, reads the raw abalone dataset from Amazon S3, then runs the SimpleImputer and StandardScaler on the numeric features and SimpleImputer and OneHotEncoder on the categorical features. At the end of training, the script serializes the fitted ColumnTransformer to Amazon S3 so that it may be used during inference.

from __future__ import print_function

import time
import sys
from io import StringIO
import os
import shutil

import argparse
import csv
import json
import numpy as np
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.externals import joblib
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, StandardScaler, OneHotEncoder

from sagemaker_containers.beta.framework import (
    content_types, encoders, env, modules, transformer, worker)

# Since we get a headerless CSV file we specify the column names here.
feature_columns_names = [
    'sex', # M, F, and I (infant)
    'length', # Longest shell measurement
    'diameter', # perpendicular to length
    'height', # with meat in shell
    'whole_weight', # whole abalone
    'shucked_weight', # weight of meat
    'viscera_weight', # gut weight (after bleeding)
    'shell_weight'] # after being dried

label_column = 'rings'

feature_columns_dtype = {
    'sex': str,
    'length': np.float64,
    'diameter': np.float64,
    'height': np.float64,
    'whole_weight': np.float64,
    'shucked_weight': np.float64,
    'viscera_weight': np.float64,
    'shell_weight': np.float64}

label_column_dtype = {'rings': np.float64} # +1.5 gives the age in years

def merge_two_dicts(x, y):
    z = x.copy()   # start with x's keys and values
    z.update(y)    # modifies z with y's keys and values & returns None
    return z

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])

    args = parser.parse_args()

    # Take the set of files and read them all into a single pandas dataframe
    input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
    if len(input_files) == 0:
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(args.train, "train"))

    raw_data = [ pd.read_csv(
        file, 
        header=None, 
        names=feature_columns_names + [label_column],
        dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype)) for file in input_files ]
    concat_data = pd.concat(raw_data)

    # We will train our classifier with the following features:
    # Numeric Features:
    # - length:  Longest shell measurement
    # - diameter: Diameter perpendicular to length
    # - height:  Height with meat in shell
    # - whole_weight: Weight of whole abalone
    # - shucked_weight: Weight of meat
    # - viscera_weight: Gut weight (after bleeding)
    # - shell_weight: Weight after being dried
    # Categorical Features:
    # - sex: categories encoded as strings {'M', 'F', 'I'} where 'I' is Infant
    numeric_features = list(feature_columns_names)
    numeric_features.remove('sex')
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])

    categorical_features = ['sex']
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)],
        remainder="drop")

    preprocessor.fit(concat_data)

    joblib.dump(preprocessor, os.path.join(args.model_dir, "model.joblib"))

The next methods of the script are used during inference. The input_fn and output_fn methods will be used by Amazon SageMaker to parse the data payload and reformat the response. In this example, the input method only accepts ‘text/csv’ as the content-type, but can easily be modified to accept other input formats. The input_fn function also checks the number of columns in the CSV passed to determine whether to preprocess training data, which includes the label, or prediction data. The output method returns JSON by default because the Inference Pipeline expects JSON between the containers; it also supports ‘text/csv’ and can be modified to add other output formats.

def input_fn(input_data, content_type):
    """Parse input data payload

    We currently only take csv input. Since we need to process both labelled
    and unlabelled data we first determine whether the label column is present
    by looking at how many columns were provided.
    """
    if content_type == 'text/csv':
        # Read the raw input data as CSV.
        df = pd.read_csv(StringIO(input_data), 
                         header=None)

        if len(df.columns) == len(feature_columns_names) + 1:
            # This is a labelled example, includes the ring label
            df.columns = feature_columns_names + [label_column]
        elif len(df.columns) == len(feature_columns_names):
            # This is an unlabelled example.
            df.columns = feature_columns_names

        return df
    else:
        raise ValueError("{} not supported by script!".format(content_type))


def output_fn(prediction, accept):
    """Format prediction output

    The default accept/content-type between containers for serial inference is JSON.
    We also want to set the ContentType or mimetype as the same value as accept so the next
    container can read the response payload correctly.
    """
    if accept == "application/json":
        instances = []
        for row in prediction.tolist():
            instances.append({"features": row})

        json_output = {"instances": instances}

        return worker.Response(json.dumps(json_output), accept, mimetype=accept)
    elif accept == 'text/csv':
        return worker.Response(encoders.encode(prediction, accept), accept, mimetype=accept)
    else:
        raise RuntimeError("{} accept type is not supported by this script.".format(accept))
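
To make the data flow through these two functions concrete, the following standalone sketch mimics the column-count check and the JSON layout using only the Python standard library. It is an illustration, not the script itself: `name_columns` and `to_pipeline_json` are hypothetical helper names, the sketch uses the csv module instead of pandas, it returns a plain string instead of a worker.Response, and unlike input_fn it raises an error on an unexpected column count.

```python
import csv
import io
import json

FEATURES = ['sex', 'length', 'diameter', 'height', 'whole_weight',
            'shucked_weight', 'viscera_weight', 'shell_weight']
LABEL = 'rings'


def name_columns(csv_text):
    """Label the columns of headerless CSV rows based on how many there are."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    width = len(rows[0])
    if width == len(FEATURES) + 1:
        header = FEATURES + [LABEL]   # labelled (training) data
    elif width == len(FEATURES):
        header = FEATURES             # unlabelled (prediction) data
    else:
        raise ValueError('Unexpected number of columns: {}'.format(width))
    return [dict(zip(header, row)) for row in rows]


def to_pipeline_json(feature_rows):
    """Wrap rows of numeric features in the instances/features JSON layout."""
    return json.dumps({'instances': [{'features': row} for row in feature_rows]})


unlabelled = name_columns('M, 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155')
labelled = name_columns('M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10')

print(LABEL in unlabelled[0], LABEL in labelled[0])  # False True
print(to_pipeline_json([[0.1, 0.2, 1.0]]))
```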

Our predict_fn will take the input data, which was parsed by our input_fn, and the deserialized model from the model_fn (described in detail next) to transform the source data. The script also adds back labels if the source data had labels, which would be the case for preprocessing training data.

def predict_fn(input_data, model):
    """Preprocess input data

    We implement this because the default predict_fn uses .predict(), but our model is a preprocessor
    so we want to use .transform().

    The output is returned in the following order:

        label (as the first column, only if the input data included it)
        rest of features either one hot encoded or standardized
    """
    features = model.transform(input_data)

    if label_column in input_data:
        # Return the label (as the first column) and the set of features.
        return np.insert(features, 0, input_data[label_column], axis=1)
    else:
        # Return only the set of features
        return features

The model_fn takes the location of a serialized model and returns the deserialized model back to Amazon SageMaker. Note that this is the only method that does not have a default because the definition of the method will be closely linked to the serialization method implemented in training. In this example, we use the joblib library included with Scikit-learn.

def model_fn(model_dir):
    """Deserialize fitted model
    """
    preprocessor = joblib.load(os.path.join(model_dir, "model.joblib"))
    return preprocessor

Step 5: Fit the data preprocessor

We now create a preprocessor using the script we defined in step 4. This will allow us to send raw data to the model and output the processed data. To do this, we define an SKLearn estimator that accepts several constructor arguments:

entry_point: The path to the Python script that Amazon SageMaker runs for training and prediction (this is the script we defined in step 4).
role: Role Amazon Resource Name (ARN).
train_instance_type (optional): The type of Amazon SageMaker instances for training. Note: Because Scikit-learn does not natively support GPU training, Amazon SageMaker Scikit-learn does not currently support training on GPU instance types.
sagemaker_session (optional): The session used to train on Amazon SageMaker.

from sagemaker.sklearn.estimator import SKLearn

script_path = '/home/ec2-user/sample-notebooks/sagemaker-python-sdk/scikit_learn_inference_pipeline/sklearn_abalone_featurizer.py'

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session)

sklearn_preprocessor.fit({'train': train_input})

It will take a few minutes (up to 5) for the preprocessor to be created. After the preprocessor is ready, we can send our raw data to the preprocessor and store our processed abalone data back in Amazon S3. We’ll do this in the next step.

Step 6: Batch transform training data

Now that our preprocessor is ready, we can use it to batch transform raw data into preprocessed data for training. To do this, we create a transformer and point it to the raw data on Amazon S3:

# Define a SKLearn Transformer from the trained SKLearn Estimator
transformer = sklearn_preprocessor.transformer(
    instance_count=1, 
    instance_type='ml.m4.xlarge',
    assemble_with = 'Line',
    accept = 'text/csv')

# Preprocess training input
transformer.transform(train_input, content_type='text/csv')
print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
transformer.wait()
preprocessed_train = transformer.output_path

When the transformer is done, our transformed data will be stored in Amazon S3. You can find the location of the preprocessed data by looking at the value of the preprocessed_train variable.

Step 7: Fit the Linear Learner model with preprocessed data

Now that we’ve built the preprocessing container and processed the raw data, we need to create the model container that our processed data will be sent to. This container will input the processed dataset and train a model to predict the age of a given abalone based on the (processed) feature values. We’ll use the Amazon SageMaker Linear Learner for the prediction model.

First we define the Linear Learner image location using a helper function in the Python SDK:

import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri
ll_image = get_image_uri(boto3.Session().region_name, 'linear-learner')

Now we can fit the model. Fitting the model has four main steps:

Define the Amazon S3 location in which to store the model results.
Create the Linear Learner Estimator.
Set Estimator hyperparameters, including number of features, predictor type, and mini-batch size.
Define a variable for the location of our transformed data (from step 6), and use it to train the Linear Learner model.

s3_ll_output_key_prefix = "ll_training_output"
s3_ll_output_location = 's3://{}/{}/{}/{}'.format(s3_bucket, prefix, s3_ll_output_key_prefix, 'll_model')

ll_estimator = sagemaker.estimator.Estimator(
    ll_image,
    role, 
    train_instance_count=1, 
    train_instance_type='ml.m4.2xlarge',
    train_volume_size = 20,
    train_max_run = 3600,
    input_mode= 'File',
    output_path=s3_ll_output_location,
    sagemaker_session=sagemaker_session)

ll_estimator.set_hyperparameters(feature_dim=10, predictor_type='regressor', mini_batch_size=32)

ll_train_data = sagemaker.session.s3_input(
    preprocessed_train, 
    distribution='FullyReplicated',
    content_type='text/csv', 
    s3_data_type='S3Prefix')

data_channels = {'train': ll_train_data}
ll_estimator.fit(inputs=data_channels, logs=True)

As with the previous training jobs, it will take a few minutes (up to 5) for the Estimator model to fit.
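
The feature_dim=10 hyperparameter set above matches the width of the preprocessor's output: seven standardized numeric columns followed by three one-hot columns for sex. The pure-Python sketch below illustrates that count. It assumes the training data contains exactly the three sex categories (F, I, M), and the means and standard deviations are made-up placeholder values, not statistics from the abalone dataset.

```python
NUMERIC = ['length', 'diameter', 'height', 'whole_weight',
           'shucked_weight', 'viscera_weight', 'shell_weight']
SEX_CATEGORIES = ['F', 'I', 'M']  # assumed categories, in sorted order


def one_hot(value, categories):
    # An unseen category becomes all zeros, like handle_unknown='ignore'.
    return [1.0 if value == category else 0.0 for category in categories]


def featurize(record, means, stds):
    """Standardize the numeric columns, then append the one-hot sex columns."""
    scaled = [(record[name] - means[name]) / stds[name] for name in NUMERIC]
    return scaled + one_hot(record['sex'], SEX_CATEGORIES)


# Placeholder statistics for illustration only.
means = {name: 0.5 for name in NUMERIC}
stds = {name: 0.25 for name in NUMERIC}

record = {'sex': 'M', 'length': 0.44, 'diameter': 0.365, 'height': 0.125,
          'whole_weight': 0.516, 'shucked_weight': 0.2155,
          'viscera_weight': 0.114, 'shell_weight': 0.155}

features = featurize(record, means, stds)
print(len(features))  # 10
```

When training data passes through the preprocessor, the rings label is added as an extra first column in front of these ten features; the label is not counted in feature_dim.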

Step 8: Create inference pipeline

In step 5, we created an inference preprocessor that will take input data and preprocess our features. We now combine this preprocessor with the Linear Learner model in step 7 to create an inference pipeline that processes the raw data and sends it to the prediction model for prediction. Notice that setting up the pipeline is straightforward. After defining the models and assigning names, we simply create a “PipelineModel” that points to our preprocessing and prediction models. We then deploy the pipeline model to a single endpoint:

from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
import boto3
from time import gmtime, strftime

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

scikit_learn_inference_model = sklearn_preprocessor.create_model()
linear_learner_model = ll_estimator.create_model()

model_name = 'inference-pipeline-' + timestamp_prefix
endpoint_name = 'inference-pipeline-ep-' + timestamp_prefix
sm_model = PipelineModel(
    name=model_name, 
    role=role, 
    models=[
        scikit_learn_inference_model, 
        linear_learner_model])

sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)

After the endpoint has been created, our pipeline is ready to use!

Step 9: Make a prediction using the inference pipeline

We can test our pipeline by sending data for prediction. The pipeline will accept raw data, transform it using the preprocessor we created in steps 3 and 4, and create a prediction using the Linear Learner model we created in step 7.

First, we define a ‘payload’ variable that contains the data we want to send through the pipeline. Then we define a predictor using our pipeline endpoint, send the payload to the predictor, and print the model prediction:

from sagemaker.predictor import json_serializer, csv_serializer, json_deserializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_CSV, CONTENT_TYPE_JSON

payload = 'M, 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155'
actual_rings = 10

predictor = RealTimePredictor(
    endpoint=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=csv_serializer,
    content_type=CONTENT_TYPE_CSV,
    accept=CONTENT_TYPE_JSON)

print(predictor.predict(payload))

Our model predicts an age of 9.53 for the abalone defined in our payload. Notice that we sent raw data into our pipeline, which preprocessed this data before sending it to the linear model for scoring.
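The printed response is raw JSON, because the predictor above has no deserializer attached. The following is a minimal sketch of pulling the number out and comparing it with `actual_rings`, assuming the response has a `predictions` list with one `score` per record. The `sample_response` literal is illustrative: it is built from the 9.53 value quoted above and is not captured endpoint output. The helper name `extract_scores` is hypothetical.

```python
import json

# Illustrative response body (assumed shape), using the 9.53 value quoted above.
sample_response = b'{"predictions": [{"score": 9.53}]}'
actual_rings = 10


def extract_scores(response_body):
    """Return the predicted scores from a JSON response body (bytes or str)."""
    parsed = json.loads(response_body)
    return [record["score"] for record in parsed["predictions"]]


predicted = extract_scores(sample_response)[0]
abs_error = abs(actual_rings - predicted)
print("predicted: {:.2f}, actual: {}, absolute error: {:.2f}".format(
    predicted, actual_rings, abs_error))
```

In a notebook, the same helper could be applied to the return value of `predictor.predict(payload)` if the response has this shape.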

Step 10: Delete endpoint

After we are finished we can delete the endpoints used in this example:

sm_client = sagemaker_session.boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=endpoint_name)

Conclusion

In this blog post, we built an ML pipeline that uses Amazon SageMaker and the built-in Scikit-learn library to process raw data. We trained an ML model on the processed data using the Amazon SageMaker built-in Linear Learner algorithm, and created predictions with the trained model. This allows us to pass raw data to the pipeline and get model predictions in Amazon S3 without having to repeat the intermediary data processing steps every time!

Citations

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

About the Authors

Matt McKenna is a Data Scientist focused on machine learning in Amazon Alexa, and is passionate about applying statistics and machine learning methods to solve real world problems. In his spare time, Matt enjoys playing guitar, running, craft beer, and rooting for Boston sports teams.

Eric Kim is an engineer in the Algorithms & Platforms Group of Amazon AI Labs. He helps support the AWS service SageMaker, and has experience in machine learning research, development, and application. Outside of work, he is an avid music lover and a fan of all dogs.

Urvashi Chowdhary is a Senior Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.

Learning to Generalize from Sparse and Underspecified Rewards

Written on February 21, 2019. Posted in Google.

Posted by Rishabh Agarwal, Google AI Resident and Mohammad Norouzi, Research Scientist

Reinforcement learning (RL) presents a unified and flexible framework for optimizing goal-oriented behavior, and has enabled remarkable success in addressing challenging tasks such as playing video games, continuous control, and robotic learning. The success of RL algorithms in these application domains often hinges on the availability of high-quality and dense reward feedback. However, broadening the applicability of RL algorithms to environments with sparse and underspecified rewards is an ongoing challenge, requiring a learning agent to generalize (i.e., learn the right behavior) from limited feedback. A natural way to investigate the performance of RL algorithms in such problem settings is via language understanding tasks, where an agent is provided with a natural language input and needs to generate a complex response to achieve a goal specified in the input, while only receiving binary success-failure feedback.

For instance, consider a “blind” agent tasked with reaching a goal position in a maze by following a sequence of natural language commands (e.g., “Right, Up, Up, Right”). Given the input text, the agent (green circle) needs to interpret the commands and take actions based on such interpretation to generate an action sequence (a). The agent receives a reward of 1 if it reaches the goal (red star) and 0 otherwise. Because the agent doesn’t have access to any visual information, the only way for the agent to solve this task and generalize to novel instructions is by correctly interpreting the instructions.

In this instruction-following task, the action trajectories a₁, a₂ and a₃ reach the goal, but the sequences a₂ and a₃ do not follow the instructions. This illustrates the issue of underspecified rewards.

In these tasks, the RL agent needs to learn to generalize from sparse (only a few trajectories lead to a non-zero reward) and underspecified (no distinction between purposeful and accidental success) rewards. Importantly, because of underspecified rewards, the agent may receive positive feedback for exploiting spurious patterns in the environment. This can lead to reward hacking, causing unintended and harmful behavior when deployed in real-world systems.
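To make the underspecification concrete, here is a small sketch of the maze example. It assumes an open grid with no walls, and the start position, move encoding, and trajectories are all hypothetical choices for illustration. The binary reward checks only the final position, so it cannot separate the trajectory that follows the instruction from the ones that reach the goal some other way.

```python
# Hypothetical move encoding on an open grid (no walls).
MOVES = {"Right": (1, 0), "Left": (-1, 0), "Up": (0, 1), "Down": (0, -1)}


def final_position(start, actions):
    """Apply each action in order and return the resulting (x, y) position."""
    x, y = start
    for action in actions:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
    return (x, y)


def binary_reward(start, goal, actions):
    """Sparse success-failure reward: 1 if the trajectory ends at the goal, else 0."""
    return 1 if final_position(start, actions) == goal else 0


instruction = ["Right", "Up", "Up", "Right"]
start = (0, 0)
goal = final_position(start, instruction)

trajectories = {
    "a1": ["Right", "Up", "Up", "Right"],   # follows the instruction
    "a2": ["Up", "Up", "Right", "Right"],   # reaches the goal by another route
    "a3": ["Right", "Right", "Up", "Up"],   # reaches the goal by another route
    "a4": ["Right", "Up", "Up", "Up"],      # misses the goal
}

rewards = {name: binary_reward(start, goal, traj) for name, traj in trajectories.items()}
follows = {name: traj == instruction for name, traj in trajectories.items()}

for name in trajectories:
    print(name, "reward:", rewards[name], "follows instruction:", follows[name])
```

Three of the four trajectories receive the same reward of 1, although only one of them reflects a correct reading of the commands.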

In “Learning to Generalize from Sparse and Underspecified Rewards”, we address the issue of underspecified rewards by developing Meta Reward Learning (MeRL), which provides more refined feedback to the agent by optimizing an auxiliary reward function. MeRL is combined with a memory buffer of successful trajectories collected using a novel exploration strategy to learn from sparse rewards. The effectiveness of our approach is demonstrated on semantic parsing, where the goal is to learn a mapping from natural language to logical forms (e.g., mapping questions to SQL programs). In the paper, we investigate the weakly-supervised problem setting, where the goal is to automatically discover logical programs from question-answer pairs, without any form of program supervision. For instance, given the question “Which nation won the most silver medals?” and a relevant Wikipedia table, an agent needs to generate an SQL-like program that results in the correct answer (i.e., “Nigeria”).

The proposed approach achieves state-of-the-art results on the WikiTableQuestions and WikiSQL benchmarks, improving upon prior work by 1.2% and 2.4% respectively. MeRL automatically learns the auxiliary reward function without using any expert demonstrations (e.g., ground-truth programs), making it more widely applicable and distinct from previous reward learning approaches. The diagram below depicts a high level overview of our approach:

Overview of the proposed approach. We employ (1) mode covering exploration to collect a diverse set of successful trajectories in a memory buffer; (2) Meta-learning or Bayesian optimization to learn an auxiliary reward that provides more refined feedback for policy optimization.

Meta Reward Learning (MeRL)
The key insight of MeRL in dealing with underspecified rewards is that spurious trajectories and programs that achieve accidental success are detrimental to the agent’s generalization performance. For example, an agent might be able to solve a specific instance of the maze problem above. However, if it learns to perform spurious actions during training, it is likely to fail when provided with unseen instructions. To mitigate this issue, MeRL optimizes a more refined auxiliary reward function, which can differentiate between accidental and purposeful success based on features of action trajectories. The auxiliary reward is optimized by maximizing the trained agent’s performance on a hold-out validation set via meta learning.

Schematic illustration of MeRL: The RL agent is trained via the reward signal obtained from the auxiliary reward model while the auxiliary rewards are trained using the generalization error of the agent.

Learning from Sparse Rewards
To learn from sparse rewards, effective exploration is critical to find a set of successful trajectories. Our paper addresses this challenge by utilizing the two directions of Kullback–Leibler (KL) divergence, a measure of how different two probability distributions are. In the example below, we use KL divergence to minimize the difference between a fixed bimodal (shaded purple) and a learned Gaussian (shaded green) distribution, which can represent the distribution of the agent’s optimal policy and our learned policy respectively. One direction of the KL objective learns a distribution which tries to cover both the modes while the distribution learned by the other objective seeks a particular mode (i.e. it prefers one mode over another). Our method exploits the mode covering KL’s tendency to focus on multiple peaks to collect a diverse set of successful trajectories and mode seeking KL’s implicit preference between trajectories to learn a robust policy.
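The difference between the two directions can be checked numerically. The sketch below is a toy illustration, not the paper's training procedure: it discretizes a bimodal target and two candidate Gaussians on a grid (all means, widths, and grid bounds are arbitrary assumptions) and evaluates KL(target || candidate) and KL(candidate || target) for each.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """Discrete KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Grid from -16 to 16 in steps of 0.1 (arbitrary choice).
grid = [i * 0.1 for i in range(-160, 161)]

# Fixed bimodal target with modes at -4 and +4.
target = normalize([0.5 * normal_pdf(x, -4, 1) + 0.5 * normal_pdf(x, 4, 1) for x in grid])
# Candidate 1: one wide Gaussian spread over both modes.
wide = normalize([normal_pdf(x, 0, 4) for x in grid])
# Candidate 2: one narrow Gaussian sitting on a single mode.
narrow = normalize([normal_pdf(x, 4, 1) for x in grid])

print("KL(target || wide)   =", round(kl(target, wide), 3))
print("KL(target || narrow) =", round(kl(target, narrow), 3))
print("KL(wide   || target) =", round(kl(wide, target), 3))
print("KL(narrow || target) =", round(kl(narrow, target), 3))
```

Under these assumptions, KL(target || candidate) is smaller for the wide candidate (mode covering), while KL(candidate || target) is smaller for the narrow one (mode seeking).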

Left: Optimizing mode covering KL. Right: Optimizing mode seeking KL

Conclusion
Designing reward functions that distinguish between optimal and suboptimal behavior is critical for applying RL to real-world applications. This research takes a small step in the direction of modelling reward functions without any human supervision. In future work, we’d like to tackle the credit-assignment problem in RL from the perspective of automatically learning a dense reward function.

Acknowledgements
This research was done in collaboration with Chen Liang and Dale Schuurmans. We thank Chelsea Finn and Kelvin Guu for their review of the paper.

Identifying bird species on the edge using the Amazon SageMaker built-in Object Detection algorithm and AWS DeepLens

Written on February 20, 2019. Posted in Amazon.

Custom object detection has become an important enabler for a wide range of industries and use cases—such as finding tumors in MRIs, identifying diseased crops, and monitoring railway platforms. In this blog post, we build a bird identifier based on an annotated public dataset. This type of model could be used in a number of different ways. You could use it to automate environmental studies for construction projects, or it could be used by bird enthusiasts when bird watching. You could also use this model as a working example to drive new ideas for your own use cases.

For this example, we use the built-in Object Detection algorithm provided by Amazon SageMaker. Amazon SageMaker is an end-to-end machine learning (ML) platform. By using built-in algorithms, developers can accelerate machine learning without needing expertise in using low-level ML frameworks, such as TensorFlow and MXNet. We’ll train the model in Amazon SageMaker’s fully managed and on-demand training infrastructure. Trained models can easily be hosted in the cloud or on the edge using AWS IoT Greengrass.

To demonstrate the use of custom object detection on the edge, we also show you how to deploy the trained model on AWS DeepLens, the world’s first deep-learning-enabled video camera for developers. AWS DeepLens helps put deep learning in the hands of developers, literally, with a fully programmable video camera, tutorials, code, and pre-trained models designed to expand deep learning skills.

The following diagram gives a high-level view of how our bird identifier solution is built:

Understanding the dataset

The CUB 200-2011 birds dataset contains 11,788 images across 200 bird species (the original technical report can be found here). Each species comes with about 60 images, with a typical size of about 350 pixels by 500 pixels. Bounding boxes are provided, as are annotations of bird parts. A recommended train/test split is given, but image size data is not.

Preparing the image dataset

The most efficient way to provide image data to the Amazon SageMaker Object Detection algorithm is by using the RecordIO format. MXNet provides a tool called im2rec.py to create RecordIO files for your datasets. To use the tool, you provide listing files describing the set of images.

For object detection datasets, Amazon SageMaker needs bounding boxes to be described in terms of xmin, ymin, xmax, and ymax, which are ratios of the box corners to the full image. The CUB dataset bounding box instead gives you x, y, width, and height in pixels. See the following picture to understand the difference in metadata.

To address this discrepancy, we retrieve the size of each image and translate the absolute bounding box to have dimensions relative to the image size. In the following example, the box dimensions from the dataset are shown in black, while the dimensions required by RecordIO are shown in green.

The following Python code snippet shows how we converted the original bounding box dimensions to those needed by im2rec. See the sample Amazon SageMaker notebook for the full code.

# Define the bounding boxes in the format required by the Amazon SageMaker 
# built-in Object Detection algorithm.
# The xmin/ymin/xmax/ymax parameters are specified as ratios to 
# the total image pixel size

df['header_cols'] = 2 # number of header cols and label width
df['label_width'] = 5 # label cols: class, xmin, ymin, xmax, ymax

df['xmin'] =  df['x_abs'] / df['width']
df['xmax'] = (df['x_abs'] + df['bbox_width']) / df['width']
df['ymin'] =  df['y_abs'] / df['height']
df['ymax'] = (df['y_abs'] + df['bbox_height']) / df['height']
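As a worked example of the same conversion on a single image, the sketch below turns one pixel bounding box into ratios and formats a tab-separated listing row. The image size, box values, class index, and file path are made up for illustration, and the column order of the row (index, header columns, label width, class, box, path) is an assumption based on the `header_cols` and `label_width` fields above; check it against the sample notebook before relying on it.

```python
def to_relative_bbox(x_abs, y_abs, bbox_width, bbox_height, img_width, img_height):
    """Convert a pixel box (x, y, width, height) to (xmin, ymin, xmax, ymax) ratios."""
    xmin = x_abs / img_width
    ymin = y_abs / img_height
    xmax = (x_abs + bbox_width) / img_width
    ymax = (y_abs + bbox_height) / img_height
    return xmin, ymin, xmax, ymax


def listing_line(index, class_id, bbox, image_path, header_cols=2, label_width=5):
    """Format one tab-separated listing row (assumed column order)."""
    fields = [str(index), str(header_cols), str(label_width), str(class_id)]
    fields += ["{:.4f}".format(value) for value in bbox]
    fields.append(image_path)
    return "\t".join(fields)


# Hypothetical 500 x 350 pixel image with a 200 x 140 pixel box at (100, 70).
bbox = to_relative_bbox(x_abs=100, y_abs=70, bbox_width=200, bbox_height=140,
                        img_width=500, img_height=350)
line = listing_line(index=0, class_id=17, bbox=bbox, image_path="images/example.jpg")
print(bbox)
print(line)
```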

With the listing files in place, the im2rec utility can be used to create the RecordIO files by executing the following command:

python im2rec.py --resize 256 --pack-label birds CUB_200_2011/images/

After the RecordIO files are created, they are uploaded to Amazon S3 as input to the Object Detection algorithm with the following Python code:

train_channel      = prefix + '/train'
validation_channel = prefix + '/validation'

sess.upload_data(path='birds_ssd_sample_train.rec',
                 bucket=bucket,
                 key_prefix=train_channel)
sess.upload_data(path='birds_ssd_sample_val.rec',
                 bucket=bucket,
                 key_prefix=validation_channel)

Training the object detection model using the Amazon SageMaker built-in algorithm

With the images available in Amazon S3, the next step is to train the model. Documentation for the object detection hyperparameters is available here. For our example, we have a few hyperparameters of interest:

Number of classes and training samples.
Batch size, epochs, image size, pre-trained model, and base network.

Note that the Amazon SageMaker Object Detection algorithm requires models to be trained on a GPU instance type such as ml.p3.2xlarge. Here is a Python code snippet for creating an estimator and setting the hyperparameters:

od_model = sagemaker.estimator.Estimator(
                 training_image,
                 role, 
                 train_instance_count=1, 
                 train_instance_type='ml.p3.2xlarge',
                 train_volume_size = 50,
                 train_max_run = 360000,
                 input_mode= 'File',
                 output_path=s3_output_location,
                 sagemaker_session=sess)  

od_model.set_hyperparameters(
                 base_network='resnet-50',
                 use_pretrained_model=1,
                 num_classes=5,
                 mini_batch_size=16,
                 epochs=100,               
                 learning_rate=0.001, 
                 lr_scheduler_step='33,67',
                 lr_scheduler_factor=0.1,
                 optimizer='sgd',
                 momentum=0.9,
                 weight_decay=0.0005,
                 overlap_threshold=0.5,
                 nms_threshold=0.45,
                 image_shape=512,
                 label_width=350,
                 num_training_samples=150)
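The `lr_scheduler_step` and `lr_scheduler_factor` settings above describe a step decay of the learning rate. The sketch below shows the schedule those values would produce, assuming the rate is multiplied by the factor once for every listed epoch that has been reached. Whether the built-in algorithm applies the drop at or just after each listed epoch is not stated here, so treat the exact boundaries as an assumption. The function name `step_lr` is hypothetical.

```python
def step_lr(epoch, base_lr=0.001, steps=(33, 67), factor=0.1):
    """Learning rate at a given epoch under step decay (assumed boundary handling)."""
    drops = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** drops


for epoch in (0, 32, 33, 66, 67, 99):
    print("epoch {:>2}: lr = {:.6f}".format(epoch, step_lr(epoch)))
```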

With the dataset uploaded, and the hyperparameters set, the training can be started using the following Python code:

s3_train_data      = 's3://{}/{}'.format(bucket, train_channel)
s3_validation_data = 's3://{}/{}'.format(bucket, validation_channel)

train_data      = sagemaker.session.s3_input(
                         s3_train_data,
                         distribution='FullyReplicated',
                         content_type='application/x-recordio',
                         s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(
                         s3_validation_data,
                         distribution='FullyReplicated',
                         content_type='application/x-recordio',
                         s3_data_type='S3Prefix')

data_channels = {'train': train_data, 'validation': validation_data}

od_model.fit(inputs=data_channels, logs=True)

For a subset of 5 species on an ml.p3.2xlarge instance type, we can get accuracy of 70 percent or more with 100 epochs in about 11 minutes.

You can create the training job using the AWS CLI, using a notebook, or using the Amazon SageMaker console.

Hosting the model using an Amazon SageMaker endpoint

After we have trained our model, we’ll host it on Amazon SageMaker. We use CPU instances, but you can also use GPU instances. Deployment from your Amazon SageMaker notebook takes a single line of Python code:

object_detector = od_model.deploy(
                     initial_instance_count = 1,
                     instance_type = 'ml.m4.xlarge')

Testing the model

After the model endpoint is in service, we can pass in images that the model has not yet seen and see how well the birds are detected. See the sample notebook for a visualize_detection function. Given a URL to a bird image, here is the Python code to invoke the endpoint, get back a set of predicted bird species and their bounding boxes, and visualize the results:

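import json
import boto3

# Runtime client for invoking the endpoint (if not already created in the notebook)
runtime = boto3.client('sagemaker-runtime')

# Read the image file as raw bytes for the request body
with open(filename, 'rb') as image:
    b = bytearray(image.read())

endpoint_response = runtime.invoke_endpoint(
                                 EndpointName=ep,
                                 ContentType='image/jpeg',
                                 Body=b)
# The response body is JSON with a 'prediction' list of detections
results    = endpoint_response['Body'].read()
detections = json.loads(results)

visualize_detection(filename,
                    detections['prediction'],
                    OBJECT_CATEGORIES, thresh)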
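If you want the detections as data rather than a plot, the response can be filtered directly. This is a minimal sketch that assumes each entry in `prediction` is a list of the form `[class_index, confidence, xmin, ymin, xmax, ymax]` with box coordinates as ratios of the image size. The category names, the sample response, and the helper `filter_detections` are hypothetical stand-ins.

```python
import json

# Hypothetical category names; the real list comes from the training classes.
OBJECT_CATEGORIES = ["class_0", "class_1", "class_2", "class_3", "class_4"]

# Illustrative response body (assumed shape), not real endpoint output.
sample_body = json.dumps({"prediction": [
    [3.0, 0.92, 0.10, 0.20, 0.60, 0.80],
    [1.0, 0.15, 0.05, 0.05, 0.30, 0.40],
]})


def filter_detections(predictions, categories, thresh, img_width, img_height):
    """Keep detections at or above the threshold and convert boxes to pixels."""
    kept = []
    for class_id, score, xmin, ymin, xmax, ymax in predictions:
        if score < thresh:
            continue
        kept.append({
            "label": categories[int(class_id)],
            "score": score,
            "box_px": (round(xmin * img_width), round(ymin * img_height),
                       round(xmax * img_width), round(ymax * img_height)),
        })
    return kept


detections = json.loads(sample_body)
birds = filter_detections(detections["prediction"], OBJECT_CATEGORIES,
                          thresh=0.5, img_width=500, img_height=350)
print(birds)
```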

Here is a sample result for an image of a blue jay:

Running your model on the edge using AWS DeepLens

In some use cases, an Amazon SageMaker hosted endpoint will be a sufficient deployment mechanism, but there are many use cases that require real-time object detection at the edge. Imagine a phone-based assistant app in a bird sanctuary. You walk around and point your app at a bird and instantly get details about the species (listen to the bird call, understand its habitat, etc.). No more guessing about what bird you just saw.

AWS DeepLens lets you experiment with deep learning on the edge, giving developers an easy way to deploy trained models and use Python code to come up with interesting applications. For our bird identifier, you could mount an AWS DeepLens device next to your kitchen window overlooking a set of bird feeders. The device could feed cropped images of detected birds to Amazon S3. It could even trigger a text to your mobile phone to let you know what birds visited.

A previous blog post covered how to deploy a custom image classification model on AWS DeepLens. For a custom object detection model, there are two differences:

The model artifacts must be converted before being deployed.
The model must be optimized before being loaded.

Let’s go into more detail on each of these differences.

Convert your model artifacts before deploying to AWS DeepLens

For custom object detection models produced by Amazon SageMaker, you need to perform an additional step if you want to deploy your models on AWS DeepLens. MXNet provides a utility function for converting the model. To get started with conversion, first clone the GitHub repository:

git clone https://github.com/apache/incubator-mxnet

The next step is to download a copy of your model artifacts that were saved in Amazon S3 as a result of your training job. Extract the actual artifacts (a parameters file, a symbol file, and the hyperparameters file), and rename them so that they reflect the base network and the image size that were used in training. Here are the commands from a bash script you can use to perform the conversion:

BUCKET_AND_PREFIX=s3://<your-bucket>/<your-prefix>/output
TARGET_PREFIX=s3://<your-deeplens-bucket>/<your-prefix>
rm -rf tmp && mkdir tmp
aws s3 cp $BUCKET_AND_PREFIX/model.tar.gz tmp
gunzip -k -c tmp/model.tar.gz | tar -C tmp -xopf -
mv tmp/*-0000.params tmp/ssd_resnet50_512-0000.params
mv tmp/*-symbol.json tmp/ssd_resnet50_512-symbol.json

After the contents have been extracted and renamed, invoke the conversion utility:

python incubator-mxnet/example/ssd/deploy.py --network resnet50 \
  --data-shape 512 --num-class 5 --prefix tmp/ssd_

You can now remove the original files and create a new compressed tar file with the converted artifacts. Copy the new model artifacts file to Amazon S3, where it can be used when importing a new AWS DeepLens object detection model:

rm tmp/ssd_* && rm tmp/model.tar.gz
tar -cvzf ./patched_model.tar.gz -C tmp \
  ./deploy_ssd_resnet50_512-0000.params \
  ./deploy_ssd_resnet50_512-symbol.json \
  ./hyperparams.json
aws s3 cp patched_model.tar.gz $TARGET_PREFIX-patched/

Note that the destination bucket for the patched model must have the word “deeplens” in the bucket name. Otherwise, you will get an error when importing the model in the AWS DeepLens console. A complete script for patching the model artifacts can be found here.

Optimize the model from your AWS Lambda function on AWS DeepLens

An AWS DeepLens project consists of a trained model and an AWS Lambda function. Using AWS IoT Greengrass on the AWS DeepLens, the inference Lambda function performs three important functions:

It captures the image from a video stream.
It performs an inference using that image against the deployed machine learning model.
It provides the results to both AWS IoT and the output video stream.

AWS IoT Greengrass lets you execute AWS Lambda functions locally, reducing the complexity of developing embedded software. For details on creating and publishing your inference Lambda function, see this documentation.

When using a custom object detection model produced by Amazon SageMaker, there is an additional step in your AWS DeepLens inference Lambda function. The inference function needs to call MXNet’s model optimizer before performing any inference using your model. Here is the Python code for optimizing and loading the model:

ret, model_path = mo.optimize('deploy_ssd_resnet50_512',
                              input_width, input_height)
model = awscam.Model(model_path, {'GPU': 1})

Performing model inference on AWS DeepLens

Model inference from your AWS Lambda function is very similar to the steps we showed earlier for invoking a model using an Amazon SageMaker hosted endpoint. Here is a piece of the Python code for finding birds in a frame provided by the AWS DeepLens video camera:

frame_resize = cv2.resize(frame, (512, 512))

# Run the images through the inference engine and parse the results using
# the parser API.  Note it is possible to get the output of doInference
# and do the parsing manually, but since it is a ssd model,
# a simple API is provided.
parsed_inference_results = model.parseResult(
                                 model_type,
                                 model.doInference(frame_resize))
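Because inference runs on a frame resized to 512 x 512, the detected boxes need to be scaled back to the original frame before drawing or cropping. The sketch below shows that step on plain dictionaries. It assumes the parsed results are keyed by `model_type` and that each detection carries `label`, `prob`, `xmin`, `ymin`, `xmax`, and `ymax` in pixels of the resized input. The sample values, frame size, and the helper `scale_detections` are hypothetical.

```python
def scale_detections(parsed_results, model_type, frame_width, frame_height,
                     input_size=512, min_prob=0.5):
    """Scale boxes from the resized model input back to the original frame size."""
    xscale = frame_width / input_size
    yscale = frame_height / input_size
    boxes = []
    for obj in parsed_results[model_type]:
        if obj["prob"] < min_prob:
            continue  # skip low-confidence detections
        boxes.append({
            "label": obj["label"],
            "prob": obj["prob"],
            "xmin": int(obj["xmin"] * xscale),
            "ymin": int(obj["ymin"] * yscale),
            "xmax": int(obj["xmax"] * xscale),
            "ymax": int(obj["ymax"] * yscale),
        })
    return boxes


# Illustrative parsed output (assumed shape), not captured from a device.
sample_results = {"ssd": [
    {"label": 2, "prob": 0.88, "xmin": 100, "ymin": 200, "xmax": 300, "ymax": 400},
    {"label": 4, "prob": 0.20, "xmin": 10, "ymin": 10, "xmax": 50, "ymax": 50},
]}

scaled = scale_detections(sample_results, "ssd", frame_width=1024, frame_height=768)
print(scaled)
```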

A complete inference Lambda function for use on AWS DeepLens with this object detection model can be found here.

Conclusion

In this blog post, we have shown how to use the Amazon SageMaker built-in Object Detection algorithm to create a custom model for detecting bird species based on a publicly available dataset. We also showed you how to run that model on a hosted Amazon SageMaker endpoint and on the edge using AWS DeepLens. You can clone and extend this example for your own use cases. We would love to hear how you are applying this code in new ways. Please let us know your feedback by adding your comments.

About the author

Mark Roy is a Solution Architect focused on Machine Learning, with a particular interest in helping customers and partners design computer vision solutions. In his spare time, Mark loves to play, coach, and follow basketball.

On the Path to Cryogenic Control of Quantum Processors

Written on February 20, 2019. Posted in Google.

Posted by Joseph Bardin, Visiting Faculty Researcher and Erik Lucero, Staff Research Scientist and Hardware Lead, Google AI Quantum Team

Building a quantum computer that can solve practical problems that would otherwise be classically intractable due to computational complexity, cost, energy consumption or time to solution is the longstanding goal of the Google AI Quantum team. Current thresholds suggest a first generation error-corrected quantum computer will require on the order of 1 million physical qubits, which is more than four orders of magnitude more qubits than exist in Bristlecone, our 72 qubit quantum processor. Increasing the number of physical qubits to the level needed for a fault-tolerant quantum computer and maintaining high-quality control of each qubit are intertwined and exciting technological challenges that will require inventions beyond simply copying and pasting our current control architecture. One critical challenge is reducing the number of input/output control lines per qubit by relocating the room temperature analog control electronics to the 3 kelvin stage in the cryostat, while maintaining high-quality qubit control.

As a step towards solving that challenge, this week we presented our first generation cryogenic-CMOS single-qubit controller at the International Solid State Circuits Conference in San Francisco. Fabricated using commercial CMOS technology, our controller operates at 3 kelvin, consumes less than 2 milliwatts of power and measures just 1 mm by 1.6 mm. Functionally, it provides an instruction set for single-qubit gate operations, providing analog control of a qubit via digital lines between room temperature and 3 kelvin, all while consuming ~1000 times less power compared to our current room temperature control electronics.

Google’s first generation cryogenic-CMOS single-qubit controller (center and zoomed on the right) packaged and ready to be deployed inside our cryostat. The controller measures 1mm by 1.6mm.

How to Control 72 Qubits
In our lab in Santa Barbara, we run programs on Bristlecone by applying gigahertz frequency analog control signals to each of the qubits to manipulate the qubit state, to entangle qubits and to measure the outcomes of our computations. How well we define the shape and frequency of these control signals directly impacts the quality of our computation. To make high-quality qubit control signals, we leverage technology developed for smartphones packaged in server racks at room temperature. Individual coaxial cables deliver these signals to each qubit, which are themselves kept inside a cryostat chilled to 10 millikelvin. While this approach makes sense for a Bristlecone-scale quantum processor, which demands 2 control lines per qubit for 144 unique control signals, we realized that a more integrated approach would be required in order to scale our systems to the million qubit level.

Research Scientist Amit Vainsencher checking the wiring on Bristlecone in one of Google’s flagship cryostats. Blue coaxial cables are connected from custom analog control electronics (server rack on the right) to the quantum processor.

In our current setup, the number of physical wires connected from room temperature to the qubits inside the cryostat and the finite cooling power of the cryostat represent significant constraints. One way to alleviate this is to move the digital-to-analog control closer to the quantum processor. Currently, the room temperature digital-to-analog waveform generators used to control individual qubits dissipate ~1 watt of waste heat per qubit. The cooling power of our cryostat at 3 kelvin is 0.1 watt. That means if we crammed 150 waveform generators into our cryostat (never mind the limited physical space inside the refrigerator for a moment) we would overwhelm the cooling power of our cryostat by 1500x, thereby cooking our cryostat and rendering our qubits useless. Therefore, simply installing our existing digital-to-analog control in the cryostat will not set us on the path to control millions of qubits. It is clear we need an integrated low-power qubit control solution.
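The arithmetic behind these figures is short enough to write out. The sketch below uses only the numbers quoted in this post (72 qubits, 2 control lines per qubit, ~1 watt per waveform generator, 0.1 watt of cooling power at 3 kelvin), expressed in milliwatts to keep the calculation in whole numbers. The constant and function names are hypothetical.

```python
# Figures quoted in the post, in milliwatts.
COOLING_POWER_MW = 100        # 0.1 watt of cooling power at 3 kelvin
GENERATOR_HEAT_MW = 1000      # ~1 watt of waste heat per waveform generator
BRISTLECONE_QUBITS = 72
CONTROL_LINES_PER_QUBIT = 2


def overload_factor(num_generators, heat_per_generator_mw, cooling_power_mw):
    """Ratio of dissipated heat to the available cooling power."""
    return num_generators * heat_per_generator_mw / cooling_power_mw


control_signals = BRISTLECONE_QUBITS * CONTROL_LINES_PER_QUBIT
factor = overload_factor(150, GENERATOR_HEAT_MW, COOLING_POWER_MW)

print("control signals for Bristlecone:", control_signals)
print("overload with 150 generators: {:.0f}x".format(factor))
```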

A Cool Idea
In collaboration with University of Massachusetts Professor Joseph Bardin, we set out to develop custom integrated circuits (ICs) to control our qubits from within the cryostat to ultimately reduce the physical I/O connections to and from our future quantum processors. These ICs would be designed to operate in the ultracold environment, specifically 3 kelvin, and turn digital instructions into analog control pulses for qubits. A key research objective was to first design a custom IC with low power requirements, in order to prevent warming up the cryostat.

We designed our IC to dissipate no more than 2 milliwatts of power at 3 kelvin, which can be challenging as most physical CMOS models assume operation closer to 300 kelvin. After design and fabrication of the IC with the low power design constraints in mind, we verified that the cryogenic-CMOS qubit controller worked at room temperature. We then mounted it in our cryostat at 3 kelvin and connected it to a qubit (mounted at 10 millikelvin in the same cryostat). We carried out a series of experiments to establish that the cryogenic-CMOS qubit controller worked as designed, and most importantly, that we hadn’t just installed a heater inside our cryostat.

Schematic of the cryogenic-CMOS qubit controller mounted on the 3 kelvin stage of our dilution refrigerator and connected to a qubit. Our standard qubit control electronics were connected in parallel to enable control and measurement of the qubit as an in-situ check experiment.

Performance at Low Temperature
Baseline experiments for our new quantum control hardware, including T1, Rabi oscillations, and single qubit gates, show similar performance compared to our standard room-temperature qubit control electronics: qubit coherence time was virtually unchanged, and high-visibility Rabi oscillations were observed by varying the amplitude of the pulses out of the cryogenic-CMOS qubit controller—a signature response of a driven qubit.

Comparison of the qubit coherence time measured using the standard and cryogenic quantum controllers.

Measured Rabi amplitude oscillations using the cryogenic controller. The green and black traces are the probability of measuring the qubits in the 1 and 0 states, respectively.

Next Steps
Although all of these results are promising, this first generation cryogenic-CMOS qubit controller is but one small step towards a truly scalable qubit control and measurement system. For instance, our controller is only able to address a single qubit, and it still requires several connections to room temperature. In addition, we still need to work hard to quantify the error rates for single qubit gates. As such, we are excited to reduce the energy required to control qubits and still maintain the delicate control required to perform high-quality qubit operations.

Acknowledgements
This work was carried out with the support of the Google Visiting Researcher Program while Prof. Bardin, an Associate Professor with the University of Massachusetts Amherst, was on sabbatical with the Google AI Quantum Team. This work would not have been possible without the many contributions of members of the Google AI Quantum team, especially Evan Jeffrey for his integration of the cryo-CMOS controller into the qubit calibration software, Ted White for his on-demand qubit calibrations and Trent Huang for his tireless design rules checks.

Newstag improves global video news discoverability using AI language services on AWS

Written on February 19, 2019. Posted in Amazon.

Swedish startup Newstag uses artificial intelligence (AI) to allow customers to create personalized video news channels from major global news providers. Their mission is to continuously empower people and organizations with the latest, diverse information. To increase discoverability of video news from all around the world for their customers, Newstag creates rich metadata for each video. Newstag was able to automate this manually intensive process of extracting and creating metadata from videos by using Amazon Transcribe, Amazon Translate, and Amazon Comprehend. Using a combination of AWS services, Newstag can create rich metadata for ten times more videos than was previously possible.

“We believe people want to choose what news they want to see. Enabling customers to curate relevant stories is pivotal for us to carry out our company mission,” says Mats Ekholm, Chief Technology Officer of Newstag. To accomplish this, Newstag has developed tags that customers can select to create a personalized video news channel. The following screenshot illustrates how customers can select these tags in Newstag.

To curate over 1,000 videos a day, Newstag’s editorial staff had spent a lot of time manually tagging content in various languages. Tags mostly consisted of titles, brief descriptions, and limited metadata. Struggling to keep up with demand, the startup looked for a simple, cost-efficient, and easy-to-deploy solution. By using pre-trained machine learning (ML) services on AWS, Newstag was able to use AI to solve the problem even though they had no previous experience with the technology.

First, Newstag uses Amazon Transcribe to create transcripts of speech in supported languages from the videos stored using Amazon Simple Storage Service (S3). Then Amazon Translate is applied to non-English transcripts as well as other titles, descriptions, or keywords originally provided with the video for accurate translation into English. Finally, Amazon Comprehend, a machine learning service that provides insights from analyzing textual content, is used to extract entities from all texts available in English. These named entities, such as organizations, people, places, and locations, are used to create accurate tags to help customers find targeted content.
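As a rough illustration of the last step, the sketch below turns an entity-detection response into de-duplicated tags. The response shape follows the documented output of Amazon Comprehend's DetectEntities operation, but the function name, the confidence threshold, and the sample data are assumptions made for this example and are not taken from Newstag's system.

```python
# Sketch: convert an entity-detection response into de-duplicated tags.
# The function name, threshold, and sample data are illustrative assumptions.

def entities_to_tags(response, min_score=0.8):
    """Return unique tags above a confidence threshold, in first-seen order."""
    seen = set()
    tags = []
    for entity in response.get("Entities", []):
        if entity["Score"] < min_score:
            continue  # drop low-confidence entities
        text = entity["Text"].strip()
        key = (entity["Type"], text.lower())
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        tags.append({"type": entity["Type"], "tag": text})
    return tags


# Made-up response for demonstration only.
sample_response = {
    "Entities": [
        {"Type": "ORGANIZATION", "Text": "European Central Bank", "Score": 0.97},
        {"Type": "LOCATION", "Text": "Frankfurt", "Score": 0.95},
        {"Type": "LOCATION", "Text": "frankfurt", "Score": 0.91},
        {"Type": "PERSON", "Text": "someone", "Score": 0.42},
    ]
}

tags = entities_to_tags(sample_response)
for tag in tags:
    print(tag["type"], tag["tag"])
```

In a real pipeline the response would come from the Comprehend API call on the translated transcript, and the threshold would need tuning against editorial judgment.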

“We used to manually create tags for about three to four videos per hour,” explains Ekholm. “With AI language services offered by AWS, we can now create tags for about 30 to 40 videos per hour. It means a 10-times increase in the number of news stories that our customers can see on Newstag.”

Ekholm automated a majority of the tagging process for video news in different languages within five hours at low cost. “I was impressed by how easy it was to deploy Transcribe, Translate, and Comprehend. I was also very pleased with their low costs. As a start-up, we have to be smart about operating costs,” says Ekholm.

Learn more

See the AWS website to learn more about language services for AI. Here are some useful blog posts from AWS to get you started:

About the Author

Woo Kim is a Product Marketing Manager for AWS machine learning services. He spent his childhood in South Korea and now lives in Seattle, WA. In his spare time, he enjoys playing volleyball and tennis.

Run ONNX models with Amazon Elastic Inference

Written on February 18, 2019. Posted in Amazon.

At re:Invent 2018, AWS announced Amazon Elastic Inference (EI), a new service that lets you attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 instance. This is also available for Amazon SageMaker notebook instances and endpoints, bringing acceleration to built-in algorithms and to deep learning environments.

In this blog post, I show how to use the models in the ONNX Model Zoo on GitHub to perform inference by using MXNet with Elastic Inference Accelerator (EIA) as a backend.

The benefits of Amazon Elastic Inference

Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75 percent.

Amazon Elastic Inference provides support for Apache MXNet, TensorFlow, and ONNX models. ONNX is an open standard format for deep learning models that enables interoperability between deep learning frameworks such as Apache MXNet, Caffe2, Microsoft Cognitive Toolkit (CNTK), PyTorch, and more. This means that you can use any of these frameworks to train a model, export the model in ONNX format, and then import it into Apache MXNet for inference.

You can see the collection of pre-trained, state-of-the-art models in ONNX format at the ONNX Model Zoo on GitHub.

Getting started with inference by using Resnet 152v1 model

To start the tutorial, I use an AWS Deep Learning AMI (DLAMI), which already provides support for Apache MXNet, EIA, ONNX, and other required libraries. You can review Elastic Inference Prerequisites for the instructions related to Elastic Inference. For detailed instructions on how to launch a DLAMI with an Elastic Inference Accelerator, see the Elastic Inference documentation. I use the standard ResNet-152v1 ONNX model from the model zoo for inference in MXNet.

Step 1: Activate the MXNet EI environment

To begin the tutorial, log in to your Deep Learning AMI with Conda console. Activate the Python 3 MXNet EI environment.

source activate amazonei_mxnet_p36

Step 2: Import dependencies and download

From the ONNX model zoo, download both the Resnet-152v1 model and synset.txt file, which contains class labels.

import mxnet as mx
import matplotlib.pyplot as plt
import numpy as np
from mxnet.gluon.data.vision import transforms
from mxnet.contrib.onnx.onnx2mx.import_model import import_model
import os
# Download model and synset.txt files containing class labels
mx.test_utils.download('https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet152v1/resnet152v1.onnx')
mx.test_utils.download('https://s3.amazonaws.com/onnx-model-zoo/synset.txt')
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

# Download image for inference
img_path = mx.test_utils.download('https://s3.amazonaws.com/onnx-mxnet/examples/mallard_duck.jpg')

Step 3: Import ONNX model in MXNet and perform inference

Import the ONNX model into MXNet with the help of the ONNX-MXNet API.

# Enter path to the ONNX model file
model_path = 'resnet152v1.onnx'
sym, arg_params, aux_params = import_model(model_path)

Load the resnet152v1 network for inference using CPU context.

# Determine and set context
ctx = mx.cpu()
# Load module
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))], 
         label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True, allow_extra=True)

Define a preprocess function and a predict function. The predict function takes the loaded input image and prints the top five predictions.

# Preprocess input image: resize, crop, convert to tensor, and normalize
def preprocess(img):
    transform_fn = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    img = transform_fn(img)
    # Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    img = img.expand_dims(axis=0)
    return img

def predict(img):
    img = preprocess(img)
    # Run forward pass
    mod.predict(mx.nd.array(img))
    # Take softmax to generate probabilities
    scores = mx.ndarray.softmax(mod.get_outputs()[0]).asnumpy()
    # Print the top five predicted classes
    scores = np.squeeze(scores)
    a = np.argsort(scores)[::-1]
    for i in a[0:5]:
        print('class=%s ; probability=%f' %(labels[i],scores[i]))

Plot the input image for inference.

img = mx.image.imread(img_path)
plt.imshow(img.asnumpy())

Step 4: Generate prediction on input image

Calling predict on the displayed image prints the top five classes in descending order of probability.

predict(img)

Result:
class=n01847000 drake ; probability=0.999519
class=n02018207 American coot, marsh hen, mud hen, water hen, Fulica americana ; probability=0.000230
class=n01855032 red-breasted merganser, Mergus serrator ; probability=0.000130
class=n01855672 goose ; probability=0.000044
class=n09332890 lakeside, lakeshore ; probability=0.000022

Evaluate your output and improve performance

Inference on this model takes approximately 131 milliseconds on C5.4xlarge. So, for 100,000 inference requests, this would cost $2.47 USD. This can be expensive for production use cases. So, let’s look at how Amazon Elastic Inference can help.

Amazon Elastic Inference is available in the following three sizes, making it efficient for a wide range of inference models including computer vision, natural language processing, and speech recognition.

eia1.medium: 8 teraflops of mixed-precision performance
eia1.large: 16 teraflops of mixed-precision performance
eia1.xlarge: 32 teraflops of mixed-precision performance

This lets you select the best price-to-performance ratio for your application. I ran the inference on the same model using GPU and EIA contexts to see the difference in the cost and performance.

To run the model with the mx.eia() context, you only need to make minor changes to the code.

With EIA context, when you use either the Symbol API or the Module API, make sure you set for_training=False.
Set the context to bind your model as ctx=mx.eia().

EI typically aims to minimize the host instance CPU memory requirements by offloading to the EI accelerator, but some pre and post-processing must still be done on the host. Depending on the application’s compute and memory requirements, you can select the instance types that are most appropriate.

I evaluated performance of this model with C5 and M5 instances but found that this model required more CPU memory. The M5 instances with more RAM were the most cost effective solution. I ran tests with a few different sized M5 instances with an EIA1.Medium accelerator and observed that instance sizes larger than the M5.xlarge didn’t materially improve latency performance. Next, I tested the M5.xlarge with different EI accelerator sizes. Inference calls with an EIA1.large accelerator were significantly faster than an EIA1.Medium, but my EIA1.Medium at 50ms for an inference request met my requirements, so I didn’t need more horsepower.

Based on my requirements, I decided on an M5.xlarge with an EIA1.Medium as the right infrastructure combination for my workload. Comparing the hourly costs for the instances in our comparison: a P2.xlarge costs $0.90 per hour, the M5.xlarge + EIA1.Medium costs $0.32 per hour, and the C5.4xlarge costs $0.68 per hour. But let’s also compare the cost to perform 100,000 inferences, because this incorporates both hourly cost and performance and gives a more meaningful comparison. The P2.xlarge costs $1.23 to execute 100,000 inferences, whereas this new EI based combination costs $0.45, a whopping 63% reduction in cost, sacrificing just 2% speed. If you use C5.4xlarge, it costs $2.47 and is 2.5x slower than M5.xlarge with EIA1.Medium! See the graph below for more information:
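The per-100,000-inference figures can be roughly reproduced from the hourly prices and latencies quoted in this post. The sketch below assumes requests are served one at a time at the stated latency, which is a simplification; the function and variable names are made up for illustration.

```python
# Sketch: cost of a batch of sequential inference requests.
# Assumes one request at a time at a fixed latency (a simplification).

def cost_per_batch(hourly_price_usd, latency_seconds, requests=100_000):
    """Return the cost in USD of serving `requests` sequential inferences."""
    hours = latency_seconds * requests / 3600.0
    return hourly_price_usd * hours


# Prices and latencies as quoted in the post.
c5_cost = cost_per_batch(0.68, 0.131)   # C5.4xlarge, about 131 ms per inference
ei_cost = cost_per_batch(0.32, 0.050)   # M5.xlarge + EIA1.Medium, about 50 ms
saving_vs_c5 = 1 - ei_cost / c5_cost

print("C5.4xlarge:       $%.2f" % c5_cost)
print("M5.xlarge + EIA:  $%.2f" % ei_cost)
print("Saving vs C5:     %.0f%%" % (saving_vs_c5 * 100))
```

With the rounded 50 ms latency, the EI figure comes out slightly below the $0.45 quoted above, which suggests the measured latency was a little over 50 ms.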

Conclusion

As you can see from the tutorial here, Amazon Elastic Inference gives you the opportunity to select the best price-to-performance ratio suitable for your application. For ONNX ResNet152 model inference, EIA1.medium is 2.5x faster and 81% cheaper than C5.4xlarge! Also with ONNX support, you can export models trained in different deep learning frameworks to run inference with EIA using Apache MXNet as a backend.

For general information about how to use EI, see Working with Amazon EI in the EC2 user guide. You can also find more information about ONNX support in MXNet, in the ONNX API documentation on the MXNet website.

About the Authors

Roshani Nagmote is a Software Developer for AWS Deep Learning. She focusses on building distributed Deep Learning systems and innovative tools to make Deep Learning accessible for all. In her spare time, she enjoys hiking, exploring new places and is a huge dog lover.

Vandana Kannan is a Software Developer for AWS Deep Learning focusing on building scalable deep learning systems. In her spare time, she enjoys painting, learning Indian classical dance, and spending time with family and friends.

Hagay Lupesko is an Engineering Manager for AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time he enjoys reading, hiking and spending time with his family.

Machine learning: What’s in it for government?

Written on February 18, 2019. Posted in Amazon.

Machine learning (ML) allows governments to deliver better, more cost-effective, and citizen-friendly services. We talked with three Amazon Web Services (AWS) customers from government authorities and institutes who shared their stories about how ML helped them transform their services and their organizations. These customers gathered at an executive learning track curated particularly for European Government delegates, as part of AWS re:Invent 2018.

The National Health Service Business Services Authority

The United Kingdom’s (UK) National Health Service Business Services Authority (NHSBSA), the organization overseeing the delivery of primary care, dental, and prescription services to UK citizens, told us how it introduced a chatbot built on Amazon Connect, a cloud-based contact center service, to increase its capacity to respond to customer needs.

Chris Suter, Lead Cloud Architect of Digitisation, Insight, and Technology Solutions, shared the results of this investment with the group. In the first three weeks of its implementation, the chatbot helped NHSBSA respond to approximately 11,000 calls, addressing simple queries and rerouting complicated queries to staff who can provide more support. This helped NHSBSA save USD $650,000 per year.

NHSBSA used Amazon Lex to ensure that the calls were routed automatically and answered correctly, and they used Amazon Polly to simulate human-like speech. The ML-powered front end handles 40 percent of inbound calls, making staff available with an almost-zero customer queue time.

“This not only resulted in higher efficiency and cost savings for NHSBSA, but also boosted the morale of employees as they could focus their efforts on providing adequate guidance to customers with more complicated questions,” Suter said.

The Belgian public employment service VDAB

Another AWS customer, Belgian public employment service VDAB, wanted to know how they could use machine learning to improve job matching, that is, finding the right opportunities for the right people. Radix.ai’s JobNet used a deep learning model to enhance this function. With each new dataset, the engine learns how the job market evolves, noting changes in job demand and how trends shift over time.

The deep learning model goes beyond analysis of words in job descriptions and resumes to include information on interests and talents of job seekers. By using this service, employment officials want to provide better and faster connections between job seekers and available jobs.

The Royal National Institute of Blind People

The impact of machine learning on people with disabilities has also been transformative. The Royal National Institute of Blind People (RNIB) uses Amazon Polly to provide the UK’s largest community of blind and partially sighted people with reading services. RNIB’s Talking Books service provides access to over 26,000 audiobooks, free of charge. For millions of people in the UK, this service can be life changing.

More and more government customers are discovering that ML can be a game-changing technology for their users, and in turn for their businesses. These examples serve as starting points for governments.

About the AWS Institute

The AWS Institute, which curated this program, will publish more blog posts on how machine learning has an impact on the public sector.

The AWS Institute convenes global leaders who share an interest in solving some of the world’s most pressing challenges using technology. The Institute convenes leaders from government, academia, and nonprofit organizations for private discussions to explore innovative ideas to transform the public sector. For a related blog post on how to prepare governments for digital transformation, check out How Can Government Grow and Recruit Digital Talent? The Case of the UK Driver and Vehicle Licensing Agency.

About the Authors

Maysam Ali is Global Content Lead for the Amazon Web Services Institute. She writes about the impact of technology on society. She helps governments, nonprofits and educational leaders better understand how they can use new technologies, including machine learning and artificial intelligence, to address major societal challenges.

Leonardo Quattrucci is the Lead for Europe, Middle East and Africa for the Amazon Web Services Institute. He works with government executives to accelerate public sector transformation. By innovating on policy processes and building digital competencies, he is helping leaders use technology to deliver better citizen services.

Creating hierarchical label taxonomies using Amazon SageMaker Ground Truth

Written on February 14, 2019. Posted in Amazon.

At re:Invent 2018 we launched Amazon SageMaker Ground Truth, which can build highly accurate datasets and reduce labeling costs by up to 70% using machine learning. Amazon SageMaker Ground Truth offers easy access to public and private human labelers and provides them with built-in workflows and interfaces for common labeling tasks. Additionally, Amazon SageMaker Ground Truth lowers your labeling costs by using automated data labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.

Let’s suppose we have a large corpus of images taken from cameras on a street. Each image might contain many different objects important for developing algorithms for driverless cars (e.g., vehicles or traffic signals). We must first define a hierarchical representation of the information that we want to capture from the images (see below for an example of what such a label taxonomy may look like). We then begin the labeling process by taking these raw, unlabeled images and labeling them with the high-level classes (e.g., ‘vehicles’, ‘traffic-signals’, and ‘pedestrian’).

In this blog post, we’ll show you how to accomplish such hierarchical labeling with Amazon SageMaker Ground Truth by chaining jobs and making use of the augmented manifest functionality.

How is this typically solved?

In supervised machine learning, we typically use a labeled dataset that contains both the raw data and the associated label for each data object. For example, you can have a training dataset of street images and classify them into “traffic-signals” or “no-traffic-signals” (where the labels 0 and 1 correspond to the two classes). These labels are usually stored in a format such as CSV or JSON, with the first column representing the raw data and the second column representing the label.

However, if we want to further label the same set of images (for example, to identify the type of traffic signals in the “traffic-signals” set), we typically create a new dataset by performing a filtering operation on the first dataset to select only those images with traffic signals in them. This reduces the dataset to a subset containing only “traffic-signals” images (label 0). Then we can add a new label to classify traffic signals as ‘stop-sign’, ‘speed-limit’, and so on.

These kinds of filtering operations might become costly and time-consuming for large datasets. We might also want to mark all the stop signs and pedestrians in an image by an object detection (bounding box) algorithm. This typically requires us to create a third dataset by adding object detection labels around each of the stop signs and pedestrians in it. You can see that, as we continue classifying deeper into the taxonomy, the number and complexity of the training datasets increase at approximately the same rate as the fanout factor of the taxonomy (exponential in the most complex case).

In short, when working with a hierarchical taxonomy, you need to be able to do all of the following:

Associate multiple layers of labels to an image, and be able to store and retrieve them efficiently and cost effectively.
Create a filtered dataset containing a given label efficiently and cost effectively.
Train a model for a given label in the dataset containing multiple labels.

Amazon SageMaker Ground Truth helps you accomplish these tasks easily using the job chaining and augmented manifest functionality.

Job chaining

When you label datasets, often there will be different types of labels (image classification, bounding boxes, semantic segmentation, etc.) that might require a different UI or dramatically different labeling context that would in turn require different instructions for the labelers for optimal quality. In such cases, it can become necessary to split the labeling job up into multiple runs. Amazon SageMaker Ground Truth supports this workflow through job chaining. Job chaining refers to the workflow where the output of one labeling job will feed into the next via the output augmented manifest. At each step we can also apply a filter based on the labels from the previous job.

Job chaining can be a cost-saving measure, if you only run more expensive tasks (for example, semantic segmentation) on images that have already been identified as containing the features that you want to bound. It can also provide the opportunity to mix worker types. Use public workers to perform simple labeling and filtering tasks and use a private and curated workforce to perform tasks that require more precision or domain expertise.

In our street-scenes example, we start with a manifest containing images, then we progressively add each level of the labels. In the process we segment the dataset down with a filter. The workflow will look something like this:

Collect the initial unlabeled data.
Job1: Label road-objects (image classification job). The output will be an augmented manifest with labels for vehicles, traffic signals, or pedestrians.
Job2: Select all images with vehicles and draw bounding boxes around each car in these images.

What is an augmented manifest?

An augmented manifest is a UTF-8 encoded JSON Lines file where each line is a complete and valid JSON object. Each line is delimited by a standard line break, \n or \r\n. Since each line must be a valid JSON object, you can’t have unescaped line break characters within JSON. For more information about the data format, see JSON Lines.

Augmented manifests must contain a source field that defines a dataset object and also optionally includes attribute fields. Each labeling job outputs 2 additional attribute fields: one containing the label and another containing metadata associated with the label. The term augmented comes from the fact that the ground truth labels for the dataset objects are augmented inline. A new label for a dataset object is augmented as a new attribute field to the corresponding JSON line in the augmented manifest.

Normally with Amazon SageMaker training jobs there is one channel for training the actual image and an additional channel for the label. With augmented manifest one channel can stream both image and label. This cuts the number of channels in half, and it reduces the complexity of associating a label file with its corresponding image file.

This single, consistent format can be used as input to labeling jobs and input to training jobs without any additional transformation or reformatting. The format is transitive because the output of the labeling job is also the same format. This means that the output of a labeling job can be fed as input to another labeling job, thus facilitating the chaining of labeling jobs without any transformation or reformatting.
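To make the format concrete, here is a minimal sketch that reads a JSON Lines manifest, adds a label attribute and its matching "-metadata" attribute to each record, and writes the result back out in the same format. The bucket name, the attribute name "road-objects", and the helper names are hypothetical, and the metadata is abbreviated compared with what a real labeling job emits.

```python
import json

# Sketch: read, augment, and write an augmented manifest (JSON Lines).
# Bucket, attribute name, and helper names are hypothetical.

def read_manifest(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_manifest(records):
    """Serialize records as one compact JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


def augment(record, attribute_name, label, metadata):
    """Return a copy of the record with a label and its metadata added inline."""
    augmented = dict(record)
    augmented[attribute_name] = label
    augmented[attribute_name + "-metadata"] = metadata
    return augmented


manifest_text = (
    '{"source-ref":"s3://example-bucket/datasets/streetscenes/SSDB00001.JPG"}\n'
    '{"source-ref":"s3://example-bucket/datasets/streetscenes/SSDB00006.JPG"}\n'
)

records = read_manifest(manifest_text)
labeled = [
    augment(r, "road-objects", 0, {"class-name": "vehicles", "human-annotated": "yes"})
    for r in records
]
output_text = write_manifest(labeled)
print(output_text)
```

Because the output has the same line-per-object shape as the input, it can be parsed again by the same reader, which is the property that makes chaining possible.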

Let’s build an example augmented manifest to solve the taxonomy problem we described earlier.

Ok, let’s do this!

Let’s assume there are millions of images taken from cameras mounted in cars driving on public roadways. These images are stored in an Amazon S3 bucket location called s3://mybucket/datasets/streetscenes/. To start a labeling job to classify the images into vehicles, traffic signals, or pedestrians, we first need to create a manifest to be fed to Amazon SageMaker Ground Truth. The only mandatory field for a manifest is a field defining the dataset object. A dataset object can be an object in an Amazon S3 bucket, such as an image represented by a “source-ref” field pointing to the s3Uri of the object, or text that is represented directly as “source” in the manifest. In this example, we’ll use “source-ref” to point to our street scenes images. See the input section of Amazon SageMaker Ground Truth for more details.

Step 1: Downloading the example dataset

For this example, I’m going to use the CBCL StreetScenes dataset. This dataset has over 3000 images, but we’ll just use a selection of 10 images. The full dataset is approximately 2 GB. You can choose to upload all of the images to Amazon S3 for labeling, or just a selection of them.

Download the images.zip from here: Download.
Extract the zip archive to a folder. (By default the folder will be “Output.”)

Create a small sample dataset to work with:

$ mkdir streetscenes
$ cp Original/SSDB00001.JPG ./streetscenes/
$ cp Original/SSDB00006.JPG ./streetscenes/
$ cp Original/SSDB00016.JPG ./streetscenes/
$ cp Original/SSDB00021.JPG ./streetscenes/
$ cp Original/SSDB00042.JPG ./streetscenes/
$ cp Original/SSDB00003.JPG ./streetscenes/
$ cp Original/SSDB00011.JPG ./streetscenes/
$ cp Original/SSDB00020.JPG ./streetscenes/
$ cp Original/SSDB00025.JPG ./streetscenes/
$ cp Original/SSDB00279.JPG ./streetscenes/

Go to the Amazon S3 console and create the ‘streetscenes’ folder in your bucket. (Note: Amazon S3 is a key-value store, so there is no concept of folders. However, the Amazon S3 console gives a sense of folder structure by using forward slashes in the key. So we use the console to create the folder.)

Upload the following files to your Amazon S3 bucket (s3://mybucket/datasets/streetscenes/). You can use the Amazon S3 console or this AWS CLI command:

aws s3 sync streetscenes/ s3://cnidus-ml-iad/datasets/streetscenes/
upload: streetscenes/.DS_Store to s3://cnidus-ml-iad/datasets/streetscenes/.DS_Store
upload: streetscenes/SSDB00011.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00011.JPG
upload: streetscenes/SSDB00020.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00020.JPG
upload: streetscenes/SSDB00042.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00042.JPG
upload: streetscenes/SSDB00001.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG
upload: streetscenes/SSDB00016.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00016.JPG
upload: streetscenes/SSDB00006.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00006.JPG
upload: streetscenes/SSDB00021.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00021.JPG
upload: streetscenes/SSDB00025.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00025.JPG
upload: streetscenes/SSDB00279.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00279.JPG
upload: streetscenes/SSDB00003.JPG to s3://cnidus-ml-iad/datasets/streetscenes/SSDB00003.JPG

Step 2: Creating an input manifest

In the Amazon SageMaker console for Ground Truth, there is a crawling tool (see the “create manifest file” link in the input to labeling job) you can use for Ground Truth. This tool helps us create the manifest by crawling an Amazon S3 location containing raw data (image or text). For images, the crawler takes an input s3Prefix and crawls all of the image files (with extensions .jpg, .jpeg, .png) in that prefix and creates a manifest with each line as {"source-ref":"<s3-location-of-crawled-image>"}. For text, the crawler takes an input s3Prefix and crawls all text files (with extensions .txt, .csv) in that prefix and reads each line of each of the text files in the prefix, and creates a manifest with each line as {"source":"<one-line-of-text>"}.
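If you prefer to build the image manifest yourself, the idea can be sketched in a few lines. This is not the console crawler's implementation; the function name and bucket are hypothetical, the key list is supplied by hand rather than listed from S3, and case-insensitive extension matching is an assumption.

```python
import json

# Sketch: build an image input manifest from a list of object keys.
# Not the console crawler; names and matching rules are assumptions.

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_image_manifest(bucket, keys):
    """Return JSON Lines text with one source-ref entry per image key."""
    lines = []
    for key in sorted(keys):
        # Assumption: extensions are matched case-insensitively.
        if key.lower().endswith(IMAGE_EXTENSIONS):
            entry = {"source-ref": "s3://%s/%s" % (bucket, key)}
            lines.append(json.dumps(entry, separators=(",", ":")))
    return "\n".join(lines) + "\n"


keys = [
    "datasets/streetscenes/SSDB00006.JPG",
    "datasets/streetscenes/.DS_Store",
    "datasets/streetscenes/SSDB00001.JPG",
]
manifest = build_image_manifest("example-bucket", keys)
print(manifest)
```

Non-image objects such as .DS_Store are skipped, so they never reach the labeling job.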

In the Amazon SageMaker console, start the process by creating a labeling job. First choose Labeling jobs in the left navigation pane, and then choose the Create labeling job button:

Next choose Create manifest file.

This opens the create manifest file page. Enter the S3 path that you uploaded the files to (be sure to include the trailing slash). Next choose Create and then Use this manifest. (It will take a few seconds to create the manifest.)

For our taxonomy example, the objects are images in Amazon S3, so we can use the crawling to create the initial manifest with each line of JSON containing a field “source-ref” pointing to the s3Uri of an image.

{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG"}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00006.JPG"}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00016.JPG"}
...
...

Job 1: Labeling road objects

Now, from the console we can start a labeling job using the image classification task type to classify images as containing vehicles, traffic signals, or pedestrians. We use this file as input and “streetscenes-road-objects1” as the job name (or you can start using the AWS API with the LabelAttributeName set to “streetscenes-road-objects1”). See this previous article on how to start a labeling job.

The output of the labeling job is an augmented manifest with the corresponding label augmented in each of the previous JSON lines. See the output data documentation for details on the format for different modalities. Note that if we enable automated data labeling we will also get a model as another output artifact (see this blog post for more details on automated data labeling).

{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:30:14.449763","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00006.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:26:08.019726","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00011.JPG","streetscenes-road-objects1":1,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"no-vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:26:08.019714","type":"groundtruth/image-classification"}}
...
...

Job 2: Adding a bounding box around ‘cars’

Now that I have a dataset with labels for the road objects that are present in each image, I can define a job to add the next layer of the taxonomy. The next layer is to add bounding boxes around the individual objects in the images.

Note: I could run an intermediate second classification job to split vehicles into cars, bicycles, trucks, etc., but for this example, I’ll just create a bounding box job for cars and feed all images with vehicles. In practice, with larger datasets you can choose to perform the intermediate classification because it will reduce the number of objects for each job and also provide the opportunity to run the jobs in parallel.

In the following screenshot, you can see that I’m following a naming scheme for the second job that is similar to the first job. I’ve also selected the output.manifest from Job1.

Filter: selecting vehicles

After classifying the images into those containing three types of road objects (first level of the taxonomy), we now want to filter the dataset to contain only vehicles, so that we can start another labeling job to identify objects (bounding boxes) representing cars. The Amazon SageMaker console is equipped with a query engine powered by S3 Select that lets you filter the dataset to clean it up or to create a subset of the data.

In this case, we can apply the following query in the query box to filter the augmented manifest and create a subset containing only images labeled “vehicles”.

select * from s3Object s where s."streetscenes-road-objects1-metadata"."class-name" = 'vehicles';
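
For a quick local check, the same filter can be approximated in Python over a downloaded copy of the manifest. This is a minimal sketch, not the console's S3 Select engine. The function name `filter_manifest` and the shortened sample lines are illustrative assumptions.

```python
import json


def filter_manifest(lines, metadata_key, class_name):
    """Keep the JSON lines whose label metadata has the given class-name."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(metadata_key, {}).get("class-name") == class_name:
            kept.append(record)
    return kept


# Two shortened records in the augmented manifest format (most metadata omitted).
sample = [
    '{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG",'
    '"streetscenes-road-objects1":0,'
    '"streetscenes-road-objects1-metadata":{"class-name":"vehicles"}}',
    '{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00011.JPG",'
    '"streetscenes-road-objects1":1,'
    '"streetscenes-road-objects1-metadata":{"class-name":"no-vehicles"}}',
]

vehicles = filter_manifest(sample, "streetscenes-road-objects1-metadata", "vehicles")
for record in vehicles:
    print(record["source-ref"])
```

Each kept record is unchanged, so writing the records back out one JSON object per line gives a manifest in the same format.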

Next, choose Create subset and then Use this subset. This produces a new manifest (in this example, 7 rows).

The new augmented manifest will look something like this:

{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:30:14.449763","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00003.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:21:57.370330","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00006.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles", "human-annotated":"yes","creation-date":"2018-12-12T01:26:08.019726","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00016.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:19:53.472224","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00020.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:26:08.019736","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00021.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.95,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles" ,"human-annotated":"yes","creation-date":"2018-12-12T01:19:53.472244","type":"groundtruth/image-classification"}}
{"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00042.JPG","streetscenes-road-objects1":0,"streetscenes-road-objects1-metadata":{"confidence":0.94,"job-name":"labeling-job/streetscenes-road-objects1","class-name":"vehicles","human-annotated":"yes","creation-date":"2018-12-12T01:25:03.089097","type":"groundtruth/image-classification"}}

Job2 settings

After we have our filtered augmented manifest, we need to again select a workforce and write instructions for the job. In this case, I’m going to select a private workforce consisting of me, myself, and I.

As with Job1, we could have selected a public workforce, but this demonstrates that you can use a different workforce with more domain expertise on a smaller dataset. For a simple task like drawing boxes around cars, any workforce option would likely work well with good instructions. However, in another example, like medical imaging, you might want to have trained radiologists classify cancerous cells after a simpler filtering/classification step has been performed by a less expensive workforce.

After defining the job parameters, I’ll need to write some sensible instructions. Ideally you would draw bounding boxes around the objects to show how you expect them to be in the output. However, in this case, since I will be the annotator, I’ll use the default image to describe the task.

Results

When the job is complete, you can see what the bounding boxes look like in the console job output.

The completed manifest adds “streetscenes-road-objects2” and “streetscenes-road-objects2-metadata” fields to each line of the above manifest. For example, the first JSON line in this manifest will become:

{
"source-ref":"s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG",
"streetscenes-road-objects1":0,
"streetscenes-road-objects1-metadata":
{
"confidence":0.95,
"job-name":"labeling-job/streetscenes-road-objects1",
"class-name":"vehicles",
"human-annotated":"yes",
"creation-date":"2018-12-12T01:30:14.449763",
"type":"groundtruth/image-classification"
},
"streetscenes-road-objects2":
{
"annotations":[
{"class_id":0,"width":38,"top":490,"height":24,"left":351},
{"class_id":0,"width":85,"top":505,"height":53,"left":450},
{"class_id":0,"width":65,"top":489,"height":43,"left":592},
{"class_id":0,"width":59,"top":480,"height":43,"left":524},
{"class_id":0,"width":354,"top":471,"height":150,"left":567}
],
"image_size":[{"width":1280,"depth":3,"height":960}]
},
"streetscenes-road-objects2-metadata":
{
"job-name":"labeling-job/streetscenes-road-objects2",
"class-map":{"0":"car"},
"human-annotated":"yes",
"objects":[{"confidence":0.09},{"confidence":0.09},{"confidence":0.09},{"confidence":0.09},{"confidence":0.09}],
"creation-date":"2018-12-12T18:40:15.710919",
"type":"groundtruth/object-detection"
}
}
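
To use these annotations downstream, it often helps to turn each left/top/width/height entry into corner coordinates and to resolve the class name through the class-map. The following is a minimal sketch under the assumption that left and top are the pixel coordinates of the box's top-left corner. The helper name `to_corner_boxes` is illustrative, and the record is a shortened copy of the JSON above.

```python
def to_corner_boxes(record, label_key):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples for one manifest record."""
    class_map = record[label_key + "-metadata"]["class-map"]
    boxes = []
    for ann in record[label_key]["annotations"]:
        xmin, ymin = ann["left"], ann["top"]
        xmax, ymax = xmin + ann["width"], ymin + ann["height"]
        # class-map keys are strings, while class_id values are integers.
        boxes.append((class_map[str(ann["class_id"])], xmin, ymin, xmax, ymax))
    return boxes


# Shortened record: only the first two annotations and the class-map are kept.
record = {
    "source-ref": "s3://cnidus-ml-iad/datasets/streetscenes/SSDB00001.JPG",
    "streetscenes-road-objects2": {
        "annotations": [
            {"class_id": 0, "width": 38, "top": 490, "height": 24, "left": 351},
            {"class_id": 0, "width": 85, "top": 505, "height": 53, "left": 450},
        ],
        "image_size": [{"width": 1280, "depth": 3, "height": 960}],
    },
    "streetscenes-road-objects2-metadata": {"class-map": {"0": "car"}},
}

boxes = to_corner_boxes(record, "streetscenes-road-objects2")
print(boxes)
# [('car', 351, 490, 389, 514), ('car', 450, 505, 535, 558)]
```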

Multi-label?

You might notice in the results or in the labeler UI that only a single label can be selected for Job 1: Labeling road objects. As a result, each image carries only a single label (vehicle, traffic signal, or pedestrian). For this dataset, it’s perfectly valid for a given image to contain multiple labels. For example, there could be a car, a pedestrian, AND a stop sign in a single image.

Currently the image-classifier in Ground Truth only supports labeling an image with a single label. For the purpose of this example, I opted to keep it simple and use the default image classifier. To extend to multi-label there are a couple of options:

Image classification jobs per label: Vehicles, pedestrians and traffic signals would be separate jobs. Each image would be run using all jobs (in parallel if desired).
Create a custom labeling workflow: Ground Truth provides a workflow where the customer can provide the HTML for worker input. Using this method, you could create a workflow that allows for multiple labels to be applied to a single image in a single pass.
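
If the per-label jobs in the first option are run in parallel rather than chained, each job writes its own output manifest, and the labels then need to be combined per image. The following is a minimal sketch of that merge, keyed on source-ref. The attribute names (`has-vehicle`, `has-pedestrian`), the bucket path, and the function name are hypothetical and do not come from the jobs in this post.

```python
import json


def merge_manifests(*manifests):
    """Merge records from several manifests into one record per source-ref."""
    merged = {}
    for lines in manifests:
        for line in lines:
            record = json.loads(line)
            # Later manifests add their label attributes to the same image record.
            merged.setdefault(record["source-ref"], {}).update(record)
    return list(merged.values())


# Hypothetical outputs of two single-label jobs over the same image.
vehicles_job = [
    '{"source-ref":"s3://example-bucket/img1.jpg","has-vehicle":0,'
    '"has-vehicle-metadata":{"class-name":"yes"}}'
]
pedestrians_job = [
    '{"source-ref":"s3://example-bucket/img1.jpg","has-pedestrian":1,'
    '"has-pedestrian-metadata":{"class-name":"no"}}'
]

merged = merge_manifests(vehicles_job, pedestrians_job)
print(json.dumps(merged[0], indent=2))
```

Chaining the jobs sequentially instead would avoid this step, because each job augments the previous job's manifest.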

Next steps: Training with an augmented manifest that contains multiple labels

A key feature of the augmented manifest is that the same manifest can contain labels from many different labeling jobs using the chaining method described in this blog post. We can use the augmented manifest to train a model for any desired label in it. For example, the manifest in this blog post contains labels from two jobs: “streetscenes-road-objects1” and “streetscenes-road-objects2”.

We can train an image classification model to classify road objects directly from this output manifest, without any transformation, by starting an Amazon SageMaker training job with S3DataType set to AugmentedManifestFile and AttributeNames set to [“source-ref”, “streetscenes-road-objects1”].

The same manifest can be used to train an object detection model to identify cars, again without any transformation, by starting an Amazon SageMaker training job with S3DataType set to AugmentedManifestFile and AttributeNames set to [“source-ref”, “streetscenes-road-objects2”].
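
As a rough illustration, the two settings above map onto one input channel of a training job request. The following sketch only builds the channel dictionary and does not call AWS. The manifest URI is a placeholder, the helper name is hypothetical, and the Pipe mode and RecordIO settings are assumptions based on how augmented manifests are typically consumed by the built-in image algorithms.

```python
def augmented_manifest_channel(manifest_uri, label_attribute, channel_name="train"):
    """Build one InputDataConfig entry for an augmented manifest channel."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "AugmentedManifestFile",
                "S3Uri": manifest_uri,
                "S3DataDistributionType": "FullyReplicated",
                # The image reference first, then the label attribute to train on.
                "AttributeNames": ["source-ref", label_attribute],
            }
        },
        # Assumption: the manifest is streamed in Pipe mode and wrapped as RecordIO.
        "InputMode": "Pipe",
        "RecordWrapperType": "RecordIO",
        "ContentType": "application/x-recordio",
    }


# Placeholder location of the chained output manifest (not from the original post).
MANIFEST_URI = "s3://example-bucket/path/to/output.manifest"

classification_channel = augmented_manifest_channel(MANIFEST_URI, "streetscenes-road-objects1")
detection_channel = augmented_manifest_channel(MANIFEST_URI, "streetscenes-road-objects2")
print(classification_channel["DataSource"]["S3DataSource"]["AttributeNames"])
print(detection_channel["DataSource"]["S3DataSource"]["AttributeNames"])
```

Only the label attribute differs between the two channels, which is the point of keeping both jobs' labels in one manifest.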

See this sample notebook to start an Amazon SageMaker training job using Augmented Manifest.

Conclusion

This blog post shows how job chaining and the augmented manifest can be used to associate multiple labels across your hierarchical label taxonomy. The augmented manifest contains all of the labels inline in a single manifest, and you can use this manifest directly in Amazon SageMaker training jobs. In addition, you learned how to create a subset of the dataset based on labels or metadata using the Ground Truth filtering and sampling capabilities.

We hope this post was informative, and we have just scratched the surface of what Amazon SageMaker Ground Truth can do. The service is available today in the following AWS Regions: US East (Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). Please let us know what you think!

About the authors

Doug Youd is a Solutions Architect with AWS covering strategic accounts. He has a background in networking and virtualization, but more recently has been working on ML projects for his customers. In his spare time he enjoys tinkering with classic cars & motorsport.

Zahid Rahman is an SDE in AWS AI where he builds large-scale distributed systems to solve complex machine learning problems. He is primarily focused on innovating technologies that can ‘Divide and Conquer’ Big Data problems.

