
Custom deep reinforcement learning and multi-track training for AWS DeepRacer with Amazon SageMaker RL Notebook

AWS DeepRacer, launched at re:Invent 2018, helps developers get hands-on with reinforcement learning (RL). Since then, thousands of people have developed and raced their models at 21 AWS DeepRacer League events at AWS Summits across the world, and virtually via the AWS DeepRacer console. Beyond the summits, there have been several events at AWS Lofts, developer meetups, partner sessions, and corporate events.

The enthusiasm among developers to learn and experiment in AWS DeepRacer is exceptionally high. Many want to explore further and have greater ability to modify the neural network architecture, modify the training presets, or train on multiple tracks in parallel.

AWS DeepRacer makes use of several other AWS services: Amazon SageMaker, AWS RoboMaker, Amazon Kinesis Video Streams, Amazon CloudWatch, and Amazon S3. To give you more fine-grained control over each of these components so that you can extend the simulation and modeling environments, this post includes a notebook environment that helps provision and manage these environments so you can modify any aspect of the AWS DeepRacer experience. For more information, see the GitHub repo for this post.

This post explores how to set up an environment, dives into the main components of the AWS DeepRacer code base, and walks you through modifying your neural network and training presets, customizing your action space, and training on multiple tracks in parallel. By the end, you should understand how to modify the AWS DeepRacer model training using Amazon SageMaker.

By utilizing the tools behind the AWS DeepRacer console, developers can customize and modify every aspect of their AWS DeepRacer training and models, allowing them to download models to race in person and participate in the AIDO 3 challenge at NeurIPS.

Setting up your AWS DeepRacer notebook environment

To get started, log in to the AWS Management Console and complete the following steps:

  1. From the console, under SageMaker, choose Notebook instances.
  2. Choose Create notebook instance.
  3. Give your notebook a name. For example, DeepracerNotebook.

Because AWS RoboMaker and Amazon SageMaker do the heavy lifting in training, the notebook itself does not need much horsepower.

  4. Leave the instance type as the default ml.t2.medium.
  5. Choose Additional configuration.
  6. For Volume size, set it to at least 25 GB.

This size gives enough room to rebuild the training environment and the simulation application.

  7. Choose Create a new role.
  8. Choose Any S3 bucket.
  9. Choose Create role.

If this is not your first time using Amazon SageMaker Notebooks, select a valid role from the drop-down list.

  10. Leave all other settings as the default.
  11. Choose Create notebook instance.

Here is a screencast showing you how to set up the notebook environment.

It takes a few minutes for the Notebook instance to start. When it’s ready, choose Open Jupyter.
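The console steps above could also be scripted. The following is a minimal sketch, not part of the original post, that only assembles the request parameters for the SageMaker CreateNotebookInstance API. The helper name notebook_instance_request and the role ARN are hypothetical placeholders; the instance type and the 25 GB minimum come from the steps above.

```python
def notebook_instance_request(name, role_arn, volume_size_gb=25):
    """Build keyword arguments for SageMaker's CreateNotebookInstance call."""
    if volume_size_gb < 25:
        # The post recommends at least 25 GB to rebuild the training
        # environment and the simulation application.
        raise ValueError("volume size should be at least 25 GB")
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t2.medium",  # default type; the notebook needs little compute
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_size_gb,
    }


request = notebook_instance_request(
    "DeepracerNotebook",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder ARN (assumption)
)
print(request)

# To submit the request (requires boto3 and valid AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_notebook_instance(**request)
```

Building the parameters separately from the API call keeps the sketch runnable without AWS credentials.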

Loading your notebook

To load the AWS DeepRacer sample notebook, complete the following steps:

  1. Choose SageMaker Examples.
  2. Choose Reinforcement Learning.
  3. Next to deepracer_rl.ipynb, choose Use.
  4. Choose Create copy.

This process copies the AWS DeepRacer notebook stack to your notebook instance (found under the Files tab under a rl_deepracer_robomaker_coach_gazebo_YYYY-MM-DD directory), and opens the main notebook file in a new tab.

Here is a screencast of this process:

The AWS DeepRacer notebook environment

You can modify the following files to customize the AWS DeepRacer training and evaluations in any way desired:

  • src/training_worker.py – This file handles either loading a pre-trained model or creating a new neural network (using a presets file), setting up the data store, and starting up a Redis server for the communication between Amazon SageMaker and AWS RoboMaker.
  • src/markov/rollout_worker.py – This file runs on the Amazon SageMaker training instance, and downloads the model checkpoints from S3 (initially created by the training_worker.py, and updated by previous runs of rollout_worker.py) and runs the training loops.
  • src/markov/evaluation_worker.py – This file is used during evaluation to evaluate the model. It downloads the model from S3 and runs the evaluation loops.
  • src/markov/sagemaker_graph_manager.py – This file runs on the Amazon SageMaker training instance, and instantiates the RL class, including handling the hyperparameters passed in, and sets up the input filters, such as converting the camera input to grayscale.
  • src/markov/environments/deepracer_racetrack_env.py – This file is loaded twice: on the Amazon SageMaker training instance and on the AWS RoboMaker instance. It uses the environment variable NODE_TYPE to determine which environment is running. The AWS RoboMaker instance runs the Robot Operating System (ROS) code. This file does most of the work of interacting with the AWS RoboMaker environment, such as resetting the car when it goes off the track, collecting the reward function parameters, executing the reward function, and logging to CloudWatch.

You can also add files to the following directories for further customization:

  • src/markov/rewards – This directory stores sample reward functions. The notebook copies the selected one to S3, where deepracer_racetrack_env.py fetches and runs it.
  • src/markov/actions – This directory contains a series of JSON files that define the action taken for each of the nodes in the last row of the neural network. The one selected (or any new ones created) should match the number of output nodes in your neural network. The notebook copies the selected one to S3, where the rollout_worker.py script fetches it.
  • src/markov/presets – This directory contains files in which one can modify the RL algorithm and modify other parameters such as the size and shape of the neural network. The notebook copies the selected one to S3, where the rollout_worker.py script fetches it.
  • Dockerfile – This contains directions for building the container that is deployed to the Amazon SageMaker training instance. The container is built on a standard Ubuntu base, and the src/markov directory is copied into the container. It also has a series of packages installed that AWS DeepRacer uses.

Customizing neural network architectures for RL

You may be interested in how to customize the neural network architecture to do things such as add an entry, change the algorithm, or change the size and shape of the network.

As of this writing, AWS DeepRacer uses the open source package Intel RL Coach to run state-of-the-art RL algorithms. In Intel RL Coach, you can edit the RL algorithm hyperparameters, including but not limited to training batch size, exploration method, and neural network architecture by creating a new presets file.

For examples from the GitHub repo, see default.py and preset_attention_layer.py. Specific to your notebook setup, when you change the preset file, you also need to modify sagemaker_graph_manager.py so that its hyperparameters and algorithm settings match the new preset file.

Once you have the new file located in the presets/ directory, modify the notebook file to use the new presets file by editing the “Copy custom files to S3 bucket so that Amazon SageMaker and AWS RoboMaker can pick it up” section. See the following code:

s3_location = "s3://%s/%s" % (s3_bucket, s3_prefix)
print(s3_location)

# Clean up the previously uploaded files
!aws s3 rm --recursive {s3_location}

# Make any changes to the environment and preset files below and upload these files
!aws s3 cp src/markov/environments/deepracer_racetrack_env.py {s3_location}/environments/deepracer_racetrack_env.py

!aws s3 cp src/markov/rewards/default.py {s3_location}/rewards/reward_function.py

!aws s3 cp src/markov/actions/model_metadata_10_state.json {s3_location}/model_metadata.json

#!aws s3 cp src/markov/presets/default.py {s3_location}/presets/preset.py
!aws s3 cp src/markov/presets/preset_attention_layer.py {s3_location}/presets/preset.py

The modified last line copies the preset_attention_layer.py instead of the default.py to the S3 bucket. Amazon SageMaker and AWS RoboMaker copy the changed files from the S3 bucket during the initialization period before starting to train.

Customizing the action space and noise injection

The action space defines the output layer of the neural network and how the car acts upon choosing the corresponding output node. The output of the neural network is an array whose size equals the number of actions; each element is the probability of taking the corresponding action. This post uses the index of the output node with the highest probability.

You can obtain the action, speed, and steering angle corresponding to the index of the maximum probability output node via a mapping written in standard JSON. The AWS RoboMaker simulation application uses the JSON file to determine the speed and steering angle during training as well as evaluation phases. The following code example defines five nodes with the same speed, varying only by the steering angle:

{
    "action_space": [
        {
            "steering_angle": -30,
            "speed": 0.8,
            "index": 0
        },
        {
            "steering_angle": -15,
            "speed": 0.8,
            "index": 1
        },
        {
            "steering_angle": 0,
            "speed": 0.8,
            "index": 2
        },
        {
            "steering_angle": 15,
            "speed": 0.8,
            "index": 3
        },
        {
            "steering_angle": 30,
            "speed": 0.8,
            "index": 4
        }
    ]
}

The units for steering angle and speed are degrees and meters per second, respectively. The environment file deepracer_racetrack_env.py loads the JSON file to execute a given action for a specified output node. This file is also bundled with the exported model for loading on the physical car for the same reason, that is, to map the neural network output nodes to the corresponding steering angle and speed from the simulation to the real world.
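To make the mapping concrete, here is a small sketch of how an output array could be turned into a steering angle and speed using the five-action JSON above. The function name select_action and the example probabilities are illustrative assumptions, not code from the AWS DeepRacer repository; the degree-to-radian conversion mirrors the step() code shown later in this post.

```python
import json
import math

# The five-action example from this post, in compact form.
MODEL_METADATA = """
{"action_space": [
    {"steering_angle": -30, "speed": 0.8, "index": 0},
    {"steering_angle": -15, "speed": 0.8, "index": 1},
    {"steering_angle": 0,   "speed": 0.8, "index": 2},
    {"steering_angle": 15,  "speed": 0.8, "index": 3},
    {"steering_angle": 30,  "speed": 0.8, "index": 4}
]}
"""


def select_action(probabilities, action_space):
    """Return (steering angle in radians, speed in m/s) for the most probable node."""
    if len(probabilities) != len(action_space):
        raise ValueError("output size must match the number of actions")
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    by_index = {action["index"]: action for action in action_space}
    action = by_index[best_index]
    return math.radians(action["steering_angle"]), float(action["speed"])


actions = json.loads(MODEL_METADATA)["action_space"]
# Hypothetical network output: node 3 has the highest probability.
steering_rad, speed = select_action([0.05, 0.10, 0.15, 0.60, 0.10], actions)
print(steering_rad, speed)
```

Looking actions up by their "index" field rather than by list position keeps the mapping correct even if the JSON entries are reordered.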

The more permutations you have in your action space, the more nodes there are in the output layer of the neural network. More nodes mean bigger matrices for mathematical operations during training; therefore, training takes longer.

The following Python code helps generate custom action spaces:

#!/usr/bin/env python

import json

min_speed = 4
max_speed = 8
speed_resolution = 2

min_steering_angle = -30
max_steering_angle = 30
steering_angle_resolution = 15

output = {"action_space":[]}
index = 0
speed = min_speed
while speed <= max_speed:
    steering_angle = min_steering_angle
    while steering_angle <= max_steering_angle:
        output["action_space"].append( {"index":index,
                                         "steering_angle":steering_angle,
                                         "speed":speed}
                                     )
        steering_angle += steering_angle_resolution
        index += 1
    speed += speed_resolution

print(json.dumps(output, indent=4))
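Because each action becomes one output node, it can help to check the size of the action space before training. The sketch below counts the entries the preceding script would generate for its values; count_steps is a hypothetical helper and assumes integer bounds and resolutions with maximum >= minimum.

```python
def count_steps(minimum, maximum, resolution):
    """Number of values visited by: v = minimum; while v <= maximum: v += resolution."""
    return (maximum - minimum) // resolution + 1


speeds = count_steps(4, 8, 2)        # 4, 6, 8
angles = count_steps(-30, 30, 15)    # -30, -15, 0, 15, 30
output_nodes = speeds * angles       # one output node per (speed, angle) pair
print(speeds, angles, output_nodes)  # 3 5 15
```

Halving either resolution roughly doubles the count along that axis, which is one way to see how quickly the output layer grows.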

Improving your simulation-to-real world transfer

Robotics research has shown that introducing entropy and noise into the simulation helps the model identify more appropriate features and react more appropriately to real-world conditions, leading to a better simulation-to-real-world transfer. Keep this in mind while developing new algorithms and networks.

For example, AWS DeepRacer already includes some random noise for the steering angle and speed to account for the changes in the friction and deviations in the mechanical components during manufacturing. You can see this in the following code in src/markov/environments/deepracer_racetrack_env.py:

    def step(self, action):
        self.steering_angle = float(self.json_actions[action]['steering_angle']) * math.pi / 180.0
        self.speed = float(self.json_actions[action]['speed'])

        ## NOISE ##
        # Add random NOISE in to both the steering angle and speed
        self.steering_angle += 0.01 * np.random.normal(0, 1.0, 1)
        self.speed += 0.1 * np.random.normal(0, 1.0, 1)

In addition to steering and speed noise, you may want to account for variations in lighting, track material, track conditions, and battery charge levels. You can modify these in the environment code or the AWS RoboMaker world configuration files.
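As a standalone illustration of the same idea, the following sketch applies zero-mean Gaussian noise to a steering angle and speed using only the standard library. The function name noisy_action and its parameters are assumptions for illustration; the default scale factors (0.01 and 0.1) are taken from the snippet above.

```python
import math
import random


def noisy_action(steering_angle_deg, speed,
                 steering_sigma=0.01, speed_sigma=0.1, rng=random):
    """Convert the steering angle to radians and perturb both values with Gaussian noise."""
    steering = math.radians(steering_angle_deg) + steering_sigma * rng.gauss(0.0, 1.0)
    noisy_speed = speed + speed_sigma * rng.gauss(0.0, 1.0)
    return steering, noisy_speed


# A seeded generator makes the perturbation reproducible between runs.
rng = random.Random(42)
print(noisy_action(15, 0.8, rng=rng))

# With both scale factors set to zero, the values pass through unchanged.
clean_steering, clean_speed = noisy_action(15, 0.8, steering_sigma=0.0, speed_sigma=0.0)
```

Passing the random generator in as a parameter makes it easy to swap in other noise sources when experimenting with simulation-to-real-world transfer.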

Multi-track training in parallel

You can train your models faster by training on multiple simulation environments with a single training job. For example, one simulation environment may use a road with concrete material, while the other uses carpet. As the parallel AWS RoboMaker environments generate batches, the training instance uses the information from all the simulations to train the model. This strategy helps make sure that the model can identify features of the road instead of some aspect of a single map, or operate under various textures or lighting conditions.

AWS RoboMaker uses Gazebo, an open source 3D robotics simulator. World files define Gazebo environments and use model definitions and COLLADA files to build an environment. The standard AWS DeepRacer simulation application includes several world files: reinvent_base, reinvent_carpet, reinvent_concrete, reinvent_wood, AWS_track, Bowtie_track, Oval_track, and Straight_track. New tracks are released regularly as part of the virtual league; you can identify them by the WORLD_NAME environment variable on the AWS RoboMaker simulation job.

To run parallel simulation applications with varying world configurations, modify the “Launch the Simulation job on AWS RoboMaker” section of the notebook. See the following code:

import datetime #need microsecond precision to avoid collisions 

envriron_vars = {
    "KINESIS_VIDEO_STREAM_NAME": "SilverstoneStream",
    "SAGEMAKER_SHARED_S3_BUCKET": s3_bucket,
    "SAGEMAKER_SHARED_S3_PREFIX": s3_prefix,
    "TRAINING_JOB_ARN": job_name,
    "APP_REGION": aws_region,
    "METRIC_NAME": "TrainingRewardScore",
    "METRIC_NAMESPACE": "AWSDeepRacer",
    "REWARD_FILE_S3_KEY": "%s/rewards/reward_function.py" % s3_prefix,
    "MODEL_METADATA_FILE_S3_KEY": "%s/model_metadata.json" % s3_prefix,
    "METRICS_S3_BUCKET": s3_bucket,
    "METRICS_S3_OBJECT_KEY": s3_bucket + "/training_metrics.json",
    "TARGET_REWARD_SCORE": "None",
    "NUMBER_OF_EPISODES": "0",
    "ROBOMAKER_SIMULATION_JOB_ACCOUNT_ID": account_id
}

vpcConfig = {"subnets": deepracer_subnets,
             "securityGroups": deepracer_security_groups,
             "assignPublicIp": True}

worldsToRun = ["reinvent_base","reinvent_carpet","reinvent_concrete","reinvent_wood"]

responses = []
for world_name in worldsToRun:
    envriron_vars["WORLD_NAME"]=world_name
    simulation_application = {"application":simulation_app_arn,
                              "launchConfig": {"packageName": "deepracer_simulation_environment",
                                               "launchFile": "distributed_training.launch",
                                               "environmentVariables": envriron_vars}
                              }
    client_request_token = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") 
    response =  robomaker.create_simulation_job(iamRole=sagemaker_role,
                                            clientRequestToken=client_request_token,
                                            maxJobDurationInSeconds=job_duration_in_seconds,
                                            failureBehavior="Continue",
                                            simulationApplications=[simulation_application],
                                            vpcConfig=vpcConfig
                                            )
    responses.append(response)

print("Created the following jobs:")
job_arns = [response["arn"] for response in responses]
for response in responses:
    print("Job ARN", response["arn"]) 

The modified code loops over the new worldsToRun list, and the simulation_application dictionary is defined inside the loop (because the envriron_vars dictionary must be updated with a new WORLD_NAME each time). Additionally, the modified clientRequestToken uses microseconds with the datetime module because the old method could result in an error if two jobs were submitted within the same second.
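The token collision can be reproduced without calling AWS. The sketch below assumes the earlier token format stopped at seconds (an assumption; the post does not show the old format) and compares it with the microsecond format used above for two submissions 250 microseconds apart.

```python
import datetime

SECOND_FORMAT = "%Y-%m-%d-%H-%M-%S"          # assumed earlier format
MICROSECOND_FORMAT = "%Y-%m-%d-%H-%M-%S-%f"  # format used in the code above

# Two hypothetical submission times within the same second.
first = datetime.datetime(2020, 1, 1, 12, 0, 0, 100)
second = first + datetime.timedelta(microseconds=250)

same_second = first.strftime(SECOND_FORMAT) == second.strftime(SECOND_FORMAT)
distinct_micro = first.strftime(MICROSECOND_FORMAT) != second.strftime(MICROSECOND_FORMAT)
print(same_second, distinct_micro)  # True True
```

Microsecond timestamps make collisions unlikely rather than impossible; a random identifier such as uuid.uuid4() would be another option if strict uniqueness matters.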

Custom evaluation

The standard AWS DeepRacer console evaluation runs three episodes. If a car goes off the track, that episode is over, and the percentage completed and the time thus far are recorded. The number of episodes can be passed in, as the sample notebook demonstrates with the NUMBER_OF_TRIALS assignment in the envriron_vars dictionary. However, you can modify this behavior in the evaluation_worker.py file. To get as many runs in as possible in four minutes, change the following code (lines 37–39):

    while curr_num_trials < number_of_trials:
        graph_manager.evaluate(EnvironmentSteps(1))
        curr_num_trials += 1

The following is the updated code:

    import time
    starttime = time.time()
    while time.time()-starttime < 240:  #240 seconds = 4 minutes
        graph_manager.evaluate(EnvironmentSteps(1))
        curr_num_trials += 1

This lets the car run for four minutes, as per the AWS Summit Physical track rules.
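The time-bounded loop can be exercised on its own with an injectable clock. This is a sketch under stated assumptions: run_until_deadline and FakeClock are hypothetical names, and the episode callback stands in for graph_manager.evaluate(EnvironmentSteps(1)).

```python
import time


def run_until_deadline(run_episode, limit_seconds=240, clock=time.time):
    """Start episodes until limit_seconds have elapsed; return how many were started."""
    start = clock()
    completed = 0
    while clock() - start < limit_seconds:
        run_episode()
        completed += 1
    return completed


class FakeClock:
    """Manually advanced clock so the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


fake = FakeClock()
# Pretend every episode takes 50 seconds of simulated time.
episodes = run_until_deadline(lambda: fake.advance(50), 240, fake)
print(episodes)  # 5
```

The deadline is checked only before an episode starts, so the last episode can run past the four-minute mark (here it ends at 250 simulated seconds).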

To take this further and simulate the AWS Summit physical race reset rules, wherein a car can be moved back onto the track up to three times before the episode ends, modify the infer_reward_state() function in deepracer_racetrack_env.py. See the following code (lines 396 and 397):

             done = True
             reward = CRASHED

The following is the updated code:

            reward = CRASHED
            try:
              self.resets +=1
            except AttributeError:
              self.resets = 1 #likely this is the first reset and the variable hadn't been defined before
            if self.resets > 3:
              done = True
            else:
              done = False
              #Now reset everything back onto the track
              self.steering_angle = 0
              self.speed = 0
              self.action_taken = 0
              self.send_action(0, 0)
              for joint in EFFORT_JOINTS:
                  self.clear_forces_client(joint)
              current_ndist -= model_point.distance(self.prev_point)/2  #Try to get close to where the car went off
              prev_index, next_index = self.find_prev_next_waypoints(current_ndist)
              self.reset_car_client(current_ndist, next_index)
              #Clear the image queue so that we don't train on an old state from before moving the car back to the track
              _ = self.image_queue.get(block=True, timeout=None)
              self.set_next_state()
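The counting rule in the updated code can be isolated as follows. ResetTracker is a hypothetical helper, not part of deepracer_racetrack_env.py. It also adds an explicit start_episode() method, on the assumption that the counter should start from zero for each new episode, which the snippet above does not show.

```python
class ResetTracker:
    """Allow a fixed number of off-track resets before the episode ends."""

    def __init__(self, max_resets=3):
        self.max_resets = max_resets
        self.resets = 0

    def start_episode(self):
        # Assumption: each episode gets a fresh allowance of resets.
        self.resets = 0

    def off_track(self):
        """Record an off-track event; return True when the episode should end."""
        self.resets += 1
        return self.resets > self.max_resets


tracker = ResetTracker()
outcomes = [tracker.off_track() for _ in range(4)]
print(outcomes)  # [False, False, False, True]
```

Initializing the counter in the constructor avoids the try/except around the first increment.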

Conclusion

AWS DeepRacer is a fun way to get started with reinforcement learning. To build your autonomous model, all you need is to write a proper reward function in Python. For developers who want to dive deep into the code and environment to extend AWS DeepRacer, this post also provides a notebook environment to do so.

This post showed you how to get started with the notebook environment, customize the training algorithm, modify the action space, train on multiple tracks, and run custom evaluation methods. Please share what you come up with!

A subsequent post dives into modifying the AWS RoboMaker simulation application to train and evaluate on your custom tracks. The post gives tips and tricks on shaping the tracks, shares code for generating tracks, and discusses how to package them for AWS DeepRacer.


About the authors

Neal McFee is a Partner Solutions Architect with AWS. He is passionate about solutions that span Robotics, Computer Vision, and Autonomous systems. In his spare time, he flies drones and works with AWS customers to realize the potential of reinforcement learning via DeepRacer events.

Don Barber is a Senior Solutions Architect, with over 20 years of experience helping customers solve business problems with technology in regulated industries such as finance, pharma, and government. He has a bachelor's degree in Computer Science from Marietta College and an MBA from the University of Maryland. Outside of the office, he spends time with his family and on hobbies such as amateur radio and repairing electronics.

Sunil Mallya is a Senior Solutions Architect in the AWS Deep Learning team. He helps our customers build machine learning and deep learning solutions to advance their businesses. In his spare time, he enjoys cooking, sailing, and building self-driving RC autonomous cars.

Sahika Genc is a senior applied scientist at Amazon artificial intelligence (AI). Her research interests are in smart automation, robotics, predictive control and optimization, and reinforcement learning (RL), and she serves on the industrial committee for the International Federation of Automatic Control. She leads science teams in scalable autonomous driving and automation systems, including consumer products such as AWS DeepRacer and SageMaker RL. Previously, she was a senior research scientist in the Artificial Intelligence and Learning Laboratory at the General Electric (GE) Global Research Center.

SPICE: Self-Supervised Pitch Estimation

A sound’s pitch is a qualitative measure of its frequency, where a sound with a high pitch is higher in frequency than one of low pitch. Through tracking relative differences in pitch, our auditory system is able to recognize audio features, such as a song’s melody. Pitch estimation has received a great deal of attention over the past decades, due to its central importance in several domains, ranging from music information retrieval to speech analysis.

Traditionally, simple signal processing pipelines were proposed to estimate pitch, working either in the time domain (e.g., pYIN) or in the frequency domain (e.g., SWIPE). But until recently, machine learning methods have not been able to outperform such hand-crafted signal processing pipelines. This was due to the lack of annotated data, which is particularly tedious and difficult to obtain at the temporal and frequency resolution required to train fully supervised models. The CREPE model was able to overcome these limitations to achieve state-of-the-art results by training on a synthetically generated dataset combined with other manually annotated datasets.

In our recent paper, we present a different approach to training pitch estimation models in the absence of annotated data. Inspired by the observation that for humans, including professional musicians, it is typically much easier to estimate relative pitch (the frequency interval between two notes) than absolute pitch (the true fundamental frequency), we designed SPICE (Self-supervised PItCh Estimation) to solve a similar task. This approach relies on self-supervision by defining an auxiliary task (also known as a pretext task) that can be learned in a completely unsupervised way.

Constant-Q transform of an audio clip, superimposed on a representation of pitch as estimated by SPICE (video).

The SPICE model consists of a convolutional encoder, which produces a single scalar embedding that maps linearly to pitch. To accomplish this, we feed two signals to the encoder, a reference signal along with a signal that is pitch shifted from the reference by a random, known amount. Then, we devise a loss function that forces the difference between the scalar embeddings to be proportional to the known difference in pitch. For convenience, we perform pitch shifting in the domain defined by the constant-Q transform (CQT), because this corresponds to a simple translation along the log-spaced frequency axis.
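A loss of this kind could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Huber penalty, the scale factor sigma, and all names are choices made for this example, and the encoder outputs are replaced by hand-written numbers.

```python
def huber(error, delta=1.0):
    """Huber penalty: quadratic near zero, linear for large errors."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * abs_error ** 2
    return delta * (abs_error - 0.5 * delta)


def relative_pitch_loss(embeddings_a, embeddings_b, shifts_a, shifts_b, sigma):
    """Penalize embedding differences that are not sigma times the known shift difference."""
    errors = [
        (ya - yb) - sigma * (ka - kb)
        for ya, yb, ka, kb in zip(embeddings_a, embeddings_b, shifts_a, shifts_b)
    ]
    return sum(huber(e) for e in errors) / len(errors)


sigma = 0.5            # hypothetical embedding units per bin of pitch shift
shifts_a = [2, -1]     # known shifts applied to the first copy of each frame
shifts_b = [0, 3]      # known shifts applied to the second copy

# Embeddings that are exactly linear in the shift give zero loss.
consistent = relative_pitch_loss([1.25, 0.0], [0.25, 2.0], shifts_a, shifts_b, sigma)

# Perturbing one embedding by 0.5 produces a positive loss.
inconsistent = relative_pitch_loss([1.75, 0.0], [0.25, 2.0], shifts_a, shifts_b, sigma)
print(consistent, inconsistent)
```

Only the difference between the two embeddings is constrained, so no absolute pitch labels are needed to compute the loss.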

Pitch is well defined only when the underlying signal is harmonic, i.e., when it contains components with integer multiples of the fundamental frequency. So, an important function of the model is to determine when the output is meaningful and reliable. For example, in the figure below, there is no harmonic signal in the interval between 1.2s and 2s resulting in low enough confidence in the pitch estimation that no pitch estimate is generated. SPICE is designed to learn the level of confidence of the pitch estimation in a self-supervised fashion, instead of relying on handcrafted solutions.

SPICE model architecture (simplified). Two pitch-shifted versions of the same CQT frame are fed to two encoders with shared weights. The loss is designed to make the difference between the outputs of the encoders proportional to the relative pitch difference. In addition (not shown), a reconstruction loss is added to regularize the model. The model also learns to produce the confidence of the pitch estimation.

We evaluate our model against publicly available datasets and show that we outperform handcrafted baselines while matching the level of accuracy attained by CREPE, despite having no access to ground truth labels. In addition, by properly augmenting our data during training, SPICE is also able to operate in noisy conditions, e.g., to extract pitch from the singing voice when this is mixed in with background music. The chart below shows a comparison between SWIPE (a hand-crafted signal-processing method), CREPE (a fully supervised model) and SPICE (a self-supervised model) on the MIR-1k dataset.

Evaluation on the MIR-1k dataset, mixing in background music at different signal-to-noise ratios.

The SPICE model has been deployed in FreddieMeter, a web app in which singers can score their performance against Freddie Mercury.

Acknowledgments

The work described here was authored by Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi and Mihajlo Velimirović. We are grateful for all discussions and feedback on this work that we received from our colleagues at Google. The SingingVoices dataset used for training the models in this work has been collected by Alexandra Gherghina as part of FreddieMeter, which is using SPICE and a vocal timbre similarity model to understand how closely a singer matches Freddie Mercury.

Developing a business strategy by combining machine learning with sensitivity analysis

Machine learning (ML) is routinely used by countless businesses to assist with decision making. In most cases, however, the predictions and business decisions made by ML systems still require the intuition of human users to make judgment calls.

In this post, I show how to combine ML with sensitivity analysis to develop a data-driven business strategy. This post focuses on customer churn (that is, the defection of customers to competitors), while covering problems that often arise when using ML-based analysis. These problems include difficulties with handling incomplete and unbalanced data, deriving strategic options, and quantitatively evaluating the potential impact of those options.

Specifically, I use ML to identify customers who are likely to churn and then use feature importance combined with scenario analysis to derive quantitative and qualitative recommendations. The results can then be used by an organization to make proper strategic and tactical decisions to reduce future churn. This use case illustrates several common issues that arise in the practice of data science, such as:

  • A low signal-to-noise ratio and a lack of clear correlation between features and churn rates
  • Highly imbalanced datasets (wherein 90% of customers in the dataset do not churn)
  • Using probabilistic prediction and adjustment to identify a decision-making mechanism that minimizes the risk of over-investing in churn issues

End-to-end implementation code is available in Amazon SageMaker and as a standalone on Amazon EC2.

In this use case, I consider a fictional company that provides different types of products. I will call its two key offerings products A and B. I only have partial information about the company’s products and customers. The company has recently seen an increase in customer defection to competitors, also known as churn. The dataset contains information on the diverse attributes of thousands of customers, collected and sorted over several months. Some of these customers have churned, and some have not. Using the list of specific customers, I will predict the probability that any one individual will churn. During this process, I attempt to answer several questions: Can we create a reliable predictive model of customer churn? What variables might explain a customer’s likelihood of churning? What strategies can the company implement to decrease churn?

This post will address the following steps for using ML models to create churn reduction strategies:

Exploring data and engineering new features

I first cover how to explore customer data by looking at simple correlations and associations between individual input features and the churn label. I also examine the associations (called cross-correlations, or covariances) between the features themselves. This allows me to make algorithmic decisions—notably, deciding which features to derive, change, or delete.

Developing an ensemble of ML models

Then, I build several ML algorithms, including automatic feature selection, and combine multiple models to improve performance.

Evaluating and refining ML model performance

In the third section, I test the performance of the different models I have developed. From there, I identify a decision-making mechanism that minimizes the risk of overestimating the number of customers who will churn.

Applying ML models to business strategy design

Finally, in a fourth section, I use the ML results to understand the factors that impact customer churn, derive strategic options, and quantitatively evaluate the impact of those options on churn rates. I do so by performing a sensitivity analysis, where I modify some factors that can be controlled in real life (such as the discount rate) and predict the corresponding reduction in churn expected for different values of this control factor. All predictions will be carried out with the optimal ML model identified in section 3.

Exploring data and engineering new features

Critical issues that often present problems during ML model development include the presence of collinear and low-variance features in the input data, the presence of outliers, and missing data (missing features and missing values for some features). This section describes how to handle each of these issues in Python 3.4 using Amazon SageMaker. (I also evaluated the standalone code on an Amazon EC2 instance with a Deep Learning AMI. Both are available.)

This kind of timestamped data can contain important patterns within certain metrics. I aggregated these metrics into daily, weekly, and monthly segments, which allowed me to develop new features to account for the metrics’ dynamic nature. (See the accompanying notebook for details.)

I then look at simple one-to-one (a.k.a. marginal) correlation and association measures between individual features, both original and new. I also look at the correlations between the features and the churn label. (See the following diagrams.)
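As a reminder of what a marginal correlation measures, here is a minimal Pearson correlation in plain Python. The feature values and churn labels are made-up illustrations, not data from the accompanying notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values of one feature and the binary churn label per customer.
margins = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
churned = [0, 0, 0, 1, 0, 1]

correlation = pearson(margins, churned)
print(round(correlation, 3))
```

The function fails with a division by zero when either input is constant, which is one practical reason to look at low-variance features separately.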

Low-variance features (features that do not change significantly when the churn label changes) can be handled by using marginal correlation and Hamming/Jaccard distances, as depicted in the following table. Hamming/Jaccard distances are dissimilarity measures designed specifically for binary outcomes. These measures provide perspective on the degree to which each feature might be indicative of churn.

It’s good practice to remove low-variance features as they tend not to change significantly no matter what you’re trying to predict. Consequently, their presence is unlikely to help your analysis and can actually make the learning process less efficient.
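As a minimal sketch (not the notebook's code), the two binary dissimilarities mentioned above can be computed as follows. The function names and the toy vectors are invented for illustration; the definitions are the standard Hamming and Jaccard ones.

```python
def hamming_dissimilarity(a, b):
    """Fraction of positions where two binary vectors disagree."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_dissimilarity(a, b):
    """1 - |both are 1| / |at least one is 1|; joint zeros are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1 - both / either


# Toy data (invented): 1 = customer came through the channel / churned.
channel = [1, 0, 1, 1, 0, 0, 1, 0]
churn = [0, 0, 1, 0, 0, 1, 0, 0]

print(hamming_dissimilarity(channel, churn))
print(jaccard_dissimilarity(channel, churn))
```

Because Jaccard ignores customers for whom both values are 0, it stays close to 1 whenever churners are rare within a channel, which is one reason the two measures can differ widely for the same feature.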

The following table shows the top correlations and binary dissimilarities between features and churn. Only the top features are shown out of 48 original and derived features. The Filtered column contains the results that I obtained when I filtered the data for outliers and missing values.

Pearson correlations with churn

Feature                  Original   Filtered
Margins                  0.06       0.1
Forecasted product A     0.03       0.04
Prices                   0.03       0.04
$ value of discount      0.01       0.01
Current subscription     0.01       0.03
Forecasted product B     0.01       0.01
Number of products       -0.02      -0.02
Customer loyalty         -0.07      -0.07

Binary dissimilarities with churn

Feature                  Hamming    Jaccard
Sales channel 1          0.15       0.97
Sales channel 2          0.21       0.96
Sales channel 3          0.45       0.89

The key takeaways from the preceding table are that three sales channels seem inversely correlated to churn and that most marginal correlations with churn are very small (≤ 0.1). Applying filters for outliers and missing values leads to marginal correlations with improved statistical significance. The Filtered column of the preceding table depicts this effect.

The issue of collinear features can be addressed by computing the covariance matrix between all features, as shown in the following diagram. This matrix provides new perspective on the amount of redundancy some features might have. It’s a good practice to remove redundant features because they create biases and demand more computation, again making the learning process less efficient.

The left graph in the preceding diagram indicates that some features, such as prices and some forecasted metrics, are collinear, with ρ > 0.95. I kept only one of each when I designed the ML models that I describe in the next section, which left me with about 40 features, as the right graph in the preceding diagram shows.
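The pruning step described above can be sketched as follows, assuming a greedy rule that keeps the first feature of each collinear group. The feature names, the toy values, and the helper names are hypothetical; only the ρ > 0.95 cutoff comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / sqrt(vx * vy)


def drop_collinear(features, threshold=0.95):
    """Keep a feature only if |rho| <= threshold against every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (invented): two nearly proportional price features and one unrelated one.
features = {
    "price_peak": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price_offpeak": [2.1, 4.0, 6.2, 7.9, 10.1],
    "tenure": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_collinear(features))
```

Which member of a collinear pair survives depends on iteration order here; a real analysis might instead keep the feature that is easier to interpret or more strongly related to churn.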

The issues of missing and outlier data are often handled by instituting empirical rules, such as deleting observations (customers) when some of their recorded data values are missing, or when they exceed three times the standard deviation across the sample.

Because missing data is a frequent concern, you can impute a missing value with the mean or median across the sample or population as an alternative to deleting observations. That’s what I did here: I replaced missing values with the median for each feature—except for features where more than 40% of data was missing, in which case I deleted the entire feature. The reader should note that a more advanced, best practice approach to imputing missing data is to train a supervised learning model to impute based on other features, but this can require a very large amount of effort so I do not cover it here. When I encountered outliers in the data, I deleted the customers with values beyond six standard deviations from the mean. In total, I deleted 140 out of 16096 (< 1%) observations.
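The cleaning rules above (drop features with more than 40% missing values, impute the rest with the median, delete customers beyond six standard deviations) could look roughly like this. It is a sketch under assumptions: the order of the steps, the column layout, and the names are mine, not taken from the notebook.

```python
from statistics import mean, median, pstdev


def clean(columns, max_missing=0.40, n_sigma=6.0):
    """columns maps feature name -> list of values, with None marking a missing value."""
    n_rows = len(next(iter(columns.values())))

    # 1. Drop features that are mostly missing; impute the others with the median.
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / n_rows > max_missing:
            continue
        fill = median(present)
        kept[name] = [fill if v is None else v for v in values]

    # 2. Flag customers (rows) with any value beyond n_sigma standard deviations.
    outliers = set()
    for values in kept.values():
        mu, sigma = mean(values), pstdev(values)
        if sigma > 0:
            outliers.update(i for i, v in enumerate(values) if abs(v - mu) > n_sigma * sigma)

    # 3. Remove flagged rows from every remaining feature.
    return {name: [v for i, v in enumerate(values) if i not in outliers]
            for name, values in kept.items()}


# Toy data (invented). n_sigma is lowered because five rows cannot produce a 6-sigma outlier.
columns = {
    "margin": [1.0, None, 3.0, 2.0, 100.0],
    "mostly_missing": [None, None, None, 1.0, 2.0],
}
print(clean(columns, n_sigma=1.5))
```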

Developing an ensemble of ML models

In this section, I develop and combine multiple ML models to harness the power of multiple ML algorithms. Ensemble modeling also makes it possible to use information from the entire dataset, even though the distribution of the churn label is highly unbalanced, as shown in the following flowchart.

Predicted probability: p0 = (p1 + p2 + p3) / 3

As it’s good practice to remove low-variance features, I further restricted the feature space to the most important features by applying a quick and simple variance filter. This filter removes features that display no variance for more than 95% of customers. To filter features based on their combined effects on customer churn, as opposed to their marginal effects, I carried out an ML-based feature selection using a grid search with stepwise regression. See details in the next section.
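One way to read the variance filter above is "drop a feature if a single value accounts for more than 95% of customers". The sketch below implements that reading; the interpretation, the function names, and the toy features are assumptions.

```python
from collections import Counter


def is_near_constant(values, max_share=0.95):
    """True if the most frequent value covers more than max_share of the rows."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) > max_share


def variance_filter(features, max_share=0.95):
    """Return the names of features that are not near-constant."""
    return [name for name, values in features.items()
            if not is_near_constant(values, max_share)]


# Toy data (invented): one almost-constant flag and one informative flag.
features = {
    "rare_flag": [0] * 97 + [1] * 3,
    "sales_channel_1": [0] * 60 + [1] * 40,
}
print(variance_filter(features))
```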

Before implementing the ML models, I randomly split the data into two groups, holding out a 30% test set. As discussed in the next section, I also used 10-fold cross-validation on top of the 70%/30% split. K-fold cross-validation is an iterative cycle that averages the performance over K evaluations, each testing on a separate holdout fold that contains 1/K of the data (10% for K = 10).
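The split and the folds can be sketched with index lists alone. This is an illustrative sketch, not the notebook's implementation; the function names and the seed are arbitrary choices.

```python
import random


def train_test_split(indices, test_fraction=0.30, seed=0):
    """Shuffle the indices and hold out test_fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) pairs; each validation fold holds about 1/k of the data."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train, test = train_test_split(range(100))
for fold_train, fold_validation in k_fold(train, k=10):
    # A model would be fitted on fold_train and scored on fold_validation here;
    # the K scores are then averaged.
    pass
```

Every training index lands in exactly one validation fold, so each observation in the 70% portion is used for validation once and for fitting K - 1 times.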

Three ML algorithms—logistic regression, support vector machine, and random forest—were trained separately, then combined in an ensemble, as depicted in the preceding flowchart. The ensemble approach is referred to as soft-voting in the literature because it takes the average probability of the different models and uses it to classify customer churn (also visible in the preceding flowchart).
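Soft voting reduces to averaging the per-model probabilities and thresholding the mean. The sketch below uses invented probabilities; the second call shows a case where soft voting and a simple majority vote disagree.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the churn probabilities of several models and classify on the mean."""
    p0 = sum(probabilities) / len(probabilities)
    return p0, p0 > threshold


def hard_vote(probabilities, threshold=0.5):
    """Majority vote over the individual class decisions, shown for contrast."""
    votes = sum(p > threshold for p in probabilities)
    return votes > len(probabilities) / 2


# Invented outputs of the three models (logit, SVM, random forest) for one customer.
print(soft_vote([0.62, 0.48, 0.55]))

# One confident model can outweigh two mildly negative ones under soft voting.
print(soft_vote([0.9, 0.4, 0.4]), hard_vote([0.9, 0.4, 0.4]))
```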

Customers with churn represent only 10% of the data; therefore, the dataset is unbalanced. I tested two approaches to deal with class imbalance.

  • In the first, simplest approach, the training is based on a random sampling of the abundant class (customers who didn’t churn) to match the size of the rare class (customers who did).
  • In the second approach (shown in the following chart), I based the training on an ensemble of nine models using nine random samples of the abundant class (without replacement) and a full sample of the rare class for each model. I chose nine samples because the class imbalance is approximately 1-to-9 (as shown in the histogram in the following diagram), so nine samples, each about the size of the rare class, are enough to use all or nearly all of the data in the abundant class. This approach is more complex, but it uses all available information, which can improve generalization. I evaluate its effectiveness in the following section.

For both approaches, the performance is evaluated on a test set wherein class imbalance is maintained to account for real-world circumstances.
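The second approach can be sketched by partitioning the abundant class into nine disjoint chunks and pairing each chunk with the full rare class. The function name, the seed, and the toy index ranges are assumptions for illustration.

```python
import random


def balanced_subsets(majority, minority, n_models=9, seed=0):
    """Split the majority class into n_models disjoint chunks (no replacement)
    and pair each chunk with the complete minority class."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_models] for i in range(n_models)]
    return [chunk + list(minority) for chunk in chunks]


# Toy indices (invented): 900 non-churners and 100 churners, a 9-to-1 imbalance.
non_churners = list(range(900))
churners = list(range(900, 1000))

subsets = balanced_subsets(non_churners, churners)
# One model would be trained per subset; their predicted probabilities are then averaged.
print(len(subsets), len(subsets[0]))
```

Each training subset is balanced (100 of each class in this toy case), while the nine subsets together cover every non-churner exactly once.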

Evaluating and refining ML model performance

In this section, I test the performance of the different models I developed in the previous section. I then identify a decision-making mechanism that minimizes the risk of overestimating the number of customers who might churn (called the false positive rate).

The receiver operating characteristic (ROC) curve is often used in ML performance evaluation to complement contingency tables. The ROC curve provides an invariant measure of accuracy when changing the probability threshold to infer positive and negative classes (in this project, churn and no churn, respectively). It plots the true positive rate (the share of actual churners that are correctly predicted) against the false positive rate, also known as fall out, at every threshold. See the following table.

The probabilities predicted by the different ML models are by default calibrated so that values where p > 0.5 correspond to one class and values where p < 0.5 correspond to the other class. This threshold is a hyperparameter that can be fine-tuned to minimize misclassified instances of one class. This is at the expense of increasing misclassification in the other, which can affect the accuracy and precision of different performance measures. In contrast, the area under the ROC curve is an invariant measure of performance—it remains the same for any threshold.
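To make the contrast concrete, the sketch below computes the threshold-dependent rates and the threshold-free area under the ROC curve for a handful of invented scores. The AUC is computed through its rank interpretation (the probability that a random churner is scored above a random non-churner), which is equivalent to the area under the curve.

```python
def rates(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted churn probabilities (invented).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]

print(rates(y_true, scores, 0.5))   # changes with the threshold
print(rates(y_true, scores, 0.35))
print(roc_auc(y_true, scores))      # takes no threshold at all
```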

The following table depicts the performance of different ML models with a random sampling of the abundant class (baseline) and the 9-fold ensemble of learners. You can see that the random forest has the best performance among the individual algorithms, and further that the 9-fold ensemble of random forests is better at generalizing, with an accuracy of 70% and an ROC AUC score of 0.68. This model is the best performer.

Performance measures

Algorithm              Accuracy   Brier   ROC AUC
Logit                  56%        0.24    0.64
Stepwise               56%        0.24    0.64
SVM                    57%        0.24    0.63
RF                     65%        0.24    0.67
Ensemble               61%        0.23    0.66
9-Logit Ensemble       55%        0.26    0.64
9-SVM Ensemble         61%        0.25    0.63
9-RF Ensemble          70%        0.24    0.68
9-Ensemble Ensemble    61%        0.25    0.65

The following chart depicts the performance of the overall best learner (the 9-fold ensemble of random forest learners) and the optimization for precision and fall out. When using a probability threshold of 0.5, the best performer can predict 69% of the customers who might churn though with significant fall out of 42%.

Looking at the ROC curve, you can see that the same model can predict 30% of customers who will churn, with fall out limited to 10%. Using a grid search, I found that the corresponding threshold is p = 0.56. If you want to minimize the risk of overestimating the number of customers who will churn (for example, because retention efforts could be expensive), this is the model you might want to use.
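A threshold search of this kind can be sketched as a scan over a grid of candidate thresholds, returning the lowest one whose fall out stays within the target. Because fall out can only shrink as the threshold rises, the lowest qualifying threshold also gives the highest true positive rate. The grid, the function names, and the toy data are assumptions.

```python
def rates(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > threshold)
    positives = sum(y_true)
    return tp / positives, fp / (len(y_true) - positives)


def pick_threshold(y_true, scores, max_fall_out=0.10):
    """Scan thresholds 0.01..0.99 and return (threshold, tpr, fpr) for the
    lowest threshold with fall out <= max_fall_out, or None if there is none."""
    for step in range(1, 100):
        threshold = step / 100
        tpr, fpr = rates(y_true, scores, threshold)
        if fpr <= max_fall_out:
            return threshold, tpr, fpr
    return None


# Toy data (invented): 3 churners and 10 non-churners.
y_true = [1, 1, 1] + [0] * 10
scores = [0.9, 0.6, 0.4,
          0.7, 0.5, 0.45, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]

print(pick_threshold(y_true, scores))
```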

Applying ML models to business strategy design

In this section, I use the ML models that I have developed to better understand the factors that impact customer churn, to derive strategic options for decreasing churn, and to evaluate the quantitative impact that deploying those options might have on churn rates.

I used a stepwise logistic regression to assess the importance of features while taking into account their combined effect on churn. As shown in the following graph, the regression identifies 12 key features. The prediction score is highest when I include these 12 features in the regression model.

Among these 12 factors, the net margin, the forecasted purchase of products A and B, and the index that indicates multiple-product customers are the features that have the greatest tendency to induce churn. The factors that tended to reduce churn included three sales channels, one marketing campaign, the value of the discount, overall subscriptions, the loyalty of the customer, and the overall number of products purchased.

Therefore, providing a discount to customers with the highest propensity to churn seems to be a simple and effective strategy. Other strategic levers have also been identified, including boosting synergy between products other than A and B, sales channels 1–3, the marketing campaign, and long-term contracts. According to the data, pulling these levers is likely to decrease customer churn.

Finally, I used a sensitivity analysis: I applied a discount of up to 40% to customers that the ML model identified as likely to churn, then re-ran the model to evaluate how many customers were still predicted to churn after incorporating the discount.
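The loop behind such a sensitivity analysis can be sketched as follows. The scoring function is a hypothetical stand-in with invented coefficients, not the trained model from this post, and the customer records are toy data; only the procedure (flag likely churners, raise their discount, re-score) follows the description above.

```python
from math import exp


def toy_churn_probability(customer):
    """Hypothetical stand-in for a trained model (coefficients are invented)."""
    z = 1.0 * customer["margin"] - 4.0 * customer["discount"] - 0.25
    return 1 / (1 + exp(-z))


def sensitivity(customers, model, discounts, threshold=0.5):
    """For each discount level, return the share of initially flagged customers
    that the model still flags after the discount is applied."""
    flagged = [c for c in customers if model(c) > threshold]
    results = {}
    for d in discounts:
        still_flagged = sum(
            1 for c in flagged
            if model({**c, "discount": max(c["discount"], d)}) > threshold
        )
        results[d] = still_flagged / len(flagged) if flagged else 0.0
    return results


# Toy customers (invented), none of whom currently has a discount.
customers = [{"margin": m, "discount": 0.0} for m in (0.5, 1.0, 1.5, 2.0)]
print(sensitivity(customers, toy_churn_probability, [0.0, 0.2, 0.4]))
```

Only the controllable factor is changed between runs; every other feature is held fixed, so the difference in flagged customers is attributed to the discount alone.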

When I set the model at a p threshold of 0.6 to minimize fall out to 10%, my analysis predicts that a 20% discount reduces churn by 25%. Given that the true positive rate at this threshold is about 30%, this analysis indicates that a 20% discount approach could eliminate at least 8% of churn. See the following graph for details. The discount strategy is a simple first step that an organization experiencing customer churn might consider taking to mitigate the issue.

Conclusion

In this post, I demonstrated how to do the following:

  • Explore data and derive new features in order to minimize issues stemming from missing data and a low signal-to-noise ratio.
  • Design an ensemble of ML models to handle strongly unbalanced datasets.
  • Select the best-performing models and refine the decision threshold to maximize precision and minimize fall out.
  • Use the results to derive strategic options and quantitatively assess their impact on churn rates.

In this particular use case, I developed a model that can identify 30% of customers who are likely to churn while limiting fall out to 10%. This study supports the efficacy of deploying a short-term tactic of offering discounts and instituting a long-term strategy based on building synergy between services and sales channels to retain more customers.

If you would like to run the code that produces the data and insights described in this blog post, just download the notebook and associated data file, then run each cell one at a time.


About the author

Jeremy David Curuksu is a data scientist and consultant in AI-ML at the Amazon Machine Learning Solutions Lab (AWS). He holds an MSc and a PhD in applied mathematics, and was a research scientist at EPFL (Switzerland) and MIT (US). He is the author of multiple peer-reviewed scientific articles and the book Data Driven, which introduces management consulting in the new age of data science.

Introducing the Next Generation of On-Device Vision Models: MobileNetV3 and MobileNetEdgeTPU

On-device machine learning (ML) is an essential component in enabling privacy-preserving, always-available and responsive intelligence. This need to bring on-device machine learning to compute and power-limited devices has spurred the development of algorithmically-efficient neural network models and hardware capable of performing billions of math operations per second, while consuming only a few milliwatts of power. The recently launched Google Pixel 4 exemplifies this trend, and ships with the Pixel Neural Core that contains an instantiation of the Edge TPU architecture, Google’s machine learning accelerator for edge computing devices, and powers Pixel 4 experiences such as face unlock, a faster Google Assistant and unique camera features. Similarly, algorithms, such as MobileNets, have been critical for the success of on-device ML by providing compact and efficient neural network models for mobile vision applications.

Today we are pleased to announce the release of source code and checkpoints for MobileNetV3 and the Pixel 4 Edge TPU-optimized counterpart MobileNetEdgeTPU model. These models are the culmination of the latest advances in hardware-aware AutoML techniques as well as several advances in architecture design. On mobile CPUs, MobileNetV3 is twice as fast as MobileNetV2 with equivalent accuracy, and advances the state-of-the-art for mobile computer vision networks. On the Pixel 4 Edge TPU hardware accelerator, the MobileNetEdgeTPU model pushes the boundary further by improving model accuracy while simultaneously reducing the runtime and power consumption.

Building MobileNetV3
In contrast with the hand-designed previous version of MobileNet, MobileNetV3 relies on AutoML to find the best possible architecture in a search space friendly to mobile computer vision tasks. To most effectively exploit the search space we deploy two techniques in sequence — MnasNet and NetAdapt. First, we search for a coarse architecture using MnasNet, which uses reinforcement learning to select the optimal configuration from a discrete set of choices. Then we fine-tune the architecture using NetAdapt, a complementary technique that trims under-utilized activation channels in small decrements. To provide the best possible performance under different conditions we have produced both large and small models.

Comparison of accuracy vs. latency for mobile models on the ImageNet classification task using the Google Pixel 4 CPU.

MobileNetV3 Search Space
The MobileNetV3 search space builds on multiple recent advances in architecture design that we adapt for the mobile environment. First, we introduce a new activation function called hard-swish (h-swish) which is based on the Swish nonlinearity function. The critical drawback of the Swish function is that it is very inefficient to compute on mobile hardware. So, instead we use an approximation that can be efficiently expressed as a product of two piecewise linear functions.

Next we introduce the mobile-friendly squeeze-and-excitation block, which replaces the classical sigmoid function with a piecewise linear approximation.
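The two approximations above can be written down in a few lines. The sketch uses the formulation from the MobileNetV3 paper, h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(x + 3) / 6 is the piecewise linear stand-in for the sigmoid. It is scalar, pure-Python code for illustration only, not the released TensorFlow implementation.

```python
from math import exp


def relu6(x):
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    """Piecewise linear approximation of the sigmoid: 0 below -3, 1 above 3."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """Product of two piecewise linear functions: x and hard_sigmoid(x)."""
    return x * hard_sigmoid(x)


def swish(x):
    """Original Swish, x * sigmoid(x), shown for comparison."""
    return x / (1.0 + exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(swish(x), 4), round(hard_swish(x), 4))
```

Outside the interval [-3, 3] the hard version is exactly 0 or exactly x, so no exponential is ever evaluated, which is the source of the speedup on mobile hardware.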

Combining h-swish plus mobile-friendly squeeze-and-excitation with a modified version of the inverted bottleneck structure introduced in MobileNetV2 yielded a new building block for MobileNetV3.

MobileNetV3 extends the MobileNetV2 inverted bottleneck structure by adding h-swish and mobile friendly squeeze-and-excitation as searchable options.

These parameters defined the search space used in constructing MobileNetV3:

  • Size of expansion layer
  • Degree of squeeze-excite compression
  • Choice of activation function: h-swish or ReLU
  • Number of layers for each resolution block

We also introduced a new efficient last stage at the end of the network that further reduced latency by 15%.

MobileNetV3 Object Detection and Semantic Segmentation
In addition to classification models, we also introduced MobileNetV3 object detection models, which reduced detection latency by 25% relative to MobileNetV2 at the same accuracy for the COCO dataset.

In order to optimize MobileNetV3 for efficient semantic segmentation, we introduced a low latency segmentation decoder called Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). This new decoder contains three branches, one for low resolution semantic features, one for higher resolution details, and one for light-weight attention. The combination of LR-ASPP and MobileNetV3 reduces the latency by over 35% on the high resolution Cityscapes Dataset.

MobileNet for Edge TPUs
The Edge TPU in Pixel 4 is similar in architecture to the Edge TPU in the Coral line of products, but customized to meet the requirements of key camera features in Pixel 4. The accelerator-aware AutoML approach substantially reduces the manual process involved in designing and optimizing neural networks for hardware accelerators. Crafting the neural architecture search space is an important part of this approach and centers around the inclusion of neural network operations that are known to improve hardware utilization. While operations such as squeeze-and-excite and swish non-linearity have been shown to be essential in building compact and fast CPU models, these operations tend to perform suboptimally on Edge TPU and hence are excluded from the search space. The minimalistic variants of MobileNetV3 also forgo the use of these operations (i.e., squeeze-and-excite, swish, and 5×5 convolutions) to allow easier portability to a variety of other hardware accelerators such as DSPs and GPUs.

The neural network architecture search, incentivized to jointly optimize the model accuracy and Edge TPU latency, produces the MobileNetEdgeTPU model that achieves lower latency for a fixed accuracy (or higher accuracy for a fixed latency) than existing mobile models such as MobileNetV2 and minimalistic MobileNetV3. Compared with the EfficientNet-EdgeTPU model (optimized for the Edge TPU in Coral), these models are designed to run at a much lower latency on Pixel 4, albeit at the cost of some loss in accuracy.

Although reducing the model’s power consumption was not a part of the search objective, the lower latency of the MobileNetEdgeTPU models also helps reduce the average Edge TPU power use. The MobileNetEdgeTPU model consumes less than 50% the power of the minimalistic MobileNetV3 model at comparable accuracy.

Left: Comparison of the accuracy on the ImageNet classification task between MobileNetEdgeTPU and other image classification networks designed for mobile when running on Pixel4 Edge TPU. MobileNetEdgeTPU achieves higher accuracy and lower latency compared with other models. Right: Average Edge TPU power in Watts for different classification models running at 30 frames per second (fps).

Object Detection Using MobileNetEdgeTPU
The MobileNetEdgeTPU classification model also serves as an effective feature extractor for object detection tasks. Compared with MobileNetV2 based detection models, MobileNetEdgeTPU models offer a significant improvement in model quality (measured as the mean average precision; mAP) on the COCO14 minival dataset at comparable runtimes on the Edge TPU. The MobileNetEdgeTPU detection model has a latency of 6.6ms and achieves an mAP score of 24.3, while MobileNetV2-based detection models achieve an mAP of 22 and take 6.8ms per inference.

The Need for Hardware-Aware Models
While the results shown above highlight the power, performance, and quality benefits of MobileNetEdgeTPU models, it is important to note that the improvements arise due to the fact that these models have been customized to run on the Edge TPU accelerator.
MobileNetEdgeTPU when running on a mobile CPU delivers inferior performance compared with the models that have been tuned specifically for mobile CPUs (MobileNetV3). MobileNetEdgeTPU models perform a much greater number of operations, and so, it is not surprising that they run slower on mobile CPUs, which exhibit a more linear relationship between a model’s compute requirements and the runtime.

MobileNetV3 is still the best performing network when using mobile CPU as the deployment target.

For Researchers and Developers
The MobileNetV3 and MobileNetEdgeTPU code, as well as both floating point and quantized checkpoints for ImageNet classification, are available at the MobileNet GitHub page. Open source implementation for MobileNetV3 and MobileNetEdgeTPU object detection is available in the TensorFlow Object Detection API. Open source implementation for MobileNetV3 semantic segmentation is available in TensorFlow through DeepLab.

Acknowledgements:
This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Berkin Akin, Okan Arikan, Gabriel Bender, Bo Chen, Liang-Chieh Chen, Grace Chu, Eddy Hsu, John Joseph, Pieter-jan Kindermans, Quoc Le, Owen Lin, Hanxiao Liu, Yun Long, Ravi Narayanaswami, Ruoming Pang, Mark Sandler, Mingxing Tan, Vijay Vasudevan, Weijun Wang, Dong Hyuk Woo, Dmitry Kalenichenko, Yunyang Xiong, Yukun Zhu and support from Hartwig Adam, Blaise Agüera y Arcas, Chidu Krishnan and Steve Molloy.

At RSNA, Healthcare Startups Shine Spotlight on AI for Radiology

Radiology leaders have gathered for over 100 years running at RSNA, the annual meeting of the Radiological Society of North America, to discuss the industry’s latest challenges and opportunities. In recent years, AI in medical imaging has become a key focus — with startups at the center of the conversation.

Startups around the world are building AI solutions for a universal problem in medical imaging: limited time. Faced with rising numbers of patients being imaged, as well as the growing size of MRI and CT scans, radiologists must interpret one image every three or four seconds to keep up with the workload.

Agile startups are well-suited to tackle the demands of a rapidly evolving field like deep learning. In medical imaging, many are using AI to develop applications that target areas that slow radiologists down.

Healthcare startups raised more than $26 billion in venture capital funding last year and are partnering with major research institutions, hospitals and medical instrument manufacturers. They’re also receiving regulatory validation for clinical use: over three dozen healthcare AI startups have FDA clearance for algorithms that detect conditions including cancer, stroke and brain hemorrhages from medical scans.

At RSNA 2019, taking place in Chicago, Dec. 1-6, more than 50 attending startups are part of the NVIDIA Inception virtual accelerator program, which provides AI training and tools to fuel the growth of thousands of companies building GPU-powered applications, including over 700 healthcare startups.

Scan the Show for NVIDIA Inception Startups

Accelerated by NVIDIA GPUs, AI can speed up the acquisition, annotation and analysis of medical images to more quickly spot critical cases. It can also give experts quantitative insights that are too time-consuming to acquire using traditional methods.

Dozens of Inception companies will share their medical imaging applications for every phase of the radiology workflow at the RSNA AI Theater and the NVIDIA booth, including:

  • Higher-quality scans: Subtle Medical has developed the first and only AI software solutions FDA-cleared for medical imaging enhancement — SubtlePET for faster PET exams and SubtleMR for higher-quality MRI exams. Its software smoothly integrates with any scanner to enhance images during acquisition without altering the existing workflow, increasing efficiency and patient comfort. The company uses the NVIDIA DGX Station and NVIDIA DGX-1 to accelerate training, and NVIDIA T4 GPUs for inference.
  • Enabling AI-assisted annotation: TrainingData.io’s web platform helps researchers and companies manage their data labeling workflows, running on NVIDIA T4 GPUs for inference in Google Cloud. The startup leverages AI-assisted segmentation tools through the NVIDIA Clara Train SDK to label medical images that in turn train deep learning models for radiologists. And Palo Alto-based Fovia Ai, Inc. provides its customers with AI-assisted annotation powered by the NVIDIA Clara SDK in its tools for 2D and 3D visualization of medical images, which can seamlessly integrate into the clinical workflow.
  • Analyzing medical images: Tokyo startup LPIXEL develops deep learning image analysis tools using NVIDIA GPUs, including one to identify brain aneurysms from MRA, recently approved for clinical use in Japan. For lung tumor detection, China-based InferVISION’s AI tools identify and label lung nodules from CT scans in under 30 seconds. The company uses NVIDIA T4 GPUs for inference, achieving speedups of 4x over CPUs.
  • Processing surgical video: Doctors performing minimally invasive surgeries rely on live video feeds from tiny cameras to view the area they’re operating on. Kaliber Labs is building deep learning models that interpret these video feeds in real time for orthopedic surgery, identifying and measuring aspects of the patient’s anatomy and pathology, and providing intraoperative guidance to surgeons. The startup is using NVIDIA RTX GPUs for training and the NVIDIA Jetson AGX Xavier AI computing module for inference at the edge.

Rounding Out RSNA

In NVIDIA booth 10939 and beyond, we’ll be exhibiting the latest AI tools for medical imaging, from training to deployment.

We’ll also showcase demos of the NVIDIA Clara medical imaging platform, which combines NVIDIA GPU hardware and the NVIDIA Clara software development kit to accelerate the training and inference of deep learning applications for healthcare. The platform includes APIs for AI-assisted annotation of medical images, a transfer learning toolkit, a medical model development environment and tools for AI deployment at scale.

A Clara developer meetup will be held on Tuesday, Dec. 3 at 11:30 a.m. CT.

The following RSNA panels feature NVIDIA speakers:

For more information, check out the full RSNA agenda.

The post At RSNA, Healthcare Startups Shine Spotlight on AI for Radiology appeared first on The Official NVIDIA Blog.

New Insights into Human Mobility with Privacy Preserving Aggregation

Understanding human mobility is crucial for predicting epidemics, urban and transit infrastructure planning, understanding people’s responses to conflict and natural disasters and other important domains. Formerly, the state-of-the-art in mobility data was based on cell carrier logs or location “check-ins”, and was therefore available only in limited areas — where the telecom provider is operating. As a result, cross-border movement and long-distance travel were typically not captured, because users tend not to use their SIM card outside the country covered by their subscription plan and datasets are often bound to specific regions. Additionally, such measures involved considerable time lags and were available only within limited time ranges and geographical areas.

In contrast, de-identified aggregate flows of populations around the world can now be computed from phones’ location sensors at a uniform spatial resolution. This metric has the potential to be extremely useful for urban planning since it can be measured in a direct and timely way. The use of de-identified and aggregated population flow data collected at a global level via smartphones could shed additional light on city organization, for example, while requiring significantly fewer resources than existing methods.

In “Hierarchical Organization of Urban Mobility and Its Connection with City Livability”, we show that these mobility patterns — statistics on how populations move about in aggregate — indicate a higher use of public transportation, improved walkability, lower pollutant emissions per capita, and better health indicators, including easier accessibility to hospitals. This work, which appears in Nature Communications, contributes to a better characterization of city organization and supports a stronger quantitative perspective in the efforts to improve urban livability and sustainability.

Visualization of privacy-first computation of the mobility map. Individual data points are automatically aggregated together with differential privacy noise added. Then, flows of these aggregate and obfuscated populations are studied.

Computing a Global Mobility Map While Preserving User Privacy
In line with our AI principles, we have designed a method for analyzing population mobility with privacy-preserving techniques at its core. To ensure that no individual user’s journey can be identified, we create representative models of aggregate data by employing a technique called differential privacy, together with k-anonymity, to aggregate population flows over time. Initially implemented in 2014, this approach to differential privacy intentionally adds random “noise” to the data in a way that maintains both users’ privacy and the data’s accuracy at an aggregate level. We use this method to aggregate data collected from smartphones of users who have deliberately chosen to opt-in to Location History, in order to better understand global patterns of population movements.

The model only considers de-identified location readings aggregated to geographical areas of predetermined sizes (e.g., S2 cells). It “snaps” each reading into a spacetime bucket by discretizing time into longer intervals (e.g., weeks) and latitude/longitude into a unique identifier of the geographical area. Aggregating into these large spacetime buckets goes beyond protecting individual privacy — it can even protect the privacy of communities.

Finally, for each pair of geographical areas, the system computes the relative flow between the areas over a given time interval, applies differential privacy filters, and outputs the global, anonymized, and aggregated mobility map. The dataset is generated only once and only mobility flows involving a sufficiently large number of accounts are processed by the model. This design is limited to heavily aggregated flows of populations, such as that already used as a vital source of information for estimates of live traffic and parking availability, which protects individual data from being manually identified. The resulting map is indexed for efficient lookup and used to fuel the modeling described below.
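The bucketing and aggregation steps described above can be illustrated with a toy pipeline. Everything specific here is an assumption: a plain latitude/longitude grid stands in for S2 cells, the minimum count and the Laplace noise scale are arbitrary, and each trip is assumed to come from a distinct account. It is a sketch of the general idea, not the production system, and it does not by itself provide a formal differential privacy guarantee.

```python
import math
import random
from collections import Counter

WEEK_SECONDS = 7 * 24 * 3600


def bucket(lat, lng, timestamp, cell_deg=0.1):
    """Snap a reading to (grid cell id, week index). The grid is a stand-in for S2 cells."""
    cell = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
    return cell, int(timestamp // WEEK_SECONDS)


def laplace_noise(scale, rng):
    """The difference of two exponential draws is Laplace-distributed with this scale."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_flows(trips, min_count=3, epsilon=1.0, seed=0):
    """trips: iterable of (origin, destination) readings, each (lat, lng, timestamp).
    Counts flows per (origin cell, destination cell, week), drops small flows,
    and adds Laplace noise to the flows that remain."""
    rng = random.Random(seed)
    counts = Counter()
    for origin, destination in trips:
        origin_cell, week = bucket(*origin)
        destination_cell, _ = bucket(*destination)
        counts[(origin_cell, destination_cell, week)] += 1
    return {key: n + laplace_noise(1 / epsilon, rng)
            for key, n in counts.items() if n >= min_count}


# Toy trips (invented): three along one route, one along another.
trips = [((48.86, 2.35, 1000), (48.87, 2.45, 5000))] * 3
trips.append(((48.86, 2.35, 1000), (48.56, 2.15, 5000)))
print(noisy_flows(trips))
```

The single-trip route never appears in the output, and the surviving count is reported only with noise added, so neither the presence nor the exact number of travelers on a route is exposed directly.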

Mobility Map Applications
Aggregate mobility of people in cities around the globe defines the city and, in turn, its impact on the people who live there. We define a metric, the flow hierarchy (Φ), derived entirely from the mobility map, that quantifies the hierarchical organization of cities. While hierarchies across cities have been extensively studied since Christaller’s work in the 1930s, for individual cities, the focus has been primarily on the differences between core and peripheral structures, as well as whether cities are mono- or poly-centric. Our results instead show that the reality is much richer than previously thought. The mobility map enables a quantitative demonstration that cities lie across a spectrum of hierarchical organization that strongly correlates with a series of important quality of life indicators, including health and transportation.

Below we see an example of two cities — Paris and Los Angeles. Though they have almost the same population size, those two populations move in very different ways. Paris is mono-centric, with an “onion” structure that has a distinct high-mobility city center (red), which progressively decreases as we move away from the center (in order: orange, yellow, green, blue). On the other hand, Los Angeles is truly poly-centric, with a large number of high-mobility areas scattered throughout the region.

Mobility maps of Paris (left) and Los Angeles (right). Both cities have similar population sizes, but very different mobility patterns. Paris has an “onion” structure exhibiting a distinct center with a high degree of mobility (red) that progressively decreases as we move away from the center (in order: orange, yellow, green, blue). In contrast, Los Angeles has a large number of high-mobility areas scattered throughout the region.

More hierarchical cities — in terms of flows being primarily between hotspots of similar activity levels — have values of flow hierarchy Φ closer to the upper limit of 1 and tend to have greater levels of uniformity in their spatial distribution of movements, wider use of public transportation, higher levels of walkability, lower pollution emissions, and better indicators of various measures of health. Returning to our example, the flow hierarchy of Paris is Φ=0.93 (in the top quartile across all 174 cities sampled), while that of Los Angeles is 0.86 (bottom quartile).

We find that existing measures of urban structure, such as population density and sprawl composite indices, correlate with flow hierarchy, but the flow hierarchy conveys additional information, including behavioral and socioeconomic factors.

Connecting flow hierarchy Φ with urban indicators in a sample of US cities. Proportion of trips as a function of Φ, broken down by modal share: private car, public transportation, and walking. Sample city names that appear in the plot: ATL (Atlanta), CHA (Charlotte), CHI (Chicago), HOU (Houston), LA (Los Angeles), MIN (Minneapolis), NY (New York City), and SF (San Francisco). We see that cities with higher flow hierarchy exhibit significantly higher rates of public transportation use, less car use, and more walkability.

Measures of urban sprawl require composite indices built up from much more detailed information on land use, population, density of jobs, and street geography among others (sometimes up to 20 different variables). In addition to the extensive data requirements, such metrics are also costly to obtain. For example, censuses and surveys require a massive deployment of resources in terms of interviews, and are only standardized at a country level, hindering the correct quantification of sprawl indices at a global scale. On the other hand, the flow hierarchy, being constructed from mobility information alone, is significantly less expensive to compile (involving only computer processing cycles), and is available in real-time.

Given the ongoing debate on the optimal structure of cities, the flow hierarchy introduces a different conceptual perspective compared to existing measures, and can shed new light on the organization of cities. From a public-policy point of view, we see that cities with a greater degree of mobility hierarchy tend to have more desirable urban indicators. Given that this hierarchy is a measure of proximity and direct connectivity between socioeconomic hubs, a possible direction could be to shape opportunity and demand in a way that facilitates a greater degree of hub-to-hub movement than a hub-to-spoke architecture. The proximity of hubs can be generated through appropriate land use, which can be shaped by data-driven zoning laws in terms of business, residence or service areas. The presence of efficient public transportation and lower use of cars is another important factor. Perhaps a combination of policies, such as congestion pricing to disincentivize private transportation to socioeconomic hubs, along with building public transportation in a targeted fashion to directly connect the hubs, may well prove useful.

Next Steps
This work is part of our larger AI for Social Good efforts, a program that focuses Google’s expertise on addressing humanitarian and environmental challenges. These mobility maps are only the first step toward making an impact in epidemiology, infrastructure planning, and disaster response, while ensuring high privacy standards.

The work discussed here goes to great lengths to ensure privacy is maintained. We are also working on newer techniques, such as on-device federated learning, to go a step further and enable computing aggregate flows without personal data leaving the device at all. By using distributed secure aggregation protocols or randomized responses, global flows can be computed without even the aggregator having knowledge of individual data points being aggregated. This technique has also been applied to help secure Chrome from malicious attacks.
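As a rough illustration of the randomized response idea mentioned above, the following Python sketch shows the textbook version of the technique: each device reports its true yes/no answer only with some probability and otherwise reports a coin flip, and the aggregator corrects for the known noise to estimate the population fraction. This is a generic sketch, not the protocol used in this work. The function names and the `p_truth` parameter are hypothetical.

```python
import random


def randomize(truth, p_truth, rng):
    # With probability p_truth report the true bit; otherwise report a fair coin flip.
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_fraction(reports, p_truth):
    # Expected observed rate is p_truth * t + (1 - p_truth) / 2, where t is the
    # true fraction of "yes" answers; invert that relation to recover t.
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth) / 2.0) / p_truth


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population: 30% of devices made a given trip.
    truths = [i < 6000 for i in range(20000)]
    reports = [randomize(t, 0.5, rng) for t in truths]
    print(round(estimate_fraction(reports, 0.5), 3))
```

No single report reveals a device's true answer with certainty, yet the aggregate estimate approaches the true fraction as the number of devices grows. Lower values of `p_truth` give stronger deniability at the cost of a noisier estimate.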

Acknowledgements
This work resulted from a collaboration of Aleix Bassolas and José J. Ramasco from the Institute for Cross-Disciplinary Physics and Complex Systems (IFISC, CSIC-UIB), Brian Dickinson, Hugo Barbosa-Filho, Gourab Ghoshal, Surendra A. Hazarie, and Henry Kautz from the Computer Science Department and Ghoshal Lab at the University of Rochester, Riccardo Gallotti from the Bruno Kessler Foundation, and Xerxes Dotiwalla, Paul Eastham, Bryant Gipson, Onur Kucuktunc, Allison Lieber, and Adam Sadilek at Google.

The differential privacy library used in this work is open source and available on our GitHub repo.

First Time’s the Charm: Sydney Startup Uses AI to Improve IVF Success Rate

In vitro fertilization, a common treatment for infertility, is a lengthy undertaking for prospective parents, involving ultrasounds, blood tests and injections of fertility medications. If the process doesn’t end up in a successful pregnancy — which is often the case — it can be a major emotional and financial blow.

Sydney-based healthcare startup Harrison.ai is using deep learning to improve the odds of success for thousands of IVF patients. Its AI model, IVY, is used by Virtus Health, a global provider of assisted reproductive services, to help doctors evaluate which embryo candidate has the best chance of implantation into the patient.

Founded by brothers Aengus and Dimitry Tran in 2017, Harrison.ai builds customized predictive algorithms that integrate into existing clinical workflows to inform critical healthcare decisions and improve patient outcomes.

Ten or more eggs can be harvested from a patient during a single cycle of IVF. The embryos are incubated in the lab for five days before the most promising candidate (or candidates) are implanted into the patient’s uterus. Yet, the success rate of implantation for five-day embryos is under 50 percent, and closer to 25 percent for women over the age of 40, according to the U.S. Centers for Disease Control and Prevention.

“In the past, people used to have to implant three or four embryos and hope one works,” said Aengus Tran, cofounder and medical AI director of Harrison.ai, a member of the NVIDIA Inception virtual accelerator program, which offers go-to-market support, expertise, and technology for AI startups revolutionizing industries. “But sometimes that works a little too well and patients end up with twins or triplets. It sounds cute, but it can be a dangerous pregnancy.”

Built using NVIDIA V100 Tensor Core GPUs on premises and in the cloud, IVY processes time-lapse video of fertilized eggs developing in the lab, predicting which are most likely to result in a positive outcome.

The goal: a single embryo transfer that leads to a single successful pregnancy.

Going Frame by Frame

Embryologists manually analyze time-lapse videos of embryo growth to pick the highest-quality candidates. It’s a subjective process, with no universal grading system and low agreement between experts. And with five days of footage for every embryo, it’s nearly impossible for doctors to look at every frame.

Harrison.ai’s IVY deep learning model analyzes the full five-day video feed from an embryoscope, helping it surpass the performance of AI tools that provide insights based on still images.

“Most of the visual AI tools we see these days are image recognition,” said Aengus. “But with an early multi-cell embryo, the development process matters a lot more than how it looks at the end of five days. The critical event could have happened days before, and the damage already done.”

The company trained its deep learning models on a dataset from Virtus Health including more than 10,000 human embryos from eight IVF labs across four countries. Instead of annotating each video with detailed morphological features of the embryos, the team classified each embryo with a single label: positive or negative outcome. A positive outcome meant that a patient’s six-week ultrasound showed a fetus with a heartbeat — a key predictor of successful live births.

In a recent study, IVY was able to predict which embryos would develop a heartbeat with 93 percent accuracy. Aengus and Dimitry say the tool could help standardize embryo selection by reducing disagreement among human readers.

To keep up with Harrison.ai’s growing training datasets, the team upgraded their GPU clusters from four GeForce cards to the NVIDIA DGX Station, the world’s fastest workstation for deep learning. Training on the Tensor Core GPUs allowed them to leverage mixed-precision computing, shrinking their training time by 4x.

“It’s almost unreal to have that much power at your fingertips,” Aengus said. Using the DGX Station, Harrison.ai was able to boost productivity and improve their deep learning models by training with bigger datasets.

The company uses the deskside DGX Station for experimentation, research and development. For training their biggest datasets, they scale up to larger clusters of NVIDIA V100 GPUs in Amazon EC2 P3 cloud instances — relying on NGC containers to seamlessly shift their workflows from on-premises systems to the cloud.

IVY has been used in thousands of cases in Virtus Health clinics so far. Harrison.ai is also collaborating with Vitrolife, a major embryoscope manufacturer, to more smoothly integrate its neural networks into the clinical workflow.

While Harrison.ai’s first project is for IVF, the company is also developing tools for other healthcare applications.

The post First Time’s the Charm: Sydney Startup Uses AI to Improve IVF Success Rate appeared first on The Official NVIDIA Blog.

Amid Surging Demand for AI Skills, Top Educators Talk Strategy at GTC DC

Employers are scrambling to find people with AI, machine learning and data science skills, and higher education is responding. Leaders from a group of top universities gathered at GTC DC on Wednesday to discuss how universities can meet this demand.

Martial Hebert, dean of the School of Computer Science at Carnegie Mellon University, was joined by Cammy Abernathy, dean and professor of materials science and engineering at the University of Florida; Kenneth Ball, dean of the Volgenau School of Engineering at George Mason University; and Joe Paris, director for research computing at Northwestern University.

GTC DC has become the premier AI conference in the nation’s capital, this year attended by more than 3,600 developers, researchers, educators and CIOs focusing on the intersection of AI, policy and industry.

Wednesday’s panel, moderated by NVIDIA’s Jonathan Bentz, a solutions architect for higher education and research, life science, and high performance computing, discussed the importance of democratizing AI and data science tools and concepts for students.

The panelists explored three ways to better democratize AI: new degree programs, new coursework and building skills.

“We are distributing important digital skills throughout every course and major — from humanities to fine arts to healthcare to genomics — and developing brand new degrees to meet the needs of the changing workforce,” Ball said.

A major challenge to building skills, however, remains access to computing resources. Hebert described computing as “one of the biggest obstacles” faced by institutions, one that limits the number of students who can be involved with cutting-edge work.

In addition to access to the most capable machines, students need to be equipped with the knowledge and tools to address bias in AI.

“As we head down this path, it’s not lost on us the examples where our biases as programmers are finding their way into codes that are being applied to important tasks,” Paris said.

Abernathy said she’s “amazed” to see how quickly AI and machine learning have embedded themselves in almost every discipline. As the technology spreads, she stressed the importance of reaching out to and preparing underrepresented groups.

“It’s pretty clear if you want to be employable and a leader in your profession, you need to have skills in these domains,” Abernathy said. “It’s important that we provide access to a wider range of people.”

At GTC DC, the NVIDIA Deep Learning Institute offered a bevy of sold-out courses, workshops and hands-on training in AI, accelerated computing and data science, and on Monday it announced a dozen new courses.

The post Amid Surging Demand for AI Skills, Top Educators Talk Strategy at GTC DC appeared first on The Official NVIDIA Blog.

In the AI of the Storm: Accelerating Disaster Relief Efforts with Artificial Intelligence

With lives at stake, and the clock ticking, mastering disaster may be the ultimate AI challenge.

Teams from Johns Hopkins University, Lockheed Martin, the U.S. Department of Defense’s Joint Artificial Intelligence Center and NVIDIA on Wednesday outlined how they’re using AI to speed disaster relief to where it’s needed most.

The teams spoke about their work at GTC DC, the Washington edition of NVIDIA’s GPU Technology Conference, which brought together more than 3,500 registered attendees — policymakers, business leaders and researchers among them — to discuss and learn about the latest in AI and data science.

Their presentations underscored GTC DC’s role as Washington’s premier AI conference. They represent the latest efforts, detailed at the event over the past several years, to put the benefits of AI into the hands of policymakers and first-responders.

Detecting Damage with Satellite Imagery

A team from the Johns Hopkins Applied Physics Laboratory and the Joint AI Center (JAIC) spoke about how they’re using GPU-powered deep learning algorithms to track the damage caused by major storms by processing airborne and satellite imagery.

Speakers included software engineer Beatrice Garcia and senior engineer Gordon Christie, both from the university’s Applied Physics Laboratory, and Captain Dominic Garcia, project lead at JAIC.

While their work hasn’t been deployed — yet — in disaster zones, their goal is to create AI systems that harness satellite and aerial imagery, along with other data, to point first responders and military and government decision-makers and analysts to where the need is greatest.

Such images will help first responders see, at a glance, where to deploy their resources, Christie said, as he showed an AI-enhanced map assessing the damage caused by a tornado that struck Joplin, Missouri, in 2011.

The lab and JAIC have applied deep learning algorithms to the imagery of a number of severe storms collected from airborne platforms to accelerate detection of flooding and damaged infrastructure.

Based on the algorithms they developed and techniques they learned, the joint team is now creating a scalable environment that would provide these capabilities to any analysts. Users would have access to AI and machine learning algorithms, enabling a faster response to a variety of natural disasters.

Lockheed Prepares with Earthquake Simulation

Andrew Walsh, a senior staff systems engineer at Lockheed Martin, explained how the company is building an open dataset that can be used to train AI for better responses to earthquakes.

Lockheed Martin next explained its work with a team from NVIDIA to build an open dataset for multi-platform, multi-sensor machine learning research and development.

The dataset, focused on humanitarian assistance and disaster relief, is being developed using a combination of real-world data collection events and simulation. The current emphasis is on earthquake scenarios.

Walsh joined May Casterline, a senior solutions architect at NVIDIA, to explain how they choreographed a real-world collection event that included multiple sensors, aircraft, ground vehicles and teams of actors in a series of simulated earthquake scenarios. They also detailed the effort required to spatiotemporally align all the disparate data sources and described the challenges around labeling such a massive dataset.

Their dataset will be used to train AI and machine learning systems to improve responses to real earthquakes.

Disaster Planning with Data Science

Sean Griffin, president of Disaster Intelligence, spoke late Wednesday afternoon about his company’s approach to disaster prevention and response. His D.C.-based firm is working to create a common web platform that collects datasets relevant to natural and manmade disasters, which are then displayed graphically.

Users — from first responders to everyday citizens — can access the data to make more educated choices before and after a disaster.

“We used to share situational awareness by PDF or SharePoint sites,” said Griffin. But high performance computing is making it possible to update larger audiences with more relevant data.

“It’s our objective as a company to have complete saturation across the U.S. to have outage data in our platforms so that not only do we know that the power’s out, but that we can intersect that information with other key points of interest like healthcare facilities or water systems.”

Griffin presented two use cases. The first showed how Disaster Intelligence’s platform can model the consequences, cost and options for different disaster relief strategies. The second addressed how the platform improves coastal evacuations during hurricanes.

Route Planning with RAPIDS

NVIDIA is hosting a webinar on how RAPIDS, the company’s GPU-accelerated data science software stack, can help speed up route replanning for civilian and military disaster response assets. The webinar takes place Dec. 17 at 10 am PT, and registration is open.

The post In the AI of the Storm: Accelerating Disaster Relief Efforts with Artificial Intelligence appeared first on The Official NVIDIA Blog.