
AWS Machine Learning Research Awards Call for Proposal

Written on October 31, 2019. Posted in Amazon.

Academic research and open-source software development are at the forefront of machine learning (ML) technology development. Since 2017, the AWS Machine Learning Research Awards (MLRA) has been aiming to advance machine learning by funding innovative research, training students, and providing researchers with access to the latest technology. MLRA has supported over 100 cutting-edge ML projects, with topics such as ML algorithms, computer vision, natural language processing, medical research, neuroscience, social science, physics, and robotics. Many of the MLRA-backed projects have received media coverage, for example: “Researchers are Using Machine Learning to Screen for Autism in Children,” “The Robotic Future: Where Bots Operate Together and Learn from Each Other,” “Autonomous Vehicles: The Answer to Our Growing Traffic Woes,” “Amazon Gives AI to Harvard Hospital in Tech’s Latest Health Push,” and “Facebook’s Fight to Prevent Deepfake Dystopia Gets a Powerful Partner in Amazon Web Services.”

AWS is pleased to announce that MLRA is now calling for proposals for the Q4 2019 cycle, and welcomes faculty members at accredited (Ph.D. granting) academic institutions and researchers at non-profit organizations to apply. The following types of projects are eligible for MLRA funding:

Development of open-source tools and research that benefit the ML community at large.
Impactful research that uses any of the following AWS ML solutions: Amazon SageMaker, Amazon SageMaker Ground Truth, Amazon SageMaker Neo, Apache MXNet on AWS, and AWS AI Services.

MLRA may provide unrestricted cash funds, AWS Promotional Credit, and training resources, including tutorials on how to run ML on AWS and hands-on sessions with Amazon scientists and engineers.

The average awarded amount is no more than $70,000 cash and $100,000 AWS Promotional Credits for individual projects. The actual amount awarded depends on the nature of the project. An internal advisory board at AWS reviews the proposals and makes funding decisions based on potential impact to the ML community, quality of the scientific content, and extent of usage of AWS AI/ML Services.

The submission deadline is at 11:59 PM (PST), December 8, 2019, and decision letters are sent out approximately three months after the submission deadline.

To get started with your application, please consult the MLRA website or send an email to aws-ml-research-awards@amazon.com. We look forward to receiving your applications!

About the Author

An Luo, PhD, is a Senior Technical Program Manager at AWS. An spent many years applying machine learning to biomedical research. Now, she focuses on enabling and accelerating machine learning research leveraging AWS AI/ML technologies.

Bumper Crop of AI Helps Farmers Whack Weeds, Pesticide Use

Written on October 31, 2019. Posted in NVIDIA.

Weeds compete with neighboring crops for light, water and nutrients, costing the farming industry billions each year in agricultural yield.

To keep a better eye on fields, improve crop yields and reduce the use of pesticides, farmers and agriculture researchers are turning to AI.

“We believe the digital agriculture revolution will help in reducing the use of chemical products in agriculture,” said Adel Hafiane, an associate professor at the Institut National des Sciences Appliquées, in France’s Centre Val de Loire. Hafiane is working with colleagues from the University of Orléans to develop AI that detects weeds from drone images of beet, bean and spinach crops.

“If farmers can map the location of weeds,” he said, “they don’t need to spray chemical products over an entire field — they can just target specific areas, intervening at the right time and site.”

Using the georeferenced coordinates of where an aerial image was captured, farmers can determine the location of weeds in a field. The insights provided by the researchers’ deep learning network could then be deployed in agricultural robots on the ground that can remove or spray weeds in large fields.

Hafiane and his colleagues used a cluster of NVIDIA Quadro GPUs to train the neural networks. Their work was supported by France’s Centre-Val de Loire region.

Deep Learning on Cropped Images

From a few hundred feet in the air, using low-resolution images, it’s not easy to tell the difference between weeds and crops — both are green and leafy. But with sufficient image resolution and enough training data, neural networks can learn to differentiate the two.

Using a dataset of tens of thousands of images for each crop (some labeled, some unlabeled), the team relied on transfer learning based on the popular ImageNet model to develop its deep learning models.

To partially automate the data labeling process, the researchers developed an algorithm that used geometric information in the images to label weeds and crops. Crops are often arranged in neat lines, with open patches of soil between the rows. When spots of green are visible in the space between crop rows, the AI knows it’s likely a weed.
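The row-based labeling idea can be sketched in a few lines. This is a minimal illustration, not the researchers' algorithm: it assumes crop rows run vertically in the image, that a boolean vegetation mask has already been extracted, and that a column belongs to a crop row when its share of green pixels reaches a hypothetical `row_fraction` cutoff.

```python
from typing import List


def label_vegetation(mask: List[List[bool]], row_fraction: float = 0.5) -> List[List[str]]:
    """Label every pixel as 'soil', 'crop', or 'weed'.

    mask[r][c] is True where the pixel shows green vegetation.
    Assumption: crop rows are vertical, so image columns that are dense
    in vegetation are treated as crop rows.
    """
    height = len(mask)
    width = len(mask[0])
    # Share of green pixels in each image column.
    green_share = [sum(mask[r][c] for r in range(height)) / height for c in range(width)]
    is_crop_column = [share >= row_fraction for share in green_share]

    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            if not mask[r][c]:
                row.append("soil")
            elif is_crop_column[c]:
                row.append("crop")
            else:
                # Green pixel in the gap between crop rows: likely a weed.
                row.append("weed")
        labels.append(row)
    return labels


# Toy 4x5 field: columns 0 and 3 are crop rows, one stray green pixel at (1, 2).
field = [
    [True, False, False, True, False],
    [True, False, True, True, False],
    [True, False, False, True, False],
    [True, False, False, True, False],
]
labels = label_vegetation(field)
```

In this sketch any green pixel inside a dense column is labeled as crop, so a weed growing within a crop row would be missed by the heuristic.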

A more complex challenge is detecting weeds within the crop rows. The researchers are working to improve their model’s results on spotting these trickier pests.

Developed using the TensorFlow and Caffe deep learning frameworks, the model recognizes weeds in fields of beets, spinach and beans. At a precision of 93 percent, the AI produced the best results analyzing beet crops.

Hafiane says using NVIDIA Quadro GPUs shrank training time from one week on a high-end CPU down to a few hours. While the dataset used large, 36-megapixel images, the researchers say further increasing the image resolution captured by the drones would help boost the performance of their neural networks.

The researchers are also using NVIDIA GPUs to train neural networks to detect crop diseases in vineyards, and plan to collaborate with international colleagues to develop similar solutions to monitor other crops.

The post Bumper Crop of AI Helps Farmers Whack Weeds, Pesticide Use appeared first on The Official NVIDIA Blog.

Learning to Assemble and to Generalize from Self-Supervised Disassembly

Written on October 30, 2019. Posted in Google.

Posted by Kevin Zakka, Research Intern and Andy Zeng, Research Scientist, Robotics at Google

Our physical world is full of different shapes, and learning how they are all interconnected is a natural part of interacting with our surroundings — for example, we understand that coat hangers hook onto clothing racks, power plugs insert into wall outlets, and USB cables fit into USB sockets. This general concept of “how things fit together” based on their shapes is something that people acquire over time and experience, and it helps to increase the efficiency with which we perform tasks, like assembling DIY furniture kits or packing gifts into a box. If robots could learn “how things fit together,” then perhaps they could become more adaptable to new manipulation tasks involving objects they have never seen before, like reconnecting severed pipes, or building makeshift shelters by piecing together debris during disaster response scenarios.

To explore this idea, we worked with researchers from Stanford and Columbia Universities to develop Form2Fit, a robotic manipulation algorithm that uses deep neural networks to learn to visually recognize how objects correspond (or “fit”) to each other. To test this algorithm, we tasked a real robot to perform kit assembly, where it needed to accurately assemble objects into a blister pack or corrugated display to form a single unit. Previous systems built for this task required extensive manual tuning to assemble a single kit unit at a time. However, we demonstrate that by learning the general concept of “how things fit together,” Form2Fit enables our robot to assemble various types of kits with a 94% success rate. Furthermore, Form2Fit is one of the first systems capable of generalizing to new objects and kitting tasks not seen during training.

Form2Fit learns to assemble a wide variety of kits by finding geometric correspondences between object surfaces and their target placement locations. By leveraging geometric information learned from multiple kits during training, the system generalizes to new objects and kits.

While often overlooked, shape analysis plays an important role in manipulation, especially for tasks like kit assembly. In fact, the shape of an object often matches the shape of its corresponding space in the packaging, and understanding this relationship is what allows people to do this task with minimal guesswork. At its core, Form2Fit aims to learn this relationship by training over numerous pairs of objects and their corresponding placing locations across multiple different kitting tasks – with the goal to acquire a broader understanding of how shapes and surfaces fit together. Form2Fit improves itself over time with minimal human supervision, gathering its own training data by repeatedly disassembling completed kits through trial and error, then time-reversing the disassembly sequences to get assembly trajectories. After training overnight for 12 hours, our robot learns effective pick and place policies for a variety of kits, achieving 94% assembly success rates with objects and kits in varying configurations, and over 86% assembly success rates when handling completely new objects and kits.

Data-Driven Shape Descriptors For Generalizable Assembly
The core component of Form2Fit is a two-stream matching network that learns to infer orientation-sensitive geometric pixel-wise descriptors for objects and their target placement locations from visual data. These descriptors can be understood as compressed 3D point representations that encode object geometry, textures, and contextual task-level knowledge. Form2Fit uses these descriptors to establish correspondences between objects and their target locations (i.e., where they should be placed). Since these descriptors are orientation-sensitive, they allow Form2Fit to infer how the picked object should be rotated before it is placed in its target location.

Form2Fit uses two additional networks to generate valid pick and place candidates. A suction network gets fed a 3D image of the objects and generates pixel-wise predictions of suction success. The suction probability map is visualized as a heatmap, where hotter pixels indicate better locations to grasp the object at the 3D location of the corresponding pixel. In parallel, a place network gets fed a 3D image of the target kit and outputs pixel-wise predictions of placement success. These, too, are visualized as a heatmap, where higher confidence values serve as better locations for the robot arm to approach from a top-down angle to place the object. Finally, the planner integrates the output of all three modules to produce the final pick location, place location and rotation angle.

Overview of Form2Fit. The suction and place networks infer candidate picking and placing locations in the scene respectively. The matching network generates pixel-wise orientation-sensitive descriptors to match picking locations to their corresponding placing locations. The planner then integrates it all to control the robot to execute the next best pick and place action.
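How a planner might combine the three outputs can be sketched as follows. This is an illustrative simplification, not the Form2Fit implementation: the function `plan_action`, the dictionary-based inputs, the `min_place_score` cutoff, and the rule of taking the highest-confidence pick and then the nearest descriptor across candidate rotations are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Descriptor = List[float]


def plan_action(
    suction_scores: Dict[Pixel, float],
    place_scores: Dict[Pixel, float],
    pick_descriptors: Dict[Pixel, Descriptor],
    place_descriptors: Dict[float, Dict[Pixel, Descriptor]],
    min_place_score: float = 0.5,
) -> Tuple[Pixel, Pixel, float]:
    """Return (pick_pixel, place_pixel, rotation_degrees).

    place_descriptors maps each candidate rotation (hypothetical encoding)
    to the per-pixel descriptors of the kit under that rotation.
    """
    # 1. Pick where the suction network is most confident.
    pick = max(suction_scores, key=suction_scores.get)
    target = pick_descriptors[pick]

    # 2. Among confident placements, find the closest descriptor over all rotations.
    best = None
    for angle, descriptor_map in place_descriptors.items():
        for pixel, descriptor in descriptor_map.items():
            if place_scores.get(pixel, 0.0) < min_place_score:
                continue
            distance = math.dist(target, descriptor)
            if best is None or distance < best[0]:
                best = (distance, pixel, angle)

    if best is None:
        raise ValueError("no placement candidate passed the confidence cutoff")
    return pick, best[1], best[2]


# Toy inputs with two pick pixels, two place pixels, and two candidate rotations.
suction = {(0, 0): 0.2, (3, 4): 0.9}
place = {(10, 10): 0.8, (12, 5): 0.3}
pick_desc = {(0, 0): [1.0, 0.0], (3, 4): [0.0, 1.0]}
place_desc = {
    0.0: {(10, 10): [1.0, 0.0], (12, 5): [0.0, 1.0]},
    90.0: {(10, 10): [0.1, 0.9], (12, 5): [0.0, 1.0]},
}
action = plan_action(suction, place, pick_desc, place_desc)
```

Here the low-confidence place pixel is filtered out before matching, so the descriptor comparison only decides between rotations of the remaining candidate.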

Learning Assembly from Disassembly
Neural networks require large amounts of training data, which can be difficult to collect for tasks like assembly. Precisely inserting objects into tight spaces with the correct orientation (e.g., in kits) is challenging to learn through trial and error, because the chances of success from random exploration can be slim. In contrast, disassembling completed units is often easier to learn through trial and error, since there are fewer incorrect ways to remove an object than there are to correctly insert it. We leveraged this difference in order to amass training data for Form2Fit.

An example of self-supervision through time-reversal: rewinding a disassembly sequence of a deodorant kit over time generates a valid assembly sequence.

Our key observation is that in many cases of kit assembly, a disassembly sequence – when reversed over time – becomes a valid assembly sequence. This concept, called time-reversed disassembly, enables Form2Fit to train entirely through self-supervision by randomly picking with trial and error to disassemble a fully-assembled kit, then reversing that disassembly sequence to learn how the kit should be put together.
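The time-reversal step itself is simple enough to show in a small sketch. The `Step` record and the pose tuples are hypothetical stand-ins for whatever a real system would log: each disassembly step records where an object was picked from and where it was dropped, and reversing the sequence while swapping pick and place yields assembly steps.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x, y, rotation in degrees); a hypothetical pose format for illustration.
Pose = Tuple[float, float, float]


@dataclass(frozen=True)
class Step:
    obj: str
    pick: Pose
    place: Pose


def time_reverse(disassembly: List[Step]) -> List[Step]:
    """Turn a disassembly sequence into an assembly sequence.

    The last object removed is the first one put back, and each step's
    pick and place poses trade roles.
    """
    return [Step(s.obj, pick=s.place, place=s.pick) for s in reversed(disassembly)]


# Two objects removed from a kit and dropped elsewhere on the table.
disassembly = [
    Step("floss", pick=(0.10, 0.20, 0.0), place=(0.50, 0.40, 90.0)),
    Step("tape", pick=(0.15, 0.25, 0.0), place=(0.60, 0.10, 45.0)),
]
assembly = time_reverse(disassembly)
```

Applying `time_reverse` twice returns the original sequence, which is a quick sanity check that no information is lost in the reversal.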

Generalization Results
The results of our experiments show great potential for learning generalizable policies for assembly. For instance, when a policy is trained to assemble a kit in only one specific position and orientation, it can still robustly assemble random rotations and translations of the kit 90% of the time.

Form2Fit policies are robust to a wide range of rotations and translations of the kits.

We also find that Form2Fit is capable of tackling novel configurations it has not been exposed to during training. For example, when training a policy on two single-object kits (floss and tape), we find that it can successfully assemble new combinations and mixtures of those kits, even though it has never seen such configurations before.

Form2Fit policies can generalize to novel kit configurations such as multiple versions of the same kit and mixtures of different kits.

Furthermore, when given completely novel kits on which it has not been trained, Form2Fit can generalize using its learned shape priors to assemble those kits with over 86% assembly accuracy.

Form2Fit policies can generalize to never-before-seen single and multi-object kits.

What Have the Descriptors Learned?
To explore what the descriptors of the matching network from Form2Fit have learned to encode, we visualize the pixel-wise descriptors of various objects in RGB colorspace through use of an embedding technique called t-SNE.

The t-SNE embedding of the learned object descriptors. Similarly oriented objects of the same category display identical colors (e.g. A, B or F, G) while different objects (e.g. C, H) and same objects but different orientation (e.g. A, C, D or H, F) exhibit different colors.

We observe that the descriptors have learned to encode (a) rotation — objects oriented differently have different descriptors (A, C, D, E) and (H, F); (b) spatial correspondence — same points on the same oriented objects share similar descriptors (A, B) and (F, G); and (c) object identity — zoo animals and fruits exhibit unique descriptors (columns 3 and 4).

Limitations & Future Work
While Form2Fit’s results are promising, its limitations suggest directions for future work. In our experiments, we assume a 2D planar workspace to constrain the kit assembly task so that it can be solved by sequencing top-down picking and placing actions. This may not work for all cases of assembly – for example, when a peg needs to be precisely inserted at a 45 degree angle. It would be interesting to expand Form2Fit to more complex action representations for 3D assembly.

You can learn more about this work and download the code from our GitHub repository.

Acknowledgments
This research was done by Kevin Zakka, Andy Zeng, Johnny Lee, and Shuran Song (faculty at Columbia University), with special thanks to Nick Hynes, Alex Nichol, and Ivan Krasin for fruitful technical discussions; Adrian Wong, Brandon Hurd, Julian Salazar, and Sean Snyder for hardware support; Ryan Hickman for valuable managerial support; and Chad Richards for helpful feedback on writing.

NoTraffic, No Problems: AI Startup Improves Intersections

Written on October 30, 2019. Posted in NVIDIA.

Israel-based technology company NoTraffic is using AI to transform intersections from danger zones to intelligent decision makers, cutting time delays and carbon dioxide emissions.

Next week, NoTraffic’s Yoav Valinsky, a computer vision researcher, will be going to GTC DC to discuss how the company is operating at the edge in a presentation called, “From Theory to Practice: Computer Vision on Edge Devices for Real-Time Optimization.”

Founded by Tal Kreisler, Or Sela, and Uriel Katz, and a member of NVIDIA’s Inception startup incubator, NoTraffic uses AI sensor units at intersections to analyze traffic and optimize traffic lights accordingly.

This proactive approach contrasts with today’s usual intersection technology, such as inductive-loop traffic detectors. Induction loops are installed underground, which makes them challenging to upgrade or replace, and work like metal detectors to sense cars.

Those traffic detectors are also constrained by the intersection’s fixed time plan. While they can minimize or maximize a light’s duration, the detectors can’t override a light — even if there are no cars coming in that direction.

Using AI at the edge, NoTraffic’s system reduces the cost of installation and maintenance, and gives intersections the ability to prepare for vehicles rather than just react to them.

Rewiring the Rules of the Road

NoTraffic starts by installing AI sensor units aimed in every direction of an intersection. The units use the NVIDIA Jetson platform and GPU-accelerated frameworks to fuse machine vision and radar, processing data roughly 15 times a second, according to Valinsky.

The sensor units also integrate connected vehicle capabilities based on dedicated short range communications (DSRC) and cellular vehicle-to-everything (CV2X). DSRC is a system of wireless communication channels between vehicles and infrastructure; and CV2X technology provides communication between vehicles, infrastructure, and any related entities.

NoTraffic’s units can then detect and classify all road users — including cars, buses, trucks, bicycles, and pedestrians — at the edge. The processed data is then streamed to the Optimization Engine, installed in the traffic signal control cabinets that are already present at most intersections.

There, the data is used to optimize and manage traffic lights, both at the individual intersection and across the city grid. By placing compute closer to the point of action, NoTraffic’s edge system saves bandwidth and lowers latency for faster calculations.

NoTraffic securely sends data from each intersection to the cloud for further processing and city-grid optimization, and presents the information in a dashboard designed for city engineers. They can use it for big data analytics, remote monitoring of intersections, and the implementation of new traffic policies.

NoTraffic, No Problems

Real-time analysis provides capabilities such as collision prediction. NoTraffic’s Director of Business Development Ilan Rozenberg explained that the sensor units calculate the speed, acceleration, and direction of vehicles, so they can infer when two cars can’t see each other and will probably collide.
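One common way to turn position and velocity estimates into a collision warning is a closest-point-of-approach check. The sketch below is an assumption about how such a check could look, not NoTraffic's method: it treats velocities as constant (acceleration is ignored for simplicity), and the safety radius and time horizon are hypothetical parameters.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]  # (x, y) in meters or meters per second


def closest_approach(p1: Vec, v1: Vec, p2: Vec, v2: Vec) -> Tuple[float, float]:
    """Return (time_s, distance_m) of closest approach, assuming constant velocity."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0:
        # Same velocity: the gap never changes.
        t = 0.0
    else:
        # Minimize |r + v t|; clamp to the future.
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    distance = math.hypot(rx + vx * t, ry + vy * t)
    return t, distance


def collision_likely(p1: Vec, v1: Vec, p2: Vec, v2: Vec,
                     radius_m: float = 2.0, horizon_s: float = 5.0) -> bool:
    """Flag a pair of road users that come within radius_m inside horizon_s."""
    t, distance = closest_approach(p1, v1, p2, v2)
    return t <= horizon_s and distance <= radius_m


# Car A heads east and car B heads north; both are 30 m from the crossing point.
t_hit, d_hit = closest_approach((-30.0, 0.0), (10.0, 0.0), (0.0, -30.0), (0.0, 10.0))
warn = collision_likely((-30.0, 0.0), (10.0, 0.0), (0.0, -30.0), (0.0, 10.0))
# Same geometry, but car B is slower and arrives after car A has passed.
safe = collision_likely((-30.0, 0.0), (10.0, 0.0), (0.0, -30.0), (0.0, 5.0))
```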

The sensors’ ability to classify vehicles also makes it possible to prioritize certain road users. If a city wanted to prioritize public transportation or pedestrians at the intersections surrounding schools on weekday mornings, city engineers would input that policy in their dashboards. NoTraffic’s system would make the changes autonomously.

The company is currently focused on the U.S., and is conducting pilot projects in several cities and counties across the country. Annually, NoTraffic is reducing delays by 2,700 hours and preventing 33 tons of emissions per intersection.

The company looks forward to making its technology even smarter: the longer it’s implemented, the better it’ll be able to predict vehicle and pedestrian patterns.

To learn more about the AI powering NoTraffic, come out to GTC DC, Nov. 4-6.

The post NoTraffic, No Problems: AI Startup Improves Intersections appeared first on The Official NVIDIA Blog.

Optimizing portfolio value with Amazon SageMaker automatic model tuning

Written on October 30, 2019. Posted in Amazon.

Financial institutions that extend credit face the dual tasks of evaluating the credit risk associated with each loan application and determining a threshold that defines the level of risk they are willing to take on. The evaluation of credit risk is a common application of machine learning (ML) classification models. The determination of a classification threshold, though, is often treated as a secondary concern and set in an ad hoc, unprincipled manner. As a result, institutions may be creating underperforming portfolios and leaving risk-adjusted return on the table.

In this blog post, we describe how to use Amazon SageMaker automatic model tuning to determine the classification threshold that maximizes the portfolio value of a lender choosing a subset of borrowers to lend to. More generally, we describe a method of choosing an optimal threshold, or set of thresholds, in a classification setting. The method we describe doesn’t rely on rules of thumb or generic metrics. It is a systematic and principled method that relies on a business success metric specific to the problem at hand. The method is based upon utility theory and the idea that a rational individual makes decisions so as to maximize her expected utility, or subjective value.

In this post, we assume that the lender is attempting to maximize the expected dollar value of her portfolio by choosing a classification threshold that divides loan applications into two groups: those she accepts and lends to, and those she rejects. In other words, the lender is searching over the space of potential threshold values to find the threshold that results in the highest value for the function that describes her portfolio value.

This post uses Amazon SageMaker automatic model tuning to find that optimal threshold. The accompanying Jupyter notebook demonstrates the code supporting this use case. This is a novel use of the automatic model tuning functionality, which is typically used to choose the hyperparameters that optimize model performance. This post uses it as a general tool to maximize a function over some specific parameter space.

This approach has several advantages over the typical threshold determination approach. Typically, a classification threshold is set (or allowed to default) to 0.5. This threshold doesn’t generate the maximum possible result in the majority of use cases. In contrast, the approach described here chooses a threshold that generates the maximum possible result for the specific business use case being addressed. In the use case in this post, choosing the optimal threshold in the way we describe increases portfolio value by 2.1%.

Also, this approach moves beyond using general rules of thumb and expert judgment in determining an optimal threshold. It lays out a structured framework that can be systematically applied to any classification problem. Additionally, this approach requires the business to explicitly state its cost matrix based on the specific actions to be taken on model predictions and their benefits and costs. This evaluation process moves well beyond simply assessing the classification results of the model. This approach can drive challenging discussions in the business, and force differing implicit decisions and valuations onto the table for open discussion and agreement. This drives the discussion from a simple “maximize this value”, to a more informative analysis that allows more complex economic trade-offs, which provides more value back to the business.

About this blog post
Time to read	20 minutes
Time to complete	1.5 hours
Cost to complete	~ $2
Learning level	Advanced (300)
AWS services	Amazon SageMaker

Background

Assume that a lender is attempting to construct a portfolio from a pool of potential loans. To tackle this use case, the lender must first assess the credit risk associated with each loan in the pool by calculating a probability of default for each loan; the higher the probability of default associated with a loan, the higher the credit risk associated with a loan. To calculate a loan’s probability of default, the lender uses an ML classification model, such as a logistic regression or random forest.

Given that the lender has estimated a default probability model, how does she choose the threshold, that is, the maximum default probability at which she is still willing to extend a loan? Users of classification models often set the value of a threshold to the conventional default value of 0.5. Even if they do attempt to set a use case-specific threshold, they do so based upon maximizing some threshold-based metric such as precision or recall. One issue with these metrics is that they ignore certain parts of the discrete outcomes described in the classification matrix. For example, precision overlooks true and false negative outcomes. Additionally, these metrics do not incorporate the dollar costs and benefits associated with each cell of the classification matrix. For example, in the case we examine in this post, the interest rate and loss given a default associated with each loan would be ignored in the calculation of typical threshold-based measures. This situation is less than ideal because, ultimately, what a business values is not the precision or recall of its model, but the dollar value of the incremental profit from using a specific model and threshold.

Therefore, instead of using a generic metric, it is likely more profitable and meaningful to the business to design a threshold-based metric that captures the cost and benefit structure of the specific business use case at hand. The lender we describe in this post is deciding whether or not to lend to a set of borrowers. Therefore, a metric that incorporates the expected interest earned and losses from each loan given a predicted probability of default is much more relevant to the business and its decision-making process than some generic metric such as precision or recall. Specifically, the portfolio value metric that we define classifies each loan into one of four buckets: True Positive (TP), False Negative (FN), True Negative (TN), and False Positive (FP); and then calculates the value of each bucket of loans using the following guidelines:

TP value = -Fixed_Cost

FN value = -Fixed_Cost - Loss_Given_Default * Outstanding_Principal_Balance

TN value = -Fixed_Cost + Interest_Rate * Outstanding_Principal_Balance

FP value = -Fixed_Cost

Fixed_Cost captures the costs associated with processing a loan, whether it is approved or not.

Outstanding_Principal_Balance is the principal remaining at the time of default or full repayment.

Interest_Rate is a borrower-specific rate that is set based upon the probability of default associated with a specific loan application plus the expected return desired by the lender.

Loss_Given_Default is the proportion of principal expected to be lost if a loan defaults.

To calculate the total value of a specific bucket of loans, the values of all loans in that bucket are summed. The total across all four buckets is the portfolio value that the lender is attempting to maximize by choosing a threshold.
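The four bucket values can be expressed as a small function. This is a sketch under stated assumptions, not the code from the accompanying notebook: the `Loan` record, its field names, the default `fixed_cost`, and the sample numbers are hypothetical. Default is treated as the positive class, so a loan is rejected when its predicted default probability is at or above the threshold.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Loan:
    p_default: float      # predicted probability of default
    defaulted: bool       # actual outcome
    balance: float        # Outstanding_Principal_Balance
    interest_rate: float  # Interest_Rate
    lgd: float            # Loss_Given_Default


def loan_value(loan: Loan, threshold: float, fixed_cost: float) -> float:
    """Value of one loan under the TP / FN / TN / FP guidelines."""
    predicted_default = loan.p_default >= threshold
    if predicted_default:
        # TP or FP: the application is rejected, only the processing cost is paid.
        return -fixed_cost
    if loan.defaulted:
        # FN: loan approved, borrower defaults.
        return -fixed_cost - loan.lgd * loan.balance
    # TN: loan approved, borrower repays.
    return -fixed_cost + loan.interest_rate * loan.balance


def portfolio_value(loans: Iterable[Loan], threshold: float, fixed_cost: float = 100.0) -> float:
    """Sum of loan values across all four buckets."""
    return sum(loan_value(loan, threshold, fixed_cost) for loan in loans)


# Made-up applications for illustration only.
loans = [
    Loan(p_default=0.05, defaulted=False, balance=10_000.0, interest_rate=0.08, lgd=0.3),
    Loan(p_default=0.30, defaulted=False, balance=20_000.0, interest_rate=0.12, lgd=0.3),
    Loan(p_default=0.40, defaulted=True, balance=15_000.0, interest_rate=0.15, lgd=0.3),
    Loan(p_default=0.90, defaulted=True, balance=5_000.0, interest_rate=0.20, lgd=0.3),
]
value_at_default_threshold = portfolio_value(loans, threshold=0.5)
value_at_lower_threshold = portfolio_value(loans, threshold=0.35)
```

On this toy data the conventional 0.5 threshold approves a loan that later defaults, so the lower threshold yields the higher portfolio value.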

Once the lender has clearly defined a quantitative measure of portfolio value, she must then choose the threshold that maximizes that measure. We use Amazon SageMaker automatic model tuning to find the optimal threshold. Amazon SageMaker automatic model tuning is a powerful tool for not only tuning the hyperparameters of an ML model, but also for maximizing an arbitrary function. In this case, we use automatic model tuning in two ways:

Finding the choice of a threshold that maximizes the lender’s portfolio value.
Mapping out the relationship between threshold and portfolio value more generally.

Understanding the relationship between the threshold choice and portfolio value allows us to more fully understand the economic trade-offs of increasing or decreasing the threshold. This is important as lenders frequently want to consider additional goals beyond simply maximizing the dollar value of their portfolio. Some lenders have idiosyncratic, secondary goals. For example, a lender may want to maximize her portfolio value while also emphasizing lending to a particular sector of the economy or certain subgroup of the overall population. Knowing how the portfolio’s value changes when the threshold moves allows the lender to set a reasonable threshold that addresses both her primary goal of portfolio maximization and her additional secondary goals.
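A plain grid search can illustrate both uses on a toy scale. To be clear, this swaps in an exhaustive sweep for Amazon SageMaker automatic model tuning, and the scores, outcomes, and dollar figures below are made up for illustration.

```python
from typing import List, Tuple

# (predicted default probability, actually defaulted,
#  interest earned if repaid, loss if defaulted); hypothetical sample data.
LoanRow = Tuple[float, bool, float, float]


def portfolio_value(loans: List[LoanRow], threshold: float, fixed_cost: float = 100.0) -> float:
    """Portfolio value when loans with p_default below the threshold are approved."""
    total = 0.0
    for p_default, defaulted, interest, loss in loans:
        total -= fixed_cost  # processing cost is paid for every application
        if p_default < threshold:  # approved
            total += -loss if defaulted else interest
    return total


def sweep(loans: List[LoanRow], steps: int = 20) -> List[Tuple[float, float]]:
    """Map threshold to portfolio value on an evenly spaced grid over [0, 1]."""
    return [(i / steps, portfolio_value(loans, i / steps)) for i in range(steps + 1)]


loans = [
    (0.05, False, 800.0, 3000.0),
    (0.30, False, 2400.0, 6000.0),
    (0.40, True, 2250.0, 4500.0),
    (0.90, True, 1000.0, 1500.0),
]
curve = sweep(loans)                                             # use 2: the whole relationship
best_threshold, best_value = max(curve, key=lambda tv: tv[1])    # use 1: the maximizer
value_at_half = portfolio_value(loans, 0.5)
```

The `curve` list is what a lender with secondary goals would inspect: it shows how much portfolio value is given up by moving the threshold away from `best_threshold`.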

We make several assumptions in this work. We assume that the lender has access to the capital necessary to extend all the loans associated with default probabilities below a chosen threshold. The problem is unconstrained in that sense. Additionally, we assume that if a loan is approved, the applicant accepts the terms of the loan no matter what interest rate the lender offers. Lastly, we assume that the lender is risk-neutral, that is, we assume that the lender’s utility function is the identity function. In other words, the utility that a lender gains from a certain portfolio value is equal to the portfolio value itself.

The Amazon SageMaker notebook containing the executable code is available on this GitHub repo. You need to run this notebook within an Amazon SageMaker notebook instance to use Amazon SageMaker automatic model tuning. To do this, download the Jupyter notebook associated with this post from the preceding GitHub link. Create an Amazon SageMaker notebook instance and upload the Jupyter notebook onto this notebook instance. Lastly, open the notebook and step through the code. For more information, see Create a Notebook Instance. This post provides an HTML version so that you can review the code without needing to execute it.

Solution overview

The next sections walk through the following steps:

Preparing a set of loan data for model training.
Training a random forest classifier using the Amazon SageMaker built-in Scikit-learn Estimator.
Analyzing the performance of the initial model.
Using automatic model tuning to find the threshold that gives the highest portfolio value.
Analyzing portfolio performance compared to the portfolio that uses the default threshold.
Incorporating additional business goals and analyzing their impact on the portfolio.

Loan data

The data consists of a set of US Small Business Administration (SBA)-guaranteed loans from 1987 to 2014. These are loans extended to US-based small businesses by private banks, though the US SBA guarantees a large percentage of the principal in the event of borrower default. On average, the SBA guarantees about 70% of the principal for each of the loans in this dataset. This sizable guarantee offsets much of the credit risk associated with each loan and encourages private banks to extend credit to small businesses to which they might not otherwise lend. For the data itself, and a more detailed description of the data, see the supplementary material of Li, Mickel, and Taylor. You should also read the license associated with the use of this research paper.

Our goal is to construct a model that predicts the probability that a specific loan will default, thus the target variable is MIS_Status. MIS_Status takes on two values: “P I F” if a loan has been paid in full, or “CHGOFF” if a loan has defaulted and the bank has taken the resulting loss.

The accompanying notebook shows that the target variable is imbalanced—about 18% of the observations have defaulted. Our approach in dealing with this imbalance is to estimate the model with the data as-is, and then set the decision threshold to optimize the economic value of our credit portfolio.

Training the model

Next we train a random forest classifier using the Amazon SageMaker built-in Scikit-learn estimator. We chose a random forest after comparing its performance to both that of a Logistic Regression and a Gradient Boosted Classifier. With the Amazon SageMaker built-in estimator, you can build and deploy custom Scikit-learn models without needing to create and manage a custom Docker container.

For more information, see Using Scikit-learn with the Amazon SageMaker Python SDK.

For the code detailing the training of the random forest, see the “Training the Model” section of the notebook associated with this post.

Analyzing model performance (part 1)

For comparison, we create a naive model that assigns all observations to the majority class, that is, it predicts that no loans will default. Does the random forest perform better than the naive model?

We have not yet determined the optimal threshold to classify the prediction of the random forest model into default or non-default classes. Therefore, the only performance metrics available to us to answer the above question are those based upon the predicted class probabilities output by our model. Metrics based on class predictions, for example, accuracy, precision, or recall, are dependent on our as-yet-undefined threshold. So to answer this question initially, we compare the log loss of the random forest and naive models. Log loss calculates how far predicted class probabilities are off from the true labels. Therefore, log loss is a metric that can be determined without reference to a threshold.

We will more thoroughly analyze model performance, using the more familiar threshold-based metrics, after we have calculated the optimal threshold.

Calculating log loss

Does the random forest perform better than the naive model? Remember that a smaller log loss indicates a smaller error and better performance. The following output from the model runs shows the results:

Naive Log Loss: 6.1230
Random Forest Log Loss: 0.2039

The answer is yes, the random forest improves on the log loss of the naive model by a significant amount. This implies that the random forest model assigned predicted class probabilities to each observation that are much closer to the truth than the naive model’s predictions.
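To make the comparison concrete, here is a small sketch of binary log loss in plain Python. The labels and scores are invented, and the clipping value eps=1e-15 is an assumption about how probabilities of exactly 0 or 1 are handled. Under that assumption, each missed default contributes about 34.5 to the sum, so a default rate near 18% yields a naive log loss of roughly 6, which is in line with the naive value reported above.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted P(default)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels: 18 defaults out of 100 loans.
y_true = [1] * 18 + [0] * 82

naive_probs = [0.0] * len(y_true)           # naive model: nobody defaults
informed_probs = [0.9] * 18 + [0.05] * 82   # made-up scores that separate the classes

naive_ll = log_loss(y_true, naive_probs)
informed_ll = log_loss(y_true, informed_probs)
print(f"naive: {naive_ll:.4f}, informed: {informed_ll:.4f}")
```

No threshold appears anywhere in the calculation, which is why log loss can be compared before the threshold is chosen.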

Plotting the model predictions

In each of the following plot sets, the top histogram plots the distribution of predicted scores for all actual negatives, that is, the predicted scores for borrowers that do not default. In essence, it represents the score distributions associated with specificity. The bottom histogram plots predicted scores for actual positives, that is, the predicted scores for borrowers that do default, thus representing the score distributions for sensitivity.

The correctly classified observations on each plot are colored blue, and the incorrectly classified observations are colored orange. We use the default threshold value of 0.5 to color these plots. This is the typical threshold used to classify the results of a classification model, chosen without attempting to maximize the user’s success—or value—metric.

The threshold choice does not affect the actual predicted scores, shape, or level of the plots, only the coloring. It does, however, affect metric results, including sensitivity, specificity, and most commonly used model performance metrics.

These two graphs show that while the scores for the true negatives are clustered close to 0, the scores for the false negatives are distributed relatively evenly from 0 to the current cutoff at 0.5. The dataset doesn’t include data items that would allow strong discrimination between true and false negatives.

This distribution may point to a significant amount of potential income being missed from this portfolio of approved and rejected loans. Using the default threshold score of 0.5 for approving a loan is not optimal for this dataset. Let’s explore how the portfolio value can be further increased by optimizing the threshold.

Calculating portfolio value based upon a 0.5 threshold

Lastly, we calculate the portfolio values for the naive model and the random forest model based upon a 0.5 threshold. These portfolio values act as reference points to determine if choosing an optimal threshold increases the value of the loan portfolio.

Note that any non-zero threshold results in the same portfolio value in the naive model because the probability of default for all loans is 0. The following output shows the calculated portfolio values:

Naive Portfolio Value (Threshold=0.5): $203,498,022
Random Forest Portfolio Value (Threshold=0.5): $823,674,285

Determining the optimal classification threshold with automatic model tuning

Could we do even better, by choosing a different threshold? And how do we go about finding the optimal threshold to balance the lender’s risk and reward?

In this section, the optimal threshold for classifying loans as default or non-default is determined with Amazon SageMaker automatic model tuning. The optimal threshold is the threshold that maximizes the user’s value metric. In this case, the metric that is being maximized is total portfolio value, as described previously.

To use Amazon SageMaker automatic model tuning to optimize the classification threshold, we construct a Docker container that takes the random forest model trained previously and the test set as input. Given a threshold, the container calculates the total value of the portfolio if the lender extended all loans classified as non-default, and the borrower accepted them. Amazon SageMaker automatic model tuning generates a range of thresholds between 0 and 1 and chooses the threshold that maximizes portfolio value. For the code detailing the automatic model tuning job, see the “Determining the Optimal Classification Threshold with Automatic Modeling Tuning” section of the notebook associated with this post.
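The calculation the container performs for a single threshold can be sketched as follows. This is not the notebook's code: the Loan fields, the payoff rule (interest earned if repaid, unguaranteed principal lost on default), and the sample loans are all simplifying assumptions made for illustration. A loan is treated as non-default, and therefore approved, when its predicted probability of default is below the threshold.

```python
from dataclasses import dataclass

@dataclass
class Loan:
    principal: float        # amount lent
    interest_income: float  # total interest earned if repaid (assumed known)
    guarantee: float        # fraction of principal the SBA covers on default
    p_default: float        # model's predicted probability of default
    defaulted: bool         # actual outcome in the test set

def loan_value(loan):
    """Realized value of one approved loan under the simplified payoff assumed here."""
    if loan.defaulted:
        return -(1.0 - loan.guarantee) * loan.principal
    return loan.interest_income

def portfolio_value(loans, threshold):
    """Sum the value of every loan classified as non-default (score below threshold)."""
    return sum(loan_value(loan) for loan in loans if loan.p_default < threshold)

# Hypothetical scored test set.
loans = [
    Loan(100_000, 8_000, 0.70, 0.05, False),
    Loan(250_000, 20_000, 0.70, 0.30, False),
    Loan(150_000, 12_000, 0.70, 0.45, True),
    Loan(200_000, 16_000, 0.70, 0.80, True),
]

for t in (0.0, 0.359, 0.5, 1.0):
    print(t, portfolio_value(loans, t))
```

In this toy data, raising the threshold from 0.359 to 0.5 admits a loan that later defaults, so the portfolio value falls.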

Running the automatic model tuning job

To use the Amazon SageMaker automatic model tuning feature, we first need to define the metric that we want Amazon SageMaker to optimize, the parameter space we want the tuning job to search over to find the optimal threshold, and any additional metrics we want calculated during the tuning job.

In the notebook associated with this blog post, we define the metrics we wish each job to return. As we’d like to explore the characteristics of the portfolio generated in some detail, we generate a list of metrics that describe the approved and rejected loans. These metrics are reported from each training job that runs via automatic model tuning. The additional metrics allow us to explore the characteristics of the maximized portfolio.

Of all the metrics we define, we need to specify which metric the automatic model tuning job should use to optimize the threshold. We do this by specifying the objective_metric_name in the following HyperparameterTuner object. In the same object, we specify the hyperparameter range to search over; in this case, we specify all continuous values between 0 and 1 to search over for the optimal threshold.

Lastly, we specify that we want Amazon SageMaker to run 200 individual training jobs. Each of these 200 training jobs uses a specific threshold value to calculate a different portfolio value. After Amazon SageMaker calculates the 200 portfolio values, each based upon a different threshold, it outputs the threshold that maximizes portfolio value.
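The shape of this search can be illustrated locally without any AWS resources. The sketch below is a stand-in, not the SageMaker API and not the search strategy SageMaker uses: it draws 200 thresholds uniformly from the range 0 to 1, evaluates a made-up objective for each, and keeps the best. The scored_loans values and the function names are hypothetical.

```python
import random

# Made-up (p_default, value_if_approved) pairs standing in for the scored test set.
scored_loans = [
    (0.02, 9_000), (0.10, 7_500), (0.20, 6_000), (0.33, 2_000),
    (0.40, -15_000), (0.55, -30_000), (0.70, -45_000), (0.90, -60_000),
]

def objective(threshold):
    """Portfolio value when every loan scoring below the threshold is approved."""
    return sum(value for p, value in scored_loans if p < threshold)

def random_search(n_jobs=200, seed=0):
    """Evaluate n_jobs thresholds drawn uniformly from [0, 1] and keep the best."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_jobs):
        t = rng.uniform(0.0, 1.0)
        trials.append((t, objective(t)))
    best_threshold, best_value = max(trials, key=lambda trial: trial[1])
    return best_threshold, best_value, trials

best_threshold, best_value, trials = random_search()
print(f"best threshold ~ {best_threshold:.3f}, portfolio value = {best_value:,}")
```

Each tuple in trials plays the role of one training job: a threshold in, an objective metric out.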

This job takes up to 1 hour to run.

Analyzing model performance (part 2)

In this section, we continue analyzing the performance of the naive and random forest models, but now that we have determined the optimal threshold, we are able to incorporate threshold-based metrics in the analysis.

Plotting the automatic model tuning job results

The flatness in the following scatter plots is due to the precision of predictions, which is a function of the number of trees in the random forest model. Because there are 100 trees in the random forest model, the precision of the predictions is two decimal places. This implies that all thresholds within one such interval, for example, greater than 0.32 and less than or equal to 0.33, give the same result.
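The plateau effect can be checked with a few lines of Python. This sketch assumes that each of the 100 trees casts a hard 0/1 vote, so every score is a multiple of 0.01, and that a loan is labeled as default when its score is at least the threshold. Scikit-learn averages per-tree probabilities, which coincide with hard votes only when the leaves are pure.

```python
N_TREES = 100

def classify(votes_for_default, threshold):
    """Label a loan as default when the share of 'default' votes reaches the threshold."""
    return votes_for_default / N_TREES >= threshold

# Every score a 100-tree forest can produce when each tree casts a hard vote.
all_votes = range(N_TREES + 1)

labels_a = [classify(v, 0.321) for v in all_votes]
labels_b = [classify(v, 0.330) for v in all_votes]
labels_c = [classify(v, 0.331) for v in all_votes]

print(labels_a == labels_b)  # both thresholds lie in the same 0.01-wide interval
print(labels_b == labels_c)  # the second threshold lies in the next interval
```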

Plotting prediction distributions given the optimal threshold

Now that we know the optimal threshold, we are able to plot the probability predictions of the random forest model and classify each as correct or incorrect. The top histogram plots the distribution of predicted scores for all actual negatives, that is, predicted scores for actual non-defaulters. The bottom histogram plots predicted scores for actual defaulters. The correctly classified observations on each plot are blue, and the incorrectly classified observations are orange.

The plot shows that the optimal threshold is below 0.5 and to the left of the bulk of the actual positives. The threshold seems to be at the point where the rate of change of true negatives as the threshold increases is slowing and the rate of change of false negatives is speeding up. The automatic model tuning job seems to have chosen a threshold that balances the two rates of change. To better understand the choice of optimal threshold, we would need to dig deeper into the portfolio value calculation and understand the costs and benefits associated with a change in threshold.

Determining maximum portfolio value

The following graphs plot the output of the automatic model tuning job. That is, they plot the portfolio value (on the y-axis) given a specific threshold (on the x-axis). Each point on a plot represents the outcome of a single training job from the overall automatic model tuning job. Recall that the goal is to find the classification threshold that optimizes the overall portfolio value. In each plot, the optimal threshold is the vertical, orange line.

The graph on the far left plots all 200 training job outcomes. The middle graph plots the top 100 training jobs as ranked by portfolio value, and the far-right graph plots the top 50 training jobs, also ranked by portfolio value.

Interestingly, the magnitude of the rate of change as we increase the threshold beyond its optimum value is generally much lower than the magnitude of the rate of change as we increase the threshold from 0 to its optimal value. This asymmetry is due to the SBA guarantee. The guarantee limits the downside risk that the lender takes on as she loosens her lending standards. If the SBA guarantee were not in place, we would expect the right side of this graph to decrease much more steeply.

Looking at the right two graphs, we zoom in on the peak of the curve and see that it is more symmetric around the optimal threshold. Additionally, the curve is not strictly decreasing after the optimal threshold; at times, the curve increases briefly. The following output shows the portfolio values for each model:

Naive Portfolio Value (Threshold=0.5):  $203,498,022
Random Forest Portfolio Value (Threshold=0.5): $823,674,285
Random Forest Portfolio Value (Optimal Threshold=0.359): $841,421,888

The top portfolio value returned from the random forest model with an optimized threshold is higher than both that generated by the naive model and by the random forest model with a 0.5 threshold. Adjusting the threshold increases portfolio value by $17.7M, or 2.1%, a substantial increase in potential return.

Interestingly, the optimal threshold is less than 0.5, so the lender can increase the overall value of her portfolio by decreasing the credit risk of the loans in the portfolio (by decreasing the threshold). If the lender had used a 0.5 threshold (the typical default value), she would likely have created a portfolio with more credit risk and lower portfolio value. If the SBA guarantee were not in place for these loans, the portfolio value at a threshold of 0.5 would likely have been much lower.

Analyzing the return associated with maximum portfolio value

This section shifts from focusing on the dollar return of the portfolio to the percentage return. The following set of graphs is similar to the previous set except that the graphs plot the net return on the portfolio associated with each of the 200 training jobs in the automatic model tuning run. The orange, vertical line is again the optimal threshold—optimal in the sense of maximizing portfolio value, not portfolio return—and the x-axis is the threshold. The y-axis is the portfolio return.

From left to right, these graphs plot all 200 training job outcomes, the top 100 outcomes (based upon portfolio values), and the top 50 outcomes (based upon portfolio values). These return curves are much flatter than the portfolio value curves in the previous set of graphs. This is because the lender actively sets interest rates on each of the loans she extends so that the return on the overall portfolio is expected to be about 5%. Additionally, note that the optimal threshold does not mark the peak in portfolio return. This is because, when maximizing portfolio value, it doesn’t matter whether adding more loans increases the percentage return on the portfolio, only that adding more loans adds to the dollar return on the portfolio. We can add lower percentage return loans to the portfolio and still add positive value in dollar terms, and that is what we are attempting to maximize.
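A short worked example shows how dollar value and percentage return can move in opposite directions. All figures are hypothetical: an existing portfolio earning 5.4% on $1,000,000 of principal, plus one marginal loan that earns only 2% on $100,000.

```python
def portfolio_return(net_income, principal):
    """Net dollar return divided by total principal lent."""
    return net_income / principal

# Hypothetical figures: existing portfolio plus one marginal, lower-return loan.
base_income, base_principal = 54_000, 1_000_000
extra_income, extra_principal = 2_000, 100_000

value_before = base_income
value_after = base_income + extra_income
return_before = portfolio_return(base_income, base_principal)
return_after = portfolio_return(value_after, base_principal + extra_principal)

print(f"value:  {value_before:,} -> {value_after:,}")
print(f"return: {return_before:.3%} -> {return_after:.3%}")
```

The marginal loan raises the dollar value of the portfolio while diluting its percentage return, so a value-maximizing lender still accepts it.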

The following output shows the results of calculating the return:

Naive Model Portfolio Return (Threshold=0.5): 0.012
Random Forest Portfolio Return (Threshold=0.5): 0.051
Random Forest Portfolio Return (Optimal Threshold=0.359): 0.054

Likewise, the portfolio return from the random forest model with an optimized threshold is much higher than that generated by the naive model, though the returns from the two random forest models are similar. This is because in both of those models, the lender can set borrower-specific interest rates to compensate for borrower-specific levels of credit risk. If the threshold increases and higher risk loans enter the portfolio, the lender can set higher interest rates on those loans and on average keep her return the same.

Adjusting the optimal threshold based upon additional business considerations

Now we investigate how to determine if we should make marginal adjustments to the optimal threshold. Why would we want to adjust the optimal threshold calculated previously? There may be certain idiosyncratic goals that a lender wants to achieve that a generic portfolio value calculation doesn’t capture. For example, a lender may want to maximize her portfolio value while also emphasizing lending to a certain sector of the economy or subgroup of the overall population. Adding this additional constraint to the portfolio value calculation itself may be difficult, if not impossible. Tackling these problems in two steps (finding the generic optimum, and then adjusting that optimum based upon idiosyncratic preferences) is likely a much easier and more intuitive calculation.

As an example, say that the lender would like to extend more credit to the Construction sector of the economy. She wishes to determine if she should increase the optimal threshold to achieve this goal. Essentially she needs to determine the price she is willing to pay to include one more Construction sector loan in the portfolio, and the effect on portfolio value of including that loan. If the price is greater than the cost, then she should increase the threshold.

More specifically, to answer the question of whether the lender should increase the threshold by 0.01 (the smallest increment possible in our case), she needs to do the following:

1. Determine the price P that she is willing to pay for each additional Construction loan.
2. Calculate the decrease in portfolio value resulting from increasing the threshold by 0.01.
3. Calculate the number of Construction sector loans added to the portfolio when the threshold increases.
4. Calculate the average cost of each additional Construction loan by dividing the change in portfolio value by the number of Construction loans added. This is the mean cost C of each additional Construction loan in dollar terms.
5. Compare the price P that the lender is willing to pay for each additional Construction loan to the cost C that she must actually pay for each additional Construction loan.
- If the willingness-to-pay price is greater than or equal to the cost (P >= -C), increase the threshold by 0.01.
- Otherwise, keep the threshold as-is.
6. Continue to iterate on steps 2 to 5, until it is no longer advantageous to increase the threshold.
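The procedure above can be sketched as a loop. This is an illustrative sketch, not the notebook's implementation: adjust_threshold, the lookup tables, and their values are hypothetical, and the two callables stand in for re-scoring the test set at each candidate threshold.

```python
def adjust_threshold(start, price, portfolio_value, construction_count, step=0.01):
    """Raise the threshold step by step while each added Construction loan costs at most `price`."""
    threshold = start
    while round(threshold + step, 6) <= 1.0:
        candidate = round(threshold + step, 6)
        delta_value = portfolio_value(candidate) - portfolio_value(threshold)   # step 2
        added = construction_count(candidate) - construction_count(threshold)  # step 3
        if added <= 0:
            break                    # no extra Construction loans to gain
        cost = delta_value / added   # step 4: negative when portfolio value falls
        if price < -cost:            # step 5: too expensive, keep the threshold
            break
        threshold = candidate
    return threshold

# Hypothetical lookups keyed by threshold in hundredths.
VALUE = {36: 841_400_000, 37: 839_800_000, 38: 837_000_000}
CONSTRUCTION = {36: 1_000, 37: 1_026, 38: 1_050}

def toy_value(t):
    return VALUE[round(t * 100)]

def toy_count(t):
    return CONSTRUCTION[round(t * 100)]

new_threshold = adjust_threshold(0.36, price=75_000,
                                 portfolio_value=toy_value,
                                 construction_count=toy_count)
print(new_threshold)
```

With these made-up numbers, the first 0.01 step costs about $61,500 per added loan and is accepted, while the second costs about $116,700 per added loan and is rejected.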

For the code detailing the following calculations, see the notebook associated with this post.

Step 1: Determining the lender’s willingness-to-pay

The lender must first determine the amount of portfolio value she is willing to forfeit for each additional Construction sector loan. Assume the lender’s willingness-to-pay P in this example is $75,000.

P = 75000

Step 2: Determining the decrease in portfolio value

The lender must calculate the portfolio value at the optimal threshold and the next highest threshold value, and then calculate the difference to determine how much the portfolio value decreases as she increases the threshold by the minimum unit. This calculates as follows:

Decrease in Portfolio Value: -$1,640,192

Step 3: Determining the increase in number of construction loans

Next, calculate the number of Construction sector loans that are added to the portfolio when the threshold increases by 0.01. The result is as follows:

Increase in Number of Construction Loans: 26

Step 4: Determining the cost of each construction loan

The cost is calculated by dividing the change in portfolio value by the number of Construction loans added, C = -$1,640,192 / 26, which gives the following:

Cost of each Additional Construction Loan: -$63,084

Step 5: Comparing the cost to willingness-to-pay

If the price P is greater than or equal to -C (the cost C is negative, so its sign is flipped), move the threshold. In this example, the lender should move the threshold and make those 26 additional loans, because the cost of $63,084 per loan is less than her willingness-to-pay of $75,000.
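Steps 1 to 5 for this single increment reduce to a few lines of arithmetic. The sketch below uses only the figures reported in this section; the variable names are illustrative.

```python
P = 75_000                     # step 1: willingness-to-pay per Construction loan
delta_value = -1_640_192       # step 2: change in portfolio value for the 0.01 increase
added_loans = 26               # step 3: additional Construction loans approved

C = delta_value / added_loans  # step 4: mean cost per added loan (negative)
move_threshold = P >= -C       # step 5: compare price to cost

print(f"C = {C:,.0f}, move threshold: {move_threshold}")
```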

The lender would not stop with this one step. She would continue to ask if she should increase the threshold by another 0.01 and iterate through the previous steps until she reaches a point at which she chooses not to increase the threshold.

We assume that the lender always has access to the required capital if her willingness-to-pay is greater than the cost of an additional Construction sector loan. If desired, we can include a capital budget W for the lender as well. This change would modify the final step so that the lender checks both if P >= -C and if there is a sufficient amount of capital remaining in W to cover the sum of the principal of the additional loans.

Other model metrics

How do the naive, random forest with 0.5 threshold, and random forest with optimal threshold models compare according to the more traditional performance metrics, such as accuracy, precision, and recall?

The following table reports the accuracy, precision, and recall for all three models:

	Accuracy	Precision_0	Precision_1	Recall_0	Recall_1
Naive Model	0.822721	0.822721	NaN	1.000000	0.000000
Random Forest Model (0.5 Threshold)	0.935302	0.944246	0.883336	0.979177	0.731683
Random Forest Model (Optimal Threshold)	0.934975	0.960350	0.817026	0.960626	0.815937

According to this table, which model is the best? That question can’t be truly answered unless we know the benefits and costs to the lender associated with each cell of the confusion matrix, that is, the benefits associated with the true positives and true negatives and the costs associated with the false positives and false negatives.

It’s clear from the preceding table that both random forest models strictly dominate the naive model (assuming that the cost of a false positive isn’t significantly larger than the cost of a false negative). Additionally, there isn’t a clear-cut winner between the two random forest models. The answer depends upon the relative costs of misclassification to the lender. We know from the business context of the problem described in the introduction that there is a significantly higher cost associated with a false negative than with a false positive. Given that information, it is more valuable for the lender to minimize false negatives, and as such, Recall_1 and Precision_0 are the most salient metrics.
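For reference, the columns in the table can be derived from the four confusion-matrix cells as sketched below, with class 1 meaning default. The counts are hypothetical (a 1,000-loan test set with 177 defaults), and the helper names are illustrative. The sketch also shows why Precision_1 is NaN for the naive model: it never predicts a default, so the denominator is zero.

```python
import math

def safe_div(num, den):
    """Return num/den, or NaN when the denominator is zero (no such predictions)."""
    return num / den if den else math.nan

def class_metrics(tp, fp, tn, fn):
    """Per-class precision and recall, with class 1 = default and class 0 = paid in full."""
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision_0": safe_div(tn, tn + fn),
        "precision_1": safe_div(tp, tp + fp),
        "recall_0": safe_div(tn, tn + fp),
        "recall_1": safe_div(tp, tp + fn),
    }

# Naive model on a hypothetical test set: every loan is predicted to be repaid.
naive = class_metrics(tp=0, fp=0, tn=823, fn=177)
print(naive)
```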

This discussion illustrates the fact that determining the so-called best model requires knowledge of the business use case that this ML model addresses, and the benefits and costs associated with each potential classification outcome; only then can we determine the metric that best captures what success means to the business. Additionally, precision and recall only include information about two of the four cells of the confusion matrix, but the lender cares about the net benefits associated with all four cells. Using these typical metrics ignores half of the outcomes that the lender cares about and also ignores the specific costs and benefits associated with all outcomes. Because of this, these metrics are lacking, and one should calculate a single problem-specific metric that incorporates the specific costs and benefits associated with all cells of the confusion matrix to determine the optimal threshold. In this post, this metric is portfolio value.
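A single metric that uses all four cells can be as simple as a payoff-weighted sum. The sketch below is a generic illustration rather than the portfolio value calculation used in this post; the payoffs and the two confusion matrices are invented.

```python
def net_benefit(tp, fp, tn, fn, payoff):
    """Total value of a set of decisions given a per-cell payoff (benefit > 0, cost < 0)."""
    return (tp * payoff["tp"] + fp * payoff["fp"]
            + tn * payoff["tn"] + fn * payoff["fn"])

# Hypothetical average payoffs per loan, in dollars. Positive class = default.
payoff = {
    "tn": 10_000,   # approved and repaid: interest earned
    "fn": -45_000,  # approved but defaulted: unguaranteed principal lost
    "fp": 0,        # rejected but would have repaid: modeled as no cash impact
    "tp": 0,        # rejected and would have defaulted: loss avoided
}

# Two made-up confusion matrices for the same 1,000 loans.
model_a = dict(tp=130, fp=17, tn=806, fn=47)
model_b = dict(tp=145, fp=32, tn=791, fn=32)

value_a = net_benefit(payoff=payoff, **model_a)
value_b = net_benefit(payoff=payoff, **model_b)
print(f"A: {value_a:,}  B: {value_b:,}")
```

Here model B approves fewer good loans than model A but wins on net benefit because it avoids more of the expensive false negatives.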

This optimization approach can be used more generally to test whether a threshold is optimal for the problem and data at hand.

Cleaning Up

If you created a new Amazon SageMaker notebook instance to run the code, remember to stop or delete it to minimize costs.

Conclusion

This post showed how to find the optimal threshold in a binary classification problem. Specifically, we describe how to use Amazon SageMaker automatic model tuning to determine the classification threshold that maximizes the portfolio value of a lender when choosing which subset of borrowers to extend credit to. More generally, the method of choosing an optimal threshold we describe can be applied to situations in which you need to choose multiple thresholds. The main modification needed is to incorporate multiple thresholds into the problem-specific, threshold-based metric. After doing that, you could use Amazon SageMaker automatic model tuning to find a vector of thresholds, as opposed to a single threshold, that maximizes your metric.

The threshold determination approach we describe has several substantial advantages. First, it makes the logic and rationale used in determining a threshold explicit. Second, it requires the business to clearly state its cost matrix, based on the specific actions to take on the model predictions and their associated benefits and costs. Making the logic and cost structure explicit can drive challenging discussions in the business, and force differing implicit decisions and valuations onto the table for open discussion and agreement. In addition, though explainable ML is beyond the scope of this post, the explicit statement of the logic and cost structure of threshold determination encouraged by our approach fits well with the goals of that line of research.

Lastly, this approach can also potentially be used to address the issue of imbalanced data. The issue with imbalanced data is often not that one target class has a much larger representation in the data than another target class; it’s that the misclassification costs (that is, the cost of a false positive versus a false negative) are dramatically different from one another. Instead of using sampling to balance the training data, you can clearly define the misclassification costs in the problem-specific metric, and use that metric to find an optimal threshold. This approach shifts the issue from a technical one (a trick that modifies the distribution of the data) to a business one of clearly specifying the cost structure of a problem. That may address the true issue of imbalanced data more directly, which is the issue of imbalanced misclassification costs.

For any of your business use cases that require setting a classification threshold, consider using Amazon SageMaker automatic model tuning and the method this post describes. To get started, open the Amazon SageMaker console and the code from the GitHub repo that generated results in this post. If you have thoughts on business use cases that you could apply this method to, or any questions, please leave them in the comments. For more information on training models that have asymmetric classification costs, see Training models with unequal economic error costs using Amazon SageMaker.

Sources and references:

Friedman, Milton, and L. J. Savage. “The Utility Analysis of Choices Involving Risk.” Journal of Political Economy 56, no. 4 (1948): 279–304.

Data sourced from: Li, Min, Amy Mickel, and Stanley Taylor. “‘Should This Loan Be Approved or Denied?’: A Large Dataset with Class Assignment Guidelines.” Journal of Statistics Education 26, no. 1 (January 2, 2018): 55–66. https://doi.org/10.1080/10691898.2018.1434342.

Metz, Charles E. “Basic Principles of ROC Analysis.” Seminars in Nuclear Medicine 8, no. 4 (October 1978): 283–98. https://doi.org/10.1016/S0001-2998(78)80014-2

Wu, Yirong, Craig K. Abbey, Xianqiao Chen, Jie Liu, David C. Page, Oguzhan Alagoz, Peggy Peissig, Adedayo A. Onitilo, and Elizabeth S. Burnside. “Developing a Utility Decision Framework to Evaluate Predictive Models in Breast Cancer Risk Estimation.” Journal of Medical Imaging 2, no. 4 (October 2015). https://doi.org/10.1117/1.JMI.2.4.041005.

Zadrozny, Bianca, and Charles Elkan. “Learning and Making Decisions When Costs and Probabilities Are Both Unknown,” 204–13. ACM Press, 2001. https://doi.org/10.1145/502512.502540.

Veronika Megler and Scott Gregoire. “Training models with unequal economic error costs using Amazon SageMaker,” AWS Machine Learning Blog, 18 Sept 2018.

About the Authors

Scott Gregoire is a Data Scientist with AWS Professional Services. He holds a PhD in Economics from the University of Texas at Austin and has advised clients in sectors ranging from international finance to retail. Currently, he is working with customers to develop innovative machine learning solutions on AWS.

Veronika Megler, PhD, is a senior consultant for AWS Professional Services. She enjoys adapting innovative big data, AI and ML technologies to help customers solve new problems, and to solve old problems more efficiently and effectively.

Picture-Perfect Product Help: AI Startup Brings Computer Vision to Customer Service

Written on October 29, 2019. Posted in NVIDIA.

When your appliances break, the last thing you want to do is spend an hour on the phone trying to reach a customer service representative.

Using computer vision, Drishyam.AI is eliminating service lines to help consumers more quickly.

Satish Mandalika, the CEO and founder of the deep learning-based image recognition platform, spoke with AI Podcast host Noah Kravitz about the company.

“Customer support is ripe for disruption,” Mandalika said. Drishyam.AI is changing the game by giving customers an app that they use to take a picture of the product they need help with at any time of day or night, rather than calling a help line.

Using computer vision, Drishyam.AI analyzes the issue and communicates directly with manufacturers, rather than going through retail outlets. This is more efficient because a product’s lifetime warranty is usually held by the company that made it, rather than by the stores selling it, such as Home Depot and Lowe’s.

Since its founding two years ago, Drishyam.AI has pursued relationships only with manufacturers, but Mandalika said that could change as the company collects more and more data. “We build that intelligence across product lines in a domain, and then we can turn around and help the consumer directly,” Mandalika said.

Drishyam.AI, a member of NVIDIA’s Inception startup incubator, is running pilot projects with two large faucet manufacturing companies, which will soon be converted into paying clients.

The home improvement domain is Drishyam.AI’s beachhead, given the large number of products in that field that have lifetime warranties and require customer support. However, the company is expanding into a variety of fields.

Mandalika’s vision for Drishyam.AI is that eventually, “You should be able to get support for any product that you need by just pointing your mobile device at it. And platforms like ours will then help you identify the products, troubleshoot, and even order parts and all that.”

To find out more about Drishyam.AI, visit their website or their Twitter account.

How to Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Your favorite not listed here? Email us at aipodcast [at] nvidia [dot] com or fill out this short listener survey.

The post Picture-Perfect Product Help: AI Startup Brings Computer Vision to Customer Service appeared first on The Official NVIDIA Blog.

En garde! Wearable IoT and AI keep fencers on point

Written on October 29, 2019. Posted in Microsoft.

The post En garde! Wearable IoT and AI keep fencers on point appeared first on The AI Blog.

Connecting the Dots: Domino Data Lab Drops Into Data Science Wave

Written on October 28, 2019. Posted in NVIDIA.

Event opportunity: Join Josh Poduska, Domino Data Lab’s chief data scientist, who will be presenting at GTC DC on Tuesday, Nov. 5.

As Wall Street was morphing into a game of quants, Nick Elprin, Christopher Yang and Matthew Granade saw something big shaping up on the horizon: a data science wave swelling across industries.

So, the three left Bridgewater Associates, the world’s largest hedge fund, and shortly thereafter started Domino Data Lab, an open source data science platform now making a splash with AI developers worldwide.

“My co-founders and I built a lot of the internal platforms and technology that those quants used at Bridgewater to do their quantitative research — what the rest of the world now calls data science,” said Elprin, the company’s CEO.

The San Francisco company, a member of the NVIDIA Inception program that helps startups scale, in August landed on Inc. magazine’s annual list of the fastest-growing private companies.

Bridgewater to Domino

After leaving Bridgewater in 2013, the three found that what companies lacked most was an industrialized platform for data science teams, so they started Domino Data Lab to fill the void.

“The experience and perspective at Bridgewater let us see the white space in the market to see what technology and products could do,” said Elprin.

Domino’s software platform automates infrastructure for data scientists, enabling users to accelerate research, deploy models and track projects.

Under Domino’s Hood

A data science supercharger, Domino’s customizable environment provides users with data science tools to speed workflows.

Its Domino Analytics Distribution offers a scientific computing stack for programming in Python, R, Julia and other popular languages. Domino offers access to commonly used interactive tools and notebooks, including Jupyter, RStudio, Zeppelin and Beaker.

Domino also provides deep learning packages and GPU drivers, including access to frameworks such as TensorFlow, Theano and Keras. The platform enables access to any NVIDIA GPUs in the cloud.

“Working with NVIDIA has helped Domino build products that allow our mutual customers to automate deployment of workloads to GPUs,” Elprin said. “NVIDIA Inception has also helped us grow our Fortune 500 customer base through podcasts and conference talks.”

Customer Domino Effect

Companies are lining up. Red Hat, Dell, Bayer, AllState, Gap and Bristol-Myers Squibb are all using Domino to accelerate their data science workflows.

“Our investment in Domino has really paid off — probably a return around 10x in terms of efficiency of our data science community,” said Heidi Lanford, vice president of enterprise data and analytics at Red Hat, in a video.

Image credit: Photo by Shalom Jacobovitz, licensed under Creative Commons.

The post Connecting the Dots: Domino Data Lab Drops Into Data Science Wave appeared first on The Official NVIDIA Blog.

On-Device Captioning with Live Caption

Written on October 28, 2019. Posted in Google.

Posted by Michelle Tadmor-Ramanovich and Nadav Bar, Senior Software Engineers, Google Research, Tel-Aviv

Captions for audio content are essential for the deaf and hard of hearing, but they benefit everyone. Watching video without audio is common — whether on the train, in meetings, in bed or when the kids are asleep — and studies have shown that subtitles can increase the duration of time that users spend watching a video by almost 40%. Yet caption support is fragmented across apps and even within them, resulting in a significant amount of audio content that remains inaccessible, including live blogs, podcasts, personal videos, audio messages, social media and others.

Recently we introduced Live Caption, a new Android feature that automatically captions media playing on your phone. The captioning happens in real time, completely on-device, without using network resources, thus preserving privacy and lowering latency. The feature is currently available on Pixel 4 and Pixel 4 XL, will roll out to Pixel 3 models later this year, and will be more widely available on other Android devices soon.

When media is playing, Live Caption can be launched with a single tap from the volume control to display a caption box on the screen.

Building Live Caption for Accuracy and Efficiency
Live Caption works through a combination of three on-device deep learning models: a recurrent neural network (RNN) sequence transduction model for speech recognition (RNN-T), a text-based recurrent neural network model for unspoken punctuation, and a convolutional neural network (CNN) model for sound events classification. Live Caption integrates the signal from the three models to create a single caption track, where sound event tags, like [APPLAUSE] and [MUSIC], appear without interrupting the flow of speech recognition results. Punctuation symbols are predicted while text is updated in parallel.

Incoming sound is processed through a Sound Recognition and ASR feedback loop. The produced text or sound label is formatted and added to the caption.

For sound recognition, we leverage previous work that was done for sound events detection, using a model that was built on top of the AudioSet dataset. The Sound Recognition model is used not only to generate popular sound effect labels but also to detect speech periods. The full automatic speech recognition (ASR) RNN-T engine runs only during speech periods in order to minimize memory and battery usage. For example, when music is detected and speech is not present in the audio stream, the [MUSIC] label will appear on screen, and the ASR model will be unloaded. The ASR model is only loaded back into memory when speech is present in the audio stream again.
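The gating described above can be illustrated with a minimal Python sketch. It is not the Live Caption implementation: `ToyAsrModel`, `CaptionPipeline` and the string labels are hypothetical stand-ins, and the sketch only assumes that a sound classifier emits one label per audio frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ToyAsrModel:
    """Hypothetical stand-in for a speech recognizer."""

    def transcribe(self, frame: str) -> str:
        return frame.upper()  # placeholder for real decoding


@dataclass
class CaptionPipeline:
    asr: Optional[ToyAsrModel] = None  # None means the ASR model is unloaded
    load_count: int = 0                # number of times ASR was (re)loaded
    captions: List[str] = field(default_factory=list)

    def process(self, label: str, frame: str) -> None:
        """`label` is the assumed sound-classifier output for this frame."""
        if label == "speech":
            if self.asr is None:       # load lazily when speech resumes
                self.asr = ToyAsrModel()
                self.load_count += 1
            self.captions.append(self.asr.transcribe(frame))
        else:
            self.asr = None            # drop the recognizer outside speech
            tag = f"[{label.upper()}]"
            if not self.captions or self.captions[-1] != tag:
                self.captions.append(tag)  # do not repeat the same tag


def run_demo() -> CaptionPipeline:
    pipeline = CaptionPipeline()
    stream = [("speech", "hello"), ("music", ""),
              ("music", ""), ("speech", "welcome back")]
    for label, frame in stream:
        pipeline.process(label, frame)
    return pipeline
```

In this toy run the recognizer is loaded twice, once for each speech period, and the music segment contributes a single [MUSIC] tag to the caption track.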

In order for Live Caption to be most useful, it should be able to run continuously for long periods of time. To do this, Live Caption’s ASR model is optimized for edge devices using several techniques, such as neural connection pruning, which reduced power consumption to 50% of that of the full-sized speech model. Yet while the model is significantly more energy efficient, it still performs well for a variety of use cases, including captioning videos, recognizing short queries and narrowband telephony speech, while also being robust to background noise.
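The post does not say which pruning method was used. As a rough illustration of the general idea, the sketch below shows magnitude-based pruning, one common approach, on a flat list of weights. The function name and the `sparsity` parameter are assumptions made for this example.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the floor(len * sparsity) weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0  # a removed connection contributes nothing
    return pruned


pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01], sparsity=0.5)
```

Zeroed connections can be skipped at inference time, which is where savings in compute and power would come from. How sparsity translates into energy use depends on the runtime and hardware.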

The text-based punctuation model was optimized for running continuously on-device using a smaller architecture than the cloud equivalent, and then quantized and serialized using the TensorFlow Lite runtime. As the caption is formed, speech recognition results are rapidly updated a few times per second. In order to save on computational resources and provide a smooth user experience, the punctuation prediction is performed on the tail of the text from the most recently recognized sentence, and if the next updated ASR results do not change that text, the previously punctuated results are retained and reused.
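The reuse of punctuated results can be sketched as a small cache keyed on the tail text. This is an assumption-laden illustration, not the production logic: `toy_punctuate` is a placeholder for the punctuation model, and `TailPunctuator` is a hypothetical name.

```python
from typing import Callable, List, Optional


def toy_punctuate(sentence: str) -> str:
    """Placeholder for the punctuation model: capitalize and add a period."""
    return sentence[:1].upper() + sentence[1:] + "."


class TailPunctuator:
    """Runs the model only when the tail of the transcript has changed."""

    def __init__(self, punctuate: Callable[[str], str]) -> None:
        self._punctuate = punctuate
        self._last_tail: Optional[str] = None
        self._last_result = ""
        self.model_calls = 0  # counts actual model invocations

    def update(self, finished: List[str], tail: str) -> str:
        """`finished` holds earlier punctuated sentences; `tail` is raw text."""
        if tail != self._last_tail:
            self._last_result = self._punctuate(tail)
            self._last_tail = tail
            self.model_calls += 1
        return " ".join(finished + [self._last_result])


punctuator = TailPunctuator(toy_punctuate)
outputs = [punctuator.update([], t)
           for t in ["how are", "how are you", "how are you"]]
```

Here the third update leaves the tail unchanged, so the previously punctuated text is returned without another model call.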

Looking forward
Live Caption is now available in English on Pixel 4 and will soon be available on Pixel 3 and other Android devices. We look forward to bringing this feature to more users by expanding its support to other languages and by further improving the formatting in order to improve the perceived accuracy and coherency of the captions, particularly for multi-speaker content.

Acknowledgements
The core team includes Robert Berry, Anthony Tripaldi, Danielle Cohen, Anna Belozovsky, Yoni Tsafir, Elliott Burford, Justin Lee, Kelsie Van Deman, Nicole Bleuel, Brian Kemler, and Benny Schlesinger. We would like to thank the Google Speech team, especially Qiao Liang, Arun Narayanan, and Rohit Prabhavalkar for their insightful work on the ASR model as well as Chung-Cheng Chiu from Google Brain Team; Dan Ellis and Justin Paul for their help with integrating the Sound Recognition model; Tal Remez for his help in developing the punctuation model; Kevin Rocard and Eric Laurent for their help with the Android audio capture API; and Eugenio Marchiori, Shivanker Goel, Ye Wen, Jay Yoo, Asela Gunawardana, and Tom Hume for their help with the Android infrastructure work.

DC Startup Casts an AI Net to Stop Phishing and Malware

Written on October 28, 2019. Posted in NVIDIA.

When the price went way up on a key service a small Washington, D.C., firm was using to protect its customers’ internet connectivity, the company balked.

After not finding a suitable alternative, the company decided to build its own. The result was a whole new business, called DNSFilter, which is casting a wide net around the market to combat phishing and malware.

Its innovation: It ditched the crowdsourcing model that has served for more than a decade as the bedrock for identifying whether websites are valid or corrupt. It opted, instead, for GPU-powered AI to make web surfing safer by identifying threats and objectionable content much faster than traditional offerings.

“We figured that if we built a whole new DNS from the ground up, built on artificial intelligence and machine learning, we could find threats faster and more effectively,” said Rustin Banks, chief revenue officer and one of four principals at DNSFilter.

Spinning Up Phishing Protection

DNS, or domain name system, is the naming system for computers, phones and services that connect to the internet. DNSFilter’s aim is to protect these assets from malicious websites and attacks.

The company’s algorithm takes seconds to compare websites to a machine learning model generated from 30,000 known phishing sites. To date, its AI prevents over 90 percent of new requests to visit potentially corrupt sites.

It’s this speed that largely separates DNSFilter from the rest of the industry, Banks said. It gets results in near real time, while competitors typically take around 24 hours.

The company’s algorithm has been built and trained in the cloud using NVIDIA P4 GPU clusters.

“NVIDIA GPUs allow us to rapidly train AI, while being able to use cutting-edge frameworks. It’s not a job I would want to do without them,” said Adam Spotton, chief data scientist at DNSFilter.

Inferencing occurs at 48 locations worldwide, hosted by 10 vendors who’ve passed DNSFilter’s rigorous security standards.

Banks said the company’s rivals primarily use a company in the Philippines that has a team of 150 people classifying sites all day. But for DNSFilter, the more corrupt sites it identifies, the faster and more accurate its algorithm becomes. (Disclosure: NVIDIA is one of the company’s biggest customers.)

Moreover, DNSFilter’s solution works at the network level so there’s no plug-in necessary and the solution works with any email client, protecting organizations regardless of where employees are or what device they’re using.
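To see why filtering at the DNS layer is independent of the client application, consider this toy resolver sketch. It is not DNSFilter's implementation: the function, the blocklist, the record table and the block-page address (taken from the reserved documentation range 192.0.2.0/24) are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical block-page address from the TEST-NET-1 documentation range.
BLOCK_PAGE_IP = "192.0.2.1"


def filtered_resolve(domain: str, blocked: Set[str],
                     records: Dict[str, str]) -> str:
    """Answer a lookup, redirecting flagged domains to a block page."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    # Check the name and each parent domain (but not the bare TLD).
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in blocked:
            return BLOCK_PAGE_IP
    return records.get(name, "NXDOMAIN")


blocked = {"phish.example"}
records = {"good.example": "192.0.2.10"}
answer = filtered_resolve("login.phish.example.", blocked, records)
```

Because the decision is made when the name is resolved, any program on the device that looks up a flagged domain receives the same answer, whether it is a browser, a mail client or a mobile app.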

“If the CFO uses his Yahoo mail on his mobile device, it doesn’t matter,” said Banks. “It’s built right into the fabric of the internet request.”

Upping the Ante

Banks estimates that DNS filtering represents a billion-dollar market, and he’s confident that the $10 billion firewall market is in play for DNSFilter.

Already, the startup is fielding more than a billion DNS requests a day. Banks foresees that number rising to 10 billion by the end of 2020. He also expects accuracy will come to exceed 99 percent as the dataset of corrupt sites grows.

The company isn’t stopping there. More services are planned, including a logo-analysis product currently in beta. It scans logos on sites linked from phishing emails and compares them against a database of approved sites to determine whether the logo is real. It then blocks phishing sites in real time.

Eventually, Banks said, the company intends to evolve from its current machine learning feedback loop to a neural network with sufficient cognition to identify things that its algorithms can’t find.

This, he said, would be like having an extra pair of eyes inside an organization’s security team, constantly monitoring suspicious web surfing wherever employees may be working.

“This is taking phishing protection to a new level,” said Banks. “It’s like network-level protection that comes with you wherever you go.”

The post DC Startup Casts an AI Net to Stop Phishing and Malware appeared first on The Official NVIDIA Blog.
