Lessons Learned from Developing ML for Healthcare

Machine learning (ML) methods are not new in medicine — traditional techniques, such as decision trees and logistic regression, were commonly used to derive established clinical decision rules (for example, the TIMI Risk Score for estimating patient risk after a coronary event). In recent years, however, there has been a tremendous surge in leveraging ML for a variety of medical applications, such as predicting adverse events from complex medical records, and improving the accuracy of genomic sequencing. In addition to detecting known diseases, ML models can tease out previously unknown signals, such as cardiovascular risk factors and refractive error from retinal fundus photographs.

Beyond developing these models, it’s important to understand how they can be incorporated into medical workflows. Previous research indicates that doctors assisted by ML models can be more accurate than either doctors or models alone in grading diabetic eye disease and diagnosing metastatic breast cancer. Similarly, doctors are able to leverage ML-based tools in an interactive fashion to search for similar medical images, providing further evidence that doctors can work effectively with ML-based assistive tools.

In an effort to improve guidance for research at the intersection of ML and healthcare, we have written a pair of articles, published in Nature Materials and the Journal of the American Medical Association (JAMA). The first is for ML practitioners to better understand how to develop ML solutions for healthcare, and the other is for doctors who desire a better understanding of whether ML could help improve their clinical work.

How to Develop Machine Learning Models for Healthcare
In “How to develop machine learning models for healthcare” (pdf), published in Nature Materials, we discuss the importance of ensuring that the needs specific to the healthcare environment inform the development of ML models for that setting. This should be done throughout the process of developing technologies for healthcare applications, from problem selection, data collection and ML model development to validation and assessment, deployment and monitoring.

The first consideration is how to identify a healthcare problem for which there is both an urgent clinical need and for which predictions based on ML models will provide actionable insight. For example, ML for detecting diabetic eye disease can help alleviate the screening workload in parts of the world where diabetes is prevalent and the number of medical specialists is insufficient. Once the problem has been identified, one must be careful with data curation to ensure that the ground truth labels, or “reference standard”, applied to the data are reliable and accurate. This can be accomplished by validating labels via comparison to expert interpretation of the same data, such as retinal fundus photographs, or through an orthogonal procedure, such as a biopsy to confirm radiologic findings. This is particularly important since a high-quality reference standard is essential both for training useful models and for accurately measuring model performance. Therefore, it is critical that ML practitioners work closely with clinical experts to ensure the rigor of the reference standard used for training and evaluation.

Validation of model performance is also substantially different in healthcare, because the problem of distributional shift can be pronounced. In contrast to typical ML studies where a single random test split is common, the medical field values validation using multiple independent evaluation datasets, each with different patient populations that may exhibit differences in demographics or disease subtypes. Because the specifics depend on the problem, ML practitioners should work closely with clinical experts to design the study, with particular care in ensuring that the model validation and performance metrics are appropriate for the clinical setting.
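
To make the multi-dataset idea concrete, here is a minimal sketch that is not taken from either article. It reports sensitivity and specificity separately for each independent evaluation dataset instead of for one pooled test split. The dataset names and toy labels are hypothetical.

```python
from typing import Dict, List, Tuple


def sensitivity_specificity(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def evaluate_per_dataset(
    datasets: Dict[str, Tuple[List[int], List[int]]]
) -> Dict[str, Tuple[float, float]]:
    """Evaluate each dataset on its own so population differences stay visible."""
    return {
        name: sensitivity_specificity(labels, preds)
        for name, (labels, preds) in datasets.items()
    }


# Hypothetical evaluation sets: (reference standard labels, model predictions).
datasets = {
    "clinic_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "clinic_b": ([1, 0, 0, 0], [1, 1, 0, 0]),
}
results = evaluate_per_dataset(datasets)
```

Reporting the two numbers per dataset, rather than one pooled figure, makes it easier to spot a population on which the model underperforms.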

Integration of the resulting assistive tools also requires thoughtful design to ensure seamless workflow integration, with consideration for measurement of the impact of these tools on diagnostic accuracy and workflow efficiency. Importantly, there is substantial value in prospective study of these tools in real patient care to better understand their real-world impact.

Finally, even after validation and workflow integration, the journey towards deployment is just beginning: regulatory approval and continued monitoring for unexpected error modes or adverse events in real use remain ahead.

Figure: Two examples of the translational process of developing, validating, and implementing ML models for healthcare, based on our work in detecting diabetic eye disease and metastatic breast cancer.

Empowering Doctors to Better Understand Machine Learning for Healthcare
In “Users’ Guide to the Medical Literature: How to Read Articles that use Machine Learning,” published in JAMA, we summarize key ML concepts to help doctors evaluate ML studies for suitability of inclusion in their workflow. The goal of this article is to demystify ML, to assist doctors who need to use ML systems to understand their basic functionality, when to trust them, and their potential limitations.

The central questions doctors ask when evaluating any study, whether ML or not, remain: Was the reference standard reliable? Was the evaluation unbiased, such as assessing for both false positives and false negatives, and performing a fair comparison with clinicians? Does the evaluation apply to the patient population that I see? How does the ML model help me in taking care of my patients?

In addition to these questions, ML models should also be scrutinized to determine whether the hyperparameters used in their development were tuned on a dataset independent of that used for final model evaluation. This is particularly important because inappropriate tuning can lead to substantial overestimation of performance. For example, a sufficiently sophisticated model can be trained to completely memorize the training dataset and still generalize poorly to new data. Ensuring that tuning was done appropriately requires being mindful of ambiguities in dataset naming, and in particular, using the terminology with which the audience is most familiar:

The intersection of two fields, ML and healthcare, creates ambiguity in the term “validation dataset”. An ML validation set typically refers to the dataset used for hyperparameter tuning, whereas a “clinical” validation set is typically used for final evaluation. To reduce confusion, we have opted to refer to the (ML) validation set as the “tuning” set.
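
As a rough illustration of this naming, the sketch below splits a set of identifiers into disjoint training, tuning, and test sets. It is an assumption-laden example and not the procedure from the articles: splitting by patient identifier, the 10% fractions, and the function name are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def split_train_tune_test(
    patient_ids: Sequence[str],
    tune_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Split unique patient IDs into disjoint train, tuning, and test lists."""
    ids = sorted(set(patient_ids))      # de-duplicate and fix the starting order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    n_test = int(len(ids) * test_frac)
    n_tune = int(len(ids) * tune_frac)
    test = ids[:n_test]                      # held out for final evaluation only
    tune = ids[n_test:n_test + n_tune]       # used for hyperparameter tuning
    train = ids[n_test + n_tune:]            # used to fit the model
    return train, tune, test
```

Hyperparameters would then be chosen using only the tuning list, and the test list would be touched once, for the final evaluation.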

Future outlook
It is an exciting time to work on AI for healthcare. The “bench-to-bedside” path is a long one that requires researchers and experts from multiple disciplines to work together in this translational process. We hope that these two articles will promote mutual understanding of what is important for ML practitioners developing models for healthcare and what is emphasized by doctors evaluating these models, thus driving further collaborations between the fields and towards eventual positive impact on patient care.

Key contributors to these projects include Yun Liu, Po-Hsuan Cameron Chen, Jonathan Krause, and Lily Peng. The authors would like to acknowledge Greg Corrado and Avinash Varadarajan for their advice, and the Google Health team for their support.

Building an AR/AI vehicle manual using Amazon Sumerian and Amazon Lex

Auto manufacturers are continuously adding new controls, interfaces, and intelligence into their vehicles. They publish manuals detailing how to use these functions, but these handbooks are cumbersome. Because they consist of hundreds of pages in several languages, it can be difficult to search for relevant information about specific features. Attempts to replace paper-based manuals with video or mobile apps have not improved the experience. As a result, not all owners know about and take advantage of all the innovations offered by the auto manufacturers.

This post describes how you can use Amazon Sumerian and other AWS services to create an interactive auto manual. This solution uses augmented reality, an AI chatbot, and connected car data provided through AWS IoT. This is not a comprehensive step-by-step tutorial, but it does provide an overview of the logical components.

AWS services

This blog post uses the following six services:

  1. Amazon Sumerian lets you create and run virtual reality (VR), augmented reality (AR), and 3D applications quickly and easily without requiring any specialized programming or 3D graphics expertise. Created 3D scenes can be published with one click and then distributed on the web, in VR headsets, and in mobile applications. In this post, Sumerian is used to render and animate a 3D model of the vehicle's interior and, optionally, its exterior.
  2. Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex is powered by the same technology that powers Amazon Alexa. Amazon Lex democratizes deep learning technologies by putting the power of Alexa within reach of all developers. In this post, Amazon Lex is used to recognize voice commands and determine which function or feature the owner is asking about.
  3. Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. Amazon Polly allows you to create applications that talk and build entirely new categories of speech-enabled products. Amazon Polly supports dozens of voices, across a variety of languages, to enable applications working in different countries. In this post, Amazon Polly is used to vocalize Amazon Lex answers into lifelike speech.
  4. Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. DynamoDB is fully managed, has built-in security, backup and restore, and in-memory caching for internet-scale applications. In this post, you see the use of DynamoDB as a document store of steps for interacting within the interior of the vehicle.
  5. AWS Lambda lets you run code without provisioning or managing servers. In this demo, a Lambda function is used to populate an AWS IoT Core shadow document to contain the required
  6. AWS IoT Core is a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT Core enables billions of devices and trillions of messages to connect reliably and securely to AWS endpoints and to other devices. AWS IoT Core supports the concept of device shadows, which store the latest state of connected devices whether or not they are online. In this post, a device shadow document is used to exchange information between Amazon Lex, DynamoDB, Sumerian, and a virtual representation of the car.

The following diagram illustrates the architectural relationships between these services.

The diagram shows AWS services in relation to each other and in relation to the end user and the vehicle. The owner’s journey starts with the mobile application that embeds the Sumerian scene containing the model of the car. The user can then tap the button to activate Amazon Lex and Amazon Polly. Once these are activated, the user can ask the application about a function and is shown the series of steps to perform.

The content of the manual is stored in DynamoDB. Amazon Lex pulls this information by placing a Lambda call. The Lambda function queries the DynamoDB table and retrieves a JSON structure describing:

  1. the steps, ordered by time and marked with start and end values that signal when each control should be highlighted. For example, …{"LeftTemperatureDial": {"start": 0, "end": 2}}…
  2. the prompt that needs to be announced while steps are shown in the Sumerian model. For example, “Press down left temperature dial for 2 seconds.”
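
The structure described above could be parsed and checked along these lines. This is a sketch under assumptions: the post shows only a fragment of the document, so the top-level field names `steps` and `prompt` and the function name are hypothetical.

```python
import json
from typing import List, Tuple


def parse_manual_entry(raw: str) -> Tuple[List[Tuple[str, float, float]], str]:
    """Return (steps ordered by start time, prompt) from a manual entry.

    Assumed layout (hypothetical field names):
    {"steps": {"<control>": {"start": s, "end": e}, ...}, "prompt": "..."}
    """
    entry = json.loads(raw)
    steps = []
    for control, timing in entry["steps"].items():
        start, end = timing["start"], timing["end"]
        if not 0 <= start < end:
            raise ValueError(
                "invalid timing for %s: start=%s, end=%s" % (control, start, end)
            )
        steps.append((control, start, end))
    steps.sort(key=lambda step: step[1])  # order by start time
    return steps, entry["prompt"]
```

Rejecting entries whose end does not come after their start keeps malformed rows from reaching the scene, where they would be harder to diagnose.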

This JSON document is then passed to the AWS IoT Core device shadow document. Sumerian periodically polls the document for state changes and updates the model to reflect the steps, highlighting the interface controls accordingly.
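
The polling idea can be sketched in a few lines. In the demo this logic runs as a script inside Sumerian; the Python below is only an illustration, and detecting changes through the shadow document's `version` field is an assumption about one reasonable way to do it. `fetch_shadow` is a hypothetical callable that returns the shadow document as a JSON string.

```python
import json
from typing import Any, Callable, Optional, Tuple


def poll_shadow_once(
    fetch_shadow: Callable[[], str],
    last_version: Optional[int],
) -> Tuple[Optional[int], Optional[Any]]:
    """Fetch the shadow once and return (version, steps).

    steps is None when the document has not changed since last_version.
    """
    document = json.loads(fetch_shadow())
    version = document.get("version")
    if version == last_version:
        return last_version, None  # nothing new to animate
    steps = document.get("state", {}).get("desired", {}).get("steps")
    return version, steps
```

A caller would invoke this on a timer, keep the returned version, and animate the scene only when the returned steps are not None.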

For a better visual and aural representation, see the AWS Auto Demo video.

How to build this demo

Follow these steps to build the demo:

  1. Create a basic scene.
  2. Label the control elements.
  3. Create the DynamoDB table.
  4. Create the Amazon Lex bot.
  5. Use the Lambda function.
  6. Create a state machine in Sumerian.
  7. Position the AR camera in the scene.
  8. Publish the scene.
  9. Link to the Amazon Lex bot.
  10. Deploy the application.

Step 1: Create a basic scene

Create a basic scene, with entities and AWS configuration.

  1. Using the Augmented Reality template, create a scene and import a 3D asset of a commercially available car. This model is sourced from a 3D model marketplace, but you can also import models from free 3D galleries or from 3D design software in any of the supported formats.
  2. Create an Amazon Cognito identity pool, allowing Sumerian to use both Amazon Lex and AWS IoT Core. This identity pool should have the appropriate policies to access AWS IoT, Amazon Lex, and Amazon Polly. For more information, see Amazon Cognito Setup Using AWS CloudFormation.
  3. Provide the created identity pool ID to the AWS Configuration component in the Sumerian scene and enable the check box on the AWS IoT Data Client.

Step 2: Label the control elements

Create 3D labels or entities covering most of the control elements (dial, button, flap, display, sign, etc.) that are present in the interior. I colored these markers red and made them semitransparent, so that they still allow the view of the actual control underneath. I named these entities to more easily identify them in my scripts. I also hid them, to mimic the initial state, where only the actual interior is visible, as seen in the following screenshot.

Step 3: Create the DynamoDB table

Create a table in DynamoDB and populate it with several vehicle functions and appropriate steps for enabling, disabling, setting, or unsetting that function. These instructions contain start/end times and durations for each child model entity that must appear, honoring the order in which you want to show them, as shown in the following screenshot.

Step 4: Create the Amazon Lex bot

Create the Amazon Lex bot and populate it with intents and utterances. This enables Amazon Lex to understand owners’ questions. Amazon Lex determines which function the owner is asking about and sends this information to the Lambda function.

As seen in the two screenshots above, you are creating an intent called airconditioningManual. This intent then contains several sample utterances containing three custom slots:

  • {option} to describe the activity to perform, such as “turn on”, “increase”, “remove” and others
  • {action} to describe the function, such as “temperature”, “fan speed” and others
  • {conjunction} to allow for optional conjunctions, like “with”, “on”, “of”, etc.

You can add more intents for other interactions or other parts of the vehicle.
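
On the Lambda side, the slot values arrive inside the event that Amazon Lex sends. The sketch below reads them from the `currentIntent.slots` field of a Lex (V1) event and builds a lookup key. How the demo actually combines the slots is not shown in this post, so joining {option} and {action} with a space is an assumption, and the function name is hypothetical.

```python
from typing import Any, Dict


def lookup_key_from_event(event: Dict[str, Any]) -> str:
    """Build a lookup key from the {option} and {action} slots of a Lex event."""
    slots = event["currentIntent"]["slots"]
    # Slots the owner did not fill arrive as None, so fall back to "".
    option = (slots.get("option") or "").strip().lower()
    action = (slots.get("action") or "").strip().lower()
    # {conjunction} only helps utterance matching, so it is ignored here.
    return " ".join(part for part in (option, action) if part)
```

Normalizing case and whitespace keeps spoken variations such as "Increase" and "increase" pointing at the same entry.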

Step 5: Use the Lambda function

The Lambda function contains code that performs the following steps.

  1. It queries the DynamoDB table to obtain a document of ordered instructions including start times, end times, and durations of the control elements (dial, button, flap, display, sign, etc.) being visible or highlighted.
    response = dynamo_client.get_item(
        TableName=table_name,  # placeholder: the table created in Step 3
        Key={'action_name': {'S': toget}})

  2. It converts and stores this set of instructions into AWS IoT Core, via a device shadow document.
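     action = iot_client.update_thing_shadow(
         thingName=thing_name,  # placeholder: the IoT thing that represents the car
         payload=json.dumps({"state": {"desired": {"steps": actionList}}}))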

  3. It returns a response object to Amazon Lex, fulfilling the owner's request. This response object contains the instructions to be performed, wrapped in a sentence that is played back to the owner.
    rtrn = {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": rtrnmessage
            }
        }
    }

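The three steps above can be tied together roughly as follows. This is a sketch, not the demo's actual function: the clients are passed in so the logic can be exercised without AWS access, and the item attribute names `steps` and `prompt`, the function names, and the fallback message are assumptions. `get_item` and `update_thing_shadow` are the standard boto3 calls for the DynamoDB and IoT data plane clients.

```python
import json
from typing import Any, Dict


def close_response(state: str, message: str) -> Dict[str, Any]:
    """Build a Lex (V1) Close response with a plain-text message."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": state,
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def handle_manual_request(toget, dynamo_client, iot_client, table_name, thing_name):
    """Look up the instructions for `toget`, publish them, and answer Lex."""
    # 1. Query DynamoDB for the instruction document.
    response = dynamo_client.get_item(
        TableName=table_name,
        Key={"action_name": {"S": toget}},
    )
    item = response.get("Item")
    if item is None:
        return close_response("Failed", "Sorry, I could not find that function.")

    # Assumption: steps are stored as a JSON string, the prompt as plain text.
    action_list = json.loads(item["steps"]["S"])
    rtrnmessage = item["prompt"]["S"]

    # 2. Store the steps in the device shadow for Sumerian to pick up.
    iot_client.update_thing_shadow(
        thingName=thing_name,
        payload=json.dumps({"state": {"desired": {"steps": action_list}}}),
    )

    # 3. Return the sentence that Amazon Lex plays back to the owner.
    return close_response("Fulfilled", rtrnmessage)
```

In a real Lambda handler, the two clients would come from `boto3.client("dynamodb")` and `boto3.client("iot-data")`.
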
Step 6: Create a state machine in Sumerian

Create a state machine in Sumerian using these steps.

  1. This state machine continuously listens for changes to the device shadow document. There are three states in the state machine, as shown in the following diagram:
    1. loadSDK, which loads the AWS SDK
    2. getShadow (see the following step)
    3. A waiting state that calls the getShadow state in a looping routine.

    To learn more about state machines in Sumerian, see State Machine Basics. These changes are executed on the model, according to instructions provided by the IoT shadow, showing marking elements according to start/end time and the duration specified. The device shadow then gets reset.

  2. The getShadow state from the preceding step executes the script that retrieves the IoT device shadow and performs the actual animation of individual layers. To learn more about scripting and retrieving IoT device shadows, see IoT Thing, Shadow, and Script Actions. Example excerpts from the script that performs these steps (showing the highlight entity→waiting→hiding the highlight entity) follow:
    function showControl(control, ctx, controlName) {
            var myWorld =
            var controlEnt = myWorld.getEntityByName(controlName)
            }, (control.end-control.start)*1000);
        }, control.start*1000);
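
The excerpt above is abridged, so as a language-neutral illustration of its timing logic, the Python sketch below turns the steps from the shadow document into an ordered list of show and hide events in milliseconds, mirroring the `control.start*1000` delay and the `(control.end-control.start)*1000` duration. It is not the Sumerian script itself, and the function name and tie-breaking rule are assumptions.

```python
from typing import Dict, List, Tuple


def highlight_timeline(steps: Dict[str, Dict[str, float]]) -> List[Tuple[float, str, str]]:
    """Return (time_ms, "show" or "hide", control_name) events in playback order."""
    events = []
    for name, timing in steps.items():
        events.append((timing["start"] * 1000, "show", name))  # highlight appears
        events.append((timing["end"] * 1000, "hide", name))    # highlight disappears
    # Sort by time. When two events share a timestamp, hide the previous
    # control before showing the next one.
    events.sort(key=lambda event: (event[0], 0 if event[1] == "hide" else 1))
    return events
```

For a single control with start 0 and end 2, this yields a show event at 0 ms and a hide event at 2000 ms.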

Step 7: Position the AR camera in the scene

Position the AR camera entity into the scene facing the dashboard of the vehicle. I also scale the car accordingly, so the user of the mobile application and vehicle owner can see the relative size of control elements (dial, button, flap, display, sign, etc.) compared to the reality of the physical vehicle.

Step 8: Publish the scene

Publish the scene and embed the URL into an example iOS/Android placeholder application available on GitHub. These applications are open source and available for both iOS and Android.

private let sceneURL = URL(string: "")!

Step 9: Link to the Amazon Lex bot

Last but not least, I add an Amazon Lex button from another example project on GitHub and link it with the published Amazon Lex bot from Step 4.

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // Authenticate through the Amazon Cognito identity pool from Step 1 (placeholder ID shown).
    let credentialProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast1, identityPoolId: "us-east-1:STUVWXYZ-0000-1111-2222-LKJIHGFEDCBA")
    let configuration = AWSServiceConfiguration(region: AWSRegionType.USEast1, credentialsProvider: credentialProvider)
    AWSServiceManager.default().defaultServiceConfiguration = configuration
    // Point the interaction kit at the bot from Step 4 (placeholder bot name shown).
    let chatConfig = AWSLexInteractionKitConfig.defaultInteractionKitConfig(withBotName: "XXXAWSYYY", botAlias: "$LATEST")
    chatConfig.autoPlayback = true
    AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "AWSLexVoiceButton")
    AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "chatConfig")
    return true
}

Step 10: Deploy the application

The final step is to deploy the application onto an iOS device and test the functionality. The demo video can be seen in the AWS services section of this post.


This is not meant to be a comprehensive guide to every single component plugged in to the manual, but it describes all logical components. Based on this post, you should feel confident enabling and deploying 3D models of any assets that need an interactive manual with both visual and aural feedback into the cloud.

Your solution can use Sumerian and other AI, compute, or storage services. You now understand how these services integrate, what role they play in the experience and how they can be extended beyond the scope of this use case.

Start by reviewing the steps above, subscribe to the Amazon Sumerian video channel, read more about integrations with Amazon Lex and Amazon Polly and IoT Shadow, and get building!

About the Author

Miro Masat is a Solutions Architect at Amazon Web Services, based out of London, UK. He focuses on engineering accounts, mainly in the automotive industry. Miro is a massive fan of virtual, augmented, and mixed reality and always seeks ways to bring engineering to VR/AR/MR and vice versa. Outside of work, he enjoys traveling, learning languages, and building DIY projects.




How American Express Uses Deep Learning for Better Decision Making

Financial fraud is on the rise. As the number of global transactions increases and digital technology advances, the complexity and frequency of fraudulent schemes are keeping pace.

Security company McAfee estimated in a 2018 report that cybercrime annually costs the global economy some $600 billion, or 0.8 percent of global gross domestic product.

One of the most prevalent — and preventable — types of cybercrime is credit card fraud, which is exacerbated by the growth in online transactions.

That’s why American Express, a global financial services company, is developing deep learning generative and sequential models to prevent fraudulent transactions.

“The most strategically important use case for us is transactional fraud detection,” said Dmitry Efimov, vice president of machine learning research at American Express. “Developing techniques that more accurately identify and decline fraudulent purchase attempts helps us protect our customers and our merchants.”

Cashing into Big Data

The company’s effort spanned several teams that conducted research on using generative adversarial networks, or GANs, to create synthetic data based on sparsely populated segments.

In most financial fraud use cases, machine learning systems are built on historical transactional data. The systems use deep learning models to scan incoming payments in real time, identify patterns associated with fraudulent transactions and then flag anomalies.

In some instances, like new product launches, GANs can produce additional data to help train and develop more accurate deep learning models.

Given its global integrated network with tens of millions of customers and merchants, American Express deals with massive volumes of structured and unstructured data sets.

Using several hundred data features, including the time stamps for transactional data, the American Express teams found that sequential deep learning techniques, such as long short-term memory and temporal convolutional networks, can be adapted for transaction data to produce superior results compared to classical machine learning approaches.

The results have paid dividends.

“These techniques have a substantial impact on the customer experience, allowing American Express to improve speed of detection and prevent losses by automating the decision-making process,” Efimov said.

Closing the Deal with NVIDIA GPUs 

Due to the huge amount of customer and merchant data American Express works with, they selected NVIDIA DGX-1 systems, which contain eight NVIDIA V100 Tensor Core GPUs, to build models with both TensorFlow and PyTorch software.

Its NVIDIA GPU-powered machine learning techniques are also used to forecast customer default rates and to assign credit limits.

“For our production environment, speed is extremely important with decisions made in a matter of milliseconds, so the best solution to use are NVIDIA GPUs,” said Efimov.

As the systems go into production in the next year, the teams plan on using the NVIDIA TensorRT platform for high-performance deep learning inference to deploy the models in real time, which will help improve American Express’ fraud and credit loss rates.

Efimov will be presenting his team’s work at the GPU Technology Conference in San Jose in March. To learn more about credit risk management use cases from American Express, register for GTC, the premier AI conference for insights, training and direct access to experts on the key topics in computing across industries.

The post How American Express Uses Deep Learning for Better Decision Making appeared first on The Official NVIDIA Blog.

Google at NeurIPS 2019

This week, Vancouver hosts the 33rd annual Conference on Neural Information Processing Systems (NeurIPS 2019), the biggest machine learning conference of the year. The conference includes invited talks, demonstrations and presentations of some of the latest in machine learning research. As a Diamond Sponsor of NeurIPS 2019, Google will have a strong presence, with more than 500 Googlers attending in order to contribute to, and learn from, the broader academic research community via talks, posters, workshops, competitions and tutorials. We will be presenting work that pushes the boundaries of what is possible in language understanding, translation, speech recognition and visual & audio perception, with Googlers co-authoring more than 130 accepted papers.

If you are attending NeurIPS 2019, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving the world’s most challenging research problems, and to see demonstrations of some of the exciting research we pursue, such as ML-based Flood Forecasting, AI for Social Good, Google Research Football, Google Dataset Search, TF-Agents and much more. You can also learn more about our work being presented in the list below.

NeurIPS Foundation Board
Samy Bengio, Corinna Cortes

NeurIPS Advisory Board
John C. Platt, Fernando Pereira, Dale Schuurmans

NeurIPS Program Committee
Program Chair: Hugo Larochelle
Diversity & Inclusion Co-Chair: Katherine Heller
Meetup Chair: Nicolas Le Roux
Party Co-Chair: Pablo Samuel Castro

Senior Area Chairs include: Amir Globerson, Claudio Gentile, Cordelia Schmid, Corinna Cortes, Dale Schuurmans, Elad Hazan, Honglak Lee, Mehryar Mohri, Peter Bartlett, Satyen Kale, Sergey Levine, Surya Ganguli

Area Chairs include: Afshin Rostamizadeh, Alex Kulesza, Amin Karbasi, Andrew Dai, Been Kim, Boqing Gong, Branislav Kveton, Ce Liu, Charles Sutton, Chelsea Finn, Cho-Jui Hsieh, D Sculley, Danny Tarlow, David Held, Denny Zhou, Yann Dauphin, Dustin Tran, Hartmut Neven, Hossein Mobahi, Ilya Tolstikhin, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Kevin Swersky, Kun Zhang, Kunal Talwar, Lihong Li, Manzil Zaheer, Marc G Bellemare, Marco Cuturi, Maya Gupta, Meg Mitchell, Minmin Chen, Mohammad Norouzi, Moustapha Cisse, Olivier Bachem, Qiang Liu, Rong Ge, Sanjiv Kumar, Sanmi Koyejo, Sebastian Nowozin, Sergei Vassilvitskii, Shivani Agarwal, Slav Petrov, Srinadh Bhojanapalli, Stephen Bach, Timnit Gebru, Tomer Koren, Vitaly Feldman, William Cohen, Nicolas Le Roux

NeurIPS Workshops Program Committee
Yann Dauphin, Honglak Lee, Sebastian Nowozin, Fernanda Viegas

NeurIPS Invited Talk
Social Intelligence
Blaise Aguera y Arcas

Accepted Papers
Memory Efficient Adaptive Optimization
Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Stand-Alone Self-Attention in Vision Models
Niki Parmar, Prajit Ramachandran, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jon Shlens

High Fidelity Video Prediction with Large Neural Nets
Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

Unsupervised Learning of Object Structure and Dynamics from Videos
Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, Hyouk Joong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

Quadratic Video Interpolation
Xiangyu Xu, Li Si-Yao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang

Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function
Aviv Rosenberg, Yishay Mansour

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
Yogev Bar-On, Yishay Mansour

Learning to Screen
Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li

A Kernel Loss for Solving the Bellman Equation
Yihao Feng, Lihong Li, Qiang Liu

Accurate Uncertainty Estimation and Decomposition in Ensemble Learning
Jeremiah Liu, John Paisley, Marianthi-Anna Kioumourtzoglou, Brent Coull

Saccader: Improving Accuracy of Hard Attention Models for Vision
Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le

Invertible Convolutional Flow
Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth

Hypothesis Set Stability and Generalization
Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Bandits with Feedback Graphs and Switching Costs
Raman Arora, Teodor V. Marinov, Mehryar Mohri

Regularized Gradient Boosting
Corinna Cortes, Mehryar Mohri, Dmitry Storcheus

Logarithmic Regret for Online Control
Naman Agarwal, Elad Hazan, Karan Singh

Sampled Softmax with Random Fourier Features
Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

Multilabel Reductions: What is My Loss Optimising?
Aditya Krishna Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

MetaInit: Initializing Learning by Learning to Initialize
Yann N. Dauphin, Sam Schoenholz

Generalization Bounds for Neural Networks via Approximate Description Length
Amit Daniely, Elad Granot

Variance Reduction of Bipartite Experiments through Correlation Clustering
Jean Pouget-Abadie, Kevin Aydin, Warren Schudy, Kay Brodersen, Vahab Mirrokni

Likelihood Ratios for Out-of-Distribution Detection
Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan

Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Yaniv Ovadia, Emily Fertig, Jie Jessie Ren, D. Sculley, Josh Dillon, Sebastian Nowozin, Zack Nado, Balaji Lakshminarayanan, Jasper Snoek

Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

Globally Optimal Learning for Structured Elliptical Losses
Yoav Wald, Nofar Noy, Gal Elidan, Ami Wiesel

DPPNet: Approximating Determinantal Point Processes with Deep Networks
Zelda Mariet, Yaniv Ovadia, Jasper Snoek

Graph Normalizing Flows
Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

When Does Label Smoothing Help?
Rafael Müller, Simon Kornblith, Geoff Hinton

On the Role of Inductive Bias From Simulation and the Transfer to the Real World: a new Disentanglement Dataset
Muhammad Waleed Gondal, Manuel Wüthrich, Đorđe Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer

On the Fairness of Disentangled Representations
Francesco Locatello, Gabriele Abbati, Tom Rainforth, Stefan Bauer, Bernhard Schölkopf, Olivier Bachem

Are Disentangled Representations Helpful for Abstract Visual Reasoning?
Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem

Don’t Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, Sergey Levine

Optimizing Generalized Rate Metrics with Game Equilibrium
Harikrishna Narasimhan, Andrew Cotter, Maya Gupta

On Making Stochastic Classifiers Deterministic
Andrew Cotter, Harikrishna Narasimhan, Maya Gupta

Discrete Flows: Invertible Generative Models of Discrete Data
Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole

Graph Agreement Models for Semi-Supervised Learning
Otilia Stretcu, Krishnamurthy Viswanathan, Dana Movshovitz-Attias, Emmanouil Platanios, Andrew Tomkins, Sujith Ravi

A Robust Non-Clairvoyant Dynamic Mechanism for Contextual Auctions
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni

Adversarial Robustness through Local Linearization
Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy (Dj) Dvijotham, Alhusein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli

A Geometric Perspective on Optimal Representations for Reinforcement Learning
Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

Online Learning via the Differential Privacy Lens
Jacob Abernethy, Young Hun Jung, Chansoo Lee, Audra McMillan, Ambuj Tewari

Reducing the Variance in Online Optimization by Transporting Past Gradients
Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux

Universality and Individuality in Neural Dynamics Across Large Populations of Recurrent Networks
Niru Maheswaranathan, Alex Williams, Matt Golub, Surya Ganguli, David Sussillo

Reverse Engineering Recurrent Networks for Sentiment Classification Reveals Line Attractor Dynamics
Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

Strategizing Against No-Regret Learners
Yuan Deng, Jon Schneider, Balasubramanian Sivan

Prior-Free Dynamic Auctions with Low Regret Buyers
Yuan Deng, Jon Schneider, Balasubramanian Sivan

Private Stochastic Convex Optimization with Optimal Rates
Raef Bassily, Vitaly Feldman, Kunal Talwar, Abhradeep Thakurta

Computational Separations between Sampling and Optimization
Kunal Talwar

Momentum-Based Variance Reduction in Non-Convex SGD
Ashok Cutkosky, Francesco Orabona

Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration
Kwang-Sung Jun, Ashok Cutkosky, Francesco Orabona

Fast and Flexible Multi-Task Classification using Conditional Neural Adaptive Processes
James Requeima, Jonathan Gordon, John Bronskill, Sebastian Nowozin, Richard E. Turner

Icebreaker: Element-wise Active Information Acquisition with Bayesian Deep Latent Gaussian Model
Wenbo Gong, Sebastian Tschiatschek, Richard E. Turner, Sebastian Nowozin, Jose Miguel Hernandez-Lobato, Cheng Zhang

Multiview Aggregation for Learning Category-Specific Shape Reconstruction
Srinath Sridhar, Davis Rempe, Julien Valentin, Sofien Bouaziz, Leonidas J. Guibas

Visualizing and Measuring the Geometry of BERT
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab S. Mirrokni

A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker, Dumitru Erhan, Pieter-jan Kindermans, Been Kim

Practical and Consistent Estimation of f-Divergences
Paul Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

Tree-Sliced Variants of Wasserstein Distances
Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi

Game Design for Eliciting Distinguishable Behavior
Fan Yang, Liu Leqi, Yifan Wu, Zachary Lipton, Pradeep Ravikumar, Tom M Mitchell, William Cohen

Differentially Private Anonymized Histograms
Ananda Theertha Suresh

Locally Private Gaussian Estimation
Matthew Joseph, Janardhan Kulkarni, Jieming Mao, Zhiwei Steven Wu

Exponential Family Estimation via Adversarial Dynamics Embedding
Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
C. Daniel Freeman, Luke Metz, David Ha

Adaptive Density Estimation for Generative Models
Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

Weight Agnostic Neural Networks
Adam Gaier, David Ha

Retrosynthesis Prediction with Conditional Graph Logic Network
Hanjun Dai, Chengtao Li, Connor Coley, Bo Dai, Le Song

Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort, Stanislaw Jastrzebski

Off-Policy Evaluation via Off-Policy Classification
Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine

Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction
Aleksis Pirinen, Erik Gartner, Cristian Sminchisescu

Energy-Inspired Models: Learning with Sampler-Induced Distributions
Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath

From Deep Learning to Mechanistic Understanding in Neuroscience: The Structure of Retinal Prediction
Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen Baccus, Surya Ganguli

Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn

Bayesian Layers: A Module for Neural Network Uncertainty
Dustin Tran, Michael W. Dusenberry, Mark van der Wilk, Danijar Hafner

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

A Unified Framework for Data Poisoning Attack to Graph-based Semi-Supervised Learning
Xuanqing Liu, Si Si, Xiaojin Zhu, Yang Li, Cho-Jui Hsieh

MixMatch: A Holistic Approach to Semi-Supervised Learning
David Berthelot, Nicholas Carlini, Ian Goodfellow (work done while at Google), Avital Oliver, Nicolas Papernot, Colin Raffel

SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
Seyed Kamyar Seyed Ghasemipour, Shixiang (Shane) Gu, Richard Zemel

Limits of Private Learning with Access to Public Data
Noga Alon, Raef Bassily, Shay Moran

Regularized Weighted Low Rank Approximation
Frank Ban, David Woodruff, Richard Zhang

Unsupervised Curricula for Visual Meta-Reinforcement Learning
Allan Jabri, Kyle Hsu, Abhishek Gupta, Benjamin Eysenbach, Sergey Levine, Chelsea Finn

Secretary Ranking with Minimal Inversions
Sepehr Assadi, Eric Balkanski, Renato Paes Leme

Mixtape: Breaking the Softmax Bottleneck Efficiently
Zhilin Yang, Thang Luong, Russ Salakhutdinov, Quoc V. Le

Budgeted Reinforcement Learning in Continuous State Space
Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

Generalization Bounds for Neural Networks via Approximate Description Length
Amit Daniely, Elad Granot

Flattening a Hierarchical Clustering through Active Learning
Fabio Vitale, Anand Rajagopalan, Claudio Gentile

Robust Attribution Regularization
Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha

Robustness Verification of Tree-based Models
Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, Cho-Jui Hsieh

Meta Architecture Search
Albert Shaw, Wei Wei, Weiyang Liu, Le Song, Bo Dai

Contextual Bandits with Cross-Learning
Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions
Negin Golrezaei, Adel Javanmard, Vahab Mirrokni

Optimizing Generalized Rate Metrics with Three Players
Harikrishna Narasimhan, Andrew Cotter, Maya Gupta

Noise-Tolerant Fair Classification
Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

Towards Automatic Concept-based Explanations
Amirata Ghorbani, James Wexler, James Zou, Been Kim

Locally Private Learning without Interaction Requires Separation
Amit Daniely, Vitaly Feldman

Learning GANs and Ensembles Using Discrepancy
Ben Adlam, Corinna Cortes, Mehryar Mohri, Ningshan Zhang

CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam

A Fourier Perspective on Model Robustness in Computer Vision
Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, Justin Gilmer

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

Memory Efficient Adaptive Optimization
Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

Abstract Reasoning with Distracting Features
Kecheng Zheng, Zheng-Jun Zha, Wei Wei

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Differentiable Ranking and Sorting Using Optimal Transport
Marco Cuturi, Olivier Teboul, Jean-Philippe Vert

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Private Learning Implies Online Learning: An Efficient Reduction
Alon Gonen, Elad Hazan, Shay Moran

Evaluating Protein Transfer Learning with TAPE
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Peter Chen, John Canny, Pieter Abbeel, Yun Song

Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels
Michela Meister, Tamas Sarlos, David P. Woodruff

No Pressure! Addressing the Problem of Local Minima in Manifold Learning Algorithms
Max Vladymyrov

Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections
Boris Muzellec, Marco Cuturi

Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function
Aviv Rosenberg, Yishay Mansour

On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset
Muhammad Waleed Gondal, Manuel Wüthrich, Đorđe Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer

Stacked Capsule Autoencoders
Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton

Wasserstein Dependency Measure for Representation Learning
Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet

Sampling Sketches for Concave Sublinear Functions of Frequencies
Edith Cohen, Ofir Geri

Hamiltonian Neural Networks
Sam Greydanus, Misko Dzamba, Jason Yosinski

Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization
Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Frédo Durand

Quadratic Video Interpolation
Xiangyu Xu, Li Siyao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang

Transfusion: Understanding Transfer Learning for Medical Imaging
Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio

Differentially Private Covariance Estimation
Kareem Amin, Travis Dick, Alex Kulesza, Andres Munoz, Sergei Vassilvitskii

Learning Transferable Graph Exploration
Hanjun Dai, Yujia Li, Chenglong Wang, Rishabh Singh, Po-Sen Huang, Pushmeet Kohli

Neural Attribution for Semantic Bug-Localization in Student Programs
Rahul Gupta, Aditya Kanade, Shirish Shevade

PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces
Chuan Guo, Ali Mousavi, Xiang Wu, Daniel Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

Efficient Rematerialization for Deep Networks
Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, Joshua R. Wang

3rd Conversational AI: Today’s Practice and Tomorrow’s Potential
Organizers include: Bill Byrne

AI for Humanitarian Assistance and Disaster Response Workshop
Invited Speakers include: Yossi Matias

Bayesian Deep Learning
Organizers include: Kevin P Murphy

Beyond First Order Methods in Machine Learning Systems
Invited Speakers include: Elad Hazan

Biological and Artificial Reinforcement Learning
Invited Speakers include: Igor Mordatch

Context and Compositionality in Biological and Artificial Neural Systems
Invited Speakers include: Kenton Lee

Deep Reinforcement Learning
Organizers include: Chelsea Finn

Document Intelligence
Organizers include: Tania Bedrax Weiss

Federated Learning for Data Privacy and Confidentiality
Organizers include: Jakub Konečný, Brendan McMahan
Invited Speakers include: Françoise Beaufays, Daniel Ramage

Graph Representation Learning
Organizers include: Rianne van den Berg

Human-Centric Machine Learning
Invited Speakers include: Been Kim

Information Theory and Machine Learning
Organizers include: Ben Poole
Invited Speakers include: Alex Alemi

KR2ML – Knowledge Representation and Reasoning Meets Machine Learning
Invited Speakers include: William Cohen

Learning Meaningful Representations of Life
Organizers include: Jasper Snoek, Alexander Wiltschko

Learning Transferable Skills
Invited Speakers include: David Ha

Machine Learning for Creativity and Design
Organizers include: Adam Roberts, Jesse Engel

Machine Learning for Health (ML4H): What Makes Machine Learning in Medicine Different?
Invited Speakers include: Lily Peng, Alan Karthikesalingam, Dale Webster

Machine Learning and the Physical Sciences
Speakers include: Yasaman Bahri, Samuel Schoenholz

ML for Systems
Organizers include: Milad Hashemi, Kevin Swersky, Azalia Mirhoseini, Anna Goldie
Invited Speakers include: Jeff Dean

Optimal Transport for Machine Learning
Organizers include: Marco Cuturi

The Optimization Foundations of Reinforcement Learning
Organizers include: Bo Dai, Nicolas Le Roux, Lihong Li, Dale Schuurmans

Privacy in Machine Learning
Invited Speakers include: Brendan McMahan

Program Transformations for ML
Organizers include: Pascal Lamblin, Alexander Wiltschko, Bart van Merrienboer, Emily Fertig
Invited Speakers include: Skye Wanderman-Milne

Real Neurons & Hidden Units: Future Directions at the Intersection of Neuroscience and Artificial Intelligence
Organizers include: David Sussillo

Robot Learning: Control and Interaction in the Real World
Organizers include: Stefan Schaal

Safety and Robustness in Decision Making
Organizers include: Yinlam Chow

Science Meets Engineering of Deep Learning
Invited Speakers include: Yasaman Bahri, Surya Ganguli, Been Kim

Sets and Partitions
Organizers include: Manzil Zaheer, Andrew McCallum
Invited Speakers include: Amr Ahmed

Tackling Climate Change with ML
Organizers include: John Platt
Invited Speakers include: Jeff Dean

Visually Grounded Interaction and Language
Invited Speakers include: Jason Baldridge

Workshop on Machine Learning with Guarantees
Invited Speakers include: Mehryar Mohri

Representation Learning and Fairness
Organizers include: Moustapha Cisse, Sanmi Koyejo

AI Came, AI Saw, AI Conquered: How Vysioneer Improves Precision Radiation Therapy

Of the millions diagnosed with cancer each year, over half receive some form of radiation therapy.

Deep learning is helping radiation oncologists make the process more precise by automatically labeling tumors from medical scans in a process known as contouring.

It’s a delicate balance.

“If oncologists contour too small an area, then radiation doesn’t treat the whole tumor and it could keep growing,” said Jen-Tang Lu, founder and CEO of Vysioneer, a Boston-based startup with an office in Taiwan. “If they contour too much, then radiation can harm the neighboring normal tissues.”

A member of the NVIDIA Inception startup accelerator program, Vysioneer builds AI tools to automate the time-consuming process of tumor contouring. To ensure the efficacy and safety of radiotherapy, radiation oncologists can easily spend hours contouring tumors from medical scans, Lu said.

The company’s first product, VBrain, can identify the three most common types of brain tumors from CT and MRI scans. Trained on NVIDIA V100 Tensor Core GPUs in the cloud and NVIDIA Quadro RTX 8000 GPUs on premises, the tool can speed up the contouring task by more than 6x — from over an hour to less than 10 minutes.

Vysioneer showcased its latest demos in the NVIDIA booth at the annual meeting of the Radiological Society of North America last week in Chicago. It’s one of more than 50 NVIDIA Inception startups that attended the conference.

Targeting Metastatic Brain Tumors

A non-invasive treatment, precision radiation therapy uses a high dosage of X-ray beams to destroy tumors without harming neighboring tissues.

Due to the availability of public datasets, most AI models that identify brain cancer from medical scans focus on gliomas, which are primary tumors — ones that originate in the brain.

VBrain, trained on more than 1,500 proprietary CT and MRI scans, identifies the vastly more common metastatic type of brain tumors, which occur when cancer spreads to the brain from another part of the body. Metastatic brain tumors typically occur in multiple parts of the brain at once, and can be tiny and hard to spot from medical scans.

VBrain integrates seamlessly into radiation oncologists’ existing clinical workflow, processing scans in just seconds using an NVIDIA GPU for inference. The tool could reduce variability among radiation oncologists, Lu says, and can also identify tiny lesions that radiologists or clinicians might miss.

The company has deployed its solution in a clinical trial at National Taiwan University Hospital, running on an on-premises server of NVIDIA GPUs.

In one case at the hospital, a patient had lung cancer that spread to the brain. During diagnosis, the patient’s radiologist identified a single large lesion from the brain scan. But VBrain revealed another two tiny lesions. This additional information led the oncologists to alter the patient’s radiation treatment plan.

Vysioneer is working towards FDA clearance for VBrain and plans to launch contouring AI models for medical images of other parts of the body. The company also plans to make VBrain available on NGC, a container registry that provides startups with streamlined deployment, access to the GPU compute ecosystem and a robust distribution channel.

NVIDIA tests and optimizes healthcare AI applications, like VBrain, to operate with the NVIDIA EGX platform, which enables fleets of devices and multiple physical locations of edge nodes to be remotely managed easily and securely, meeting the needs of data security and real-time intelligence in hospitals.

NVIDIA Inception helps startups during critical stages of product development, prototyping and deployment. Every Inception member receives a custom set of ongoing benefits, such as NVIDIA Deep Learning Institute credits, go-to-market support and hardware technology discounts that enable startups with fundamental tools to help them grow.

Lu says the technical articles, newsletters and better access to GPUs have helped the company — founded just six months ago — to efficiently build out its AI solution.

Lu previously was a member of the MGH & BWH Center for Clinical Data Science, where he led the development of DeepSPINE, an AI system to automate spinal diagnosis, trained on an NVIDIA DGX-1 system.

Main image shows VBrain-generated 3D tumor rendering (left) and tumor contours (right) for radiation treatment planning.

The post AI Came, AI Saw, AI Conquered: How Vysioneer Improves Precision Radiation Therapy appeared first on The Official NVIDIA Blog.

2D or Not 2D: NVIDIA Researchers Bring Images to Life with AI

Close your left eye as you look at this screen. Now close your right eye and open your left: you’ll notice that your field of vision shifts depending on which eye you’re using. That’s because each retina captures a two-dimensional image, and your brain combines the two images to provide depth and produce a sense of three-dimensionality.

Machine learning models need this same capability so that they can accurately understand image data. NVIDIA researchers have now made this possible by creating a rendering framework called DIB-R — a differentiable interpolation-based renderer — that produces 3D objects from 2D images.

The researchers will present their model this week at the annual Conference on Neural Information Processing Systems (NeurIPS), in Vancouver.

In traditional computer graphics, a pipeline renders a 3D model to a 2D screen. But there’s information to be gained from doing the opposite — a model that could infer a 3D object from a 2D image would be able to perform better object tracking, for example.

NVIDIA researchers wanted to build an architecture that could do this while integrating seamlessly with machine learning techniques. The result, DIB-R, produces high-fidelity rendering by using an encoder-decoder architecture, a type of neural network that transforms input into a feature map or vector that is used to predict specific information such as shape, color, texture and lighting of an image.

It’s especially useful when it comes to fields like robotics. For an autonomous robot to interact safely and efficiently with its environment, it must be able to sense and understand its surroundings. DIB-R could potentially improve those depth perception capabilities.

It takes two days to train the model on a single NVIDIA V100 GPU, whereas it would take several weeks to train without NVIDIA GPUs. At that point, DIB-R can produce a 3D object from a 2D image in less than 100 milliseconds. It does so by altering a polygon sphere — the traditional template that represents a 3D shape. DIB-R alters it to match the real object shape portrayed in the 2D images.

The team tested DIB-R on four 2D images of birds (far left). The first experiment used a picture of a yellow warbler (top left) and produced a 3D object (top two rows).

NVIDIA researchers trained their model on several datasets, including a collection of bird images. After training, DIB-R could take an image of a bird and produce a 3D portrayal with the proper shape and texture of a 3D bird.

“This is essentially the first time ever that you can take just about any 2D image and predict relevant 3D properties,” says Jun Gao, one of a team of researchers who collaborated on DIB-R.

DIB-R can transform 2D images of long-extinct animals, like a Tyrannosaurus rex or a chubby dodo, into lifelike 3D images in under a second.

Built on PyTorch, a machine learning framework, DIB-R is included as part of Kaolin, NVIDIA’s newest 3D deep learning PyTorch library that accelerates 3D deep learning research.

The entire NVIDIA research paper, “Learning to Predict 3D Objects with an Interpolation-Based Renderer,” can be found here. The NVIDIA Research team consists of more than 200 scientists around the globe, focusing on areas including AI, computer vision, self-driving cars, robotics and graphics.

The post 2D or Not 2D: NVIDIA Researchers Bring Images to Life with AI appeared first on The Official NVIDIA Blog.

Understanding Transfer Learning for Medical Imaging

As deep neural networks are applied to an increasingly diverse set of domains, transfer learning has emerged as a highly popular technique in developing deep learning models. In transfer learning, the neural network is trained in two stages: 1) pretraining, where the network is generally trained on a large-scale benchmark dataset representing a wide diversity of labels/categories (e.g., ImageNet); and 2) fine-tuning, where the pretrained network is further trained on the specific target task of interest, which may have fewer labeled examples than the pretraining dataset. The pretraining step helps the network learn general features that can be reused on the target task.

This kind of two-stage paradigm has become extremely popular in many settings, and particularly so in medical imaging. In the context of transfer learning, standard architectures designed for ImageNet with corresponding pretrained weights are fine-tuned on medical tasks ranging from interpreting chest x-rays and identifying eye diseases, to early detection of Alzheimer’s disease. Despite its widespread use, however, the precise effects of transfer learning are not yet well understood. While recent work challenges many common assumptions, including the effects on performance improvement, contribution of the underlying architecture and impact of pretraining dataset type and size, these results are all in the natural image setting, and leave many questions open for specialized domains, such as medical images.

In our NeurIPS 2019 paper, “Transfusion: Understanding Transfer Learning for Medical Imaging,” we investigate these central questions for transfer learning in medical imaging tasks. Through both a detailed performance evaluation and an analysis of neural network hidden representations, we uncover several surprising conclusions: transfer learning offers limited performance benefits on the tested medical imaging tasks; representations evolve through training in ways that differ across models and hidden layers, which we characterize in detail; and transfer learning has feature-independent benefits for convergence speed.

Performance Evaluation
We first performed a thorough study on the effect of transfer learning on model performance. We compared models trained directly on the target tasks from random initialization with models pretrained on ImageNet and then fine-tuned on the same tasks. We looked at two large-scale medical imaging tasks: diagnosing diabetic retinopathy from fundus photographs and identifying five different diseases from chest x-rays. We evaluated various neural network architectures, including both standard architectures popularly used for medical imaging (ResNet50, Inception-v3) and a family of simple, lightweight convolutional neural networks that consist of four or five layers of the standard convolution-batchnorm-ReLU progression, or CBRs.

The results from evaluating all of these models on the different tasks with and without transfer learning give us four main takeaways:

  • Surprisingly, transfer learning does not significantly affect performance on medical imaging tasks, with models trained from scratch performing nearly as well as standard ImageNet transferred models.
  • On the medical imaging tasks, the much smaller CBR models perform at a level comparable to the standard ImageNet architectures.
  • As the CBR models are much smaller and shallower than the standard ImageNet models, they perform much worse on ImageNet classification, highlighting that ImageNet performance is not indicative of performance on medical tasks.
  • The two medical datasets are much smaller than ImageNet (~200k vs. ~1.2m training images), but in the very small data regime there may be only a few thousand training examples. We evaluated transfer learning in this very small data regime and found that, while there was a larger gap in performance between transfer and training from scratch for large models (ResNet), this was not true for smaller models (CBRs), suggesting that the large models designed for ImageNet might be too overparameterized for the very small data regime.

Representation Analysis
We next study the degree to which transfer learning affects the kinds of features and representations learned by the neural networks. Given the similar performance, does transfer learning result in different representations from random initialization? Is knowledge from the pretraining step reused, and if so, where? To find answers to these questions, this study analyzes and compares the hidden representations (i.e., representations learned in the latent layers of the network) in the different neural networks trained to solve these tasks. This quantitative analysis can be challenging, due to the complexity and lack of alignment in different hidden layers. But a recent method, singular vector canonical correlation analysis (SVCCA; code and tutorials), based on canonical correlation analysis (CCA), helps overcome these challenges, and can be used to calculate a similarity score between a pair of hidden representations.
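
To make the idea of a similarity score concrete, here is a minimal NumPy sketch in the spirit of SVCCA. It is an illustration under stated assumptions, not the released SVCCA code: the function names, the `var_kept` threshold of 0.99, and the use of the mean canonical correlation as the final score are all choices made here for illustration. Each input is assumed to be a matrix of activations with one row per example and one column per neuron.

```python
import numpy as np

def _top_directions(acts, var_kept):
    """Orthonormal basis for the top singular directions of centered activations."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of directions whose cumulative variance reaches var_kept.
    k = int(np.searchsorted(energy, var_kept)) + 1
    return u[:, :k]

def svcca_similarity(acts_a, acts_b, var_kept=0.99):
    """Mean canonical correlation between two (num_examples, num_neurons) matrices."""
    qa = _top_directions(acts_a, var_kept)
    qb = _top_directions(acts_b, var_kept)
    # For orthonormal bases, the canonical correlations are the
    # singular values of qa^T qb.
    rho = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())
```

Because canonical correlations do not change under invertible linear maps of the neurons, two layers that encode the same information in differently rotated coordinates still score close to 1, which is what makes this kind of measure usable across networks whose hidden units are not aligned.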

Similarity scores are computed for some of the hidden representations from the top latent layers of the networks (closer to the output) between networks trained from random initialization and networks trained from pretrained ImageNet weights. As a baseline, we also compute similarity scores of representations learned from different random initializations. For large models, representations learned from random initialization are much more similar to each other than those learned from transfer learning. For smaller models, there is greater overlap between representation similarity scores.

Representation similarity scores between networks trained from random initialization and networks trained from pretrained ImageNet weights (orange), and baseline similarity scores of representations trained from two different random initializations (blue). Higher values indicate greater similarity. For larger models, representations learned from random initialization are much more similar to each other than those learned through transfer. This is not the case for smaller models.

The reason for this difference between large and small models becomes clear with further investigation into the hidden representations. Large models change less through training, even from random initialization. We perform multiple experiments that illustrate this, from simple filter visualizations to tracking changes between different layers through fine-tuning.

When we combine the results of all the experiments from the paper, we can assemble a table summarizing how much representations change through training on the medical task across (i) transfer learning, (ii) model size and (iii) lower/higher layers.

Effects on Convergence: Feature-Independent Benefits and Hybrid Approaches
One consistent effect of transfer learning was a significant speedup in the time taken for the model to converge. But having seen the mixed results for feature reuse from our representational study, we looked into whether there were other properties of the pretrained weights that might contribute to this speedup. Surprisingly, we found a feature-independent benefit of pretraining: the weight scaling.

We initialized the weights of the neural network as independent and identically distributed (iid), just like random initialization, but using the mean and variance of the pretrained weights. We called this initialization the Mean Var Init, which keeps the pretrained weight scaling but destroys all the features. This Mean Var Init offered significant speedups over random initialization across model architectures and tasks, suggesting that the pretraining process of transfer learning also helps with good weight conditioning.
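As a rough illustration of the idea described above, the following minimal sketch resamples one layer's weights i.i.d. from a distribution that matches the mean and variance of the pretrained weights. It assumes a Gaussian distribution, per-layer statistics, and weights stored as a flat list; the name `mean_var_init` and the `seed` parameter are hypothetical and not taken from the paper.

```python
import random
import statistics


def mean_var_init(pretrained_weights, seed=0):
    """Resample one layer i.i.d. using the pretrained mean and variance.

    Assumption: a Gaussian is used and statistics are taken per layer.
    The learned features are discarded; only the weight scale is kept.
    """
    mu = statistics.fmean(pretrained_weights)
    sigma = statistics.pstdev(pretrained_weights)
    rng = random.Random(seed)
    # One independent draw per original weight, so the layer keeps its size.
    return [rng.gauss(mu, sigma) for _ in pretrained_weights]


# Stand-in for a flattened pretrained layer (illustrative values only).
pretrained_layer = [((i * 37) % 101) / 101.0 for i in range(10000)]
resampled_layer = mean_var_init(pretrained_layer, seed=1)
```

The resampled layer has roughly the same first two moments as the original but none of its structure, which is the property the Mean Var Init experiments rely on.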

Filter visualization of weights initialized according to pretrained ImageNet weights, Random Init, and Mean Var Init. Only the ImageNet Init filters have pretrained (Gabor-like) structure, as Rand Init and Mean Var weights are iid.

Recall that our earlier experiments suggested that feature reuse primarily occurs in the lowest layers. To understand this, we performed weight transfusion experiments, where only a subset of the pretrained weights (corresponding to a contiguous set of layers) are transferred, with the remainder of weights being randomly initialized. Comparing convergence speeds of these transfused networks with full transfer learning further supports the conclusion that feature reuse is primarily happening in the lowest layers.
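A weight transfusion of this kind might be sketched as follows. This is only an illustration under stated assumptions: weights are held in a dictionary of flat lists keyed by layer name, the random initialization is a small zero-mean Gaussian, and the function name `transfuse`, its parameters, and the layer names are all hypothetical.

```python
import random


def transfuse(pretrained, layer_order, start, stop, seed=0):
    """Transfer pretrained weights for a contiguous block of layers only.

    Layers in layer_order[start:stop] are copied from `pretrained`;
    every other layer is randomly initialized.
    """
    rng = random.Random(seed)
    transferred = set(layer_order[start:stop])
    weights = {}
    for name in layer_order:
        if name in transferred:
            weights[name] = list(pretrained[name])
        else:
            # Assumed random init: zero-mean Gaussian with a small scale.
            weights[name] = [rng.gauss(0.0, 0.01) for _ in pretrained[name]]
    return weights


# Illustrative layer names and values, not the real network weights.
layer_order = ["conv1", "block1", "block2", "fc"]
pretrained = {"conv1": [1.0, 2.0], "block1": [3.0], "block2": [4.0, 5.0], "fc": [6.0]}
# Keep only the two lowest layers from pretraining.
mixed = transfuse(pretrained, layer_order, 0, 2)
```

Sweeping `stop` upward from the lowest layer and comparing convergence speed against full transfer is one way to reproduce the kind of comparison described above.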

Learning curves comparing the convergence speed with AUC on the test set. Using only the scaling of the pretrained weights (Mean Var Init) helps with convergence speed. The figures compare the standard transfer learning and the Mean Var initialization scheme to training from random initialization.

This suggests hybrid approaches to transfer learning, where instead of reusing the full neural network architecture, we can recycle its lowest layers and redesign the upper layers to better suit the target task. This gives us most of the benefits of transfer learning while further enabling flexible model design. In the figure below, we show the effect of reusing pretrained weights up to Block2 in ResNet50, halving the remainder of the channels, initializing those layers randomly, and then training end-to-end. This matches the performance and convergence of full transfer learning.

Hybrid approaches to transfer learning on ResNet50 (left) and CBR models (right): reusing a subset of the weights and slimming the remainder of the network (Slim), and using mathematically synthesized Gabors for conv1 (Synthetic Gabor).

The figure above also shows the results of an extreme version of this partial reuse, transferring only the very first convolutional layer with mathematically synthesized Gabor filters (pictured below). Using just these (synthetic) weights offers significant speedups, and hints at many other creative hybrid approaches.

Synthetic Gabor filters used to initialize the first layer of neural networks in some of the experiments in this paper. The Gabor filters are generated as grayscale images and repeated across the RGB channels. Left: Low frequencies. Right: High frequencies.
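For readers unfamiliar with Gabor filters, here is a minimal sketch of how one might be synthesized as a grayscale kernel and repeated across the RGB channels. It uses the standard form of a Gaussian envelope multiplied by a cosine carrier. The kernel size, wavelength, orientation and envelope width are illustrative assumptions, as are the function names; the exact parameters used in the paper are not given here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Build a size x size grayscale Gabor kernel (size assumed odd).

    wavelength sets the carrier frequency, theta the orientation in
    radians, and sigma the width of the isotropic Gaussian envelope.
    """
    half = size // 2
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Rotate coordinates so the carrier oscillates along theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + y_rot ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel


def to_rgb(kernel):
    """Repeat a grayscale kernel across three channels (channels first)."""
    return [[row[:] for row in kernel] for _ in range(3)]


# A longer wavelength gives a low-frequency filter, a shorter one high-frequency.
low_freq = to_rgb(gabor_kernel(7, 6.0, 0.0, 2.0))
high_freq = to_rgb(gabor_kernel(7, 2.5, math.pi / 4, 2.0))
```

A bank of such kernels at several orientations and wavelengths could then be stacked to fill the filters of a first convolutional layer.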

Conclusion and Open Questions
Transfer learning is a central technique for many domains. In this paper we provide insights on some of its fundamental properties in the medical imaging context, studying performance, feature reuse, the effect of different architectures, convergence and hybrid approaches. Many interesting open questions remain: How much of the original task has the model forgotten? Why do large models change less? Can we get further gains matching higher order moments of pretrained weight statistics? Are the results similar for other tasks, such as segmentation? We look forward to tackling these questions in future work!

Special thanks to Samy Bengio and Jon Kleinberg, who are co-authors on this work. Thanks also to Geoffrey Hinton for helpful feedback.

Pod Squad: Descript Uses AI to Make Managing Podcasts Quicker, Easier

You can’t have an AI podcast and not interview someone using AI to make podcasts better.

That’s why we reached out to serial entrepreneur Andrew Mason to talk to him about what he’s doing now. His company, Descript Podcast Studio, uses AI, natural language processing and automatic speech synthesis to make podcast editing easier and more collaborative.

Mason, Descript’s CEO and perhaps best known as Groupon’s founder, spoke with AI Podcast host Noah Kravitz about his company and the newest beta service it offers, called Overdub.


Key Points From This Episode

  • Descript works like a collaborative word processor. Users record audio, which Descript converts to text. They can then edit and rearrange text, and the program will change the audio.
  • Overdub, created in collaboration with Descript’s AI research division, eliminates the need to re-record audio. Type in new text, and Overdub creates audio in the user’s voice.
  • Descript 3.0 launched in November, adding new features such as a detector that can identify and remove vocalized pauses like “um” and “uh” as well as silence.


“We’re trying to use AI to automate the technical heavy lifting components of learning to use editors — as opposed to automating the craft — and we leave space for the user to display and refine their craft” — Andrew Mason [07:10]

“What’s really unique to us is a kind of tonal or prosodic connecting of the dots, where we’ll analyze the audio before and after whatever you’re splicing in with Overdub, and make sure that it sounds continuous in a natural transition” — Andrew Mason [10:30]

You Might Also Like

The Next Hans Zimmer? How AI May Create Music for Video Games, Exercise Routines

Imagine Wolfgang Amadeus Mozart as an algorithm or the next Hans Zimmer as a computer. Pierre Barreau and his startup, Aiva Technologies, are using deep learning to compose music. Their algorithm can create a theme in four minutes flat.

How Deep Learning Can Translate American Sign Language

Rochester Institute of Technology computer engineering major Syed Ahmed, a research assistant at the National Technical Institute for the Deaf, uses AI to translate between American sign language and English. Ahmed trained his algorithm on 1,700 sign language videos.

Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.


Make Our Podcast Better

Have a few minutes to spare? Fill out this short listener survey. Your answers will help us make a better podcast.

The post Pod Squad: Descript Uses AI to Make Managing Podcasts Quicker, Easier appeared first on The Official NVIDIA Blog.

AWS announces the Machine Learning Embark program to help customers train their workforce in machine learning

Today at AWS re:Invent 2019, I’m excited to announce the AWS Machine Learning (ML) Embark program to help companies transform their development teams into machine learning practitioners. AWS ML Embark is based on Amazon’s own experience scaling the use of machine learning inside its own operations as well as the lessons learned through thousands of successful customer implementations. Elements of the program include guided instruction from AWS machine learning experts, a discovery workshop, hand-selected curriculum from the Machine Learning University, an AWS DeepRacer event, and co-development of a machine learning proof of concept at the culmination of the program.

Customers I talk to are eager to get started implementing machine learning in their organizations, but it can be difficult to know where to begin. And, once started, it can be challenging to gain meaningful adoption across the organization. More often, customers are not asking “why” machine learning, but “how.” It’s a cultural shift as much as a technical one. Success involves inspiring and motivating teams to get interested in machine learning, identifying the most impactful projects to tackle, and developing a workforce with the right skills. And, teams new to machine learning need guidance and expertise from more seasoned data scientists who are in short supply. As a result, organizations can often feel like turning the corner on machine learning adoption happens at a glacial pace.

The AWS ML Embark program is designed to help these customers overcome some common challenges in the machine learning journey. To kick off the program, participants will pair their business and technical staff with AWS machine learning experts to join a discovery day workshop to identify a business problem well suited for machine learning. Through this exercise, AWS machine learning experts will help the group work backwards from a problem and align on where machine learning can have meaningful impact.

Next, this cross-functional group will participate in instructor-led, on-site trainings with curriculum modeled after Amazon’s Machine Learning University, which has been refined over the last several years to help Amazon’s own developers become proficient in machine learning. Participants will benefit from hand-selected coursework focused on practical application relevant to their business use cases. At the completion of the training, the AWS ML Embark program offers the option to continue education online and take the AWS Certified Machine Learning – Specialty certification exam to validate their skills.

AWS ML Embark also includes a corporate AWS DeepRacer event to expose a broader group of employees to machine learning with friendly competition and hands-on experience through racing fully autonomous 1/18th scale race cars using reinforcement learning.

Finally, experts from the Amazon ML Solutions Lab mentor participants through the ideation, development, and launch of a proof of concept based on a use case identified in the discovery day workshop. Through the process, the team will gain insight into best practices, ways to avoid costly mistakes, and knowledge based on the overall experience of working with experts who have completed hundreds of machine learning implementations.

At the conclusion of the program, a customer is well prepared to begin scaling newly obtained machine learning capabilities throughout their organization to take on additional machine learning projects and solve new challenges across their business. We’re excited to help customers begin their machine learning journey and can’t wait to see what they’ll do after graduation. Nominations for the program are now being accepted.


About the Author

Michelle Lee is vice president of the Machine Learning Solutions Lab at AWS.



AWS Outposts Station a GPU Garrison in Your Data Center

All the goodness of GPU acceleration on Amazon Web Services can now also run inside your own data center.

AWS Outposts powered by NVIDIA T4 Tensor Core GPUs are generally available starting today. They bring cloud-based Amazon EC2 G4 instances inside your data center to meet user requirements for security and latency in a wide variety of AI and graphics applications.

With this new offering, AI is no longer a research project.

Most companies still keep their data inside their own walls because they see it as their core intellectual property. But for deep learning to transition from research into production, enterprises need the flexibility and ease of development the cloud offers — right beside their data. That’s a big part of what AWS Outposts with T4 GPUs now enables.

With this new offering, enterprises can install a fully managed rack-scale appliance next to the large data lakes stored securely in their data centers.

AI Acceleration Across the Enterprise

To train neural networks, every layer of software needs to be optimized, from NVIDIA drivers to container runtimes and application frameworks. AWS services like SageMaker, Elastic MapReduce and many others designed on custom-built Amazon Machine Images require model development to start with training on large datasets. With the introduction of NVIDIA-powered AWS Outposts, those services can now be run securely in enterprise data centers.

The GPUs in Outposts accelerate deep learning as well as high performance computing and other GPU applications. They all can access software in NGC, NVIDIA’s hub for GPU-accelerated software optimization, which is stocked with applications, frameworks, libraries and SDKs that include pre-trained models.

For AI inference, the NVIDIA EGX edge-computing platform also runs on AWS Outposts and works with the AWS Elastic Kubernetes Service. Backed by the power of NVIDIA T4 GPUs, these services are capable of processing orders of magnitude more information than CPUs alone. They can quickly derive insights from vast amounts of data streamed in real time from sensors in an Internet of Things deployment, whether it’s in manufacturing, healthcare, financial services, retail or any other industry.

On top of EGX, the NVIDIA Metropolis application framework provides building blocks for vision AI, geared for use in smart cities, retail, logistics and industrial inspection, as well as other AI and IoT use cases, now easily delivered on AWS Outposts.

Alternatively, the NVIDIA Clara application framework is tuned to bring AI to healthcare providers whether it’s for medical imaging, federated learning or AI-assisted data labeling.

The T4 GPU’s Turing architecture uses TensorRT to accelerate the industry’s widest set of AI models. Its Tensor Cores support multi-precision computing that delivers up to 40x more inference performance than CPUs.

Remote Graphics, Locally Hosted

Users of high-end graphics have choices, too. Remote designers, artists and technical professionals who need to access large datasets and models can now get both cloud convenience and GPU performance.

Graphics professionals can benefit from the same NVIDIA Quadro technology that powers most of the world’s professional workstations not only on the public AWS cloud, but on their own internal cloud now with AWS Outposts packing T4 GPUs.

Whether they’re working locally or in the cloud, Quadro users can access the same set of hundreds of graphics-intensive, GPU-accelerated third-party applications.

The Quadro Virtual Workstation AMI, available in AWS Marketplace, includes the same Quadro driver found on physical workstations. It supports hundreds of Quadro-certified applications such as Dassault Systèmes SOLIDWORKS and CATIA; Siemens NX; Autodesk AutoCAD and Maya; ESRI ArcGIS Pro; and ANSYS Fluent, Mechanical and Discovery Live.

Learn more about AWS and NVIDIA offerings and check out our booth 1237 and session talks at AWS re:Invent.

The post AWS Outposts Station a GPU Garrison in Your Data Center appeared first on The Official NVIDIA Blog.
