
Introducing the Schema-Guided Dialogue Dataset for Conversational Assistants

Written on October 27, 2019. Posted in Google.

Posted by Abhinav Rastogi, Software Engineer and Pranav Khaitan, Engineering Lead, Google Research

Today’s virtual assistants help users to accomplish a wide variety of tasks, including finding flights, searching for nearby events and movies, making reservations, sourcing information from the web and more. They provide this functionality by offering a unified natural language interface to a wide variety of services across the web. Large-scale virtual assistants, like Google Assistant, need to integrate with a large and constantly increasing number of services, each with potentially overlapping functionality, over a wide variety of domains. Supporting new services with ease, without collection of additional data or retraining the model, and reducing maintenance workload are necessary to accommodate future growth. Despite tremendous progress, however, these challenges have often been overlooked in state-of-the-art models. This is due, in part, to the absence of suitable datasets that match the scale and complexity confronted by such virtual assistants.

In our recent paper, “Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset”, we introduce a new dataset to address these problems. The Schema-Guided Dialogue dataset (SGD) is the largest publicly available corpus of task-oriented dialogues, with over 18,000 dialogues spanning 17 domains. Equipped with various annotations, this dataset is designed to serve as an effective testbed for intent prediction, slot filling, state tracking (i.e., estimating the user’s goal) and language generation, among other tasks for large-scale virtual assistants. We also propose a schema-guided approach for building virtual assistants as a solution to the aforementioned challenges. Our approach utilizes a single model across all services and domains, with no domain-specific parameters. Based on the schema-guided approach and building on the power of pre-trained language models like BERT, we open source a model for dialogue state tracking, which is applicable in a zero-shot setting (i.e., with no training data for new services and APIs) while remaining competitive in the regular setting.

The Dataset
The primary goal of releasing the SGD dataset is to confront many real-world challenges that are not sufficiently captured by existing datasets. The SGD dataset consists of over 18k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 17 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the SGD dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. SGD is the first dataset to cover such a wide variety of domains and provide multiple APIs per domain. Furthermore, to quantify the robustness of models to changes in API interfaces or to the addition of new APIs, the evaluation set contains many new services that are not present in the training set.

For the creation of the SGD dataset, we have prioritized the variety and accuracy of annotations in the included dialogues. To begin with, dialogues were collected by interaction between two people using a Wizard-of-Oz style process, followed by crowdsourced annotation. Initial efforts revealed the difficulty in obtaining consistent annotations using this method, so we developed a new data collection process that minimized the need for complex manual annotation, and considerably reduced the time and cost of data collection.

For this alternate approach, we developed a multi-domain dialogue simulator that generates dialogue skeletons over an arbitrary combination of APIs, along with the corresponding annotations, such as dialogue state and system actions. The simulator consists of two agents playing the roles of the user and the assistant. Both agents interact with each other using a finite set of actions denoting dialogue semantics, with transitions specified through a probabilistic automaton designed to capture a wide variety of dialogue trajectories. The actions generated by the simulator are converted into natural language utterances using a set of templates. Crowdsourcing is used only for paraphrasing these templatized utterances in order to make the dialogue more natural and coherent. This setup eliminates the need for complicated domain-specific instructions, keeps the crowdsourcing task simple, and yields natural dialogues with consistent, high-quality annotations.

Steps for obtaining dialogues, with assistant turns marked in red and user turns in blue. Left: The simulator generates a dialogue skeleton using a finite set of actions. Center: Actions are converted into utterances using templates (~50 per service) and slot values are replaced with natural variations. Right: Paraphrasing via crowdsourcing to make the flow cohesive.
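The first two steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the action names, the transition probabilities and the templates are hypothetical and are not taken from the actual simulator. It only shows how a probabilistic automaton over a finite action set can produce a dialogue skeleton, which templates then turn into utterances that crowd workers would later paraphrase.

```python
import random

# Hypothetical automaton: each action maps to (next action, probability) pairs.
TRANSITIONS = {
    "START": [("USER_INFORM_INTENT", 1.0)],
    "USER_INFORM_INTENT": [("SYSTEM_REQUEST", 0.7), ("SYSTEM_OFFER", 0.3)],
    "SYSTEM_REQUEST": [("USER_INFORM", 1.0)],
    "USER_INFORM": [("SYSTEM_REQUEST", 0.3), ("SYSTEM_OFFER", 0.7)],
    "SYSTEM_OFFER": [("USER_SELECT", 0.8), ("USER_REQUEST_ALTS", 0.2)],
    "USER_REQUEST_ALTS": [("SYSTEM_OFFER", 1.0)],
    "USER_SELECT": [("END", 1.0)],
}

# Hypothetical templates, one per action.
TEMPLATES = {
    "USER_INFORM_INTENT": "I want to {intent}.",
    "SYSTEM_REQUEST": "What {slot} would you like?",
    "USER_INFORM": "The {slot} is {value}.",
    "SYSTEM_OFFER": "How about {offer}?",
    "USER_REQUEST_ALTS": "Is there anything else?",
    "USER_SELECT": "That works for me.",
}


def generate_skeleton(rng, max_turns=20):
    """Walk the automaton from START, collecting actions until END or max_turns."""
    state, skeleton = "START", []
    while len(skeleton) < max_turns:
        actions, weights = zip(*TRANSITIONS[state])
        state = rng.choices(actions, weights=weights, k=1)[0]
        if state == "END":
            break
        skeleton.append(state)
    return skeleton


def render(skeleton, values):
    """Convert each action into a templated utterance tagged with its speaker."""
    lines = []
    for action in skeleton:
        speaker = "USER" if action.startswith("USER_") else "SYSTEM"
        lines.append(f"{speaker}: {TEMPLATES[action].format(**values)}")
    return lines


if __name__ == "__main__":
    values = {"intent": "book a table", "slot": "time",
              "value": "7 pm", "offer": "a table at 7 pm"}
    for line in render(generate_skeleton(random.Random(0)), values):
        print(line)
```

Because the annotations (the action sequence) exist before any text is written, paraphrasing the templated lines does not change the labels, which is what keeps the crowdsourcing task simple.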

The Schema-Guided Approach
With the availability of the SGD dataset, it is now possible to train virtual assistants to support the diversity of services available on the web. A common approach to doing this requires a large master schema that lists all supported functions and their parameters. However, it is difficult to develop a master schema catering to all possible use cases. Even if that problem is solved, a master schema would complicate integration of new or small-scale services and would increase the maintenance workload of the assistant. Furthermore, while there are many similar concepts across services that can be jointly modeled, for example, the similarities in logic for querying or specifying the number of movie tickets, flight tickets or concert tickets, the master schema approach does not facilitate joint modeling of such concepts, unless an explicit mapping between them is manually defined.

The new schema-guided approach we propose addresses these limitations. This approach does not require the definition of a master schema for the assistant. Instead, each service or API provides a natural language description of the functions listed in its schema along with their associated attributes. These descriptions are then used to learn a distributed semantic representation of the schema, which is given as an additional input to the dialogue system. The dialogue system is then implemented as a single unified model, containing no domain- or service-specific parameters. This unified model facilitates representation of common knowledge between similar concepts in different services, while the use of distributed representations of the schema makes it possible to operate over new services that are not present in the training data. We have implemented this approach in our open-sourced dialogue state tracking model.
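As a rough illustration of the idea, the sketch below scores a user utterance against the natural language descriptions in a service's schema with one shared function and no per-service parameters. It is a toy under stated assumptions: the service names, intents and descriptions are hypothetical, and a bag-of-words count vector stands in for the learned representation (the open-sourced model builds on BERT, which is not reproduced here).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_intents(utterance, schema):
    """Rank a schema's intents by similarity of their descriptions to the utterance."""
    u = embed(utterance)
    scored = [(cosine(u, embed(desc)), name)
              for name, desc in schema["intents"].items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical schemas; only the descriptions differ between services.
TOY_FLIGHTS = {
    "service_name": "ToyFlights",
    "intents": {
        "SearchFlight": "search for flights to a destination city",
        "ReserveTicket": "reserve a ticket on a flight",
    },
}
TOY_MOVIES = {
    "service_name": "ToyMovies",
    "intents": {
        "FindMovies": "find movies playing in a city",
        "BuyMovieTickets": "buy tickets for a movie showing",
    },
}

if __name__ == "__main__":
    print(rank_intents("search for flights to Paris", TOY_FLIGHTS))
    print(rank_intents("I want to buy two tickets for the movie tonight", TOY_MOVIES))
```

Supporting a new service here means supplying another schema dictionary; `rank_intents` itself is unchanged, which mirrors the property that lets the approach operate on services absent from the training data.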

Eighth Dialogue System Technology Challenge
The Dialog System Technology Challenges (DSTCs) are a series of research competitions to accelerate the development of new dialogue technologies. This year, Google organized one of the tracks, “Schema-Guided Dialogue State Tracking”, as part of the recently concluded 8th DSTC. We received submissions from a total of 25 teams from both industry and academia, which will be presented at the DSTC8 workshop at AAAI-20.

We believe that this dataset will act as a good benchmark for building large-scale dialogue models. We are excited and looking forward to all the innovative ways in which the research community will use it for the advancement of dialogue technologies.

Acknowledgements
This post reflects the work of our co-authors Xiaoxue Zang, Srinivas Sunkara and Raghav Gupta. We also thank Amir Fayazi and Maria Wang for help with data collection and Guan-Lin Chao for insights on model design and implementation.

Google at ICCV 2019

Written on October 27, 2019. Posted in Google.

Andrew Helton, Editor, Google Research Communications

This week, Seoul, South Korea hosts the International Conference on Computer Vision 2019 (ICCV 2019), one of the world’s premier conferences on computer vision. As a leader in computer vision research and a Gold Sponsor, Google will have a strong presence at ICCV 2019 with over 200 Googlers in attendance, more than 40 research presentations, and involvement in the organization of a number of workshops and tutorials.

If you are attending ICCV this year, please stop by our booth. There you can chat with researchers who are actively pursuing the latest innovations in computer vision and demo some of their latest research, including the technology behind MediaPipe, the new Open Images dataset, new developments for Google Lens and much more.

This year Google researchers are recipients of three prestigious ICCV awards:

Distinguished Researcher Award — Bill Freeman, Research Scientist, Google Research
Helmholtz Prize (Test of Time Award) — ICCV 2009 paper, “Building Rome in a Day”, by Sameer Agarwal, Noah Snavely, Ian Simon, Steve Seitz and Rick Szeliski
Marr Prize (Best Paper Award) — ICCV 2019 paper, “SinGAN: Learning a Generative Model from a Single Natural Image”, by Tamar Rott Shaham, Tali Dekel and Tomer Michaeli

More details about the Google research being presented at ICCV 2019 can be found below.

Organizing Committee includes:
Ming-Hsuan Yang (Program Chair)

Oral Presentations
Learning Single Camera Depth Estimation using Dual-Pixels
Rahul Garg, Neal Wadhwa, Sameer Ansari, Jonathan Barron

RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Niessner

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
Weicheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

PuppetGAN: Cross-Domain Image Manipulation by Demonstration
Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler

COCO-GAN: Generation by Parts via Conditional Coordinating
Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen

Towards Unconstrained End-to-End Text Spotting
Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao

SinGAN: Learning a Generative Model from a Single Natural Image
Tamar Rott Shaham, Tali Dekel, Tomer Michaeli
(ICCV 2019 Marr Prize Winner — Best Paper Award)

Generative Modeling for Small-Data Object Detection
Lanlan Liu, Michael Muelly, Jia Deng, Tomas Pfister, Li-Jia Li

Searching for MobileNetV3
Andrew Howard, Mark Sandler, Bo Chen, Weijun Wang, Liang-Chieh Chen, Mingxing Tan, Grace Chu, Vijay Vasudevan, Yukun Zhu, Ruoming Pang, Hartwig Adam, Quoc Le

S⁴L: Self-Supervised Semi-supervised Learning
Lucas Beyer, Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov

Sampling-Free Epistemic Uncertainty Estimation Using Approximated Variance Propagation
Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari

Linearized Multi-sampling for Differentiable Image Transformation
Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi

Poster Presentations
ELF: Embedded Localisation of Features in Pre-trained CNN
Assia Benbihi, Matthieu Geist, Cedric Pradalier

Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova

ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image
Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

A Learned Representation for Scalable Vector Graphics
Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens

FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas Guibas

Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation
Yuyin Zhou, Zhe Li, Song Bai, Xinlei Chen, Mei Han, Chong Wang, Elliot Fishman, Alan Yuille

Boundless: Generative Adversarial Networks for Image Extension
Dilip Krishnan, Piotr Teterwak, Aaron Sarna, Aaron Maschinot, Ce Liu, David Belanger, William Freeman

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-supervised Object Detection
Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia

Object-Driven Multi-Layer Scene Decomposition from a Single Image
Helisa Dhamo, Nassir Navab, Federico Tombari

Improving Adversarial Robustness via Guided Complement Entropy
Hao-Yun Chen, Jhao-Hong Liang, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

XRAI: Better Attributions Through Regions
Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viegas, Michael Terry

SegSort: Segment Sorting for Semantic Segmentation
Jyh-Jing Hwang, Stella Yu, Jianbo Shi, Maxwell Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen

Self-Supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera
Yuhua Chen, Cordelia Schmid, Cristian Sminchisescu

VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid

Explaining the Ambiguity of Object Detection and 6D Pose from Visual Data
Fabian Manhardt, Diego Martín Arroyo, Christian Rupprecht, Benjamin Busam, Tolga Birdal, Nassir Navab, Federico Tombari

Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation
Qing Lian, Lixin Duan, Fengmao Lv, Boqing Gong

Learning Shape Templates Using Structured Implicit Functions
Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William Freeman, Thomas Funkhouser

Transferable Representation Learning in Vision-and-Language Navigation
Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie

Controllable Attention for Structured Layered Video Decomposition
Jean-Baptiste Alayrac, Joao Carreira, Relja Arandjelović, Andrew Zisserman

Pixel2Mesh++: Multi-view 3D Mesh Generation via Deformation
Chao Wen, Yinda Zhang, Zhuwen Li, Yanwei Fu

Beyond Cartesian Representations for Local Descriptors
Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, Eduard Trulls

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data
Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong

Evolving Space-Time Neural Architectures for Videos
AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael Ryoo

Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images
Valentin Gabeur, Jean-Sebastien Franco, Xavier Martin, Cordelia Schmid, Gregory Rogez

Multi-view Image Fusion
Marc Comino Trinidad, Ricardo Martin-Brualla, Florian Kainz, Janne Kontkanen

EvalNorm: Estimating Batch Normalization Statistics for Evaluation
Saurabh Singh, Abhinav Shrivastava

Attention Augmented Convolutional Networks
Irwan Bello, Barret Zoph, Quoc Le, Ashish Vaswani, Jonathon Shlens

Patchwork: A Patch-wise Attention Network for Efficient Object Detection and Segmentation in Video Streams
Yuning Chai

Workshops
Low-Power Computer Vision
Organizers include: Bo Chen

Neural Architects
Organizers include: Barret Zoph

The 3rd YouTube-8M Large-Scale Video Understanding Workshop
Organizers include: Paul Natsev, Cordelia Schmid, Rahul Sukthankar, Joonseok Lee, George Toderici

Should We Pre-register Experiments in Computer Vision?
Organizers include: Jack Valmadre

Extreme Vision Modeling
Organizers include: Rahul Sukthankar

Joint COCO and Mapillary Recognition Challenge
Organizers include: Tsung-Yi Lin, Yin Cui

Open Images Challenge
Organizers include: Vittorio Ferrari, Alina Kuznetsova, Rodrigo Benenson, Victor Gomes, Matteo Malloci

Tutorials
Meta-Learning and Metric Learning Algorithms
Organizers include: Kevin Swersky

Cancer researchers embrace AI to accelerate development of precision medicine

Written on October 27, 2019. Posted in Microsoft.

Biomedical researchers are embracing artificial intelligence to accelerate the implementation of cancer treatments that target patients’ specific genomic profiles, a type of precision medicine that in some cases is more effective than traditional chemotherapy and has fewer side effects.

The potential for this new era of cancer treatment stems from advances in genome sequencing technology that enables researchers to more efficiently discover the specific genomic mutations that drive cancer, and an explosion of research on the development of new drugs that target those mutations.

To harness this potential, researchers at The Jackson Laboratory, an independent, nonprofit biomedical research institution also known as JAX and headquartered in Bar Harbor, Maine, developed a tool to help the global medical and scientific communities stay on top of the continuously growing volume of data generated by advances in genomic research.

The tool, called the Clinical Knowledgebase, or CKB, is a searchable database where subject matter experts store, sort and interpret complex genomic data to improve patient outcomes and share information about clinical trials and treatment options.

The challenge is to find the most relevant cancer-related information from the 4,000 or so biomedical research papers published each day, according to Susan Mockus, the associate director of clinical genomic market development with JAX’s genomic medicine institute in Farmington, Connecticut.

“Because there is so much data and so many complexities, without embracing and incorporating artificial intelligence and machine learning to help in the interpretation of the data, progress will be slow,” she said.

That’s why Mockus and her colleagues at JAX are collaborating with computer scientists working on Microsoft’s Project Hanover who are developing AI technology that enables machines to read complex medical and research documents and highlight the important information they contain.

While this machine reading technology is in the early stages of development, researchers have found they can make progress by narrowing the focus to specific areas such as clinical oncology, explained Peter Lee, corporate vice president of Microsoft Healthcare in Redmond, Washington.

“For something that really matters like cancer treatment where there are thousands of new research papers being published every day, we actually have a shot at having the machine read them all and help a board of cancer specialists answer questions about the latest research,” he said.

Peter Lee, corporate vice president of Microsoft Healthcare.

Curating CKB

Mockus and her colleagues are using Microsoft’s machine reading technology to curate CKB, which stores structured information about genomic mutations that drive cancer, drugs that target cancer genes and the response of patients to those drugs.

One application of this knowledgebase allows oncologists to discover what, if any, matches exist between a patient’s known cancer-related genomic mutations and drugs that target them as they explore and weigh options for treatment, including enrollment in clinical trials for drugs in development.

This information is also useful to translational and clinical researchers, Mockus noted.

The bottleneck is filtering through the more than 4,000 papers published every day in biomedical journals to find the subset of about 200 related to cancer, read them and update CKB with the relevant information on the mutation, drug and patient response.

“What you want is some degree of intelligence incorporated into the system that can go out and not just be efficient, but also be effective and relevant in terms of how it can filter information. That is what Hanover has done,” said Auro Nair, executive vice president of JAX.

The core of Microsoft’s Project Hanover is the capability to comb through the thousands of documents published each day in the biomedical literature and flag and rank all that are potentially relevant to cancer researchers, highlighting, for example, information on gene, mutation, drug and patient response.

Human curators working on CKB are then free to focus on the flagged research papers, validating the accuracy of the highlighted information.

“Our goal is to make the human curators superpowered,” said Hoifung Poon, director of precision health natural language processing with Microsoft’s research organization in Redmond and the lead researcher on Project Hanover.

“With the machine reader, we are able to suggest that this might be a case where a paper is talking about a drug-gene mutation relation that you care about,” Poon explained. “The curator can look at this in context and, in a couple of minutes, say, ‘This is exactly what I want,’ or ‘This is incorrect.’”

Hoifung Poon, director of precision health natural language processing with Microsoft’s research organization, is leading the development of Project Hanover, a machine reading technology.

Self supervision

To be successful, Poon and his team need to train machine learning models in such a way that they catch all the potentially relevant information – ensure there are no gaps in content – and, at the same time, weed out irrelevant information sufficiently to make the curation process more efficient.

In traditional machine reading tasks such as finding information about celebrities in news stories, researchers tend to focus on relationships contained within a single sentence, such as a celebrity name and a new movie.

Since this type of information is widespread across news stories, researchers can skip instances that are more challenging such as when the name of the celebrity and movie are mentioned in separate paragraphs, or when the relationship involves more than two pieces of information.

“In biomedicine, you can’t do that because your latest finding may only appear in this single paper and if you skip it, it could be life or death for this patient,” explained Poon. “In this case, you have to tackle some of the hard linguistic challenges head on.”

Poon and his team are taking what they call a self-supervision approach to machine learning in which the model automatically annotates training examples from unlabeled text by leveraging prior knowledge in existing databases and ontologies.

For example, a National Cancer Institute initiative manually compiled information from the biomedical literature on how genes regulate each other but was unable to sustain the effort beyond two years. Poon’s team used the compiled knowledge to automatically label documents and train a machine reader to find new instances of gene regulation.
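The labeling step can be sketched in a few lines. This is an illustrative assumption rather than the Project Hanover pipeline: the tiny knowledge base, the gene list and the example passages are hypothetical, and gene mentions are found by exact token match. The point is only that an existing database can assign labels to unlabeled text without a human annotator.

```python
import re

# Hypothetical prior knowledge: gene pairs already recorded as a regulation relation.
KNOWN_REGULATION = {("TP53", "MDM2"), ("MYC", "CDKN1A")}

# Hypothetical gene vocabulary used to spot mentions in text.
GENES = {g for pair in KNOWN_REGULATION for g in pair} | {"BRCA1"}


def find_genes(passage):
    """Return the distinct gene names mentioned in a passage, sorted."""
    tokens = re.findall(r"[A-Za-z0-9]+", passage)
    return sorted({t for t in tokens if t in GENES})


def auto_label(passages):
    """Label every co-mentioned gene pair using the knowledge base.

    A pair is a positive example if the knowledge base lists it in either
    order, and a negative example otherwise.
    """
    examples = []
    for passage in passages:
        genes = find_genes(passage)
        for i, a in enumerate(genes):
            for b in genes[i + 1:]:
                label = (a, b) in KNOWN_REGULATION or (b, a) in KNOWN_REGULATION
                examples.append((passage, a, b, label))
    return examples


if __name__ == "__main__":
    unlabeled = [
        "TP53 activates transcription of MDM2.",
        "BRCA1 and MYC were both sequenced.",
    ]
    for example in auto_label(unlabeled):
        print(example)
```

Labels produced this way are noisy, since a passage can mention two related genes without describing their relation, so they serve as training signal for a model rather than as ground truth.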

They took the same approach with public datasets on approved cancer drugs and drugs in clinical trials, among other sources.

This connect-the-dots approach creates a machine learned model that “rarely misses anything” and is precise enough “where we can potentially improve the curation efficiency by a lot,” said Poon.

Collaboration with JAX

The collaboration with JAX allows Poon and his team to validate the effectiveness of Microsoft’s machine reading technology while increasing the efficiency of Mockus and her team as they curate CKB.

“Leveraging the machine reader, we can say here is what we are interested in and it will help to triage and actually rank papers for us that have high clinical significance,” Mockus said. “And then a human goes in to really tease apart the data.”

Over time, feedback from the curators will be used to help train the machine reading technology, making the models more precise and, in turn, making the curators more efficient and allowing the scope of CKB to expand.

“We feel really, really good about this relationship,” said Nair. “Particularly from the standpoint of the impact it can have in providing a very powerful tool to clinicians.”

Learn more about the Clinical Knowledgebase and The Jackson Laboratory
Learn more about Project Hanover
Read: How Microsoft computer scientists and researchers are working to ‘solve’ cancer
Read: Microsoft announces general availability of cloud-based tools for genomics research

John Roach writes about Microsoft research and innovation. Follow him on Twitter.

The post Cancer researchers embrace AI to accelerate development of precision medicine appeared first on The AI Blog.

AI’s Latest Adventure Turns Pets into GANimals

Written on October 27, 2019. Posted in NVIDIA.

Imagine your Labrador’s smile on a lion or your feline’s finicky smirk on a tiger. Such a leap is easy for humans to perform, with our memories full of images. But the same task has been a tough challenge for computers — until the GANimal.

A team of NVIDIA researchers has defined new AI techniques that give computers enough smarts to see a picture of one animal and recreate its expression and pose on the face of any other creature. The work is powered in part by generative adversarial networks (GANs), an emerging AI technique that pits one neural network against another.

You can try it for yourself with the GANimal app. Input an image of your dog or cat and see its expression and pose reflected on dozens of breeds and species from an African hunting dog and Egyptian cat to a Shih-Tzu, snow leopard or sloth bear.

I tried it, using a picture of my son’s dog, Duke, a mixed-breed mutt who resembles a Golden Lab. My fave — a dark-eyed lynx wearing Duke’s dorky smile.

There’s potential for serious applications, too. Someday movie makers may video dogs doing stunts and use AI to map their movements onto, say, less tractable tigers.

The team reports its work this week in a paper at the International Conference on Computer Vision (ICCV) in Seoul. The event is one of three seminal conferences for researchers in the field of computer vision.

Their paper describes what the researchers call FUNIT, “a Few-shot, UNsupervised Image-to-image Translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images.”

“Most GAN-based image translation networks are trained to solve a single task. For example, translate horses to zebras,” said Ming-Yu Liu, a lead computer-vision researcher on the NVIDIA team behind FUNIT.

“In this case, we train a network to jointly solve many translation tasks where each task is about translating a random source animal to a random target animal by leveraging a few example images of the target animal,” Liu explained. “Through practicing solving different translation tasks, eventually the network learns to generalize to translate known animals to previously unseen animals.”

Before this work, network models for image translation had to be trained using many images of the target animal. Now, one picture of Rover does the trick, in part thanks to a training function that includes many different image translation tasks the team adds to the GAN process.
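The task sampling Liu describes can be sketched as follows. This shows only how a training episode might be drawn, under the assumption that images are grouped by animal class; the file names and the `sample_task` helper are hypothetical, and the generator, discriminator and losses of FUNIT are not shown.

```python
import random


def sample_task(images_by_class, rng, k_shot=1):
    """Draw one translation task: a source image plus k examples of a different class."""
    source_cls, target_cls = rng.sample(sorted(images_by_class), 2)
    return {
        "source_class": source_cls,
        "content_image": rng.choice(images_by_class[source_cls]),
        "target_class": target_cls,
        "target_examples": rng.sample(images_by_class[target_cls], k_shot),
    }


if __name__ == "__main__":
    # Hypothetical training set: image file names grouped by animal class.
    training_images = {
        "labrador": ["lab_01.jpg", "lab_02.jpg", "lab_03.jpg"],
        "lynx": ["lynx_01.jpg", "lynx_02.jpg"],
        "tiger": ["tiger_01.jpg", "tiger_02.jpg", "tiger_03.jpg"],
    }
    rng = random.Random(0)
    for _ in range(3):
        print(sample_task(training_images, rng, k_shot=2))
```

At test time the same interface applies to a class that never appeared in training: the target examples are simply a few images of the unseen animal.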

The work is the next step in Liu’s overarching goal of finding ways to code human-like imagination into neural networks. “This is how we make progress in technology and society by solving new kinds of problems,” said Liu.

The team — which includes seven of NVIDIA’s more than 200 researchers — wants to expand the new FUNIT tool to include more kinds of images at higher resolutions. They are already testing it with images of flowers and food.

Liu’s work in GANs hit the spotlight earlier this year with GauGAN, an AI tool that turns anyone’s doodles into photorealistic works of art.

The GauGAN tool has already been used to create more than a million images. Try it for yourself on the AI Playground.

At the ICCV event, Liu will present a total of four papers in three talks and one poster session. He’ll also chair a paper session and present at a tutorial on how to program the Tensor Cores in NVIDIA’s latest GPUs.

The post AI’s Latest Adventure Turns Pets into GANimals appeared first on The Official NVIDIA Blog.

Clean Sweep: Tokyo Robotics Company Builds Tidying Robots

Written on October 24, 2019. Posted in NVIDIA.

Though creating an autonomous robot that can tidy a room seems like enough of an achievement, Tokyo-based Preferred Networks goes one step further. By integrating natural language processing into its technology, the company has built robots that respond to commands and adjust their actions.

Jun Hatori, a software engineer at Preferred Networks, spoke with AI Podcast host Noah Kravitz about the company’s latest developments.

To create robots that can understand how to clean up a room and respond to demands, Hatori described two main obstacles.

The first is hardware. “I started to realize that robots can’t do as much as we can instruct,” he said. While NLP technology allows robots to understand the commands being given, their hardware isn’t always advanced enough to carry out the tasks.

The second challenge is crafting a robot that can understand the nuances of human language. “If you’re going to give it a command — like, ‘Pick up that white stuff’ — then the robot basically has to know what kind of items are there, how they’re placed, and what the word ‘white’ means,” Hatori said.

Preferred Networks has overcome these challenges to craft a robot with computer vision and object detection technology, as well as human-robot interaction capabilities such as gesture recognition and spoken language interpretation.

Their robot first assesses the room and creates a task list based on the objects that are out of place. Using a “paragrip” — a pinching hand — the robot grasps objects and puts them away.

Because the robot integrates NLP capabilities, users can instruct it to put objects elsewhere.

Preferred Networks has also applied this human-robot interaction technology in the biohealth, industrial and automobile domains.

But their focus is still on personal robots. “Everyone knows it has huge potential if someone can build something actually usable,” Hatori said. “In the coming years, I think there’s going to be very big competition among many companies and research groups.”

You can see Preferred Networks’ cleaning robot in action along with their other projects at their website.

How to Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn. Your favorite not listed here? Email us at aipodcast [at] nvidia [dot] com or fill out this short listener survey.

The post Clean Sweep: Tokyo Robotics Company Builds Tidying Robots appeared first on The Official NVIDIA Blog.

A New Workflow for Collaborative Machine Learning Research in Biodiversity

Written on October 24, 2019. Posted in Google.

Posted by Serge Belongie, Visiting Faculty and Hartwig Adam, Engineering Director, Google Research

The promise of machine learning (ML) for species identification is coming to fruition, revealing its transformative potential in biodiversity research. International workshops such as FGVC and LifeCLEF feature competitions to develop top performing classification algorithms for everything from wildlife camera trap images to pressed flower specimens on herbarium sheets. The encouraging results that have emerged from these competitions inspired us to expand the availability of biodiversity datasets and ML models from workshop-scale to global-scale.

Bringing powerful ML algorithms to the communities that need them requires more than the traditional “big data + big compute” equation. Institutions ranging from natural history museums to citizen science groups take great care to collect and annotate datasets, and the data they share have enabled numerous scientific research publications. But central to the tradition of scholarly research are the conventions of citation and attribution, and it follows that as ML extends its reach into the life sciences, it should bring with it appropriate counterparts to those conventions. More broadly, there is a growing awareness of the importance of ethics, fairness, and transparency within the ML community. As institutions develop and deploy applications of ML at scale, it is critical that they be designed with these considerations in mind.

This week at Biodiversity Next, in collaboration with the Global Biodiversity Information Facility (GBIF), iNaturalist, and Visipedia, we are announcing a new workflow for biodiversity research institutions who would like to make use of ML. With its billion+ species occurrence count contributed by thousands of institutions around the globe, GBIF is playing a vital role in enabling this workflow, whether in terms of data aggregation, collaboration across teams, or standardizing citation practices. In the short term, the most important role relates to an emerging cultural shift in accepted practices for the use of mediated data for training of ML models. In the process of data mediation, GBIF helps ensure that training datasets for ML follow standardized licensing terms, use compatible taxonomies and data formats, and provide fair and sufficient data coverage for the ML task at hand by potentially sampling from multiple source datasets.

This new workflow comprises the following two components:

To assist in developing and refining machine vision models, GBIF will package datasets, taking care to ensure that licensing and citation practices are respected. The training datasets will be issued a Digital Object Identifier (DOI), and will be linked through the DOI citation graph.
To assist application developers, Google and Visipedia will train and publish publicly accessible models with documentation on TensorFlow Hub. These models can then, in turn, be deployed in biodiversity research and citizen science efforts.

Case Study: Recognizing Fungi Species from Photos with the Interactive Mushroom Recognizer
As an illustration of the above workflow, we present an example of fungi recognition. The dataset in this case is curated by the Danish Mycological Society, and formatted, packaged, and shared by GBIF. The dataset provenance, model architecture, license information, and more can be found on the TF Hub model page, along with a live, interactive demonstration of the model that can run on user-supplied images.

Illustration of live, interactive Mushroom Recognizer, powered by a publicly available model trained on a fungi dataset provided by the Danish Mycological Society.

Invitation to Participate
For more information about this initiative, please visit the project page at GBIF. We look forward to engaging with institutions around the globe to enable new and innovative uses of ML for biodiversity.

Acknowledgements
We’d like to thank our collaborators at GBIF, iNaturalist, and Visipedia for working together to develop this workflow. At Google we would like to thank Christine Kaeser-Chen, Chenyang Zhang, Yulong Liu, Kiat Chuan Tan, Christy Cui, Arvi Gjoka, Denis Brulé, Cédric Deltheil, Clément Beauseigneur, Grace Chu, Andrew Howard, Sara Beery, and Katherine Chou.

AI Gold Seen in Healthcare’s Mountain of Waste

Written on October 24, 2019. Posted in NVIDIA.

A new report estimates the cost of waste in the U.S. healthcare system alone ranges as high as $935 billion a year, about 25 percent of total healthcare spending.

A growing army of startups and established practitioners sees the inefficiencies as a trillion-dollar opportunity to apply AI.

The U.S. spends about 18 percent of its gross domestic product on healthcare, more than any other country. A report published online by the Journal of the American Medical Association surveyed 54 studies to estimate annual waste figures in six broad categories, including failures from choosing ineffective treatments (up to $166 billion), failures from coordinating multiple treatments ($78 billion), fraud and abuse ($84 billion) and administrative complexity ($266 billion).

“Implementation of effective measures to eliminate waste represents an opportunity to reduce the continued increases in U.S. health care expenditures,” the report concluded.

MICCAI Heard the Call

Researchers echoed that theme at a major medical imaging conference in Shenzhen, China, recently.

Shiyuan Liu

Catherine Mohr, vice president of strategy at Intuitive Surgical, reviewed the history of medtech with an eye on “how to think about distinguishing price from value when developing the next generation of medical devices,” in a keynote at this year’s International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI).

Attendees also got an update on the state of the art in using AI in medical imaging in a keynote from Shiyuan Liu, president of the Chinese Medical Imaging AI Innovation Alliance. Liu called for practitioners, vendors and academics to work together to drive AI forward.

700+ AI Healthcare Startups

Opportunities span the waterfront. “Every single type of health professional” will be impacted by AI, said Eric Topol, founder and director of the Scripps Research Translational Institute, in a keynote at NVIDIA’s GTC event in Silicon Valley earlier this year. AI will help practitioners provide “better, faster, cheaper” care, said the author of the recently released book, “Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again.”

That message has not been lost on entrepreneurs. A recent healthcare event sponsored by a major Wall Street bank was “crawling with tech VCs, and five years ago that was not the case,” said Jeff Herbst, vice president of business development at NVIDIA.

With more than 700 startups, healthcare represents the largest category in NVIDIA’s Inception accelerator program that provides AI training and tools to fuel their growth. Herbst calls out Biotrillion as one to watch. The startup generates digital biomarkers to detect disease using its own analytics on sensor data from a user’s smartphone and smartwatch.

“The biggest opportunity in healthcare is in using AI to keep people well — this is the most exciting area to me,” he said.

There’s no shortage of other examples. San Francisco-based Fathom is developing deep learning tools to automate the painstaking medical coding process while increasing accuracy. Its tools use NVIDIA P100 and V100 Tensor Core GPUs in Google Cloud for both training and inference, reducing human time spent on medical coding by as much as 90 percent.

Houston-based InformAI helps reduce fatigue and stress for radiologists by building deep learning tools that can help them analyze medical scans faster. Its image classifiers and patient outcome predictors run on both NVIDIA V100 GPUs in the Microsoft Azure cloud platform and an onsite NVIDIA DGX Station. In just 30 seconds they can analyze a patient’s 3D CT scan for 20 sinus conditions.

Subtle Medical of Menlo Park, CA, announced this week that it received FDA clearance for SubtleMR, its deep learning solution for improving the image quality of MRIs. The Inception member’s first product, SubtlePET, which can produce PET images in as little as a quarter of the scanning time of current systems, received FDA clearance last year. Both products are trained on DGX-1 and DGX Station and enabled by TensorRT.

Major Players Embrace AI

Medical imaging is one of the biggest areas in healthcare AI, with startups scattered around the globe. They include South Korean startup Lunit and InferVISION, one of China’s top medical imaging startups, focusing on lung nodule analysis and prediction from CT scans.

Major providers and vendors are also embracing AI. Two developers from UnitedHealth Group, one of the largest healthcare companies in the U.S., shared in a talk at GTC earlier this year how the provider is adopting AI for tasks that span prior authorization of medical procedures to directing phone calls.

In June, Siemens Healthineers and NVIDIA shared their latest work in AI for medical imaging at the Society for Imaging Informatics in Medicine annual conference. Siemens Healthineers is using an NVIDIA GPU-based supercomputing infrastructure to develop AI software for generating organ segmentations that enable precision radiation therapy.

“The area that will have the biggest impact in AI is healthcare,” said Ian Buck, vice president of NVIDIA’s Accelerated Computing Group in a recent interview.

“The healthcare industry is chock full of data … there are many obstacles ahead, but I am truly hopeful AI can help cure diseases and save lives — that makes me excited about the work we do,” Buck said.

The post AI Gold Seen in Healthcare’s Mountain of Waste appeared first on The Official NVIDIA Blog.

Your guide to artificial Intelligence and machine learning at re:Invent 2019

Written on October 23, 2019. Posted in Amazon.

With less than 40 days to re:Invent 2019, the excitement is building up and we are looking forward to seeing you all soon! Continuing our journey on artificial intelligence and machine learning, we are bringing a lot of technical content this year, with over 200 breakout sessions, deep-dive chalk talks, hands-on exercises with workshops featuring Amazon SageMaker, AWS DeepRacer, and deep learning frameworks such as TensorFlow, PyTorch, and more. You’ll hear from many customers including Vanguard, BBC, Autodesk, British Airways, Fannie Mae, Thermo Fisher, Intuit, and many more. We are also hosting the Machine Learning Summit again this year, where you will hear from researchers and entrepreneurs about the latest breakthroughs today and the future possibilities tomorrow.

To get you started on planning, here are a few highlights for the AI and ML sessions from the re:Invent 2019 session catalog. The reserved seating is now open, so get your seats in advance for your favorite sessions.

Getting started

If you are new to AI and ML, we have some sessions for you to get started and learn these concepts. These sessions cover the basics including overviews and demos for Amazon SageMaker, the different AI services for many applications, and the popular AWS DeepLens and AWS DeepRacer to help you learn, while having fun.

Leadership session: Machine Learning (Session AIM218-L)

As we embark on the golden age of machine learning, we are seeing the constraints and blockers disappear, and the value extending across different industries. In this leadership session, learn about the latest machine learning offerings from AWS as we explore the democratization of machine learning. We will discuss the breadth and depth of our machine learning services and you will hear from customers who are partnering with AWS on this journey.

Amazon SageMaker deep dive: A modular solution for machine learning (Session AIM307)

Amazon SageMaker is a fully managed service enabling all developers and data scientists with every aspect of the machine learning workflow. In this session, we will discuss the technical details of Amazon SageMaker to help you with your machine learning journey to get your ML models from experimentation to production at scale. We will also discuss practical deployments through real-world customer examples.

Starting the enterprise machine learning journey (Session AIM205)

Amazon has been investing in machine learning for more than 20 years, innovating in areas such as fulfillment and logistics, personalization and recommendations, forecasting, fraud prevention, and supply chain optimization. During this session, we take this expertise and show you how to identify business problems that can be solved with machine learning. We discuss considerations including selecting the right use case for a machine learning pilot, nurturing skills, and measuring the success of such pilots.

Finding a needle in a haystack: Use AI to transform content management (Session AIM206)

Finding digital content, from documents to media, can be frustrating and time-consuming. Across your employees or customers, this challenge can waste hours, derail projects, and create poor experiences. In this breakout session, learn how to use language and vision AI services to extract data, insights, and trends from all of your digital content, with a focus on how to more effectively manage your documents and find what you need.

Get started with AWS DeepRacer (Workshop AIM207)

Get behind the keyboard for an immersive experience with AWS DeepRacer. Developers with no prior machine learning experience learn new skills and apply their knowledge in a fun and exciting way. With the help of the AWS pit crew, build and train a reinforcement learning model that you can race on the tracks and win special AWS prizes, in this one of many workshops for AWS DeepRacer. See the “Advanced topics in machine learning” section for an advanced version of this workshop.

Start using computer vision with AWS DeepLens (Workshop AIM229)

If you’re new to deep learning, this workshop is for you. Learn how to build and deploy computer-vision models using the AWS DeepLens deep-learning-enabled video camera. Also learn how to build a machine learning application and a model from scratch using Amazon SageMaker. Finally, learn to extend that model to Amazon SageMaker to build an end-to-end AI application. See the “Advanced topics in machine learning” section for an advanced version of this workshop.

Improve machine learning model quality in response to changes in data (Session AIM213)

Machine learning models are typically trained and evaluated using historical data. But the real-world data may not look like the training data, especially as models age over time and the distribution of data changes. This gradual variance of the model from the real world is known as model drift, and it can have a big impact on prediction quality. This session explores techniques you can use to monitor prediction quality in production, as well as effective corrective actions such as auditing and iterative retraining.
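As a rough illustration of monitoring for the drift described above, the sketch below computes a Population Stability Index (PSI) between a training-time sample and a production sample of a single numeric feature. PSI is one common heuristic and is an assumption here, not necessarily a technique covered in the session. The function name, bin count, and the sample data are all hypothetical.

```python
import math

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; 0.0 means identical binned distributions."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the training range; fall back to 1.0 if the range is empty.
    width = (hi - lo) / n_bins or 1.0

    def binned_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the first or last bin.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = binned_fractions(expected)
    a = binned_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical data: production values have shifted toward the upper half of the range.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]

drift_score = population_stability_index(training_sample, production_sample)
```

A score near zero suggests the production data still resembles the training data. Larger values suggest a shift worth investigating, for example as a trigger for auditing or retraining. Any alerting threshold is a judgment call that depends on the feature and the application.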

Practical applications of machine learning

The biggest value for machine learning is its applicability across different industries. In these sessions, chalk talks, and workshops, we will dive deep into the practical aspects of machine learning for specific industries including finance, healthcare, retail, media and entertainment, manufacturing, and more.

Transforming Healthcare with AI (Session AIM210)

Improving patient care, making treatment decisions, managing clinical trials, and more are all moving into a new age due to advancements in AI. In this session, we cover AI solutions specific to the healthcare industry, from extracting relevant medical information from patient records and clinical trial reports to automating the clinical documentation process with automatic speech recognition. Hear directly from our customers and come away with answers on how to get started immediately.

ML in retail: Solutions that add intelligence to your business (Session AIM212)

Machine learning is ranked the number-one “game changer” for the retail market segment by chief experience officers (CXOs), yet it’s only number eight on top spending priorities. So which scenarios are real? In this session, we dive into how AWS puts machine learning in the hands of every developer, without the need for deep machine learning experience. Learn about personalized product recommendations, inventory forecasting, new in-store experiences, and more. Learn from our experience at Amazon.com and hear from our customers today.

AI document processing for business automation (Session AIM211)

Millions of times per day, customers from the finance, healthcare, public, and other sectors rely on information that is locked in documents. Amazon Textract uses artificial intelligence to “read” such documents as a person would, to extract not only text but also tables, forms, and other structured data without configuration, training, or custom code. In this session, we demonstrate how you can use Amazon Textract to automate business processes with AI. You also hear directly from our customers about how they accelerated their own business processes with Amazon Textract.

Predict future business outcomes using Amazon Forecast (Session AIM312)

Based on the same technology used at Amazon.com, Amazon Forecast uses machine learning and time-series data to build accurate business forecasts. In this session, learn how machine learning can improve accuracy in demand forecasting, financial planning, and resource allocation while reducing your forecasting time from months to hours.

Build accurate training datasets with Amazon SageMaker Ground Truth (Session AIM308)

Successful machine learning models are built on high-quality training datasets. Typically, the task of data labeling is distributed across a large number of humans, adding significant overhead and cost. This session explains how Amazon SageMaker Ground Truth reduces cost and complexity using techniques designed to improve labeling accuracy and reduce human effort. We will walk through best practices for building highly accurate training datasets and discuss how you can use Amazon SageMaker Ground Truth to implement them.

Build predictive maintenance systems with Amazon SageMaker (Chalk Talk AIM328)

Across a wide spectrum of industries, customers are starting to use predictive maintenance models to proactively fix problems before they impact production. The result is an optimized supply chain and improved working conditions. In this session, learn how to use data from equipment to build, train, and deploy predictive models. We dive deep into the architecture for using the turbofan degradation simulation dataset to train the model to recognize potential equipment failures, and we share the details.

Build a fraud detection system with Amazon SageMaker (Workshop AIM359)

In this workshop, we will explore the new AWS Fraud Detection Solution. We show you how to build, train, and deploy a fraud detection machine learning model. The fraud detection model recognizes fraud patterns and is self-learning, which enables it to adapt to new, unknown fraud patterns. We will show you how to execute automated transaction processing, and how the Fraud Detection solution flags that activity for review.

Delight your customers with ML-based personalized recommendations (Session AIM323)

Recommendation engines make targeted marketing campaigns, re-ranking of items, personalized notifications, and personalized search possible. In this session, we deep-dive into using Amazon Personalize to create and manage personalized recommendations efficiently, letting you focus on the real value of the data for your business. We discover how these deep learning techniques have a direct impact on the bottom line of your business by increasing engagement, click-through, satisfaction, and revenue. Learn from customer examples and dive into some live demonstrations.

Accelerate time-series forecasting with Amazon Forecast (Workshop AIM335)

Based on the same technology used at Amazon.com, Amazon Forecast uses machine learning to combine time-series data with additional variables to build up to 50% more accurate forecasts. In this workshop, prepare a dataset, build models based on that dataset, evaluate a model’s performance based on real observations, and learn how to evaluate the value of a forecast compared with another. Gain the skills to make decisions that will impact the bottom line of your business.

Build a content-recommendation engine with Amazon Personalize (Workshop AIM304)

Machine learning is being used increasingly to improve customer engagement by powering personalized product and content recommendations. Amazon Personalize lets you easily build sophisticated personalization capabilities into your applications, using machine learning technology perfected from years of use on Amazon.com. In this workshop, you build your own recommendation engine by providing training data, building a model based on the algorithm of your choice, testing the model by deploying your Amazon Personalize campaign, and integrating it into your own application.

Advanced topics in machine learning

We have a number of sessions that will dive deep into the technical details of machine learning across our service portfolio as well as deep learning frameworks including TensorFlow, PyTorch, and Apache MXNet. These code-level sessions and hands-on workshops will enable the advanced developer or data scientist in you to customize, integrate, and solve many challenges with deep technical solutions.

Deep learning with TensorFlow (Session AIM410, Workshop AIM401)

TensorFlow is one of the most popular open-source deep learning frameworks used in machine learning development. The advanced breakout session will dive deep into training machine learning models with TensorFlow using Amazon SageMaker, including distributed training, cost-effective inference, and workflow management. The code-level workshop will include hands-on exercises where we will train and deploy TensorFlow models, apply automatic model tuning using Amazon SageMaker, and make predictions in production.

Deep learning with PyTorch (Session AIM412, Workshop AIM402)

PyTorch is rapidly gaining popularity in the industry as a deep learning framework used to transition seamlessly from research prototyping to production deployment. In the breakout session, you will learn how to develop deep learning models with PyTorch using Amazon SageMaker for multiple use cases, including using a BERT model and instance segmentation for fine-grained computer vision. In the workshop, you will build a natural language processing model to analyze text.

Deep learning with Apache MXNet (Session AIM411, Workshop AIM403)

Apache MXNet is a widely used deep learning framework for diverse applications such as computer vision, speech recognition, and natural language processing (NLP). The breakout session will discuss building computer vision and NLP models using MXNet to automatically extract information from documents. In the workshop, we will build a computer vision model using MXNet, train the model for high accuracy, and finally deploy it to production using Amazon SageMaker.

Deep dive on Project Jupyter (Session AIM413)

Amazon SageMaker offers fully managed Jupyter notebooks that you can use in the cloud so you can explore and visualize data and develop your machine learning model. In this session, we explain why we picked Jupyter notebooks, and how and why AWS is contributing to Project Jupyter. We dive deep into our overall strategy for Jupyter and explain different use cases for Jupyter, including data science, analytics, and simulation.

Under the hood of AWS DeepRacer: Advanced RL driving course (Workshop AIM428)

This technical deep dive is suitable for advanced machine learning developers looking to learn more complex reinforcement learning concepts using AWS DeepRacer and Amazon SageMaker RL. AWS data scientists help you build models that require innovations in neural network architecture, expand the algorithms, and help you customize your AWS DeepRacer model for performance. We also dive deep into the technology under the hood that powers the AWS DeepRacer car.

Optimize deep learning models for edge deployments with AWS DeepLens (Workshop AIM405)

In this workshop, learn how to optimize your computer vision pipelines for edge deployments with AWS DeepLens and Amazon SageMaker Neo. Also learn how to build a sample object detection model with Amazon SageMaker and deploy it to AWS DeepLens. Finally, learn how to optimize your deep learning models and code to achieve faster performance for use cases where speed matters.

Take an ML model from idea to production using Amazon SageMaker (Workshop AIM427)

Come build the most accurate text-classification model possible with Amazon SageMaker. This service lets you build, train, and deploy ML models using built-in or custom algorithms. In this workshop, learn how to leverage Keras/TensorFlow deep-learning frameworks to build a text-classification solution using custom algorithms on Amazon SageMaker. We walk you through packaging custom training code in a Docker container, testing it locally, and then using Amazon SageMaker to train a deep-learning model. You then try to iteratively improve the model to achieve high accuracy. Finally, you deploy the model in production so applications can leverage the classification service.

Implement ML workflows with Kubernetes and Amazon SageMaker (Session AIM326)

Until recently, data scientists have spent much time performing operational tasks, such as ensuring that frameworks, runtimes, and drivers for CPUs and GPUs work well together. In addition, data scientists needed to design and build end-to-end machine learning (ML) pipelines to orchestrate complex ML workflows for deploying ML models in production. With Amazon SageMaker, data scientists can now focus on creating the best possible models while enabling organizations to easily build and automate end-to-end ML pipelines. In this session, we dive deep into Amazon SageMaker and container technologies, and we discuss how easy it is to integrate such tasks as model training and deployment into Kubernetes and Kubeflow-based ML pipelines.

Security for ML environments with Amazon SageMaker (Session AIM327)

Amazon SageMaker is a modular, fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. In this session, we dive deep into the security configurations of Amazon SageMaker components, including notebooks, training, and hosting endpoints. Vanguard joins us to discuss the company’s use of Amazon SageMaker and its implementation of key controls in a highly regulated environment, including fine-grained access control, end-to-end encryption in transit, and comprehensive audit trails for resource and data access. If you want to build secure ML environments, this session is for you.

Machine Learning Summit

Whether you are a data scientist, machine learning practitioner, or business professional, you’ll enjoy the Machine Learning Summit at this year’s re:Invent, which will showcase advances in machine learning as well as the emerging trends. From disaster management to pediatrics, from fighting fake news to indoor farming, you will hear experts share their knowledge and perspectives.

Some of the sessions include:

Deep Learning for Disaster Management and Response
Cornelia Caragea, Associate Professor, Science and Engineering Offices, Computer Science, University of Illinois at Chicago

Fighting Fake News and Deep Fakes with Machine Learning
Delip Rao, Vice President of Research at the AI Foundation

Deep Learning in Deep Nets: Helping Fish Farmers Feed the World
Bryton Shang, Founder and CEO, Aquabyte

Big Data for Tiny Patients: Applying ML to Pediatrics
Dr. Judith Dexheimer, Associate Professor, UC Department of Pediatrics, Cincinnati Children’s Hospital Medical Center

Machine Learning and Society: Bias, Fairness and Explainability
Pietro Perona, Amazon Fellow, AWS

From Seed to Store: Using AI to Optimize the Indoor Farms of the Future
Henry Sztul, SVP, Science and Technology, Bowery Farming

The Machine Learning Summit will inform you about what’s on the horizon for machine learning. The event is scheduled for Tuesday, December 3, 2019, from 1:30 PM to 6 PM at the Venetian Theater. Visit the summit home page and register today.

About the Author

Shyam Srinivasan is on the AWS Machine Learning marketing team. He cares about making the world a better place through technology and loves being part of this journey. In his spare time, Shyam loves to run, travel, and have fun with his family and friends.

Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules

Written on October 23, 2019. Posted in Google.

Posted by Alexander B Wiltschko, Senior Research Scientist, Google Research

Smell is a sense shared by an incredible range of living organisms, and plays a critical role in how they analyze and react to the world. For humans, our sense of smell is tied to our ability to enjoy food and can also trigger vivid memories. Smell allows us to appreciate all of the fragrances that abound in our everyday lives, be they the proverbial roses, a batch of freshly baked cookies, or a favorite perfume. Yet despite its importance, smell has not received the same level of attention from machine learning researchers as have vision and hearing.

Odor perception in humans is the result of the activation of 400 different types of olfactory receptors (ORs), expressed in 1 million olfactory sensory neurons (OSNs), in a small patch of tissue called the olfactory epithelium. These OSNs send signals to the olfactory bulb, and then to further structures in the brain. Based on analogous advances in deep learning for sight and sound, it should be possible to directly predict the end sensory result of an input molecule, even without knowing the intricate details of all the systems involved. Solving the odor prediction problem would aid in discovering new synthetic odorants, thereby reducing the ecological impact of harvesting natural products. Inspection of the resulting olfactory models may even lead to new insights into the biology of smell.

Small odorant molecules are the most basic building blocks of flavors and fragrances, and therefore represent the simplest version of the odor prediction problem. Yet each molecule can have multiple odor descriptors. Vanillin, for example, has descriptors such as sweet, vanilla, creamy, and chocolate, with some notes being more apparent than others. So odor prediction is also a multi-label classification problem.
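To make the multi-label framing concrete, here is a minimal sketch of how a molecule's descriptors could be encoded as a multi-hot target vector. The vocabulary is a hypothetical five-word list chosen for illustration ("woody" is added only to show an absent label), and this encoding is a common convention rather than a detail confirmed by the post.

```python
def multi_hot(descriptors, vocabulary):
    """Return one 0/1 entry per vocabulary term: 1 if the molecule carries that descriptor."""
    present = set(descriptors)
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical descriptor vocabulary; a real one would be far larger.
VOCABULARY = ["sweet", "vanilla", "creamy", "chocolate", "woody"]

# Vanillin carries several descriptors at once, so several entries are set.
vanillin_target = multi_hot(["sweet", "vanilla", "creamy", "chocolate"], VOCABULARY)
# vanillin_target == [1, 1, 1, 1, 0]
```

Unlike single-label classification, where exactly one entry would be set, a model trained on such targets typically predicts an independent score for each descriptor.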

In “Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules”, we leverage graph neural networks (GNNs), a kind of deep neural network designed to operate on graphs as input, to directly predict the odor descriptors for individual molecules, without using any handcrafted rules. We demonstrate that this approach yields significantly improved performance in odor prediction compared to the current state of the art and is a promising direction for future research.

Graph Neural Networks for Odor Prediction
Since molecules are analogous to graphs, with atoms forming the vertices and bonds forming the edges, GNNs are the natural model of choice for their understanding. But how does one translate the structure of a molecule into a graph representation? Initially, every node in the graph is represented as a vector, using any preferred featurization — atom identity, atom charge, etc. Then, in a series of message passing steps, every node broadcasts its current vector value to each of its neighbors. An update function then takes the collection of vectors sent to it, and generates an updated vector value. This process can be repeated many times, until finally all of the nodes in the graph are summarized into a single vector via summing or averaging. That single vector, representing the entire molecule, can then be passed into a fully connected network as a learned molecular featurization. This network outputs a prediction for odor descriptors, as provided by perfume experts.

Each node is represented as a vector, and each entry in the vector initially encodes some atomic-level information.

For each node we look at adjacent nodes and collect their information, which is then transformed with a neural network into new information for the centered node. This procedure is performed iteratively. Other variants of GNNs utilize edge and graph-level information.

Illustration of a GNN for odor prediction. We translate the structure of molecules into graphs that are fed into GNN layers to learn a better representation of the nodes. These nodes are reduced into a single vector and passed into a neural network that is used to predict multiple odor descriptors.
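The message passing and readout steps described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model from the paper: the update function here is a fixed average of a node's own vector and the sum of its neighbors' vectors, standing in for the learned neural network, and the two-entry atom featurization and the three-atom chain are hypothetical.

```python
def message_passing_step(node_vectors, neighbors):
    """One round: each node sums its neighbors' vectors, then updates its own vector."""
    updated = []
    for i, own in enumerate(node_vectors):
        message = [0.0] * len(own)
        for j in neighbors[i]:
            for k, value in enumerate(node_vectors[j]):
                message[k] += value
        # Stand-in update function (assumption): average of own vector and summed message.
        # A real GNN would apply a learned neural network here.
        updated.append([0.5 * (s + m) for s, m in zip(own, message)])
    return updated

def readout(node_vectors):
    """Summarize all nodes into a single molecule-level vector by summing."""
    dim = len(node_vectors[0])
    return [sum(vec[k] for vec in node_vectors) for k in range(dim)]

# Toy chain of three heavy atoms, C-C-O, with features [is_carbon, is_oxygen].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = {0: [1], 1: [0, 2], 2: [1]}

after_one_step = message_passing_step(atoms, bonds)
molecule_vector = readout(after_one_step)
# molecule_vector == [2.5, 1.0]
```

After one step, the middle carbon's vector already reflects its oxygen neighbor. Repeating the step lets information travel further along the graph before the readout. In the full approach, the resulting molecule vector would be passed to a fully connected network that predicts the odor descriptors.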

This representation doesn’t know anything about spatial positions of atoms, and so it can’t distinguish stereoisomers, molecules made of the same atoms but in slightly different configurations that can smell different, such as (R)- and (S)-carvone. Nevertheless, we have found that even without distinguishing stereoisomers, in practice it is still possible to predict odor quite well.

For odor prediction, GNNs consistently demonstrate improved performance compared to previous state-of-the-art methods, such as random forests, which do not directly encode graph structure. The magnitude of the improvement depends on which odor one tries to predict.

Example of the performance of a GNN on odor descriptors against a strong baseline, as measured by the AUROC score. Example odor descriptors are picked randomly. Closer to 1.0 means better. In the majority of cases GNNs outperform the field-standard baseline substantially, with similar performance seen against other metrics (e.g., AUPRC, recall, precision).
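For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen molecule that has a descriptor is scored higher than a randomly chosen molecule that lacks it. The sketch below computes it that way for a single descriptor; the labels and scores are made-up values for illustration, not results from the paper.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for one odor descriptor over five molecules.
labels = [1, 0, 1, 0, 0]            # 1 = experts applied the descriptor
scores = [0.9, 0.4, 0.35, 0.2, 0.1]  # model's predicted probability
score = auroc(labels, scores)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why values closer to 1.0 are better.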

Learning from the Model, and Extending It to Other Tasks
In addition to predicting odor descriptors, GNNs can be applied to other olfaction tasks. For example, take the case of classifying new or refined odor descriptors using only limited data. For each molecule, we extract a learned representation from an intermediate layer of the model that is optimized for our odor descriptors, which we call an “odor embedding”. One can think of this as an olfaction version of a color space, like RGB or CMYK. To see whether this odor embedding is useful beyond the task it was trained on, we designed experiments that test it on related tasks for which it was not originally designed. We then compared its performance to that of a common chemoinformatic representation that encodes structural information of a molecule but is agnostic to odor. We found that the odor embedding generalized to several challenging new tasks, even matching state-of-the-art on some.

2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.

Future Work
Within the realm of machine learning, smell remains the most elusive of the senses, and we’re excited to continue doing a small part to shed light on it through further fundamental research. The possibilities for future research are numerous, and touch on everything from designing new olfactory molecules that are cheaper and more sustainably produced, to digitizing scent, or even one day giving those without a sense of smell access to roses (and, unfortunately, also rotten eggs). We hope to also bring this problem to the attention of more of the machine learning world through the eventual creation and sharing of high-quality, open datasets.

Acknowledgements
This early research is the result of the work and advisement of a team of talented researchers and engineers in Google Brain: Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Carey Radebaugh, Max Bileschi, Yoni Halpern, and D. Sculley. We are delighted to have collaborated on this work with Richard Gerkin at ASU and Alán Aspuru-Guzik at the University of Toronto. We are of course building on an enormous amount of prior work, and have benefitted particularly from work by Justin Gilmer, George Dahl and others on fundamental methodology in GNNs, among many other works in neuroscience, statistics and chemistry. We are also grateful for helpful comments from Steven Kearnes, David Belanger, Joel Mainland, and Emily Mayhew.

AI’s New Onramp: Meet the Data Science PC

Written on October 23, 2019. Posted in NVIDIA.

The trip to AI and big-data analytics is now just a click away. Starting today, three NVIDIA partners are selling a new class of computers online that we call data science PCs.

The systems bundle the hardware and software data scientists need to hit an “on” button and start managing datasets and models to make AI predictions. Data science PCs tap NVIDIA TITAN RTX GPUs and RAPIDS software to deliver 3-6x speed-ups compared to CPU-only desktops.

Three experts in building high-end PCs — Digital Storm, Maingear and Puget Systems — are offering the products now. They’re targeting an expanding class of independent data scientists to help them achieve better results faster.

A data science PC handled extract-transform-load (ETL) and XGBoost training on a dataset derived from New York City taxis, delivering end-to-end predictions in one-sixth the time of a CPU-only desktop.

Some of the world’s largest and most innovative organizations are already using GPU-accelerated servers and workstations to tackle their demanding data-science jobs.

For example, Walmart’s supermarket of the future can compute in real time more than 1.6 terabytes of data generated per second using NVIDIA’s EGX platform. The Summit system at Oak Ridge National Laboratory can tap its 27,648 NVIDIA V100 Tensor Core GPUs to drive 3.3 exaflops of mixed-precision horsepower on AI tasks.

But data science isn’t just for large enterprises. Startups, researchers, students and enthusiasts are jumping into this burgeoning field. They’re contributing to the corporate momentum making the role of data scientist one of the fastest growing jobs in the U.S.

The data science PC aims to fuel this growing class of independent data science practitioners. The combination of powerful, pre-configured systems and a tested software stack can jumpstart their work.

The Speeds and Feeds

Under the hood, a data science PC includes one or two TITAN RTX GPUs, each with up to 24GB of memory. NVLink high-speed interconnect technology connects the two GPUs to tackle datasets that demand more GPU memory.

The systems can accommodate 48-128GB of main memory and storage options include drives that range up to 10TB.

Each data science PC will ship with Linux and RAPIDS, NVIDIA’s data science software stack, powered by its popular CUDA-X AI programming libraries.

NVIDIA RAPIDS eases the job of porting existing code for GPU acceleration. Its APIs are modeled after popular libraries used in data science. In many cases, it’s only necessary to change a few lines of code in order to tap the potential of GPU acceleration.

Here are some of the key elements of RAPIDS:

cuDF is a Python GPU data-frame library for loading, joining, aggregating, filtering and otherwise manipulating data. The API is designed to be similar to Pandas, so existing code easily maps to the GPU.

cuML accelerates popular machine learning algorithms, including XGBoost, PCA, K-means, k-Nearest Neighbors and more. It is closely aligned with scikit-learn.

cuGraph is a library of graph algorithms, similar to NetworkX, that works with data stored in a GPU data frame.
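As a rough illustration of the point that only a few lines need to change, the sketch below runs the same data-frame code on cuDF when it is installed and falls back to pandas otherwise. The column names, the sample rows, and the `mean_fare_by_zone` helper are hypothetical and are not taken from the taxi benchmark mentioned earlier.

```python
# Sketch: the same data-frame code on the GPU (cuDF) or the CPU (pandas).
try:
    import cudf as xdf  # GPU path; needs RAPIDS and a supported NVIDIA GPU
except ImportError:
    import pandas as xdf  # CPU fallback offering the same calls used below

def mean_fare_by_zone(trips):
    """Drop zero-distance trips, then average the fare per pickup zone."""
    valid = trips[trips["distance"] > 0]
    means = valid.groupby("zone")["fare"].mean().sort_index()
    # cuDF results live in GPU memory; copy them to pandas before building a dict.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return means.to_dict()

# Made-up rows for illustration only.
trips = xdf.DataFrame({
    "zone": ["A", "A", "B", "B"],
    "distance": [1.2, 0.0, 3.4, 2.0],
    "fare": [8.0, 2.5, 15.0, 11.0],
})
fares = mean_fare_by_zone(trips)
```

In this sketch the only library-specific lines are the import and the final copy back to host memory; the filtering and aggregation calls are written once.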

An ecosystem of startups in Inception, NVIDIA’s virtual accelerator program for startups focused on AI and data science, provides applications and services that run on top of RAPIDS. They include companies, such as Graphistry and OmniSci, that offer big-data visualization tools.

Data scientists can also use NVIDIA’s data science developer forum to ask questions and learn more about data science on GPUs.

The data science PC is here, ready to propel you to an AI future. Learn more from our partners Digital Storm, Maingear and Puget Systems.

The post AI’s New Onramp: Meet the Data Science PC appeared first on The Official NVIDIA Blog.
