
AWS supports the Deepfake Detection Challenge with competition data and AWS credits

Today AWS is pleased to announce that it is working with Facebook, Microsoft, and the Partnership on AI on the first Deepfake Detection Challenge. The competition, to which we are contributing up to $1 million in AWS credits for researchers and academics over the next two years, is designed to produce technology that can better detect when artificial intelligence has been used to alter a video in order to mislead viewers. We plan to host the full competition dataset when it is made available later this year, and we are offering the support of Amazon machine learning experts to help teams get started. We want to ensure access to this data for a diverse set of participants with varied perspectives, to help develop the best possible solutions to combat the growing problem of “deepfakes.”

The same technology that has given us delightfully realistic animation effects in movies and video games has also been used by bad actors to blur the distinction between reality and fiction. “Deepfake” videos manipulate audio and video using artificial intelligence to make it appear as though someone did or said something they didn’t. These techniques can be packaged into something as simple as a cell phone app, and they are already being used to deliberately mislead audiences by spreading fake viral videos through social media. The fear is that deepfakes will become so realistic that they will be used to damage reputations and sway popular opinion, and could in time make any piece of information suspect.

The Deepfake Detection Challenge invites participants to build new approaches that can detect deepfake audio, video, and other tampered media. The challenge will kick off in December at the NeurIPS conference with the release of a new dataset generated by Facebook, comprising tens of thousands of example videos, both real and fake. Competitors will use this dataset to design novel algorithms that can classify a video as real or fake, and the algorithms will be evaluated against a secret test dataset (which will not be made available, to ensure a standard, scientific evaluation of entries).

Building deepfake detectors will require novel algorithms that can process this vast library of data (more than 4 petabytes). AWS will work with DFDC partners to explore options for hosting the dataset, including the use of Amazon S3, and we will make $1 million in AWS credits available to develop and test these sophisticated new algorithms. All participants will be able to request a minimum of $1,000 in AWS credits to get started, with additional awards granted in amounts of up to $10,000 as entries demonstrate viability or success in detecting deepfakes. Participants can visit the challenge website to learn more and request AWS credits.

The Deepfake Detection Challenge steering committee is sharing the first 5,000 videos of the dataset with researchers working in this field. The group will collect feedback and host a targeted technical working session at the International Conference on Computer Vision (ICCV) in Seoul, beginning October 27, 2019. Following this due diligence, the release of the full dataset and the launch of the Deepfake Detection Challenge will coincide with the Conference on Neural Information Processing Systems (NeurIPS) this December.

To support participants in this endeavor, AWS will also provide access to Amazon ML Solutions Lab experts and solutions architects, who will offer technical support and guidance to help teams get started in the challenge. The Amazon ML Solutions Lab is a dedicated service offering for AWS customers that provides access to the same talent that built many of Amazon’s machine learning-powered products and services. These Amazon experts help AWS customers use machine learning to build intelligent solutions that address some of the world’s toughest challenges, such as predicting famine, identifying cancer faster, and expediting assistance to areas hit hard by natural disasters. Amazon ML Solutions Lab experts will be paired with challenge participants to provide assistance throughout the competition.

In addition to serving as a founding member of the Partnership on AI, AWS is also joining the non-profit’s Steering Committee on AI and Media Integrity. The goal, as with our sponsorship of the Deepfake Detection Challenge, is to coordinate the activities of media organizations, tech companies, governments, and academia to promote technologies and policies that strengthen trust in media and help audiences differentiate fact from fiction.

To learn more about the Deepfake Detection Challenge and receive updates on how to register and participate, visit the challenge website. Stay tuned for more updates as we get closer to kick-off!


About the Author

Michelle Lee is vice president of the Machine Learning Solutions Lab at AWS.



5G Meets AI: NVIDIA CEO Details ‘Smart Everything Revolution,’ EGX for Edge AI, Partnerships with Leading Companies

The smartphone revolution that’s swept the globe over the past decade is just the start, NVIDIA CEO Jensen Huang declared Monday.

Next up: the “smart everything revolution,” Huang told a crowd of hundreds from telcos, device manufacturers, developers, and press at his keynote ahead of the Mobile World Congress gathering in Los Angeles this week.

“The smartphone revolution is the first of what people will realize someday is the IoT revolution, where everything is intelligent, where everything is smart,” Huang said. He squarely positioned NVIDIA to power AI at the edge of enterprise networks and in the virtual radio access networks – or vRANs – powering next-generation 5G wireless services.

Among the dozens of leading companies joining NVIDIA as customers and partners cited during Huang’s 90-minute address were Walmart — which is already building NVIDIA’s latest technologies into its showcase Intelligent Retail Lab — as well as BMW, Ericsson, Microsoft, NTT, Procter & Gamble, Red Hat, and Samsung Electronics.

Anchoring NVIDIA’s story: the NVIDIA EGX edge supercomputing platform, a high-performance cloud-native edge computing platform optimized to take advantage of three key revolutions – AI, IoT and 5G – providing the world’s leading companies the ability to build next-generation services.

“The smartphone moment for edge computing is here and a new type of computer has to be created to provision these applications,” said Huang, speaking at the LA Convention Center. He noted that if the global economy can be made just a little more efficient with such pervasive technology, the opportunity can be measured in “trillions of dollars per year.”

Ericsson Exec Joins on Stage Marking Collaboration

Ericsson’s Fredrik Jejdling, executive vice president and head of business area networks, joined NVIDIA CEO Jensen Huang on stage to announce the two companies’ collaboration on 5G radio.

A key highlight: a new collaboration with Ericsson on 5G to build high-performance, software-defined radio access networks.

Joining Jensen on stage was Ericsson’s Fredrik Jejdling, executive vice president and head of business area networks. The company is a leader in the radio access network industry, one of the key building blocks for high-speed wireless networks.

“As an industry we’ve, in all honesty, been struggling to find alternatives that are better and higher performance than our current bespoke environment,” Jejdling said. “Our collaboration is figuring out an efficient way of providing that, combining your GPUs with our heritage.”

The collaboration brings Ericsson’s expertise in radio access network technology together with NVIDIA’s leadership in high-performance computing to fully virtualize the 5G Radio, giving telcos unprecedented flexibility.

Together NVIDIA and Ericsson are innovating to fuse 5G, supercomputing and AI for a revolutionary communications platform that will someday support trillions of always-on devices.

Red Hat, NVIDIA to Create Carrier-Grade Telecommunications Infrastructure

Huang also announced a new collaboration with Red Hat to build carrier-grade, cloud-native telecom infrastructure with EGX for AI, 5G RAN, and other workloads. The enterprise software provider already serves 120 telcos around the world, powering every member of the Fortune 500.

Together, NVIDIA and Red Hat will bring carrier-grade Kubernetes — which automates the deployment, scaling, and management of applications — to telcos so they can orchestrate and manage 5G RANs in a truly software-defined mobile edge.

“Red Hat is joining us to integrate everything we’re working on and make it a carrier grade stack,” Huang said. “The rest of the industry has joined us as well, every single data center computer maker, the world’s leading enterprise software makers, have all joined us to take this platform to market.”

NVIDIA Aerial to Accelerate 5G

For carriers, Huang also announced NVIDIA Aerial, a CUDA-X software developer kit running on top of EGX.

Aerial allows telecommunications companies to build completely virtualized 5G radio access networks that are highly programmable, scalable and energy efficient — enabling telcos to offer new AI services such as smart cities, smart factories, AR/VR and cloud gaming.

Technology for the Enterprise Edge

In addition to telcos, enterprises will also increasingly need high performance edge servers to make decisions from large amounts of data in real-time using AI.

EGX combines NVIDIA CUDA-X software, a collection of NVIDIA libraries that provide a flexible and high-performance programming platform to developers, with NVIDIA-certified GPU servers and devices.

The result enables companies to harness rapidly streaming data — from factory floors to manufacturing inspection lines to city streets — delivering AI and other next-generation services.

Microsoft, NVIDIA Technology Collaboration

To offer customers an end-to-end solution from edge to cloud, Microsoft and NVIDIA are working together in a new collaboration to more closely integrate Microsoft Azure with EGX. In addition, NVIDIA T4 GPUs are featured in a new form factor of Microsoft’s Azure Data Box edge appliance.

Other top technology companies collaborating with NVIDIA on the EGX platform include Cisco, Dell Technologies, Hewlett Packard Enterprise, Mellanox and VMware.

Walmart Adopts EGX to Create Store of the Future

Huang cited Walmart as an example of EGX’s power.

The retail giant is deploying it in its Intelligent Retail Lab in Levittown, New York. It’s a unique, fully operating grocery store where Walmart explores the ways AI can further improve in-store shopping experiences.

Using EGX’s advanced AI and edge capabilities, Walmart can process in real time the more than 1.6 terabytes of data its store generates each second. This helps it automatically alert associates to restock shelves, open new checkout lanes, retrieve shopping carts, and ensure product freshness in the meat and produce departments.

Squeezing out even half a percent of efficiency in the $30 trillion retail market represents an enormous opportunity, Huang noted. “The opportunity for using automation to improve efficiency in retail is extraordinary,” he said.

BMW, Procter & Gamble, Samsung, Among Leaders Adopting EGX

That power is already being harnessed for a dizzying array of real-world applications across the world:

  • Korea’s Samsung Electronics, in another early EGX deployment, is using AI at the edge for highly complex semiconductor design and manufacturing processes.
  • Germany’s BMW is using intelligent video analytics and EGX edge servers in its South Carolina manufacturing facility to automate inspection.
  • Japan’s NTT East uses EGX in its data centers to develop new AI-powered services in remote areas through its broadband access network.
  • The U.S.’s Procter & Gamble, the world’s top consumer goods company, is working with NVIDIA to develop AI-enabled applications on top of the EGX platform for the inspection of products and packaging.

Cities, too, are grasping the opportunity. Las Vegas uses EGX to capture vehicle and pedestrian data to ensure safer streets and expand economic opportunity.  And San Francisco’s prime shopping area, the Union Square Business Improvement District, uses EGX to capture real-time pedestrian counts for local retailers.

Stunning New Possibilities

To demonstrate the possibilities, Huang punctuated his keynote with demos showing what AI can unleash in the world around us.

In a flourish that stunned the crowd, Huang made a red McLaren Senna prototype — which carries a price of a hair under $1 million — materialize on stage in augmented reality. It could be viewed from any angle — including from the inside — on a smartphone streaming data over Verizon’s 5G network from a Verizon data center in Los Angeles.

The technology behind the demo: Autodesk VRED running in a virtual machine on a Quadro RTX 8000 server. On the phone: a 5G client built with NVIDIA’s CloudXR client application software development kit for mobile devices and head-mounted displays.

And, in a video, Huang showed how the Jarvis multi-modal AI was able to follow queries from two different speakers conversing on different topics, the weather and restaurants, as they drove down the road, reacting to what the computer sees as well as what is said.

In another video, Jarvis guided a shopper through a purchase in a real-world store.

“In the future these kind of multi-modal AIs will make the conversation and the engagement you have with the AI much much better,” Huang said.

Cloud Gaming Goes Global

Huang also detailed how NVIDIA is expanding its cloud gaming network through partnerships with global telecommunications companies.

GeForce NOW, NVIDIA’s cloud gaming service, transforms underpowered or incompatible devices into a powerful GeForce gaming PC with access to popular online game stores.

Taiwan Mobile joins industry leaders rolling out GeForce NOW, including Korea’s LG U+, Japan’s Softbank, and Russia’s Rostelecom in partnership with GFN.RU. Additionally, Telefonica will kick off a cloud gaming proof of concept in Spain.

Huang showed what’s now possible with a real-time demo of a gamer playing Assetto Corsa Competizione on GeForce NOW — as a cameraman watched over his shoulder — on a smartphone over a 5G network. The gamer navigated the demanding racing game’s action with no noticeable lag.

The mobile version of GeForce NOW for Android devices is available in Korea and will be available widely later this year, with a preview on display at Mobile World Congress Los Angeles.

“These servers are going to be the same servers that run intelligent agriculture and intelligent retail,” Huang said. “The future is software defined and these low latency services that need to be deployed at the edge can now be provisioned at the edge with these servers.”

A Trillion New Devices

The opportunities for AI, IoT, cloud gaming, augmented reality and 5G network acceleration are huge — with a trillion new IoT devices to be produced between now and 2035, according to industry estimates.

And GPUs are up to the challenge: GPU computing power has grown 300,000x since 2013, driving down the cost per teraflop even as gains in CPU performance level off, Huang said.

NVIDIA is well positioned to help telcos and enterprises make the most of this by helping customers combine AI algorithms, powerful GPUs, smart NICs (network interface cards), cloud-native technologies, the NVIDIA EGX accelerated edge computing platform, and 5G high-speed wireless networks.

Huang compared all these elements to the powerful “infinity stones” featured in Marvel’s movies and comic books.

“What you’re looking at are the six miracles that will make it possible to put 5G at the edge, to virtualize the 5G data center and create a world of smart everything,” Huang said, and that, in turn, will add intelligence to everything in the world around us.

“This will be a pillar, a foundation for the smart everything revolution,” Huang said.

The post 5G Meets AI: NVIDIA CEO Details ‘Smart Everything Revolution,’ EGX for Edge AI, Partnerships with Leading Companies appeared first on The Official NVIDIA Blog.

Put AI Label on It: Startup Aids Annotators of Healthcare Training Data

Deep learning applications are data hungry. The more high-quality labeled data a developer feeds an AI model, the more accurate its inferences.

But creating robust datasets is the biggest obstacle for data scientists and developers building machine learning models, says Gaurav Gupta, CEO of, a member of the NVIDIA Inception virtual accelerator program.

The startup has created a web platform to help researchers and companies manage their data labeling workflow and use AI-assisted segmentation tools to improve the quality of their training datasets.

“When the labels are accurate, then the AI models learn faster and they reach higher accuracy faster,” said Gupta.

The company’s web interface, which runs on NVIDIA T4 GPUs for inference in Google Cloud, helped one healthcare radiology customer speed up labeling by 10x and decrease its labeling error rate by more than 15 percent.

The Devil Is in the Details 

The higher the data quality, the less data needed to achieve accurate results. A machine learning model can produce the same results after training on a million images with low-accuracy labels, Gupta says, or just 100,000 images with high-accuracy labels.

Getting data labeling right the first time is no easy task. Many developers outsource data labeling to companies or crowdsourced workers. It may take weeks to get back the annotated datasets, and the quality of the labels is often poor.

A rough annotated image of a car on the street, for example, may have a segmentation polygon around it that also includes part of the pavement, or doesn’t reach all the way to the roof of the car. Since neural networks parse images pixel by pixel, every mislabeled pixel makes the model less precise.
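
The cost of those stray pixels can be made concrete with a pixel-wise intersection-over-union (IoU) score, a standard segmentation quality metric. Here is a minimal sketch with toy masks (the pixel sets are invented for illustration; real masks would come from actual annotations):

```python
# Toy illustration: how mislabeled pixels lower segmentation quality,
# measured as intersection-over-union (IoU) between pixel sets.

def iou(pred, truth):
    """Pixel-wise IoU between two sets of (row, col) pixels."""
    inter = len(pred & truth)
    union = len(pred | truth)
    return inter / union if union else 1.0

# Ground truth: a 10x10 "car" region.
truth = {(r, c) for r in range(10) for c in range(10)}

# A tight annotation that misses one row of roof pixels.
tight = {(r, c) for r in range(1, 10) for c in range(10)}

# A loose polygon that also sweeps in a strip of pavement.
loose = truth | {(r, c) for r in range(10, 13) for c in range(10)}

print(f"tight IoU: {iou(tight, truth):.2f}")  # 90/100 = 0.90
print(f"loose IoU: {iou(loose, truth):.2f}")  # 100/130 = 0.77
```

Both annotations look roughly right to a human glance, yet each loses 10 to 23 points of IoU, and a model trained on many such masks inherits that imprecision.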

That margin of error is unacceptable for training a neural network that will eventually interact with people and objects in the real world — for example, identifying tumors from an MRI scan of the brain or controlling an autonomous vehicle.

Developers can manage their data labeling through’s web interface, while administrators can assign image labeling tasks to annotators, view metrics about individual data labelers’ performance and review the actual image annotations.

Using AI to Train Better AI 

When a data scientist first runs a machine learning model, it may be only 60 percent accurate. The developer then iterates several times to improve the performance of the neural network, each time adding new training data.

The startup is helping AI developers across industries use their early-stage machine learning models to ease the process of labeling new training data for future versions of their neural networks — a process known as active learning.

With this technique, the developer’s initial machine learning model can take the first pass at annotating the next set of training data. Instead of starting from scratch, annotators can just go through and tweak the AI-generated labels, saving valuable time and resources.
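
In code, such a pre-labeling pass might be structured like the sketch below. Every function and name here is illustrative, not any real labeling API: the model proposes a label, and a human reviews only the low-confidence proposals.

```python
# Sketch of AI-assisted pre-labeling: the current model proposes labels,
# and human annotators correct only the proposals it is unsure about.
# All names and data are illustrative, not an actual product API.

def model_predict(image):
    """Stand-in for an early-stage model's best guess (label, confidence)."""
    guesses = {"img_01": ("car", 0.93), "img_02": ("truck", 0.41),
               "img_03": ("car", 0.88)}
    return guesses[image]

def human_review(image, proposed):
    """Stand-in for an annotator correcting a proposal."""
    corrections = {"img_02": "bus"}  # the annotator overrides one guess
    return corrections.get(image, proposed)

def annotate(images, review_threshold=0.5):
    labels = {}
    for img in images:
        label, conf = model_predict(img)
        # Low-confidence proposals are routed to a human; confident
        # ones are accepted and spot-checked later.
        labels[img] = human_review(img, label) if conf < review_threshold else label
    return labels

print(annotate(["img_01", "img_02", "img_03"]))
# {'img_01': 'car', 'img_02': 'bus', 'img_03': 'car'}
```

The saving comes from the ratio of corrections to from-scratch labels: annotators touch only the uncertain fraction of each new batch.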

The startup offers active learning for data labeling across multiple industries. For healthcare data labeling, its platform integrates with the NVIDIA Clara Deploy SDK, allowing customers to use the software toolkit for AI-assisted segmentation of healthcare datasets.

Choose Your Own Annotation Adventure chose to deploy its platform on cloud-based GPUs to easily scale usage up and down based on customer demand. Researchers and companies using the tool can choose whether to use the interface online, connected to the cloud backend, or instead use a containerized application running on their own on-premises GPU system.

“It’s important for AI teams in healthcare to be able to protect patient information,” Gupta said. “Sometimes it’s necessary for them to manage the workflow of annotating data and training their machine learning models within the security of their private network. That’s why we provide Docker images to support on-premises annotation on local datasets.”

Balzano, a Swiss startup building deep learning models for radiologists, is using the platform linked to an on-premises server of NVIDIA V100 Tensor Core GPUs. To develop training datasets for its musculoskeletal orthopedics AI tools, the company labels a few hundred radiology images each month. Adopting the interface saved the company a year’s worth of engineering effort compared with building a similar solution from scratch.

“’s features allow us to annotate and segment anatomical features of the knee and cartilage more efficiently,” said Stefan Voser, chief operating officer and product manager at Balzano, which is also an Inception program member. “As we ramp up the annotation process, this platform will allow us to leverage AI capabilities and ensure the segmented images are high quality.”

Balzano and will showcase their latest demos in NVIDIA booth 10939 at the annual meeting of the Radiological Society of North America, Dec. 1-6 in Chicago.

The post Put AI Label on It: Startup Aids Annotators of Healthcare Training Data appeared first on The Official NVIDIA Blog.

Video Architecture Search

Video understanding is a challenging problem. Because a video contains spatio-temporal data, its feature representation must abstract both appearance and motion information. This is not only essential for automated understanding of the semantic content of videos, such as web-video classification or sport activity recognition, but is also crucial for robot perception and learning. Just as for humans, the input from a robot’s camera is seldom a static snapshot of the world, but takes the form of a continuous video.

The abilities of today’s deep learning models are greatly dependent on their neural architectures. Convolutional neural networks (CNNs) for videos are normally built by manually extending known 2D architectures, such as Inception and ResNet, to 3D, or by carefully designing two-stream CNN architectures that fuse together both appearance and motion information. However, designing an optimal video architecture that best takes advantage of the spatio-temporal information in videos remains an open problem. Although neural architecture search (e.g., Zoph et al., Real et al.) has been widely explored for discovering good image architectures, machine-optimized neural architectures for videos have not yet been developed. Video CNNs are typically computation- and memory-intensive, and designing an approach that efficiently searches for them while capturing their unique properties has been difficult.

In response to these challenges, we have conducted a series of studies into automatic searches for more optimal network architectures for video understanding. We showcase three different neural architecture evolution algorithms: learning layers and their module configuration (EvaNet); learning multi-stream connectivity (AssembleNet); and building computationally efficient and compact networks (TinyVideoNet). The video architectures we developed outperform existing hand-made models on multiple public datasets by a significant margin, and demonstrate a 10x~100x improvement in network runtime.

EvaNet: The first evolved video architectures
EvaNet, which we introduce in “Evolving Space-Time Neural Architectures for Videos” at ICCV 2019, is the very first attempt to design neural architecture search for video architectures. EvaNet is a module-level architecture search that focuses on finding types of spatio-temporal convolutional layers as well as their optimal sequential or parallel configurations. An evolutionary algorithm with mutation operators is used for the search, iteratively updating a population of architectures. This allows for parallel and more efficient exploration of the search space, which is necessary for video architecture search to consider diverse spatio-temporal layers and their combinations. EvaNet evolves multiple modules (at different locations within the network) to generate different architectures.
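
The mutation-driven loop described above can be sketched generically: maintain a population of architecture encodings, mutate a randomly chosen parent, and keep the fitter individuals. This is not the actual EvaNet implementation; the layer vocabulary is abbreviated and the fitness function is a placeholder for "train the candidate and measure validation accuracy."

```python
import random

# Candidate layer types, loosely mirroring the search space described above.
LAYER_TYPES = ["3d_conv", "(2+1)d_conv", "iTGM", "max_pool", "1x1_conv"]

def random_module(n_layers=3):
    """A module encoded simply as a list of layer types."""
    return [random.choice(LAYER_TYPES) for _ in range(n_layers)]

def mutate(module):
    """Mutation operator: swap one layer's type at random."""
    child = list(module)
    child[random.randrange(len(child))] = random.choice(LAYER_TYPES)
    return child

def fitness(module):
    """Placeholder for 'train the architecture, return validation accuracy'.
    Here we simply reward diversity of layer types within the module."""
    return len(set(module))

def evolve(pop_size=8, rounds=50, rng_seed=0):
    random.seed(rng_seed)
    population = [random_module() for _ in range(pop_size)]
    for _ in range(rounds):
        parent = random.choice(population)
        child = mutate(parent)
        # Replace the weakest individual whenever the child improves on it.
        worst = min(population, key=fitness)
        if fitness(child) > fitness(worst):
            population.remove(worst)
            population.append(child)
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because each candidate is evaluated independently, the loop parallelizes naturally across workers, which is what makes this style of search tractable for expensive video models.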

Our experimental results confirm the benefits of video CNN architectures obtained by evolving heterogeneous modules. The approach often finds that non-trivial modules composed of multiple parallel layers are the most effective, as they are faster and exhibit superior performance to hand-designed modules. Another interesting aspect is that the evolution yields a number of similarly well-performing but diverse architectures without extra computation, and forming an ensemble of them further improves performance. Due to their parallel nature, even an ensemble of these models is computationally more efficient than other standard video networks, such as (2+1)D ResNet. We have open-sourced the code.

Examples of various EvaNet architectures. Each colored box (large or small) represents a layer with the color of the box indicating its type: 3D conv. (blue), (2+1)D conv. (orange), iTGM (green), max pooling (grey), averaging (purple), and 1×1 conv. (pink). Layers are often grouped to form modules (large boxes). Digits within each box indicate the filter size.

AssembleNet: Building stronger and better (multi-stream) models
In “AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures”, we look into a new method of fusing different sub-networks with different input modalities (e.g., RGB and optical flow) and temporal resolutions. AssembleNet is a “family” of learnable architectures that provide a generic approach to learning the “connectivity” among feature representations across input modalities, while being optimized for the target task. We introduce a general formulation that allows various forms of multi-stream CNNs to be represented as directed graphs, coupled with an efficient evolutionary algorithm to explore the high-level network connectivity. The objective is to learn better feature representations across appearance and motion visual cues in videos. Unlike previous hand-designed two-stream models that use late fusion or fixed intermediate fusion, AssembleNet evolves a population of overly-connected, multi-stream, multi-resolution architectures while guiding their mutations by connection weight learning. For the first time, we examine four-stream architectures with various intermediate connections — two streams each for RGB and optical flow, each at a different temporal resolution.
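
The connectivity-graph idea can be illustrated with a toy example: streams and fusion blocks as nodes, learned connection weights on the edges, and weak edges pruned away. The stream names and weights below are invented for illustration, not values from the paper.

```python
# Sketch of multi-stream connectivity as a weighted directed graph.
# In AssembleNet the edge weights are learned during training and
# guide which connections survive the evolutionary mutations.

edges = {
    ("rgb_fast", "fusion_1"): 0.91,
    ("rgb_slow", "fusion_1"): 0.15,
    ("flow_fast", "fusion_1"): 0.78,
    ("flow_slow", "fusion_2"): 0.64,
    ("fusion_1", "fusion_2"): 0.88,
    ("rgb_slow", "fusion_2"): 0.07,
}

def prune(edges, threshold=0.2):
    """Drop connections whose learned weight falls below the threshold,
    mimicking how weak inter-stream links are removed."""
    return {e: w for e, w in edges.items() if w >= threshold}

kept = prune(edges)
print(sorted(kept))  # the two weakest links (weight < 0.2) are gone
```

Starting over-connected and letting the weights decide which links matter is what distinguishes this from hand-picking a single late- or mid-fusion point.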

The figure below shows an example of an AssembleNet architecture, found by evolving a pool of random initial multi-stream architectures over 50~150 rounds. We tested AssembleNet on two very popular video recognition datasets: Charades and Moments-in-Time (MiT). Its performance on MiT is the first above 34%. Its performance on Charades is even more impressive, at 58.6% mean average precision (mAP), whereas the previous best known results were 42.5 and 45.2.

The representative AssembleNet model evolved using the Moments-in-Time dataset. A node corresponds to a block of spatio-temporal convolutional layers, and each edge specifies their connectivity. Darker edges mean stronger connections. AssembleNet is a family of learnable multi-stream architectures, optimized for the target task.
A figure comparing AssembleNet with state-of-the-art, hand-designed models on Charades (left) and Moments-in-Time (right) datasets. AssembleNet-50 or AssembleNet-101 has an equivalent number of parameters to a two-stream ResNet-50 or ResNet-101.

Tiny Video Networks: The fastest video understanding networks
In order for a video CNN model to be useful for devices operating in a real-world environment, such as robots, real-time, efficient computation is necessary. However, achieving state-of-the-art results on video recognition tasks currently requires extremely large networks, often with tens to hundreds of convolutional layers, that are applied to many input frames. As a result, these networks often suffer from very slow runtimes, requiring at least 500+ ms per 1-second video snippet on a contemporary GPU and 2,000+ ms on a CPU. In Tiny Video Networks, we address this by automatically designing networks that provide comparable performance at a fraction of the computational cost. Our Tiny Video Networks (TinyVideoNets) achieve competitive accuracy and run efficiently, at real-time or better speeds, within 37 to 100 ms on a CPU and 10 ms on a GPU per ~1-second video clip, achieving speeds hundreds of times faster than other contemporary human-designed models.

These performance gains are achieved by explicitly considering the model runtime during the architecture evolution, and by including spatial and temporal resolution as well as channel size in the search space as ways to reduce computation. The figure below illustrates two simple but very effective architectures found by TinyVideoNet. Interestingly, the learned architectures have fewer convolutional layers than typical video architectures: Tiny Video Networks prefer lightweight elements, such as 2D pooling, gating layers, and squeeze-and-excitation layers. Further, TinyVideoNet is able to jointly optimize parameters and runtime to provide efficient networks that can be used by future network exploration.
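
A runtime-constrained objective of this kind can be sketched as follows. The candidate accuracies and runtimes are invented; only the budgeted-fitness idea mirrors the approach described above.

```python
# Sketch of a runtime-constrained search objective: architectures over
# the runtime budget score zero, so evolution is forced to trade
# spatial/temporal resolution and channel width for speed.
# Numbers are illustrative, not measured TinyVideoNet results.

RUNTIME_BUDGET_MS = 100  # e.g., target CPU time per ~1-second video clip

def constrained_fitness(accuracy, runtime_ms, budget_ms=RUNTIME_BUDGET_MS):
    """Accuracy counts only if the model fits inside the runtime budget."""
    return accuracy if runtime_ms <= budget_ms else 0.0

candidates = [
    {"name": "wide_deep", "accuracy": 0.78, "runtime_ms": 520},
    {"name": "tiny_a",    "accuracy": 0.72, "runtime_ms": 37},
    {"name": "tiny_b",    "accuracy": 0.74, "runtime_ms": 65},
]

best = max(candidates,
           key=lambda c: constrained_fitness(c["accuracy"], c["runtime_ms"]))
print(best["name"])  # "tiny_b": the most accurate model inside the budget
```

A hard cutoff like this carves out exactly the fast-but-accurate region of the time-accuracy space; a softer variant could instead subtract a penalty proportional to the budget overrun.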

TinyVideoNet (TVN) architectures evolved to maximize the recognition performance while keeping its computation time within the desired limit. For instance, TVN-1 (top) runs at 37 ms on a CPU and 10ms on a GPU. TVN-2 (bottom) runs at 65ms on a CPU and 13ms on a GPU.
CPU runtime of TinyVideoNet models compared to prior models (left) and runtime vs. model accuracy of TinyVideoNets compared to (2+1)D ResNet models (right). Note that TinyVideoNets take a part of this time-accuracy space where no other models exist, i.e., extremely fast but still accurate.

To our knowledge, this is the very first work on neural architecture search for video understanding. The video architectures we generate with our new evolutionary algorithms outperform the best known hand-designed CNN architectures on public datasets, by a significant margin. We also show that learning computationally efficient video models, TinyVideoNets, is possible with architecture evolution. This research opens new directions and demonstrates the promise of machine-evolved CNNs for video understanding.

This research was conducted by Michael S. Ryoo, AJ Piergiovanni, and Anelia Angelova. Alex Toshev and Mingxing Tan also contributed to this work. We thank Vincent Vanhoucke, Juhana Kangaspunta, Esteban Real, Ping Yu, Sarah Sirajuddin, and the Robotics at Google team for discussion and support.

Microsoft and Nuance join forces in quest to help doctors turn their focus back to patients

Imagine a visit to your doctor’s office in which your physician asks you how you’ve been feeling, whether your medication is working or if the shoulder pain from an old fall is still bothering you — and his or her focus is entirely on you and that conversation.

The doctor is looking at you, not at a computer screen. He or she isn’t moving a mouse around hunting for an old record or pecking on the keyboard to enter a diagnosis code.

This sounds like an ideal scenario, but as most people know from their own visits to the doctor, it’s far from the norm today.

But experts say that in an exam room of the future enhanced by artificial intelligence, the doctor would be able to call up a lab result or prescribe a new medicine with a simple voice command. She or he wouldn’t be distracted by entering symptoms into your electronic health record (EHR). And at the end of the visit, the essential elements of the conversation would have been securely captured and distilled into concise documentation that can be shared with nurses, specialists, insurance companies or anyone else you’ve entrusted with your care.

A new strategic partnership between Microsoft and Nuance Communications Inc. announced today will work to accelerate and deliver this level of ambient clinical intelligence to exam rooms, allowing ambient sensing and conversational AI to take care of some of the more burdensome administrative tasks and to provide clinical documentation that writes itself. That, in turn, will allow doctors to turn their attention fully to taking care of patients.

Of course, there are still immense technical challenges to getting to that ideal scenario of the future. But the companies say they believe that they already have a strong foundation in features from Nuance’s ambient clinical intelligence (ACI) technology unveiled earlier this year and Microsoft’s Project EmpowerMD Intelligent Scribe Service. Both are using AI technologies to learn how to convert doctor-patient conversations into useful clinical documentation, potentially reducing errors, saving doctors’ time and improving the overall physician experience.

“Physicians got into medicine because they wanted to help and heal people, but they are spending a lot of their time today outside of the care process,” said Joe Petro, Nuance executive vice president and chief technology officer. “They’re entering in data to make sure the appropriate bill can be generated. They’re capturing insights for population health and quality measures. And although this data is all important, it’s really outside a physician’s core focus on treating that patient.”



Primary care doctors spend two hours on administrative tasks for every hour they’re involved in direct patient care, studies have shown. If they don’t capture a patient’s complaint or treatment plan during or shortly after an exam, that documentation burden will snowball as the day goes on. In another recent study, physicians reported one to two hours of after-hours work each night, mostly related to administrative tasks.

The shift to digital medical record keeping, along with so-called ‘meaningful use’ regulations, is well-intentioned and has provided some important benefits, said Dr. Ranjani Ramamurthy, senior director at Microsoft Healthcare who leads the company’s EmpowerMD research.

People no longer have to worry about not being able to read a doctor’s handwriting or information that never makes it into the right paper file. But the unintended consequence has been that doctors are sometimes forced to focus on their computers and administrative tasks instead of their patients, she said.

After starting her career in computer science, Ramamurthy went back to school to get a medical degree and pursue cancer research. But as she walked the halls of the hospital every day, she couldn’t help thinking that she was missing an opportunity to use her background to create tech solutions that could reinvigorate the doctor-patient relationship.

Ramamurthy noted that most physicians got into healthcare because they want to use their skills and expertise to treat patients, not to feel tethered to their keyboards.

“We need to work on building frictionless systems that take care of the doctors so they can do what they do best, which is take care of patients,” she said.

Built on Microsoft Azure — and working in tandem with the EHR — this new technology will marry the two companies’ strengths in developing ambient sensing and conversational AI solutions. Those include ambient listening with patient consent, wake-up word, voice biometrics, signal enhancement, document summarization, natural language understanding, clinical intelligence and text-to-speech.

Nuance is a leading provider of AI-powered clinical documentation and decision-making support for physicians. Leveraging deep strategic partnerships with the major providers of EHRs, the company has spent decades developing medically relevant speech recognition and processing solutions such as its Dragon Medical One platform, which allows doctors to easily and naturally enter a patient’s story and relevant information into an EHR using dictation. Nuance conversational AI technologies are already used by more than 500,000 physicians worldwide, as well as in 90 percent of U.S. hospitals.

Microsoft brings deep research investments in AI and partner-driven healthcare technologies, commercial relationships with nearly 170,000 healthcare organizations, and enterprise-focused cloud and AI services that accelerate and enable scalable commercial solutions. Earlier this month, for instance, Microsoft announced a strategic collaboration to combine its AI technology with Novartis’ deep life sciences expertise to address challenges in developing new drugs.

In other areas, Azure Cognitive Services offers easy-to-deploy AI tools for speech recognition, computer vision and language understanding, and trusted Azure cloud services can support the user’s compliance with privacy and regulatory requirements for healthcare organizations.

As part of the agreement, Nuance will migrate the majority of its current on-site internal infrastructure and hosted products to Microsoft Azure. Nuance already is a Microsoft Office 365 customer for its more than 8,500 employees worldwide, empowering them with the latest in collaboration and communications tools, including Microsoft Teams.


“Just capturing a conversation between two people has been a thorny technical problem for a long time, and a lot of companies have attempted to crack it,” Petro said. “This partnership brings two trusted healthcare superpowers together to solve some of the most difficult challenges and also to leverage the most innovative advances we’ve made in AI, speech and natural language processing.”

The companies will expand upon Nuance’s early success with ACI and expect to introduce the technology to an initial set of physician specialties in early 2020, then expand it to numerous other medical specialties over the next few years, Petro said. Initially, the ACI output may be checked by a remote reviewer with medical expertise, providing an important quality check and producing additional training data for the AI models. Once the system has proven its accuracy for a given physician, the ACI documentation will go directly to that physician, who can review it, make any necessary revisions and sign off on a treatment plan, all in real time, Petro said.

With a patient’s consent, ACI is designed to securely ingest and synthesize patient-doctor conversations, integrate that data with information from an EHR, populate a patient’s chart and also help the EHR deliver intelligent recommendations to the doctor.

With innovations in multi-party speech recognition, language understanding and computer vision, these tools can listen to the encounter between the doctor and a patient who grants consent, sense whether they’re pointing to a left knee or right knee when verbally describing a particular pain, extract medically relevant details and translate what just occurred in the exam room into actionable clinical documentation and care suggestions.

“Moving forward, we recognize that reducing the burden of clinical documentation is just the beginning,” said Dr. Greg Moore, Microsoft’s corporate vice president for health technology and alliances. “As the core AI improves and becomes more capable, it will be able to understand much more deeply what is going on by observing doctors and nurses in their day to day work. Ambient clinical intelligence will be able to work in tandem with the EHR to help convert those observations into supportive, augmenting actions.”

For instance, an AI-enabled system can learn to recognize when a doctor is talking to a patient about a new medication, and it can automatically review past conversations as well as the patient’s history to reduce the risk of a drug interaction or allergic reaction. Or it can mine a patient’s complicated medical history with new reported symptoms and offer suggestions for potential diagnoses for the doctor to consider.

In addition, the two companies will open up the ACI platform to an ecosystem of partners that can bring other highly valuable AI innovations to the exam room or the bedside, wherever the ambient sensing device is present.

“We want ambient clinical intelligence to assist the EHR in delivering recommendations at the time when it matters — not three days later on your patient portal or when a nurse follows up, but when the doctor and patient are face to face and when that information can actually inform care,” Ramamurthy said.


The post Microsoft and Nuance join forces in quest to help doctors turn their focus back to patients appeared first on The AI Blog.

The Buck Starts Here: NVIDIA’s Ian Buck on What’s Next for the AI Revolution

AI is still young, but software is available to help even relatively unsophisticated users harness it.

That’s according to Ian Buck, general manager of NVIDIA’s accelerated computing group, who shared his views in our latest AI Podcast.

Buck, who helped lay the foundation for GPU computing as a Stanford doctoral candidate, will deliver the keynote address at GTC DC on Nov. 5. His talk will give an audience inside the Beltway a software-flavored update on the status and outlook of AI.

Like the tech industry, the U.S. government is embracing deep learning. “A few years ago, there was still some skepticism, but today that’s not the case,” said Buck.

Federal planners have “gotten the message for sure. You can see from the executive orders coming out and the work of the Office of Science and Technology Policy that they are putting out mandates and putting money into budgets — it’s great to see that literally billions of dollars are being invested,” he said.

The next steps will include nurturing a wide variety of AI projects to come.

“We have the mandate and budget, now we have to help all the agencies and parts of the government down to state and local levels help take advantage of this disruptive technology in areas like predictive maintenance, traffic congestion, power-grid management and disaster relief,” Buck said.

From Computer Vision to Tougher Challenges

On the commercial horizon, users already deeply engaged in AI are moving from work in computer vision to tougher challenges in natural language processing. The neural network models needed to understand human speech can be hundreds of thousands of times larger than the early models used, for example, to identify breeds of cats in the seminal 2012 ImageNet contest.

“Conversational AI represents a new level of complexity and a new level of opportunity with new use cases,” Buck said.

AI is definitely hard, he said. The good news is that companies like NVIDIA are bundling 80 percent of the software modules users need to get started into packages tailored for specific markets such as Clara for healthcare or Metropolis for smart cities.

Unleashing GPUs

Software is a field close to Ian Buck’s heart. As part of his PhD work, he developed the Brook language to harness the power of GPUs for parallel computing. His efforts evolved into CUDA, GPU programming tools at the foundation of offerings such as Clara, Metropolis and NVIDIA DRIVE software for automated vehicles.

Users “can program down at the CUDA level” or at the higher level of frameworks such as PyTorch and TensorFlow, “or go up the stack to work with our vertical market solutions,” Buck said.

It’s a journey that’s just getting started.

“AI will be pervasive all the way down to the doorbell and thermostat. NVIDIA’s mission is to help enable that future,” Buck said.

To hear our full conversation with Buck and other AI luminaries, tune into our AI Podcast wherever you download your podcasts.

(You can see Buck’s keynote live by attending GTC DC. Use the promotional code GMPOD for a 20 percent discount.) 

Help Make the AI Podcast Better

Have a few minutes to spare? Fill out this short listener survey. Your answers will help us make a better podcast.

How to Tune in to the AI Podcast

Get the AI Podcast through iTunes, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Stitcher and TuneIn. Your favorite not listed here? Email us at aipodcast [at] nvidia [dot] com.

The post The Buck Starts Here: NVIDIA’s Ian Buck on What’s Next for the AI Revolution appeared first on The Official NVIDIA Blog.

The AWS DeepRacer League and countdown to the re:Invent Championship Cup 2019

The AWS DeepRacer League is the world’s first autonomous racing league, open to anyone. Announced at re:Invent 2018, it puts machine learning in the hands of every developer in a fun and exciting way. Throughout 2019, developers of all skill levels have competed in the League at 21 Amazon events globally, including Amazon re:MARS and select AWS Summits, and put their skills to the test in the League’s virtual circuit via the AWS DeepRacer console. The League concludes at re:Invent 2019. Log in today and start racing; time is running out to win an expenses-paid trip to re:Invent!

The final AWS Summit race in Toronto

In the eight months since the League kicked off in Santa Clara, the League has visited 17 countries, with thousands of developers completing over 13,000 laps and 165 miles of track. Each city has crowned its champion, and we will see each of them at re:Invent 2019!

On October 3, 2019, the 21st and final AWS DeepRacer Summit race took place in Toronto, Canada. The event concluded in-person racing for the AWS DeepRacer League, and not one but four expenses-paid trips were up for grabs.

First was the crowning of our Toronto champion Mohammad Al Ansari, with a winning time of 7.85 seconds, just 0.4 seconds away from beating the current world record of 7.44 seconds. Mohammad came to the AWS Summit with his colleague from Myplanet, where they took part in an AWS-led workshop for AWS DeepRacer to learn more about machine learning. They then made connections with AWS DeepRacer communities and received support from AWS DeepRacer enthusiasts such as Lyndon Leggate, a recently announced AWS ML Hero.

The re:Invent line up is shaping up

Once the racing concluded, it was time to tally up the scores for the overall competition and name the top three overall Summit participants. Foreign Exchange IT specialist Ray Goh traveled from Singapore to compete in his fourth race in his quest to top the overall leaderboard. Ray previously attended the Singapore, Hong Kong, and re:Mars races, and has steadily improved his models all year. He closed out the season with his fastest time of 8.15 seconds at the Toronto race. The other two spots went to ryan@ACloudGuru and Raycha@Kakao, who have also secured their place in the knockouts at re:Invent along with the 21 Summit Champions.

It could be you that lifts the Championship Cup

The Championship Cup at re:Invent is sure to be filled with fun and surprises, so watch this space for more information. There is still time for developers of all skill levels to advance to the knockouts. Compete now in the final AWS DeepRacer League Virtual Circuit, and it could be you who is the Champion of the 2019 AWS DeepRacer League!


About the Author

Alexandra Bush is a Senior Product Marketing Manager for AWS AI. She is passionate about how technology impacts the world around us and enjoys being able to help make it accessible to all. Out of the office she loves to run, travel and stay active in the outdoors with family and friends.



Calculating new stats in Major League Baseball with Amazon SageMaker

The 2019 Major League Baseball (MLB) postseason is here after an exhilarating regular season in which fans saw many exciting new developments. MLB and Amazon Web Services (AWS) teamed up to develop and deliver three new, real-time machine learning (ML) stats to MLB games: Stolen Base Success Probability, Shift Impact, and Pitcher Similarity Match-up Analysis. These features are giving fans a deeper understanding of America’s pastime through Statcast AI, MLB’s state-of-the-art technology for collecting massive amounts of baseball data and delivering more insights, perspectives, and context to fans in every way they’re consuming baseball games.

This post looks at the role machine learning plays in providing fans with deeper insights into the game. We also provide code snippets that show the training and deployment process behind these insights on Amazon SageMaker.

Machine learning steals second

Stolen Base Success Probability provides viewers with a new depth of understanding of the cat-and-mouse game between the pitcher and the baserunner.

To calculate the Stolen Base Success Probability, AWS used MLB data to train, test, and deploy an ML model that analyzes thousands of data points covering 37 variables that, together, determine whether or not a player safely arrives at second if he attempts to steal. Those variables include the runner’s speed and burst, the catcher’s average pop time to second base, the pitcher’s velocity and handedness, historical stolen base success rates for the runner, batter, and pitcher, along with relevant data about the game context.

We took a 10-fold cross-validation approach to explore a range of classification algorithms, such as logistic regression, support vector machines, random forests, and neural networks, by using historical play data from 2015 to 2018 provided by MLB that corresponds to ~7.3K stolen base attempts with ~5.5K successful stolen bases and ~1.8K runners caught stealing. We applied numerous strategies to deal with the class imbalance, including class weights, custom loss functions, and sampling strategies, and found that the best-performing model for predicting the probability of stolen base success was a deep neural network trained on an AWS Deep Learning AMI, pre-configured with popular DL frameworks. The trained model was deployed using Amazon SageMaker, which provided the subsecond response times required for integrating predictions into in-game graphics in real time, on ML instances that auto-scaled across multiple Availability Zones. For more information, see Deploy trained Keras or TensorFlow models using Amazon SageMaker.
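As a sketch of the cross-validation setup described above, the snippet below runs 10-fold stratified cross-validation with balanced class weights using scikit-learn. The data is synthetic and the logistic-regression choice is illustrative; MLB's production model used 37 real Statcast variables and a deep neural network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in: ~7.3K attempts, 37 features, roughly 3:1 class imbalance
X = rng.normal(size=(7300, 37))
y = (rng.random(7300) < 0.75).astype(int)  # 1 = successful steal

# class_weight="balanced" reweights the loss to counter the class imbalance,
# one of the strategies mentioned above
model = LogisticRegression(class_weight="balanced", max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())  # near 0.5 on random features; real features do better
```

Stratified folds preserve the success/caught-stealing ratio in every split, so the imbalance is handled consistently across all ten evaluations.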

As the player on first base contemplates stealing second, viewers can see his Stolen Base Success Probability score in real-time right on their screens.

MLB offered fans a pilot test and preview of Stolen Base Success Probability during the 2018 postseason. Thanks to feedback from broadcasters and fans, MLB and AWS collaborated during the past offseason to develop an enhanced version with new graphics, improved latency of real-time stats for replays, and a cleaner look. One particular enhancement is the “Go Zone,” the point along the baseline where the player’s chances of successfully making the steal reaches a minimum of 85%.

As the player extends his lead towards second, viewers can now see the probability changing dynamically and a jump in his chances of success when he hits the “Go Zone.” After the runner reaches second base, whether he gets called “safe” or “out,” viewers have the opportunity during a replay to see data generated from a variety of factors that may have determined the ultimate outcome, like the runner’s sprint speed and the catcher’s pop time. Plus, that data is color-coded in green, yellow, and red to help fans visualize the factors that played the most significant roles in determining whether or not the player successfully made it to second.

Predicting impact of infield defensive strategies

Over the last decade, there have been few changes in MLB as dramatic as the rise of the infield shift, a “situational defensive realignment of fielders away from their traditional starting points.” Teams use the shift to exploit batted-ball patterns, such as a batter’s tendency to pull batted balls (right field for left-handed hitters and left field for right-handed hitters). As a batter steps up to the plate, the defensive infielders adjust their positions to cover the area where the batter has historically hit the ball into play.

Using Statcast AI data, teams can give their defense an advantage by shifting players to prevent base hits—and teams are employing this strategy more often now than at any other time in baseball history. League-wide shifting rates have increased by 86% over the last three years, up to 25.6% in 2019 from 13.8% in 2016.

AWS and MLB teamed up to employ machine learning to give baseball fans insight into the effectiveness of a shifting strategy. We developed a model to estimate the Shift Impact—the change in a hitter’s expected batting average on ground balls—as he steps up to the plate, using historical data and Amazon SageMaker. As infielders move around the field, the Shift Impact dynamically updates by re-computing the expected batting average with the changing positions of the defenders. This provides a real-time experience for fans.

Using data to quantify the Shift Impact

A spray chart can illustrate the tendency batters have in hitting balls towards a particular direction. The chart indicates the percentage at which a player’s batted balls are hit through various sections of the field. The following chart shows the 2018 spray distribution of batted balls hit by Joey Gallo (of the Texas Rangers) within the infielders’ reach, defined as having a projected distance of less than 200 feet from home plate. For more information, see Joey Gallo’s current stats on Baseball Savant.

The preceding chart shows the tendency to pull the ball toward right field for Joey Gallo, who hit 74% of his balls to the right of second base in 2018. A prepared defense can take advantage of this observation by overloading the right side of the infield, cutting short the trajectory of the ball and increasing the chance of converting the batted ball into an out.

We estimated the value of specific infield alignments against batters based on their historical batted-ball distribution by taking into account the last three seasons of play, or approximately 60,000 batted balls in the infield. For each of these at-bats, we gathered the launch angle and exit velocity of the batted ball and infielder positions during the pitch, while looking up the known sprint speed and handedness of the batter. While there are many metrics for offensive production in baseball, we chose to use batting average on balls in play—that is, the probability of a ball in play resulting in a base hit.

We calculated how effective a shift might be by estimating the amount by which a specific alignment decreases our offensive measure. After deriving new features, such as the projected landing path of the ball and one-hot encoding the categorical variables, the data was ready for ingestion into various ML frameworks to estimate the probability that a ball in play results in a base hit. From that, we could compute the changes to the probability due to changing infielder alignments.
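The one-hot encoding step mentioned above can be sketched with pandas; the column names (`launch_angle`, `exit_velocity`, `handedness`) are illustrative stand-ins for the actual Statcast schema:

```python
import pandas as pd

# Toy batted-ball table; real rows also carry infielder positions,
# sprint speed, and the derived landing-path features described above
batted_balls = pd.DataFrame({
    "launch_angle":  [12.0, -5.0, 25.0],
    "exit_velocity": [98.5, 85.2, 102.3],
    "handedness":    ["L", "R", "L"],   # categorical variable to encode
})

# One-hot encode the categorical column before feeding an ML framework
features = pd.get_dummies(batted_balls, columns=["handedness"])
print(list(features.columns))
```

After encoding, every column is numeric, which is what tree and neural-network frameworks expect at ingestion.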

Using Amazon SageMaker to calculate Shift Impact

We trained ML models on more than 50,000 at-bat samples. A Bayesian search run as a hyperparameter optimization (HPO) job, using Amazon SageMaker’s Automatic Model Tuning feature over the pre-built XGBoost algorithm, returned the most performant predictions: 88% precision, 88% recall, and an 88% F1 score on a validation set of nearly 10,000 events. Launching an HPO job on Amazon SageMaker is as simple as defining the parameters that describe the job, then handing it off to the backend services that manage the core infrastructure (Amazon EC2, Amazon S3, Amazon ECS) to iterate through the defined hyperparameter space efficiently and find the optimal model.

The code snippets shown utilize boto3, the Python API for AWS products and tools. Amazon SageMaker also offers the SageMaker Python SDK, an open source library with several high-level abstractions for working with Amazon SageMaker and popular deep learning frameworks.

Defining the HPO job

We started by setting up the Amazon SageMaker client and defining the tuning job. This specifies which parameters to vary during tuning, along with the evaluation metric we wish to optimize towards. In the following code, we set it to minimize the log loss on the validation set:

import boto3
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

sm_client = boto3.Session().client('sagemaker')
xgboost_image = get_image_uri(boto3.Session().region_name, 'xgboost')
role = get_execution_role()

tuning_job_config = {
    "ParameterRanges": {
      "CategoricalParameterRanges": [],
      "ContinuousParameterRanges": [
        {
          "MaxValue": "1",
          "MinValue": "0",
          "Name": "eta"
        },
        {
          "MaxValue": "2",
          "MinValue": "0",
          "Name": "alpha"
        }
      ],
      "IntegerParameterRanges": [
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "max_depth"
        }
      ]
    },
    "ResourceLimits": {
      "MaxNumberOfTrainingJobs": 100,
      "MaxParallelTrainingJobs": 10
    },
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
      "MetricName": "validation:logloss",
      "Type": "Minimize"
    }
}

training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": xgboost_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_train # path to training data
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_validation # path to validation data
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": s3_output # output path for model artifacts
    },
    "ResourceConfig": {
      "InstanceCount": 2,
      "InstanceType": "ml.c4.2xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "logloss",
      "objective": "binary:logistic",
      "rate_drop": "0.3",
      "tweedie_variance_power": "1.4"
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 43200
    }
}

Launching the HPO job

With the tuning job defined in the Python dictionaries above, we now submit it to the Amazon SageMaker client, which automates the process of launching EC2 instances running XGBoost-optimized containers. See the following code:

sm_client.create_hyper_parameter_tuning_job(HyperParameterTuningJobName = "tuning_job_name",
                                            HyperParameterTuningJobConfig = tuning_job_config,
                                            TrainingJobDefinition = training_job_definition)

During the game, we can analyze a given batter with his most recent at-bats and run those events through the model for all infielder positions as laid out on a grid. Since the amount of compute required for inference increases geometrically as the size of each grid cell is reduced, we adjusted the size to reach a balance between the resolution required for meaningful predictions and compute time. For example, consider a shortstop that shifts over to his left. If he moves over by only one foot, there will be a negligible effect on the outcome of a batted ball. However, if he repositions himself 10 feet to his left, that can very well put himself in a better position to field a ground ball pulled to right field. Examining all at-bats in our dataset, we found such a balance on a grid composed of 10-foot by 10-foot cells, accounting for more than 10,000 infielder configurations.
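The grid trade-off described above can be sketched in a few lines. The infield extents and cell size below are illustrative assumptions, not MLB's actual grid; the point is that halving the cell size roughly quadruples the number of cells to score per infielder.

```python
import itertools

CELL = 10  # feet per grid cell, the balance point found above
# Illustrative infield extents, in feet relative to home plate (assumed values)
xs = range(-120, 121, CELL)   # lateral positions
ys = range(60, 161, CELL)     # depth positions

# every candidate cell an infielder could occupy at this resolution
grid = list(itertools.product(xs, ys))
print(len(grid))
```

Each infielder gets one candidate cell per (x, y) pair, and combinations across several infielders are what drive the configuration count into the tens of thousands.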

The process of obtaining the best performing model from the HPO job and deploying to production follows in the next section. Due to the large number of calls required for real-time inference, the results of the model are prepopulated into a lookup table that provides the relevant predictions during a live game.

Deploying the most performant model

Each tuning job launches a number of training jobs, from which the best model is selected according to the criteria defined earlier when configuring the HPO. From Amazon SageMaker, we first pull the best training job and its model artifacts. These are stored in the S3 bucket from which the training and validation datasets were pulled. See the following code:

# get best model from HPO job
best_training_job = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="tuning_job_name")['BestTrainingJob']
info = sm_client.describe_training_job(TrainingJobName=best_training_job['TrainingJobName'])
model_name = best_training_job['TrainingJobName'] + '-model'
model_data = info['ModelArtifacts']['S3ModelArtifacts']

Next, we refer to the pre-configured container optimized to run XGBoost models and link it to the model artifacts of the best-trained model. Once this model-container pair is created on our account, we can configure an endpoint with the instance type, number of instances, and traffic splits (for A/B testing) of our choice:

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': xgboost_image,
        'ModelDataUrl': model_data})

# create endpoint configuration
endpoint_config_name = model_name + '-endpointconfig'
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants = [{
        'InstanceType': 'ml.m4.xlarge',  # illustrative instance type
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,       # traffic split for A/B testing
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

# create endpoint
endpoint_name = model_name + '-endpoint'
create_endpoint_response = sm_client.create_endpoint(
    EndpointName = endpoint_name,
    EndpointConfigName = endpoint_config_name)
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']

print("Arn: " + resp['EndpointArn'])

Inference from the endpoint

The Amazon SageMaker runtime client makes predictions by sending a request to the endpoint hosting the model container on an EC2 instance and returning the model’s output. Entry points of the endpoint can be configured for custom models and data processing steps:

# invoke endpoint
import numpy as np

runtime_client = boto3.client('runtime.sagemaker')
# random CSV payload; num_features matches the training data's column count
random_payload = np.array2string(np.random.random(num_features),
                                 separator=',', max_line_width=np.inf)[1:-1]
response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                          ContentType='text/csv',
                                          Body=random_payload)
prediction = response['Body'].read().decode('utf-8')

With all of the predictions for a given batter and set of infielder configurations in hand, we then average the base-hit probabilities returned by the model and stored in the lookup table, and subtract the expected batting average for the same sample of batted balls. The resulting metric is the Shift Impact.
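That arithmetic reduces to a one-liner; the probabilities and expected batting average below are made up for illustration:

```python
def shift_impact(hit_probs, expected_ba):
    """Mean predicted base-hit probability under the alignment,
    minus the batter's expected batting average on the same sample."""
    return sum(hit_probs) / len(hit_probs) - expected_ba

# per-batted-ball model outputs and expected BA (illustrative numbers)
probs = [0.18, 0.22, 0.25, 0.15]
print(round(shift_impact(probs, expected_ba=0.30), 3))  # negative: the shift helps the defense
```

A negative Shift Impact means the alignment is expected to cost the batter hits; a positive one means the shift would actually help him.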

Matchup Analysis

In interleague games, where teams from the American and National leagues compete against each other, many batters face pitchers they have never seen before. Estimating outcomes in interleague games is difficult because there is limited relevant historical data. AWS worked with MLB to group similar pitchers together to gain insight into how a batter has historically performed against similar pitchers. We took a machine learning approach, which allowed us to combine the domain knowledge of experts with data comprising hundreds of thousands of pitches to find additional patterns we could use to identify similar pitchers.


Taking inspiration from the field of recommendation systems, in which the matching problem is typically solved by computing a user’s inclination towards a product, here we seek to determine the interaction between a pitcher and batter. There are many algorithms appropriate to building recommenders, but few that allow us to then cluster like items that are put into the algorithm. Neural networks shine in this area. End layers in a neural network architecture can be interpreted as numerical representations of the input data, whether it be an image or a pitcher ID. Given input data, its associated numerical representation, or embedding, can be compared against the embeddings of other input items. Those embeddings that lie near each other are similar, not just in this embedding space, but also in interpretable characteristics. For example, we expect handedness to play a role in defining which pitchers are similar. This approach to recommendation systems and clustering items is known as deep matrix factorization.

Deep matrix factorization accounts for nonlinear interactions between a pair of entities, while also mixing in the techniques of content-based and collaborative filtering. Rather than working solely with a pitcher-batter matrix, as in matrix factorization, we build a neural network that assigns each pitcher and batter its own embedding and then passes them through a series of hidden layers trained to predict the outcome of a pitch. In addition to the collaborative nature of this architecture, contextual data is included for each pitch, such as the count, the number of runners on base, and the score.
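As an illustration only (the embedding sizes, layer widths, and outcome classes below are invented for the sketch, not MLB's actual configuration), a toy forward pass of such an architecture could look like this in plain numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

N_PITCHERS, N_BATTERS, EMB_DIM = 500, 600, 16
N_CONTEXT = 3            # e.g. count, runners on base, score differential
N_OUTCOMES = 5           # e.g. ball, strike, swinging strike, single, out

# Learnable lookup tables: one embedding row per pitcher / batter.
pitcher_emb = rng.normal(size=(N_PITCHERS, EMB_DIM))
batter_emb = rng.normal(size=(N_BATTERS, EMB_DIM))

# Hidden and output weights (a single hidden layer for brevity).
W1 = rng.normal(size=(2 * EMB_DIM + N_CONTEXT, 64))
W2 = rng.normal(size=(64, N_OUTCOMES))

def predict_outcome(pitcher_id, batter_id, context):
    """Concatenate both embeddings with pitch context, then classify."""
    x = np.concatenate([pitcher_emb[pitcher_id], batter_emb[batter_id], context])
    h = np.maximum(0, x @ W1)              # ReLU hidden layer
    logits = h @ W2
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

probs = predict_outcome(42, 7, np.array([2.0, 1.0, 0.0]))

# After training, a pitcher's row of the lookup table *is* its embedding:
embedding_of_pitcher_42 = pitcher_emb[42]
```

In the real system the weights would be learned by minimizing a classification loss over hundreds of thousands of pitches; the key point is that the pitcher embedding falls out of the trained lookup table for free.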

The model is optimized against the predicted outcome of each pitch, including both the pitch characteristics (slider, changeup, fastball, etc.) and the outcome (ball, single, strike, swinging strike, etc.). After training a model on this classification problem, the end layer of the pitcher ID input is extracted as the embedding for that particular pitcher.


As a batter steps up to the plate against a pitcher he hasn’t faced before, we search for the nearest embeddings to that of the opposing pitcher and calculate the on-base plus slugging percentage (OPS) against that group of pitchers. To see the results in action, see 9/11/19: FSN-Ohio executes OPS comparison.
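A sketch of that nearest-neighbor lookup, with randomly generated stand-ins for the learned embeddings and the historical OPS values:

```python
import numpy as np

def most_similar(embeddings, query_id, k=3):
    """Return ids of the k pitchers whose embeddings lie nearest
    (by cosine similarity) to the query pitcher's embedding."""
    q = embeddings[query_id]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q)
    sims = embeddings @ q / norms
    sims[query_id] = -np.inf               # exclude the pitcher himself
    return np.argsort(sims)[::-1][:k]

rng = np.random.default_rng(1)
emb = rng.normal(size=(100, 16))                  # stand-in pitcher embeddings
ops_vs_pitcher = rng.uniform(0.6, 1.0, size=100)  # stand-in historical OPS

neighbors = most_similar(emb, query_id=17)
projected_ops = ops_vs_pitcher[neighbors].mean()  # OPS vs. the similar group
```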


MLB uses cloud computing to create innovative experiences that introduce additional ways for fans to experience baseball. With Stolen Base Success Probability, Shift Impact, and Pitcher Similarity Match-up Analysis, MLB provides compelling, real-time insight into what’s happening on the field and a greater connection to the context that builds the unique drama of the game that fans love.

This postseason, fans will have many opportunities to see stolen base probability in action, the potential effects of infield alignments, and launch into debates with friends about what makes pitchers similar.

Fans can expect to see these new stats in live game broadcasts with partners such as ESPN and MLB Network. Plus, other professional sports leagues including the NFL and Formula 1 have selected AWS as their cloud and machine learning provider of choice.

You can find full, end-to-end examples of implementing an HPO job on Amazon SageMaker at the AWSLabs GitHub repo. If you’d like help accelerating your use of machine learning in your products and processes, please contact the Amazon ML Solutions Lab program.

About the Authors

Hussain Karimi is a data scientist at the Amazon ML Solutions Lab, where he works with AWS customers to develop machine learning models that uncover unique insights in various domains.

Travis Petersen is a Senior Data Scientist at MLB Advanced Media and an adjunct professor at Fordham University.

Priya Ponnapalli is a principal scientist and manager at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Heard Mentality: AI Voice Startup Helps Hear Customer Pain Points

Eleven years ago, Carnegie Mellon University alumni Anthony Gadient, Edward Lin and Rob Rutenbar were hunkered down in a garage, chowing down on pizza during late nights of coding. Eighteen months later, voice startup Voci emerged as a spinout from CMU.

Like the work of many early AI researchers, Voci became a reality as a startup because of breakthroughs in deep neural networks paired with advances in GPU computing.

“Our academic roots are based in this idea that you can do better by taking advantage of application-specific hardware such as NVIDIA GPUs,” said Gadient, Voci’s chief strategy officer and co-founder.

Automated Speech Recognition 

Voci’s V-Blaze automated speech recognition offers real-time speech-to-text and audio analytics to analyze conversations between customers and call center representatives. The data can be used by customers to understand the sentiment and emotion of speakers.

Voci can provide customers with an open API to pipe the data into customer experience and sales applications.

Companies can use Voci to track what customers are saying about competitive products and different features offered elsewhere.

“There’s valuable data in those call center communications,” said Gadient.

AI Closes Deal

Voci’s automated speech recognition yields data that indicates how well sales representatives are handling calls, allowing companies to improve interactions with real-time feedback on best practices. The same metadata also drives products from analytics companies.

“Sales is very interesting in terms of understanding what message is effective and what is the reaction emotionally on the part of the potential buyer to different messaging,” he said.

Understanding the underlying emotion and sentiment is valuable for a number of these applications, said Gadient.

Voci’s customers include analytics companies such as Clarabridge, Call Journey and EpiAnalytics, which tap into the startup’s API for metadata that can highlight issues for customers.

Biometrics for Voice 

Voci is also addressing a problem that plagues automated customer service systems: caller verification. Many of these systems ask callers a handful of verification questions and then ask those same questions again if live support is required or if the call gets transferred.

Instead, Voci has developed an API for “voiceprints” that can identify people by voice, bypassing the maze of verification questions.

“Biometrics for voice is a problem worth solving, if only for our collective sanity. It enables machine verification of callers in the background instead of those maddening repeated questions you can face when handed off from operator to operator in a call center,” said Gadient.

GPU-Accelerated NLP 

Voci uses a multitude of neural networks and techniques to offer its natural language processing services. The service is offered either on premises or in the cloud and taps into NVIDIA V100 Tensor Core GPUs for inference.

For example, the company uses convolutional neural networks to process audio data and recurrent neural networks for language modeling to make predictions about text.
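As a rough, generic illustration of those two ingredients (not Voci's actual models), a one-dimensional convolution sliding over spectrogram frames and a single recurrent update might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Convolution over audio: slide a kernel across spectrogram frames ---
spectrogram = rng.normal(size=(100, 40))   # 100 frames x 40 mel bands
kernel = rng.normal(size=(5, 40))          # looks at 5 frames at a time

conv_out = np.array([
    np.sum(spectrogram[t:t + 5] * kernel)  # one feature per window
    for t in range(100 - 4)
])

# --- Recurrent step for language modeling: state carries word history ---
Wx, Wh = rng.normal(size=(8, 32)), rng.normal(size=(32, 32))

def rnn_step(word_vec, state):
    """Fold one word vector into the running hidden state."""
    return np.tanh(word_vec @ Wx + state @ Wh)

state = np.zeros(32)
for word_vec in rng.normal(size=(3, 8)):   # a three-word sequence
    state = rnn_step(word_vec, state)
```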

Developers at Voci trained their networks on more than 20,000 hours of audio from customers seeking results for their businesses.

“It took approximately one month to train the neural nets on a network of machines running a combination of NVIDIA P100 and V100 GPUs,” said Gadient.

Voci is a member of NVIDIA Inception, a virtual accelerator program that helps startups get to market faster.


The post Heard Mentality: AI Voice Startup Helps Hear Customer Pain Points appeared first on The Official NVIDIA Blog.

Exploring Massively Multilingual, Massive Neural Machine Translation

“… perhaps the way [of translation] is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.”

Warren Weaver, 1949

Over the last few years there has been enormous progress in the quality of machine translation (MT) systems, breaking language barriers around the world thanks to developments in neural machine translation (NMT). The success of NMT, however, owes largely to great amounts of supervised training data. But what about languages where data is scarce, or even absent? Multilingual NMT, with the inductive bias that “the learning signal from one language should benefit the quality of translation to other languages”, is a potential remedy.

Multilingual machine translation processes multiple languages using a single translation model. The success of multilingual training for data-scarce languages has been demonstrated for automatic speech recognition and text-to-speech systems, and by prior research on multilingual translation [1,2,3]. We previously studied the effect of scaling up the number of languages that can be learned in a single neural network, while controlling the amount of training data per language. But what happens once all constraints are removed? Can we train a single model using all of the available data, despite the huge differences across languages in data size, scripts, complexity and domains?

In “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges” and follow-up papers [4,5,6,7], we push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.

Massively Multilingual Machine Translation
Though data skew across language-pairs is a great challenge in NMT, it also creates an ideal scenario in which to study transfer, where insights gained through training on one language can be applied to the translation of other languages. On one end of the distribution, there are high-resource languages like French, German and Spanish where there are billions of parallel examples, while on the other end, supervised data for low-resource languages such as Yoruba, Sindhi and Hawaiian, is limited to a few tens of thousands.

The data distribution over all language pairs (in log scale) and the relative translation quality (BLEU score) of the bilingual baselines trained on each one of these specific language pairs.

Once trained using all of the available data (25+ billion examples from 103 languages), we observe strong positive transfer towards low-resource languages, dramatically improving the translation quality of 30+ languages at the tail of the distribution by an average of 5 BLEU points. This effect is already known, but surprisingly encouraging, considering the comparison is between bilingual baselines (i.e., models trained only on specific language pairs) and a single multilingual model with representational capacity similar to a single bilingual model. This finding hints that massively multilingual models are effective at generalization, and capable of capturing the representational similarity across a large body of languages.

Translation quality comparison of a single massively multilingual model against bilingual baselines that are trained for each one of the 103 language pairs.

In our EMNLP’19 paper [5], we compare the representations of multilingual models across different languages. We find that multilingual models learn shared representations for linguistically similar languages without the need for external constraints, validating long-standing intuitions and empirical results that exploit these similarities. In [6], we further demonstrate the effectiveness of these learned representations on cross-lingual transfer on downstream tasks.

Visualization of the clustering of the encoded representations of all 103 languages, based on representational similarity. Languages are color-coded by their linguistic family.

Building Massive Neural Networks
As we increase the number of low-resource languages in the model, the quality of high-resource language translations starts to decline. This regression is a recognized problem in multi-task setups, arising from inter-task competition and the unidirectional nature of transfer (i.e., from high- to low-resource languages). While working on better learning and capacity-control algorithms to mitigate this negative transfer, we also extend the representational capacity of our neural networks by increasing the number of model parameters, improving the quality of translation for high-resource languages.

Numerous design choices can be made to scale neural network capacity, including adding more layers or making the hidden representations wider. Continuing our study on training deeper networks for translation, we utilized GPipe [4] to train 128-layer Transformers with over 6 billion parameters. Increasing the model capacity resulted in significantly improved performance across all languages by an average of 5 BLEU points. We also studied other properties of very deep networks, including the depth-width trade-off, trainability challenges and design choices for scaling Transformers to over 1500 layers with 84 billion parameters.

While scaling depth is one approach to increasing model capacity, exploring architectures that can exploit the multi-task nature of the problem is a very plausible complementary way forward. By modifying the Transformer architecture through the substitution of the vanilla feed-forward layers with sparsely-gated mixtures of experts, we drastically scale up the model capacity, allowing us to successfully train models past 50 billion parameters, which further improved translation quality across the board.
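A minimal sketch of the sparsely-gated mixture-of-experts idea, with toy dimensions (the production models described here are many orders of magnitude larger):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 4, 2

# Each expert is its own feed-forward layer; a gate picks which ones run.
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
gate_w = rng.normal(size=(D, N_EXPERTS))

def moe_layer(x):
    """Route x to the top-k experts chosen by the gating network and
    combine their outputs, weighted by the renormalized gate scores."""
    scores = x @ gate_w
    top = np.argsort(scores)[::-1][:TOP_K]   # sparse: only k experts execute
    weights = np.exp(scores[top])
    weights /= weights.sum()
    return sum(w * np.maximum(0, x @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.normal(size=D))
```

Because only the selected experts run for each token, total parameter count can grow far faster than per-token compute, which is what makes this substitution attractive at scale.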

Translation quality improvement of a single massively multilingual model as we increase the capacity (number of parameters) compared to 103 individual bilingual baselines.

Making M4 Practical
It is inefficient to train large models with extremely high computational costs for every individual language, domain or transfer task. Instead, we present methods [7] to make these models more practical, using capacity-tunable layers to adapt them to specific languages or domains without altering the original model.
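One common realization of such tunable capacity is a small residual adapter trained per language or domain while the base model stays frozen; the sketch below assumes that adapter form, and its sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, BOTTLENECK = 64, 8   # illustrative hidden and bottleneck sizes

# The frozen base-model weights stay untouched; only this tiny adapter
# (down-projection + up-projection around a residual connection) is
# trained for the new language or domain.
down = rng.normal(size=(D, BOTTLENECK)) * 0.01
up = rng.normal(size=(BOTTLENECK, D)) * 0.01

def adapt(hidden_state):
    """Residual adapter: base representation plus a small learned delta."""
    return hidden_state + np.maximum(0, hidden_state @ down) @ up

h = rng.normal(size=D)      # stand-in for a frozen model's hidden state
adapted = adapt(h)
```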

Next Steps
At least half of the 7,000 languages currently spoken will no longer exist by the end of this century*. Can multilingual machine translation come to the rescue? We see the M4 approach as a stepping stone towards serving the next 1,000 languages; starting from such multilingual models will allow us to easily extend to new languages, domains and downstream tasks, even when parallel data is unavailable. Indeed the path is rocky, and on the road to universal MT many promising solutions appear to be interdisciplinary. This makes multilingual NMT a plausible test bed for machine learning practitioners and theoreticians interested in exploring multi-task learning, meta-learning, the training dynamics of deep nets and much more. We still have a long way to go.

This effort is built on contributions from Naveen Arivazhagan, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Chen, Yuan Cao, Yanping Huang, Sneha Kudugunta, Isaac Caswell, Aditya Siddhant, Wei Wang, Roee Aharoni, Sébastien Jean, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu. We would also like to acknowledge support from the Google Translate, Brain, and Lingvo development teams, Jakob Uszkoreit, Noam Shazeer, Hyouk Joong Lee, Dehao Chen, Youlong Cheng, David Grangier, Colin Raffel, Katherine Lee, Thang Luong, Geoffrey Hinton, Manisha Jain, Pendar Yousefi and Macduff Hughes.

* The Cambridge Handbook of Endangered Languages (Austin and Sallabank, 2011).


Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.