
Spotting Clouds on the Horizon: AI Resolves Uncertainties in Climate Projections

Climate researchers look into the future to project how much the planet will warm in coming decades — but they often rely on decades-old software to conduct their analyses.

This legacy software architecture is difficult to update with new methodologies that have emerged in recent years. So a consortium of researchers is starting from scratch, writing a new climate model that leverages AI, new software tools and NVIDIA GPUs.

Scientists from Caltech, MIT, the Naval Postgraduate School and NASA’s Jet Propulsion Laboratory are part of the initiative, named the Climate Modeling Alliance — or CliMA.

“Computing has advanced quite a bit since the ‘60s,” said Raffaele Ferrari, oceanography professor at MIT and principal investigator on the project. “We know much more than we did at that time, but a lot was hard-coded into climate models when they were first developed.”

Building a new climate model from the ground up allows climate researchers to better account for small-scale environmental features, including cloud cover, rainfall, sea ice and ocean turbulence.

These features occur at scales too small to be captured precisely in climate models, but they can be better approximated using AI. Incorporating the AI's projections into the new climate model could reduce uncertainties by half compared with existing models.

The team is developing the new model using Julia, an MIT-developed programming language that was designed for parallelism and distributed computation, allowing the scientists to accelerate their climate model calculations using NVIDIA V100 Tensor Core GPUs onsite and on Google Cloud.

As the project progresses, the researchers plan to use supercomputers like the GPU-powered Summit system at Oak Ridge National Laboratory, as well as commercial cloud resources, to run the new climate model — which they hope to have running within the next five years.

AI Turns the Tide

Climate scientists use physics and thermodynamics equations to calculate the evolution of environmental variables like air temperature, sea level and rainfall. But it’s incredibly computationally intensive to run these calculations for the entire planet. So in existing models, researchers divide the globe into a grid of 100-square-kilometer sections.

They calculate every 100 km block independently, using mathematical approximations for smaller features like turbulent eddies in the ocean and low-lying clouds in the sky — which can measure less than one kilometer across. As a result, when stringing the grid back together into a global model, there’s a margin of uncertainty introduced in the output.

Small uncertainties can make a significant difference, especially when climate scientists are estimating for policymakers how many years it will take for the average global temperature to rise by more than two degrees Celsius. Given today's level of uncertainty, researchers project that, at current emission levels, this threshold could be crossed as soon as 2040 or as late as 2100.

“That’s a huge margin of uncertainty,” said Ferrari. “Anything to reduce that margin can provide a societal benefit estimated in trillions of dollars. If one knows better the likelihood of changes in rainfall patterns, for example, then everyone from civil engineers to farmers can decide what infrastructure and practices they may need to plan for.”

A Deep Dive into Ocean Data

The MIT researchers are focusing on building the ocean elements of CliMA’s new climate model. Covering around 70 percent of the planet’s surface, oceans are a major heat and carbon dioxide reservoir. To make ocean-related climate projections, scientists look at such variables as water temperature, salinity and velocity of ocean currents.

One such dynamic is turbulent streams of water that flow around in the ocean like “a lot of little storms,” Ferrari said. “If you don’t account for all that swirling motion, you strongly underestimate how the ocean is absorbing heat and carbon.”

Using GPUs, researchers can sharpen the resolution of their simulations from 100 square kilometers down to one square kilometer, dramatically reducing uncertainties. But these simulations are too expensive to incorporate directly into a climate model that looks decades into the future.

That’s where an AI model that learns from fine-resolution ocean and cloud simulations can help.

“Our goal is to run thousands of high-resolution simulations, one for each 100-by-100 kilometer block, that will resolve the small-scale physics presently not captured by climate models,” said Chris Hill, principal research engineer at MIT’s earth, atmospheric and planetary sciences department.

These high-resolution simulations produce abundant synthetic data. That data can be combined with sparser real-world measurements, creating a robust training dataset for an AI model that estimates the impact of small-scale physics like ocean turbulence and cloud patterns on large-scale climate variables.
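As a rough illustration of that training setup, the sketch below fits a regression model that maps coarse-grid ocean state to a sub-grid quantity, using scikit-learn and entirely synthetic data; the variable names and functional form are assumptions for illustration, not CliMA's actual variables or code.

```python
# Illustrative sketch only: a synthetic stand-in for training a sub-grid
# "closure" model from high-resolution simulation output. Variable names
# and shapes are assumptions, not CliMA's actual data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Coarse-grid state for each 100 km block (e.g., mean temperature,
# salinity, current speed) -- synthetic placeholders.
n_blocks = 5000
coarse_state = rng.normal(size=(n_blocks, 3))

# Target: the sub-grid heat flux a high-resolution run resolves but the
# coarse model cannot. Here it is a made-up nonlinear function plus noise.
subgrid_flux = (
    0.7 * coarse_state[:, 0] * coarse_state[:, 2]
    - 0.3 * np.tanh(coarse_state[:, 1])
    + 0.05 * rng.normal(size=n_blocks)
)

X_train, X_test, y_train, y_test = train_test_split(
    coarse_state, subgrid_flux, test_size=0.2, random_state=0
)

# Any regressor could stand in for the learned parameterization.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))
```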

CliMA researchers can then plug these AI tools into the new climate model software, improving the accuracy of long-term projections.

“We’re betting a lot on GPU technology to provide a boost in compute performance,” Hill said.

In June, MIT hosted a weeklong GPU hackathon, where developers — including Hill's team as well as research groups from other universities — used the CUDA parallel computing platform and the Julia programming language for projects such as ocean modeling, plasma fusion and astrophysics.

For more on how AI and GPUs accelerate scientific research, see the NVIDIA higher education page. Find the latest NVIDIA hardware discounts for academia on our educational pricing page.

Image by Tiago Fioreze, licensed from Wikimedia Commons under Creative Commons 3.0 license.


Full ML Engineer scholarships from Udacity and the AWS DeepRacer Scholarship Challenge

The growth of artificial intelligence could create 58 million net new jobs in the next few years, according to the World Economic Forum [1]. Yet the Tencent Research Institute estimates that there are currently only 300,000 AI engineers worldwide, while millions are needed [2]. This gap creates a unique and immediate opportunity to develop creative experiences that introduce you, whatever your developer skill level, to essential ML concepts. These experiences in ML fields like deep learning and reinforcement learning will expand your skills and help close the talent gap.

To help you advance your AI/ML capabilities with hands-on and fun ML learning experiences, I am thrilled to announce the AWS DeepRacer Scholarship Challenge. 

What is AWS DeepRacer?

In November 2018, Jeff Barr announced the launch of AWS DeepRacer on the AWS News Blog as a new way to learn ML. With AWS DeepRacer, you have an opportunity to get hands-on with a fully autonomous 1/18th-scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league.

What is the AWS DeepRacer Scholarship Challenge?

AWS and Udacity are collaborating to educate developers of all skill levels on ML concepts.  Those skills are reinforced by putting them to the test through the world’s first autonomous racing league—the AWS DeepRacer League.

Students enrolled in the AWS DeepRacer Scholarship Challenge who have the top lap times can win full scholarships to the Machine Learning Engineer nanodegree program. The Udacity Nanodegree program is a unique online educational offering designed to bridge the gap between learning and career goals. 

How does the AWS DeepRacer Scholarship Challenge work?

The program begins August 1, 2019 and runs through October 31, 2019. You can join the scholarship community at any point during these three months and immediately enroll in Udacity’s specialized AWS DeepRacer course. Register now to be in pole position for the start of the race.

After enrollment, you go through the AWS DeepRacer course, which consists of short, step-by-step modules (90 minutes in total). The modules prepare you to create, train, and fine-tune a reinforcement learning model in the AWS DeepRacer 3D racing simulator. Throughout the program and during each race, you have access to a custom scholarship student community to get pro tips from experts and exchange ideas with your classmates.
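For a flavor of what the hands-on work involves, here is a minimal reward function sketch following AWS DeepRacer's documented `reward_function(params)` interface; the shaping logic itself is just an illustrative example, not the course's recommended solution.

```python
def reward_function(params):
    """Toy reward shaping for AWS DeepRacer (illustrative only).

    `params` is the dictionary the simulator passes in; the keys used here
    (all_wheels_on_track, track_width, distance_from_center) are part of
    the documented input parameters.
    """
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward for leaving the track

    # Reward staying close to the center line, fading toward zero at the edge.
    half_width = 0.5 * params["track_width"]
    centering = max(0.0, 1.0 - params["distance_from_center"] / half_width)
    return float(1e-3 + centering)


# Example evaluation with hand-made parameters (not simulator output).
print(reward_function({
    "all_wheels_on_track": True,
    "track_width": 1.0,
    "distance_from_center": 0.1,
}))
```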

Each month, you can pit your skills against others in virtual races in the AWS DeepRacer console. Students compete for top spots in each month’s unique race course. Students that record the top lap times in August, September, and October 2019 qualify for one of 200 full scholarships to the Udacity Machine Learning Engineer nanodegree program, sponsored by Udacity.

Next steps

To get notified about the scholarship program and enrollment dates, register now. For a program FAQ, see AWS DeepRacer Scholarship Challenge.

Developers, start your engines! The first challenge starts August 1, 2019!


[1] Artificial Intelligence To Create 58 Million New Jobs By 2022, Says Report (Forbes)
[2] Tencent says there are only 300,000 AI engineers worldwide, but millions are needed (The Verge)


About the Author

Tara Shankar Jana is a Senior Product Marketing Manager for AWS Machine Learning. Currently he is working on building unique and scalable educational offerings for aspiring ML developer communities, to help them expand their skills in ML. Outside of work he loves reading books, traveling and spending time with his family.


Parrotron: New Research into Improving Verbal Communication for People with Speech Impairments



Most people take for granted that when they speak, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in automatic speech recognition (ASR; a.k.a. speech-to-text) technologies, these interfaces can be inaccessible for those with speech impairments. Further, applications that rely on speech recognition as input for text-to-speech synthesis (TTS) can exhibit word substitution, deletion, and insertion errors. Critically, in today’s technological environment, limited access to speech interfaces, such as digital assistants that depend on directly understanding one’s speech, means being excluded from state-of-the-art tools and experiences, widening the gap between what those with and without speech impairments can access.

Project Euphonia has demonstrated that speech recognition models can be significantly improved to better transcribe a variety of atypical and dysarthric speech. Today, we are presenting Parrotron, an ongoing research project that continues and extends our effort to build speech technologies that help those with impaired or atypical speech to be understood by both people and devices. Parrotron consists of a single end-to-end deep neural network trained to convert speech from a speaker with atypical speech patterns directly into fluent synthesized speech, without an intermediate step of generating text—skipping speech recognition altogether. Parrotron’s approach is speech-centric, looking at the problem only from the point of view of speech signals—e.g., without visual cues such as lip movements. Through this work, we show that Parrotron can help people with a variety of atypical speech patterns—including those with ALS, deafness, and muscular dystrophy—to be better understood in both human-to-human interactions and by ASR engines.

The Parrotron Speech Conversion Model
Parrotron is an attention-based sequence-to-sequence model trained in two phases using parallel corpora of input/output speech pairs. First, we build a general speech-to-speech conversion model for standard fluent speech, followed by a personalization phase that adjusts the model parameters to the atypical speech patterns from the target speaker. The primary challenge in such a configuration lies in the collection of the parallel training data needed for supervised training, which consists of utterances spoken by many speakers and mapped to the same output speech content spoken by a single speaker. Since it is impractical to have a single speaker record the many hours of training data needed to build a high quality model, Parrotron uses parallel data automatically derived with a TTS system. This allows us to make use of a pre-existing anonymized, transcribed speech recognition corpus to obtain training targets.

The first training phase uses a corpus of ~30,000 hours that consists of millions of anonymized utterance pairs. Each pair includes a natural utterance paired with an automatically synthesized speech utterance that results from running our state-of-the-art Parallel WaveNet TTS system on the transcript of the first. This dataset includes utterances from thousands of speakers spanning hundreds of dialects/accents and acoustic conditions, allowing us to model a large variety of voices, linguistic and non-linguistic contents, accents, and noise conditions with “typical” speech all in the same language. The resulting conversion model projects away all non-linguistic information, including speaker characteristics, and retains only what is being said, not who, where, or how it is said. This base model is used to seed the second personalization phase of training.

The second training phase utilizes a corpus of utterance pairs generated in the same manner as the first dataset. In this case, however, the corpus is used to adapt the network to the acoustic/phonetic, phonotactic and language patterns specific to the input speaker, which might include, for example, learning how the target speaker alters, substitutes, and reduces or removes certain vowels or consonants. To model ALS speech characteristics in general, we use utterances taken from an ALS speech corpus derived from Project Euphonia. If instead we want to personalize the model for a particular speaker, then the utterances are contributed by that person. The larger this corpus is, the better the model is likely to be at correctly converting to fluent speech. Using this second smaller and personalized parallel corpus, we run the neural-training algorithm, updating the parameters of the pre-trained base model to generate the final personalized model.

We found that training the model with a multitask objective to predict the target phonemes while simultaneously generating spectrograms of the target speech led to significant quality improvements. Such a multitask trained encoder can be thought of as learning a latent representation of the input that maintains information about the underlying linguistic content.
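As a framework-agnostic sketch of such a multitask objective (not Parrotron's actual implementation), the toy computation below combines a spectrogram reconstruction term with a phoneme prediction term over synthetic arrays; the shapes and the loss weighting are assumptions.

```python
# Toy multitask objective: spectrogram regression + phoneme prediction.
# Purely illustrative; shapes, weighting, and loss choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, n_mels, n_phonemes = 50, 80, 40           # frames, mel bins, phoneme classes

pred_spec = rng.normal(size=(T, n_mels))     # decoder output
target_spec = rng.normal(size=(T, n_mels))   # TTS-generated target spectrogram

phoneme_logits = rng.normal(size=(T, n_phonemes))   # auxiliary head output
target_phonemes = rng.integers(0, n_phonemes, size=T)

# Spectrogram term: mean squared error over all frames and mel bins.
spec_loss = np.mean((pred_spec - target_spec) ** 2)

# Phoneme term: cross-entropy from a softmax over the auxiliary head.
logits = phoneme_logits - phoneme_logits.max(axis=1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
phoneme_loss = -log_probs[np.arange(T), target_phonemes].mean()

total_loss = spec_loss + 0.1 * phoneme_loss  # 0.1 is an arbitrary weight
print(round(spec_loss, 3), round(phoneme_loss, 3), round(total_loss, 3))
```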

Overview of the Parrotron model architecture. An input speech spectrogram is passed through encoder and decoder neural networks to generate an output spectrogram in a new voice.

Case Studies
To demonstrate a proof of concept, we worked with our fellow Google research scientist and mathematician Dimitri Kanevsky, who was born in Russia to Russian speaking, normal-hearing parents but has been profoundly deaf from a very young age. He learned to speak English as a teenager, by using Russian phonetic representations of English words, learning to pronounce English using transliteration into Russian (e.g., The quick brown fox jumps over the lazy dog => ЗИ КВИК БРАУН ДОГ ЖАМПС ОУВЕР ЛАЙЗИ ДОГ). As a result, Dimitri’s speech is substantially distinct from native English speakers, and can be challenging to comprehend for systems or listeners who are not accustomed to it.

Dimitri recorded a corpus of 15 hours of speech, which was used to adapt the base model to the nuances specific to his speech. The resulting Parrotron system helped him be better understood by both people and Google's ASR system. Running Google's ASR engine on the output of Parrotron significantly reduced the word error rate from 89% to 32% on a held-out test set from Dimitri. Below is an example of Parrotron's successful conversion of input speech from Dimitri:

[Audio: input from Dimitri, followed by the corresponding Parrotron output.]

We also worked with Aubrie Lee, a Googler and advocate for disability inclusion, who has muscular dystrophy, a condition that causes progressive muscle weakness and sometimes impacts speech production. Aubrie contributed 1.5 hours of speech, which has been instrumental in demonstrating the promise of this speech-to-speech technology. Below is an example of Parrotron's successful conversion of input speech from Aubrie:

[Audio: two examples of input from Aubrie, each followed by the corresponding Parrotron output.]

We also tested Parrotron's performance on speech from speakers with ALS by adapting the pretrained model to a group of speakers who share similar speech characteristics, rather than to a single speaker. We conducted a preliminary listening study and, for the majority of our test speakers, observed an increase in intelligibility when comparing natural ALS speech to the corresponding speech produced by the Parrotron model.

Cascaded Approach
Project Euphonia has built a personalized speech-to-text model that has reduced the word error rate for a deaf speaker from 89% to 25%, and ongoing research is likely to improve upon these results. One could use such a speech-to-text model to achieve a goal similar to Parrotron's by simply passing its output into a TTS system to synthesize speech from the result. In such a cascaded approach, however, the recognizer may choose an incorrect word (roughly 1 out of 4 times, in this case), yielding words or sentences with unintended meaning whose synthesized audio would be far from the speaker's intention. Given Parrotron's end-to-end speech-to-speech training objective, even when errors are made, the generated output speech is likely to sound acoustically similar to the input speech. The speaker's original intention is therefore less likely to be significantly altered, and it is often still possible to understand what is intended:

[Audio: input from Dimitri with Parrotron output; a second example from Dimitri whose Parrotron output is passed to the Assistant, followed by the Assistant's response; and input from Aubrie with Parrotron output.]

Furthermore, since Parrotron is not strongly biased toward producing words from a predefined vocabulary set, input to the model may contain completely new invented words, foreign words and names, and even nonsense words. We observe that feeding Arabic and Spanish utterances into the US-English Parrotron model often results in output that echoes the original speech content with an American accent, in the target voice. Such behavior is qualitatively different from what one would obtain by simply running an ASR followed by a TTS. Finally, we believe that moving from a combination of independently tuned neural networks to a single end-to-end model could yield substantial improvements and simplifications.

Conclusion
Parrotron makes it easier for users with atypical speech to talk to and be understood by other people and by speech interfaces, with its end-to-end speech conversion approach more likely to reproduce the user’s intended speech. More exciting applications of Parrotron are discussed in our paper. If you would like to participate in this ongoing research, please fill out this short form and volunteer to record a set of phrases. We look forward to working with you!

Acknowledgements
This project was joint work between the Speech and Google Brain teams. Contributors include Fadi Biadsy, Ron J. Weiss, Pedro Moreno, Dimitri Kanevsky, Ye Jia, Suzan Schwartz, Landis Baker, Zelin Wu, Johan Schalkwyk, Yonghui Wu, Zhifeng Chen, Patrick Nguyen, Aubrie Lee, Andrew Rosenberg, Bhuvana Ramabhadran, Jason Pelecanos, Julie Cattiau, Michael Brenner, Dotan Emanuel and Joel Shor. Our data collection efforts have been vastly accelerated by our collaborations with ALS-TDI.

Evening the Odds: Cornell’s STORK AI Tool Evaluates Embryo Candidates for Better IVF

There’s less than a 50 percent chance that a round of in vitro fertilization — one of the most common treatments for infertility, running up to $15,000 — will succeed. But those odds could be dramatically improved with an AI tool developed by researchers at Cornell University.

Introduced in 1978, IVF is a process through which eggs are fertilized with sperm in a lab, creating multiple embryos that can be transferred into a patient’s uterus. Clinics monitor embryo development to pick the highest-quality embryos for transfer, improving the odds of pregnancy.

Still, less than half of transferred blastocysts (embryos that have grown for around five days) successfully implant in a patient’s uterus, according to the CDC. That figure drops below 15 percent for patients over the age of 40.

Cornell researchers created an AI model, dubbed STORK, that was trained and tested on a dataset of over 10,000 time-lapse images of human embryos. STORK uses convolutional neural networks to analyze embryo growth and evaluate which candidates are most likely to lead to successful implantation.

To increase the probability of pregnancy, clinics often transfer multiple embryos at once. And that carries risks.

“This can lead to twins, triplets and other multiples, which adds to the complications,” said Iman Hajirasouliha, assistant professor of computational genomics at Weill Cornell Medicine. “If we can reliably predict the implantation success rate based on an algorithm, then we can limit the number of transfers.”

Betting on the Best Embryo Candidate

Over 2.5 million cycles of IVF are performed each year, resulting in around 500,000 births. For each of these cycles, the task of choosing which embryos are most likely to result in a successful pregnancy lies with a team of embryologists.

These experts manually grade the developing embryos based on time-lapse images — a time-consuming and subjective evaluation. With no universal grading system, there’s little agreement among embryologists on which are the best embryo candidates.

The scientists developing STORK found that a panel of five embryologists unanimously agreed less than 25 percent of the time on whether an embryo was high, fair or low quality.

In contrast, STORK’s predictions agreed with the embryologist panel’s majority vote more than 95 percent of the time — suggesting that the tool may outperform individual embryologists and bring better consistency to the embryo evaluation process.

AI is also much faster at analyzing the image data. A clinic that treats around 4,000 people a year may have three embryologists manually evaluate embryo candidates for each patient. STORK can evaluate embryo candidate quality for 2,000 patients in just four minutes.

The Cornell researchers developed the deep learning model using the TensorFlow framework and four NVIDIA GPUs, accelerating the training process up to 4x over CPUs.
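A minimal Keras sketch of a small three-class image classifier in that spirit; the layer sizes, input resolution, and class labels below are placeholders rather than the published STORK architecture.

```python
# Minimal illustrative CNN for grading embryo images into three classes.
# Layer sizes and input resolution are assumptions, not STORK's architecture.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g., good / fair / poor
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
# model.fit(train_images, train_labels, epochs=..., validation_data=...)
```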

So far, the scientists have tested their tool on embryo images from clinics in New York, Spain and the United Kingdom. They hope any IVF facility that collects time-series images of embryos could use the tool.

However, embryo quality is just one clinical factor behind IVF success rates. Patient age is a key variable affecting the probability of implantation — and the likelihood of a healthy full-term pregnancy.

To better assess the rate of successful pregnancy and live birth, the researchers have developed a decision tree model that incorporates STORK’s embryo quality analyses as well as patient age data.
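A hedged scikit-learn sketch of what such a decision tree could look like, with the embryo-quality score and patient age as features; all of the data below is synthetic and purely illustrative.

```python
# Illustrative decision tree combining an embryo-quality score with patient
# age to predict a successful outcome. All data below is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 2000
quality_score = rng.uniform(0, 1, size=n)    # e.g., a STORK-style quality score
patient_age = rng.uniform(25, 45, size=n)    # years

# Made-up outcome model: higher quality and lower age raise success odds.
p_success = 1 / (1 + np.exp(-(3 * quality_score - 0.15 * (patient_age - 35))))
outcome = rng.binomial(1, p_success)

X = np.column_stack([quality_score, patient_age])
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, outcome)
print(export_text(clf, feature_names=["quality_score", "patient_age"]))
```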


Pricing housing just right: Entrata enables apartments to fill capacity with Amazon SageMaker and 1Strategy

The housing market is complex. The supply of student housing units around any given campus changes continuously, and the accepted value of a unit shifts with physical and social variables: proximity to campus relative to other available options, friend groups living nearby, and the availability of nearby parking as other properties fill. The interplay happens at all levels. Entire properties may shift in value, and specific units within them may exacerbate or counteract those shifts.

For a property management company to earn the maximum revenue from its rental units, it needs to price each unit just within what tenants are willing to pay, without knowing their actual price constraints. Setting a price too low leaves money on the table; setting it too high can mean the unit sits empty, costing the company money to maintain. Finding that balance is a difficult problem.

Entrata, a comprehensive technology provider of multifamily property management solutions, solves this problem by employing machine learning (ML) on AWS. Specifically, it feeds location-specific and even building-specific data (such as occupancy, proximity to campus, and lease term length) into an ML-based dynamic pricing engine running on Amazon SageMaker. The model helps Entrata's customers, property managers, predict occupancy levels and in turn optimize student housing prices.

At the implementation level, this solution relies on a number of AWS offerings. AWS Glue extracts Entrata's historical data into Amazon S3. Amazon SageMaker uses this data to make pricing predictions, which are written to an output bucket back in Amazon S3. Entrata's applications then request this data through Amazon API Gateway, which triggers AWS Lambda functions to deliver the most relevant forecast for any available unit.
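As a minimal sketch of the final hop in that pipeline, the Lambda handler below (with a hypothetical bucket name and key layout, not Entrata's actual implementation) returns the stored SageMaker forecast for a unit when invoked through API Gateway.

```python
# Illustrative Lambda handler: return the most recent price forecast for a
# unit from the S3 output bucket. Bucket name, key layout, and payload
# format are assumptions, not Entrata's actual implementation.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-pricing-predictions"  # hypothetical bucket name


def lambda_handler(event, context):
    # API Gateway proxy integration passes path parameters here.
    unit_id = event["pathParameters"]["unit_id"]

    obj = s3.get_object(Bucket=BUCKET, Key=f"forecasts/{unit_id}.json")
    forecast = json.loads(obj["Body"].read())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(forecast),
    }
```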

Entrata developed this solution in partnership with AWS Premier Consulting Partner 1Strategy, a Seattle-based consultancy that helps businesses architect, migrate, and optimize their workloads on AWS. The partnership between 1Strategy and Entrata has existed for years, but the ML work is their most recent—and arguably, most impressive—joint technical accomplishment.

Their collaboration previously focused exclusively on data management through AWS, which is itself a non-trivial challenge given the location, size, and complexity of the data. Entrata currently serves more than 20,000 apartment communities nationwide and offers a variety of tools, from mobile apps to lease-signing portals to accounting platforms.

The novel ML solution is exciting. Entrata’s CTO, Ryan Byrd, says, “The impact is far ranging and positive. Automating back-office functions with Amazon ML frees property management to focus on people first, instead of performing rote behind-the-scenes guessing of price recommendations.”

Entrata plans even more work with AWS in the future. Byrd adds, “AWS technologies will decrease our time to market with various ML projects.” He and his colleagues on the Entrata team are keen to aid customers in their decision-making efforts. They also use ML for various operational elements for their and their customers’ businesses, strategic planning, and maintenance management.


About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


Multilingual Universal Sentence Encoder for Semantic Retrieval

Since it was introduced last year, the Universal Sentence Encoder (USE) for English has become one of the most downloaded pre-trained text modules on TensorFlow Hub, providing versatile sentence embedding models that convert sentences into vector representations. These vectors capture rich semantic information that can be used to train classifiers for a broad range of downstream tasks. For example, a strong sentiment classifier can be trained from as few as one hundred labeled examples, and the embeddings can also be used to measure semantic similarity and for meaning-based clustering.

Today, we are pleased to announce the release of three new USE multilingual modules with additional features and potential applications. The first two modules provide multilingual models for retrieving semantically similar text, one optimized for retrieval performance and the other for speed and less memory usage. The third model is specialized for question-answer retrieval in sixteen languages (USE-QA), and represents an entirely new application of USE. All three multilingual modules are trained using a multi-task dual-encoder framework, similar to the original USE model for English, while using techniques we developed for improving the dual-encoder with additive margin softmax approach. They are designed not only to maintain good transfer learning performance, but to perform well on semantic retrieval tasks.

Multi-task training structure of the Universal Sentence Encoder. A variety of tasks and task structures are joined by shared encoder layers/parameters (pink boxes).

Semantic Retrieval Applications
The three new modules are all built on semantic retrieval architectures, which typically split the encoding of questions and answers into separate neural networks, making it possible to search among billions of potential answers within milliseconds. The key to using dual encoders for efficient semantic retrieval is to pre-encode all candidate answers to expected input queries and store them in a vector database optimized for the nearest-neighbor problem, which allows a large number of candidates to be searched quickly with good precision and recall. At query time, the input query is encoded into a vector, and an approximate nearest-neighbor search is performed against the stored candidates. Together, this enables good results to be found quickly without a direct query/candidate comparison against every candidate. The prototypical pipeline is illustrated below:

A prototypical semantic retrieval pipeline, used for textual similarity.
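A condensed sketch of that pre-encode-then-search pipeline, assuming the multilingual USE module handle published on TensorFlow Hub and using a brute-force cosine search in place of a real approximate nearest-neighbor index:

```python
# Sketch of the pre-encode-then-search retrieval pipeline. The TF Hub
# handle is assumed to be current; a real deployment would pre-build an
# approximate nearest-neighbor index instead of this brute-force search.
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers ops the multilingual module needs)

embed = hub.load(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"
)

# 1) Pre-encode all candidate answers once and normalize them.
candidates = [
    "You can reset your password from the account settings page.",
    "Nuestra tienda abre todos los días a las 9 de la mañana.",
    "Die Lieferung dauert in der Regel drei bis fünf Werktage.",
]
cand_vecs = np.asarray(embed(candidates))
cand_vecs /= np.linalg.norm(cand_vecs, axis=1, keepdims=True)

# 2) Encode the query and take the highest dot product (cosine similarity).
query = "How long does shipping take?"
q_vec = np.asarray(embed([query]))[0]
q_vec /= np.linalg.norm(q_vec)

scores = cand_vecs @ q_vec
print(candidates[int(np.argmax(scores))])
```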

Semantic Similarity Modules
For semantic similarity tasks, the query and candidates are encoded using the same neural network. Two common semantic retrieval tasks made possible by the new modules include Multilingual Semantic Textual Similarity Retrieval and Multilingual Translation Pair Retrieval.

  • Multilingual Semantic Textual Similarity Retrieval
    Most existing approaches for finding semantically similar text require being given a pair of texts to compare. Using the Universal Sentence Encoder, however, semantically similar text can be retrieved directly from a very large database. For example, in an application like FAQ search, a system can first index all possible questions with associated answers. Then, given a user's question, the system can search for known questions that are semantically similar enough to provide an answer. A similar approach was used to find comparable sentences among 50 million sentences in Wikipedia. With the new multilingual USE models, this can be done in any of the supported non-English languages.
  • Multilingual Translation Pair Retrieval
    The newly released modules can also be used to mine translation pairs to train neural machine translation systems. Given a source sentence in one language (“How do I get to the restroom?”), they can find the potential translation target in any other supported language (“¿Cómo llego al baño?”).

Both new semantic similarity modules are cross-lingual. Given an input in Chinese, for example, the modules can find the best candidates regardless of the language in which they are expressed. This versatility can be particularly useful for languages that are underrepresented on the internet. For example, an early version of these modules has been used by Chidambaram et al. (2018) to provide classifications in circumstances where the training data is available only in a single language, e.g. English, but the end system must function in a range of other languages.

USE for Question-Answer Retrieval
The USE-QA module extends the USE architecture to question-answer retrieval applications, which generally take an input query and find relevant answers from a large set of documents that may be indexed at the document, paragraph, or even sentence level. The input query is encoded with the question encoding network, while the candidates are encoded with the answer encoding network.

Visualizing the action of a neural answer retrieval system. The blue point at the north pole represents the question vector. The other points represent the embeddings of various answers. The correct answer, highlighted here in red, is “closest” to the question, in that it minimizes the angular distance. The points in this diagram are produced by an actual USE-QA model, however, they have been projected downwards from ℝ500 to ℝ3 to assist the reader’s visualization.

Question-answer retrieval systems also rely on the ability to understand semantics. For example, consider a possible query to one such system, Google Talk to Books, which was launched in early 2018 and backed by a sentence-level index of over 100,000 books. A query, “What fragrance brings back memories?”, yields the result, “And for me, the smell of jasmine along with the pan bagnat, it brings back my entire carefree childhood.” Without specifying any explicit rules or substitutions, the vector encoding captures the semantic similarity between the terms fragrance and smell. The advantage provided by the USE-QA module is that it can extend question-answer retrieval tasks such as this to multilingual applications.
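To make the "closest answer" idea concrete, the snippet below ranks candidate answers by angular distance (the arccosine of cosine similarity) between a question vector and answer vectors; the separate question and answer encoders are stubbed out with random vectors purely for illustration, so the resulting ranking is meaningless and only the scoring arithmetic is the point.

```python
# Angular-distance ranking between a question vector and answer vectors.
# The "encoders" are random stubs standing in for the separate question
# and answer networks of a USE-QA-style model.
import numpy as np

rng = np.random.default_rng(0)
DIM = 512


def stub_question_encoder(text: str) -> np.ndarray:
    vec = rng.normal(size=DIM)            # stand-in for the question network
    return vec / np.linalg.norm(vec)


def stub_answer_encoder(text: str) -> np.ndarray:
    vec = rng.normal(size=DIM)            # stand-in for the answer network
    return vec / np.linalg.norm(vec)


question = "What fragrance brings back memories?"
answers = [
    "The smell of jasmine brings back my childhood.",
    "The train departs at seven in the morning.",
    "He painted the fence a pale shade of blue.",
]

q = stub_question_encoder(question)
a_vecs = np.stack([stub_answer_encoder(a) for a in answers])

# Angular distance: arccos of cosine similarity (smaller means closer).
angles = np.arccos(np.clip(a_vecs @ q, -1.0, 1.0))
for angle, answer in sorted(zip(angles, answers)):
    print(f"{angle:.3f}  {answer}")
```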

For Researchers and Developers
We’re pleased to share the latest additions to the Universal Sentence Encoder family with the research community, and are excited to see what other applications will be found. These modules can be used as-is, or fine-tuned using domain-specific data. Lastly, we will also host the semantic similarity for natural language page on Cloud AI Workshop to further encourage research in this area.

Acknowledgements
Mandy Guo, Daniel Cer, Noah Constant, Jax Law, Muthuraman Chidambaram for core modeling, Gustavo Hernandez Abrego, Chen Chen, Mario Guajardo-Cespedes for infrastructure and colabs, Steve Yuan, Chris Tar, Yunhsuan Sung, Brian Strope, Ray Kurzweil for discussion of the model architecture.

Helping students learn with Course Hero, powered by Amazon SageMaker

Course Hero is an online learning platform that provides students access to over 25 million course-specific study materials, including study guides, class notes, and practice problems for numerous subjects. The platform, which runs on AWS, is designed to enable every student to take on their courses feeling confident and prepared. To make that possible, Course Hero does some learning of its own, using Amazon Machine Learning (Amazon ML) as its primary artificial intelligence and ML platform.

The artificial intelligence group at Course Hero is tasked with building the company’s semantic knowledge graph. This constantly expanding graph enables students to access personalized learning experiences and gives educators tools to create unique course content.

Most aspects of Course Hero’s offerings rely on AWS in some form or another (either compute or ML). For example, Amazon Elasticsearch Service (Amazon ES) powers the search function that students and educators use to search for materials. The Amazon ES platform allows the Course Hero team to write custom implementations through its API extension plugin. The plugin gives them the flexibility to create relevant user experiences, even for more esoteric searches that require locally dense semantic search capability.

Students and educators search within Course Hero's document library, which is freely accessible in exchange for uploading their own content. Course Hero does not accept all documents as publishable library material; documents gain acceptance to the library after going through a cloud-driven vetting process. When new documents are uploaded, an artificial intelligence platform running on Amazon EMR and Amazon SageMaker Inference Pipelines checks and validates the documents for fraud, honor code violations, copyright infringements, and spam.

The documents that pass quality review then move to further processing and tagging using ML models that are built on the label data that Amazon SageMaker Ground Truth has collected. This document labeling enables Course Hero to learn what kind of materials are used by a given student, then predict what else might be useful for them.

By personalizing the experience in this way, Course Hero provides each user with relevant content for their studying needs. With the right content in hand, students gain a deeper understanding and meet their learning objectives more efficiently.

AWS is a comprehensive platform for Course Hero. In addition to the student-facing use cases described above, Course Hero uses AWS services for ad hoc analyses, data exploration, trend discovery, real-time analytics, fraud detection, and more, and it builds its data platform on key AWS services.

Course Hero’s planning, tracking, and monitoring platforms also use Kibana, Logstash, and Amazon CloudWatch to keep all monitoring and service centers running smoothly.

The following diagram shows how all of these components work together.

To further augment the existing AWS technology that powers Course Hero, the team is exploring additional Amazon services, including Amazon Forecast for time series and financial forecasting. It is also looking at possibilities with Amazon Echo that would allow users to ask questions via Alexa.

Course Hero’s Saurabh Khanwalkar, the Director of Machine Learning & Search Sciences, says, “The entire machine learning, engineering, and artificial intelligence stack runs on AWS. From our CI/CD pipelines to our code workbench to our end-to-end model development to our staging and production inferences, we’re on AWS.”


About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


Voicing play with Volley, where words are the gameboard and Amazon Polly brings the fun

Voice-powered experiences are gaining traction and customer love. Volley is at the cutting edge of voice-controlled entertainment with its series of popular smart-speaker games, and many aspects of Volley rely on Amazon Polly.

Every day, more and more people switch on lights, check the weather, and play music not by pushing buttons but with verbal commands to smart speakers. Volley is a San Francisco–based startup co-founded in 2016 by former Harvard roommates Max Child (CEO) and James Wilsterman (CTO). They’re on a mission to use smart speakers as the basis for building fun experiences.

Volley creates games of all sorts, from song quizzes to political satire to role-playing games. Many of the latter, such as “Yes Sire,” feature choose-your-own-adventure style games, in which infinite dialogue permutations can flow from each player’s choices. Volley relies heavily on Amazon Polly to enable these growing dialogue permutations amid multiple characters’ interactions.

“We associate each character with a particular Amazon Polly voice,” said Wilsterman. “Our on-the-fly TTS generation only works because Amazon Polly’s text-to-speech API latency is low enough to be essentially imperceptible to the user.”
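A minimal boto3 sketch of that per-character synthesis pattern; the character-to-voice mapping, the line of dialogue, and the voice IDs are examples, not Volley's actual configuration.

```python
# Illustrative per-character text-to-speech with Amazon Polly via boto3.
# The character-to-voice mapping and the dialogue line are made up.
import boto3

polly = boto3.client("polly")

CHARACTER_VOICES = {
    "advisor": "Matthew",   # example Polly voice IDs
    "jester": "Joanna",
}


def speak(character: str, text: str) -> bytes:
    """Return MP3 audio bytes for a character's line."""
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=CHARACTER_VOICES[character],
    )
    return response["AudioStream"].read()


audio = speak("advisor", "Your Excellency, the peasants request lower taxes.")
with open("advisor_line.mp3", "wb") as f:
    f.write(audio)
```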

From a cost perspective, the comparison is a no-brainer: hiring voice actors to voice the games would be a thousand times more expensive (literally; Volley ran the numbers). Amazon Polly also responds faster than a human actor could, and it provides more diverse characters and reactions than recorded, scripted voice actors.

“We want our games to showcase diverse, memorable characters,” said Wilsterman. “We appreciate that Amazon Polly supports many different languages, accents, and age ranges to help us in that effort.” For example, Amazon Polly’s built-in German language support proved essential to Volley’s recent launch of a localized version of “Yes Sire” for Germany (called “Ja Exzellenz”).

Along with Amazon Polly, many other AWS services support Volley’s fun and games. This platform choice dates to Volley’s beginnings, when the co-founders were looking for the best services to host backend game logic and store persistent customer data.

“We realized quickly that AWS Lambda and Amazon DynamoDB would be ideal options,” said Wilsterman. He soon discovered that AWS also offered appealing scalability and affordability. The Volley team now uses Lambda not only to host the backend logic for their games but also to host a variety of internal tools and microservices deployed through Lambda functions.

DynamoDB supports Volley’s games by storing persistent data like users’ scores and levels, so they can return to the games and pick up right where they left off. And many of the in-game assets are stored in Amazon S3, which makes them instantly accessible to the backend Lambda functions. All those pieces are visualized together in the following workflow diagram.
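As a sketch of that persistence pattern (with a hypothetical table name, key schema, and attributes rather than Volley's actual design), a small helper can save and load a player's progress in DynamoDB:

```python
# Illustrative game-state persistence with DynamoDB. Table name, key
# schema, and attributes are assumptions, not Volley's actual design.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-player-progress")  # hypothetical table


def save_progress(user_id: str, game: str, score: int, level: int) -> None:
    table.put_item(Item={
        "user_id": user_id,      # partition key
        "game": game,            # sort key
        "score": score,
        "level": level,
    })


def load_progress(user_id: str, game: str) -> dict:
    response = table.get_item(Key={"user_id": user_id, "game": game})
    # Return an empty default if the player has no saved state yet.
    return response.get("Item", {"score": 0, "level": 1})


save_progress("player-123", "yes-sire", score=4200, level=7)
print(load_progress("player-123", "yes-sire"))
```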

Volley recently added a layer of sophistication to its machine learning work with Amazon SageMaker. They're using Amazon SageMaker to strengthen their business by understanding user behavior and promoting their games accordingly. Specifically, the Volley team faces a bit of a challenge because users don't carry persistent tags. So, if someone finishes playing "World Detective" and immediately starts to play "Castle Master," there is no way to identify that they're the same user.

As a result, the Volley team must find creative ways to measure the impact of their cross-promotional efforts. With Amazon SageMaker, they can predict the outcomes of their marketing based on the active users of each game and the associated timestamps. That helps them ensure that future marketing is better targeted, and that future games meet the audience trends Volley is seeing.

As Volley continues to expand its repertoire, the team is also considering new directions beyond sheer entertainment. “Self-improvement is an interesting space, like meditation, fitness, and other coaches,” said Wilsterman. “Also, learning and teaching. We are constantly asking, ‘What new experiences can be possible with voice as an input?’”

No matter what Volley chooses to pursue next, one thing is for sure: their cloud platform of choice. “The entire architecture runs on AWS; we use it for everything from storage to machine learning,” said Wilsterman.


About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


NVIDIA DGX-Ready Program Goes Global, Doubles Colocation Partners

To help businesses deploy AI infrastructure to power their most important opportunities, our DGX-Ready Data Center program is going global. We’ve also added new services that will help organizations accelerate their progress.

We’ve added three new partners in Europe, five in Asia and two in North America to the program. With these additions, customers now have access to a global network of 19 validated partners.

DGX-Ready Data Center partners help companies access modern data center facilities for their AI infrastructure. They offer world-class facilities to host DGX AI compute systems, giving more organizations access to AI-ready data centers while saving on capital expenditures and keeping operational costs low.

The program is now offered in 24 markets, including Australia, Austria, Brazil, Canada, China, Colombia, Denmark, France, Germany, Hong Kong, Iceland, Ireland, Italy, Japan, the Netherlands, Peru, Singapore, South Korea, Spain, Sweden, Switzerland, Turkey, the United Kingdom and the United States — with more coming soon.

Among the new locations is the Fujitsu Yokohama Data Center in Japan, which hosts dozens of NVIDIA AI systems.

“The Fujitsu Yokohama Data Center hosts more than 60 NVIDIA DGX-1 and DGX-2 systems,” said Hisaya Nakagawa, director at Fujitsu. “As a DGX-Ready Data Center program partner, we’re able to offer customers our world-class, state-of-the-art facility to run their most important AI workloads. With this program, customers can operationalize AI infrastructure swiftly and enjoy a jumpstart on their business transformation.”

Among the new DGX-Ready colocation partners is Fujitsu, equipped with more than 60 NVIDIA DGX-1 and DGX-2 systems in the Fujitsu Yokohama Data Center in Japan. Image courtesy of Fujitsu Ltd.

Enhanced Services That Accelerate Time to Insight

In addition to access to a world-class data center, the DGX-Ready Data Center program offers services that can reduce the risks of new infrastructure investment.

Select DGX-Ready colocation partners are adding “try-and-buy” options that let enterprises “test drive” their DGX infrastructure. Customers can gain valuable operational experience before they decide to deploy these systems in their own data center. Core Scientific and Flexential are among the first partners to offer this capability.

Additionally, select partners offer GPU-as-a-service options that let businesses access DGX-powered compute in an affordable model, without committing to a full system.

Mobile game developer Jam City is taking advantage of this capability to accelerate game development using Core Scientific’s AI-Optimized Cloud, powered by NVIDIA DGX.

“We’re relying on machine learning and artificial intelligence to guide game design and transform our business,” said Rami Safadi, chief data officer at Jam City. “Core Scientific’s cloud has enhanced how we utilize data and allowed us to analyze billions of rows of data per day. We’ve seen an 8x increase in speed, enabling us to train an entirely new set of winning AI business models.”

Meet the Perfect DGX-Ready Partner Fast

With the many options for AI infrastructure hosting, it’s important to choose a colocation partner that suits your needs.

To make it simpler, we’ve introduced the DGX-Ready Data Center portal, which lets customers search our global network of providers, filtered by region, supported systems and enhanced services. The portal makes it faster and easier to find the perfect match.


Get Your Fashion Fix: Stitch Fix Adds AI Flair to Your Closet

Some say style never fades, and now with the help of AI, finding one’s fashion sense is about to get a whole lot easier.

Fashion ecommerce startup Stitch Fix is piecing together a seamless balance between AI-powered decision making and human judgement.

“We really want to be a partner and personal stylist for people over a long period of time,” said Stitch Fix’s Chief Algorithms Officer Brad Klingenberg in a conversation with AI Podcast host Noah Kravitz.

“A lot of our clients find it really rewarding to be able to have their stylists get to know them … and this is all augmented and complemented with what we can learn algorithmically,” he added. “But I think there’s a really rich human component there that is not something easily replaced by an algorithm.”

Since launching in 2011, Stitch Fix has attracted over 3 million clients. Users complete a style profile and are assigned a personal stylist. Stylists will send a box — also referred to as a “fix” — with a curated selection of clothes, accessories, and shoes that fit within one’s taste and budget. Using clients’ feedback per fix, both the stylist and Stitch Fix’s algorithms gain a better sense of their styles.

As a service, Stitch Fix benefits from a “human-in-the-loop” method to help users experiment with their own aesthetic. The stylist acts as a check to the algorithm by evaluating if a selected piece either deviates too much from or helps diversify a client’s existing wardrobe.

“[This] really allows data scientists and folks on my team to really focus on things that dramatically improve the client experience and worry less about rare edge cases,” said Klingenberg. “The stylist will be able to help us make the right decision.”

Personalized curation, Klingenberg explains, is an increasing trend in not just retail, but also in other consumer services such as television and music.

“There’s certainly a central aspect to the Stitch Fix value proposition where… the goal isn’t to present clients with an unlimited selection of everything they could ever want… but to actually just share what they want,” said Klingenberg. “And so I think this counter trend to just limitless availability will show up in a few places.”

If you are interested in learning more about Klingenberg’s work at Stitch Fix, you can check out their technical blog, Multithreaded, and venture into the science behind the fashion with their Algorithms Tour.

Help Make the AI Podcast Better

Have a few minutes to spare? It’d help us if you fill out this short listener survey.

Your answers will help us learn more about our audience, which will help us deliver podcasts that meet your needs, what we can do better, and what we’re doing right.

How to Tune into the AI Podcast

Our AI Podcast is available through iTunes, Castbox, DoggCatcher, Google Play Music, Overcast, PlayerFM, Podbay, PodBean, Pocket Casts, PodCruncher, PodKicker, Stitcher, SoundCloud and TuneIn.

If your favorite isn’t listed here, email us at aipodcast [at] nvidia [dot] com.

