Skip to main content

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

LEARN, CONNECT, SHARE

Join our meetup, learn, connect, share, and get to know your Toronto AI community. 

JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.

CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

Category: Global

Become a certified machine learning developer with the new AWS Certified Machine Learning – Specialty certification

Back in November 2018 we announced on this blog that the same machine learning (ML) courses used to train engineers at Amazon are now available to all developers through AWS. Today, we’re letting you know that there is a way to enhance and validate your ability to build, train, tune, and deploy machine learning models with AWS.

AWS Training and Certification is excited to announce the availability of the new AWS Certified Machine Learning – Specialty certification. This new exam was created by AWS experts for developers and data scientists who want to validate their ability to design, implement, deploy, and maintain ML solutions for given business problems. In addition, it validates their ability to select and justify the appropriate ML approach for a given business problem, identify appropriate AWS services to implement ML solutions, and design and implement scalable, cost-optimized, reliable, and secure ML solutions. AWS Training and Certification recommends one or more years of hands-on experience using ML  and artificial intelligence (AI) services, particularly Amazon SageMaker, and other services such as Amazon EMR, AWS Lambda, AWS Glue, and Amazon S3. It is also encouraged, but not required, that candidates hold an Associate-level certification or the Cloud Practitioner certification.

I had a chance to catch up with Swami Sivasubramanian, who is the Vice President of Machine Learning at AWS, to hear his thoughts on the needs in this growing field. “Customers tell us that they need more skilled talent in the area of machine learning, which has been a familiar problem within Amazon as well. It’s why we invested so deeply in creating internal education to train developers on machine learning,” he said. “AWS Training and Certification is helping to put those same resources in the hands of our customers so that they can develop and validate skills among their staff which will enable them to take full advantage of the growing AI/ML suite from AWS and the transformational effect that this technology will have on organizations and the global economy.”

Your ability to build, train, tune, and deploy ML models opens up many opportunities for new business ideas, new employment opportunities, and new customer experiences. Are you ready to get started?

Next Steps

The exam is available in English and Japanese at testing centers worldwide for $300 USD. You can learn more about the exam here. AWS offers an exam guide which provides training resources and materials to help you prepare. In addition, a learning path, called “Exam Preparation,” specifically designed for those taking the exam is available at aws.training/machinelearning.


– Maureen Lonergan – Director, AWS Training and Certification

A Summary of the Google Flood Forecasting Meets Machine Learning Workshop

Recently, we hosted the Google Flood Forecasting Meets Machine Learning workshop in our Tel Aviv office, which brought hydrology and machine learning experts from Google and the broader research community to discuss existing efforts in this space, build a common vocabulary between these groups, and catalyze promising collaborations. In line with our belief that machine learning has the potential to significantly improve flood forecasting efforts and help the hundreds of millions of people affected by floods every year, this workshop discussed improving flood forecasting by aggregating and sharing large data sets, automating calibration and modeling processes, and applying modern statistical and machine learning tools to the problem.

Panel on challenges and opportunities in flood forecasting, featuring (from left to right): Prof. Paolo Burlando (ETH Zürich), Dr. Tyler Erickson (Google Earth Engine), Dr. Peter Salamon (Joint Research Centre) and Prof. Dawei Han (University of Bristol).

The event was kicked off by Google’s Yossi Matias, who discussed recent machine learning work and its potential relevance for flood forecasting, crisis response and AI for Social Good, followed by two introductory sessions aimed at bridging some of the knowledge gap between the two fields – introduction to hydrology for computer scientists by Prof. Peter Molnar of ETH Zürich, and introduction to machine learning for hydrologists by Prof. Yishay Mansour of Tel Aviv University and Google

Included in the 2-day event was a wide range of fascinating talks and posters across the flood forecasting landscape, from both hydrologic and machine learning points of view.

An overview of research areas in flood forecasting addressed in the workshop.

Presentations from the research community included:

Alongside these talks, we presented the various efforts across Google to try and improve flood forecasting and foster collaborations in the field, including:

Additionally, at this workshop we piloted an experimental “ML Consultation” panel, where Googlers Gal Elidan, Sasha Goldshtein and Doron Kukliansky gave advice on how to best use machine learning in several hydrology-related tasks. Finally, we concluded the workshop with a moderated panel on the greatest challenges and opportunities in flood forecasting, with hydrology experts Prof. Paolo Burlando of ETH Zürich, Prof. Dawei Han of the University of Bristol, Dr. Peter Salamon of the Joint Research Centre and Dr. Tyler Erickson of Google Earth Engine.
Flood forecasting is an incredibly important and challenging task that is one part of our larger AI for Social Good efforts. We believe that effective global-scale solutions can be achieved by combining modern techniques with the domain expertise already existing in the field. The workshop was a great first step towards creating much-needed understanding, communication and collaboration between the flood forecasting community and the machine learning community, and we look forward to our continued engagement with the broad research community to tackle this challenge.

Acknowledgements
We would like to thank Avinatan Hassidim, Carla Bromberg, Doron Kukliansky, Efrat Morin, Gal Elidan, Guy Shalev, Jennifer Ye, Nadav Rabani and Sasha Goldshtein for their contributions to making this workshop happen.

Google Faculty Research Awards 2018

We just completed another round of the Google Faculty Research Awards, our annual open call for proposals on computer science and related topics, such as quantum computing, machine learning, algorithms and theory, natural language processing and more. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 910 proposals covering 40 countries and over 320 universities. After expert reviews and committee discussions, we decided to fund 158 projects. The subject areas that received the most support this year were human computer interaction, machine learning, machine perception, and systems.

Congratulations to the well-deserving recipients of this round’s awards. More information on how to apply for the next round will be available at the end of the summer on our website. You can find award recipients from previous years here.

Harnessing Organizational Knowledge for Machine Learning

One of the biggest bottlenecks in developing machine learning (ML) applications is the need for the large, labeled datasets used to train modern ML models. Creating these datasets involves the investment of significant time and expense, requiring annotators with the right expertise. Moreover, due to the evolution of real-world applications, labeled datasets often need to be thrown out or re-labeled.

In collaboration with Stanford and Brown University, we present “Snorkel Drybell: A Case Study in Deploying Weak Supervision at Industrial Scale,” which explores how existing knowledge in an organization can be used as noisier, higher-level supervision—or, as it is often termed, weak supervision—to quickly label large training datasets. In this study, we use an experimental internal system, Snorkel Drybell, which adapts the open-source Snorkel framework to use diverse organizational knowledge resources—like internal models, ontologies, legacy rules, knowledge graphs and more—in order to generate training data for machine learning models at web scale. We find that this approach can match the efficacy of hand-labeling tens of thousands of data points, and reveals some core lessons about how training datasets for modern machine learning models can be created in practice.

Rather than labeling training data by hand, Snorkel DryBell enables writing labeling functions that label training data programmatically. In this work, we explored how these labeling functions can capture engineers’ knowledge about how to use existing resources as heuristics for weak supervision. As an example, suppose our goal is to identify content related to celebrities. One can leverage an existing named-entity recognition (NER) model for this task by labeling any content that does not contain a person as not related to celebrities. This illustrates how existing knowledge resources (in this case, a trained model) can be combined with simple programmatic logic to label training data for a new model. Note also, importantly, that this labeling function returns None—i.e. abstains—in many cases, and thus only labels some small part of the data; our overall goal is to use these labels to train a modern machine learning model that can generalize to new data.

In our example of a labeling function, rather than hand-labeling a data point (1), one utilizes an existing knowledge resource—in this case, a NER model (2)—together with some simple logic expressed in code (3) to heuristically label data.

This programmatic interface for labeling training data is much faster and more flexible than hand-labeling individual data points, but the resulting labels are obviously of much lower quality than manually-specified labels. The labels generated by these labeling functions will often overlap and disagree, as the labeling functions may not only have arbitrary unknown accuracies, but may also be correlated in arbitrary ways (for example, from sharing a common data source or heuristic).

To solve the problem of noisy and correlated labels, Snorkel DryBell uses a generative modeling technique to automatically estimate the accuracies and correlations of the labeling functions in a provably consistent way—without any ground truth training labels—then uses this to re-weight and combine their outputs into a single probabilistic label per data point. At a high level, we rely on the observed agreements and disagreements between the labeling functions (the covariance matrix), and learn the labeling function accuracy and correlation parameters that best explain this observed output using a new matrix completion-style approach. The resulting labels can then be used to train an arbitrary model (e.g. in TensorFlow), as shown in the system diagram below.

Using Diverse Knowledge Sources as Weak Supervision
To study the efficacy of Snorkel Drybell, we used three production tasks and corresponding datasets, aimed at classifying topics in web content, identifying mentions of certain products, and detecting certain real-time events. Using Snorkel DryBell, we were able to make use of various existing or quickly specified sources of information such as:

  • Heuristics and rules: e.g. existing human-authored rules about the target domain.
  • Topic models, taggers, and classifiers: e.g. machine learning models about the target domain or a related domain.
  • Aggregate statistics: e.g. tracked metrics about the target domain.
  • Knowledge or entity graphs: e.g. databases of facts about the target domain.
In Snorkel DryBell, the goal is to train a machine learning model (C), for example to do content or event classification over web data. Rather than hand-labeling training data to do this, in Snorkel DryBell users write labeling functions that express various organizational knowledge resources (A), which are then automatically reweighted and combined (B).

We used these organizational knowledge resources to write labeling functions in a MapReduce template-based pipeline. Each labeling function takes in a data point and either abstains, or outputs a label. The result is a large set of programmatically-generated training labels. However, many of these labels were very noisy (e.g. from the heuristics), conflicted with each other, or were far too coarse-grained (e.g. the topic models) for our task, leading to the next stage of Snorkel DryBell, aimed at automatically cleaning and integrating the labels into a final training set.

Modeling the Accuracies to Combine & Repurpose Existing Sources
To handle these noisy labels, the next stage of Snorkel DryBell combines the outputs from the labeling functions into a single, confidence-weighted training label for each data point. The challenging technical aspect is that this must be done without any ground-truth labels. We use a generative modeling technique that learns the accuracy of each labeling function using only unlabeled data. This technique learns by observing the matrix of agreements and disagreements between the labeling functions’ outputs, taking into account known (or statistically estimated) correlation structures between them. In Snorkel DryBell, we also implement a new faster, sampling-free version of this modeling approach, implemented in TensorFlow, in order to handle web-scale data.

By combining and modeling the output of the labeling functions using this procedure in Snorkel DryBell, we were able to generate high-quality training labels. In fact, on the two applications where hand-labeled training data was available for comparison, we achieved the same predictive accuracy training a model with Snorkel DryBell’s labels as we did when training that same model with 12,000 and 80,000 hand-labeled training data points.

Transferring Non-Servable Knowledge to Servable Models
In many settings, there is also an important distinction between servable features—which can be used in production—and non-servable features, that are too slow or expensive to be used in production. These non-servable features may have very rich signal, but a general question is how to use them to train or otherwise help servable models that can be deployed in production?

In many settings, users write labeling functions that leverage organizational knowledge resources that are not servable in production (a)—e.g. aggregate statistics, internal models, or knowledge graphs that are too slow or expensive to use in production—in order to train models that are only defined over production-servable features (b), e.g. cheap, real-time web signals.

In Snorkel DryBell, we found that users could write the labeling functions—i.e. express their organizational knowledge—over one feature set that was not servable, and then use the resulting training labels output by Snorkel DryBell to train a model defined over a different, servable feature set. This cross-feature transfer boosted our performance by an average 52% on the benchmark datasets we created. More broadly, it represents a simple but powerful way to use resources that are too slow (e.g. expensive models or aggregate statistics), private (e.g. entity or knowledge graphs), or otherwise unsuitable for deployment, to train servable models over cheap, real-time features. This approach can be viewed as a new type of transfer learning, where instead of transferring a model between different datasets, we’re transferring domain knowledge between different feature sets- an approach which has potential use cases not just in industry, but in medical settings and beyond.

Next Steps
Moving forward, we’re excited to see what other types of organizational knowledge can be used as weak supervision, and how the approach used by Snorkel DryBell can enable new modes of information reuse and sharing across organizations. For more details, check out our paper, and for further technical details, blog posts, and tutorials, check out the open-source Snorkel implementation at snorkel.stanford.edu.

Acknowledgments
This research was done in collaboration between Google, Stanford, and Brown. We would like to thank all the people who were involved, including Stephen Bach (Brown), Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Souvik Sen, Braden Hancock (Stanford), Houman Alborzi, Rahul Kuchhal, Christopher Ré (Stanford), Rob Malkin.

An All-Neural On-Device Speech Recognizer

In 2012, speech recognition research showed significant accuracy improvements with deep learning, leading to early adoption in products such as Google’s Voice Search. It was the beginning of a revolution in the field: each year, new architectures were developed that further increased quality, from deep neural networks (DNNs) to recurrent neural networks (RNNs), long short-term memory networks (LSTMs), convolutional networks (CNNs), and more. During this time, latency remained a prime focus — an automated assistant feels a lot more helpful when it responds quickly to requests.

Today, we’re happy to announce the rollout of an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard. In our recent paper, “Streaming End-to-End Speech Recognition for Mobile Devices“, we present a model trained using RNN transducer (RNN-T) technology that is compact enough to reside on a phone. This means no more network latency or spottiness — the new recognizer is always available, even when you are offline. The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you’d expect from a keyboard dictation system.

This video compares the production, server-side speech recognizer (left panel) to the new on-device recognizer (right panel) when recognizing the same spoken sentence. Video credit: Akshay Kannan and Elnaz Sarbar

A Bit of History
Traditionally, speech recognition systems consisted of several components – an acoustic model that maps segments of audio (typically 10 millisecond frames) to phonemes, a pronunciation model that connects phonemes together to form words, and a language model that expresses the likelihood of given phrases. In early systems, these components remained independently-optimized.

Around 2014, researchers began to focus on training a single neural network to directly map an input audio waveform to an output sentence. This sequence-to-sequence approach to learning a model by generating a sequence of words or graphemes given a sequence of audio features led to the development of “attention-based” and “listen-attend-spell” models. While these models showed great promise in terms of accuracy, they typically work by reviewing the entire input sequence, and do not allow streaming outputs as the input comes in, a necessary feature for real-time voice transcription.

Meanwhile, an independent technique called connectionist temporal classification (CTC) had helped halve the latency of the production recognizer at that time. This proved to be an important step in creating the RNN-T architecture adopted in this latest release, which can be seen as a generalization of CTC.

Recurrent Neural Network Transducers
RNN-Ts are a form of sequence-to-sequence models that do not employ attention mechanisms. Unlike most sequence-to-sequence models, which typically need to process the entire input sequence (the waveform in our case) to produce an output (the sentence), the RNN-T continuously processes input samples and streams output symbols, a property that is welcome for speech dictation. In our implementation, the output symbols are the characters of the alphabet. The RNN-T recognizer outputs characters one-by-one, as you speak, with white spaces where appropriate. It does this with a feedback loop that feeds symbols predicted by the model back into it to predict the next symbols, as described in the figure below.

Representation of an RNN-T, with the input audio samples, x, and the predicted symbols y. The predicted symbols (outputs of the Softmax layer) are fed back into the model through the Prediction network, as yu-1, ensuring that the predictions are conditioned both on the audio samples so far and on past outputs. The Prediction and Encoder Networks are LSTM RNNs, the Joint model is a feedforward network (paper). The Prediction Network comprises 2 layers of 2048 units, with a 640-dimensional projection layer. The Encoder Network comprises 8 such layers. Image credit: Chris Thornton

Training such models efficiently was already difficult, but with our development of a new training technique that further reduced the word error rate by 5%, it became even more computationally intensive. To deal with this, we developed a parallel implementation so the RNN-T loss function could run efficiently in large batches on Google’s high-performance Cloud TPU v2 hardware. This yielded an approximate 3x speedup in training.

Offline Recognition
In a traditional speech recognition engine, the acoustic, pronunciation, and language models we described above are “composed” together into a large search graph whose edges are labeled with the speech units and their probabilities. When a speech waveform is presented to the recognizer, a “decoder” searches this graph for the path of highest likelihood, given the input signal, and reads out the word sequence that path takes. Typically, the decoder assumes a Finite State Transducer (FST) representation of the underlying models. Yet, despite sophisticated decoding techniques, the search graph remains quite large, almost 2GB for our production models. Since this is not something that could be hosted easily on a mobile phone, this method requires online connectivity to work properly.

To improve the usefulness of speech recognition, we sought to avoid the latency and inherent unreliability of communication networks by hosting the new models directly on device. As such, our end-to-end approach does not need a search over a large decoder graph. Instead, decoding consists of a beam search through a single neural network. The RNN-T we trained offers the same accuracy as the traditional server-based models but is only 450MB, essentially making a smarter use of parameters and packing information more densely. However, even on today’s smartphones, 450MB is a lot, and propagating signals through such a large network can be slow.

We further reduced the model size by using the parameter quantization and hybrid kernel techniques we developed in 2016 and made publicly available through the model optimization toolkit in the TensorFlow Lite library. Model quantization delivered a 4x compression with respect to the trained floating point models and a 4x speedup at run-time, enabling our RNN-T to run faster than real time speech on a single core. After compression, the final model is 80MB.

Our new all-neural, on-device Gboard speech recognizer is initially being launched to all Pixel phones in American English only. Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application.

Acknowledgements:
Raziel Alvarez, Michiel Bacchiani, Tom Bagby, Françoise Beaufays, Deepti Bhatia, Shuo-yiin Chang, Zhifeng Chen, Chung-Chen Chiu, Yanzhang He, Alex Gruenstein, Anjuli Kannan, Bo Li, Wei Li, Qiao Liang, Ian McGraw, Patrick Nguyen, Ruoming Pang, Rohit Prabhavalkar, Golan Pundak, Kanishka Rao, David Rybach, Tara Sainath, Haşim Sak, June Yuan Shangguan, Matt Shannon, Mohammadinamul Sheik, Khe Chai Sim, Gabor Simko, Trevor Strohman, Mirkó Visontai, Ron Weiss, Yonghui Wu, Ding Zhao, Dan Zivkovic, and Yu Zhang.