Author: torontoai

[D] What’s the deal with NeurIPS 2019 reserved tickets?

I am an astrophysics student in Toronto doing my PhD project on deep learning applied to astrophysical data. My supervisor advised me to apply for NeurIPS this year to chat with and learn from people in different fields. I know NeurIPS changed the ticket system to a lottery, but I still have difficulty figuring out the whole process.

So I joined the lottery and, over the past weekend, received an email with a link to get a “reserved ticket”. I don’t know what that means. Do I actually have the ticket, so that I just have to complete the process and pay?

Moreover, I am especially interested in the workshop “Machine Learning and the Physical Sciences”. But it does not seem like I can submit my work/poster to them, or even apply to the workshop. Can I only apply to workshops after I get my ticket, or have I already missed the deadline? Thanks

submitted by /u/henrysky

Michigan Startup Gets Customer Traction for Conversational AI Research

In the wake of Amazon’s boffo Alexa voice debut, a University of Michigan team published pioneering research on building conversational AI, attracting a wave of customer interest.

Jason Mars, a professor advising them, suggested they form a startup. And with that, Ann Arbor, Michigan-based conversational AI startup Clinc was born.

Clinc’s conversational AI platform enables customers to build voice applications — like in-car voice features, fast-food restaurant order services or personal banking assistants.   

“What really tipped the scales to start something even bigger was that the industry was reaching out saying we want to commercialize it,” said Johann Hauswald, chief product officer and co-founder at Clinc, who recounted the company’s start five years ago.

Fast forward to today, and that’s turned into a big opportunity: Clinc has attracted a flood of customers and revenue.

The startup’s financial customers include Barclays, US Bank, S&P Global, and Turkey’s Ishbank, which taps Clinc to offer a personal finance assistant, dubbed Maxi, to 6 million users.

Large financial institutions are well aware of Clinc, which has been “dominating the space,” said Mars, Clinc’s CEO, speaking on stage last year at TechCrunch Disrupt.

The company’s roster of customers doesn’t stop at finance. Clinc’s AI platform, built to handle voice assistants for companies ranging from startups at any stage to the Fortune 500, can provide services for call centers, drive-thru restaurants, in-car systems, gaming and healthcare applications.

Breakthrough performance from NVIDIA’s AI platform has helped enable Clinc to push the boundaries on conversational AI to “deliver revolutionary services,” according to Mars.

Conversational AI Boom

To be sure, Clinc’s application-focused research stands out. It’s a mix of academic AI and how-to information for solving specific industry problems, which has attracted interest from some of the customers it’s landed to date.

Clinc raised a $52 million Series B round of funding earlier this year to help scale up to meet its customer demand.

Research firm Gartner forecasts that 15 percent of all customer service interactions will be handled by AI in 2021, a 400 percent jump from 2017.

Clinc: Talking Model Research

Academic discoveries are common launch pads for startups. But Clinc’s team at the University of Michigan went further: it built working models, provided the details companies need to develop their own voice models, and spelled out the data center requirements for delivering the compute resources.

Clinc offers research, outlined in a published paper, on its Sirius voice personal assistant and an in-car assistant that it worked on with Ford aimed at applications for automakers.

Today it offers conversational AI in 80 languages and has production deployments on three continents.

Hardware to the Core

The Clinc team several years ago ran a cost-benefit analysis, finding that NVIDIA GPUs were the right choice for accelerated computing in the data center.

“GPUs were a big story in our research lab at the university,” said Hauswald.

Oftentimes, complex applications require a multitude of complicated algorithms and optimizations to achieve the best possible performance, and that combination is also the most compute-intensive, he said.

“We want to be able to train our models in a way that doesn’t take days to train or then our customers are unable to iterate on the quality of them,” said Hauswald.

The post Michigan Startup Gets Customer Traction for Conversational AI Research appeared first on The Official NVIDIA Blog.

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Google’s mission is not just to organize the world’s information but to make it universally accessible, which means ensuring that our products work in as many of the world’s languages as possible. When it comes to understanding human speech, which is a core capability of the Google Assistant, extending to more languages poses a challenge: high-quality automatic speech recognition (ASR) systems require large amounts of audio and text data — even more so as data-hungry neural models continue to revolutionize the field. Yet many languages have little data available.

We wondered how we could keep the quality of speech recognition high for speakers of data-scarce languages. A key insight from the research community was that much of the “knowledge” a neural network learns from audio data of a data-rich language is re-usable by data-scarce languages; we don’t need to learn everything from scratch. This led us to study multilingual speech recognition, in which a single model learns to transcribe multiple languages.

In “Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model”, published at Interspeech 2019, we present an end-to-end (E2E) system trained as a single model, which allows for real-time multilingual speech recognition. Using nine Indian languages, we demonstrated a dramatic improvement in the ASR quality on several data-scarce languages, while still improving performance for the data-rich languages.

India: A Land of Languages
For this study, we focused on India, an inherently multilingual society where there are more than thirty languages with at least a million native speakers. Many of these languages overlap in acoustic and lexical content due to the geographic proximity of the native speakers and shared cultural history. Additionally, many Indians are bilingual or trilingual, making the use of multiple languages within a conversation a common phenomenon, and a natural case for training a single multilingual model. In this work, we combined nine primary Indian languages, namely Hindi, Marathi, Urdu, Bengali, Tamil, Telugu, Kannada, Malayalam and Gujarati.

A Low-latency All-neural Multilingual Model
Traditional ASR systems contain separate components for acoustic, pronunciation, and language models. While there have been attempts to make some or all of the traditional ASR components multilingual [1,2,3,4], this approach can be complex and difficult to scale. E2E ASR models combine all three components into a single neural network and promise scalability and ease of parameter sharing. Recent works have extended E2E models to be multilingual [1,2], but they did not address the need for real-time speech recognition, a key requirement for applications such as the Assistant, Voice Search and GBoard dictation. For this, we turned to recent research at Google that used a Recurrent Neural Network Transducer (RNN-T) model to achieve streaming E2E ASR. The RNN-T system outputs words one character at a time, just as if someone were typing in real time; however, it was not multilingual. We built upon this architecture to develop a low-latency model for multilingual speech recognition.

[Left] A traditional monolingual speech recognizer comprising Acoustic, Pronunciation and Language Models for each language. [Middle] A traditional multilingual speech recognizer where the Acoustic and Pronunciation models are multilingual, while the Language model is language-specific. [Right] An E2E multilingual speech recognizer where the Acoustic, Pronunciation and Language Models are combined into a single multilingual model.

Large-Scale Data Challenges
Using large-scale, real-world data for training a multilingual model is complicated by data imbalance. Given the steep skew in the distribution of speakers across the languages and in speech product maturity, it is not surprising to have varying amounts of transcribed data available per language. As a result, a multilingual model can tend to be more influenced by languages that are over-represented in the training set. This bias is more prominent in an E2E model, which, unlike a traditional ASR system, does not have access to additional in-language text data and learns lexical characteristics of the languages solely from the audio training data.

Histogram of training data for the nine languages showing the steep skew in the data available.

We addressed this issue with a few architectural modifications. First, we provided an extra language identifier input, which is an external signal derived from the language locale of the training data; i.e. the language preference set in an individual’s phone. This signal is combined with the audio input as a one-hot feature vector. We hypothesize that the model is able to use the language vector not only to disambiguate the language but also to learn separate features for separate languages, as needed, which helped with data imbalance.
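As a rough illustration of the language identifier input (not the code used in the paper), the sketch below appends a one-hot language vector to every frame of a toy feature sequence. The language codes, the frame dimensions and the choice of plain concatenation are assumptions made for this example.

```python
# Hypothetical short codes for the nine languages named in the post.
LANGUAGES = ["hi", "mr", "ur", "bn", "ta", "te", "kn", "ml", "gu"]


def one_hot(language, languages=LANGUAGES):
    """Return a one-hot list with a 1.0 at the position of `language`."""
    if language not in languages:
        raise ValueError(f"unknown language code: {language}")
    return [1.0 if code == language else 0.0 for code in languages]


def add_language_id(frames, language):
    """Append the same one-hot language vector to every acoustic frame.

    Concatenation is an assumption of this sketch; the post only says the
    signal is combined with the audio input.
    """
    lang_vec = one_hot(language)
    return [list(frame) + lang_vec for frame in frames]


# Two toy 3-dimensional "acoustic" frames tagged as Tamil.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
tagged = add_language_id(frames, "ta")
print(len(tagged[0]))  # prints 12: 3 audio dims + 9 language dims
```

Because the same vector is attached to every frame, the network sees the language signal at each time step, which fits a streaming model that never sees the whole utterance at once.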

Building on the idea of language-specific representations within the global model, we further augmented the network architecture by allocating extra parameters per language in the form of residual adapter modules. Adapters helped fine-tune a global model on each language while maintaining parameter efficiency of a single global model, and in turn, improved performance.
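To make the adapter idea concrete, here is a deliberately simplified sketch of a per-language residual adapter: a small bottleneck projection whose output is added back to the original activation. The dimensions, the zero-initialized up-projection and the absence of any normalization are assumptions of this example, and the exact module in the paper may differ.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def make_adapter(dim, bottleneck, rng):
    """Create one adapter: random down-projection, zero up-projection.

    Starting the up-projection at zero (an assumption of this sketch) makes
    the adapter an identity function until it has been trained.
    """
    down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
    up = [[0.0] * bottleneck for _ in range(dim)]
    return {"down": down, "up": up}


def apply_adapter(activation, adapter):
    """Residual adapter: activation + up(relu(down(activation)))."""
    hidden = relu(matvec(adapter["down"], activation))
    update = matvec(adapter["up"], hidden)
    return [a + u for a, u in zip(activation, update)]


rng = random.Random(0)
# One small adapter per language; only the adapter of the utterance's
# language is applied, and the rest of the model stays shared.
adapters = {lang: make_adapter(dim=4, bottleneck=2, rng=rng) for lang in ("ta", "kn")}

activation = [0.5, -1.0, 0.25, 2.0]
output = apply_adapter(activation, adapters["ta"])
print(output == activation)  # prints True: an untrained adapter changes nothing
```

Each adapter here adds only 2 × 4 × 2 = 16 weights per language, which is the sense in which adapters keep the parameter efficiency of a single global model.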

[Left] Multilingual RNN-T architecture with a language identifier. [Middle] Residual adapters inside the encoder. For a Tamil utterance, only the Tamil adapters are applied to each activation. [Right] Architecture details of the Residual Adapter modules. For more details please see our paper.

Putting all of these elements together, our multilingual model outperforms all the single-language recognizers, with especially large improvements in data-scarce languages like Kannada and Urdu. Moreover, since it is a streaming E2E model, it simplifies training and serving, and is also usable in low-latency applications like the Assistant. Building on this result, we hope to continue our research on multilingual ASRs for other language groups, to better assist our growing body of diverse users.

Acknowledgements
We would like to thank the following for their contribution to this research: Tara N. Sainath, Eugene Weinstein, Bo Li, Shubham Toshniwal, Ron Weiss, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee, Meysam Bastani, Mikaela Grace, Pedro Moreno, Yanzhang (Ryan) He, Khe Chai Sim.

[D] How do you run your CPU-intensive ML tasks?

Hello

To be frank, I need a piece of advice. I intend to run a long task (preferably within a Jupyter Notebook) bound mainly by CPU. More specifically, I want to run TPOT on a fairly large visual dataset. Now, as I do not physically own any machine suited for this kind of endeavor, I resort to the cloud. And, as I have some experience with AWS, I instinctively opted for AWS SageMaker to host a JupyterLab instance. One pain point most of you have already experienced is that you need to keep the JupyterLab (or Notebook) window open at all times with a working connection in order not to lose the cell outputs. Keeping all that in mind, I left my computer open while I was gone for a couple of hours, only to come back and realize my AWS auth token had expired automatically and I was logged out of their platform. This also invalidated the connection with the SageMaker notebook instance and made me conscious of all that hosting money that went down the drain.

Anyway, how do you guys run your ML tasks? Especially those that run in Jupyter Notebooks and are CPU-intensive?

Side note: did anyone experiment with an EC2 instance hosting their Jupyter instance & not relying on AWS’ authentication?

submitted by /u/loopiezlol

[P] Experiments with Making Convincing AI-Generated Fake News w/ the new CTRL model

There has been pretty much zero talk about, or experimentation with, the new CTRL model, so I played around with it myself and found that it could generate fake news convincingly! Certainly much better than GPT-2 in that area.

https://minimaxir.com/2019/09/ctrl-fake-news/

ICYMI, there’s now also a Colaboratory Notebook for CTRL, although your mileage may vary.

submitted by /u/minimaxir

[P] Comparing 7 Deep Dependency parsing models using TensorFlow

Trained on CONLL English Dependency data, https://github.com/UniversalDependencies/UD_English-EWT. The train set is used for training; the dev and test sets are used for testing.

Stackpointer and Biaffine-attention originally from https://github.com/XuezheMax/NeuroNLP2 written in PyTorch.

Accuracy is reported as arc, types and root accuracy after only 15 epochs.

  1. Bidirectional RNN + CRF + Biaffine, arc accuracy 70.48%, types accuracy 65.18%, root accuracy 66.4%
  2. Bidirectional RNN + Bahdanau + CRF + Biaffine, arc accuracy 70.82%, types accuracy 65.33%, root accuracy 66.77%
  3. Bidirectional RNN + Luong + CRF + Biaffine, arc accuracy 71.22%, types accuracy 65.73%, root accuracy 67.23%
  4. BERT Base + CRF + Biaffine, arc accuracy 64.30%, types accuracy 62.89%, root accuracy 74.19%
  5. Bidirectional RNN + Biaffine Attention + Cross Entropy, arc accuracy 72.42%, types accuracy 63.53%, root accuracy 68.51%
  6. BERT Base + Biaffine Attention + Cross Entropy, arc accuracy 72.85%, types accuracy 67.11%, root accuracy 73.93%
  7. Bidirectional RNN + Stackpointer, arc accuracy 61.88%, types accuracy 48.20%, root accuracy 89.39%
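For readers unfamiliar with the three metrics, here is a minimal sketch of how they could be computed. The definitions are assumptions for illustration (arc = correct head per token, types = correct label per token, root = correct root position per sentence) and may not match the repository's evaluation code exactly; `dependency_accuracies` is a hypothetical helper, not a function from the repo.

```python
def dependency_accuracies(gold, predicted):
    """Return (arc, types, root) accuracy over a list of sentences.

    Each sentence is a list of (head, label) pairs, one per token, where
    head is the 1-based index of the parent token and 0 marks the root.
    """
    tokens = arcs = types = sentences = roots = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        sentences += 1
        # Assumed definition: the root is right when the same token
        # positions are attached to head 0 in both analyses.
        gold_root = [i for i, (head, _) in enumerate(gold_sent) if head == 0]
        pred_root = [i for i, (head, _) in enumerate(pred_sent) if head == 0]
        roots += int(gold_root == pred_root)
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            tokens += 1
            arcs += int(g_head == p_head)      # head predicted correctly
            types += int(g_label == p_label)   # label predicted correctly
    return arcs / tokens, types / tokens, roots / sentences


# Toy three-token sentence: the last token gets the wrong head.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (1, "obj")]]
arc_acc, type_acc, root_acc = dependency_accuracies(gold, pred)
print(f"arc {arc_acc:.2%}, types {type_acc:.2%}, root {root_acc:.2%}")
```

Under these definitions, root accuracy is counted per sentence while the other two are counted per token, which is one way a model can score high on root accuracy and low on arc accuracy at the same time.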

Link to repository, https://github.com/huseinzol05/NLP-Models-Tensorflow#dependency-parser

Discussion

  1. Results are based on 15 epochs only.
  2. No dropout is used here; feel free to add it.
  3. BERT cannot be implemented in the Stackpointer model; the stack pointer model requires each decoder step.

submitted by /u/huseinzol05

[D] The problem with anthropomorphizing AI

When you start to humanize (current) AI technologies and describe them in ways you would talk about persons, you can draw all the wrong conclusions. This happens often when we see AI algorithms perform tasks that were previously thought to be off-limits for computers, such as playing Go or detecting cancer or converting text to speech.

Without a reality check on the capabilities and limits of current AI, we tend to have trumped-up expectations of what AI can do for us, and become disenchanted when those expectations aren’t met.

https://bdtechtalks.com/2019/01/02/humanizing-ai-deep-learning-alphazero/

submitted by /u/bendee983

Deep Dynamics Models for Dexterous Manipulation

Figure 1: Our approach (PDDM) can efficiently and effectively learn complex
dexterous manipulation skills in both simulation and the real world. Here, the
learned model is able to control the 24-DoF Shadow Hand to rotate two
free-floating Baoding balls in the palm, using just 4 hours of real-world data
with no prior knowledge/assumptions of system or environment dynamics.

Dexterous manipulation with multi-fingered hands is a grand challenge in
robotics: the versatility of the human hand is as yet unrivaled by the
capabilities of robotic systems, and bridging this gap will enable more general
and capable robots. Although some real-world tasks (like picking up a
television remote or a screwdriver) can be accomplished with simple parallel
jaw grippers, there are countless tasks (like functionally using the remote to
change the channel or using the screwdriver to screw in a nail) in which
dexterity enabled by redundant degrees of freedom is critical. In fact,
dexterous manipulation is defined as being object-centric, with the goal
of controlling object movement through precise control of forces and motions
— something that is not possible without the ability to simultaneously impact
the object from multiple directions. For example, using only two fingers to
attempt common tasks such as opening the lid of a jar or hitting a nail with a
hammer would quickly encounter the challenges of slippage, complex contact
forces, and underactuation. Although dexterous multi-fingered hands can indeed
enable flexibility and success of a wide range of manipulation skills, many of
these more complex behaviors are also notoriously difficult to control: They
require finely balancing contact forces, breaking and reestablishing contacts
repeatedly, and maintaining control of unactuated objects. Success in such
settings requires a sufficiently dexterous hand, as well as an intelligent
policy that can endow such a hand with the appropriate control strategy. We
study precisely this in our work on Deep Dynamics Models for Learning Dexterous
Manipulation.
