Author: torontoai

[P] A Visual Guide to Using BERT for the First Time

Hi r/MachineLearning,

I wrote a blog post that I hope can be the gentlest way for you to start playing with BERT for the first time:

https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/

It uses a lighter version of BERT (the distilled version from HuggingFace, distilBERT) to do sentence embedding, then uses scikit-learn to train a logistic regression classifier on those embeddings. As a first exposure to BERT, I’m having people use the general pre-trained model and not worry about fine-tuning for now. After getting people through this initial hump, I’m hoping readers will get more comfortable exploring and poking around with the model and its use cases.
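The pipeline described above might be sketched as follows. This is a minimal sketch and not the blog post's own code: it assumes the `transformers`, `torch`, and `scikit-learn` packages are installed, it assumes the `distilbert-base-uncased` checkpoint, and the helper names are illustrative.

```python
def first_token_vectors(hidden_states):
    """Keep the vector at position 0 (the [CLS] token) of every sentence.

    hidden_states is indexed as [sentence][token][feature].
    """
    return [sentence[0] for sentence in hidden_states]


def embed_sentences(sentences, model_name="distilbert-base-uncased"):
    """Return one embedding per sentence from a pre-trained DistilBERT."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no fine-tuning
        output = model(**batch)
    return first_token_vectors(output.last_hidden_state.tolist())


def train_classifier(embeddings, labels):
    """Fit a scikit-learn logistic regression on the sentence embeddings."""
    from sklearn.linear_model import LogisticRegression

    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(embeddings, labels)
    return classifier
```

A typical use would be `train_classifier(embed_sentences(texts), labels)`, where `texts` is a list of strings and `labels` a list of class ids; the model itself is never fine-tuned.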

I hope you enjoy it. All feedback/corrections are appreciated.

submitted by /u/nortab

[P] I re-implemented Hyperband, check it out!

Hyperband is a state-of-the-art algorithm for hyperparameter tuning that focuses on resource efficiency. It does so by incorporating early stopping into its strategy. Here are some of the results:

For more, go here: http://www.argmin.net/2016/06/23/hyperband/

I was unable to find any great implementations of hyperband, so I implemented it! Here it is: https://gist.github.com/PetrochukM/2c5fae9daf0529ed589018c6353c9f7b

The implementation is commented and documented to help ensure correctness and improve code readability.

I believe I improved Hyperband by adding support for model checkpoints. The original Hyperband assumed that each model was trained from scratch instead of resuming from a checkpoint. We don’t need to train the same model with the same hyperparameters over and over again!

Finally, I also explored other improvements to Hyperband, like splitting based on the largest performance gap instead of splitting the search space in half every time.
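To make the resource argument concrete, here is a minimal sketch of the bracket schedule from the Hyperband paper, plus a rough cost comparison with and without checkpoints. It is not taken from the linked gist: the function names are illustrative, and the integer division assumes the maximum resource is a power of `eta`.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def hyperband_schedule(max_resource, eta=3):
    """List the (num_configs, resource_per_config) rounds of each bracket."""
    # Largest s with eta**s <= max_resource, using integers to avoid log rounding.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Bracket s starts with n configurations on a small budget each.
        n = ceil_div((s_max + 1) * eta ** s, s + 1)
        rounds = []
        for i in range(s + 1):
            num_configs = n // eta ** i  # configurations entering round i
            resource = max_resource // eta ** (s - i)  # budget per configuration
            rounds.append((num_configs, resource))
        brackets.append(rounds)
    return brackets


def bracket_cost(rounds, resume_from_checkpoint):
    """Total resource units spent in one bracket."""
    total, previous = 0, 0
    for num_configs, resource in rounds:
        # With checkpoints, survivors only train for the extra budget.
        spent = resource - previous if resume_from_checkpoint else resource
        total += num_configs * spent
        previous = resource
    return total


if __name__ == "__main__":
    first = hyperband_schedule(81, eta=3)[0]
    print(first)
    print(bracket_cost(first, False), bracket_cost(first, True))
```

With a maximum resource of 81 and `eta=3`, the most aggressive bracket costs 405 resource units when every round retrains from scratch and 297 when survivors resume from their checkpoints.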

submitted by /u/Deepblue129

[D] Where are the good machine learning books for practitioners?

For beginners there’s PRML by Bishop and maybe Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David, but for advanced readers or those interested in the deep learning and GAN research landscape (and how to apply it), there really isn’t anything good out there.

I personally don’t like Goodfellow’s Deep Learning book. I wish there were a good deep dive out there, but nothing I’ve found covers what I need.

I think Andrej Karpathy is a good writer, kind of wish he could throw something together!

submitted by /u/Ctown_struggles00

[D] Chinese government uses machine learning not only for surveillance, but also for predictive policing and for deciding who to arrest in Xinjiang

Link to story

This post is not an ML research related post. I am posting this because I think it is important for the community to see how research is applied by authoritarian governments to achieve their goals. It is related to a few previous popular posts on this thread with high upvotes, which prompted me to post this story.

Previous related stories:

The story reports the details of a new leak of highly classified Chinese government documents, which reveals the operations manual for running the mass detention camps in Xinjiang and exposes the mechanics of the region’s system of mass surveillance.

The lead journalist‘s summary of findings

The China Cables represent the first leak of a classified Chinese government document revealing the inner workings of the detention camps, as well as the first leak of classified government documents unveiling the predictive policing system in Xinjiang.

The leak features classified intelligence briefings that reveal, in the government’s own words, how Xinjiang police essentially take orders from a massive “cybernetic brain” known as IJOP, which flags entire categories of people for investigation & detention.

These secret intelligence briefings reveal the scope and ambition of the government’s AI-powered policing platform, which purports to predict crimes based on computer-generated findings alone. The result? Arrest by algorithm.

The article describes the methods used for algorithmic policing

The classified intelligence briefings reveal the scope and ambition of the government’s artificial-intelligence-powered policing platform, which purports to predict crimes based on these computer-generated findings alone. Experts say the platform, which is used in both policing and military contexts, demonstrates the power of technology to help drive industrial-scale human rights abuses.

“The Chinese have bought into a model of policing where they believe that through the collection of large-scale data run through artificial intelligence and machine learning that they can, in fact, predict ahead of time where possible incidents might take place, as well as identify possible populations that have the propensity to engage in anti-state anti-regime action,” said Mulvenon, the SOS International document expert and director of intelligence integration. “And then they are preemptively going after those people using that data.”

In addition to the predictive policing aspect of the article, there are side articles about the entire ML stack, including how mobile apps are used to target Uighurs, and also how the inmates are re-educated once inside the concentration camps. The documents reveal how every aspect of a detainee’s life is monitored and controlled.

Note: My motivation for posting this story is to raise ethical concerns and awareness in the research community. I do not want to heighten levels of racism towards the Chinese research community (not that it may matter, but I am Chinese). See this thread for some context about what I don’t want these discussions to become.

I am aware of the fact that the Chinese government’s policy is to integrate the state and the people as one, so accusing the party is perceived domestically as insulting the Chinese people, but I also believe that we as a research community are intelligent enough to be able to separate government, and those in power, from individual researchers. We as a community should keep in mind that there are many Chinese researchers (in mainland and abroad) who are not supportive of the actions of the CCP, but they may not be able to voice their concerns due to personal risk.

submitted by /u/sensetime

[P] Machine Learning Systems Design (open source book by @chipro)

An open source book compiled by Chip Huyen. Feel free to contribute:

This booklet covers four main steps of designing a machine learning system:

  1. Project setup

  2. Data pipeline

  3. Modeling: selecting, training, and debugging

  4. Serving: testing, deploying, and maintaining

It comes with links to practical resources that explain each aspect in more detail. It also suggests case studies written by machine learning engineers at major tech companies who have deployed machine learning systems to solve real-world problems.

At the end, the booklet contains 27 open-ended machine learning systems design questions that might come up in machine learning interviews. The answers for these questions will be published in the book Machine Learning Interviews.

project: https://github.com/chiphuyen/machine-learning-systems-design

PDF: https://github.com/chiphuyen/machine-learning-systems-design/blob/master/build/build1/consolidated.pdf

submitted by /u/hardmaru

[R] Understanding the generalization of “lottery tickets” in neural networks

Sharing our recent blog post summarizing some of our recent work understanding the boundaries of the lottery ticket hypothesis. In particular, we make some progress towards the following questions:

  • Do winning ticket initializations contain generic inductive biases or are they overfit to the particular dataset and optimizer used to generate them?
  • Is the lottery ticket phenomenon limited to supervised image classification, or is it also present in other domains like RL and NLP?
  • Can we begin to explain lottery tickets theoretically?

The blog post is below:

Understanding the generalization of “lottery tickets” in neural networks

And the papers covered can be found here:

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

Luck Matters: Understanding Training Dynamics of Deep ReLU Networks

Student Specialization in Deep ReLU Networks With Finite Width and Input Dimension

submitted by /u/arimorcos

Amazon Transcribe now supports speech-to-text in 31 languages

We recently announced that Amazon Transcribe now supports audio and video transcription in 7 additional languages: Gulf Arabic, Swiss German, Hebrew, Japanese, Malay, Telugu, and Turkish. Using Amazon Transcribe, customers can now take advantage of 31 supported languages for transcription use cases such as improving customer service, captioning and subtitling, meeting accessibility requirements, and cataloging audio archives.

Using Amazon Transcribe

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy to analyze audio files and convert them into text, with enrichment such as speaker identification, timestamp generation, punctuation, and formatting. With the recent announcement, customers can now transcribe audio from even more languages.

Using the AWS Management Console, let’s check out one of the latest languages in action. Amazon Transcribe allows users to transcribe streaming audio or perform asynchronous transcription.  For this, we will create a job for asynchronous transcription using an audio file stored in Amazon S3 as input.

Upon completion of the job, the audio-to-text transcription is provided in the response. If you don’t know Turkish, you can then run the transcribed text through the Amazon Translate service to translate it into your preferred language.

In the example above, we are using the console to create the transcription job; however, customers can also programmatically submit transcription jobs using the Amazon Transcribe APIs.   The APIs are available in the AWS SDKs.  The example below demonstrates invoking the Amazon Transcribe APIs through the AWS CLI:

Start Transcription Job

$ aws transcribe start-transcription-job --transcription-job-name Transcribe-Turkish-Audio-CLI --language-code tr-TR --media MediaFileUri=s3://whats-new-transcribe/Turkish-Audio.mp3 --media-format mp3 --region us-east-1
{
    "TranscriptionJob": {
        "TranscriptionJobName": "Transcribe-Turkish-Audio-CLI", 
        "LanguageCode": "tr-TR", 
        "TranscriptionJobStatus": "IN_PROGRESS", 
        "Media": {
            "MediaFileUri": "s3://whats-new-transcribe/Turkish-Audio.mp3"
        }, 
        "CreationTime": 1574392674.948, 
        "MediaFormat": "mp3"
    }
}
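The same request can be sent through an AWS SDK. The following is a minimal sketch using the AWS SDK for Python, assuming `boto3` is installed and AWS credentials are configured; the helper names are illustrative, and only the request builder runs without the SDK.

```python
def build_transcription_request(job_name, language_code, media_uri, media_format):
    """Assemble keyword arguments for Transcribe's StartTranscriptionJob."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }


def start_job(request, region_name="us-east-1"):
    """Submit the job; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("transcribe", region_name=region_name)
    response = client.start_transcription_job(**request)
    return response["TranscriptionJob"]["TranscriptionJobStatus"]


# Same values as the CLI call above.
request = build_transcription_request(
    job_name="Transcribe-Turkish-Audio-CLI",
    language_code="tr-TR",
    media_uri="s3://whats-new-transcribe/Turkish-Audio.mp3",
    media_format="mp3",
)
```

Calling `start_job(request)` would submit the job and return its initial status string.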

Get Information about Transcription Job

$ aws transcribe get-transcription-job --transcription-job-name Transcribe-Turkish-Audio-CLI --region us-east-1
{
    "TranscriptionJob": {
        "TranscriptionJobName": "Transcribe-Turkish-Audio-CLI", 
        "LanguageCode": "tr-TR", 
        "MediaSampleRateHertz": 22050, 
        "TranscriptionJobStatus": "COMPLETED", 
        "Settings": {
            "ChannelIdentification": false
        }, 
        "Media": {
            "MediaFileUri": "s3://whats-new-transcribe/Turkish-Audio.mp3"
        }, 
        "CreationTime": 1574392674.948, 
        "CompletionTime": 1574392772.813, 
        "MediaFormat": "mp3", 
        "Transcript": {
            "TranscriptFileUri": "https://s3.amazonaws.com/aws-transcribe-us-east-1-prod/"
        }
    }
}
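When scripting around the CLI, the JSON printed by get-transcription-job can be reduced to the few fields that matter. The helper below is an illustrative sketch, not part of any AWS tooling; it assumes the epoch-seconds timestamps that the CLI output above shows.

```python
import json


def summarize_job(response_text):
    """Pull status, transcript location, and run time out of the CLI output."""
    job = json.loads(response_text)["TranscriptionJob"]
    summary = {
        "name": job["TranscriptionJobName"],
        "status": job["TranscriptionJobStatus"],
        "transcript_uri": None,
        "elapsed_seconds": None,
    }
    if job["TranscriptionJobStatus"] == "COMPLETED":
        summary["transcript_uri"] = job["Transcript"]["TranscriptFileUri"]
        # The CLI reports both timestamps as seconds since the Unix epoch.
        summary["elapsed_seconds"] = job["CompletionTime"] - job["CreationTime"]
    return summary


# Abbreviated copy of the response shown above.
example = """
{
    "TranscriptionJob": {
        "TranscriptionJobName": "Transcribe-Turkish-Audio-CLI",
        "TranscriptionJobStatus": "COMPLETED",
        "CreationTime": 1574392674.948,
        "CompletionTime": 1574392772.813,
        "Transcript": {
            "TranscriptFileUri": "https://s3.amazonaws.com/aws-transcribe-us-east-1-prod/"
        }
    }
}
"""
summary = summarize_job(example)
```

For the job above, the two timestamps are about 98 seconds apart.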

Because we did not explicitly specify a bucket for the transcription output, the result is provided via a presigned URL that gives secure access to the transcript. Using the TranscriptFileUri returned in the output, we can view or parse the returned JSON object to get the transcript text.
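Parsing that JSON object could look like the sketch below. The sample document is made up and heavily shortened for illustration; it assumes the usual layout in which the text sits under `results.transcripts`.

```python
import json


def transcript_text(document):
    """Join the transcript strings found in a Transcribe result document."""
    parts = document["results"]["transcripts"]
    return " ".join(part["transcript"] for part in parts)


def fetch_transcript(transcript_file_uri):
    """Download and decode the JSON behind a TranscriptFileUri."""
    from urllib.request import urlopen

    with urlopen(transcript_file_uri) as response:
        return json.load(response)


# Hypothetical, shortened result document (not real service output).
sample = json.loads("""
{
    "jobName": "Transcribe-Turkish-Audio-CLI",
    "results": {
        "transcripts": [{"transcript": "Merhaba dünya."}],
        "items": []
    },
    "status": "COMPLETED"
}
""")
text = transcript_text(sample)
```

In practice, `transcript_text(fetch_transcript(uri))` would be called with the full presigned URL from the job response.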

The transcribed text can then be used for a variety of use cases such as input into Amazon Comprehend for identification of key phrases and key entities such as names, organization names, or as input into Amazon Translate as shown above for translation into one or more languages.

Amazon Transcribe and Amazon Translate for multilingual subtitles

Combining Amazon Transcribe and Amazon Translate allows customers to quickly transcribe audio and convert it into multiple target languages, both to meet globalization requirements and for use cases such as extending the global reach of videos or adding subtitles to training videos for your organization. These capabilities can be extended into multilingual subtitles for videos and podcasts, which offers more language options to broader audiences.
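As a rough illustration of the subtitle use case, the sketch below turns the word-level `items` of a Transcribe result into SRT subtitle text and passes each cue through a translation function. The grouping rule (a fixed number of words per cue) and all helper names are assumptions made for this example; `translate_with_aws` additionally assumes `boto3` and configured credentials.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def items_to_cues(items, max_words=8):
    """Group Transcribe result items into (start, end, text) cues."""
    cues, words, start, end = [], [], None, None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                words[-1] += content  # attach punctuation to the previous word
            continue
        if len(words) >= max_words:  # close the current cue before a new word
            cues.append((start, end, " ".join(words)))
            words = []
        if not words:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(content)
    if words:
        cues.append((start, end, " ".join(words)))
    return cues


def to_srt(cues, translate=lambda text: text):
    """Render cues as SRT text, translating each cue's text on the way."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{translate(text)}")
    return "\n\n".join(blocks) + "\n"


def translate_with_aws(text, source="tr", target="en"):
    """Translate one cue with Amazon Translate (needs boto3 and credentials)."""
    import boto3

    client = boto3.client("translate")
    result = client.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target
    )
    return result["TranslatedText"]
```

Passing `translate=translate_with_aws` to `to_srt` would produce translated subtitles; translating cue by cue keeps the timing simple, though it gives the translation service less context than whole sentences would.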

AWS Architecture provides a quick start solution and deployment guide customers can leverage for Live Streaming with Automated Multi-Language Subtitling.   This real-time subtitling solution for live streaming video generates multi-language subtitles for live streaming videos using Amazon Transcribe for audio-to-text and Amazon Translate for language translation.

Available Now!

These new languages are available today in all Regions where Amazon Transcribe is available. The free tier offers 60 minutes per month for the first 12 months, starting from your first transcription request.

We’re looking forward to your feedback! Please post it to the AWS Forum for Amazon Transcribe, or send it to your usual AWS Support contacts.


About the author

Shelbee Eigenbrode is a solutions architect at Amazon Web Services (AWS). Her current areas of depth include DevOps combined with machine learning and artificial intelligence. She’s been in technology for 22 years, spanning multiple roles and technologies. In her spare time she enjoys reading and spending time with her family, friends, and her fur family (a.k.a. dogs).

Engage listeners with Amazon Polly’s Conversational speaking style voices

All voices are unique, yet speakers tend to adjust their delivery, or speaking style, according to their context and audience. Before Amazon Polly used Neural Text-to-Speech technology (NTTS) to build voices, TTS (Standard Text-to-Speech) voices couldn’t change their speech patterns to match any particular speaking style. When Amazon Polly introduced NTTS, Newscaster voices were launched as the first speaking style.

Matthew and Joanna, two of the US English voices in the Amazon Polly portfolio, are now also available in a Conversational speaking style, which simulates the speech patterns of a friendly conversation. Similar to how people learn to talk as children, TTS voices acquire intonation patterns from natural speech data, then try to reproduce them in synthesized utterances. Amazon Polly’s NTTS technology, a neural network-based machine learning model, makes this learning possible. It is capable of picking up nuances in various speaking styles and applying them when synthesizing text into speech.

Pillo Health is a startup that uses Amazon Polly to voice their in-home devices. Paige Baeder, Pillo Health’s product manager, says, “Pillo Health serves individuals who manage chronic conditions in the comfort of their home. Maintaining our community’s trust starts with each daily interaction. The Conversational version of Amazon Polly’s Joanna voice provides clarity and expression that inspires trust and is easy to understand, allowing us to connect with our users through a voice that brings Pillo (our in-home companion device) persona to life. Making the decision to switch to Joanna in Amazon Polly was easy—it was the top pick amongst all of our voice testers.”

Unlike traditional synthesis approaches that rely heavily on constructed rules, NTTS builds its own model based on given training data. Dynamic intonation and expressiveness used to be obstacles because linguistic rules could not cover them, but now they are the key to voices sounding natural in NTTS. The system needs to recognize the diversity in speech, in order to mimic it when generating speech. In the studio, Amazon Polly’s voice talents record in an engaging tone, as they would when they engage in normal day-to-day conversation. A few characteristics of natural speech include reduced syllables, pitch change, pausing, and contractions. The recording script for training data is carefully designed based on common utterances, which helps deliver natural speech data.

The Conversational speaking style feature generally makes neural voices sound more friendly and expressive. For example, listen to the following audio sample from Matthew in the Conversational speaking style, as compared to the neutral neural style (speaking-style free):

Neutral sample (Matthew)

Conversational sample (Matthew)

In the Conversational speech sample, the word “sorry” is emphasized with a slight pause and a stress, which sounds more empathetic in this given situation. The question also sounds more friendly in the Conversational version, providing a better user experience.

Here’s Joanna introducing the Conversational style:

Neutral sample (Joanna)

Conversational sample (Joanna)

To synthesize the Conversational style, enclose the input in the following SSML tag and set the text type to ssml on the command line:

<speak>
<amazon:domain name="conversational">
We are excited to share that Matthew and Joanna, the US English voices available in Polly, sound more natural thanks to the conversational style.
</amazon:domain>
</speak>

$ aws polly start-speech-synthesis-task \
       --voice-id Joanna --engine neural \
       --text file://s3.ssml --text-type ssml \
       --output-s3-bucket-name "polly-conversational-synth" --output-format mp3 \
       --query "SynthesisTask.TaskId"
"14e73ba4-ec52-4811-b597-9b07a368c213"
$ wget https://polly-conversational-synth.s3.amazonaws.com/14e73ba4-ec52-4811-b597-9b07a368c213.mp3 -O joanna-conversational.mp3
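The same task can be started from the AWS SDK for Python. This is a minimal sketch, assuming `boto3` is installed and credentials are configured; the helper names are illustrative, and the bucket name is whatever S3 bucket you own.

```python
from xml.sax.saxutils import escape


def conversational_ssml(text):
    """Wrap plain text in the SSML tag that selects the Conversational style."""
    return (
        "<speak>"
        '<amazon:domain name="conversational">'
        f"{escape(text)}"  # escape &, < and > so the SSML stays well-formed
        "</amazon:domain>"
        "</speak>"
    )


def start_synthesis(ssml, bucket, voice_id="Joanna", region_name="us-east-1"):
    """Start an asynchronous synthesis task and return its task ID."""
    import boto3  # imported lazily so the SSML helper works without the SDK

    client = boto3.client("polly", region_name=region_name)
    response = client.start_speech_synthesis_task(
        Engine="neural",
        VoiceId=voice_id,
        Text=ssml,
        TextType="ssml",
        OutputS3BucketName=bucket,
        OutputFormat="mp3",
    )
    return response["SynthesisTask"]["TaskId"]
```

For example, `start_synthesis(conversational_ssml("Hello there!"), "polly-conversational-synth")` mirrors the CLI call above.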

You can trigger the Conversational speaking style with US English voices Matthew and Joanna within the Amazon Polly console, AWS CLI, or SDK. The feature is currently available in US East (N. Virginia), US West (Oregon), and EU (Ireland) Regions. For more information, see What Is Amazon Polly?


About the author

Chiao-ting Fang is a TTS language engineer for Amazon text-to-speech. She enjoys applying her linguistic knowledge at work to build better, more natural-sounding voices. She loves languages, traveling, and star-gazing.