Skip to main content


Learn About Our Meetup

5000+ Members



Join our meetup, learn, connect, share, and get to know your Toronto AI community. 



Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.



Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

Category: Global

BERT Does Europe: AI Language Model Learns German, Swedish

BERT is at work in Europe, tackling natural-language processing jobs in multiple industries and languages with help from NVIDIA’s products and partners.

The AI model formally known as Bidirectional Encoder Representations from Transformers debuted just last year as a state-of-the-art approach to machine learning for text. Though new, BERT is already finding use in avionics, finance, semiconductor and telecom companies on the continent, said developers optimizing it for German and Swedish.

“There are so many use cases for BERT because text is one of the most common data types companies have,” said Anders Arpteg, head of research for Peltarion, a Stockholm-based developer that aims to make the latest AI techniques such as BERT inexpensive and easy for companies to adopt.

Natural-language processing will outpace today’s AI work in computer vision because “text has way more apps than images — we started our company on that hypothesis,” said Milos Rusic, chief executive of deepset in Berlin. He called BERT “a revolution, a milestone we bet on.”

Deepset is working with PricewaterhouseCoopers to create a system that uses BERT to help strategists at a chip maker query piles of annual reports and market data for key insights. In another project, a manufacturing company is using NLP to search technical documents to speed maintenance of their products and predict needed repairs.

Peltarion, a member of NVIDIA’s Inception program that nurtures startups with access to its technology and ecosystem, packed support for BERT into its tools in November. It is already using NLP to help a large telecom company automate parts of its process for responding to product and service requests. And it’s using the technology to let a large market research company more easily query its database of surveys.

Work in Localization

Peltarion is collaborating with three other organizations on a three-year, government-backed project to optimize BERT for Swedish. Interestingly, a new model from Facebook called XLM-R suggests training on multiple languages at once could be more effective than optimizing for just one.

“In our initial results, XLM-R, which Facebook trained on 100 languages at once, outperformed a vanilla version of BERT trained for Swedish by a significant amount,” said Arpteg, whose team is preparing a paper on their analysis.

Nevertheless, the group hopes to have before summer a first version of a Swedish BERT model that performs really well, said Arpteg, who headed up an AI research group at Spotify before joining Peltarion three years ago.

An analysis by deepset of its German version of BERT.

In June, deepset released as open source a version of BERT optimized for German. Although its performance is only a couple percentage points ahead of the original model, two winners in an annual NLP competition in Germany used the deepset model.

Right Tool for the Job

BERT also benefits from optimizations for specific tasks such as text classification, question answering and sentiment analysis, said Arpteg. Peltarion researchers plans to publish in 2020 results of an analysis of gains from tuning BERT for areas with their own vocabularies such as medicine and legal.

The question-answering task has become so strategic for deepset it created Haystack, a version of its FARM transfer-learning framework to handle the job.

In hardware, the latest NVIDIA GPUs are among the favorite tools both companies use to tame big NLP models. That’s not surprising given NVIDIA recently broke records lowering BERT training time.

“The vanilla BERT has 100 million parameters and XML-R has 270 million,” said Arpteg, whose team recently purchased systems using NVIDIA Quadro and TITAN GPUs with up to 48GB of memory. It also has access to NVIDIA DGX-1 servers because “for training language models from scratch, we need these super-fast systems,” he said.

More memory is better, said Rusic, whose German BERT models weigh in at 400MB. Deepset taps into NVIDIA V100 Tensor Core 100 GPUs on cloud services and uses another NVIDIA GPU locally.

The post BERT Does Europe: AI Language Model Learns German, Swedish appeared first on The Official NVIDIA Blog.

Playing Pod: Our Top 5 AI Podcast Episodes of 2019

If it’s worth doing, it’s worth doing with AI.

2019 was the year deep-learning driven AI made the leap from the rarefied world of elite computer scientists and the world’s biggest web companies to the rest of us.

Everyone from startups to hobbyists to researchers are picking up powerful GPUs and putting this new kind of computing to work. And they’re doing amazing things.

And with NVIDIA’s AI Podcast, now in its third year, we’re bringing the people behind these wonders to more listeners than ever, with the podcast reaching more than 650,000 downloads in 2019.

Here are the episodes that were listener favorites in 2019.

A Man, a GAN, and a 1080 Ti: How Jason Antic Created ‘De-Oldify’

You don’t need to be an academic or to work for a big company to get into deep learning. You can just be a guy with a NVIDIA GeForce 1080 Ti and a generative adversarial network. Jason Antic, who describes himself as “a software guy,” began digging deep into GANs. Next thing you know, he’s created an increasingly popular tool that colors old black-and-white shots. Interested in digging into AI for yourself? Listen and get inspired.

Sort Circuit: How GPUs Helped One Man Conquer His Lego Pile

At some point in life, every man faces the same great challenge: sorting out his children’s Lego pile. Thanks to GPU-driven deep learning, Francisco “Paco” Garcia is one of the few men who can say they’ve conquered it. Here’s how.

UC Berkeley’s Pieter Abbeel on How Deep Learning Will Help Robots Learn

Robots can do amazing things. Compare even the most advanced robots to a three-year-old, however, and they can come up short. UC Berkeley Professor Pieter Abbeel has pioneered the idea that deep learning could be the key to bridging that gap: creating robots that can learn how to move through the world more fluidly and naturally. We caught up with Abbeel, who is director of the Berkeley Robot Learning Lab and cofounder of Covariant AI, a Bay Area company developing AI software that makes it easy to teach robots new and complex skills, at GTC 2019.

How the Breakthrough Listen Harnessed AI in the Search for Aliens

UC Berkeley’s Gerry Zhang talks about his work using deep learning to analyze signals from space for signs of intelligent extraterrestrial civilizations. And while we haven’t found aliens, yet, the doctoral student has already made some extraordinary discoveries.

How AI Helps GOAT Keep Sneakerheads a Step Ahead

GOAT Group helps sneaker enthusiasts get their hands on authentic Air Jordans, Yeezys and a variety of old-school kicks with the help of AI. Michael Hall, director of data at GOAT Group, explains how in a conversation with AI Podcast host and raging sneakerhead Noah Kravitz.

Tune in to the AI Podcast

We’re available through iTunesGoogle Play MusicGoogle PodcastsCastboxCastro, DoggCatcher, OvercastPodbayPocket Casts, PodCruncher, PodKicker, SpotifyStitcher and Soundcloud. If your favorite isn’t listed here, drop us a note.

  The AI Podcast SpotifyThe AI Podcast Google Podcasts The AI Podcast Apple Podcasts

Make the AI Podcast Better

Have a few minutes to spare? Fill out this listener survey. Your answers will help us make a better podcast.

The post Playing Pod: Our Top 5 AI Podcast Episodes of 2019 appeared first on The Official NVIDIA Blog.

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Ever since the advent of BERT a year ago, natural language research has embraced a new paradigm, leveraging large amounts of existing text to pretrain a model’s parameters using self-supervision, with no data annotation required. So, rather than needing to train a machine-learning model for natural language processing (NLP) from scratch, one can start from a model primed with knowledge of a language. But, in order to improve upon this new approach to NLP, one must develop an understanding of what, exactly, is contributing to language-understanding performance — the network’s height (i.e., number of layers), its width (size of the hidden layer representations), the learning criteria for self-supervision, or something else entirely?

In “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”, accepted at ICLR 2020, we present an upgrade to BERT that advances the state-of-the-art performance on 12 NLP tasks, including the competitive Stanford Question Answering Dataset (SQuAD v2.0) and the SAT-style reading comprehension RACE benchmark. ALBERT is being released as an open-source implementation on top of TensorFlow, and includes a number of ready-to-use ALBERT pre-trained language representation models.

What Contributes to NLP Performance?
Identifying the dominant driver of NLP performance is complex — some settings are more important than others, and, as our study reveals, a simple, one-at-a-time exploration of these settings would not yield the correct answers.

The key to optimizing performance, captured in the design of ALBERT, is to allocate the model’s capacity more efficiently. Input-level embeddings (words, sub-tokens, etc.) need to learn context-independent representations, a representation for the word “bank”, for example. In contrast, hidden-layer embeddings need to refine that into context-dependent representations, e.g., a representation for “bank” in the context of financial transactions, and a different representation for “bank” in the context of river-flow management.

This is achieved by factorization of the embedding parametrization — the embedding matrix is split between input-level embeddings with a relatively-low dimension (e.g., 128), while the hidden-layer embeddings use higher dimensionalities (768 as in the BERT case, or more). With this step alone, ALBERT achieves an 80% reduction in the parameters of the projection block, at the expense of only a minor drop in performance — 80.3 SQuAD2.0 score, down from 80.4; or 67.9 on RACE, down from 68.2 — with all other conditions the same as for BERT.

Another critical design decision for ALBERT stems from a different observation that examines redundancy. Transformer-based neural network architectures (such as BERT, XLNet, and RoBERTa) rely on independent layers stacked on top of each other. However, we observed that the network often learned to perform similar operations at various layers, using different parameters of the network. This possible redundancy is eliminated in ALBERT by parameter-sharing across the layers, i.e., the same layer is applied on top of each other. This approach slightly diminishes the accuracy, but the more compact size is well worth the tradeoff. Parameter sharing achieves a 90% parameter reduction for the attention-feedforward block (a 70% reduction overall), which, when applied in addition to the factorization of the embedding parameterization, incur a slight performance drop of -0.3 on SQuAD2.0 to 80.0, and a larger drop of -3.9 on RACE score to 64.0.

Implementing these two design changes together yields an ALBERT-base model that has only 12M parameters, an 89% parameter reduction compared to the BERT-base model, yet still achieves respectable performance across the benchmarks considered. But this parameter-size reduction provides the opportunity to scale up the model again. Assuming that memory size allows, one can scale up the size of the hidden-layer embeddings by 10-20x. With a hidden-size of 4096, the ALBERT-xxlarge configuration achieves both an overall 30% parameter reduction compared to the BERT-large model, and, more importantly, significant performance gains: +4.2 on SQuAD2.0 (88.1, up from 83.9), and +8.5 on RACE (82.3, up from 73.8).

These results indicate that accurate language understanding depends on developing robust, high-capacity contextual representations. The context, modeled in the hidden-layer embeddings, captures the meaning of the words, which in turn drives the overall understanding, as directly measured by model performance on standard benchmarks.

Optimized Model Performance with the RACE Dataset
To evaluate the language understanding capability of a model, one can administer a reading comprehension test (e.g., similar to the SAT Reading Test). This can be done with the RACE dataset (2017), the largest publicly available resource for this purpose. Computer performance on this reading comprehension challenge mirrors well the language modeling advances of the last few years: a model pre-trained with only context-independent word representations scores poorly on this test (45.9; left-most bar), while BERT, with context-dependent language knowledge, scores relatively well with a 72.0. Refined BERT models, such as XLNet and RoBERTa, set the bar even higher, in the 82-83 score range. The ALBERT-xxlarge configuration mentioned above yields a RACE score in the same range (82.3), when trained on the base BERT dataset (Wikipedia and Books). However, when trained on the same larger dataset as XLNet and RoBERTa, it significantly outperforms all other approaches to date, and establishes a new state-of-the-art score at 89.4.

Machine performance on the RACE challenge (SAT-like reading comprehension). A random-guess baseline score is 25.0. The maximum possible score is 95.0.

The success of ALBERT demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations. By focusing improvement efforts on these aspects of the model architecture, it is possible to greatly improve both the model efficiency and performance on a wide range of NLP tasks. To facilitate further advances in the field of NLP, we are open-sourcing ALBERT to the research community.

Using Amazon Lex Conversation logs to monitor and improve interactions

As a product owner for a conversational interface, understanding and improving the user experience without the corresponding visibility or telemetry can feel like driving a car blindfolded. It is important to understand how users are interacting with your bot so that you can continuously improve the bot based on past interactions. You can gain these actionable insights by monitoring bot conversations. To capture user input, you could write custom logic in your application, but building and managing additional code and associated infrastructure is both cumbersome and time-consuming. You would also need to make sure that the custom logic does not increase the latency for end-users.

We are excited to announce Conversation logs for Amazon Lex, which enables you to natively save interactions with your bot. You can configure logging text input to Amazon CloudWatch Logs and audio input to Amazon S3. In addition to the user input and bot responses, the logs contain information such as matched intent and missed utterances. To protect sensitive data captured as slot values, you can enable slot obfuscation to mask those values for logging.

You can use Conversation logs to track utterances that were not mapped to any configured intent. These missed utterances allow you to improve the bot design. Now that conversation transcripts for the entire session are available, you can better analyze the conversation flow, improve conversational design, and increase user engagement.

This post demonstrates how to enable conversation logs, obfuscate sensitive slots, and set up advanced monitoring capabilities for your bot.

Building a bot

This post uses the following conversation to model a bot for an auto loan:

User: I’d like to check the outstanding balance on my car loan.
Agent: Sure, can you provide me your account number?
User: It is 12345678.
Agent: Can you please confirm the last four digits of your SSN for verification?
User: 1234
Agent: Thank you for the information. The balance on your car loan for account 12345678 is $12,345.
User: Ok thanks.

Build the Amazon Lex bot AutoLoanBot (download) with the following intents:

  • ApplyLoan – Elicits necessary information, such as name and SSN, and creates a new request.
  • PayInstallment – Captures the user’s account number, the last four digits of the user’s SSN, and payment information, and processes the monthly installment.
  • CheckBalance – Elicits the user’s account number and the last four digits of their SSN and provides the outstanding balance.
  • Fallback – Captures any input that the bot cannot process by the configured intents.

Create an alias Prod and publish a version of the bot using this alias.

Enabling Conversation logs

You can create Conversation logs for both text and audio.

Text logs

Before configuring Conversation text logs, create a new log group car-loan-bot-text-logs in CloudWatch Logs. As an optional step, you can encrypt the text logs. For more information, see Encrypt Log Data in CloudWatch Logs Using AWS KMS. Additionally, create the IAM role LexCarLoanBotRole to provide write permissions to the car-loan-bot-text-logs log group.

To set up text logs for a bot alias, complete the following steps:

  1. On the Amazon Lex console, choose the AutoLoanBot.
  2. Choose Settings.
  3. Choose Conversation logs.
  4. Choose the Settings gear icon that corresponds to the Prod
  5. Under Log type, choose Text logs.
  6. For Log group name, select car-loan-bot-text-logs from the dropdown list.
  7. For IAM role, select LexCarLoanBotRole from the dropdown list.
  8. Choose Save.

Audio logs

Before configuring Conversation audio logs, create the S3 bucket car-loan-bot-audio-logs. As an optional step, you can encrypt the audio logs with a KMS customer managed CMK. To do so, create the new KMS key car-loan-bot-audio-logs-kms-key. Additionally, create the IAM role LexCarLoanBotRole to provide write permissions to the car-loan-bot-audio-logs S3 bucket and access permissions to car-loan-bot-audio-logs-kms-key.

To enable audio logging, complete the following steps:

  1. On the Amazon Lex console, choose the AutoLoanBot.
  2. Choose Settings.
  3. Choose Conversation logs.
  4. Choose the Settings gear icon that corresponds to the Prod
  5. For Log type, select Audio logs.
  6. For S3 bucket, select car-loan-bot-audio-logs from the dropdown menu.
  7. For KMS key, select car-loan-bot-audio-logs-kms-key from the dropdown menu.
  8. For IAM role, select LexCarLoanBotRole from the dropdown menu.
  9. Choose Save.

You can modify or disable Conversation logs settings for an alias by choosing the Settings gear icon. If you are enabling both text and audio logs, the LexCarLoanBotRole must have write permissions to car-loan-bot-text-logs log group and car-loan-bot-audio-logs S3 bucket.

Marking sensitive slots as obfuscated

To protect sensitive data captured as slot values, you can enable slot obfuscation to mask those values for logging. To mask the SSN or last four digits of the SSN in the Conversation logs, mark them as obfuscated in Applyloan, PayInstallment, and CheckBalance. Complete the following steps:

  1. On the Amazon Lex console, choose AutoLoanBot.
  2. Under Intents, choose CheckBalance.
  3. Choose the Settings gear icon corresponding to the SSNFourDigit slot.

The popup SSNFourDigit settings appears.

  1. For Slot Obfuscation section, choose Store as {SSNFourDigit}.
  2. Choose Save.
  3. Choose Build.
  4. Choose Publish using the Prod alias.

Your bot is now ready to deploy.

Reviewing Conversation logs data

After you enable Conversation logs for the alias Prod, the user input is saved in CloudWatch Logs log group in the following format, with the SSN slot obfuscated:

    "messageVersion": "1.0",
    "botName": "AutoLoanBot",
    "botAlias": "Prod",
    "botVersion": "2",
    "inputTranscript": "Yes",
    "botResponse": "Thank you, your application for a car loan of $50000 has been submitted.",
    "intent": "ApplyLoan",
    "slots": {
        "DateOfBirth": "1990-01-01",
        "LoanAmount": "50000",
        "Address": "1234 First Avenue",
        "FirstName": "John",
        "PhoneNumber": "1234567890",
        "LastName": "Doe",
        "SSN": "{SSN}"
    "missedUtterance": false,
    "inputDialogMode": "Speech",
    "requestId": "24fcb5b5-fb84-4fb0-90ad-3e13a3e7bada",
    "s3PathForAudio": "<bucket-name>/aws/lex/AutoLoanBot/Prod/2/5f13cab7-cac2-42ff-a382-3918d21239fa/2019-12-17T18:32:23.435Z-iMfpxOMK/ae3954d6-f999-4668-bf14-0671ab2f10ea.wav"
    "userId": "User4",
    "sessionId": "2019-12-15T02:55:45.746Z-ztLBPmkJ"

Similarly, you can navigate to the S3 bucket path present in s3PathForAudio to review audio logs. The following screenshot shows the audio files stored in your S3 bucket

Improving bot performance with missed utterances

Now that you have set up Conversation logs and verified that user input is saved, you can use these logs to improve conversational experiences. Conversation logs provide you the details about user inputs that the bot didn’t recognize. These missed utterances can provide useful insights to better train your bot. Also, you can now prioritize new capabilities for your bot.

To generate a list of missed utterances for your AutoLoanBot, complete the following steps:

  1. On the AWS Management Console, choose CloudWatch.
  2. Under Logs, choose Insights.
  3. From the dropdown list, choose car-loan-bot-text-logs.
  4. To extract all the utterances mapped to the Fallback intent, run the following query:
    fields inputTranscript 
    | filter (intent == "Fallback")

    Alternatively, if you do not have a Fallback intent configured for your bot, you can query for the field missedUtterance with the following code:

    fields inputTranscript 
    | filter (missedUtterance == 1)

Diving deep into conversations

With Conversation logs, you have access to the entire interaction for all your sessions. As you go through the interactions, you can identify gaps, make design changes, and deploy better flows.

Amazon Lex uses the sessionId attribute to log every entry that belongs to the same session. You can use sessionId to drill down into a single conversation.

To monitor conversations across different sessions, complete the following steps:

  1. On the console, choose CloudWatch.
  2. Under Logs, choose Insights.
  3. From the dropdown list, choose car-loan-bot-text-logs.
  4. To generate a list of unique sessionId and number of turns per conversation, run the following query:
    fields sessionId
    | stats count(sessionId) as NumberOfUtterances by sessionId

    The following screenshot shows the output of this query. There are several sessionId queries and the number of utterances logged.

  5. For each listed sessionId, view the complete conversation by running the following query:
    fields inputTranscript, botResponse 
    | filter sessionId == "session id" 
    | sort @timestamp asc

    The following screenshot shows the output of this query. It displays both the user and bot interactions.


You can use Conversation logs to capture useful insights from user conversations and use these insights to improve your bot performance for enhancing user experience. You can also use the conversational data for auditing purposes. Get started with Conversation logs today!

About the Authors

Anubhav Mishra is a Product Manager with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.




Hammad Mirza works as a Software Development Engineer at Amazon AI. He works on building and maintaining scalable distributed systems for Amazon Lex. Outside of work, he can be found spending quality time with friends and family.




Goutham Venkatesan works as a Software Development Engineer at Amazon AI. He works on enhancing the Lex customer experience building distributed systems at scale. Outside of work, he can be found traveling to sunny destinations & sipping coconuts on the beach.





The On-Device Machine Learning Behind Recorder

Over the past two decades, Google has made information widely accessible through search — from textual information, photos and videos, to maps and jobs. But much of the world’s information is conveyed through speech. Yet even though many people use audio recording devices to capture important information in conversations, interviews, lectures and more, it can be very difficult to later parse through hours of recordings to identify and extract information of interest. But what if there was the ability to automatically transcribe and tag long recordings in real-time, enabling you to intuitively find the relevant information you need, when you need it?

For this reason, we launched Recorder, a new kind of audio recording app for Pixel phones that leverages recent developments in on-device machine learning (ML) to transcribe conversations, to detect and identify the type of audio recorded (from broad categories like music or speech to particular sounds, such as applause, laughter and whistling), and to index recordings so users can quickly find and extract segments of interest. All of these features run entirely on-device, without the need for an internet connection.

Recorder transcribes speech in real-time using an on-device automatic speech recognition model based on improvements announced earlier this year. Being a key component to many of Recorder’s smart features, we made sure that this model can transcribe long audio recordings (a few hours) reliably, while also indexing conversation by mapping words to timestamps as computed by the speech recognition model. This enables the user to click on a word in the transcription and initiate playback starting from that point in the recording, or to search for a word and jump to the exact point in the recording where it was being said.

Recording Content Visualization via Sound Classification
While presenting a transcript for a recording is useful and allows one to search for specific words, sometimes (especially for very long recordings) it’s more useful to visually search for sections of a recording based on specific moments or sounds. To enable this, Recorder additionally represents audio visually as a colored waveform where each color is associated with a different sound category. This is done by combining research into using CNNs to classify audio sounds (e.g., identifying a dog barking or a musical instrument playing) with previously published datasets for audio event detection to classify apparent sound events in individual audio frames.

Of course, in most situations many sounds can appear at the same time. In order to visualize the audio in a very clear way, we decided to color each waveform bar in a single color that represents the most dominant sound in a given time frame (in our case, 50ms bars). The colorized waveform lets users understand what type of content was captured in a specific recording and navigate along an ever-growing audio library more easily. This brings a visual representation of the audio recordings to the users, and also enables them to search over audio events in their recordings.

Recorder implements a sliding window capability that processes partially overlapping 960ms audio frames at 50ms intervals and outputs a sigmoid scores vector, representing the probability for each supported audio class within the frame. We apply a linearization process on the sigmoid scores in combination with a thresholding mechanism, in order to maximize the system precision and report the correct sound classification. This process of analyzing the content of the 960ms window with small 50ms offsets makes it possible to pinpoint exact start and end times in a manner that is less prone to mistakes than analyzing consecutive large 960ms window slices on their own.

Since the model analyzes each audio frame independently, it can be prone to quick jittering between audio classes. This is solved with an adaptive-size median filtering technique applied to the most recent model audio class outputs, thus providing a smoothed consecutive output. The process runs continuously in real-time, requiring it to meet very strict power consumption limitations.

Suggesting Tags for Titles
Once a recording is done, Recorder suggests three tags that the app deems to represent the most memorable content, enabling the user to quickly compose a meaningful title.

To be able to suggest these tags immediately when the recording ends, Recorder analyzes the content of the recording as it is being transcribed. First, Recorder counts term occurrences as well as their grammatical role in the sentence. The terms identified as entities are capitalized. Then, we utilize an on-device part-of-speech-tagger — a model that labels each word in the sentence according to its grammatical role — to detect common nouns and proper nouns, which appear to be more memorable by users. Recorder utilizes a prior scores table supporting both unigram and bigram terms extraction. To generate the scores, we trained a boosted decision tree with conversational data and utilized textual features like document words frequency and specificity. Last, filtering of stop words and swear words is applied and the top tags are outputted.

Tags extraction pipeline architecture

Recorder galvanized some of our most recent on-device ML research efforts into helpful features, running models on-device to ensure user privacy. The positive feedback loop between machine learning investigations and user needs revealed exciting opportunities to make our software even more useful. We’re excited for future research that will make everyone’s ideas and conversations even more easily accessible and searchable.

Special thanks to Dror Ayalon who played a key role in developing and forming the above features and without whom this blog post wouldn’t have been possible. We would also want to thank all our team members and collaborators who worked on this project with us: Amit Pitaru, Kelsie Van Deman, Isaac Blankensmith, Teo Soares, John Watkinson, Matt Hall, Josh Deitel, Benny Schlesinger, Yoni Tsafir, Michelle Tadmor Ramanovich, Danielle Cohen, Sushant Prakash, Renat Aksitov, Ed West, Max Gubin, Tiantian Zhang, Aaron Cohen, Yunhsuan Sung, Chung-Ching Chang, Nathan Dass, Amin Ahmad, Tiago Camolesi, Guilherme Santos‎, Julio da Silva, Dan Ellis, Qiao Liang, Arun Narayanan‎, Rohit Prabhavalkar, Benyah Shaparenko‎, Alex Salcianu, Mike Tsao, Shenaz Zak, Sherry Lin, James Lemieux, Jason Cho, Thomas Hall‎, Brian Chen, Allen Su, Vincent Peng‎, Richard Chou‎, Henry Liu‎, Edward Chen, Yitong Lin, Tracy Wu, Yvonne Yang‎.

Amazon Textract becomes PCI DSS certified, and retrieves even more data from tables and forms

Amazon Textract automatically extracts text and data from scanned documents, and goes beyond simple optical character recognition (OCR) to also identify the contents of fields and information in tables, without templates, configuration, or machine learning experience required. Customers such as Intuit, PitchBook, Change Healthcare, Alfresco, and more are already using Amazon Textract to automate their document processing workflows so that they can accurately process millions of pages in hours. Additionally, you can create smart search indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require redaction.

Today, Amazon Web Services (AWS) announced that Amazon Textract is now PCI DSS certified.  This means that you can now use Amazon Textract for all workloads that require Payment Card Industry Data Security Standard (PCI DSS) information security standard, such as cardholder data (CHD) or sensitive authentication data (SAD). You can also process protected health information (PHI) workloads on Amazon Textract, because it is a HIPAA eligible service. Also starting today, AWS has also launched new quality enhancements so you can retrieve even more data from tables (structured data organized into rigid rows and columns) and forms (structured data represented as key-value pairs and selectable elements such as check boxes and radio buttons).

Amazon Textract now retrieves more data with more accuracy from complex tables that contain split cells and merged cells. Amazon Textract also identifies rows and columns for cells with wrapped text (text present across multiple lines) with more accuracy, even for tables without explicitly drawn borders. Amazon Textract also more accurately retrieves form data from documents that also contain tables on the same page and key-value pairs that are nested within a table. These enhancements build upon an update launched in October 2019 to improve the accuracy of text retrieval, and to more accurately correct the rotation and deformation present in documents with imperfect scans.

Customers using Amazon Textract

PitchBook, MSP Recovery, and Filevine are customers using Amazon Textract, and have shared their experiences with AWS.

PitchBook is the leading provider of data in the private capital markets, specifically VC, PE, and M&A. As a part of that market, a portion of their data comes from surveys, particularly in PDF. PitchBook started using Amazon Textract to improve this part of their research process. “Before using Amazon Textract, this process took hundreds of manual hours going through PDFs and manually entering information as it came in,” says Tyler Martinez, Director of Data Science and Software Engineering at PitchBook. “With Amazon Textract, we have seen gains as high as 60% in our process. We’re hoping to use Amazon Textract in other areas that may improve our data collection processes as well.”

MSP Recovery offers a comprehensive healthcare claims platform to determine primary payment responsibility among multiple insurance carriers. “Amazon Textract is very impressive,” said Franklin Perez, Head of Software Development at MSP Recovery. “We decided to use Amazon Textract to detect different document formats to process information and data properly and efficiently. The feature is designed to have the ability to recognize the various different formats it’s pulling text from, whether this is tables or forms, which is an AI dream come true for us. We needed a solution that would be scalable to various documents, as we receive different document types on a regular basis and need to be efficient at reading them. With a lean team, we are able to allow the machine learning to handle the heavy lifting by automating reading thousands of documents, allowing our team to focus on higher-order assignments.”

Filevine is the operating core for legal professionals, including cloud-based case and matter management, document management, and in-depth reporting analytics. From its launch in 2015, Filevine focused on rapid innovation and award-winning design, and earned the highest ratings from independent review sites. “Millions of matters and case files are handled in Filevine every day,” says Ryan Anderson, Chief Executive Officer at Filevine. “We chose Amazon Web Services because we wanted to deliver best-in-class document search solutions for our customers. Amazon Textract is fast, accurate, and scalable—it helps Filevine meet the exacting requirements of the world’s largest and most sophisticated legal organizations. With Filevine and Amazon, finding the proverbial needle in the haystack has never been easier for legal professionals.”


With the newest improvements to Amazon Textract, you can retrieve more information from the same document, with more accuracy. And Amazon Textract continues to improve; at AWS re:Invent 2019, AWS announced a public preview of Amazon Textract’s integration with the Amazon Augmented Artificial Intelligence service for the forms features. This enables you to apply human validation on your AI inference output from Amazon Textract. Amazon Textract has also increased the file size limit for synchronous APIs to 10 MB. You can also continue to use asynchronous APIs to process files up to 500 MB each. For more information, see the video AWS re:Invent 2019: [REPEAT] AI document processing for business automation On YouTube.

You can get started with Amazon Textract today. Try Amazon Textract with your images or PDF documents and get high-quality results in seconds.

About the Author

Kriti Bharti is the Product Lead for Amazon Textract. Kriti has over 15 years’ experience in Product Management, Program Management, and Technology Management across multiple industries such as Healthcare, Banking and Finance, and Retail. In her spare time, you can find Kriti spending pawsome time with Fifi and her cousins, reading, or learning different dance forms.

As AI Universe Keeps Expanding, NVIDIA CEO Lays Out Plan to Accelerate All of It

With the AI revolution spreading across industries everywhere, NVIDIA founder and CEO Jensen Huang took the stage Wednesday to unveil the latest technology for speeding its mass adoption.

His talk — to more than 6,000 scientists, engineers and entrepreneurs gathered for this week’s GPU Technology Conference in Suzhou, two hours west of Shanghai — touched on advancements in AI deployment, as well as NVIDIA’s work in the automotive, gaming, and healthcare industries.

“We build computers for the Einsteins, Leonardo di Vincis, Michaelngelos of our time,” Huang told the crowd, which overflowed into the aisles. “We build these computers for all of you.”

Huang explained that demand is surging for technology that can accelerate the delivery of AI services of all kinds. And NVIDIA’s deep learning platform — which the company updated Wednesday with new inferencing software — promises to be the fastest, most efficient way to deliver these services.

It’s the latest example of how NVIDIA achieves spectacular speedups by applying a combination of GPUs optimized for parallel computation, work across the entire computing stack, and algorithm and ecosystem expertise in focused vertical markets.

“It is accepted now that GPU accelerated computing is the path forward as Moore’s law has ended,” Huang said.

Real-Time Recommendations: Baidu and Alibaba

The latest challenge for accelerated computing: driving a new generation of powerful systems, known as recommender systems, able to connect individuals with what they’re looking for in a world where the options available to them is spiraling exponentially.

“The era of search has ended: if I put out a trillion, billion million things and they’re changing all the time, how can you find anything,” Huang asked. “The era of search is over. The era of recommendations has arrived.

Baidu — one of the world’s largest search companies – is harnessing NVIDIA technology to power advanced recommendation engines.

“It solves this problem of taking this enormous amount of data, and filtering it through this recommendation system so you only see 10 things,” Huang said.

With GPUs, Baidu can now train the models that power its recommender systems 10x faster, reducing costs, and, over the long term, increasing the accuracy of its models, improving the quality of its recommendations.

Another example such systems’ power: Alibaba, which relies on NVIDIA technology to help power the recommendation engines behind the success of Single’s Day.

This new shopping festival which takes place on Nov. 11 — or 11.11 — generated $38 billion in sales last month. That’s up by nearly a quarter from last year’s $31 billion, and more than double the online sales on Black Friday and Cyber Monday combined.

Helping to drive its success are recommender systems that display items that match user preferences, improving the click-through rate — which is closely watched in the e-commerce industry as a key sales driver. Its systems need to run in real-time and at an incredible scale, something that’s only possible with GPUs.

“Deep learning inference is wonderful for deep recommender systems and these recommender systems will be the engine for the Internet,” Huang said. “Everything we do in the future, everything we do now, passes through a recommender system.”

Real-Time Conversational AI

Huang also announced groundbreaking new inference software enabling smarter, real-time conversational AI.

NVIDIA TensorRT 7 — the seventh generation of the company’s inference software development kit — features a new deep learning compiler designed to automatically optimize and accelerate the increasingly complex recurrent and transformer-based neural networks needed for complex new applications, such as AI speech.

This speeds the components of conversational AI by 10x compared to CPUs, driving latency below the 300-millisecond threshold considered necessary for real-time interactions.

“To have the ability to understand your intention, make recommendations, do searches and queries for you, and summarize what they’ve learned to a text to speech system… that loop is now possible,” Huang said. “It is now possible to achieve very natural, very rich, conversational AI in real time.”

Accelerating Automotive Innovations

Huang also announced NVIDIA will provide the transportation industry with source access to its NVIDIA DRIVE deep neural networks (DNNs) for autonomous vehicle development.

NVIDIA DRIVE has become a de facto standard for AV development, used broadly by automakers, truck manufacturers, robotaxi companies, software companies and universities.

Now, NVIDIA is providing source access of it’s pre-trained AI models and training code to AV developers. Using a suite of NVIDIA AI tools, the ecosystem can freely extend and customize the models to increase the robustness and capabilities of their self-driving systems.

In addition to providing source access to the DNNs, Huang announcing the availability of a suite of advanced tools so developers can customize and enhance NVIDIA’s DNNs, utilizing their own data sets and target feature set. These tools allow the training of DNNs utilizing active learning, federated learning and transfer learning, Huang said.

Haung also announced NVIDIA DRIVE AGX Orin, the world’s highest performance and most advanced system-on-a-chip. It delivers 7x the performance and 3x the efficiency per watt of Xavier, NVIDIA’s previous-generation automotive SoC. Orin — which will be available to be incorporated in customer production runs for 2022 — boasts 17 billion transistors, 12 CPU cores, and is capable of over 200 trillion operations per second.

Orin will be woven into a stack of products — all running a single architecture and compatible with software developed on Xavier — able to scale from simple level 2 autonomy, all the way up to full Level 5 autonomy.

And Huang announced that Didi — the world’s largest ride hailing company — will adopt NVIDIA DRIVE to bring robotaxis and intelligent ride-hailing services to market.

“We believe everything that moves will be autonomous some day,” Huang said. “This is not the work of one company, this is the work of one industry, and we’ve created an open platform so we can all team up together to realize this autonomous future.”

Game On

Adding to NVIDIA’s growing footprint in cloud gaming, Huang announced a collaboration with Tencent Games in cloud gaming.

“We are going to extend the wonderful experience of PC gaming to all the computers that are underpowered today, the opportunity is quite extraordinary,” Huang said. “We can extend PC gaming to the other 800 milliion gamers in the world.”

NVIDIA’s technology will power Tencent Games’ START cloud gaming service, which began testing earlier this year. START gives gamers access to AAA games on underpowered devices anytime, anywhere.

Huang also announced that six leading game developers will join the ranks of game developers around the world who have been using the realtime ray tracing capabilities of NVIDIA’s GeForce RTX to transform the image quality and lighting effects of their upcoming titles

Ray tracing is a graphics rendering technique that brings real-time, cinematic-quality rendering to content creators and game developers. NVIDIA GeForce RTX GPUs contain specialized processor cores designed to accelerate ray tracing so the visual effects in games can be rendered in real time.

The upcoming games include a mix of blockbusters, new franchises, triple-A titles and indie fare — all using real-time ray tracing to bring ultra-realistic lighting models to their gameplay.

They include Boundary, from Surgical Scalpels Studios; Convallarioa, from LoongForce;  F.I.S.T. from  Shanghai TiGames; an unnamed project from Mihyo; Ring of Elysium, from TenCent; and Xuan Yuan Sword VII from Softstar.

Accelerating Medical Advances, 5G

This year, Huang said, NVIDIA has added two major new applications to CUDA – 5G vRAN and genomic processing. With each, NVIDIA’s supported by world leaders in their respective industries – Ericsson in telecommunication and BGI in genomics.

Since the first human genome was sequenced in 2003, the cost of whole genome sequencing has steadily shrunk, far outstripping the pace of Moore’s law. That’s led to an explosion of genomic data, with the total amount of sequence data is doubling every seven months.

“The ability to sequence the human genome in its totality is incredibly powerful,” Huang said.

To put this data to work — and unlock the promise of truly personalized medicine — Huang announced that NVIDIA is working with Beijing Genomics Institute.

BGI is using NVIDIA V100 GPUs and software from Parabricks, an Ann Arbor, Michigan- based startup acquired by NVIDIA earlier this month — to build the highest throughput genome sequencer yet, potentially driving down the cost of genomics-based personalized medicine.

“It took 15 years to sequence the human genome for the first time,” Huang said. “It is now possible to sequence 16 whole genomes per day.”

Huang also announced the availability of the NVIDIA Parabricks Genomic Analysis Toolkit, and its availability on NGC, NVIDIA’s hub for GPU-optimized software for deep learning, machine learning, and high-performance computing.

Accelerated Robotics with NVIDIA Isaac

As the talk wound to a close, Huang announced a new version of NVIDIA’s Isaac software development kit. The Isaac SDK achieves an important milestone in establishing a unified robotic development platform — enabling AI, simulation and manipulation capabilities.

The showstopper: Leonardo, a robotic arm with exquisite articulation created by NVIDIA researchers in Seattle, that not only performed a sophisticated task — recognizing and rearranging four colored cubes — but responded almost tenderly to the actions of the people around it in real time. It purred out a deep squeak, seemingly out of a Steven Spielberg movie.

As the audience watched the robotic arm was able to gently pluck a yellow colored block from Hunag’s hand and set it down. It then went on to rearrange four colored blocks, gently stacking them with fine precision.

The feat was the result of sophisticated simulation and training, that allows the robot to learn in virtual worlds, before being put to work in the real world. “And this is how we’re going to create robots in the future,” Huang said.

Accelerating Everything

Huang finished his talk by by recapping NVIDIA’s sprawling accelerated computing story, one that spans ray tracing, cloud gaming, recommendation systems, real-time conversational AI, 5G, genomics analysis, autonomous vehicle and robotis, and more.

“I want to thank you for your collaboration to make accelerated computing amazing and thank you for coming to GTC,” Huang said.

The post As AI Universe Keeps Expanding, NVIDIA CEO Lays Out Plan to Accelerate All of It appeared first on The Official NVIDIA Blog.

AI, Accelerated Computing Drive Shift to Personalized Healthcare

Genomics is finally poised to go mainstream, with help from deep learning and accelerated-computing technologies from NVIDIA.

Since the first human genome was sequenced in 2003, the cost of whole genome sequencing has steadily shrunk, far faster than suggested by Moore’s law. From sequencing the genomes of newborn babies to conducting national population genomics programs, the field is gaining momentum and getting more personal by the day.

Advances in sequencing technology have led to an explosion of genomic data. The total amount of sequence data is doubling every seven months. This breakneck pace could see genomics in 2025 surpass by 10x the amount of data generated by other big data sources such as astronomy, Twitter and YouTube — hitting the double-digit exabyte range.

New sequencing systems, like the DNBSEQ-T7 from BGI Group, the world’s largest genomics research group, are pushing the technology into broad use. The system generates a whopping 60 genomes per day, equaling 6 terabytes of data.

With advancements in BGI’s flow cell technology and acceleration by a pair of NVIDIA V100 Tensor Core GPUs, DNBSEQ-T7 sequencing is sped up 50x, making it the highest throughput genome sequencer to date.

As costs decline and sequencing times accelerate, more use cases emerge, such as the ability to sequence a newborn in intensive care where every minute counts.

Getting Past the Genome Analysis Bottleneck: GPU-Accelerated GATK

NVIDIA Parabricks GPU-accelerated GATK

The genomics community continues to extract new insights from DNA. Recent breakthroughs include single-cell sequencing to understand mutations at a cellular level, and liquid biopsies that detect and monitor cancer using blood for circulating DNA.

But genomic analysis has traditionally been a computational bottleneck in the sequencing pipeline — one that can be surmounted using GPU acceleration.

To deliver a roadmap of continuing GPU acceleration for key genomic analysis pipelines, the team at Parabricks — an Ann Arbor, Michigan-based developer of GPU software for genomics — is joining NVIDIA’s healthcare team, NVIDIA founder and CEO Jensen Huang shared today onstage at GTC China.

Teaming up with BGI, the Parabricks’ software can analyze a genome in under an hour. Using a server with eight NVIDIA T4 Tensor Core GPUs, BGI showed the throughput could lower the cost of genome sequencing to $2 — less than half the cost of existing systems.

See More, Do More with Smart Medical Devices

New medical devices are being invented across the healthcare industry. United Imaging Healthcare has introduced two industry-first medical devices. The uEXPLORER is the world’s first total body PET-CT scanner. Its pioneering ability to image an individual in one position enables it to carry out fast, continuous tracking of tracer distribution over the entire body.

A full body PET/CT image from uEXPLORER. Courtesy of United Imaging.

The total-body coverage of uEXPLORER can significantly shorten scan time. Scans as brief as 30 seconds provide good image quality, compared to traditional systems requiring over 20 minutes of scan time. uEXPLORER is also setting a new benchmark in tracer dose — imaging at about 1/50 of the regular dose, without compromising image quality.

The FDA-approved system uses 16 NVIDIA V100 Tensor Core GPUs and eight 56 GB/s InfiniBand network links from Mellanox to process movie-like scans that can acquire up to a terabyte of data. The system is already deployed in the U.S. at the University of California, Davis, where scientists helped design the system. It’s also the subject of an article in Nature, as well as videos watched by nearly half a million viewers on YouTube.

United’s other groundbreaking system, the uRT-Linac, is the first instrument to support a full radiation therapy suite, from detection to prevention.

With this instrument, a patient from a remote village can make the long trek to the nearest clinic just once to get diagnostic tests and treatment. The uRT-Linac combines CT imaging, AI processing to assist in treatment planning, and simulation with the radiation therapy delivery system. Using multi-modal technologies and AI, United has changed the nature of delivering cancer treatment.

Further afield, a growing number of smart medical devices are using AI for enhanced signal and image processing, workflow optimizations and data analysis.

And on the horizon are patient monitors that can sense when a patient is in danger and smart endoscopes that can guide surgeons during surgery. It’s no exaggeration to state that, in the future, every sensor in the hospital will have AI-infused capabilities.

Our recently announced NVIDIA Clara AGX developer kit helps address this trend. Clara AGX comprises hardware based on NVIDIA Xavier SoCs and Volta Tensor Core GPUs, along with a Clara AGX software development kit, to enable the proliferation of smart medical devices that make healthcare both smarter and more personal.

Apply for early access to Clara AGX.

The post AI, Accelerated Computing Drive Shift to Personalized Healthcare appeared first on The Official NVIDIA Blog.