Google at ACL 2019

This week, Florence, Italy hosts the 2019 Annual Meeting of the Association for Computational Linguistics (ACL 2019), the premier conference in the field of natural language understanding, covering a broad spectrum of research areas that are concerned with computational approaches to natural language.

As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2019, Google will be on hand to showcase the latest research on syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data.

If you’re attending ACL 2019, we hope that you’ll stop by the Google booth to meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Our researchers will also be on hand to demo the Natural Questions corpus, the Multilingual Universal Sentence Encoder and more. You can also learn more about the Google research being presented at ACL 2019 below (Google affiliations in blue).

Organizing Committee includes:
Enrique Alfonseca

Accepted Publications
A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
Chinnadhurai Sankar, Sandeep Subramanian, Chris Pal, Sarath Chandar, Yoshua Bengio

Generating Logical Forms from Graph Representations of Text and Entities
Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, Yasemin Altun

Extracting Symptoms and their Status from Clinical Conversations
Nan Du, Kai Chen, Anjuli Kannan, Linh Tran, Yuhui Chen, Izhak Shafran

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

Meaning to Form: Measuring Systematicity as Information
Tiago Pimentel, Arya D. McCarthy, Damian Blasi, Brian Roark, Ryan Cotterell

Matching the Blanks: Distributional Similarity for Relation Learning
Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, Ruslan Salakhutdinov

HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy, Shashi Narayan, Andreas Vlachos

Zero-Shot Entity Linking by Reading Entity Descriptions
Lajanugen Logeswaran, Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin, Honglak Lee

Robust Neural Machine Translation with Doubly Adversarial Inputs
Yong Cheng, Lu Jiang, Wolfgang Macherey

Natural Questions: a Benchmark for Question Answering Research
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov

Like a Baby: Visually Situated Neural Language Acquisition
Alexander Ororbia, Ankur Mali, Matthew Kelly, David Reitter

What Kind of Language Is Hard to Language-Model?
Sebastian J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner

How Multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette

Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William Cohen

BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le

Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation
Wei Wang, Isaac Caswell, Ciprian Chelba

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel

On the Robustness of Self-Attentive Models
Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B
Jiaming Luo, Yuan Cao, Regina Barzilay

How Large Are Lions? Inducing Distributions over Quantitative Attributes
Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Darsh Shah, Raghav Gupta, Amir Fayazi, Dilek Hakkani-Tur

Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee, Ming-Wei Chang, Kristina Toutanova

On-device Structured and Context Partitioned Projection Networks
Sujith Ravi, Zornitsa Kozareva

Incorporating Priors with Feature Attribution on Text Classification
Frederick Liu, Besim Avci

Informative Image Captioning with External Sources of Information
Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach
Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun

Synthetic QA Corpora Generation with Roundtrip Consistency
Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

Unsupervised Paraphrasing without Translation
Aurko Roy, David Grangier

Workshops
Widening NLP 2019
Organizers include: Diyi Yang

NLP for Conversational AI
Organizers include: Minh-Thang Luong, Tania Bedrax-Weiss

The Fourth Arabic Natural Language Processing Workshop
Organizers include: Imed Zitouni

The Third Workshop on Abusive Language Online
Organizers include: Zeerak Waseem

TyP-NLP, Typology for Polyglot NLP
Organizers include: Manaal Faruqui

Gender Bias in Natural Language Processing
Organizers include: Kellie Webster

Tutorials
Wikipedia as a Resource for Text Analysis and Retrieval
Organizer: Marius Pasca

Building, training, and deploying fastai models with Amazon SageMaker

Deep learning is changing the world. However, much of the foundation work, such as building containers, can slow you down. This post describes how you can build, train, and deploy fastai models into Amazon SageMaker training and hosting by using the Amazon SageMaker Python SDK and a PyTorch base image. This helps you avoid the extra steps of building your own container.

Amazon SageMaker is a fully managed machine learning (ML) service. It allows data scientists and developers to quickly build, train, and deploy ML models into production at a low cost. The Amazon SageMaker Python SDK is an open source library for training and hosting ML models. It is easy to use and compatible with popular deep learning frameworks such as TensorFlow, MXNet, PyTorch, and Chainer. AWS recently added the fastai library to the base PyTorch container, so you can use fastai in Amazon SageMaker without providing your own container.

Using modern best practices, the fastai library helps you create advanced deep learning models with just a few lines of code, in domains such as computer vision, natural language processing, structured data, and collaborative filtering.

The fast.ai organization develops and maintains the fastai library, which is built on the popular open source deep learning package PyTorch. fast.ai recently placed well in the DAWNBench competition, and it offers popular online courses that teach developers to use the library, even developers with no background or experience in ML.

Set up the environment

To set up a new Amazon SageMaker notebook instance with the fastai library installed, choose Launch Stack:

This AWS CloudFormation template provisions all the AWS resources that you need for this walkthrough. To create the stack, select I acknowledge that AWS CloudFormation might create IAM resources and choose Create stack.

When stack creation finishes successfully, AWS CloudFormation reports the stack’s status as CREATE_COMPLETE.

In the Amazon SageMaker console, choose Notebook instances.

You can see a notebook instance named fastai, created by the AWS CloudFormation stack. Choose Open Jupyter to launch a managed Jupyter environment, where you can write your own Python code to build the model.

For this post, we provide the IPython notebook under SageMaker Examples, Advanced Functionality, fastai_lesson1_sagemaker_example.ipynb. Create a copy of this notebook to build, train, and deploy the model into Amazon SageMaker.

For demonstration purposes, we use the Oxford-IIIT Pet dataset to identify dog and cat breeds.

Because the fastai library is built on PyTorch, you can use the Amazon SageMaker Python SDK to train your fastai model as a PyTorch model. Training a PyTorch model is a two-step process:

  1. Create a PyTorch training script.
  2. Create a training job in Amazon SageMaker.

Create a PyTorch training script

Define a _train() function where you write your training code. A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to model_dir to host later. For a more detailed example, see the pets.py file in the fastai_oxford_pets GitHub repo.

Here is the basic training code:

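import os
import re
from pathlib import Path

from fastai.vision import *  # fastai v1 API: ImageDataBunch, create_cnn, models, error_rate, ...


def _train(args):
    # Assumption: the 'training' channel (SM_CHANNEL_TRAINING) holds an images/ folder
    # of Oxford-IIIT Pet files named like "<breed>_<number>.jpg"
    path_img = Path(os.environ['SM_CHANNEL_TRAINING']) / 'images'
    fnames = get_image_files(path_img)
    pat = re.compile(r'/([^/]+)_\d+\.jpg$')  # the breed label is captured from the file name

    print('Creating DataBunch object')
    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(),
                                       size=args.image_size,
                                       bs=args.batch_size).normalize(imagenet_stats)

    # create the CNN model from a pretrained architecture in the model zoo
    print(f'Model architecture is {args.model_arch}')
    arch = getattr(models, args.model_arch)
    print('Creating pretrained conv net')
    learn = create_cnn(data, arch, metrics=error_rate)

    # train the new head with the pretrained layers frozen, then fine-tune all layers
    print('Fit one cycle of 4 epochs')
    learn.fit_one_cycle(4)
    learn.unfreeze()
    print('Unfreeze and fit one cycle of 2 epochs')
    learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
    print('Finished Training')
    return learn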
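ImageDataBunch.from_name_re takes each image's class label from its file path with the regular expression pat, using the first capture group. The plain re sketch below, assuming Oxford-IIIT Pet file names such as great_pyrenees_173.jpg and using a hypothetical label_from_path helper, shows how such a pattern recovers the breed. Checking a pattern this way is cheaper than discovering a mismatch inside a training job.

```python
import re

# Capture everything between the last "/" and the trailing "_<digits>.jpg"
PAT = re.compile(r'/([^/]+)_\d+\.jpg$')


def label_from_path(path: str) -> str:
    """Hypothetical helper: return the class label encoded in an image path."""
    match = PAT.search(path)
    if match is None:
        raise ValueError(f'No label found in {path!r}')
    return match.group(1)


print(label_from_path('data/images/great_pyrenees_173.jpg'))  # great_pyrenees
print(label_from_path('data/images/Abyssinian_1.jpg'))  # Abyssinian
```

Because [^/]+ is greedy, multi-word breed names that contain underscores stay intact; only the final _<digits>.jpg suffix is excluded.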

In this example, the training and inference code live in a single script. Make sure to put your training code in a main guard (if __name__ == '__main__':) so that it runs only when Amazon SageMaker executes the script for training, and not when the PyTorch model server imports the same script at hosting time to load the serving functions. You can also pass hyperparameters to your training script and retrieve them with an argparse.ArgumentParser, as the following snippet shows:

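import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('--workers', type=int, default=2, metavar='W',
                        help='number of data loading workers (default: 2)')
    parser.add_argument('--epochs', type=int, default=2, metavar='E',
                        help='number of total epochs to run (default: 2)')
    parser.add_argument('--batch_size', type=int, default=64, metavar='BS',
                        help='batch size (default: 64)')
    parser.add_argument('--lr', type=float, default=0.001, metavar='LR',
                        help='initial learning rate (default: 0.001)')
    # _train() also reads these two; their defaults are assumptions, not from the post
    parser.add_argument('--image_size', type=int, default=224,
                        help='input image size in pixels (default: 224)')
    parser.add_argument('--model_arch', type=str, default='resnet34',
                        help='model zoo architecture name (default: resnet34)')

    _train(parser.parse_args())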
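You set these values in the notebook through the PyTorch estimator's hyperparameters argument, and the SageMaker framework container hands them to the entry-point script as command-line arguments such as --batch_size 32, which is why argparse works here. The sketch below imitates that hand-off with a hypothetical to_cli_args helper (the container's own conversion code is not shown in this post), so you can check the parser locally before starting a job.

```python
import argparse


def to_cli_args(hyperparameters):
    """Hypothetical stand-in for the container: {'batch_size': 32} -> ['--batch_size', '32']."""
    args = []
    for name, value in hyperparameters.items():
        args += [f'--{name}', str(value)]
    return args


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=2)
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--lr', type=float, default=0.001)
    return parser


# Values you might pass as PyTorch(..., hyperparameters={...}) in the notebook
hyperparameters = {'epochs': 4, 'batch_size': 32}
args = build_parser().parse_args(to_cli_args(hyperparameters))
print(args.epochs, args.batch_size, args.lr)  # 4 32 0.001
```

Hyperparameters you do not pass keep their argparse defaults, as lr does here, so the script also runs unchanged outside SageMaker.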

Create a training job in Amazon SageMaker

You can train and host PyTorch models on Amazon SageMaker with the help of the PyTorch base image and PyTorch SageMaker Estimators, which handle end-to-end training and hosting.

The entry_point argument takes the training script that contains the _train() function, and the train_instance_type argument sets the instance type on which the model trains. If you set train_instance_type to 'local', the model trains in a Docker container on the notebook instance itself (local mode).

To use local mode, install Docker Compose, and also NVIDIA-Docker if you train on a GPU (with train_instance_type set to 'local_gpu'). To train on a managed Amazon SageMaker instance instead, pass a GPU-optimized instance type such as ml.p3.2xlarge or ml.p2.xlarge.

The training instance exists only for the duration of the training job: when training completes, Amazon SageMaker uploads the model artifact to Amazon S3 and terminates the instance.

The training script also contains the model definition and the serving functions needed to reconstruct the model at the endpoint. The code and the model do not have to be written or created at the same time, as long as their versions match.

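import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()  # IAM role attached to the notebook instance
# data_location: S3 URI of the uploaded Oxford-IIIT Pet data (set earlier in the notebook)

# SageMaker Python SDK v1 parameter names (v2 renamed them instance_count/instance_type)
pets_estimator = PyTorch(entry_point='source/pets.py',
                         base_job_name='fastai-pets',
                         role=role,
                         framework_version='1.0.0',
                         train_instance_count=1,
                         train_instance_type='ml.p3.2xlarge')

pets_estimator.fit(data_location)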

Deploy the model in Amazon SageMaker

Using the Amazon SageMaker Python SDK, create a PyTorchModel object from the trained PyTorch estimator. Calling deploy() on the model object creates an Amazon SageMaker endpoint that serves prediction requests in real time.

The Amazon SageMaker endpoint runs an Amazon SageMaker-provided PyTorch model server, which hosts the model artifact that your training script saved to model_dir when you called fit.

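from sagemaker.predictor import RealTimePredictor, json_deserializer
from sagemaker.pytorch import PyTorchModel

class ImagePredictor(RealTimePredictor):
    # Assumed predictor: sends raw JPEG bytes and parses a JSON response (SDK v1 API)
    def __init__(self, endpoint_name, sagemaker_session=None):
        super().__init__(endpoint_name, sagemaker_session=sagemaker_session, serializer=None,
                         deserializer=json_deserializer, content_type='image/jpeg')

pets_model = PyTorchModel(model_data=pets_estimator.model_data,
                          name=pets_estimator._current_job_name,
                          role=role,
                          framework_version=pets_estimator.framework_version,
                          entry_point=pets_estimator.entry_point,
                          predictor_cls=ImagePredictor)

pets_predictor = pets_model.deploy(initial_instance_count=1,
                                   instance_type='ml.c5.large')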

Create the inference script

To run inference, define four functions in your inference script. In this example, the same script serves for both training and inference, so you define these functions alongside the _train() training function:

  • model_fn(model_dir): loads the model into the Amazon SageMaker PyTorch model server.
  • input_fn(request_body, request_content_type): deserializes the request data into an object for prediction.
  • predict_fn(input_object, model): runs inference on the deserialized object with the loaded model.
  • output_fn(prediction, content_type): serializes the prediction according to the response content type.

For a more detailed example, see the pets.py file in the fastai_oxford_pets GitHub repo.
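As a rough illustration, the four handlers could look like the minimal sketch below. It assumes that the training step exported the fastai v1 learner with learn.export() into model_dir, that clients send JPEG bytes, and that responses are JSON; it is not the pets.py implementation. fastai is imported only inside the functions that need it, so the serialization logic can be exercised without fastai installed.

```python
import json
from io import BytesIO

JPEG_CONTENT_TYPE = 'image/jpeg'
JSON_CONTENT_TYPE = 'application/json'


def model_fn(model_dir):
    # Assumes learn.export() wrote fastai v1's export.pkl into model_dir
    from fastai.basic_train import load_learner
    return load_learner(model_dir)


def input_fn(request_body, request_content_type):
    # Deserialize raw JPEG bytes into a fastai Image
    if request_content_type != JPEG_CONTENT_TYPE:
        raise ValueError(f'Unsupported content type: {request_content_type}')
    from fastai.vision import open_image
    return open_image(BytesIO(request_body))


def predict_fn(input_object, model):
    # fastai v1 Learner.predict returns (predicted class, class index, probabilities)
    pred_class, pred_idx, probs = model.predict(input_object)
    return {'class': str(pred_class), 'confidence': float(probs[int(pred_idx)])}


def output_fn(prediction, content_type):
    # Serialize the prediction dict as JSON
    if content_type != JSON_CONTENT_TYPE:
        raise ValueError(f'Unsupported accept type: {content_type}')
    return json.dumps(prediction)
```

Rejecting unexpected content types with a ValueError keeps a misconfigured client from silently receiving a wrong result; the model server reports the error back to the caller.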

After the model deploys, run inference with the predict() method, which returns the result of inference against your model.

response = pets_predictor.predict(img_bytes)

Clean up

Make sure that you stop the notebook instance and delete the Amazon SageMaker endpoint to prevent any additional charges.

Conclusion

In this post, we showed you how to use Amazon SageMaker to train and host fastai models with the PyTorch base image and the Amazon SageMaker Python SDK. This approach removes the need to bring your own container, so you can build, train, and deploy a fastai model in Amazon SageMaker quickly and easily.

To get started, log in to the AWS Management Console and deploy the AWS CloudFormation template. If you are a new user, you need to create an account first. For detailed information on how to build the fastai model with Amazon SageMaker, see the accompanying GitHub repo.


About the authors:

Amit Mukherjee is a partner solutions architect with AWS. He provides architectural guidance to help partners achieve success in the cloud. He has interest in the deep learning space. In his spare time, he enjoys spending quality time with his family.

Matt McClean is a partner solutions architect for AWS. He is a specialist in machine learning and works with technology partners in the EMEA Region, providing guidance on developing solutions using AWS technologies. In his spare time, he is a passionate skier and cyclist.

Machine learning for all developers with edX and Amazon SageMaker

Customers often ask us how to get started when they do not have a deep data science and machine learning (ML) background. At AWS, our goal is to put ML in the hands of every developer and data scientist.

AWS Training and Certification has partnered with edX to help you get started with ML quickly and easily through our interactive course, Amazon SageMaker: Simplifying Machine Learning Application Development.

Available exclusively on edX, Amazon SageMaker: Simplifying Machine Learning Application Development is an intermediate-level digital course that provides a baseline understanding of ML and how applications can be built, trained, and deployed using Amazon SageMaker. Amazon SageMaker is a fully managed, modular service that covers the entire ML workflow. It helps you label and prepare your data; choose an algorithm; train the model; tune and optimize it for deployment; make predictions; and act.

This course was developed by AWS experts. It explains the following:

  • Key problems that ML can address and ultimately help solve
  • How to train a model using Amazon SageMaker’s built-in algorithms and a Jupyter Notebook instance
  • How to deploy a model using Amazon SageMaker
  • How to integrate the published SageMaker endpoint with an application

Amazon SageMaker: Simplifying Machine Learning Application Development is recommended for developers of all skill levels. If you want to take your models from concept to production quickly and easily, or are looking to level up in ML, this course is for you. Before beginning, we recommend that you have at least one year of software development experience. You should also have a basic understanding of AWS services and the AWS Management Console, either through previous experience or the AWS Developer Professional Series.

The course is divided into four weekly lessons, with an estimated two to four hours per week of study time. It features video-based lectures, demonstrations, and hands-on lab exercises. You can set your own pace and deadlines. Everyone can take the weekly quizzes, which are not graded and allow unlimited retries.

This on-demand, 100%-digital course is available now and is offered on a complimentary basis.

Get started today at Amazon SageMaker: Simplifying Machine Learning Application Development.


About the Author

Jennifer Davis is the Senior Manager, Product Marketing, for AWS Training and Certification.

Enabling healthcare access from home: Electronic Caregiver’s AWS-powered virtual caregiver  

When Electronic Caregiver’s founder and CEO, Anthony Dohrmann, started the company a decade ago, he was reacting to a difficult situation faced by 100 million Americans and countless individuals globally: the challenge of managing health treatment for chronic diseases. “Patients are often confused about their care instructions and non-adherence with care plans and medications schedules are estimated to cause 50% of all treatment failures,” he explains.

As such, Electronic Caregiver was designed “to improve the patient experience and to positively engage patients in their personal care plans. We improve communication between providers, families, and caregivers, and expedite a more informed response to the need of the aging and ill. We intend to reduce costly complications, improve health outcomes, and extend lifespans.”

Today, Electronic Caregiver’s solution revolves around Addison, a state-of-the-art, 3D-animated virtual caregiver. She can engage in two-way conversations and is programmed for each user’s personal needs. Similar to a human in-home caregiver, Addison monitors patients’ activity, reminds them to take medications, collects vitals, and conducts real-time health assessments, all from the safety and comfort of a patient’s home. Whereas a patient would otherwise need to make myriad doctor visits or pay an in-home caregiver, Addison brings health solutions to the user wherever they are.

To power that life-changing magic, Electronic Caregiver relies on AWS in multiple ways. For the computing power behind storing patient data in a HIPAA-compliant way, the team uses services including AWS Lambda functions. For the patient-facing experience, Electronic Caregiver has developed an augmented reality (AR) character named Addison using Amazon Sumerian. And for the intelligence behind that character, including collecting and analyzing data, AWS IoT Core, AWS IoT Greengrass, and Amazon SageMaker are key to the solution. The architecture that the Electronic Caregiver team has developed is shown in the following diagram.

Bryan Chasko, CTO at Electronic Caregiver, comments, “We saw an opportunity to use the latest sensing, artificial intelligence, and other cloud-based technologies to address unmet customer needs with a fuller-featured solution than traditional alert devices.”

Specifically, Electronic Caregiver provides patients with wearable gadgets (such as a wrist pendant) and monitoring devices (such as a contact-free thermometer and a glucose meter) that are connected to the cloud.

To connect its device fleets, Electronic Caregiver uses AWS IoT Core, which securely connects devices to the cloud over MQTT, a lightweight communication protocol designed to tolerate intermittent connections, minimize the code footprint on devices, and reduce network bandwidth requirements.

Whenever a user completes a health reading, such as checking their temperature or completing a physical therapy exercise, that activity generates data that the devices capture. That data can then be queried to check whether a measured value is in the expected range. If the reading is good, the patient will receive a contextually appropriate positive response from Addison.

For example, in cases where a patient is in physical therapy recovering from an injury, Electronic Caregiver monitors improvement to their range of motion. The built-in gamification rewards the patient with points and even sends gifts to their homes to celebrate improvements. This personalized support reinforces their ongoing commitment to their treatment plan and helps these patients execute their treatments properly; it is pivotal to their long-term recovery.

If a reading is atypical, Electronic Caregiver springs into action to get the patient back on track. From a technical perspective, AWS IoT Greengrass Machine Learning Inference pushes a machine learning model built in Amazon SageMaker directly to the edge device in the user’s home. The patient is asked specific questions to help assess the cause of the anomaly, and then they receive from their device a prediction of the likely reason(s) for this result as well as recommended solutions. These questions and solutions are voiced to the patient with Amazon Lex and Amazon Polly, as well as shared with the patient’s selected stakeholders (such as family members and doctors) so everyone on the individual’s care team is immediately aware.

With this set-up, it is as though the patient has a constant caregiver watching out for them: they receive the quality of care typical of a full-time facility such as a nursing home, but in their own house. As a further benefit, even individuals who are not co-located with the patient (such as family members on a different continent) can get real-time updates from across the world.

In addition to the connected wearables, Electronic Caregiver has developed a platform to track patients’ activity and ensure that they are conscious and mobile. If activity is not detected, Electronic Caregiver can summon emergency response, coming to the rescue quickly in the event of a fall or other lapse into unconsciousness.

A visual analytics monitoring system also enables personalized, monitored medication reminders: the system tracks motion, and machine learning models trained in Amazon SageMaker identify the pills. This means that Addison can pinpoint when a user has taken their medication and remind them if they’re late.

Amazon Lex also accepts verbal input from the user, so a patient can simply articulate that they’re taking a medication and the system logs it. This feature makes it feel almost like the caregiver is human. Just as someone would articulate to a housemate that they’d completed their medication routine, they can inform Addison.

“Only 3% of the US population can afford live caregiving,” Dohrmann notes. “We are bringing affordable, effective care alternatives to the world through Addison.”

About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.

Smoothing Out the Bumps: Researchers Aim to Solve Mystery of Turbulence 

Turns out turbulence isn’t just something to concern anxious fliers clasping onto their seats at 30,000 feet.

Apart from jiggling your plane around, turbulence also affects how cars drive, the stability of tall buildings and the amount of energy that can be produced by wind turbines.

While the experience of turbulence is all around us, the mathematics behind this bumpy phenomenon remains a mystery. So much so that a question about the equations that govern it, Navier-Stokes existence and smoothness, is one of the seven Millennium Prize Problems posed by the Clay Mathematics Institute. These problems challenge the field of mathematics to solve some of the “deepest, most difficult problems” of classical physics.

Understanding turbulence is of crucial importance for engineers around the world. And that’s just what a team from Imperial College London, headed by Peter Vincent, Reader and EPSRC Fellow, has set out to do using highly accurate flow simulations on GPU-accelerated supercomputers.

The Physics Behind Turbulence

Turbulent flows are chaotic, containing millions of small vortices — spinning regions of the flow — that interact in incredibly complicated ways.

When designing stable buildings and optimal vehicles, engineers can often ignore the smallest-scale chaotic motions and instead focus on averages of pressure and velocity.

But it turns out that even these average properties are extremely difficult to predict accurately since their behavior is linked to chaotic small-scale motions. This means engineers generally resort to using approximate models.
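A textbook way to see why averages are hard to predict is the Reynolds-averaged Navier-Stokes (RANS) form of the incompressible flow equations. This is standard fluid-dynamics background, not the Imperial College team's formulation. Splitting velocity and pressure into an average (overbar) and a fluctuation (prime), then averaging the momentum equation with density ρ and kinematic viscosity ν, gives:

```latex
u_i = \bar{u}_i + u_i', \qquad p = \bar{p} + p', \qquad \frac{\partial \bar{u}_i}{\partial x_i} = 0

\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j}
  = -\frac{1}{\rho} \frac{\partial \bar{p}}{\partial x_i}
  + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \partial x_j}
  - \frac{\partial \overline{u_i' u_j'}}{\partial x_j}
```

The last term, the Reynolds stress built from averaged products of velocity fluctuations, cannot be computed from the averages alone. The averaged equations are therefore not closed, and an approximate model has to supply that term in terms of the mean flow.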

To improve the accuracy of turbulent flow calculations, Vincent and his team ran thousands of turbulent flow simulations, each requiring billions of calculations to complete, over a period of 12 months. To power these, the team made use of two of Europe’s fastest supercomputers — Piz Daint at CSCS and Wilkes-2 from the University of Cambridge.

These NVIDIA GPU-accelerated systems enabled the team to identify for the first time so-called “eigenmode” solutions of averaged turbulent flow in a channel. This provides fundamental insights into the flow physics, which can be used to develop improved approximate models for use in industry.

“From these calculations, we’ve been able to shed new light on the physics that governs averaged properties of turbulent flow,” explained Vincent. “In particular, they show that the governing equations cannot possess certain symmetries, which are often assumed by existing models.”

With a deeper understanding of the physics behind turbulence, engineers can design the next generation of airplanes, wind turbines, submarines and many other objects to be more stable and secure.

“With the mainstream emergence of unsteady turbulence modeling for wind energy applications, the need for improved models is vital,” stated David Standingford, co-founder and director of Zenotech and an expert in mathematics and fluid dynamics. “The current work from Imperial College London addresses fundamental questions that will enable better industrial simulations in the future.”

Photo credit: Thomas Angus, Imperial College London

The post Smoothing Out the Bumps: Researchers Aim to Solve Mystery of Turbulence  appeared first on The Official NVIDIA Blog.

Speaking of AI: Startup Empowers Indian Language Speakers with Deep Learning

A flood of new smartphone users will come online in the next couple years — and many don’t speak or read a word of English, the internet’s most common language.

To make web adoption smoother for hundreds of millions of these new users, one Bangalore-based startup is building AI speech tools for 10 different languages spoken in India. India will have more than 600 million smartphone owners by 2020, but the country has just 125 million English speakers — most of whom speak it as a second language.

“While internet adoption is increasing in India, there’s still a gap in the market for users who don’t know how to read and write English,” said Ananth Nagaraj, co-founder of Gnani.ai, a member of the NVIDIA Inception program. “Even if something is written in their own language, it may not necessarily be easy for every user to read. We can empower those customers to interact with voice in their native language.”

India’s linguistic diversity presents a challenge for government agencies and private companies trying to communicate with the country’s 1.37 billion people. The country has 22 major languages and around 100 other languages that each have 10,000 or more speakers.

AI speech engine tools that process multiple languages can facilitate conversation by serving as a voice assistant, fielding customer service calls or conducting voice-based transactions.

Gnani.ai provides APIs and voice assistant solutions to e-commerce enterprises, insurance companies, banking and finance firms. Developed using cloud-based NVIDIA GPUs, its tools support languages spoken across the entire subcontinent: Indian English, Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Tamil and Telugu.

Now AI’s Speaking My Language 

Although the linguistic makeup of online content has shifted from 80 percent English in the 1990s to just over 25 percent English today, there’s still a dearth of user-friendly interfaces for Indian language speakers.

Even Indians who speak English as a second language often prefer to consume online content in their native language. But keyboards on computers and mobile devices largely default to the QWERTY keyboard layout, making it slower to type in Indian scripts like Devanagari, used for several languages including Hindi — which is spoken by half a billion people.

Local governments in India have to publish every communication in English and the official language of a given state. Gnani.ai’s voice-to-text tools could speed up this process by up to 4x, Nagaraj said.

The startup’s voice assistant software can integrate with a business’ mobile apps and websites, or be used as an interactive voicebot on customer service telephone lines.

Gnani.ai has collected more than 50,000 hours of annotated audio data to build its AI models. The startup develops its algorithms on NVIDIA V100 Tensor Core GPUs on Amazon Web Services, accelerating the training process up to 20x compared to using CPUs.

The company chose cloud-based GPUs because it was easier to spin up multiple clusters at once for large-scale training, Nagaraj said. Gnani.ai uses CUDA matrix libraries and NVIDIA’s automatic mixed precision (AMP) feature for TensorFlow, which is designed to speed up neural network training by up to 3x.

Starting the Conversation 

Nagaraj said the team believes that AI voice assistants can make the customer support experience more efficient and personalized. With multilingual bots, enterprises can provide personalized service experiences for customers with AI — and allow human agents to devote more time to complex queries from callers.

Bank clients incorporating Gnani.ai’s software could allow the automated system to help customers access their account statements or freeze a credit card, while passing more detailed processes on to staffers. The voice assistant could even reach out to insurance customers in their preferred language to coordinate policy payments, help elderly clients book taxis or provide farmers with pricing information for their crops.

Because Gnani.ai is a Bangalore-based company, Nagaraj said, “we have a significantly higher accuracy compared to some of the global providers because we understand the nuances of the languages and dialects of a diverse country. That helps us tune our AI algorithms to perform better for this market.”

Since its founding in 2016, Gnani.ai has piloted or deployed voice assistant solutions with more than 20 large enterprises in India. The company — which recently received funding from Samsung’s investment arm — plans to expand its call center automation AI tools to other countries, including the United States, in 2020.

The post Speaking of AI: Startup Empowers Indian Language Speakers with Deep Learning appeared first on The Official NVIDIA Blog.

Creating custom labeling jobs with AWS Lambda and Amazon SageMaker Ground Truth  

Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning. It offers easy access to public and private human labelers, and provides them with built-in workflows and interfaces for common labeling tasks. Ground Truth can lower your labeling costs by up to 70% using automatic labeling, which works by training a model on human-labeled data so that the service learns to label data independently.

In addition to built-in workflows, Ground Truth gives you the option to upload custom workflows. A custom workflow consists of an HTML interface that provides the human labelers with all of the instructions and required tools for completing the labeling tasks. You also create pre- and post-processing AWS Lambda functions:

  • The pre-processing Lambda function helps customize input to the HTML interface.
  • The post-processing Lambda function helps to process the data. For example, one of its primary uses is to host an accuracy improvement algorithm to tell Ground Truth how it should assess the quality of human-provided labels.

The accuracy improvement algorithm finds consensus on what is “right” when the same data is provided to multiple human labelers. It also identifies and de-emphasizes labelers who tend to provide poor-quality data. You can upload the HTML interface and the pre- and post-processing Lambda functions using the Amazon SageMaker console.
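To illustrate the consensus idea in general terms, the following Python sketch consolidates one label per data object with a weighted majority vote. It is not Ground Truth's actual algorithm, which this post does not describe. Each worker's vote counts in proportion to a reliability score, so labelers with a history of poor-quality answers carry less weight. The `worker_weights` values and the `default_weight` for unknown workers are hypothetical.

```python
from collections import defaultdict


def consolidate(votes, worker_weights, default_weight=0.5):
    """Pick a consensus label for one data object by weighted majority vote.

    votes: list of (worker_id, label) pairs for the same item.
    worker_weights: hypothetical per-worker reliability scores in [0, 1];
    low scores de-emphasize workers who tend to label poorly.
    Returns (label, confidence), where confidence is the winning label's
    share of the total vote weight.
    """
    scores = defaultdict(float)
    for worker_id, label in votes:
        scores[label] += worker_weights.get(worker_id, default_weight)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    label = max(scores, key=scores.get)
    return label, scores[label] / total
```

With weights of 0.2, 0.2, and 0.9, a single reliable worker who answers "Background" outvotes two unreliable workers who answer "Person". Real consolidation for instance segmentation must also compare the drawn masks, which this toy version ignores.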

To integrate successfully with the HTML interface, the pre- and post-processing Lambda functions must adhere to the input/output specifications laid out in the Creating Custom Labeling Workflows documentation. Getting all the moving pieces set up and talking to each other successfully may take a few iterations.
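The following minimal Python sketch shows the pre-processing contract. It assumes the request and response shape described in the Creating Custom Labeling Workflows documentation. The event carries one manifest line under `dataObject`. The function returns a `taskInput` object, which the HTML template reads as `task.input`, plus an `isHumanAnnotationRequired` flag. This is not the code that the recipe deploys, so check the field names against the current documentation before relying on them.

```python
def lambda_handler(event, context):
    """Pre-processing Lambda: turn one manifest line into template input.

    Assumes the event carries the manifest line under "dataObject", with the
    object's S3 URI in "source-ref" (or inline data in "source").
    """
    data_object = event["dataObject"]
    # Prefer the S3 reference; fall back to inline source data.
    task_object = data_object.get("source-ref") or data_object.get("source")
    return {
        # Exposed to the HTML template as task.input, so a placeholder such as
        # {{ task.input.taskObject | grant_read_access }} can resolve.
        "taskInput": {"taskObject": task_object},
        # Send every data object to human workers.
        "isHumanAnnotationRequired": "true",
    }
```

Because the handler only reshapes the event, you can exercise it locally with a hand-written event before you deploy it.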

In this post, I walk you through the process of setting up a custom workflow with a custom HTML template and sample Lambda functions. The sample Lambda functions can be found in the AWS Serverless Application Repository. These Lambda functions can be easily deployed to your AWS account and directly modified in the AWS Lambda Console. The source code is available in the aws-sagemaker-ground-truth-recipe GitHub repo.

For this post, you create a custom labeling job for instance segmentation. But first, deploy Lambda functions from the AWS Serverless Application Repository to your AWS account.

Import Lambda functions

On the Serverless Application Repository home page, select “Available applications” on the left-hand menu and search for Ground Truth. Choose aws-sagemaker-ground-truth-recipe.

On the application’s details page, choose Deploy. Make sure that your IAM user has permission to create IAM roles; otherwise, the deployment fails.

It may take a few minutes to deploy this application. Wait until you see the status screen, which shows that four AWS resources (two Lambda functions and two IAM roles) have been created.

Now, you have successfully imported the Lambda functions used in the labeling job into your account. To modify these Lambda functions, select them and tweak the Python code.

Create a custom labeling job

Assume that there are millions of images taken from cameras mounted in cars driving the public roadways. These images are stored in an Amazon S3 bucket location called s3://mybucket/datasets/streetscenes/. To start a labeling job for instance segmentation, you first create a manifest to be fed to Ground Truth.

The following code example shows the sample contents of a manifest file with a set of images. For more information, see Input Data.

{"source-ref": "S3 location for image 1"} 
{"source-ref": "S3 location for image 2"} 
   ... 
{"source-ref": "S3 location for image n"} 

Step 1: Download the example dataset

If you already have a manifest file for instance segmentation, skip this section.

For this example, I use the CBCL StreetScenes dataset. This dataset has over 3,000 images, but I use a selection of just 10. The full dataset is approximately 2 GB. You can upload all of the images to S3 for labeling, or just a selection of them.

  • Download the zip file and extract to a folder. By default, the folder is Output.
  • Create a small sample dataset with which to work:
    mkdir streetscenes
    cp Original/SSDB00010.JPG ./streetscenes/
    cp Original/SSDB00017.JPG ./streetscenes/
    cp Original/SSDB00019.JPG ./streetscenes/
    cp Original/SSDB00025.JPG ./streetscenes/
    cp Original/SSDB00038.JPG ./streetscenes/
    cp Original/SSDB00016.JPG ./streetscenes/
    cp Original/SSDB00018.JPG ./streetscenes/
    cp Original/SSDB00021.JPG ./streetscenes/
    cp Original/SSDB00029.JPG ./streetscenes/
    cp Original/SSDB00039.JPG ./streetscenes/
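If you prefer to script the selection, the following Python sketch copies the same ten images with `shutil`. Like the `cp` commands above, it assumes that it runs from the folder that contains `Original/`. The function name `make_sample` is illustrative.

```python
import shutil
from pathlib import Path

# IDs of the ten sample images used in this post.
SAMPLE_IDS = ["00010", "00017", "00019", "00025", "00038",
              "00016", "00018", "00021", "00029", "00039"]


def make_sample(src_dir="Original", dest_dir="streetscenes"):
    """Copy the sample images from src_dir into dest_dir; return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(exist_ok=True)  # unlike plain `mkdir`, tolerate an existing folder
    copied = []
    for sample_id in SAMPLE_IDS:
        src = Path(src_dir) / f"SSDB{sample_id}.JPG"
        # copy2 also preserves file timestamps; it returns the destination path.
        copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```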

In the S3 console, create the datasets/streetscenes/ folder in your bucket. S3 is a key-value store, so it has no real concept of folders. However, the S3 console gives you a sense of folder structure by treating forward slashes in the key as separators. Creating a folder in the console therefore creates that key prefix.
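The following Python sketch makes the key-value point concrete. It is a local simulation, not an S3 API call. It derives a console-style folder view purely from flat key names by grouping on the `/` delimiter, much like the common prefixes that S3 list operations return when you pass a delimiter. Skipping the empty key that ends at the prefix reflects an assumption about the placeholder object the console creates for a folder.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Emulate a folder listing over flat S3 object keys.

    Returns (subfolders, objects) directly under prefix. Keys that contain
    another delimiter after the prefix are rolled up into one subfolder.
    """
    subfolders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next "/" looks like a folder.
            subfolders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            objects.append(key)
        # An empty rest is the folder's own placeholder key; skip it.
    return sorted(subfolders), objects
```

For example, the keys `datasets/streetscenes/SSDB00010.JPG` and `datasets/streetscenes/SSDB00016.JPG` show up as a `datasets/` folder at the top level, even though no object called `datasets` needs to exist.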

Upload the sample images to your S3 bucket location, s3://mybucket/datasets/streetscenes/. You can use the S3 console or the following AWS CLI command. In this example output, the bucket is named gt-recipe-demo:

aws s3 sync streetscenes/ s3://gt-recipe-demo/datasets/streetscenes/
upload: streetscenes/SSDB00010.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00010.JPG
upload: streetscenes/SSDB00017.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00017.JPG
upload: streetscenes/SSDB00018.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00018.JPG
upload: streetscenes/SSDB00021.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00021.JPG
upload: streetscenes/SSDB00025.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00025.JPG
upload: streetscenes/SSDB00038.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00038.JPG
upload: streetscenes/SSDB00029.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00029.JPG
upload: streetscenes/SSDB00016.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00016.JPG
upload: streetscenes/SSDB00039.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00039.JPG
upload: streetscenes/SSDB00019.JPG to s3://gt-recipe-demo/datasets/streetscenes/SSDB00019.JPG

Step 2: Create an input manifest

If you already have a manifest file for instance segmentation, skip this section.

In the Amazon SageMaker console, start the process by creating a labeling job.

Under input dataset location, choose Create manifest file. This tool helps you create the manifest by crawling an S3 location containing raw data (images or text).

For images, the crawler takes an input s3Prefix and crawls all of the image files with extensions .jpg, .jpeg, and .png in that prefix. It then creates a manifest with each line as follows:

{"source-ref":"<s3-location-of-crawled-image>"}
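You can approximate the crawler's output locally. The following Python sketch takes a bucket name and a list of object keys, for example from listing the prefix with the AWS CLI or an SDK. It keeps the keys under the prefix that end in .jpg, .jpeg, or .png and emits one JSON line per image in the format shown above. Matching extensions case-insensitively is an assumption, based on the uppercase .JPG files in this example appearing in the generated manifest. The function name `build_manifest` is hypothetical.

```python
import json

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_manifest(bucket, keys, prefix):
    """Return manifest text with one {"source-ref": ...} JSON object per image."""
    lines = []
    for key in sorted(keys):
        # Extension match is case-insensitive (an assumption; see above).
        if key.startswith(prefix) and key.lower().endswith(IMAGE_EXTENSIONS):
            record = {"source-ref": f"s3://{bucket}/{key}"}
            # Compact separators reproduce the {"source-ref":"..."} layout.
            lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines) + "\n" if lines else ""
```

You can write the returned text to a .manifest file, upload it to S3, and point the labeling job at it instead of using the console tool.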

The Create manifest file link opens a modal window. Enter the S3 path to which you uploaded the image files, making sure to include the trailing slash, and choose Create. Creating the manifest takes a few seconds. When the process completes, choose Use this manifest.

In this example, the objects are images in S3, so you can use the crawling tool to create the initial manifest. Each JSON line contains a source-ref field pointing to the S3 URI of an image. The contents of the created manifest file should look like the following:

{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00010.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00016.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00017.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00018.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00019.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00021.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00025.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00029.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00038.JPG"}
{"source-ref":"s3://gt-recipe-demo/datasets/streetscenes/SSDB00039.JPG"}

Step 3: Create a custom labeling job

Configure the following job settings:

Select the custom task type and choose Next.

For Workers, choose Private. For more information about the different workforce options, see Managing Your Workforce.

There are a number of labeling UI templates that you can use for setting up your own custom workflows. In this case, use the instance segmentation UI. For Templates, choose Instance Segmentation.

Modify the HTML code to look like the following. The original template has three placeholders: src, header, and labels. I changed the header and labels fields. When tasks are created for workers using this template, Ground Truth provides the data that fills in the src placeholder.

<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-instance-segmentation
    name="annotatedResult"
    src="{{ task.input.taskObject | grant_read_access }}"
    header="Draw a polygon around all people in the image."
    labels="['Person']"
  >
    <full-instructions header="Segmentation Instructions">
      <ol>
          <li><strong>Read</strong> the task carefully and inspect the image.</li>
          <li><strong>Read</strong> the options and review the examples provided to understand more about the labels.</li>
          <li><strong>Choose</strong> the appropriate label that best suits the image.</li>
      </ol>
    </full-instructions>

    <short-instructions>
      <p>Use the tools to label all instances of the requested items in the image</p>
    </short-instructions>
  </crowd-instance-segmentation>
</crowd-form>

Next, in the pre- and post-processing task Lambda function fields, select the Lambda functions that you imported earlier.

Under Custom labeling task setup, choose Preview. Remember to allow pop-ups before attempting to preview the UI. If the page loads successfully without errors, you know that the pre-processing task Lambda function and the custom HTML template are working well together.

Step 4: Give execute permissions to the Amazon SageMaker role

In the previous step, while creating a Ground Truth labeling job, you created an IAM role. Ground Truth uses this IAM role to execute your labeling job. This role should trust the execution role of the post-processing Lambda function.
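It helps to look at what the post-processing function does to see why this trust is needed. The following Python sketch is based on the post-annotation request format in the Creating Custom Labeling Workflows documentation. Ground Truth passes an S3 URI (`payload.s3Uri`) that points to a JSON file containing every worker's answers. Reading that file may require assuming the SageMaker execution role whose ARN is passed in the event, which is presumably why that role must trust the Lambda execution role. Only the pure consolidation step is shown here; fetching the file from S3 is omitted. The sketch keeps all workers' answers under `annotationsFromAllWorkers`, matching the output manifest shown in the Labeling results section. The field names are assumptions to verify against the documentation, and `consolidate_annotations` is a hypothetical name.

```python
def consolidate_annotations(payload, label_attribute_name):
    """Build the post-processing response from the worker-answer payload.

    payload: parsed JSON list with one entry per data object, each holding
    "datasetObjectId", "dataObject", and "annotations" (worker answers).
    Returns the list that Ground Truth is assumed to expect back.
    """
    consolidated = []
    for item in payload:
        # Keep every worker's raw answer; no consensus is computed here.
        workers = [
            {
                "workerId": answer["workerId"],
                "annotationData": answer["annotationData"],
            }
            for answer in item["annotations"]
        ]
        consolidated.append({
            "datasetObjectId": item["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {
                    # Written to the output manifest under this attribute name.
                    label_attribute_name: {"annotationsFromAllWorkers": workers}
                }
            },
        })
    return consolidated
```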

In the Lambda console, select the post-processing Lambda function that you imported earlier. At the top of the page or under the Tags section, note the Amazon Resource Name (ARN). It should look like the following:

arn:aws:lambda:us-east-1:919226420625:function:serverlessrepo-aws-sagema-GtRecipeAnnotationConsol-xxxxxxx

Choose Execution role, Use an existing role, and view the role.

Copy the IAM role ARN.

In the IAM console, find the Amazon SageMaker execution role that you created. Choose Trust relationships, Edit trust relationship. Add the Lambda execution role ARN that you copied to the trust relationship. The following code example shows the contents of the trust relationship.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<your-aws-account>:role/serverlessrepo-aws-sagema-GtRecipeAnnotationConsol-xxxxx"
        ],
        "Service": [
          "lambda.amazonaws.com",
          "sagemaker.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
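You can sanity-check the edited trust policy locally before you continue. The following Python sketch is a hypothetical helper, not an AWS API. It parses a trust policy document and reports whether an Allow statement grants `sts:AssumeRole` to a given principal. It handles only the shapes used above, where `Action` and the principal entry can each be a string or a list. It ignores `Condition` blocks and wildcards.

```python
import json


def trusts(policy_json, principal, principal_type="AWS"):
    """Return True if an Allow statement grants sts:AssumeRole to principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        block = stmt.get("Principal", {})
        if not isinstance(block, dict):  # e.g. "Principal": "*" is not handled
            continue
        named = block.get(principal_type, [])
        if isinstance(named, str):
            named = [named]
        if principal in named:
            return True
    return False
```

After the edit, calling `trusts` with the saved policy text and the Lambda execution role ARN that you copied should return True. Calling it with `"sagemaker.amazonaws.com"` and `principal_type="Service"` should also return True.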

Choose Permissions, Attach policies. Select AWSLambdaFullAccess, and choose Attach Policy. After you attach the policy, it is listed on the role's Permissions tab.

Close the current tab.

Step 5: Submit the labeling job

In the main browser tab, which has your labeling job open, choose Submit. The labeling job is in progress. Wait for this job to complete.

If you have workers assigned to your private work team, instruct them to work on your tasks. If you have added yourself as a worker, complete the tasks in the private work team portal. For more information, see Managing a Private Workforce.

Labeling results

After the workers perform the labeling work, your output manifest looks like the following:

{"source-ref":"s3://gt-recipe-demo/dataset/streetscenes/SSDB00010.JPG","gt-label":{"annotationsFromAllWorkers":[{"workerId":"public.us-east-1.M52TVGWFFYHS34QM3EHF3NMTKY","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAANQTFRFAAAAp3o92gAAAAF0Uk5TAEDm2GYAAACsSURBVHic7cExAQAAAMKg9U/tbwagAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAN1veAAH/xLK9AAAAAElFTkSuQmCC"}}}"}},{"workerId":"public.us-east-1.JZUQXKHROY2DBQ2V2ZLEWEYCAE","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[{"color":"#2ca02c","label":"Person"}],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAAZQTFRFAAAALKAsCO/WQQAAAAJ0Uk5TAP9bkSK1AAABeklEQVR4nO3awWnEQAxAURsffHQJLsWlOaVtKVNCjjmEzGYbGMNqQTK8V8FHJyE0TQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC8ac8OuLL/ZBdcOPojO2FMYNQpMKh8YBcYJDBo7r1lNwyVD1xuEPid3TD0H1h731pvEPib3TC09f6X3TBUPnDvvWc3DAmMOqoHlp/gK/ArO2JEYJTAqO0OgY/siBGBUQKj1jsEtuyIEYFRAqNegaWPMwKjbhHYsiNGBEYJjBIYJTBqqR44Vw+cBEad1QOP6oF79cBNYNBaPbD800L5t49JYNRZPfCoHrhVD5yrB05nyy64sLXsggtLyy640rIDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPicJ/PI71EA2TUjAAAAAElFTkSuQmCC"}}}"}},{"workerId":"public.us-east-1.H7WTTAHGRYWYXEA7V7E7KOMOCE","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAANQTFRFAAAAp3o92gAAAAF0Uk5TAEDm2GYAAACsSURBVHic7cExAQAAAMKg9U/tbwagAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAN1veAAH/xLK9AAAAAElFTkSuQmCC"}}}"}}]},"gt-label-17-metadata":{"type":"groundtruth/custom","job-name":"gt-label-17","human-annotated":"yes","creation-date":"2019-04-01T05:21:34+0000"}}
{"source-ref":"s3://gt-recipe-demo/dataset/streetscenes/SSDB00016.JPG","gt-label":{"annotationsFromAllWorkers":[{"workerId":"public.us-east-1.H7WTTAHGRYWYXEA7V7E7KOMOCE","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[{"color":"#2ca02c","label":"Person"},{"color":"#1f77b4","label":"Person"}],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAgMAAAAXsjzqAAAAAXNSR0IB2cksfwAAAAlQTFRFAAAALKAsH3e00sbfdAAAAAN0Uk5TAP//RFDWIQAABI1JREFUeJzt3E2O4kgUhVGwhNTyqCdswqtgCTUo74elMGx5lV1JklWGWXP10u2IcxYQL/QJ/2CEDwcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaMhpnufL1pvYMwFDHwF/bL2JPRMw9BHw59ab2DMBQx8B5603sWcChu4BL1vvYscEDN0Duo95n4Che8CG72PG6gGtB5yuxQPuAdu9jxmWW/GExgOOyz/FEz4DXoqnbGZaluIJjQdcluVaO+EzYKv3McOvgLfaEZ8BWz0J/jqCq0+Cj4CX2ilbWT7UjngEbPNOcLgHvJbOeARs8yM43gPeSmd8BWzyMjLdA9aeBL8CtngZ+TyCi0+C57ndj+D4CHgtnHH86tfiZWR6BLwVzvh9BLd4GfmOgLOAkWPTAZf6gCcBM7OAkWPTAYf6gOc+AtZ9FZkFjBzbDjiWBzwLmJkFjJwaDzhVBzwLmOkmYNUDwdYDLgJmBAwJmBkEzNQHnAXMNB5wFDCzCnitmSBgqPGAk4AZAUMChpbqgEcBMx0FvJUMaDzgIGBGwJCAoVHATH3Ak4AZAUMdBaz5YbjxgIOAIQFDAoYmATPlAc+NBxwFzKzvY0oGtB5w8AkMVX8T6SjgtWT95gNOtUdwRwFvNes3H3CsPYJffxZuL+Dvy3DR+t0ErPqXQ/MBD7WnwH4CXouWbz/gVHoK7CbgN/1Xs8WAY+kR/PqzcIMBh9IjuJeA3/R//yYDHipvYroJeK1avIeAU+EpsIuAY+U7YzoJeCtbvIeAQ+XbK19/Fm4zYOELaHsIePiuVwe2G/Bat3YXAafCtbsI+Ffh2l0ErCRgSMDQWcCMgCEBQwKGBAwJGBIw9NpPwP/q9SN42XpDu9P6a5DrNf4q+HoChgQMCRgSMCRgSMCQgCEBQwKGBAwJGBIwJGBIwJCAIQFDAoYEDAkYEjAkYOg54Na72SEBQwKGBAwJGBIw9BTw59a72SEBQwKGBAydBMw8Bfyx9W52SMCQgCEBQwKGngJett7NDgkYEjAkYEjAkIChp4Bbb2aPBAwJGBIwJGBoHdDz1DcIGBIwJGBIwNA6oOepbxAwJGBIwJCAIQFD64CXrTezRwKGBAwJGBIwtA649V526SxgRsCQgCEBQ6uAnqe+Q8CQgCEBQwKGBAytAnqe+g4BQwKGBAwJGBIwtAp42XovuyRgSMCQgCEBQ6uAW29lnwQMCRgSMCRgSMDQn4AeSL9FwJCAIQFDAob+BPQ89S0ChmYBMwKGBIyd9Usd9Yv9vfUGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAID/t38Buw1HOPtmWvcAAAAASUVORK5CYIIu003d"}}}"}},{"workerId":"public.us-east-1.M52TVGWFFYHS34QM3EHF3NMTKY","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAANQTFRFAAAAp3o92gAAAAF0Uk5TAEDm2GYAAACsSURBVHic7cExAQAAAMKg9U/tbwagAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAN1veAAH/xLK9AAAAAElFTkSuQmCC"}}}"}},{"workerId":"public.us-east-1.QUERNWRJ6HNNO5D6XDUVPMWTMA","annotationData":{"content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[{"color":"#2ca02c","label":"Person"},{"color":"#1f77b4","label":"Person"}],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAgMAAAAXsjzqAAAAAXNSR0IB2cksfwAAAAlQTFRFAAAALKAsH3e00sbfdAAAAAN0Uk5TAP//RFDWIQAABJZJREFUeJzt3EFuIjsUhtFUJKQW89oEq2AJb/BqPyyFYYtVdpMOCcms+9etUl2fswDb+oRlMAUvLwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEB3y3nrFezbImBiWgRM3PsJmBAwJGDoHvC/rRexZwKGBAwJGBIwJGBIwNA94P9bL2LPBAwJGBIwNAuYETAkYOgecNl6EXvWPuDrtXb89gGP19rx2we8/awdv3vA4+1WO0H3gDcBI6+/A15KZ5h7fy13+h3wWjpD64D31191wKVxwONbv+JjWMDMJGDmIGCmdcDTCgHfDuGuX8utEXAZIGDpR5E/AZt+FFkh4PQe8Fw4x3ZWCHh4D9jzUv+2XsCee3iFgPMjYMtjZIWAj3499/Aj4KVuio+AHY+R1/qA02fAhnt4hYCHz4ANj5GVA57LZtnKCgHnp4D9jpEVAi5L55fg8RHwWjbFl4DtjpFTecCp9yuwPuChd8CbgJHX+oCzgJmldcCPQ7juTr93wFN5wEnAzKF3wJuAkdf6gLOAmaV1wM9DuOxOv3fAU/krcBIw8+0M6XYlfRMwUx9w7h2wfgsLGGoe8Chgpv6NdPOA9Z+Fxwl4rZmge8CTgJmjgBkBQ+VfKnUPWP5oR/uAL+sG7Pd01scmvtYM3z/gYxNfakYfIOCLgKE/m7ho8BECvm3ilb7VFPBvDRPwWjT2MkrAS9HYAoaGCVg19igBV3q8UsC/NkrAa9XY3wK2+53Nm2PhD70EDI0SsGzsQQLW/W+RgKFBAl7Lxh4k4KVsbAFDgwSsG1vA0BgBC/99UcDQGAGvdWN/C3ium2l
DhQGb/1Dp3fFSNrSAoUEC1g0tYGiMgD/qhhYwNEbAQgKGBAwJGBIwJGDoe8Ct17M7AoYEDAkYEjAkYEjAkICh5n+dVU/AkC0cEjAlYGj+0q/nz0RKCRg6CJiZBMwImBIwNAuYETD05Rju+YRvrUnAjIAhAUMChgQMCRgSMCRg6EvA89ar2SEBUwKGBAwJGBIwJGBIwJCAoeeAW69llwQMCRgSMCRgSMCQgCEBQ0/9PNnxLwQMCRgSMCRgSMCQgCEBQ08BPZjwLwQMCRgSMCRg6Cngeeu17JKAIQFDAoaeAm69lH0SMCRgSMCQy5iQgCEBQwKGBAy5jAkJGBIwJGDIZUxIwJCAIQFDLmNCAoYEDAkYEjDkMiYkYEjAkIAhlzEhAUMChgQMCRhymxUSMCRgSMCQgCEBQwKGBAwJGBIwJGBIwNBHwK0XslcChgQMCRg6CBiaBcxMAoYOAoYEDB0EDM0CAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAjOIXhsjo6mjj6oAAAAAASUVORK5CYIIu003d"}}}"}}]},"gt-label-17-metadata":{"type":"groundtruth/custom","job-name":"gt-label-17","human-annotated":"yes","creation-date":"2019-04-01T05:21:34+0000"}}

Expand one JSON line to view the annotations. In the first line, three workers annotated the image. The fields are as follows:

  • source-ref: The location of the image.
  • workerId: The ID of the worker who produced the annotationData that follows. In this case, you can see three workerIds, which means three workers annotated this image.
  • annotationData: The annotation result.
  • gt-label-17-metadata: The metadata associated with the labeling job of which this image was a part.
{  
   "source-ref":"s3://gt-recipe-demo/dataset/streetscenes/SSDB00010.JPG",
   "gt-label-17":{  
      "annotationsFromAllWorkers":[  
         {  
            "workerId":"public.us-east-1.M52TVGWFFYHS34QM3EHF3NMTKY",
            "annotationData":{  
               "content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAANQTFRFAAAAp3o92gAAAAF0Uk5TAEDm2GYAAACsSURBVHic7cExAQAAAMKg9U/tbwagAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAN1veAAH/xLK9AAAAAElFTkSuQmCC"}}}"
            }
         },
         {  
            "workerId":"public.us-east-1.JZUQXKHROY2DBQ2V2ZLEWEYCAE",
            "annotationData":{  
               "content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[{"color":"#2ca02c","label":"Person"}],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAAZQTFRFAAAALKAsCO/WQQAAAAJ0Uk5TAP9bkSK1AAABeklEQVR4nO3awWnEQAxAURsffHQJLsWlOaVtKVNCjjmEzGYbGMNqQTK8V8FHJyE0TQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC8ac8OuLL/ZBdcOPojO2FMYNQpMKh8YBcYJDBo7r1lNwyVD1xuEPid3TD0H1h731pvEPib3TC09f6X3TBUPnDvvWc3DAmMOqoHlp/gK/ArO2JEYJTAqO0OgY/siBGBUQKj1jsEtuyIEYFRAqNegaWPMwKjbhHYsiNGBEYJjBIYJTBqqR44Vw+cBEad1QOP6oF79cBNYNBaPbD800L5t49JYNRZPfCoHrhVD5yrB05nyy64sLXsggtLyy640rIDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPicJ/PI71EA2TUjAAAAAElFTkSuQmCC"}}}"
            }
         },
         {  
            "workerId":"public.us-east-1.H7WTTAHGRYWYXEA7V7E7KOMOCE",
            "annotationData":{  
               "content":"{"annotatedResult":{"inputImageProperties":{"height":960,"width":1280},"instances":[],"labeledImage":{"pngImageData":"iVBORw0KGgoAAAANSUhEUgAABQAAAAPAAQMAAABQEkY6AAAAAXNSR0IB2cksfwAAAANQTFRFAAAAp3o92gAAAAF0Uk5TAEDm2GYAAACsSURBVHic7cExAQAAAMKg9U/tbwagAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAN1veAAH/xLK9AAAAAElFTkSuQmCC"}}}"
            }
         }
      ]
   },
   "gt-label-17-metadata":{  
      "type":"groundtruth/custom",
      "job-name":"gt-label",
      "human-annotated":"yes",
      "creation-date":"2019-04-01T05:21:34+0000"
   }
}
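To work with the results programmatically, parse each output manifest line and then decode each worker's `content` value. That value is itself a JSON-encoded string, so in a real manifest its inner quotes are escaped; the listing above shows them unescaped. The following Python sketch counts how many instances each worker drew for one image. The label attribute name (`gt-label` here) is an assumption, so use the one from your own job.

```python
import json


def instances_per_worker(manifest_line, label_attribute="gt-label"):
    """Map each workerId to the number of instances they drew for one image."""
    record = json.loads(manifest_line)
    counts = {}
    for answer in record[label_attribute]["annotationsFromAllWorkers"]:
        # "content" is a JSON string; decode it to reach annotatedResult.
        content = json.loads(answer["annotationData"]["content"])
        counts[answer["workerId"]] = len(content["annotatedResult"]["instances"])
    return counts
```

Judging by the instances arrays above, the first image would show one worker with a single Person instance and two workers with none. That kind of disagreement is what a consolidation step has to resolve.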

Cleanup

In order to avoid incurring future charges:

  1. Make sure that your labeling job is marked as “Complete,” “Stopped,” or “Failed” in the Amazon SageMaker console.
  2. Delete the corresponding S3 bucket “mybucket” in Amazon S3.
  3. Delete the “serverlessrepo-aws-sagemaker-ground-truth-recipe” stack from the AWS CloudFormation console.

Conclusion

In this post, I started by deploying the pre- and post-processing Lambda functions from the Ground Truth recipe application in the AWS Serverless Application Repository. I then created a custom labeling job and configured it to use the imported Lambda functions.

These sample Lambda functions help you get a custom labeling job running quickly. You can extend or modify them with your own logic in the AWS Lambda console.

Visit your AWS Management Console to get started!


About the Authors

Anjan Dash is a Software Development Engineer in AWS AI, where he builds large-scale distributed systems to solve complex machine learning problems. He is primarily focused on innovating technologies that can ‘Divide and Conquer’ Big Data problems. In his spare time, he loves spending time with family in outdoor activities.

Revekka Kostoeva is a Software Development Engineer intern at Amazon AI, where she works on customer-facing and internal solutions to expand the breadth of SageMaker Ground Truth services. As a researcher, she is driven to improve the tools of the trade to drive innovation forward.