

AI in the Sky Aids Feet on the Ground Spotting Human Rights Violations

In a traditional human rights investigation, researchers travel to a region, conduct interviews, visit crime scenes, examine court records, and collect hospital or autopsy records.

While that painstaking approach still constitutes a major part of Human Rights Watch’s work, the U.S.-based nonprofit is also exploring new technological methods — including AI — for its investigations, said Fred Abrahams, an associate director.

“It would be irresponsible of us not to do that,” Abrahams said in a talk attended by more than 100 people at last month’s GPU Technology Conference. “We must explore every opportunity we can to get the goods to report on these human rights violations.”

These new tools include remote sensing via satellite and drone data, analytics from public datasets, and investigations using videos and photos posted to social media. Remote sensing is essential in situations where researchers can’t access a conflict zone or closed country — a major issue for the human rights and humanitarian community.

“We can’t document it if we can’t get there,” said Josh Lyons, director of geospatial analysis at Human Rights Watch. “If the people are in hiding or they’re dead, there’s no way to document that case.”

To push this work forward, the nonprofit is partnering with Element AI, a global AI software provider cofounded in 2016 by deep learning pioneer Yoshua Bengio. The company has a team in London focused on building AI for social good.

In addition to using NVIDIA GPUs in Element AI’s data center, Human Rights Watch is using two NVIDIA DGX Stations, provided by NVIDIA in 2018, to further its efforts.

“The hardware will allow us to make it work,” Abrahams said.

Where There’s Smoke

There are hundreds of satellites orbiting and observing the Earth. Aerial imagery can show geographic features, human settlements and forces like flood and fire. Comparing how a region looks at one moment in time with how it looks at another can be critical for human rights investigations, but the influx of data is too vast for any individual to go through.

At GTC, Lyons shared how Human Rights Watch was able to use thermal data from environmental satellites to begin monitoring the outbreak of ethnic violence in Myanmar in 2017, just hours after the first reports of conflict. By combining this data with aerial images, the organization was able to detect a pattern of burned Rohingya villages across the region.

This digital evidence helped on-the-ground researchers corroborate the testimony of the Muslim minority community targeted by the authorities. By pinpointing the exact date and time that a village began burning, investigators could better quantify the scale of violence and begin to determine who the perpetrators were.

But it takes an expert eye — or a neural network — to tell the difference between smoke plumes and puffy white clouds.

“Most of the time, it’s my eyes that are doing the analysis,” Lyons said. “The DGX immediately gives us the ability to scale.”

A deployed deep learning model that analyzes satellite or social media data could one day identify potential human rights abuses automatically from text and images and alert Human Rights Watch and humanitarian agencies.

However, although the proliferation of satellites and social media has produced a massive amount of new data for human rights investigators to parse, there’s still little labeled data to train neural networks. Looking at a satellite image of a smoke plume, “I know it’s a crime,” Lyons said. “But how do I tell the computer it’s a crime?”

That’s where Element AI’s expertise in deep learning can help. “By essentially cloning Josh’s visual cortex, we can have a huge impact,” said Julien Cornebise, director of research at Element AI.

Cornebise and his team have also worked with Amnesty International on two projects: one to build neural networks to detect burned villages in Sudan, and another to parse Twitter data to study online abuse against women.

Putting AI to Good Use

Human Rights Watch has been using the DGX Stations for photogrammetry, or converting 2D footage into 3D models, based on data collected from the nonprofit’s drones. The team is also developing and testing deep learning models to parse aerial imagery and social media data.

“We’re data rich and drowning in potential applications,” Lyons said. “The simple challenge is to prioritize.”

Potential uses include AI tools for processing archival footage dating back nearly 50 years, or making handwritten notes from Human Rights Watch investigators easier to translate or search.

These archives, particularly researchers’ notebooks, are “more or less locked in hard copy, paper form,” Lyons said. “Having such a system in place would be quite useful. It would give immeasurable value to future investigations.”

Having powerful deep learning systems onsite is also critical for Human Rights Watch to build AI tools that analyze sensitive datasets. For certain data, such as forensic photographs or personal information, the organization is often not authorized to share the information with third parties, or to host it on a remote server that falls under a specific geographic area or legal jurisdiction.

Lyons said, “The DGX Station hits that perfect sweet spot of being able to do large, robust data analysis in-house with sensitive data in a way that meets all of our legal and ethical privacy concerns.”

The above satellite image may look like clouds over a coastal community. However, an expert eye, or AI, can tell that the image shows smoke — revealing building fires in five villages in Myanmar’s Maungdaw township on the morning of September 15, 2017. Image courtesy of Human Rights Watch and Planet Labs Inc.

The post AI in the Sky Aids Feet on the Ground Spotting Human Rights Violations appeared first on The Official NVIDIA Blog.

Amazon Comprehend now supports KMS encryption

Amazon Comprehend is a fully managed natural language processing (NLP) service that enables text analytics for important workloads, such as analyzing market research reports for key market indicators or processing data that contains personally identifiable information (PII). Customers who work with highly sensitive, encrypted data can now easily enable Comprehend to work with this encrypted data through an integration with the AWS Key Management Service (AWS KMS).

AWS KMS makes it easy for you to create and manage keys and control the use of encryption across a wide range of AWS services and in your applications. AWS KMS is a secure and resilient service that uses FIPS 140-2 validated hardware security modules to protect your keys. AWS KMS is integrated with AWS CloudTrail to provide you with logs of all key usage to help meet your regulatory and compliance needs.

You can configure Comprehend to use KMS keys to access data through the AWS Management Console or the SDK. The feature supports Amazon Comprehend asynchronous training and inference jobs. To get started, you first need to create a key in AWS KMS. To learn more about how to create KMS keys, visit: https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html

When you configure an asynchronous job, you can specify the KMS encryption key that Comprehend should use to access your data in Amazon S3. In the Amazon Comprehend console, for example, you can select a key with the alias “Comprehend” as part of configuring the job details.
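If you start jobs from the SDK rather than the console, the key goes into the job request. The following is a minimal sketch, not taken from the Comprehend documentation, that assembles keyword arguments for boto3's `start_entities_detection_job`: `OutputDataConfig["KmsKeyId"]` is the key used to encrypt the results written to S3, and `VolumeKmsKeyId` is the key used to encrypt the storage volume of the instance that processes the job. The helper name, S3 paths, role ARN, and key ARN are hypothetical placeholders, and the sketch assumes a key ARN, which both fields accept.

```python
from typing import Any, Dict


def build_entities_job_request(
    job_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
    data_access_role_arn: str,
    kms_key_arn: str,
    language_code: str = "en",
) -> Dict[str, Any]:
    """Assemble keyword arguments for an asynchronous entities detection job.

    Hypothetical helper: the same key ARN is reused for both encryption settings.
    """
    return {
        "JobName": job_name,
        "LanguageCode": language_code,
        # IAM role that Comprehend assumes to read the input and write the output.
        "DataAccessRoleArn": data_access_role_arn,
        # Where Comprehend reads the documents; one document per line of each file.
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        # KMS key used to encrypt the results written to S3.
        "OutputDataConfig": {"S3Uri": output_s3_uri, "KmsKeyId": kms_key_arn},
        # KMS key used to encrypt the storage volume attached to the processing instance.
        "VolumeKmsKeyId": kms_key_arn,
    }
```

Calling `boto3.client("comprehend").start_entities_detection_job(**params)` with the returned dictionary would submit the job. The role in `DataAccessRoleArn` would also need permission to use the key; check the Comprehend documentation for the exact KMS actions it requires.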

To manage your AWS KMS keys, use the AWS KMS management portal or the KMS SDK. For more information, see AWS Key Management Service. To learn more about how to configure Comprehend jobs to work with KMS keys, see the Amazon Comprehend documentation.


About the author

Nino Bice is a Sr. Product Manager leading product for Amazon Comprehend, AWS’s natural language processing service.


Capturing Special Video Moments with Google Photos

Recording video of memorable moments to share with friends and loved ones has become commonplace. But as anyone with a sizable video library can tell you, it’s a time-consuming task to go through all that raw footage searching for the perfect clips to relive or share. Google Photos makes this easier by automatically finding magical moments in your videos, like when your child blows out the candles or when your friend jumps into a pool, and creating animations from them that you can easily share with friends and family.

In “Rethinking the Faster R-CNN Architecture for Temporal Action Localization”, we address some of the challenges of automating this task, which stem from the complexity of identifying and categorizing actions in highly variable input data, by introducing an improved method to identify the exact location within a video where a given action occurs. Our temporal action localization network (TALNet) draws inspiration from advances in region-based object detection methods such as the Faster R-CNN network. TALNet can identify moments that vary widely in duration and achieves state-of-the-art performance compared to other methods, allowing Google Photos to recommend the best part of a video for you to share with friends and family.

An example of the detected action “blowing out candles”

Identifying Actions for Model Training
The first step in identifying magic moments in videos is to assemble a list of actions that people might wish to highlight. Some examples of actions include “blow out birthday candles”, “strike (bowling)”, “cat wags tail”, etc. We then crowdsourced the annotation of segments within a collection of public videos where these specific actions occurred, in order to create a large training dataset. We asked the raters to find and label all moments, accommodating videos that might have several moments. This final annotated dataset was then used to train our model so that it could identify the desired actions in new, unknown videos.

Comparison to Object Detection
The challenge of recognizing these actions belongs to the field of computer vision known as temporal action localization, which, like the more familiar object detection, falls under the umbrella of visual detection problems. Given a long, untrimmed video as input, temporal action localization aims to identify the start and end times, as well as the action label (like “blowing out candles”), for each action instance in the full video. While object detection aims to produce spatial bounding boxes around an object in a 2D image, temporal action localization aims to produce temporal segments including an action in a 1D sequence of video frames.
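To make the 2D/1D analogy concrete, here is a minimal sketch (not TALNet code) of intersection over union, a standard way to measure how well a predicted region matches a labeled one in detection problems. The same idea applies to a box in an image and to a segment on a timeline; the tuple layouts are assumptions chosen for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Segment = Tuple[float, float]            # (start_time, end_time) in seconds


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def segment_iou(a: Segment, b: Segment) -> float:
    """The 1D analogue: shared time divided by the total time covered by either segment."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a predicted segment from 2 s to 6 s and a labeled segment from 4 s to 8 s share 2 s out of the 6 s covered by either, so `segment_iou((2.0, 6.0), (4.0, 8.0))` is 1/3. Only one dimension changes between the two functions, which is the sense in which temporal localization is the 1D counterpart of object detection.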

Our approach to TALNet is inspired by the Faster R-CNN object detection framework for 2D images, so to understand TALNet, it is useful to first understand Faster R-CNN. The figure below demonstrates how the Faster R-CNN architecture is used for object detection. The first step is to generate a set of object proposals: regions of the image that can be used for classification. To do this, an input image is first converted into a 2D feature map by a convolutional neural network (CNN). The region proposal network then generates bounding boxes around candidate objects. These boxes are generated at multiple scales in order to capture the large variability in object sizes in natural images. With the object proposals defined, the contents of the bounding boxes are then classified by a deep neural network (DNN) into specific object categories, such as “person”, “bike”, etc.

Faster R-CNN architecture for object detection

Temporal Action Localization
Temporal action localization is accomplished in a fashion similar to that used by Faster R-CNN. A sequence of input frames from a video is first converted into a sequence of 1D feature maps that encode scene context. These feature maps are passed to a segment proposal network that generates candidate segments, each defined by start and end times. A DNN then applies the representations learned from the training dataset to classify the actions in the proposed video segments (e.g., “slam dunk”, “pass”, etc.). Each identified action is scored according to the learned representations, and the top-scoring moment is selected to share with the user.

Architecture for temporal action localization
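The proposal and selection steps can be sketched in a few lines. The code below lays out anchor segments of several lengths at regular time steps, the 1D counterpart of the multi-scale boxes in Faster R-CNN, and then picks the top-scoring segment. It is a simplification under stated assumptions, not TALNet's implementation: a real segment proposal network also scores each anchor and refines its boundaries from the feature maps, whereas here the scores are simply given, and the stride and scale values are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def make_anchor_segments(video_len: float, stride: float, scales: List[float]) -> List[Segment]:
    """Place anchors of every length in `scales`, centered every `stride` seconds.

    Anchors that would extend past either end of the video are clipped to it.
    """
    anchors: List[Segment] = []
    center = stride / 2
    while center < video_len:
        for length in scales:
            start = max(0.0, center - length / 2)
            end = min(video_len, center + length / 2)
            anchors.append((start, end))
        center += stride
    return anchors


def top_moment(segments: List[Segment], scores: List[float]) -> Tuple[Segment, float]:
    """Return the highest-scoring segment together with its score."""
    best = max(range(len(segments)), key=lambda i: scores[i])
    return segments[best], scores[best]
```

For a 4-second clip, `make_anchor_segments(4.0, 2.0, [1.0, 4.0])` returns `[(0.5, 1.5), (0.0, 3.0), (2.5, 3.5), (1.0, 4.0)]`: a short and a long anchor at each of the two centers, so both brief and extended actions have a candidate that roughly fits them.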

Special Considerations for Temporal Action Localization
While temporal action localization can be viewed as the 1D counterpart of the object detection problem, care must be taken to address a number of issues unique to action localization. In particular, we identify three issues that arise when applying the Faster R-CNN approach to the action localization domain, and we redesign the architecture to address them.

  1. Actions have much larger variations in durations
    The temporal extent of actions varies dramatically, from a fraction of a second to minutes. For long actions, it is not important to understand each and every frame of the action. Instead, we can get a better handle on the action by skimming quickly through the video using dilated temporal convolutions. This approach allows TALNet to search the video for temporal patterns while skipping over frames at intervals set by a given dilation rate. Analyzing the video with several different rates, selected automatically according to the anchor segment’s length, enables efficient identification of actions as long as the entire video or as short as a second.
  2. The context before and after an action is important
    The moments preceding and following an action instance contain critical information for localization and classification, arguably more so than the spatial context of an object. Therefore, we explicitly encode the temporal context by extending the length of proposal segments on both the left and right by a fixed percentage of the segment’s length in both the proposal generation stage and the classification stage.
  3. Actions require multi-modal input
    Actions are defined by appearance, motion and sometimes even audio information. Therefore, it is important to consider multiple modalities of features for the best results. We use a late fusion scheme for both the proposal generation network and the classification network. Each modality has a separate proposal generation network, and their outputs are combined to obtain the final set of proposals. These proposals are then classified by a separate classification network for each modality, and the networks’ outputs are averaged to obtain the final predictions.
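The three adjustments can be illustrated with small, framework-free helpers. This is a sketch of the ideas, not TALNet's implementation: the context ratio, kernel weights, and modality names are hypothetical, and the rule for choosing a dilation rate from an anchor's length is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def extend_with_context(seg: Segment, ratio: float, video_len: float) -> Segment:
    """Issue 2: pad a segment on both sides by `ratio` times its own length.

    The post only says "a fixed percentage"; the ratio here is an assumption.
    """
    start, end = seg
    pad = ratio * (end - start)
    return (max(0.0, start - pad), min(video_len, end + pad))


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Issue 1: a 'valid' 1D convolution whose taps are `rate` frames apart.

    As in deep learning frameworks, this is a cross-correlation (no kernel flip).
    rate=1 looks at consecutive frames; larger rates skip frames, so the same
    kernel covers a longer stretch of video.
    """
    span = (len(kernel) - 1) * rate
    return [
        sum(w * x[i + j * rate] for j, w in enumerate(kernel))
        for i in range(len(x) - span)
    ]


def late_fusion(scores_by_modality: Dict[str, List[float]]) -> List[float]:
    """Issue 3: average the per-class scores from separate per-modality networks."""
    per_modality = list(scores_by_modality.values())
    return [sum(col) / len(per_modality) for col in zip(*per_modality)]
```

For instance, with the two-tap kernel `[1, 1]` and rate 2, each output combines frames i and i + 2, so `dilated_conv1d([1, 2, 3, 4, 5], [1, 1], 2)` gives `[4, 6, 8]`; the same two weights now span three frames instead of two. Averaging in `late_fusion` keeps each modality's network independent until the very end, which is what makes the fusion "late".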

TALNet in Action
As a consequence of these improvements, TALNet achieves state-of-the-art performance for both action proposal and action localization tasks on the THUMOS’14 detection benchmark and competitive performance on the ActivityNet challenge. Now, whenever people save videos to Google Photos, our model identifies these moments and creates animations to share. Here are a few examples shared by our initial testers.

An example of the detected action “sliding down a slide”
An example of the detected actions “jump into the pool” (left), “twirl in a dress” (center) and “feed baby a spoonful” (right).

Next steps
We are continuing work to improve the precision and recall of action localization using more data, features and models. Improvements in temporal action localization can drive progress on a large number of important topics, such as video highlights, video summarization and search. We hope to continue improving the state of the art in this domain while providing more ways for people to reminisce about their memories, big and small.

Acknowledgements
Special thanks to Tim Novikoff and Yu-Wei Chao, as well as Bryan Seybold, Lily Kharevych, Siyu Gu, Tracy Gu, Tracy Utley, Yael Marzan, Jingyu Cui, Balakrishnan Varadarajan and Paul Natsev for their critical contributions to this project.

Injecting AI into Healthcare: Medical Innovators Harness NVIDIA Tools for AI-Powered Future

More than 1,500 healthcare experts will converge next week in Boston for the World Medical Innovation Forum to discuss the impact of AI in clinical care, and hear talks by top names in biotech and pharma, U.S. cabinet secretaries and federal agency leaders — and NVIDIA founder and CEO Jensen Huang.

Huang will have a fireside chat with Keith Dreyer, vice chairman of radiology at Massachusetts General Hospital. They’ll be introduced by Cathy Minehan, chairman of the hospital’s board of trustees.

At last year’s event, Huang spoke about the potential of AI to change healthcare, calling data the vital “source code” for companies in the future. This time, he’ll share the latest results of NVIDIA’s innovation in healthcare with partners like Massachusetts General Hospital.

Worldwide, NVIDIA GPUs are powering AI applications to discover potential drug molecules, improve the consistency of mammogram assessments, and detect rare congenital heart defects. And this is just the start.

Across the healthcare industry, AI researchers and innovators rely on NVIDIA’s deep learning and accelerated computing for medicine.

Leading minds in medicine gathered for our GPU Technology Conference in San Jose last month, including attendees from five of the top seven radiology departments in the United States and four of the top five academic medical centers in the country.

Eric Topol, founder and director of the Scripps Research Translational Institute, spoke to a packed audience on the potential for deep learning to help healthcare institutions provide better, faster and cheaper care. In 2018, NVIDIA and Scripps established a center of excellence for AI in genomics and digital sensors.

Through more than 40 healthcare sessions and several booth exhibits, panels and lightning talks, the conference highlighted how AI and GPUs are used in every pillar of healthcare, from medical imaging and genomics to drug discovery and patient care.

And NVIDIA showcased Clara AI, a software toolkit built for radiologists. Containing more than a dozen state-of-the-art classification and segmentation models, Clara AI provides experts with time-saving AI-assisted annotation tools and transfer learning capabilities.

Radiologists, data scientists and developers can now gain access to two software development kits, the Clara Train SDK and the Clara Deploy SDK, which enable AI-assisted workflows for medical imaging.

Attendees of WMIF can learn more about the Clara AI toolkit in a demo at the MGH & BWH Center for Clinical Data Science booth.

The post Injecting AI into Healthcare: Medical Innovators Harness NVIDIA Tools for AI-Powered Future appeared first on The Official NVIDIA Blog.

AWS DeepRacer League hits the road for more fun and excitement for developers!

From developer to machine learning developer

The AWS DeepRacer League is the world’s first autonomous racing league open to developers of all skill levels and it kicked off last week in Santa Clara, California. Chris Miller was crowned our first champion of the 2019 season. Chris is the founder of Cloud Brigade, based in Santa Cruz, California, and he came to the AWS Summit specifically to learn more about machine learning.

At AWS, we are committed to putting machine learning in the hands of developers of all skill levels and to making their experiences with machine learning fun and easy. In Santa Clara, our top three finishers all built a model in one of the onsite workshops and had a lot of fun doing it.

Chris Miller achieved a winning lap time of 10.43 seconds and will now advance to the finals at re:Invent 2019, where he will race to win the AWS DeepRacer Championship Cup. Before he arrived at the AWS Summit, he had no experience with machine learning.

Chris says, “When I got here today, I had no experience with machine learning, but that’s exactly what I came here to learn and what a great way to learn machine learning.”

Rahul Shah from Fremont, California came in second place. He was pleasantly surprised by how successful his model was and had a lot of fun with AWS DeepRacer. Rahul has been working with machine learning for the past few years, but this was his first time working with reinforcement learning.

“Working on this was easy, and any developer would be able to have success. The DeepRacer event is a really fun and exciting thing to do at the AWS Summit,” Rahul said.

The third-place finisher was Adrian Sarno from San Mateo, California. Adrian is a data scientist and has been actively involved with machine learning for most of his career. Attending the workshop and participating in the league was his first experience with reinforcement learning, and he was curious to learn this advanced ML technique. Adrian’s first attempt at building his model was not as successful as he wanted it to be. When he realized what was at stake, he took to his keyboard and retrained his model for 2 hours. Then he returned with a model that earned him a podium finish.

Adrian says, “It’s straightforward to work with the applications that have been put together.”

All of our participants are excited to experiment more and use the coming months to get more advanced models ready to compete at re:Invent 2019, where they can use their newfound skills to help them win the AWS DeepRacer Championship Cup.

Heading to Paris to reach developers globally

And it doesn’t end there. The AWS DeepRacer League made its first international stop at the AWS Summit in Paris, France yesterday. Paris is fast becoming a hub for learning and research on artificial intelligence. The French government has plans to invest in Paris to help enable the AI ecosystem in France and the rest of Europe. Such an investment can encourage a large community of developers to learn with easy access to the tools they need to become machine learning developers just like Chris, Rahul, and Adrian.

At the AWS Summit in Paris, the AWS DeepRacer League welcomed more developers to learn, build, and train models to compete. The podium was filled with developers who came to the Summit to participate in the league, and each of them had spent time on their models at home before arriving. Positions changed throughout the afternoon as they learned more. In a tense final 60 minutes of racing, Arthur Pace from Paris took home the Paris Summit Champion cup with a lap time of 13.87 seconds. Second place went to “JO” (Wajdi Fathallah), who attended a DeepRacer meetup before the AWS Summit and secured a 15.5-second lap. The third-place finisher was Matthieu Rousseau (16.00 seconds). Matthieu worked on his model with fellow engineering student (and Paris Champion) Arthur Pace for the last 2 weeks in order to land on the podium!

The 2019 developer journey continues

On April 10, the AWS DeepRacer League will be at the AWS Summit in Singapore. The Summit there offers an opportunity to get hands-on with AWS DeepRacer, with multiple workshops and hours of live racing. You can follow the action live at www.deepracerleague.com. Coming soon is the AWS DeepRacer Virtual League. Get ready today by taking the digital training course for reinforcement learning and AWS DeepRacer.

Developers, start your engines! Your journey to becoming a machine learning developer begins with the AWS DeepRacer League.


About the Author

Alexandra Bush is a Senior Product Marketing Manager for AWS AI. She is passionate about how technology impacts the world around us and enjoys being able to help make it accessible to all. Out of the office she loves to run, travel and stay active in the outdoors with family and friends.


Create high-quality instructions for Amazon SageMaker Ground Truth labeling jobs

Amazon SageMaker Ground Truth helps you quickly build highly accurate training datasets for machine learning (ML). You can use your own workers, a choice of vendor-managed workforces that specialize in data labeling, or a public workforce powered by Amazon Mechanical Turk to provide the human-generated labels. To get high-quality labels, you must provide simple, concise, and clear instructions, especially when using a public workforce. Writing good instructions is the single most important action you can take to improve annotation quality. It’s worth investing the time to do it right.

This blog post shares best practices for creating highly effective instructions for a public workforce. There are two key points: reduce the cognitive load for the workers as much as possible, and experiment early in the process to fine-tune your instructions and save yourself trouble later on. You can experiment by labeling some of your data yourself and by submitting small jobs to the public workforce throughout the process.

The following screenshot shows an example of a Ground Truth bounding box labeling task with good instructions from the worker’s perspective. In this example task, we ask workers to draw boxes around flowers in images taken from the Google Open Images Dataset. The left side of the image shows the short instructions that are constantly visible in a sidebar while the worker is annotating. They are clear, to the point, specialized to the task, and focused on example images.

The following figure shows an example of the full instructions that a worker can see by choosing View full instructions in the sidebar. They clarify ambiguities that could confuse the worker. By the end of this post, you’ll be able to create high-quality instructions for your own labeling job.

Our recommended workflow

The quickest way to create good instructions is to use the tools provided by Ground Truth to annotate some of your own data. You can then use the results as examples in your instructions. To do this, you should take the following steps:

  1. Select a small number of examples from your data.
  2. Run a private job on Ground Truth to label your chosen examples.
  3. Create the short instructions using your results. Focus on example images and small amounts of text.
  4. Create the full instructions to clarify ambiguities in the task.
  5. Run a small public job to test the instructions. Iterate on the results until you are satisfied.
  6. Consider simplifying your task, and set a reasonable price.

Note: Running the private labeling jobs will cost $0.08 per example. For pricing details, see the Amazon SageMaker Ground Truth pricing page.

After you have produced high-quality instructions, you can send your full labeling job out to the public workforce. Let’s go over each step in the checklist.

Select a small number of examples from your data

Browse your dataset and select examples that capture the variety in your data. Choosing examples from the items you want to label (as opposed to generic examples) ensures the instructions will help annotators understand your specific task.

Here, we select images with different numbers of flowers of various shapes and sizes. The flowers in some of these images are hidden behind others or touch the edge of the frame. Choosing a variety of cases makes it easier to find good examples for creating the instructions. It also gives you insight into the difficulty of the task from the worker’s point of view.

Run a private job on Ground Truth to label your chosen examples

A previous blog post described how to run a labeling job using the AWS Management Console. You should follow the method described there to label the examples you chose from the previous section. You need to add the images you have selected to a manifest file, create a private work team with your own email address, and select one annotator per example. There’s no reason for you to label the same example multiple times.
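Ground Truth reads its input from a manifest file in JSON Lines format, where each line is a JSON object whose `source-ref` key holds the S3 URI of one item to label. A minimal sketch for building such a manifest from the images you picked might look like this; the helper name, bucket, and keys are hypothetical.

```python
import json
from typing import Iterable


def build_manifest(s3_uris: Iterable[str]) -> str:
    """Return manifest text: one {"source-ref": <S3 URI>} JSON object per line."""
    lines = []
    for uri in s3_uris:
        # Ground Truth expects the images to live in S3, so reject anything else early.
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
        lines.append(json.dumps({"source-ref": uri}))
    return "\n".join(lines) + "\n"
```

You could save the returned text to a file such as `input.manifest`, upload it to your S3 bucket, and select it as the input dataset when you create the private labeling job.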

Running this private job gives you perspective on what you want to accomplish with your labeling job, the difficulty of the task, and the tools the annotators will be using. Make a record of the examples that were difficult or ambiguous as you work. You will need these later to write the full instructions. In addition, you should consider timing yourself to gauge how much to pay the workers for your task.
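Timing yourself turns pricing into simple arithmetic. As a rough sketch, assuming you recorded your average seconds per example during the private job and chose a target hourly rate yourself (the post does not recommend one), the per-task price and the cost of the private job at $0.08 per example could be computed like this. Amounts are kept in whole cents so the rounding is exact.

```python
def price_per_task_cents(seconds_per_task: int, hourly_rate_cents: int) -> int:
    """Per-task reward, rounded up to a whole cent, that pays `hourly_rate_cents`
    per hour to a worker labeling at the speed you measured on yourself."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -((-seconds_per_task * hourly_rate_cents) // 3600)


def private_job_cost_cents(num_examples: int, cents_per_example: int = 8) -> int:
    """Cost of labeling your own examples in a private job ($0.08 per example)."""
    return num_examples * cents_per_example
```

For example, if an image took you 30 seconds and you target a hypothetical $12 per hour, `price_per_task_cents(30, 1200)` returns 10 cents, and labeling 50 examples privately costs `private_job_cost_cents(50)`, or 400 cents ($4.00).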

The left figure shows a preview of the bounding box tool at work. Notice that the instructions on the left side of the image have not yet been created. The right figure demonstrates the results from the private labeling job.

Create the short instructions using your results

After you finish the private labeling job, you can find the results in the Amazon SageMaker console by going to Labeling jobs and selecting the name you gave the job. The annotated examples are at the bottom of the page. For image labeling tasks, the simplest way to extract the results is to zoom in on these annotated images and take screenshots.

Ideally, narrow your results to one or two exemplary “good” instances, then create one or two images with various bad annotations illustrating what you expect to be the most common sources of failure. You can do this by re-running the private labeling job and skipping all the other examples. Alternatively, you can combine examples of good and bad annotations in a single example image to help the workers quickly understand the task. One particularly inventive strategy is to use an animated GIF that alternates between good and bad examples. For the flower labeling instructions, we use the following images for the good and bad examples, respectively:

After you have selected the example instances and extracted the results, use your favorite image editing software (such as Google Drawings, GIMP, Keynote, or PowerPoint) to put the finishing touches on the figures for your instructions. For example, you might consider placing Xs over images representing incorrect annotations.

Upload your images to an Amazon S3 bucket

Upload the images to an Amazon S3 bucket and set the object permissions so that the images are publicly available. If your S3 bucket has the default permissions, you’ll have to first change the public access settings for the bucket to allow the images to be publicly available. We strongly recommend against making the entire bucket publicly accessible. To make it possible for the images to be public, go to the Amazon S3 console, select your bucket, and choose the Permissions tab. You should see something similar to the following image:

Choose Edit, then uncheck the first two boxes. Choose Save.

A confirmation dialog box appears. Type “confirm” in the appropriate field and choose Confirm to update the public access settings.

To finish uploading the image, return to your S3 bucket overview by choosing Overview. Then choose Upload, drag and drop the file into the dialog box, and then choose Upload in the dialog box. Finally, select the image name from the S3 bucket overview and choose Make public to make the image publicly accessible from the internet.

If your bucket permissions have been set correctly, a message saying Success appears.

Afterward, we recommend returning to the bucket permissions tab and re-checking the first box, Block new public ACLs and uploading public objects. This prevents you from accidentally making a different object public in the future.
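If you prefer to script these bucket settings, the four checkboxes correspond, to the best of our knowledge, to the four fields of the S3 `PublicAccessBlockConfiguration`. A minimal sketch of the two states described above might look like this: first with the first two boxes unchecked for the upload, then with Block new public ACLs and uploading public objects checked again.

```python
from typing import Dict

# Console checkbox -> PublicAccessBlockConfiguration field.
# While uploading: the first two boxes are unchecked so an object ACL can grant public read.
UPLOAD_SETTINGS: Dict[str, bool] = {
    "BlockPublicAcls": False,       # Block new public ACLs and uploading public objects
    "IgnorePublicAcls": False,      # Remove public access granted through public ACLs
    "BlockPublicPolicy": True,      # Block new public bucket policies
    "RestrictPublicBuckets": True,  # Block public and cross-account access if bucket has public policies
}

# Afterward: re-check only the first box. New public ACLs are rejected, while
# IgnorePublicAcls stays False so the image you already made public stays readable.
AFTER_UPLOAD_SETTINGS: Dict[str, bool] = {**UPLOAD_SETTINGS, "BlockPublicAcls": True}
```

With boto3, each dictionary would be passed as `s3.put_public_access_block(Bucket="your-bucket", PublicAccessBlockConfiguration=UPLOAD_SETTINGS)`, and a single image could be made public with `s3.put_object_acl(Bucket="your-bucket", Key="instructions/good.png", ACL="public-read")`; the bucket and key names are placeholders. This assumes ACLs are enabled on the bucket, because buckets whose Object Ownership is set to bucket owner enforced reject ACLs.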

Use the instruction-making tool to finish creating the instructions

Finally, go to the instruction-making tool in the job creation section of the Amazon SageMaker console, create your instructions, and link to the images you gathered in your S3 bucket. You can place your images in the short instructions by choosing the image icon in the instructions tool and entering the object URL, which you can find in the S3 bucket overview by selecting the image name.

After you have added the image, you’ll see a thumbnail in the instruction-making tool.

If you instead see a broken image link icon like the one on the right in the preceding figure, double-check that you have correctly set the bucket and object permissions by following the steps in the previous section.
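
Because a broken thumbnail usually means an anonymous request for the image was refused, one quick check is to request each object URL without credentials before pasting it. A minimal sketch using only the Python standard library; the helper names are hypothetical, and a 403 response typically means the object or bucket is still private.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def head_status(url: str, timeout: float = 10.0) -> int:
    """HTTP status of an anonymous HEAD request, or 0 if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when the object is not public
    except urllib.error.URLError:
        return 0  # DNS failure, refused connection, and so on


def broken_images(urls: Iterable[str],
                  status_of: Callable[[str], int] = head_status) -> List[str]:
    """URLs that an anonymous browser would fail to load."""
    return [url for url in urls if status_of(url) != 200]
```

The status_of parameter makes it easy to swap in a stub for offline testing; in normal use you would call broken_images(my_urls) and fix the permissions on anything it returns.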

Many workers will only read the short instructions, so make them count. Focus on your example images, with a small amount of explanatory text in simple English. Use short sentences. Remember, the annotators are not always fluent in English, and ambiguous instructions lead to ambiguous results. Your goal is to be as explicit as possible while keeping things simple.

Create the full instructions to clarify ambiguities in the task

After you have finished writing the short instructions, choose Additional instructions in the instruction-making tool to begin working on the full instructions. Here are some points to keep in mind:

  • The full instructions should clarify ambiguities in your task. Often, annotators will only consult these if they are confused. Use your experience from the private job to anticipate sources of confusion.
  • Try not to repeat the short instructions.
  • Catching every edge case at the expense of having pages and pages of instructions is usually a mistake. In our experience, two or three additional good/bad example pairs should suffice, and further instructions yield diminishing returns.

The following figure shows the final instructions for the flower example.

Run a small public labeling job to test the instructions

After you complete the first draft of the instructions, you can create and submit a small public labeling job. Inspect the results and look for common mistakes that aren’t addressed in the current version of the instructions. Workers often make mistakes that are different from the ones that you anticipate. It’s better to catch these early in the process than to run a large and expensive labeling job twice. Repeat this process until the results are satisfactory.
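
One simple way to keep the pilot job small is to sample a few dozen records from your input manifest and point the pilot at that subset. A minimal sketch, assuming the JSON Lines input manifest that Ground Truth uses for image tasks (one JSON object per line with a "source-ref" key pointing at an image in S3); the function name and bucket paths are hypothetical.

```python
import json
import random
from typing import Iterable


def pilot_manifest(manifest_lines: Iterable[str], n: int, seed: int = 0) -> str:
    """Return a JSON Lines manifest holding a random sample of n records."""
    records = [json.loads(line) for line in manifest_lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the pilot set is reproducible
    sample = rng.sample(records, min(n, len(records)))
    return "".join(json.dumps(record) + "\n" for record in sample)
```

Write the result to a new file in S3 and use it as the input data for the pilot job; the full manifest stays untouched for the final run.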

Consider simplifying your task, and set a reasonable price

If your instructions are still too long, too complex, or are missing difficult examples from your data, think about how to split your task into several simpler ones. You might have noticed this image in our selection of examples:

Asking workers to label images like this for the same price as the other examples is a recipe for failure. In this case, you might first perform an image classification job to estimate the number of flowers in each image. Then, you can go back and subdivide the images with many flowers so no single image is too challenging.
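
Once the classification job gives you a rough count per image, you can decide how finely to subdivide it. A minimal sketch, assuming the flowers are spread roughly evenly across the image and that you pick a hypothetical maximum number of flowers per task; the boxes use the (left, upper, right, lower) convention that Pillow's Image.crop accepts.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def grid_size(estimated_count: int, max_per_tile: int) -> int:
    """Smallest k such that a k-by-k grid averages at most max_per_tile objects per tile."""
    if estimated_count <= max_per_tile:
        return 1
    return math.ceil(math.sqrt(estimated_count / max_per_tile))


def tile_boxes(width: int, height: int, k: int) -> List[Box]:
    """Split a width-by-height image into a k-by-k grid of crop boxes, row by row."""
    boxes = []
    for row in range(k):
        for col in range(k):
            boxes.append((
                col * width // k,
                row * height // k,
                (col + 1) * width // k,
                (row + 1) * height // k,
            ))
    return boxes
```

For example, an image estimated to hold 40 flowers with a limit of 10 per task gets a 2-by-2 grid. Flowers that straddle a tile boundary are split between tasks, so you may want slightly overlapping tiles and a deduplication step when merging the results.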

As another example, consider a job that asks workers to label flowers, people, and dogs in each image. In this case you might get better results by launching three jobs, each focused on a single category. You can run these jobs in parallel or one after another and then combine the results.
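
Combining the per-category jobs is mostly bookkeeping: tag each box with the category of the job that produced it and group by image. A minimal sketch, assuming you have already parsed each job's output into a simple mapping from image URI to boxes; the real Ground Truth output manifest is richer than this, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def merge_category_jobs(
    results: Dict[str, Dict[str, List[Box]]],
) -> Dict[str, List[Tuple[str, Box]]]:
    """Combine per-category job outputs into one label list per image.

    results maps a category name (e.g. "flower") to that job's output,
    which maps an image URI to the boxes drawn for that category.
    """
    merged: Dict[str, List[Tuple[str, Box]]] = defaultdict(list)
    for category, per_image in results.items():
        for image, boxes in per_image.items():
            merged[image].extend((category, box) for box in boxes)
    return dict(merged)
```

Because each job only ever sees one category, the merge never has to resolve conflicting class labels, which is part of why splitting the task this way tends to be simpler for workers and for you.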

As the final step in the process of creating the instructions, use your newly gained experience labeling the examples yourself to set a reasonable price for your tasks. The job creation section of the Amazon SageMaker console allows you to choose a payment for each labeled example using a drop-down menu:

Use your records of how long it took you to label the examples while writing the instructions, together with the suggestions in the menu, to select an appropriate reward.
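
The arithmetic behind that choice is simple: multiply the time a task takes by the hourly rate you intend to pay, then pick the nearest menu option at or above the result. A minimal sketch, assuming your own per-task timings and a hypothetical target hourly rate; the tier values in the test are placeholders, not the console's actual menu.

```python
import statistics
from typing import Sequence


def suggested_price(seconds_per_task: Sequence[float],
                    hourly_rate_usd: float,
                    tiers_usd: Sequence[float]) -> float:
    """Cheapest price tier that pays at least hourly_rate_usd at the median labeling time."""
    median_seconds = statistics.median(seconds_per_task)
    target = median_seconds * hourly_rate_usd / 3600  # dollars per task
    for tier in sorted(tiers_usd):
        if tier >= target:
            return tier
    return max(tiers_usd)  # no tier is high enough; fall back to the largest
```

For instance, a median of 30 seconds per task at a target of $12 per hour works out to $0.10 per task. Because you know the task better than a new worker does, your own times are likely an underestimate, so rounding up rather than down is the safer choice.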

Conclusion

Instructions specific to your data will always be superior to generic ones. Creating them might be time-consuming, but the workers will appreciate your effort. They want to complete your task as quickly as possible, and making their lives easier will improve your results.

Here are some resources if you would like to learn more about Ground Truth and making instructions for a public workforce:

Disclosure regarding the Open Images Dataset V4

Open Images Dataset V4 was created by Google Inc. In some cases we have modified the images or the accompanying annotations. You can obtain the original images and annotations here. The annotations are licensed by Google Inc. under the CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. The following paper describes Open Images V4 in depth, from the data collection and annotation to detailed statistics about the data and evaluation of models trained on it.

A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, and V. Ferrari. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982, 2018.


About the Authors

Tristan McKinney is an applied scientist in the Amazon ML Solutions Lab. He recently completed his PhD in theoretical physics at Caltech, where he studied effective field theory and its application to high-Tc superconductors. Because his father was in the US Army, he lived all over the place growing up, including Germany and Albania. In his spare time, Tristan loves to ski and play soccer.

Krzysztof Chalupka is an applied scientist in the Amazon ML Solutions Lab. He has a PhD in causal inference and computer vision from Caltech. At Amazon, he figures out ways in which computer vision and deep learning can augment human intelligence. His free time is filled with family. He also loves forests, woodworking, and books (trees in all forms).

Fedor Zhdanov is a machine learning scientist at Amazon. He develops machine learning algorithms and tools for our internal and external customers.

Using Deep Learning to Improve Usability on Mobile Devices

Tapping is the most commonly used gesture on mobile interfaces, and it triggers all kinds of actions, from launching an app to entering text. While the style of clickable elements (e.g., buttons) in traditional desktop graphical user interfaces is often conventionally defined, on mobile interfaces it can still be difficult for people to distinguish tappable from non-tappable elements because of the diversity of styles. This confusion can create false affordances (e.g., a feature that could be mistaken for a button) and a lack of discoverability, which in turn lead to user frustration, uncertainty, and errors. To avoid this, interface designers can conduct a study or a visual affordance test to clarify the tappability of items in their interfaces. However, such studies are time-consuming, and their findings are often limited to a specific app or interface design.

In our CHI’19 paper, “Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning”, we introduced an approach for modeling the usability of mobile interfaces at scale. We crowdsourced a task in which users judged how tappable UI elements from a range of mobile apps appeared to them. Our model's predictions matched these user perceptions with roughly 90% precision, demonstrating that a machine learning model can effectively estimate the perceived tappability of interface elements during design, without the need for expensive and time-consuming user testing.

Predicting Tappability with Deep Learning
Designers often use visual properties such as the color or depth of an element to signify its availability for interaction on interfaces, e.g., the blue color and underline of a link. While these common signifiers are useful, it is not always clear when to apply them in each specific design setting. Furthermore, with design trends evolving, traditional signifiers are constantly being altered and challenged, potentially causing user uncertainty and mistakes.

To understand how users perceive this changing landscape, we analyzed the potential signifiers affecting tappability in real mobile apps: element type (e.g., check boxes, text boxes), location, size, color, and words. We started by crowdsourcing volunteers to label the perceived clickability of ~20,000 unique interface elements from ~3,500 apps. With the exception of text boxes, type signifiers yielded low uncertainty in user-perceived tappability. The location signifier refers to the position of a feature on the screen and is informed by the common layout design in mobile apps, as demonstrated in the figure below.

Heatmaps displaying the accuracy of tappable and non-tappable elements by location, where warmer colors represent areas of higher accuracy. Users labeled non-tappable elements more accurately towards the upper center of the interface, and tappable elements towards the bottom center of the interface.
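
A heatmap like this can be built by binning elements by where they sit on screen and computing, per cell, the share of elements whose crowd label matched the element's actual tappability. A minimal sketch, assuming each element has been reduced to its normalized center and a correct/incorrect flag; the grid size is an arbitrary choice, and this is not necessarily how the figure above was produced.

```python
from typing import Iterable, List, Optional, Tuple


def accuracy_heatmap(
    elements: Iterable[Tuple[float, float, bool]], rows: int = 4, cols: int = 3
) -> List[List[Optional[float]]]:
    """Fraction of correctly labeled elements per screen cell.

    Each element is (x, y, correct), where x and y are the element's center
    normalized to [0, 1] with (0, 0) at the top-left of the screen.
    Cells that contain no elements are None.
    """
    hits = [[0] * cols for _ in range(rows)]
    totals = [[0] * cols for _ in range(rows)]
    for x, y, correct in elements:
        row = min(int(y * rows), rows - 1)  # clamp y == 1.0 into the last row
        col = min(int(x * cols), cols - 1)
        totals[row][col] += 1
        hits[row][col] += int(correct)
    return [
        [hits[r][c] / totals[r][c] if totals[r][c] else None for c in range(cols)]
        for r in range(rows)
    ]
```

Calling it once on tappable elements and once on non-tappable ones gives the two panels described in the caption.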

The impact of element size was relatively weak, although large non-tappable elements did cause confusion. Users tended to associate bright colors and short word counts with tappable elements, though word semantics also played a significant role.

We used these labels to train a simple deep neural network that predicts the likelihood that a user will perceive an interface element as tappable versus non-tappable. For a given element of the interface, the model uses a range of features, including the spatial context of the element on the screen (location), the semantics and functionality of the element (words and type), and the visual appearance (size as well as raw pixels). The neural network model applies a convolutional neural network (CNN) to extract features from raw pixels, and uses learned semantic embeddings to represent text content and element properties. The concatenation of all these features is then fed to a fully connected network layer, whose output produces a binary classification of an element’s tappability.
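
The feature-fusion design described above can be sketched as a forward pass in plain Python. This is a toy, untrained illustration under our own assumptions, not the authors' model: a single 3x3 convolution with global average pooling stands in for the CNN, random vectors stand in for the learned word and type embeddings, the box is (x, y, width, height) normalized to the screen, and the fully connected layer maps straight to one logit. All names and sizes are hypothetical; a real model would be trained on the crowdsourced labels with a deep learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def conv_features(pixels: Grid, kernels: List[Grid]) -> List[float]:
    """One 3x3 convolution layer, ReLU, then global average pooling per kernel."""
    height, width = len(pixels), len(pixels[0])
    features = []
    for kernel in kernels:
        total = 0.0
        for i in range(height - 2):
            for j in range(width - 2):
                response = sum(
                    kernel[a][b] * pixels[i + a][j + b] for a in range(3) for b in range(3)
                )
                total += max(0.0, response)  # ReLU
        features.append(total / ((height - 2) * (width - 2)))  # average pooling
    return features


class TappabilityModel:
    """Toy feature-fusion network: pixels + words + type + box -> P(tappable)."""

    def __init__(self, vocab: Sequence[str], types: Sequence[str],
                 dim: int = 4, n_kernels: int = 4, seed: int = 0):
        rng = random.Random(seed)

        def rand(n: int) -> List[float]:
            return [rng.gauss(0.0, 0.1) for _ in range(n)]

        self.dim = dim
        self.word_emb = {word: rand(dim) for word in vocab}  # stand-in for learned text embeddings
        self.type_emb = {t: rand(dim) for t in types}        # stand-in for learned type embeddings
        self.kernels = [[rand(3) for _ in range(3)] for _ in range(n_kernels)]
        n_inputs = n_kernels + dim + dim + 4                  # pixels, words, type, (x, y, w, h)
        self.weights = rand(n_inputs)                         # fully connected layer -> one logit
        self.bias = 0.0

    def predict(self, pixels: Grid, words: Sequence[str], elem_type: str,
                box: Tuple[float, float, float, float]) -> float:
        """Probability that a user perceives the element as tappable."""
        known = [self.word_emb[w] for w in words if w in self.word_emb]
        # Average the embeddings of known words; unknown text maps to zeros.
        text = [sum(v) / len(known) for v in zip(*known)] if known else [0.0] * self.dim
        element = self.type_emb.get(elem_type, [0.0] * self.dim)
        features = conv_features(pixels, self.kernels) + text + element + list(box)
        logit = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid for the binary decision
```

Concatenating the per-signal vectors before a single dense layer lets each signifier (location, words, type, size, pixels) contribute independently, which mirrors the structure described in the paragraph above.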

Evaluation of the Model
The model allowed us to automatically diagnose mismatches between the tappability of each interface element as perceived by a user—predicted by our model—and the intended or actual tappable state of the element specified by the developer or designer. In the example below, our model predicts that there is a 73% chance that a user would think the labels such as “Followers” or “Following” are tappable, while these interface elements are in fact not programmed to be tappable.
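
The diagnosis itself reduces to comparing the model's probability with the element's actual tappable state. A minimal sketch, assuming a decision threshold of 0.5 (our choice, not from the paper); the "Followers" label and its 73% probability come from the example above, while the other elements in the test are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    label: str
    predicted_tappable: float  # model's probability that users see it as tappable
    actually_tappable: bool    # whether the developer made it tappable


def find_mismatches(elements: List[Element], threshold: float = 0.5) -> List[str]:
    """Flag elements whose perceived and actual tappability disagree."""
    flagged = []
    for e in elements:
        perceived = e.predicted_tappable >= threshold
        if perceived and not e.actually_tappable:
            flagged.append(f"{e.label}: looks tappable ({e.predicted_tappable:.0%}) but is not")
        elif not perceived and e.actually_tappable:
            flagged.append(f"{e.label}: is tappable but looks static ({e.predicted_tappable:.0%})")
    return flagged
```

Both directions matter: a false affordance frustrates users who tap and get nothing, while a tappable element that looks static hurts discoverability.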

To understand how our model behaves compared to human users, particularly when human perception is ambiguous, we generated a second, independent dataset in which 290 crowdsourced volunteers labeled the perceived tappability of 2,000 unique interface elements. Each element was labeled independently by five different users. More than 40% of the elements in this sample were labeled inconsistently. Our model matches this uncertainty in human perception quite well, as demonstrated in the figure below.

The scatterplot of the tappability probability predicted by the model (the Y axis) versus the consistency in the human user labels (the X axis) for each element in the consistency dataset.
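
With five labels per element, an element's position on the X axis and the share of inconsistently labeled elements are straightforward to compute. A minimal sketch, assuming labels are stored as 1 (tappable) or 0 (not tappable), that the X axis is the share of users who said "tappable", and that "inconsistent" means the five labels were not unanimous; these definitions are our assumptions.

```python
from typing import Sequence


def tappable_share(labels: Sequence[int]) -> float:
    """Fraction of users who labeled the element tappable (0.0 to 1.0)."""
    return sum(labels) / len(labels)


def inconsistent_fraction(all_labels: Sequence[Sequence[int]]) -> float:
    """Share of elements whose labels were not unanimous."""
    split = sum(1 for labels in all_labels if 0.0 < tappable_share(labels) < 1.0)
    return split / len(all_labels)
```

Elements near 0.0 or 1.0 on this scale are the unanimous cases where the model gives its most definite answers; elements near the middle are the ambiguous ones.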

When users agree on an element’s tappability, our model tends to give a more definite answer: a probability close to 1 for tappable and close to 0 for not tappable. When workers are less consistent on an element (towards the middle of the X axis), our model is also less certain about the decision. Overall, our model matched human perception in identifying tappable UI elements reasonably well, with a mean precision of 90.2% and recall of 87.0%.
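
For reference, precision and recall here treat "tappable" as the positive class: precision is the share of elements the model calls tappable that humans also consider tappable, and recall is the share of human-tappable elements the model catches. A minimal sketch, assuming a 0.5 decision threshold and human labels already reduced to one boolean per element; the post reports mean values, whereas this computes them for a single set of predictions.

```python
from typing import Sequence, Tuple


def precision_recall(probabilities: Sequence[float], actual: Sequence[bool],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall for the 'tappable' class."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly called tappable
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false affordance predictions
    fn = sum(a and not p for p, a in zip(predicted, actual))  # missed tappable elements
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A precision higher than recall, as reported above, means the model errs slightly more often by missing tappable elements than by flagging static ones.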

Predicting tappability is merely one example of what we can do with machine learning to solve usability issues in user interfaces. There are many other challenges in interaction design and user experience research where deep learning models can offer a vehicle to distill large, diverse user experience datasets and advance scientific understanding of interaction behaviors.

Acknowledgements
This research was joint work by Amanda Swangson, a summer intern at Google, and Yang Li, a Research Scientist in Deep Learning and Human Computer Interaction.