Tracking the throughput of your private labeling team through Amazon SageMaker Ground Truth
Launched at AWS re:Invent 2018, Amazon SageMaker Ground Truth helps you quickly build highly accurate training datasets for your machine learning models. Amazon SageMaker Ground Truth offers easy access to public and private human labelers, and provides them with built-in workflows and interfaces for common labeling tasks. Additionally, Amazon SageMaker Ground Truth can lower your labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.
When using your own private workers to perform data labeling, you want to measure and track their throughput and efficiency. Amazon SageMaker Ground Truth now logs worker events (for example, when a labeler starts and submits a task) to Amazon CloudWatch. In addition, you can also use the built-in metrics feature of CloudWatch to measure and track throughput across a work team or for individual workers. In this blog post, we cover how to use the raw worker event logs and built-in metrics in your AWS account.
How to use worker activity logs
Once you set up a private team of workers and run a labeling job with Amazon SageMaker Ground Truth, worker activity logs are automatically emitted to CloudWatch. To learn how to set up a private team and kick off your first labeling job, reference this getting started blog post. Note: If you have previously created a private work team, you need to create a new private work team to set up the trust permissions between work teams and CloudWatch. Realize, you do not have to use that private work team, and this is simply a one-time setup step.
To view the logs, visit the CloudWatch console and click on Logs in the left-hand panel. Here, you should see a log group named /aws/sagemaker/groundtruth/WorkerActivity.
This Log Group contains logs for each task a worker accepts during an Amazon SageMaker Ground Truth labeling job, and we have included an example log below. You see the worker’s Amazon Cognito sub ID in the “cognito_sub_id” field. We will demonstrate how to tie this back to worker’s identity through Amazon Cognito. In addition, you see the Amazon Resource Name (ARN) for the Amazon SageMaker Ground Truth labeling job in the “workflow_arn”. This log also contains timestamps for when the worker begins the task (“task_accepted_time”) and when the worker either returns or submits the task (“task_returned_time” or “task_submitted_time”).
Learn more about using CloudWatch Logs from the developer documentation.
How to use worker activity metrics
You can also use the CloudWatch metrics capability to generate your own interesting statistics or graphs about the throughput of your private workers. You can begin by navigating to the Metrics tab and then the AWS/SageMaker/Workteam namespace.
Say you want to find the average amount of time workers spent on tasks for a specific labeling job. You would select the LabelingJob, Workteam option.
From here, you can calculate your own statistics. In the example below, we calculate the average time spent per submitted task for a specific labeling job. There were 14 tasks submitted that took a total of 2.28 minutes or, on average, 9.78 seconds per task.
Learn more about using CloudWatch metrics from the developer documentation.
How to link Amazon Cognito sub ID to worker information
You can link the outputted Amazon Cognito sub ID to identifiable worker information, such as user name. To do so, you can write a quick script using the Amazon Cognito ListUsers API. Alternatively, you can use the Amazon Cognito console by following these steps:
- Navigate to Manage User Pools in the AWS Region where you are running your labeling jobs.
- Select the sagemaker-ground-userpool (if you integrated your own Amazon Cognito user pool with Amazon SageMaker Ground Truth, select that user pool).
- From the left-hand panel, click Users and groups to see all of the users in your user pool.
- Click on any users to see their respective sub ID.
Conclusion
In this post, I introduced how to measure and track the throughput of your private labeling team using CloudWatch Logs and metrics. In addition, I walked through how to link the outputted worker ID to identifiable worker information, such as a user name. Visit the AWS Management Console to get started.
As always, AWS welcomes feedback. Please submit comments or questions below.
About the Authors
Vikram Madan is the Product Manager for Amazon SageMaker Ground Truth. He focusing on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys running long distances and watching documentaries.
Pranav Sachdeva is a Software Development Engineer in AWS AI. He is passionate about building high performance distributed systems to solve real life problems. He is currently focused on innovating and building capabilities in the AWS AI ecosystem that allow customers to give AI the much needed human aspect.