Amazon SageMaker Automatic Model Tuning now supports early stopping of training jobs

In June 2018, we launched Amazon SageMaker Automatic Model Tuning, a feature that automatically finds well-performing hyperparameters to train a machine learning model with. Unlike model parameters learned during training, hyperparameters are set before the learning process begins. A typical example of the use of hyperparameters is the learning rate of stochastic gradient procedures. Using default hyperparameters doesn’t always yield the best model performance, and finding well-performing hyperparameters can be a non-trivial and time-consuming task. Using Automatic Model Tuning, Amazon SageMaker will automatically find well-performing hyperparameters and train your model to maximize your objective metric.

The number of possible hyperparameter configurations is exponential in the number of hyperparameters that are being explored. A naive exploration of this search space would require a large number of training jobs and would result in high cost. To overcome this, Amazon SageMaker uses Bayesian optimization, a strategy that efficiently models the performance of different hyperparameters based on a small number of training jobs. This algorithm will, however, at times explore hyperparameter configurations which, by the end of the training, turn out to be significantly worse than previous configurations.

Today, we are adding the early stopping feature to Automatic Model Tuning. By enabling early stopping when you launch a tuning job, Amazon SageMaker tracks objective metrics per training iteration (‘epoch’) for each candidate model. Amazon SageMaker then assesses how likely each candidate is to outperform the previous best model evaluated thus far in your tuning job. With early stopping those models that are unlikely to bring value are terminated before completing all iterations, saving time and reducing cost by up to 28% (depending on your algorithm and dataset). For example, in this blog post we show how to use early stopping with an image classification algorithm using Amazon SageMaker, reducing time and cost by 23%.

You can use early stopping with supported built-in Amazon SageMaker algorithms and with your own algorithms, provided that they emit objective metrics per epoch.

Tuning an image classification model that uses early stopping

To demonstrate how you can leverage early stopping, we’ll build an image classifier using the built-in image classification algorithm and tune the model against the Caltech-256 dataset. We will run two hyperparameter tuning jobs: one without automatic early stopping and one with early stopping enabled, while all the other configurations stay the same. We’ll then compare the results of the two hyperparameter tuning jobs toward the end. You can find the full sample notebook here.

Set up and launch the hyperparameter tuning job without early stopping

We’ll skip the steps for creating a notebook instance, preparing the dataset, and pushing it to Amazon S3. The sample notebook covers these processes, so we won’t go through them here. Instead we’ll start by launching a hyperparameter tuning job.

To create a tuning job, we first need to create a training estimator for the built-in image classification algorithm, and specify a value for every hyperparameter of this algorithm, except for those we plan to tune. To learn more about hyperparameters of the built-in image classification algorithm, you can explore our documentation.

s3_train_data = 's3://{}/{}/'.format(bucket, s3_train_key)
s3_validation_data = 's3://{}/{}/'.format(bucket, s3_validation_key)

s3_input_train = sagemaker.s3_input(s3_data=s3_train_data, content_type='application/x-recordio')
s3_input_validation = sagemaker.s3_input(s3_data=s3_validation_data, content_type='application/x-recordio')

s3_output_key = "image-classification-full-training/output"
s3_output = 's3://{}/{}/'.format(bucket, s3_output_key)

sess = sagemaker.Session()
imageclassification = sagemaker.estimator.Estimator(training_image, 
                                                    role, 
                                                    train_instance_count=1,
                                                    train_instance_type='ml.p3.2xlarge',
                                                    output_path=s3_output, 
                                                    sagemaker_session=sess)

imageclassification.set_hyperparameters(num_layers=18, 
                                        image_shape='3,224,224',
                                        num_classes=257, 
                                        epochs=10, 
                                        top_k='2',
                                        num_training_samples=15420,  
                                        precision_dtype='float32',
                                        augmentation_type='crop')

Now we can create a hyperparameter tuning job with the estimator. We’ll specify the search ranges for the hyperparameters that we want to tune and the number of total training jobs that we want to run.

We selected three hyperparameters that should have the greatest impact on model quality, and thus our objective metric, according to the image classification algorithm tuning guide. You can find the full list of hyperparameters in our documentation. These are the three hyperparameters:

learning_rate: Controls how fast the training algorithm will try to optimize your model.
mini_batch_size: Controls how many data points are used for one gradient update.
optimizer: A choice among ‘sgd’, ‘adam’, ‘rmsprop’ and ‘nag’.

In this case we don’t need to specify the regular expressions for the objective metric because we are using one of the Amazon SageMaker built-in algorithms.

We first launch a hyperparameter tuning job without early stopping, which is turned off by default.

from time import gmtime, strftime 
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

tuning_job_name = "imageclassif-job-{}".format(strftime("%d-%H-%M-%S", gmtime()))

hyperparameter_ranges = {'learning_rate': ContinuousParameter(0.00001, 1.0),
                         'mini_batch_size': IntegerParameter(16, 64),
                         'optimizer': CategoricalParameter(['sgd', 'adam', 'rmsprop', 'nag'])}

objective_metric_name = 'validation:accuracy'

tuner = HyperparameterTuner(imageclassification, 
                            objective_metric_name, 
                            hyperparameter_ranges,
                            objective_type='Maximize', 
                            max_jobs=20, 
                            max_parallel_jobs=2)

tuner.fit({'train': s3_input_train, 'validation': s3_input_validation}, 
          job_name=tuning_job_name, include_cls_metadata=False)
tuner.wait()

After the tuning job is finished, we can bring in a table of metrics using HyperparameterTuningJobAnalytics from the Amazon SageMaker Python SDK.

tuner_metrics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
tuner_metrics.dataframe().sort_values(['FinalObjectiveValue'], ascending=False).head(5)

The following table shows the top 5 performing training jobs that were run. You can look at all of the results by running the notebook. From this table, we can see that the best model from this hyperparameter tuning job has validation accuracy of 0.356. Unlike in the notebook, here we provide a screenshot from the Amazon SageMaker console to check the total training time and the job status. We will then compare them later to the tuning results when training job early stopping is enabled.

In the following screenshots from the Amazon SageMaker console, you can see that the total training duration is 2 hours and 48 minutes. Total training duration is defined as the aggregated duration of all training jobs and thus reflects the total cost of the hyperparameter tuning job. You may also notice that the hyperparameter tuning job takes 1 hour and 53 minutes to complete, thanks to parallelization of the training jobs. Also, all 20 training jobs are completed normally, as you can see in the Training job status counter.

Set up and launch the hyperparameter tuning job with early stopping

Next, we’ll launch another hyperparameter tuning job with the same configuration, but this time we’ll enable early stopping of training jobs. Specifically, in the tuning job configuration, we set one extra field ‘early_stopping_type’ to ‘Auto’. It’s worth noting that training job early stopping requires training jobs to emit epoch-wise objective metrics, preferably validation metrics. In this example, since the built-in image classification algorithm already emits the metric ‘validation:accuracy’ for each epoch, we can directly use ‘validation:accuracy’ as the objective metric without any change.

tuning_job_name_es = "imageclassif-job-{}-es".format(strftime("%d-%H-%M-%S", gmtime()))

tuner_es = HyperparameterTuner(imageclassification, 
                               objective_metric_name, 
                               hyperparameter_ranges,
                               objective_type='Maximize', 
                               max_jobs=20, 
                               max_parallel_jobs=2, 
                               early_stopping_type='Auto')

tuner_es.fit({'train': s3_input_train, 'validation': s3_input_validation}, 
             job_name=tuning_job_name_es, include_cls_metadata=False)
tuner_es.wait()

Alternatively, you can launch a tuning job with early stopping from the console by setting Training job early stopping type to Auto (default is Off).

After the hyperparameter tuning job is finished, we can again check the top 5 performing training jobs.

tuner_metrics_es = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name_es)
tuner_metrics_es.dataframe().sort_values(['FinalObjectiveValue'], ascending=False).head(5)

This time, because we have training job early stopping enabled, the best hyperparameter training job has a validation accuracy of 0.353, which is very close to the setting without early stopping. The total training time and training job status can be again checked from the console as shown in the next screenshot.

This time, with training job early stopping, the total training duration is 2 hours and 10 minutes, 38 minutes (23%) shorter than without early stopping and thus 23% cheaper as well. It takes 1 hour and 38 minutes to complete the hyperparameter tuning job, which is 15 minutes faster than the previous hyperparameter tuning job. Meanwhile, 6 training jobs are stopped by early stopping, as you can see in the following list.

df = tuner_metrics_es.dataframe
df[df.TrainingJobStatus == 'Stopped']

It is clear that all the stopped training jobs have very low validation accuracies and they run much shorter than normally completed jobs.

Conclusion

To recap, we demonstrated in this blog post how to use training job early stopping to speed up hyperparameter tuning jobs in Amazon SageMaker. Keep in mind that as the training time for each training job gets longer, the benefit of training job early stopping becomes more significant. However, smaller training jobs won’t benefit as much due to infrastructure overhead. For example, our experiments show that the effect of training job early stopping typically becomes noticeable when the training jobs last longer than 4 minutes.

Training job early stopping of Automatic Model Tuning is now available in all the AWS Regions where Amazon SageMaker is available today. For more information on Amazon SageMaker Automatic Model Tuning, see the Amazon SageMaker documentation.

About the Authors

Huibin Shen is an applied scientist in Amazon AI. He works in the part of the team that launched the Automatic Model Tuning feature in Amazon SageMaker.

Fan Li is a Product Manager of Amazon SageMaker. He used to be a big fan of ballroom dance but now loves whatever his 8-year-old son likes.

Miroslav Miladinovic is a Software Development Manager at Amazon SageMaker.

Blog

Learn About Our Meetup

5000+ Members

MEETUPS