
Managing multi-topic conversation flows with Amazon Lex Session API checkpoints

In daily conversations, you often jump back and forth between multiple topics. For example, when discussing a home improvement project related to new windows and curtains, you might have questions like, “How about closing out on curtain styles and then revisiting colors?” When AWS launched the Amazon Lex Session API, you learned how to address such digressions in the conversation: you can use Session API actions to switch an intent and continue the conversation. But in everyday interactions, you might have to deal with multiple digressions: “Let’s finish selecting windows before we get to curtains.”

How do you design conversation flows that contain a series of digressions? If you are like me, you’d have a dozen questions before even considering a specific product in a home improvement project.

With session checkpoints, you can easily design a conversation that supports switching to one of many topics. You can model the home improvement conversation as two intents: OrderWindows and OrderCurtains. Switching topics then becomes easy. The flow for OrderWindows would have a checkpoint. If the user is ordering curtains but wants to complete the window selection first, you can move the conversation back to the OrderWindows intent by using the “windowSelection” checkpoint.

Managing session checkpoints

The Amazon Lex runtime API provides operations that enable you to manage session checkpoints for a conversation. The PutSession and GetSession calls enable you to define and retrieve checkpoints.  Here’s how you can use the APIs to manage the conversation flows described earlier. Please review the bot schema for bot details.

Follow these steps to manage the conversation flow:

  1. Store the current state of the conversation
  2. Retrieve the previously stored state and continue the conversation

Store the current state of the conversation

Call the GetSession API with no filters to retrieve the current state of the conversation between your bot and the user. Follow the GetSession call with a PutSession call that applies the ‘windowSelection’ checkpoint to the OrderWindows intent, as shown in the following code example:

PutSession Request:  Applying 'windowSelection' checkpoint on 'OrderWindows' intent

import boto3

client = boto3.client('lex-runtime')

response = client.put_session(
	botName='HomeImprovementBot',
	botAlias='Prod',
	userId='abc1234',
	recentIntentSummaryView=[
	  {
	    "intentName": "OrderCurtains",
	    "slots": {
	      "curtainSize": "None",
	      "curtainStyle": "None"
	    },
	    "confirmationStatus": "None",
	    "dialogActionType": "ElicitSlot",
	    "slotToElicit": "curtainSize",
	    "checkpointLabel": "None"
	  },
	  {
	    "intentName": "OrderWindows",
	    "slots": {
	      "windowSize": "large",
	      "windowStyle": "None"
	    },
	    "confirmationStatus": "None",
	    "dialogActionType": "ElicitSlot",
	    "slotToElicit": "windowStyle",
	    "checkpointLabel": "windowSelection"
	  }
	]
)

Retrieve the previously stored state

At this point, the OrderCurtains intent has completed. Issue a GetSession API call, passing a ‘windowSelection’ checkpointLabelFilter. This call returns the matching intent (OrderWindows), which received the checkpoint label in the previous step.

Continue with the conversation

Finally, issue a PutSession API call to continue the conversation where the user left off in OrderWindows. The following code example lists the details of the GetSession call:


GetSession Request:  Filtering on 'windowSelection' CheckpointLabel

--- GetSession Request with filter: ---
 
response = client.get_session(
	botName='HomeImprovementBot',
	botAlias='Prod',
	userId='abc123',
	checkpointLabelFilter='windowSelection'
)

--- Filtered GetSession Response: --- 
{
  "recentIntentSummaryView": [
    {
      "intentName": "OrderWindows",
      "slots": {
        "windowSize": "large",
        "windowStyle": "None"
      },
      "confirmationStatus": "None",
      "dialogActionType": "ElicitSlot",
      "slotToElicit": "windowStyle",
      "checkpointLabel": "windowSelection"
    }
  ],
  "sessionAttributes": {},
  "sessionId": "XXX",
  "dialogAction": {
    "type": "ElicitSlot",
    "intentName": "OrderCurtains",
    "slots": {
      "curtainSize": "None",
      "curtainStyle": "None"
    },
    "slotToElicit": "curtainSize"
  }
}
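The final PutSession call is not shown in the original listings. Here is a minimal sketch of what it might look like, reusing the boto3 client created earlier and the values from the filtered GetSession response; the bot name, alias, and user ID are the same illustrative values used above.

# Resume the OrderWindows intent where the user left off (sketch, not from the original post)
response = client.put_session(
    botName='HomeImprovementBot',
    botAlias='Prod',
    userId='abc123',
    dialogAction={
        'type': 'ElicitSlot',            # pick up the dialog at the next unfilled slot
        'intentName': 'OrderWindows',
        'slots': {'windowSize': 'large', 'windowStyle': None},
        'slotToElicit': 'windowStyle'
    }
)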

Getting started with Session API checkpoints

In this post, you learned how to use Session API checkpoints to manage multiple digressions. You can define Session API checkpoints using the AWS SDK. You can download the bot schema for the conversation in this post to implement a quick application. For more information, see the Amazon Lex documentation.


About the Author

Shahab Shekari works as a Software Development Engineer at Amazon AI. He works on scalable distributed systems and enhancing Lex user experiences. Outside of work, he can be found traveling and enjoying the Pacific Northwest with his dogs, friends and family.


Verifying and adjusting your data labels to create higher quality training datasets with Amazon SageMaker Ground Truth

Building a highly accurate training dataset for your machine learning (ML) algorithm is an iterative process. It is common to review and continuously adjust your labels until you are satisfied that the labels accurately represent the ground truth, or what is directly observable in the real world. ML practitioners often built custom systems to review and update data labels because accurately labeled data is critical to ML model quality. If there are issues with the labels, the ML model can’t effectively learn the ground truth, which leads to inaccurate predictions.

One way that ML practitioners have improved the accuracy of their labeled data is through using audit workflows. Audit workflows enable a group of reviewers to verify the accuracy of labels (a process called label verification) or adjust them (a process called label adjustment) if needed.

Amazon SageMaker Ground Truth now features built-in workflows for label verification, and label adjustment for bounding boxes and semantic segmentation. With these new workflows, you can chain an existing Amazon SageMaker Ground Truth labeling job to a verification or adjustment job, or you can import your existing labels for a verification or adjustment job.

This post walks you through both options for bounding box labels. The walkthrough assumes that you are familiar with running a labeling job or have existing labels. For more information, see Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%.

Chaining a completed Amazon SageMaker Ground Truth labeling job

To chain a completed labeling job, complete the following steps.

  1. From the Amazon SageMaker Ground Truth console, choose Labeling jobs.
  2. Select your desired job.
  3. From the Actions drop-down menu, choose Chain.

The following screenshot shows the Labeling jobs page:

For more information, see Chaining labeling jobs.

The Job overview page carries forward the configurations you used for your chained job. If there are no changes, you can move to the next section, Task type.

Configuring label verification

To use label verification, from Task type, choose Label verification.

See the following screenshot of the Task type page:

The Workers section is preconfigured to the selections you made for the chained labeling job. You can opt to choose a different workforce or stick with the same configurations for your label verification job. For more information, see Managing Your Workforce.

You can define your verification labels, for example, Label Correct, Label Incorrect – Object(s) Missed, and Label Incorrect – Box(es) Not Tightly Drawn.

You can also specify the instructions in the left-hand panel to guide reviewers on how to verify the labels.

See the following screenshot of the Label verification tool page:

Configuring label adjustment

To perform label adjustment, from the Task type section, choose Bounding box. See the following screenshot of the Task type page:

The following steps for configuring the Workers section and setting up the labeling tool are similar to creating a verification job. The one exception is that you must opt into displaying existing labels in the Existing-labels display options section. See the following screenshot:

Uploading your existing labels from outside Amazon SageMaker Ground Truth

If you labeled your data outside of Amazon SageMaker Ground Truth, you can still use the service to verify or adjust your labels. Import your existing labels by following these steps.

  1. Create an augmented manifest with both your data and existing labels. For example, in the following code, source-ref points to the images that were labeled, and the “bound-box” attribute is the label.
    {"source-ref": "<S3 location of image 1>", "bound-box": <bounding box label>}
    {"source-ref": "<S3 location of image 2>", "bound-box": <bounding box label>}

  2. Save your augmented manifest in Amazon S3. You should save the manifest in the same S3 bucket as your images. Also, remember the attribute name of your labels (in this post, bound-box) because you need to point to it when you set up your jobs. Additionally, make sure that the labels conform to the label format prescribed by Amazon SageMaker Ground Truth; for example, you can see the label format for bounding boxes in Bounding Box Job Output, and a sketch of a manifest in that format appears after this list. You are now ready to create verification and adjustment jobs.
  3. From the Amazon SageMaker Ground Truth console, create a new labeling job.
  4. In Job overview, for Input dataset location, point to the S3 path of the augmented manifest that you created. See the following screenshot of the Job overview page:
  5. Follow the steps previously outlined to configure Task Type, Workers, and the labeling tool when setting up your verification or adjustment job.
  6. In Existing-labels display options, for Label attribute name, select the label attribute name from your augmented manifest (in this post, bound-box) from the drop-down menu. See the following screenshot of Existing-labels display options:
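As referenced in step 2 above, the following Python sketch writes an augmented manifest that contains an existing bounding box label. The S3 path, image size, and class name are placeholders, and the label structure follows the bounding box output format described in Bounding Box Job Output; adjust it to match your own labels.

import json

record = {
    "source-ref": "s3://my-bucket/images/image1.jpg",  # placeholder S3 location
    "bound-box": {
        "image_size": [{"width": 640, "height": 480, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 120, "top": 80, "width": 200, "height": 150}
        ],
    },
    "bound-box-metadata": {
        "class-map": {"0": "dog"},
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "objects": [{"confidence": 1.0}],
        "creation-date": "2019-07-10T00:00:00",
        "job-name": "existing-labels",
    },
}

# Each image gets its own JSON object on a single line of the manifest file
with open("output.manifest", "w") as manifest:
    manifest.write(json.dumps(record) + "\n")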

Conclusion

A highly accurate training dataset is critical for achieving your ML initiatives, and you now have built-in workflows to perform label verification and adjustment through Amazon SageMaker Ground Truth. This post walked you through how to use the new label verification and adjustment features. You can chain a completed labeling job, or you can upload labels. Visit the Amazon SageMaker Ground Truth console to get started.

As always, AWS welcomes feedback. Please submit any comments or questions.


About the Authors

Sifan Wang is a Software Development Engineer for AWS AI. His focus is on building scalable systems to process big data and intelligent systems to learn from the data. In his spare time, he enjoys traveling and hitting the gym.


Carter Williams is a Web Development Engineer on the Mechanical Turk Requester CX team with a focus in Computer Vision UIs. He strives to learn and develop new ways to gather accurate annotation data in intuitive ways using web technologies. In his free time, he enjoys paintball, hockey, and snowboarding.


Vikram Madan is the Product Manager for Amazon SageMaker Ground Truth. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys running long distances and watching documentaries.


Amazon Textract is now HIPAA eligible

Today, Amazon Web Services (AWS) announced that Amazon Textract, a machine learning service that quickly and easily extracts text and data from scanned documents, is now eligible for healthcare workloads that must comply with HIPAA. This launch builds upon the existing portfolio of HIPAA-eligible AWS artificial intelligence services, including Amazon Translate, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon SageMaker, and Amazon Rekognition, that help deliver better healthcare outcomes.

Healthcare providers routinely extract text and data from documents such as medical records and forms through manual data entry or simple optical character recognition (OCR) software. This is a time-consuming and often inaccurate process that produces outputs requiring extensive post-processing before they can be used by other applications. What organizations want instead is the ability to accurately identify and extract text and data from forms and tables in documents of any format and from a variety of file types and templates.

Amazon Textract analyzes virtually any type of document, automatically generating highly accurate text, form, and table data. Amazon Textract identifies text and data from tables and forms in documents – such as patient information from an insurance claim or values from a table in a scanned medical chart – and recognizes a range of document formats, including those specific to healthcare and insurance, without requiring any customization or human intervention. Amazon Textract makes it easy for customers to accurately process millions of document pages in a matter of hours, significantly lowering document processing costs, and allowing customers to focus on deriving business value from their text and data instead of wasting time and effort on post-processing. Results are delivered via an API that can be easily accessed and used without requiring any machine learning experience.

Starting today, Amazon Textract is a HIPAA-eligible service, which means healthcare customers can take full advantage of it. Many healthcare customers, like Cerner, Fred Hutchinson Cancer Research Center, and the American Heart Association, are already exploring new ways to use the power of ML to automate their current workloads and transform how they provide care to patients, all while meeting the security and privacy requirements of HIPAA.

Change Healthcare is a leading independent healthcare technology company that provides data and analytics-driven solutions to improve clinical, financial, and patient engagement outcomes in the U.S. healthcare system. “At Change Healthcare, we believe that we can make healthcare affordable and accessible to all by improving the timeliness and quality of financial and administrative decisions.  This can be achieved by the power of machine learning technology to understand more from our data. But unlocking the potential of this information can often be difficult as it’s siloed in tables and forms that traditional optical character recognition hasn’t been able to analyze,” said Nick Giannasi, EVP and Chief AI Officer at Change Healthcare. “Amazon Textract further advances document understanding with the ability to retrieve structured data in addition to text, and now with the service becoming HIPAA eligible, we’ll be able to liberate the information from millions of documents and create even more value for patients, payers, and providers.”

Cambia Health Solutions is a total health solutions company and the parent company of six regional health plans, including Regence, an insurer serving 2.6 million members in Oregon, Idaho, Utah, and Washington. Cambia is transforming the health care system to be more economically sustainable and efficient for people and their families. “Over the past 100 years, Cambia has been dedicated to improving health care for people and their families. To help us achieve that goal, we’re always evaluating new innovations and opportunities to optimize care coordination. One area of focus is streamlining administrative processes that are time and labor intensive. We’re excited to explore Amazon Textract to help us automate the process of extracting valuable data from paper forms accurately and efficiently. The powerful combination of data science, A.I., and a person-focused approach is key to our mission of transforming the health care system,” said Faraz Shafiq, Cambia Health Solutions Chief Artificial Intelligence Officer.

ClearDATA is a HITRUST-certified AWS Managed Service Provider trusted by customers across the globe to safeguard their sensitive data and power their critical applications. Matt Ferrari, Chief Technology Officer at ClearDATA, says, “It’s exciting to see AWS add their optical character recognition service powered by machine learning, Amazon Textract, to their list of HIPAA eligible services. A lot of medical data that is shared among payers and providers is locked in image-based files like PDFs. Instead of manually processing that kind of data, healthcare organizations can now use the Amazon Textract service to extract medical data from files that previously have been non-machine readable. This brings an opportunity to integrate this data with their electronic health records, or other cloud technologies like Amazon Comprehend Medical that can identify protected health information in the dataset. This is just another step forward in increasing the opportunity to use these emerging technologies to improve access to data, get better insights, lower costs, and improve patient and member experiences.” ClearDATA offers solutions and services that protect healthcare organizations from data privacy risks, improve their data management, and scale their healthcare IT infrastructure, along with one of the most comprehensive Business Associate Agreements in the healthcare industry.

For additional information on Amazon Machine Learning services and how healthcare and life sciences companies can run HIPAA-eligible workloads on AWS please reference the following materials:

To get started with Amazon Textract, you can click the “Get Started with Amazon Textract” button on the Amazon Textract page. You must have an Amazon Web Services account; if you do not already have one, you will be prompted to create one during the process. Once you are signed in to your AWS account, try out Amazon Textract with your own images or PDF documents using the Amazon Textract Management Console. You can also download the Amazon Textract SDKs to start creating your own applications. Please refer to our step-by-step Getting Started Guide for more information.
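As an illustration of using the SDK, here is a minimal sketch with the AWS SDK for Python (boto3) that detects the lines of text in a local scanned document; the file name is a placeholder.

import boto3

textract = boto3.client('textract')

# Read a local scanned document (placeholder file name) and detect its text
with open('claim-form.png', 'rb') as document:
    response = textract.detect_document_text(Document={'Bytes': document.read()})

# Print each detected line of text
for block in response['Blocks']:
    if block['BlockType'] == 'LINE':
        print(block['Text'])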


About the author

Kriti Bharti is the Product Lead for Amazon Textract. Kriti has over 15 years’ experience in Product Management, Program Management, and Technology Management across multiple industries such as Healthcare, Banking and Finance, and Retail. In her time at AWS, she has helped launch a number of new services including AWS IoT Device Management and AWS IoT Device Defender. In her spare time, you can find Kriti having a pawsome time with Fifi and her cousins, reading, or learning different dance forms.

Managing conversation flow with a fallback intent on Amazon Lex

Ever been stumped by a question? Imagine you’re in a business review going over weekly numbers and someone asks, “What about expenses?” Your response might be, “I don’t know. I wasn’t prepared to have that discussion right now.”

Bots aren’t fortunate enough to have the same comprehension capabilities, so how should they respond when they don’t have an answer? How can a bot recover when it doesn’t have the response? Asking you to repeat yourself could be quite frustrating if the bot still doesn’t understand. Perhaps it can pretend to understand what you said based on the last exchange? That might not always work and could also sound foolish. Maybe the bot can admit its limitations and tell you what it can do? That would be acceptable the first few times but can be suboptimal in the long run.

There is no single correct way. Conversation repair strategies vary with the kind of experience you’re trying to create. You can use error handling prompts: the bot tries to clarify by prompting “Sorry, can you please say that again?” a few times before hanging up with a message such as, “I am not able to assist you at this time.”

Building on the sample conversation above, let us first build a simple chatbot to answer questions related to revenue numbers. This bot answers questions such as “What’s the revenue in Q1?” and “What were our sales in the western region?” The Amazon Lex bot contains only two intents: RegionDetails and QuarterDetails. With this bot definition, if someone were to discuss expenses (“How much did we spend last quarter?”), the bot would go through the clarification prompts and eventually hang up. You couldn’t intervene or execute business logic. The conversation would resemble the following:

Starting today, you can add a fallback intent to help your bot recover gracefully in such situations. With a fallback intent, you can control the bot’s recovery by providing additional information, managing the dialog, or executing business logic. You can better control the conversation and manage the flow toward an ideal outcome, such as the following:

Configuring the fallback intent

You can configure your fallback intent by completing the following steps.

  1. From the Amazon Lex console, choose Create intent.
  2. Search for AMAZON.Fallback in the existing intents.

See the following screenshot of the BusinessMetricsFallback page:

If you have clarification prompts configured, the fallback intent is triggered only after the clarification prompts are executed, so we recommend disabling the clarification prompts. The hang-up phrase is not used when a fallback intent is configured. See the following screenshot of the Error handling page:

  3. Add an intent ContactDetails to collect the email ID.

This is a simple intent with just the email address as a slot type. Please review the bot definition for intent details.

  4. Add an AWS Lambda function in the fulfillment code hook of the fallback intent.

This function performs two operations. First, it creates a task (for example, a ticket entry in a database) to record your request for an operator follow-up. Second, it switches the intent to elicit additional information, such as your email ID, so that a response goes out after an operator has processed the query. Please review the Lambda definition for code details.
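The full function is in the linked Lambda definition. As a rough Python sketch of the two operations, it might look like the following; the DynamoDB table name, slot name, and message text are assumptions, create_ticket is a hypothetical helper, and the response follows the Lex V1 Lambda format.

import boto3

def create_ticket(utterance):
    # Hypothetical helper: record the unanswered question, for example as an item
    # in a DynamoDB table, so that an operator can follow up later
    table = boto3.resource('dynamodb').Table('OperatorFollowUps')
    table.put_item(Item={'question': utterance})

def lambda_handler(event, context):
    # The fallback intent receives the original user input in inputTranscript
    create_ticket(event.get('inputTranscript') or 'unknown question')

    # Switch the conversation to the ContactDetails intent and elicit the email slot
    return {
        'sessionAttributes': event.get('sessionAttributes') or {},
        'dialogAction': {
            'type': 'ElicitSlot',
            'intentName': 'ContactDetails',
            'slots': {'email': None},
            'slotToElicit': 'email',
            'message': {
                'contentType': 'PlainText',
                'content': "I don't have that information right now. "
                           "What's your email address so we can follow up?"
            }
        }
    }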

With the preceding bot definition, you can now control the conversation. When you ask “How much did we spend last quarter?”, the input does not match any of the configured intents and triggers the fallback intent. The Lambda function in the fulfillment code hook creates the ticket and switches the intent to ContactDetails to capture the email ID.

Summary

This post demonstrated how to have better control of the conversation flow with a fallback intent. You can switch intents, execute business logic, or provide custom responses. For more information about incorporating these techniques into real bots, see the Amazon Lex documentation.



About the Author

Kartik Rustagi works as a Software Development Manager in Amazon AI. He and his team focus on enhancing the conversation capability of chat bots powered by Amazon Lex. When not at work, he enjoys exploring the outdoors and savoring different cuisines.


Generating searchable PDFs from scanned documents automatically with Amazon Textract

Amazon Textract is a machine learning service that makes it easy to extract text and data from virtually any document. Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables. This allows you to use Amazon Textract to instantly “read” virtually any type of document and accurately extract text and data without the need for any manual effort or custom code.

The blog post Automatically extract text and structured data from documents with Amazon Textract shows how to use Amazon Textract to automatically extract text and data from scanned documents without any machine learning (ML) experience. One of the use cases covered in the post is search and discovery. You can search through millions of documents by extracting text and structured data from documents with Amazon Textract and creating a smart index using Amazon ES.

This post demonstrates how to generate searchable PDF documents by extracting text from scanned documents using Amazon Textract. The solution allows you to download relevant documents, search within a document when it is stored offline, or select and copy text.

You can see an example of a searchable PDF document that was generated from a scanned document using Amazon Textract. While the text is locked in images in the scanned document, you can select, copy, and search text in the searchable PDF document.

To generate a searchable PDF, use Amazon Textract to extract text from documents and add the extracted text as a layer to the image in the PDF document. Amazon Textract detects and analyzes text input documents and returns information about detected items such as pages, words, lines, form data (key-value pairs), tables, and selection elements. It also provides bounding box information, which is an axis-aligned coarse representation of the location of the recognized item on the document page. You can use the detected text and its bounding box information to place text in the PDF page.

PDFDocument is a sample library in the AWS Samples GitHub repo that provides the necessary logic to generate a searchable PDF document using Amazon Textract. It uses the open-source Java library Apache PDFBox to create PDF documents, but similar PDF processing libraries are available in other programming languages.

The following code example shows how to use the sample library to generate a searchable PDF document from an image:

...

//Extract text using Amazon Textract
List<TextLine> lines = extractText(imageBytes);

//Generate searchable PDF with image and text
PDFDocument doc = new PDFDocument();
doc.addPage(image, imageType, lines);

//Save PDF to local disk
try(OutputStream outputStream = new FileOutputStream(outputDocumentName)) {
    doc.save(outputStream);
}

...

Generating a searchable PDF from an image document

The following code shows how to take an image document and generate a corresponding searchable PDF document. Extract the text using Amazon Textract and create a searchable PDF by adding the text as a layer with the image.

public class DemoPdfFromLocalImage {

    public static void run(String documentName, String outputDocumentName) throws IOException {

        System.out.println("Generating searchable pdf from: " + documentName);

        ImageType imageType = ImageType.JPEG;
        if(documentName.toLowerCase().endsWith(".png"))
            imageType = ImageType.PNG;

        //Get image bytes
        ByteBuffer imageBytes = null;
        try(InputStream in = new FileInputStream(documentName)) {
            imageBytes = ByteBuffer.wrap(IOUtils.toByteArray(in));
        }

        //Extract text
        List<TextLine> lines = extractText(imageBytes);

        //Get Image
        BufferedImage image = getImage(documentName);

        //Create new pdf document
        PDFDocument pdfDocument = new PDFDocument();

        //Add page with text layer and image in the pdf document
        pdfDocument.addPage(image, imageType, lines);

        //Save PDF to local disk
        try(OutputStream outputStream = new FileOutputStream(outputDocumentName)) {
            pdfDocument.save(outputStream);
            pdfDocument.close();
        }

        System.out.println("Generated searchable pdf: " + outputDocumentName);
    }
    
    private static BufferedImage getImage(String documentName) throws IOException {

        BufferedImage image = null;

        try(InputStream in = new FileInputStream(documentName)) {
            image = ImageIO.read(in);
        }

        return image;
    }

    private static List<TextLine> extractText(ByteBuffer imageBytes) {

        AmazonTextract client = AmazonTextractClientBuilder.defaultClient();

        DetectDocumentTextRequest request = new DetectDocumentTextRequest()
                .withDocument(new Document()
                        .withBytes(imageBytes));

        DetectDocumentTextResult result = client.detectDocumentText(request);

        List<TextLine> lines = new ArrayList<TextLine>();
        List<Block> blocks = result.getBlocks();
        BoundingBox boundingBox = null;
        for (Block block : blocks) {
            if ((block.getBlockType()).equals("LINE")) {
                boundingBox = block.getGeometry().getBoundingBox();
                lines.add(new TextLine(boundingBox.getLeft(),
                        boundingBox.getTop(),
                        boundingBox.getWidth(),
                        boundingBox.getHeight(),
                        block.getText()));
            }
        }

        return lines;
    }
}

Generating a searchable PDF from a PDF document

The following code example takes an input PDF document from an Amazon S3 bucket and generates the corresponding searchable PDF document. You extract text from the PDF document using Amazon Textract, and create a searchable PDF by adding text as a layer with an image for each page.

public class DemoPdfFromS3Pdf {
    public static void run(String bucketName, String documentName, String outputDocumentName) throws IOException, InterruptedException {

        System.out.println("Generating searchable pdf from: " + bucketName + "/" + documentName);

        //Extract text using Amazon Textract
        List<ArrayList<TextLine>> linesInPages = extractText(bucketName, documentName);

        //Get input pdf document from Amazon S3
        InputStream inputPdf = getPdfFromS3(bucketName, documentName);

        //Create new PDF document
        PDFDocument pdfDocument = new PDFDocument();

        //For each page add text layer and image in the pdf document
        PDDocument inputDocument = PDDocument.load(inputPdf);
        PDFRenderer pdfRenderer = new PDFRenderer(inputDocument);
        BufferedImage image = null;
        for (int page = 0; page < inputDocument.getNumberOfPages(); ++page) {
            image = pdfRenderer.renderImageWithDPI(page, 300, org.apache.pdfbox.rendering.ImageType.RGB);

            pdfDocument.addPage(image, ImageType.JPEG, linesInPages.get(page));

            System.out.println("Processed page index: " + page);
        }

        //Save PDF to stream
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        pdfDocument.save(os);
        pdfDocument.close();
        inputDocument.close();

        //Upload PDF to S3
        UploadToS3(bucketName, outputDocumentName, "application/pdf", os.toByteArray());

        System.out.println("Generated searchable pdf: " + bucketName + "/" + outputDocumentName);
    }

    private static List<ArrayList<TextLine>> extractText(String bucketName, String documentName) throws InterruptedException {

        AmazonTextract client = AmazonTextractClientBuilder.defaultClient();

        StartDocumentTextDetectionRequest req = new StartDocumentTextDetectionRequest()
                .withDocumentLocation(new DocumentLocation()
                        .withS3Object(new S3Object()
                                .withBucket(bucketName)
                                .withName(documentName)))
                .withJobTag("DetectingText");

        StartDocumentTextDetectionResult startDocumentTextDetectionResult = client.startDocumentTextDetection(req);
        String startJobId = startDocumentTextDetectionResult.getJobId();

        System.out.println("Text detection job started with Id: " + startJobId);

        GetDocumentTextDetectionRequest documentTextDetectionRequest = null;
        GetDocumentTextDetectionResult response = null;

        String jobStatus = "IN_PROGRESS";

        while (jobStatus.equals("IN_PROGRESS")) {
            System.out.println("Waiting for job to complete...");
            TimeUnit.SECONDS.sleep(10);
            documentTextDetectionRequest = new GetDocumentTextDetectionRequest()
                    .withJobId(startJobId)
                    .withMaxResults(1);

            response = client.getDocumentTextDetection(documentTextDetectionRequest);
            jobStatus = response.getJobStatus();
        }

        int maxResults = 1000;
        String paginationToken = null;
        Boolean finished = false;

        List<ArrayList<TextLine>> pages = new ArrayList<ArrayList<TextLine>>();
        ArrayList<TextLine> page = null;
        BoundingBox boundingBox = null;

        while (finished == false) {
            documentTextDetectionRequest = new GetDocumentTextDetectionRequest()
                    .withJobId(startJobId)
                    .withMaxResults(maxResults)
                    .withNextToken(paginationToken);
            response = client.getDocumentTextDetection(documentTextDetectionRequest);

            //Show blocks information
            List<Block> blocks = response.getBlocks();
            for (Block block : blocks) {
                if (block.getBlockType().equals("PAGE")) {
                    page = new ArrayList<TextLine>();
                    pages.add(page);
                } else if (block.getBlockType().equals("LINE")) {
                    boundingBox = block.getGeometry().getBoundingBox();
                    page.add(new TextLine(boundingBox.getLeft(),
                            boundingBox.getTop(),
                            boundingBox.getWidth(),
                            boundingBox.getHeight(),
                            block.getText()));
                }
            }
            paginationToken = response.getNextToken();
            if (paginationToken == null)
                finished = true;
        }

        return pages;
    }

    private static InputStream getPdfFromS3(String bucketName, String documentName) throws IOException {

        AmazonS3 s3client = AmazonS3ClientBuilder.defaultClient();
        com.amazonaws.services.s3.model.S3Object fullObject = s3client.getObject(new GetObjectRequest(bucketName, documentName));
        InputStream in = fullObject.getObjectContent();
        return in;
    }

    private static void UploadToS3(String bucketName, String objectName, String contentType, byte[] bytes) {
        AmazonS3 s3client = AmazonS3ClientBuilder.defaultClient();
        ByteArrayInputStream baInputStream = new ByteArrayInputStream(bytes);
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        metadata.setContentType(contentType);
        PutObjectRequest putRequest = new PutObjectRequest(bucketName, objectName, baInputStream, metadata);
        s3client.putObject(putRequest);
    }
}

Running code on a local machine

To run the code on a local machine, complete the following steps. The code examples are available on the GitHub repo.

  1. Set up your AWS Account and AWS CLI.

For more information, see Getting Started with Amazon Textract.

  2. Download and unzip searchablepdf.zip from the GitHub repo.
  3. Install Apache Maven if it is not already installed.
  4. In the project directory, run mvn package.
  5. Run java -cp target/searchable-pdf-1.0.jar Demo.

This runs the Java project with Demo as the main class.

By default, only the first example to create a searchable PDF from an image on a local drive is enabled. To run other examples, uncomment the relevant lines in Demo class.

Running code in Lambda

To run the code in Lambda, complete the following steps. The code examples are available on the GitHub repo.

  1. Download and unzip searchablepdf.zip from the GitHub repo.
  2. Install Apache Maven if it is not already installed.
  3. In the project directory, run mvn package.

The build creates a .jar in project-dir/target/searchable-pdf1.0.jar, using information in the pom.xml to do the necessary transforms. This is a standalone .jar (.zip file) that includes all the dependencies. This is your deployment package that you can upload to Lambda to create a function. For more information, see AWS Lambda Deployment Package in Java. DemoLambda has all the necessary code to read S3 events and take action based on the type of input document.

  4. Create a Lambda function with the Java 8 runtime and an IAM role that has read and write permissions to the S3 bucket you created earlier.
  5. Configure the IAM role to also have permissions to call Amazon Textract.
  6. Set the handler to DemoLambda::handleRequest.
  7. Increase the timeout to 5 minutes.
  8. Upload the .jar file you built earlier.
  9. Create an S3 bucket.
  10. In the S3 bucket, create a folder labeled documents.
  11. Add a trigger in the Lambda function so that when an object uploads to the documents folder, the Lambda function executes.

Make sure that you set the trigger for the documents folder. If you add a trigger for the whole bucket, the function also triggers every time an output PDF document is generated.

  12. Upload an image (.jpeg or .png) or PDF document to the documents folder in your S3 bucket.

In a few seconds, you should see the searchable PDF document in your S3 bucket.

These steps show a simple S3 and Lambda integration. For large-scale document processing, see the reference architecture in the following GitHub repo.

Conclusion

This post showed how to use Amazon Textract to generate searchable PDF documents automatically. You can search across millions of documents to find the relevant file by creating a smart search index using Amazon ES. Searchable PDF documents then allow you to select and copy text and to search within a document after downloading it for offline use.

To learn more about different text and data extraction features of Amazon Textract, see How Amazon Textract Works.


About the Authors

Kashif Imran is a Solutions Architect at Amazon Web Services. He works with some of the largest strategic AWS customers to provide technical guidance and design advice. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.


Transcribe speech to text in real time using Amazon Transcribe with WebSocket

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. In November 2018, we added streaming transcriptions over HTTP/2 to Amazon Transcribe. This enabled users to pass a live audio stream to our service and, in return, receive text transcripts in real time. We are excited to share that we recently started supporting real-time transcriptions over the WebSocket protocol. WebSocket support makes streaming speech-to-text through Amazon Transcribe more accessible to a wider user base, especially for those who want to build browser or mobile-based applications.

In this blog post, we assume that you are aware of our streaming transcription service running over HTTP/2, and focus on showing you how to use the real-time offering over WebSocket. However, for reference on using HTTP/2, you can read our previous blog post and tech documentation.

What is WebSocket?

WebSocket is a full-duplex communication protocol built over TCP. The protocol was standardized by the IETF as RFC 6455 in 2011. WebSocket is suitable for long-lived connectivity whereby both the server and the client can transmit data over the same connection at the same time. It is also practical for cross-domain usage. Voila! No need to worry about cross-origin resource sharing (CORS) as there would be when using HTTP.

Using Amazon Transcribe streaming with WebSocket

To use Amazon Transcribe’s StartStreamTranscriptionWebSocket API, you first need to authorize your IAM user to use Amazon Transcribe streaming over WebSocket. In the AWS Management Console, navigate to Identity and Access Management (IAM) and attach the following inline policy to your user. Please refer to “To embed an inline policy for a user or role” for instructions on how to add permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Your upgrade request should be pre-signed with your AWS credentials using AWS Signature Version 4. The request should contain the required parameters: sample-rate, language-code, and media-encoding. You can optionally supply vocabulary-name to use a custom vocabulary. The StartStreamTranscriptionWebSocket API supports all of the languages that Amazon Transcribe streaming supports today. After your connection is upgraded to WebSocket, you can send your audio chunks as AudioEvents using event stream encoding in binary WebSocket frames. The response you get is the transcript JSON, which is also event stream encoded. For more details, please refer to our tech docs.
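To illustrate the pre-signing step, here is a minimal Python sketch that builds a pre-signed WebSocket URL with botocore's Signature Version 4 query signer. The Region, endpoint format, and parameter values are assumptions to adapt to your setup; it is a sketch, not the official client library.

import botocore.session
from botocore.auth import SigV4QueryAuth
from botocore.awsrequest import AWSRequest

region = 'us-east-1'  # assumption: use your own Region
endpoint = f'wss://transcribestreaming.{region}.amazonaws.com:8443/stream-transcription-websocket'

# Required query parameters for the upgrade request
params = {
    'language-code': 'en-US',
    'media-encoding': 'pcm',
    'sample-rate': '16000',
}

credentials = botocore.session.Session().get_credentials()
request = AWSRequest(method='GET', url=endpoint, params=params)

# Sign the request with Signature Version 4 as query parameters (valid for 5 minutes)
SigV4QueryAuth(credentials, 'transcribe', region, expires=300).add_auth(request)

presigned_url = request.url  # hand this URL to your WebSocket client
print(presigned_url)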

To demonstrate how you can power your application with Amazon Transcribe in real time with WebSocket, we built a sample static website. On the website you can enter your account credentials, choose one of the preferred languages, and start streaming. The complete sample code is available on GitHub. JavaScript developers, among others, may find this to be a helpful start. We’d love to see what other cool applications you can build using Amazon Transcribe streaming with WebSocket!


About the authors

Bhaskar Bagchi is an engineer in the Amazon Transcribe service team. Outside of work, Bhaskar enjoys photography and singing.


Karan Grover is an engineer in the Amazon Transcribe service team. Outside of work, Karan enjoys hiking and is a photography enthusiast.


Paul Zhao is a Product Manager at AWS Machine Learning. He manages the Amazon Transcribe service. Outside of work, Paul is a motorcycle enthusiast and avid woodworker.


Using Amazon Polly in Windows Applications

AWS offers a vast array of services that allow developers to build applications in the cloud, and Windows desktop applications can take advantage of these services as well. Today, we are releasing Amazon Polly for Windows, an open-source engine that allows users to take advantage of Amazon Polly voices in SAPI-compliant Windows applications.

What is SAPI? SAPI (Speech Application Programming Interface) is a Microsoft Windows API that allows desktop applications to implement speech synthesis. When an application supports SAPI, it can access any of the installed SAPI voices to generate speech.

Out of the box, Microsoft Windows provides one male and one female SAPI voice that can be used in any supported voice application. With Amazon Polly for Windows, users can install more than 50 additional voices across more than 25 languages, paying only for what they use. For more details, please visit the Amazon Polly documentation and check the full list of text-to-speech voices.

Create an AWS account

If you don’t already have an AWS account, you can sign up here, which gives you 12 months in our free tier. During the first 12 months, Amazon Polly is free for the first 5 million characters per month. How many characters is that? As an example, “Ulysses” by James Joyce is 730 pages and contains approximately 1.5 million characters. So you could have Amazon Polly read the entire book three times and still have an additional 500,000 free characters for the remainder of the month.

Configure your account

  1. Log in to your AWS account.
  2. After you’ve logged in, click Services from the top menu bar, then type IAM in the search box. Click IAM when it pops up.
  3. On the left, click Users
  4. Click Add User
  5. Type in polly-windows-user (you can use any name)
  6. Click the Programmatic access check box and leave AWS Management Console access unchecked
  7. Click Next: Permissions
  8. Click Attach existing policies directly
  9. At the bottom of the page, in the search box next to Filter: Policy type, type polly
  10. Click the check box next to AmazonPollyReadOnlyAccess
  11. Click Next: Review
  12. Click Create user

IMPORTANT: Don’t close the webpage. You’ll need both the access key ID and the secret access key in Step 3.

Step 2: Install the AWS CLI for Windows

Click here to download the AWS CLI for Windows.

Step 3: Configure the AWS client

Amazon Polly for Windows requires an AWS profile called polly-windows. This ensures that the Amazon Polly engine is using the correct account.

  1. Open a Windows command prompt
  2. Type this command:
    aws configure --profile polly-windows 

  3. When prompted for the AWS Access Key ID and AWS Secret Access Key, use the values from the previous step.
  4. For Default Region, you can hit Enter for the default (us-east-1) or enter a different Region. Make sure to use all lower-case.
  5. For Default output format, just hit Enter
  6. Verify this worked by running the following command. You should see a list of voices:
    aws --profile polly-windows polly describe-voices 

Step 4: Install Amazon Polly TTS Engine for Windows

Click here to download and run the installer. You can verify that the installer worked properly: Amazon Polly for Windows comes with PollyPlayer, an application that allows you to experiment with the voices without additional software. Simply pick a voice, enter text, and then click Say It.

Using Amazon Polly Voices in Applications

The Amazon Polly voices are accessible in any Windows application that implements Windows SAPI. This means that after the Amazon Polly voices are installed, you simply need to select the Amazon Polly voice that you want to use from the list of voices in the application.

Amazon Polly supports SSML (Speech Synthesis Markup Language), which allows users to add tags to customize the speech generation. With Amazon Polly for Windows, users can submit requests as either plain text or SSML. The standard Amazon Polly limits apply: a maximum of 3,000 billed characters per request, or 6,000 characters in total (SSML tags are not billed).
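The Windows engine handles the API calls for you, but for illustration, here is a hedged sketch of the equivalent direct request with the AWS SDK for Python (boto3), reusing the polly-windows profile configured earlier; the SSML content, voice, and output format are just examples.

import boto3

session = boto3.Session(profile_name='polly-windows')
polly = session.client('polly')

# An example SSML prompt with a pause and a slower phrase
ssml = (
    '<speak>'
    'Hello! <break time="500ms"/> '
    'This sentence is spoken <prosody rate="slow">a little more slowly.</prosody>'
    '</speak>'
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType='ssml',      # plain text requests would use TextType='text'
    VoiceId='Salli',
    OutputFormat='mp3',
)

# Save the synthesized audio to a local file
with open('speech.mp3', 'wb') as audio_file:
    audio_file.write(response['AudioStream'].read())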

Example: Using Amazon Polly for Windows with Adobe Captivate

Building eLearning content is a great use case for generated speech. In the past, content managers would need to record voice content, and then re-record as content changes. Using an eLearning designer such as Adobe Captivate along with Amazon Polly voices allows you to easily create and dynamically update content whenever you need.

You can use any SAPI-enabled eLearning solution. In this demonstration, we walk through creating a simple slide with Captivate to show how quickly and easily you can add voice content. If you don’t already have Captivate, you can download a free trial here.

Step 1: Create a project

Start Captivate and click New Project / Blank Project to create a new project.

At this point, you have a new blank project with a single slide.

Step 2: Add speech content

From the Audio menu, click Speech Management.

This brings up a Speech Management modal window, where you can add speech content to the slide. Click the Speech Agent drop-down and select Amazon Polly – US English – Salli (Neural). By default, all slides use this voice.

Click the + button to add content.

In the textbox, type My name is Salli. My speech is generated by Amazon Polly.

Now we must generate the audio. Behind the scenes, Captivate uses the Windows SAPI driver to call back to AWS to generate the speech. Click Save and Generate Audio.

After the speech is generated, you can preview the audio by clicking the Play button next to the Generate Audio button.

You hear Salli speaking the text. Click the Close button.

After closing the window, you can preview the entire project to hear the speech with the slide.

The wide selection of Amazon Polly voices allows a content manager to build and experiment with limitless combinations of speech. Because content and voice selections can be updated at any time, content managers can keep both the audio presentation and content fresh without ever having to go near a recording studio.

Now that you’ve installed Amazon Polly for Windows, you can have fun experimenting with different variations of speech using SSML tags, which are fully supported in Windows. And because Amazon Polly for Windows is open source, feel free to contribute features and submit feature requests. You can share feedback at the Amazon Polly forum. We’d love to hear how you’re using Amazon Polly for Windows!


About the Author

Troy Larson is a Senior DevOps Cloud Architect for AWS Professional Services.


Build your ML skills with AWS Machine Learning on Coursera

Machine learning (ML) is one of the fastest growing areas in technology and a highly sought after skillset in today’s job market. Today, I am excited to announce a new education course, built in collaboration with Coursera, to help you build your ML skills: Getting started with AWS Machine Learning. You can access the course content for free now on the Coursera website.

The World Economic Forum [1] states that the growth of artificial intelligence (AI) could create 58 million net new jobs in the next few years. Yet it’s estimated that there are currently only 300,000 AI engineers worldwide, while millions are needed [2]. This means that there is a unique and immediate opportunity for you to get started learning the essential ML concepts that are used to build AI applications, no matter what your skill level. Learning the foundations of ML now will help you keep pace with this growth, expand your skills, and even help advance your career.

Based on the same ML courses used to train engineers at Amazon, this course teaches you how to get started with AWS Machine Learning. Key topics include Machine Learning on AWS, Computer Vision on AWS, and Natural Language Processing (NLP) on AWS. Each topic consists of several modules that dive deep into a variety of ML concepts and AWS services, as well as insights from experts to put the concepts into practice. This course is a great way to build your foundational knowledge of machine learning before diving deeper with the AWS Machine Learning Certification.

How it Works

You can read and view the course content for free on Coursera. If you want to access assessments, take graded assignments, and receive a certificate after the course, it costs $49 in the USA and $29 in Brazil, Russia, Mexico, and India. If you choose the paid route, when you complete the course you’ll receive an electronic certificate that you can print and even add to your LinkedIn profile to showcase your newfound machine learning knowledge.

Enroll now to build your skills towards becoming an ML developer!


About the Author

Tara Shankar Jana is a Senior Product Marketing Manager for AWS Machine Learning. Currently, he is working on building unique and scalable educational offerings for the aspiring ML developer community, to help them expand their skills in ML. Outside of work he loves reading books, traveling, and spending time with his family.



[1] Artificial Intelligence to Create 58 Million New Jobs by 2022, Says Report (Forbes)
[2] Tencent says there are only 300,000 AI engineers worldwide, but millions are needed (The Verge)


Build, test, and deploy your Amazon SageMaker inference models to AWS Lambda

Amazon SageMaker is a fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. When you deploy an ML model, Amazon SageMaker leverages ML hosting instances to host the model and provides an API endpoint to provide inferences. You can also deploy models to edge devices by using AWS IoT Greengrass.

However, thanks to Amazon SageMaker’s flexibility, which allows deployment to different targets, there are situations when hosting the model on AWS Lambda can provide some advantages. Not every model can be hosted on AWS Lambda; for instance, some models need a GPU. Other limits, like the size of the AWS Lambda deployment package, can also prevent you from using this method. When using AWS Lambda is possible, this architecture has advantages such as lower cost, event triggering, seamless scalability, and the ability to absorb request spikes. For example, when the model is small and not invoked often, it may be cheaper to use AWS Lambda.

In this post, I create a pipeline to build, test, and deploy a Lambda function that provides inferences.

Prerequisites

I assume that the reader has experience with Amazon SageMaker, AWS CloudFormation, AWS Lambda, and the AWS Code* suite.

Architecture description

To create the pipeline for CI/CD, use AWS Developer Tools. The suite uses AWS CodeDeploy, AWS CodeBuild, and AWS CodePipeline. Following is a diagram of the architecture:

When I train the model with Amazon SageMaker, the output model is saved into an Amazon S3 bucket. Each time a file is put into the bucket, AWS CloudTrail triggers an Amazon CloudWatch event. This event invokes a Lambda function to check whether the file uploaded is a new model file. It then moves this file to a different S3 bucket. This is necessary because Amazon SageMaker saves other files, like checkpoints, in different folders, along with the model file. But to trigger AWS CodePipeline, there must be a specific file in a specific folder of an S3 bucket.
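As a rough Python sketch of that model-mover function (the bucket names, key layout, and event parsing are assumptions based on the description above, not the code from the repository):

import boto3

s3 = boto3.client('s3')

# Hypothetical names for illustration; the real values come from the CloudFormation template
DESTINATION_BUCKET = 'codepipeline-trigger-bucket'
DESTINATION_KEY = 'model/model.tar.gz'

def lambda_handler(event, context):
    # CloudWatch Events delivers the CloudTrail record for the S3 PutObject call
    request_parameters = event['detail']['requestParameters']
    bucket = request_parameters['bucketName']
    key = request_parameters['key']

    # Ignore checkpoints and other training artifacts; only copy the model file
    if not key.endswith('model.tar.gz'):
        return {'status': 'ignored', 'key': key}

    s3.copy_object(
        Bucket=DESTINATION_BUCKET,
        Key=DESTINATION_KEY,
        CopySource={'Bucket': bucket, 'Key': key},
    )
    return {'status': 'copied', 'source': f'{bucket}/{key}'}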

Therefore, after the model file is moved from the Amazon SageMaker bucket to the destination bucket, AWS CodePipeline is triggered. First, AWS CodePipeline invokes AWS CodeBuild to create three items:

  • The deployment package of the Lambda function.
  • The AWS Serverless Application Model (AWS SAM) template to create the API.
  • The Lambda function to serve the inference.

After this is done, AWS CodePipeline executes the change set to transform the AWS SAM template into an AWS CloudFormation template. When the template executes, AWS CodeDeploy is triggered. AWS CodeDeploy invokes a Lambda function to test whether the newly created Lambda function, which serves the latest version of your model, is working as expected. If so, AWS CodeDeploy shifts the traffic from the old version to the new version of the Lambda function with the newest version of the model. Then the deployment is done.
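For illustration, a pre-traffic test hook of this kind might look roughly like the following sketch; the environment variable and the test payload are hypothetical, and the actual checker function ships with the repository.

import json
import os

import boto3

codedeploy = boto3.client('codedeploy')
lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
    deployment_id = event['DeploymentId']
    hook_execution_id = event['LifecycleEventHookExecutionId']

    status = 'Succeeded'
    try:
        # Invoke the newly deployed inference function version with a sample message
        result = lambda_client.invoke(
            FunctionName=os.environ['NEW_VERSION_ARN'],  # hypothetical environment variable
            Payload=json.dumps({'message': 'Congratulations, you have won a free prize!'}),
        )
        if result['StatusCode'] != 200:
            status = 'Failed'
    except Exception:
        status = 'Failed'

    # Tell CodeDeploy whether to continue shifting traffic or roll back
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=hook_execution_id,
        status=status,
    )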

How the Lambda function deployment package is created

In the AWS CloudFormation template that I created to generate the pipelines, I included a section where I indicate how AWS CodeBuild should create this package. I also outlined how to create the AWS SAM template to generate the API and the Lambda function itself.

Here’s the code example:

- "git clone ${GitRepository}"
- "cd ${GitRepositoryName}"
- "rm -rf .git "
- "ls -al "
- "aws s3 cp s3://${SourceBucket}/${SourceS3ObjectKey} ."
- "tar zxf ${SourceS3ObjectKey}"
- "ls -al"
- "pwd"
- "rm -f ${SourceS3ObjectKey}"
- "aws cloudformation package --template-file samTemplateLambdaChecker.yaml --s3-bucket ${SourceBucket} --output-template-file ../outputSamTemplate.yaml"
- "cp samTemplateLambdaChecker.yaml ../"

In the BuildSpec, I use a GitHub repository to download the necessary files. These files are the Lambda function code, the Lambda function checker (which AWS CodeDeploy uses to check whether the new model works as expected), and the AWS SAM template. In addition, AWS CodeBuild copies the latest model.tar.gz file from S3.

To work, the Lambda function also must have Apache MXNet dependencies. The AWS CloudFormation template that you use creates a Lambda layer that contains the MXNet libraries necessary to run inferences in Lambda. I have not created a pipeline to build the layer, as that isn’t the focus of this post. You can find the steps I used to compile MXNet from Lambda in the following section.

Testing the pipeline

Before proceeding, create a new S3 bucket into which to move the model file:

  1. In the S3 console, choose Create bucket.
  2. For Bucket Name, enter a custom name.
  3. For Region, choose the Region in which to create the pipeline and choose Next.
  4. Enable versioning by selecting Keep all versions of an object in the same bucket and choose Next.
  5. Choose Create bucket.

In this bucket, add three files:

  • An empty file in a zip file called empty.zip. This is necessary because AWS CodeBuild must receive a file when invoked in order to work, although you do not use this file in this case.
  • The file mxnet-layer.zip.
  • A zip file containing the Lambda function that copies the model file from the Amazon SageMaker bucket to the bucket that triggers AWS CodePipeline.

To upload these files:

  1. Open the S3 console.
  2. Choose your bucket.
  3. On the Upload page, choose Add files and select the files.
  4. Choose Next until you can select Upload.

Now that you have created this new bucket, download the AWS CloudFormation template and launch it.

  1. Open the AWS CloudFormation console.
  2. Choose Create Stack.
  3. For Choose a template, select Upload a template to Amazon S3 and select the file.
  4. Choose Next.
  5. Add a Stack name.
  6. Change SourceS3Bucket to the bucket name you have previously created.
  7. Choose Next, then Next again.
  8. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  9. Choose Create.


This creates the pipeline on your behalf and deploys everything necessary. When you train the model in Amazon SageMaker, specify the S3 bucket created by the AWS CloudFormation template as the location for the output model, as shown in the sketch after the following steps. To find the name of this S3 bucket:

  1. Open the AWS CloudFormation console.
  2. Select your Stack Name.
  3. Choose Resources and find ModelS3Location.
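
When you kick off training with the Amazon SageMaker Python SDK, you can point the estimator's output_path at that bucket. The following is only a sketch: the MXNet estimator settings, entry point, and bucket names are placeholders, not the exact training code used for this model.

# Hypothetical sketch: direct the SageMaker training output to the pipeline's model bucket.
from sagemaker.mxnet import MXNet

estimator = MXNet(
    entry_point='train.py',                        # your Gluon training script (placeholder)
    role='arn:aws:iam::<account-id>:role/sagemaker-execution-role',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    framework_version='1.4.1',
    py_version='py3',
    output_path='s3://<ModelS3Location-bucket>/',  # bucket created by the CloudFormation template
)
estimator.fit('s3://<your-training-data-bucket>/sms-spam/')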

To simulate a new model being trained by Amazon SageMaker and uploaded to S3, download a model that I previously trained and uploaded to GitHub.

The model was trained on the SMS Spam Collection dataset provided by the University of California, using a neural network built with Gluon, the imperative API for Apache MXNet. You can also view the workshop from re:Invent 2018 that covers how to train this model. After you have downloaded the model file, upload it to the S3 bucket that you created:

  1. Open the S3 console.
  2. Choose your ModelS3Location bucket.
  3. Choose Upload, then Add files, and select the model file that you downloaded.
  4. Choose Next, and then choose Upload.

From the AWS CodeDeploy console, you should be able to see that the deployment process has been initiated.

After the process has been completed, you can see that a new AWS CloudFormation stack called AntiSpamAPI has been created. As previously explained, this new stack has created the Lambda function and the API to serve the inference. You can invoke the endpoint directly. First, find the endpoint URL.

  1. In the AWS CloudFormation console, choose your AntiSpamAPI.
  2. Choose Resources and find ServerlessRestApi.
  3. Choose the ServerlessRestApi resource, which opens the API Gateway console.
  4. From the API Gateway console, select AntiSpamAPI.
  5. Choose Stages, Prod.
  6. Copy the Invoke URL.
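
With the invoke URL, you can also call the API programmatically. The following sketch assumes the API accepts a JSON body with a message field; the actual request format depends on the Lambda function in the repository, so adjust the payload accordingly.

# Hypothetical sketch of calling the AntiSpamAPI endpoint; the payload shape is an assumption.
import json
import requests

INVOKE_URL = 'https://<api-id>.execute-api.<region>.amazonaws.com/Prod'  # from API Gateway

payload = {'message': 'Congratulations! You have won a free cruise, reply YES to claim'}
response = requests.post(INVOKE_URL, json=payload)
print(response.status_code)
print(json.dumps(response.json(), indent=2))  # e.g. a spam probability in the raw output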

After you have the endpoint URL, you can also test it using a simple test page that I've created.

For example, you can determine that a sample spam message has a 99% probability of being spam, as you can see from the raw output.

Conclusion

I hope this post proves useful for understanding how you can automatically deploy your model into a Lambda function using AWS developer tools. A pipeline like this reduces the overhead of serving a model with a serverless architecture. With minor changes, you can use it to deploy a model trained anywhere: on AWS Deep Learning AMIs, AWS Deep Learning Containers, or on premises.

If you have questions or suggestions, please share them on GitHub or in the comments.


About the Author

Diego Natali is a solutions architect for Amazon Web Services in Italy. With several years of engineering experience, he helps ISV and startup customers design flexible and resilient architectures using AWS services. In his spare time he enjoys watching movies and riding his dirt bike.


Multiregion serverless distributed training with AWS Batch and Amazon SageMaker

Creating a global footprint and providing access to scale are among the many best practices at AWS. By building architectures that take advantage of that scale and that use data efficiently, in both performance and cost, you can see how important access at scale is. For example, in autonomous vehicle (AV) development, data is acquired geographically, local to the driving campaign. From a machine learning (ML) perspective, it is more efficient to execute the compute pipeline in the same AWS Region as the generated data.

To elaborate further, say that your organization acquires 4K video data on a driving campaign in San Francisco, United States. In parallel, your colleagues acquire data on a driving campaign in Stuttgart, Germany. Both video campaigns can result in a few TB of data per day. Ideally, you would transfer the data into Regions close to where it was generated (in this case, us-west-1 and eu-central-1). If the workflow labels this data, then running the distributed training locally in each respective Region makes sense from a cost and performance standpoint, while keeping the hyperparameters used to train both datasets consistent.

To get started with distributed training on AWS, use Amazon SageMaker, which handles much of the undifferentiated heavy lifting required for distributed training (for example, optimized TensorFlow with Horovod). Additionally, its per-second billing provides efficient cost management. These benefits free you to focus on model development and deployment in a fully managed architecture.

Amazon SageMaker is an ecosystem of managed ML services to help with ground truth labeling, model training, hyperparameter optimization, and deployment. You can access these services using Jupyter notebooks, the AWS CLI, or the Amazon SageMaker Python SDK. Particularly with the SDK, you need little code change to initiate and distribute the ML workload.

In this architecture, an S3 bucket serves as the source for the training input files. The Amazon SageMaker Python SDK instantiates the required compute resources and Docker image to run the model training, sourcing the data from S3. The output model artifacts are saved to an output S3 bucket.

Because the Amazon SageMaker Python SDK abstracts infrastructure deployment and is entirely API driven, you can orchestrate requests for training jobs via the SDK in scalable ways.

In the previous AV scenario, for example, you can derive the input training data from the uploaded datasets, which you track in a relational database. You can couple this with AWS Batch, whose job array mechanism can submit these distributed training jobs in a scalable way, passing the relevant hyperparameters at runtime. Consider the following example architecture.

In this architecture, a relational database tracks, for example, AV campaign metadata globally. A SQL query generates the job array input file for AWS Batch, which then orchestrates the instantiation of the grid of clusters executed globally across multiple AWS Regions.

You are standing up a globally deployed grid of clusters based on data in Amazon S3 that is generated locally, querying the metadata from a central database to organize the training inputs, with access to capacity across all four Regions. You can also include additional relational joins that select data for transitive copy based on the On-Demand or Spot price per Region and on reservation capacity.

Deploying Amazon SageMaker

The example in this post runs the Imagenet2012/Resnet50 model, with the Imagenet2012 TF records replicated across Regions. For this advanced workflow, you must prepare two Docker images. One image is for calling the Amazon SageMaker SDK to prepare the job submission, and the second image is for running the Horovod-enabled TensorFlow 1.13 environment.

First, create an IAM role to call the Amazon SageMaker service and subsequent services to run the training. Then, create the dl-sagemaker.py script. This is the main call script into the Amazon SageMaker training API.
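As a minimal sketch, you could create that role with boto3 as shown below; the role name matches the ARN used later in dl-sagemaker.py, and the broad AmazonSageMakerFullAccess managed policy is an assumption that you may want to scope down.

# Hypothetical sketch: create an IAM role that Amazon SageMaker can assume for training.
import json
import boto3

iam = boto3.client('iam')

trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'sagemaker.amazonaws.com'},
        'Action': 'sts:AssumeRole',
    }],
}

iam.create_role(
    RoleName='sagemaker-sdk',
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Broad managed policy for simplicity; narrow the permissions for production use.
iam.attach_role_policy(
    RoleName='sagemaker-sdk',
    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess',
)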

For instructions on building the Amazon SageMaker Script Mode Docker image, see the TensorFlow framework repo on GitHub, in aws/sagemaker-tensorflow-container. After it’s built, commit this image to each Region in which you plan to generate data.

This example pushes the image to us-east-1 (Northern Virginia), us-west-2 (Oregon), eu-west-1 (Ireland), and eu-central-1 (Frankfurt). When support for TensorFlow 1.13 with Tensorpack is available in the Amazon SageMaker Python SDK, this becomes an optional step. To simplify the deployment, keep the name of the Amazon ECR image the same across Regions.
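
As a minimal sketch, you could pre-create an identically named repository in each target Region with boto3, as shown below; the repository name sage-py3-tf-hvd matches the image_name used later in dl-sagemaker.py, and the docker build, tag, and push steps still run through the Docker CLI.

# Hypothetical sketch: pre-create an ECR repository with the same name in every target Region.
import boto3

REGIONS = ['us-east-1', 'us-west-2', 'eu-west-1', 'eu-central-1']
REPOSITORY = 'sage-py3-tf-hvd'  # must match image_name in dl-sagemaker.py

for region in REGIONS:
    ecr = boto3.client('ecr', region_name=region)
    try:
        ecr.create_repository(repositoryName=REPOSITORY)
        print(f'Created {REPOSITORY} in {region}')
    except ecr.exceptions.RepositoryAlreadyExistsException:
        print(f'{REPOSITORY} already exists in {region}')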

For the main entry script to call the Amazon SageMaker SDK (dl-sagemaker.py), complete the following steps:

  1. Replace the entry:
    role = 'arn:aws:iam::<account-id>:role/sagemaker-sdk'

  2. Replace the image_name with the name of the Docker image that you created:
    import os
    from sagemaker.session import s3_input
    from sagemaker.tensorflow import TensorFlow

    role = 'arn:aws:iam::<account-id>:role/sagemaker-sdk'

    num_gpus = int(os.environ.get('GPUS_PER_HOST'))

    distributions = {
        'mpi': {
            'enabled': True,
            'processes_per_host': num_gpus,
            'custom_mpi_options': '-mca btl_vader_single_copy_mechanism none -x HOROVOD_HIERARCHICAL_ALLREDUCE=1 -x HOROVOD_FUSION_THRESHOLD=16777216 -x NCCL_MIN_NRINGS=8 -x NCCL_LAUNCH_MODE=PARALLEL'
        }
    }

    def main(aws_region, s3_location):
        estimator = TensorFlow(
            train_instance_type='ml.p3.16xlarge',
            train_volume_size=100,
            train_instance_count=10,
            framework_version='1.12',
            py_version='py3',
            image_name="<account id>.dkr.ecr.%s.amazonaws.com/sage-py3-tf-hvd:latest" % aws_region,
            entry_point='sagemaker_entry.py',
            dependencies=['/Users/amrraga/git/github/deep-learning-models'],
            script_mode=True,
            role=role,
            distributions=distributions,
            base_job_name='dist-test',
        )
        estimator.fit(s3_location)

    if __name__ == '__main__':
        aws_region = os.environ.get('AWS_DEFAULT_REGION')
        s3_location = os.environ.get('S3_LOCATION')

        main(aws_region, s3_location)

The following code is for sagemaker_entry.py, the inner call to initiate the training script:

import subprocess
import os

if __name__ == '__main__':
    # SM_CHANNEL_TRAIN points at the training data channel that SageMaker mounts locally.
    train_dir = os.environ.get('SM_CHANNEL_TRAIN')
    # Launch the Horovod-enabled ResNet-50 training script from the deep-learning-models repo.
    subprocess.call(['python', '-W', 'ignore', 'deep-learning-models/models/resnet/tensorflow/train_imagenet_resnet_hvd.py',
            '--data_dir=%s' % train_dir,
            '--num_epochs=90',
            '-b=256',
            '--lr_decay_mode=poly',
            '--warmup_epochs=10',
            '--clear_log'])

The following code is for sage_wrapper.sh, the overall wrapper for AWS Batch to download the array definition from S3 and initiate the global Amazon SageMaker API calls:

#!/bin/bash -xe
###################################
env
###################################
echo "DOWNLOADING SAGEMAKER MANIFEST ARRAY FILES..."
aws s3 cp $S3_ARRAY_FILE sage_array.txt
if [[ -z "${AWS_BATCH_JOB_ARRAY_INDEX}" ]]; then
   echo "NOT AN ARRAY JOB...EXITING"
   exit 1
else
   LINE=$((AWS_BATCH_JOB_ARRAY_INDEX + 1))
   SAGE_SYSTEM=$(sed -n ${LINE}p sage_array.txt)
   while IFS=, read -r f1 f2 f3; do
           export AWS_DEFAULT_REGION=${f1}
           export S3_LOCATION=${f2}
   done <<< $SAGE_SYSTEM
fi

GPUS_PER_HOST=8 python3 dl-sagemaker.py

echo "SAGEMAKER TRAINING COMPLETE"
exit 0

Lastly, the following code is for the Dockerfile, to build the batch orchestration image:

FROM amazonlinux:latest

### SAGEMAKER PYTHON SDK

RUN yum update -y
RUN amazon-linux-extras install epel
RUN yum install python3-pip git -y
RUN pip3 install tensorflow sagemaker awscli

### API SCRIPTS

RUN mkdir /api
ADD dl-sagemaker.py /api
ADD sagemaker_entry.py /api
ADD sage_wrapper.sh /api
RUN chmod +x /api/sage_wrapper.sh

### SAGEMAKER SDK DEPENDENCIES

RUN git clone https://github.com/aws-samples/deep-learning-models.git /api/deep-learning-models

Push the built Docker image to Amazon ECR in the Region from which you call the Amazon SageMaker Python SDK. From this Region, you can deploy all your Amazon SageMaker distributed ML worker clusters globally.

With AWS Batch, you don’t need any unique configurations to instantiate a compute environment. Because you are just using AWS Batch to launch the Amazon SageMaker APIs, the default settings are enough. Attach a job queue to the compute environment and create the job definition file with the following:

{
    "jobDefinitionName": "sagemaker-python-sdk-jobdef",
    "jobDefinitionArn": "arn:aws:batch:us-east-1:<accountid>:job-definition/sagemaker-python-sdk-jobdef:1",
    "revision": 1,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "containerProperties": {
        "image": "<accountid>.dkr.ecr.us-east-1.amazonaws.com/batch/sagemaker-sdk:latest",
        "vcpus": 2,
        "memory": 2048,
        "command": [
            "/api/sage_wrapper.sh"
        ],
        "jobRoleArn": "arn:aws:iam::<accountid>:role/ecsTaskExecutionRole",
        "volumes": [],
        "environment": [
            {
                "name": "S3_ARRAY_FILE",
                "value": "s3://ragab-ml/"
            }
        ],
        "mountPoints": [],
        "ulimits": [],
        "resourceRequirements": []
    }
}
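
If you prefer to register the job definition programmatically rather than through the console, a boto3 call along the following lines works; the image URI, job role, and S3 path are placeholders.

# Hypothetical sketch: register the AWS Batch job definition with boto3.
import boto3

batch = boto3.client('batch', region_name='us-east-1')

batch.register_job_definition(
    jobDefinitionName='sagemaker-python-sdk-jobdef',
    type='container',
    containerProperties={
        'image': '<accountid>.dkr.ecr.us-east-1.amazonaws.com/batch/sagemaker-sdk:latest',
        'vcpus': 2,
        'memory': 2048,
        'command': ['/api/sage_wrapper.sh'],
        'jobRoleArn': 'arn:aws:iam::<accountid>:role/ecsTaskExecutionRole',
        'environment': [
            # Placeholder default; override S3_ARRAY_FILE at job submission.
            {'name': 'S3_ARRAY_FILE', 'value': 's3://<your-bucket>/sage_array.txt'},
        ],
    },
)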

Upload an example job array file to S3, which the wrapper script downloads at job startup:

us-east-1,s3://ragab-ml/imagenet2012/tf-imagenet/resized
us-west-2,s3://ragab-ml-pdx/imagenet2012/tf-imagenet/resized
eu-west-1,s3://ragab-ml-dub/imagenet2012/tf-imagenet/resized
eu-central-1,s3://ragab-ml-fra/imagenet2012/tf-imagenet/resized

On the Jobs page, submit a job, overriding the S3_ARRAY_FILE environment variable with the path to your array file. A job array starts up, with each child job dedicated to submitting and monitoring an ML training job in a separate Region. If you select a candidate Region where a job is running, you can see additional algorithm and instance metrics and further log details.
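
You can also submit the array job programmatically. The following sketch assumes a four-line array file (one line per Region), so the array size is 4; the job queue name and array file path are placeholders.

# Hypothetical sketch: submit the array job with boto3, overriding S3_ARRAY_FILE at runtime.
import boto3

batch = boto3.client('batch', region_name='us-east-1')

batch.submit_job(
    jobName='multiregion-sagemaker-training',
    jobQueue='<your-job-queue>',
    jobDefinition='sagemaker-python-sdk-jobdef',
    arrayProperties={'size': 4},  # one child job per line in the array file
    containerOverrides={
        'environment': [
            {'name': 'S3_ARRAY_FILE', 'value': 's3://<your-bucket>/sage_array.txt'},
        ],
    },
)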

One notable aspect of this deployment is that, in the previous example, you launched a grid of clusters totaling 480 GPUs across four Regions, for a combined 360,000 images/sec. This improved time to results and made parameter scanning more efficient.

Conclusion

By implementing this architecture, you now have a scalable, performant, globally distributed ML training platform. In the AWS Batch script, you can lift any number of parameters into the array file to distribute the workload. For example, you can use not only different input training files, but also different hyperparameters, Docker container images, or even different algorithms, all deployed on a global scale.

Consider also that any serverless, distributed ML backend service can execute these workloads. For example, you could replace the Amazon SageMaker components with Amazon EKS. Now go power up your ML workloads with a global footprint!

Open the Amazon SageMaker console to get started. If you have any questions, please leave them in the comments.


About the Author

Amr Ragab is a Business Development Manager in Accelerated Computing for AWS, devoted to helping customers run computational workloads at scale. In his spare time he likes traveling and finds ways to integrate technology into daily life.