Using Amazon Polly in Windows Applications
AWS offers a vast array of services that allow developers to build applications in the cloud. Windows desktop applications can take advantage of these services as well. Today, we are releasing Amazon Polly for Windows, an open source engine that allows users to take advantage of Amazon Polly voices in SAPI-compliant Windows applications.
What is SAPI? SAPI (Speech Application Programming Interface) is a Microsoft Windows API that allows desktop applications to implement speech synthesis. When an application supports SAPI, it can access any of the installed SAPI voices to generate speech.
Out of the box, Microsoft Windows provides one male and one female SAPI voice that can be used in any supported voice application. With Amazon Polly for Windows, users can install more than 50 additional voices in more than 25 languages, paying only for what they use. For more details, please visit the Amazon Polly documentation and check the full list of text-to-speech voices.
Create an AWS account
If you don’t already have an AWS account, you can sign up here, which gives you 12 months in our free tier. During the first 12 months, Amazon Polly is free for the first 5 million characters per month. How many characters is that? As an example, “Ulysses” by James Joyce is 730 pages and contains approximately 1.5 million characters. So you could have Amazon Polly read the entire book three times and still have an additional 500,000 free characters for the remainder of the month.
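The free-tier arithmetic above can be checked in a few lines of Python. This is only a sketch: the two constants are the approximate figures quoted in this post, and the variable names are illustrative.

```python
# Figures quoted in this post; the book length is an approximation.
FREE_TIER_CHARS_PER_MONTH = 5_000_000
ULYSSES_CHARS = 1_500_000

# Whole readings that fit in one month's allowance, and what is left over.
full_readings, chars_left = divmod(FREE_TIER_CHARS_PER_MONTH, ULYSSES_CHARS)
print(f"{full_readings} full readings, {chars_left:,} characters left over")
```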
Step 1: Configure your account
- Log in to your AWS account.
- After you’ve logged in, click Services from the top menu bar, then type IAM in the search box. Click IAM when it pops up.
- On the left, click Users
- Click Add User
- Type in polly-windows-user (you can use any name)
- Click the Programmatic access check box and leave AWS Management Console access unchecked
- Click Next: Permissions
- Click Attach existing policies directly
- At the bottom of the page, in the search box next to Filter: Policy type, type polly
- Click the check box next to AmazonPollyReadOnlyAccess
- Click Next: Review
- Click Create user
IMPORTANT: Don’t close the webpage. You’ll need both the access key ID and the secret access key in Step 3.
Step 2: Install the AWS CLI for Windows
Click here to download the AWS CLI for Windows.
Step 3: Configure the AWS client
Amazon Polly for Windows requires an AWS profile called polly-windows. This ensures that the Amazon Polly engine is using the correct account.
- Open a Windows command prompt
- Type this command:
aws configure --profile polly-windows
- When prompted for the AWS Access Key ID and AWS Secret Access Key, use the values from the previous step.
- For Default Region, you can press Enter for the default (us-east-1) or enter a different Region. Make sure to use all lowercase.
- For Default output format, just hit Enter
- Verify this worked by running the following command. You should see a list of voices:
aws --profile polly-windows polly describe-voices
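If the command above fails, it can help to confirm that the polly-windows profile was actually written. The AWS CLI stores named profiles in an INI-style credentials file (by default `.aws/credentials` under your user folder). The following is a minimal sketch that checks such text for the profile; the function name `has_profile` and the sample values are hypothetical, and the keys shown are placeholders, not real credentials.

```python
import configparser


def has_profile(credentials_text: str, profile: str = "polly-windows") -> bool:
    """Return True if the text defines both access keys for the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


# Placeholder content in the same layout that `aws configure` writes.
sample = """
[polly-windows]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = exampleSecretKey
"""

print(has_profile(sample))             # profile is present
print(has_profile(sample, "default"))  # no such profile in the sample
```

To check your own machine, you could read the credentials file into a string (for example with `pathlib.Path.home() / ".aws" / "credentials"`) and pass it to the function.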
Step 4: Install Amazon Polly TTS Engine for Windows
Click here to download and run the installer. To verify that the installation worked, use PollyPlayer, an application included with Amazon Polly for Windows that allows you to experiment with the voices without additional software. Simply pick a voice, enter text, and then click Say It.
Using Amazon Polly Voices in Applications
The Amazon Polly voices are accessible in any Windows application that implements Windows SAPI. This means that after the Amazon Polly voices are installed, you simply need to select the Amazon Polly voice that you want to use from the list of voices in the application.
Amazon Polly supports SSML (Speech Synthesis Markup Language), which allows users to add tags to customize the speech generation. With Amazon Polly for Windows, users can submit requests as either plain text or SSML. The standard Amazon Polly limits apply: a maximum of 3,000 billed characters per request, or 6,000 characters in total (SSML tags are not billed).
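To get a rough sense of whether a piece of SSML fits within these limits before submitting it, you can count characters with and without the tags. The sketch below is an approximation: it treats anything between angle brackets as an unbilled tag, which may not match exactly how Amazon Polly counts characters. The function name `check_request` and the constants are illustrative.

```python
import re

# Limits quoted in this post.
MAX_BILLED_CHARS = 3000
MAX_TOTAL_CHARS = 6000

# Assumption: anything between < and > is an SSML tag and is not billed.
_TAG = re.compile(r"<[^>]+>")


def check_request(text: str) -> tuple:
    """Return (total characters, estimated billed characters, within limits)."""
    total = len(text)
    billed = len(_TAG.sub("", text))
    within_limits = billed <= MAX_BILLED_CHARS and total <= MAX_TOTAL_CHARS
    return total, billed, within_limits


ssml = '<speak>Hello <break time="300ms"/>world.</speak>'
total, billed, ok = check_request(ssml)
print(f"total={total}, billed={billed}, within limits={ok}")
```

Plain text works the same way: with no tags to strip, the total and billed counts are equal.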
Example: Using Amazon Polly for Windows with Adobe Captivate
Building eLearning content is a great use case for generated speech. In the past, content managers would need to record voice content, and then re-record as content changes. Using an eLearning designer such as Adobe Captivate along with Amazon Polly voices allows you to easily create and dynamically update content whenever you need.
You can use any SAPI-enabled eLearning solution. In this demonstration, we walk through creating a simple slide with Captivate to show how quickly and easily you can add voice content. If you don’t already have Captivate, you can download a free trial here.
Step 1: Create a project
Start Captivate and click New Project / Blank Project to create a new project.
At this point, you have a new blank project with a single slide.
Step 2: Add speech content
From the Audio menu, click Speech Management.
This brings up a Speech Management modal window, where you can add speech content to the slide. Click the Speech Agent drop-down and select Amazon Polly – US English – Salli (Neural). By default, all slides use this voice.
Click the + button to add content.
In the textbox, type My name is Salli. My speech is generated by Amazon Polly.
Now we must generate the audio. Behind the scenes, Captivate uses the Windows SAPI driver to call back to AWS to generate the speech. Click Save and Generate Audio.
After the speech is generated, you can preview the audio by clicking the Play button next to the Generate Audio button.
You hear Salli speaking the text. Click the Close button.
After closing the window, you can preview the entire project to hear the speech with the slide.
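For readers curious about what a request to AWS for this audio might look like, the following sketch builds the parameters for an Amazon Polly SynthesizeSpeech call and sends them with boto3, the AWS SDK for Python. It does not show how the Amazon Polly for Windows engine is implemented; it is an assumed equivalent for illustration. The helper names `build_request` and `synthesize_to_file` are hypothetical, boto3 must be installed separately, and the call uses the polly-windows profile configured earlier.

```python
def build_request(text: str, voice_id: str = "Salli", engine: str = "neural") -> dict:
    """Build SynthesizeSpeech parameters, treating input that starts with <speak as SSML."""
    text_type = "ssml" if text.lstrip().startswith("<speak") else "text"
    return {
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "mp3",
    }


def synthesize_to_file(text: str, path: str, profile: str = "polly-windows") -> None:
    """Send the request to Amazon Polly and save the audio (requires boto3 and network access)."""
    import boto3  # third-party; imported here so build_request works without it

    session = boto3.Session(profile_name=profile)
    response = session.client("polly").synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = build_request("My name is Salli. My speech is generated by Amazon Polly.")
print(request["VoiceId"], request["TextType"])
```

Calling `synthesize_to_file(request["Text"], "salli.mp3")` would write the audio to a local file, and the call is billed against your account like any other Amazon Polly request.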
The wide selection of Amazon Polly voices allows a content manager to build and experiment with limitless combinations of speech. Because content and voice selections can be updated at any time, content managers can keep both the audio presentation and content fresh without ever having to go near a recording studio.
Now that you’ve installed Amazon Polly for Windows, you can have fun experimenting with different variations of speech using SSML tags, which are all fully supported in Windows. And because Amazon Polly for Windows is open source, you can feel free to contribute features and submit feature requests. You can share feedback at the Amazon Polly forum. We’d love to hear how you’re using Amazon Polly for Windows!
About the Author
Troy Larson is a Senior DevOps Cloud Architect for AWS Professional Services.