In a nutshell, text-to-speech technology is a powerful way to open up the web to people who otherwise couldn’t experience it. The technology has evolved to the point where it works quite admirably, although it doesn’t yet sound as natural as someone reading aloud to you.
That’s where Amazon’s Polly project comes in. It aims to get as close as possible to lifelike speech. In this article, we’re going to talk about the current state of text-to-speech software and introduce you to the Amazon Polly project. Then we’ll share our impressions of how Amazon Polly and WordPress play together. Let’s get talking!
What Text-to-Speech Software Is (And When It Makes Sense to Use It)
The concept of text-to-speech software is simple – you take a paragraph, a page, an article, or even a whole book and have a computer read it aloud to you. When people think about text-to-speech, they often associate it with robotic voices and stilted cadences. However, this usually isn’t the case anymore, particularly with modern software.
To some people, text-to-speech may sound like a gimmick, but it’s a technology with very practical applications, such as:
- Enabling people with disabilities to ‘read’. The most obvious use of text-to-speech software is to enable people with visual impairments to consume written content.
- Providing a hands-free reading experience. Even if your eyesight is perfect, sometimes it’s more comfortable or convenient to listen to something instead of reading it.
- Covering situations where audio versions of content aren’t available. These days, most popular books are also released in audio format. However, the same doesn’t hold true for most other written content, including articles, poems, and more. Text-to-speech software enables you to listen to any written content you want (as long as the functionality is built in).
From a technical perspective, getting text-to-speech right is much more difficult than you might imagine. Recording human speech and reproducing it is only the beginning, which brings us to the next section.
The Current State of Text-to-Speech Software
If you remember what text-to-speech software sounded like even a few years ago, you may not look on the technology fondly. However, this type of software has come a long way during the past few years. Here’s a quick example of Amazon Kindle’s text-to-speech functionality in action, reading Pride and Prejudice:
You’ll notice the video showcases several voices, some of which sound better than you’d imagine. Admittedly, they’re all a bit stilted, but the Englishman’s rendition in particular is quite enjoyable to hear. The difference from how a human would sound reading the same text is noticeable. However, it’s not out of the question for someone to work through an entire book using text-to-speech and still enjoy it. Plus, your computer narrator will never tire or slur its words, which gives it an edge over humans.
Naturally, there is plenty of other software offering decent text-to-speech capabilities, such as Natural Reader. This program enables you to open and edit your documents, as well as paste content and have it read aloud to you in over 50 different voices. Here’s a quick introduction video using some of the voices featured by the software:
The difference in quality between Natural Reader and Amazon is obvious. Natural Reader’s speech sounds much more mechanical and the pauses between words are more noticeable. However, increasing the speed of the reader does a decent job of masking these issues.
To sum it up, there’s still a lot of variation in quality when it comes to text-to-speech software. In a few years, the technology will probably leap forward thanks to machine learning applications. At that stage, it might no longer be so easy to tell whether you’re listening to a machine or a real human being reading aloud to you.
An Introduction to Amazon Polly
Amazon Polly is a cloud service that enables you to turn text into speech in over 20 languages, using over 40 unique voices. The service has been around since 2016 but it was in 2018 that Amazon launched a plugin to help WordPress users integrate it into their websites.
The plugin itself was the product of a joint effort between Amazon and WP Engine. It works on both websites powered by Amazon Web Services (AWS) and those running on independent web hosts. In either case, you can use Polly to generate audio for your written content and enable users to play it back. Plus, it enables you to store the audio versions of your posts on your own server or, at a cost, using Amazon’s Simple Storage Service (S3).
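If you’re curious what generating audio looks like outside of WordPress, here’s a minimal sketch in Python. It assumes the AWS SDK for Python (boto3), whose Polly client provides a synthesize_speech call. The synthesize_to_file helper, the FakePollyClient stand-in, and the choice of the “Joanna” voice are our own illustrations, and this is not how the WordPress plugin itself is implemented.

```python
import io
import os
import tempfile


def synthesize_to_file(polly_client, text, path, voice_id="Joanna"):
    """Ask Polly to read `text` aloud and save the MP3 bytes to `path`.

    `polly_client` is assumed to behave like boto3.client("polly").
    Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # The audio arrives as a stream; read it fully before writing to disk.
    audio_bytes = response["AudioStream"].read()
    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return len(audio_bytes)


class FakePollyClient:
    """Hypothetical offline stand-in so the sketch runs without AWS credentials."""

    def synthesize_speech(self, Text, OutputFormat, VoiceId):
        # Returns placeholder bytes, not real audio.
        return {"AudioStream": io.BytesIO(b"fake-mp3:" + Text.encode("utf-8"))}


output_path = os.path.join(tempfile.gettempdir(), "polly-demo.mp3")
bytes_written = synthesize_to_file(FakePollyClient(), "Hello from Polly", output_path)
print(f"Wrote {bytes_written} bytes to {output_path}")
```

To try it against the real service, you’d pass boto3.client("polly") instead of the stand-in, with your AWS access and secret keys configured on your machine.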
On this page, you can find several examples of Amazon Polly speech in different languages:
The examples are short, but the English voices in particular are quite decent. They’re about on par with the experience of converting text on your Kindle to speech, which is to be expected considering it likely uses the same technology.
As far as costs go, you’ll need an AWS account to use Polly. However, the service supports up to five million characters per month for free, for up to 12 months. Just to give you an idea, let’s assume seven to eight characters on average for each English word. That works out to over 600,000 words per month for free using Amazon Polly, which is about six times the length of a long-winded novel.
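If you’d like to check that math yourself, here’s a quick sketch of the calculation. The function name estimated_words is ours, and the seven-to-eight characters per word figure is the same rough assumption used above, not an official number.

```python
FREE_CHARACTERS_PER_MONTH = 5_000_000  # free allowance quoted above


def estimated_words(characters, chars_per_word):
    """Estimate how many whole words fit in a character budget."""
    if chars_per_word <= 0:
        raise ValueError("chars_per_word must be positive")
    return int(characters // chars_per_word)


low = estimated_words(FREE_CHARACTERS_PER_MONTH, 8)   # longer words, fewer of them
high = estimated_words(FREE_CHARACTERS_PER_MONTH, 7)  # shorter words, more of them
print(f"Roughly {low:,} to {high:,} words per month")
# prints "Roughly 625,000 to 714,285 words per month"
```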
- Generate speech for your text in over 20 languages.
- Choose from over 40 voices for your text.
- Turn on text-to-speech by default for all of your WordPress content.
- Generate a player for the audio versions of your posts and control its placement.
- Store your audio files on your server or using Amazon S3.
- Convert up to five million characters to audio for free per month.
Price: Free and premium tiers available
Our Experience Using the Amazon Polly Plugin
Installing Amazon Polly on WordPress is remarkably simple. After activating the plugin, simply connect it to your AWS account using an access key and a secret key.
Once you link the plugin to your AWS account, you can configure which voice it should use by default. Other settings include the playback rate, which controls the speed of the text-to-speech audio, and the position of the player Amazon Polly uses to play it.
There’s even an autoplay option for your Amazon Polly audio files, which we encourage you to keep turned off for the sake of your users’ experience. You can also configure where Amazon Polly will store the audio files for your posts, including the option to save them to your S3 account. If you use Amazon CloudFront, you can also use it to distribute your audio and lessen the impact on your servers.
One feature that surprised us was the ability to generate a podcast feed using Amazon Polly, which you can link to an iTunes account. Personally, we don’t think the text-to-speech quality is quite there yet for a high-quality podcast. However, including this option is a step in the right direction.
If you enable Amazon Polly, it will add an audio player to each of your posts. However, you can turn off text-to-speech functionality for posts on a case-by-case basis. Just edit the post in question and look for the Enable Amazon Polly metabox.
This widget also enables you to preview how much it’d cost to generate speech for each particular post, which is a nice touch. Now, when visitors access your posts, they’ll be able to click on Amazon Polly’s audio player, sit back, and listen to them leisurely. Overall, the experience of integrating the service with WordPress is remarkably simple thanks to this plugin.
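To get a feel for how a per-post estimate like that could be worked out, here’s a rough sketch. The function estimate_cost is hypothetical, it uses the raw character count as a stand-in for billed characters, and the price of $4.00 per million characters is a placeholder we picked for the example, so check AWS’s pricing page for the actual rate.

```python
def estimate_cost(text, price_per_million_chars):
    """Estimate the cost in dollars of converting `text` to speech.

    Uses len(text) as a rough proxy for the characters you would be billed for.
    """
    if price_per_million_chars < 0:
        raise ValueError("price_per_million_chars must not be negative")
    return len(text) / 1_000_000 * price_per_million_chars


# A made-up post of 1,500 five-character "words" (7,500 characters in total).
sample_post = "word " * 1500
PLACEHOLDER_PRICE = 4.00  # assumption for illustration, not an official price

cost = estimate_cost(sample_post, PLACEHOLDER_PRICE)
print(f"{len(sample_post):,} characters would cost about ${cost:.4f}")
```

At the placeholder rate, a post of that size comes in at a few cents, which matches our impression that individual posts cost very little to convert.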
Lifelike voices are the holy grail of text-to-speech software. The problem is, emulating what a real person sounds like is complicated when you have near-infinite combinations of words. Even so, text-to-speech software continues to improve, and Amazon Polly offers you a great way to add this feature to your websites and applications.
As for how it sounds, Amazon Polly delivers a good text-to-speech experience. Its dedicated WordPress integration is easy to set up, and it’ll cost you very little thanks to AWS’s competitive pricing.
Do you have any questions about adding text-to-speech functionality to your website? Ask away in the comments section below!
Article image thumbnail by vectorEps / shutterstock.com.