How to Convert Text to Speech with JavaScript: The speechSynthesis API

When designing an app, sometimes you need to read text to the user (perhaps reading blog articles while they do something else) or provide a text-to-speech converter, often for accessibility reasons. There are external services that offer fairly realistic results, but most of them are paid.

You may not know that your browser can convert text to speech (TTS) on its own, using JavaScript, a few lines of code, and the speechSynthesis API. In this article, I’ll show you how to make your browser speak through various examples.

Keep in mind that this API depends on the voices available in the user’s browser and operating system, and the result is generally not as natural or human-like as that of cloud-based services. But it’s perfect for quick solutions or if you don’t want to invest a lot of money.

The speechSynthesis Interface

Pay attention to the following JavaScript code snippet. You may not believe it, but with just these four lines, you can convert text to speech in JavaScript.

const synth = window.speechSynthesis;
const text = "Hello everybody!!!!";
const utterThis = new SpeechSynthesisUtterance(text);

synth.speak(utterThis);

First, we store a reference to window.speechSynthesis, a SpeechSynthesis object that serves as the entry point to the text-to-speech half of the Web Speech API.

Then we create an instance of SpeechSynthesisUtterance and pass it to speak() to make the browser talk. The SpeechSynthesisUtterance interface represents a speech request: it holds the text to read along with how to read it (language, voice, rate, pitch, and volume).

Browsers have stepped up their game when it comes to accessibility and, in this case, speech synthesis. Still, I invite you to check browser compatibility before relying on it, because older browsers do not support this API.

Properties of SpeechSynthesisUtterance: Customize it to Your Liking

Great! We have already succeeded in making our browser convert text to speech, but it doesn’t sound quite right because it’s not configured to our liking. First, in our case, we need to set the language to Spanish so that our visitors can understand it:

const synth = window.speechSynthesis;
const utterThis = new SpeechSynthesisUtterance('Hablo en español'); // Spanish text to match the Spanish language setting
utterThis.lang = 'es-ES';
synth.speak(utterThis);

If we don’t set the lang property on the utterance, the browser uses the language declared on the document, for example <html lang="en">. If that’s not defined either, it falls back to the browser’s default language.

We can also change the voice that the browser uses. synth.getVoices() returns a list of SpeechSynthesisVoice objects, and we can assign any of them to the utterance’s voice property. The list depends on both the browser and the operating system, and for some languages there may be only a single voice available.
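As a minimal sketch of choosing a voice, the helper below picks one from a list by language tag. The name pickVoice and its fallback rules (exact tag first, then same primary language) are assumptions made for illustration; only getVoices() and the voice and lang properties come from the API itself. The underscore handling is a defensive assumption, in case a platform reports tags like es_ES.

```javascript
// Hypothetical helper: choose a voice for a language tag such as 'es-ES'.
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  // Defensive normalization: treat 'es_ES' the same as 'es-ES'.
  const tagOf = (voice) => voice.lang.replace('_', '-').toLowerCase();

  // 1. Prefer an exact match on the full tag.
  const exact = voices.find((voice) => tagOf(voice) === wanted);
  if (exact) return exact;

  // 2. Otherwise accept any voice with the same primary language ('es').
  const primary = wanted.split('-')[0];
  const partial = voices.find((voice) => tagOf(voice).split('-')[0] === primary);
  return partial || null;
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const utterThis = new SpeechSynthesisUtterance('Hola');
  const voice = pickVoice(window.speechSynthesis.getVoices(), 'es-ES');
  if (voice) {
    utterThis.voice = voice; // keep the browser default if nothing matched
  }
  window.speechSynthesis.speak(utterThis);
}
```

If the list is empty or has no match, the helper returns null and the utterance simply keeps the browser’s default voice.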

Note that the list may still be empty when the page finishes loading, because some browsers load voices asynchronously. The most reliable way to know when voices are available is to listen for the voiceschanged event on window.speechSynthesis (for example, through its onvoiceschanged handler).
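One way to handle that timing, sketched under a few assumptions: the helper name whenVoicesReady and its callback style are made up for this example, while getVoices(), addEventListener(), and the voiceschanged event belong to the API. It calls back immediately when voices are already there, and otherwise waits for the event.

```javascript
// Hypothetical helper: run `callback` with the voice list once it is non-empty.
// `synth` is expected to be window.speechSynthesis (or a compatible object).
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices); // Already loaded, no need to wait.
    return;
  }

  // Otherwise wait for 'voiceschanged', then read the list again.
  const onChange = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // Still empty, keep listening.
    synth.removeEventListener('voiceschanged', onChange);
    callback(loaded);
  };
  synth.addEventListener('voiceschanged', onChange);
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  whenVoicesReady(window.speechSynthesis, (voices) => {
    voices.forEach((voice) => console.log(voice.name, voice.lang));
  });
}
```

If a browser never fires the event, this sketch never calls back, so a timeout fallback would be a reasonable addition.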

Lastly, we can customize the rate (reading speed) and pitch (tone) of the voice. For rate, we should set a decimal value between 0.1 (lowest; it will read slowly) and 10 (highest), with 1 being the default value. As for the pitch, it’s a decimal value between 0 (lowest; like Barry White) and 2 (highest; almost audible only to felines), with 1 being the default value.

const synth = window.speechSynthesis;
const utterThis = new SpeechSynthesisUtterance('I have a deep voice');
utterThis.pitch = 0.2;
synth.speak(utterThis);
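If rate and pitch come from user input (a slider, for instance), it may help to keep them inside the ranges described above before applying them. The following is only a sketch: clampOrDefault and normalizeSettings are hypothetical helper names, and falling back to 1 for invalid input is a design choice of this example, not something the API requires.

```javascript
// Ranges from the text: rate 0.1 to 10, pitch 0 to 2, both defaulting to 1.
const RATE_RANGE = { min: 0.1, max: 10 };
const PITCH_RANGE = { min: 0, max: 2 };

// Hypothetical helper: keep a number inside [min, max], falling back to 1
// when the value is missing or not a finite number.
function clampOrDefault(value, { min, max }) {
  if (typeof value !== 'number' || !Number.isFinite(value)) return 1;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: normalize user-supplied settings before using them.
function normalizeSettings({ rate, pitch } = {}) {
  return {
    rate: clampOrDefault(rate, RATE_RANGE),
    pitch: clampOrDefault(pitch, PITCH_RANGE),
  };
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const utterThis = new SpeechSynthesisUtterance('Slow and deep');
  const { rate, pitch } = normalizeSettings({ rate: 0.8, pitch: 0.2 });
  utterThis.rate = rate;
  utterThis.pitch = pitch;
  window.speechSynthesis.speak(utterThis);
}
```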

Voice Controls

JavaScript also lets us control playback. For this, speechSynthesis exposes four methods: speak(), pause(), resume(), and cancel(). They are especially useful when the content being read changes.

The speechSynthesis.speak(SpeechSynthesisUtterance) method starts converting text to voice (so the text must already be set before calling this method). Note that if another utterance is already being spoken, the new one is queued and plays once the earlier ones finish.
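The queue can be put to use when reading something long, such as a blog post: instead of one huge utterance, we can queue one per sentence. This is a rough sketch; splitIntoSentences and speakSentences are hypothetical helpers, and the sentence-splitting rule is deliberately naive (it will also split after abbreviations like "Dr.").

```javascript
// Hypothetical helper: split text after runs of '.', '!' or '?'.
function splitIntoSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Hypothetical helper: queue one utterance per sentence.
// `createUtterance` is injected so the function does not depend on the browser.
function speakSentences(synth, text, createUtterance) {
  const utterances = splitIntoSentences(text).map((sentence) =>
    createUtterance(sentence)
  );
  // speak() returns right away; each utterance waits for the previous ones.
  utterances.forEach((utterance) => synth.speak(utterance));
  return utterances;
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  speakSentences(
    window.speechSynthesis,
    'First sentence. Second one! And a third?',
    (sentence) => new SpeechSynthesisUtterance(sentence)
  );
}
```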

The other three methods do what their names suggest: pause() pauses the reading, resume() continues from where it was paused, and cancel() stops the reading and empties the queue. They are useful when, for example, the text changes; you can cancel the reading and start again from the beginning with the updated text. Note that none of these three methods takes an argument: the SpeechSynthesisUtterance instance is passed only to speak().

const synth = window.speechSynthesis;
const utterThis = new SpeechSynthesisUtterance('Lorem ipsum...');

// Restart the reading from the beginning whenever the text changes.
const onTextChange = (newText) => {
    synth.cancel();
    utterThis.text = newText;
    synth.speak(utterThis);
};
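For pause() and resume(), a single play/pause button is a common need. Here is a small sketch that relies on the paused and speaking properties of speechSynthesis; the togglePause name, its return values, and the #toggle-reading button id are assumptions made for this example.

```javascript
// Hypothetical helper: pause a reading in progress, or resume a paused one.
// Returns the action taken so the caller can update its UI.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle'; // Nothing is being read right now.
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  // '#toggle-reading' is a hypothetical button id.
  const button = document.querySelector('#toggle-reading');
  if (button) {
    button.addEventListener('click', () => {
      const action = togglePause(window.speechSynthesis);
      button.textContent = action === 'paused' ? 'Resume' : 'Pause';
    });
  }
}
```

Checking paused first matters, since a paused reading may still count as speaking.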

Compatible Only with Modern Browsers

As you are well aware, most browsers do an excellent job of supporting modern APIs, which lets us implement natively a wide range of features that would otherwise have been challenging. However, anyone who works on the frontend knows that, even though most users have modern browsers, there is always that one user who sticks with an old Windows machine and Internet Explorer 11.

If a user with an outdated browser tries to use the speechSynthesis API, the code will throw an error. That’s why, when using modern APIs, it’s always advisable to check whether the user’s browser exposes the relevant API and to have an alternative path prepared if it doesn’t.

if ('speechSynthesis' in window) {
  // The browser is modern. All good!
  window.speechSynthesis.speak(
    new SpeechSynthesisUtterance("Let's play")
  );
} else {
  // The browser is old. Alternative path.
  // Maybe show a reduced version or an information popup.
  alert('Please update your browser!');
}

Conclusion

Whether it’s to let users listen to posts while they browse other websites or to make your site more accessible, JavaScript makes text-to-speech conversion simple and straightforward.

So, the next time you need to implement an easy text-to-speech system natively, remember window.speechSynthesis. In the next article in this series, we will explore how to convert user voice to text, but until then…

Happy coding!