Chrome Text-to-Speech API Complete Guide for Extension Developers

19 min read

Chrome Text-to-Speech API Complete Guide for Extension Developers

Chrome Text-to-Speech API Complete Guide for Extension Developers

The Chrome Text-to-Speech API (TTS API) is one of the most powerful yet underutilized APIs available to Chrome extension developers. This comprehensive guide will walk you through everything you need to know to implement text-to-speech functionality in your extensions, from basic usage to advanced voice control and event handling.

Whether you’re building an accessibility-focused extension, a language learning tool, or a productivity application that reads content aloud, the Chrome TTS API provides the foundation you need. This tutorial covers all aspects of the API, including voice selection, rate and pitch control, error handling, and best practices for creating seamless voice experiences.


Understanding the Chrome TTS API

The Chrome Text-to-Speech API, accessible through the chrome.tts namespace, allows extensions to synthesize spoken audio from text. This API leverages the operating system’s speech synthesis capabilities, providing a unified interface regardless of the underlying platform. The API is available in all modern Chrome versions and works seamlessly across Windows, macOS, Linux, and Chrome OS.

The TTS API is particularly valuable for extensions that need to provide auditory feedback, read web content aloud, assist users with visual impairments, or offer multilingual support. Unlike traditional audio playback, speech synthesis generates audio dynamically from text input, making it flexible for any text-based content.

Key Capabilities of the Chrome TTS API

The Chrome TTS API offers a comprehensive set of features that make it suitable for various use cases:

  • Text-to-Speech Synthesis: Convert any text string into spoken audio
  • Voice Selection: Choose from multiple available voices across different languages
  • Parameter Control: Adjust speech rate, pitch, and volume
  • Event Handling: Monitor speech progress, completion, and errors
  • Enqueue Management: Queue multiple utterances for sequential playback
  • Pause and Resume: Control playback with pause and resume functions

Required Permissions

To use the Chrome TTS API in your extension, you don’t need to add any special permissions to your manifest.json file. The API is available by default to all Chrome extensions. However, you should consider requesting other permissions depending on the specific functionality of your extension, such as “tabs” or “activeTab” if your extension needs to read content from web pages.


Getting Started with Basic TTS Implementation

The simplest way to use the Chrome TTS API is through the chrome.tts.speak() method. This method takes the text you want to speak and optionally a callback function to handle completion.

Basic Syntax

chrome.tts.speak(
  textToSpeak,
  options,
  callback
);

The textToSpeak parameter is a string containing the text you want to be spoken. The options parameter is an optional object that allows you to configure various aspects of the speech synthesis. The callback function is called when the speak operation completes.

A Simple Example

Here’s a basic example that speaks a simple message when a user clicks a browser action button:

// background.js
chrome.action.onClicked.addListener(function(tab) {
  chrome.tts.speak('Hello! Welcome to my Chrome extension.');
});

This example demonstrates the core functionality, but real-world extensions typically need more control over the speech output.

Understanding the Options Object

The options object provides fine-grained control over how the text is spoken:

chrome.tts.speak('This is a test message', {
  rate: 1.0,        // Speech rate (0.1 to 10.0, default 1.0)
  pitch: 1.0,      // Voice pitch (0.0 to 2.0, default 1.0)
  volume: 1.0,     // Speech volume (0.0 to 1.0, default 1.0)
  voiceName: 'Google US English',  // Specific voice to use
  lang: 'en-US',   // Language code
  onEvent: function(event) {
    console.log('TTS event:', event.type);
  }
}, function() {
  if (chrome.runtime.lastError) {
    console.error('TTS error:', chrome.runtime.lastError.message);
  }
});

Each option serves a specific purpose in customizing the speech output. Understanding these parameters is essential for creating a polished voice experience.


Working with Voices

One of the most powerful features of the Chrome TTS API is the ability to choose from multiple voices. Different voices support different languages and have distinct characteristics.

Listing Available Voices

To see what voices are available in the user’s browser, use the chrome.tts.getVoices() method:

function getAvailableVoices() {
  const voices = chrome.tts.getVoices();
  
  voices.forEach(function(voice) {
    console.log('Voice:', voice.name);
    console.log('  Lang:', voice.lang);
    console.log('  Gender:', voice.gender);
    console.log('  Extension ID:', voice.extensionId);
  });
}

// Call the function
getAvailableVoices();

The getVoices() method returns an array of TtsVoice objects, each containing properties like name, lang, gender, and optionally extensionId for voices provided by extensions.

Selecting a Specific Voice

Once you know which voices are available, you can select a specific voice by name:

chrome.tts.speak('Speaking with a specific voice', {
  voiceName: 'Google UK English Female'
}, callback);

It’s important to note that voice names vary across platforms and installations. You should always provide fallback options and handle cases where the requested voice isn’t available.

Voice Selection Best Practices

When implementing voice selection in your extension, consider these best practices:

  • Always provide a default: If the specified voice isn’t available, the API will use a default voice
  • Match language first: Select voices based on language code before considering specific voice names
  • User preferences: Allow users to choose their preferred voice in your extension settings
  • Test across platforms: Voice availability varies significantly between operating systems

Controlling Speech Parameters

The Chrome TTS API provides three main parameters for controlling how text is spoken: rate, pitch, and volume. Understanding these parameters allows you to create natural-sounding speech output.

Speech Rate

The rate parameter controls how fast the text is spoken. The default rate is 1.0, which represents normal speaking speed. Values can range from 0.1 (very slow) to 10.0 (extremely fast):

// Slow, deliberate speech
chrome.tts.speak('This is spoken slowly', {
  rate: 0.5
});

// Fast speech
chrome.tts.speak('This is spoken quickly', {
  rate: 2.0
});

Different voices may interpret rate values differently. Some voices might not support extreme rate values, so testing is essential.

Voice Pitch

The pitch parameter adjusts the pitch of the spoken voice. The default pitch is 1.0:

// Higher pitch
chrome.tts.speak('Speaking with higher pitch', {
  pitch: 1.5
});

// Lower pitch
chrome.tts.speak('Speaking with lower pitch', {
  pitch: 0.5
});

Pitch adjustment is useful for creating distinct voices or emphasizing certain types of content. However, extreme pitch values can make speech sound unnatural.

Volume Control

The volume parameter controls the output volume. The default is 1.0 (maximum volume):

chrome.tts.speak('Speaking at reduced volume', {
  volume: 0.5
});

Note that volume control depends on the audio output device and may not work identically on all platforms.


Handling TTS Events

The Chrome TTS API provides comprehensive event handling that allows you to monitor and respond to speech synthesis events. This is crucial for building responsive extensions that need to coordinate speech with other actions.

Event Types

The API supports several event types:

  • start: Fired when speech synthesis begins
  • word: Fired when a word is spoken (includes character position)
  • sentence: Fired when a sentence is completed
  • marker: Fired when an SSML marker is reached
  • end: Fired when speech synthesis completes
  • interrupted: Fired when speech is interrupted
  • canceled: Fired when speech is canceled
  • error: Fired when an error occurs

Implementing Event Handlers

You can handle events through the onEvent option:

chrome.tts.speak('This is a longer text that will take some time to speak', {
  onEvent: function(event) {
    if (event.type === 'start') {
      console.log('Speech started');
    } else if (event.type === 'word') {
      console.log('Word spoken:', event.charIndex, event.charLength);
    } else if (event.type === 'end') {
      console.log('Speech completed');
    } else if (event.type === 'error') {
      console.error('TTS Error:', event.errorMessage);
    }
  }
});

Practical Event Handling Example

Here’s a more practical example that uses events to synchronize speech with visual feedback:

function speakWithProgress(text, onWord, onComplete) {
  chrome.tts.speak(text, {
    onEvent: function(event) {
      if (event.type === 'word' && onWord) {
        // Highlight the word being spoken
        onWord(event.charIndex, event.charLength);
      } else if (event.type === 'end' && onComplete) {
        onComplete();
      }
    }
  });
}

// Usage
speakWithProgress(
  'The quick brown fox jumps over the lazy dog',
  function(wordIndex, wordLength) {
    console.log('Currently speaking word at position:', wordIndex);
  },
  function() {
    console.log('Finished speaking');
  }
);

Managing Speech Queue

The Chrome TTS API automatically queues multiple speak requests, allowing you to queue several messages without waiting for each to complete.

Understanding the Queue

When you call speak() while another utterance is in progress, the new utterance is added to the queue:

chrome.tts.speak('First message');
chrome.tts.speak('Second message');
chrome.tts.speak('Third message');

These messages will be spoken sequentially in the order they were queued.

Controlling Queue Behavior

You can control queue behavior using the queueName parameter:

// Using different queues
chrome.tts.speak('Message A', { queueName: 'queue1' });
chrome.tts.speak('Message B', { queueName: 'queue2' });
// These two queues play simultaneously (not recommended)

// Clearing queue before speaking
chrome.tts.speak('New message', { enqueue: false });
// This replaces any queued messages

The enqueue option (when set to false) clears the queue before speaking the new text, which is useful for urgent announcements.


Pausing, Resuming, and Stopping

The Chrome TTS API provides methods for controlling playback after speech has started.

Stopping Speech

To stop all speech immediately:

chrome.tts.stop();

This clears the queue and stops any current speech immediately.

Pausing and Resuming

Pause and resume functionality allows for temporary interruption:

chrome.tts.pause();

// Later...
chrome.tts.resume();

Not all platforms support pause and resume. You should check availability and provide alternative controls if needed.

Checking State

You can check the current TTS state:

chrome.tts.isSpeaking(function(speaking) {
  if (speaking) {
    console.log('Currently speaking');
  } else {
    console.log('Not speaking');
  }
});

Advanced SSML Support

Chrome’s TTS API supports SSML (Speech Synthesis Markup Language), which provides fine-grained control over pronunciation, emphasis, and timing.

Using SSML Tags

const ssmlText = `
<speak>
  This is <emphasis level="moderate">important</emphasis>.
  The price is <say-as interpret-as="currency" format="USD">99.99</say-as>.
  <break time="500ms"/> Take a short pause here.
</speak>
`;

chrome.tts.speak(ssmlText, { ssmlMode: 'annotate' });

Common SSML tags include:

  • <speak>: Root element
  • <break>: Insert pauses
  • <emphasis>: Add emphasis
  • <say-as>: Control pronunciation
  • <phoneme>: Specify phonetic pronunciation
  • <prosody>: Control rate, pitch, and volume

SSML Modes

The ssmlMode option controls how SSML is processed:

  • none: No SSML processing (default)
  • fragment: Allow SSML fragments
  • annotate: Include word boundaries as events

Building a Complete TTS Extension Example

Here’s a practical example of building a simple text-to-speech extension:

manifest.json

{
  "manifest_version": 3,
  "name": "Simple Text Reader",
  "version": "1.0",
  "permissions": ["activeTab", "scripting"],
  "action": {
    "default_popup": "popup.html",
    "default_icon": "icon.png"
  }
}

popup.html

<!DOCTYPE html>
<html>
<head>
  <style>
    body { width: 300px; padding: 20px; font-family: Arial, sans-serif; }
    textarea { width: 100%; height: 100px; margin-bottom: 10px; }
    button { padding: 10px 20px; margin-right: 5px; cursor: pointer; }
    select { padding: 5px; margin-bottom: 10px; width: 100%; }
  </style>
</head>
<body>
  <h3>Text Reader</h3>
  <select id="voiceSelect"></select>
  <textarea id="textInput" placeholder="Enter text to speak..."></textarea>
  <button id="speakBtn">Speak</button>
  <button id="stopBtn">Stop</button>
  <script src="popup.js"></script>
</body>
</html>

popup.js

document.addEventListener('DOMContentLoaded', function() {
  const voiceSelect = document.getElementById('voiceSelect');
  const textInput = document.getElementById('textInput');
  const speakBtn = document.getElementById('speakBtn');
  const stopBtn = document.getElementById('stopBtn');
  
  // Load available voices
  function loadVoices() {
    const voices = chrome.tts.getVoices();
    voiceSelect.innerHTML = '';
    
    voices.forEach(function(voice) {
      const option = document.createElement('option');
      option.textContent = voice.name + ' (' + voice.lang + ')';
      option.setAttribute('data-lang', voice.lang);
      option.setAttribute('data-name', voice.name);
      voiceSelect.appendChild(option);
    });
  }
  
  loadVoices();
  chrome.tts.onVoicesChanged.addListener(loadVoices);
  
  // Speak button
  speakBtn.addEventListener('click', function() {
    const text = textInput.value;
    if (!text) return;
    
    const selectedVoice = voiceSelect.selectedOptions[0];
    const options = {
      rate: 1.0,
      pitch: 1.0,
      volume: 1.0
    };
    
    if (selectedVoice) {
      options.voiceName = selectedVoice.getAttribute('data-name');
    }
    
    chrome.tts.speak(text, options);
  });
  
  // Stop button
  stopBtn.addEventListener('click', function() {
    chrome.tts.stop();
  });
});

This example demonstrates a functional text-to-speech extension with voice selection, basic controls, and proper event handling.


Best Practices and Common Pitfalls

When implementing the Chrome TTS API in your extensions, keep these best practices in mind:

Performance Considerations

  • Minimize speech synthesis calls: Queue multiple sentences together rather than making separate calls
  • Handle errors gracefully: Always include error handling in your implementation
  • Clean up resources: Use chrome.tts.stop() when your extension is closed or no longer needs speech

User Experience

  • Provide visual feedback: Show users when speech is active
  • Respect user preferences: Remember the user’s chosen voice and settings
  • Offer controls: Allow users to pause, resume, and stop speech
  • Test with screen readers: Ensure your TTS implementation doesn’t conflict with assistive technologies

Cross-Browser Compatibility

  • Test across platforms: Voice availability varies significantly
  • Provide fallbacks: Have a default voice if the preferred voice isn’t available
  • Handle missing features: Some platforms don’t support pause/resume; provide alternatives

Accessibility

  • Don’t rely solely on audio: Always provide visual alternatives
  • Consider cognitive accessibility: Allow users to adjust speed for easier comprehension
  • Support multiple languages: Use the lang parameter appropriately for multilingual content

Troubleshooting Common Issues

Here are solutions to common problems you might encounter:

No Voices Available

If getVoices() returns an empty array, the voices might not have loaded yet. Try waiting for the onVoicesChanged event:

chrome.tts.onVoicesChanged.addListener(function() {
  const voices = chrome.tts.getVoices();
  console.log('Voices loaded:', voices.length);
});

Speech Not Working

If speech isn’t working, check for errors:

chrome.tts.speak(text, function() {
  if (chrome.runtime.lastError) {
    console.error('Error:', chrome.runtime.lastError.message);
  }
});

Intermittent Behavior

Some platforms have issues with rapid speak calls. Implement a debounce:

let speakTimeout;
function speakDebounced(text) {
  clearTimeout(speakTimeout);
  chrome.tts.stop();
  speakTimeout = setTimeout(function() {
    chrome.tts.speak(text);
  }, 100);
}

Conclusion

The Chrome Text-to-Speech API is a powerful tool that enables developers to create accessible, feature-rich extensions with voice capabilities. From simple text reading to complex SSML-based speech synthesis, this API provides the flexibility needed for various use cases.

Remember to test thoroughly across platforms, handle errors gracefully, and always prioritize user experience. With the techniques and best practices covered in this guide, you’re well-equipped to implement professional-grade text-to-speech functionality in your Chrome extensions.

Start experimenting with the Chrome TTS API today, and discover how voice synthesis can enhance your extension’s accessibility and user experience. The possibilities are virtually endless, from language learning tools to accessibility aids to innovative productivity applications.


Part of the Chrome Extension Guide by theluckystrike. Built at zovo.one.

No previous article
No next article