Chrome Text-to-Speech API Complete Guide for Extension Developers

January 17, 2025 · 18 min read · Chrome Extensions, API Guide


The Chrome Text-to-Speech API (TTS API) is one of the most powerful yet underutilized APIs available to Chrome extension developers. This comprehensive guide will walk you through everything you need to know to implement text-to-speech functionality in your extensions, from basic usage to advanced voice control and event handling.

Whether you’re building an accessibility-focused extension, a language learning tool, or a productivity application that reads content aloud, the Chrome TTS API provides the foundation you need. This tutorial covers all aspects of the API, including voice selection, rate and pitch control, error handling, and best practices for creating seamless voice experiences.

Understanding the Chrome TTS API

The Chrome Text-to-Speech API, accessible through the chrome.tts namespace, lets extensions synthesize spoken audio from text. It provides one interface over whatever speech engines are available. Chrome supports speech natively on Windows, macOS, and ChromeOS using the operating system’s speech synthesis, and on every platform users can install extensions that register additional speech engines. The set of installed voices, and therefore the exact behavior, still varies from one machine to the next.

The TTS API is particularly valuable for extensions that need to provide auditory feedback, read web content aloud, assist users with visual impairments, or offer multilingual support. Unlike traditional audio playback, speech synthesis generates audio dynamically from text input, making it flexible for any text-based content.

Key Capabilities of the Chrome TTS API

The Chrome TTS API offers a comprehensive set of features that make it suitable for various use cases:

Text-to-Speech Synthesis: Convert any text string into spoken audio
Voice Selection: Choose from multiple available voices across different languages
Parameter Control: Adjust speech rate, pitch, and volume
Event Handling: Monitor speech progress, completion, and errors
Enqueue Management: Queue multiple utterances for sequential playback
Pause and Resume: Control playback with pause and resume functions

Required Permissions

To use the Chrome TTS API, declare the "tts" permission in your manifest.json file (for example, "permissions": ["tts"]). Without it, chrome.tts is not exposed to your extension’s scripts. Depending on what else your extension does, you may need additional permissions, such as "activeTab" and "scripting" if it reads content from web pages.

Getting Started with Basic TTS Implementation

The simplest way to use the Chrome TTS API is through the chrome.tts.speak() method. It takes the text you want to speak, an optional options object, and an optional callback.

Basic Syntax

chrome.tts.speak(
  textToSpeak,
  options,
  callback
);

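The textToSpeak parameter is the string to speak: plain text or a complete SSML document, up to 32,768 characters. The options parameter is an optional object that configures the speech synthesis. The callback runs right away, before speech finishes. It only tells you whether the request was accepted, so check chrome.runtime.lastError inside it and use the onEvent option to learn when speech actually ends. In Manifest V3 you can omit the callback and await the Promise that speak() returns instead.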
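Because the callback (or Promise) only reports whether Chrome accepted the request, a small wrapper can turn that into a yes/no answer. The sketch below assumes a Manifest V3 extension, where speak() returns a Promise that rejects in the cases where the callback form would set chrome.runtime.lastError; speakSafely is a hypothetical helper name.

```javascript
// Hypothetical helper (assumes Manifest V3, where speak() returns a Promise).
// Resolves to true if Chrome accepted the utterance, or false (after logging)
// if the request was rejected, e.g. because of invalid options.
// Note: true does NOT mean speech has finished; use onEvent for that.
async function speakSafely(text, options = {}) {
  try {
    await chrome.tts.speak(text, options);
    return true;
  } catch (err) {
    console.error('TTS request rejected:', err.message);
    return false;
  }
}
```

A true result only means the utterance was queued; completion is reported through onEvent.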

A Simple Example

Here’s a basic example that speaks a simple message when the user clicks the extension’s toolbar icon:

// background.js (service worker)
// action.onClicked only fires when the action has no default_popup
chrome.action.onClicked.addListener(function(tab) {
  chrome.tts.speak('Hello! Welcome to my Chrome extension.');
});

This example demonstrates the core functionality, but real-world extensions typically need more control over the speech output.

Understanding the Options Object

The options object provides fine-grained control over how the text is spoken:

chrome.tts.speak('This is a test message', {
  rate: 1.0,        // Speech rate (0.1 to 10.0, default 1.0)
  pitch: 1.0,      // Voice pitch (0.0 to 2.0, default 1.0)
  volume: 1.0,     // Speech volume (0.0 to 1.0, default 1.0)
  voiceName: 'Google US English',  // Specific voice to use
  lang: 'en-US',   // Language code
  onEvent: function(event) {
    console.log('TTS event:', event.type);
  }
}, function() {
  if (chrome.runtime.lastError) {
    console.error('TTS error:', chrome.runtime.lastError.message);
  }
});

Each option serves a specific purpose in customizing the speech output. Understanding these parameters is essential for creating a polished voice experience.
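Since rate, pitch, and volume have fixed valid ranges, it can help to normalize user-supplied values before they reach speak(). This is a minimal sketch: buildTtsOptions and TTS_LIMITS are hypothetical names, and the ranges are the ones listed in the comments of the example above.

```javascript
// Valid ranges and defaults for the numeric chrome.tts options.
const TTS_LIMITS = {
  rate: { min: 0.1, max: 10.0, fallback: 1.0 },
  pitch: { min: 0.0, max: 2.0, fallback: 1.0 },
  volume: { min: 0.0, max: 1.0, fallback: 1.0 }
};

// Hypothetical helper: copies the settings, clamps rate/pitch/volume into
// their valid ranges, and uses 1.0 for any value that is missing or not a
// number. Other keys (voiceName, lang, onEvent, ...) pass through unchanged.
function buildTtsOptions(settings = {}) {
  const options = { ...settings };
  for (const [key, limit] of Object.entries(TTS_LIMITS)) {
    const raw = settings[key];
    // Treat null and '' as missing; Number() would turn them into 0
    const value = raw === null || raw === '' ? NaN : Number(raw);
    options[key] = Number.isFinite(value)
      ? Math.min(limit.max, Math.max(limit.min, value))
      : limit.fallback;
  }
  return options;
}
```

Strings such as "1.5" from an input field are converted to numbers, so the helper can sit directly between a settings form and chrome.tts.speak().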

Working with Voices

One of the most powerful features of the Chrome TTS API is the ability to choose from multiple voices. Different voices support different languages and have distinct characteristics.

Listing Available Voices

To see which voices are available in the user’s browser, call chrome.tts.getVoices(). It is asynchronous: the voice list is delivered to a callback (or, in Manifest V3, through a returned Promise):

function getAvailableVoices() {
  // The voice list arrives in the callback, not as a return value
  chrome.tts.getVoices(function(voices) {
    voices.forEach(function(voice) {
      console.log('Voice:', voice.voiceName);
      console.log('  Lang:', voice.lang);
      console.log('  Remote:', voice.remote);
      console.log('  Extension ID:', voice.extensionId);
    });
  });
}

// Call the function
getAvailableVoices();

Each entry is a TtsVoice object with properties such as voiceName, lang, remote (true for network-based voices), extensionId (set for voices provided by extensions), and eventTypes (the events the voice can send). The older gender property is deprecated.

Selecting a Specific Voice

Once you know which voices are available, you can select a specific voice by name:

chrome.tts.speak('Speaking with a specific voice', {
  voiceName: 'Google UK English Female'
});

It’s important to note that voice names vary across platforms and installations. You should always provide fallback options and handle cases where the requested voice isn’t available.

Voice Selection Best Practices

When implementing voice selection in your extension, consider these best practices:

Don’t assume a voice exists: Check getVoices() before using a voiceName, and fall back to a lang-only request when the voice is missing
Match language first: Select voices based on language code before considering specific voice names
User preferences: Allow users to choose their preferred voice in your extension settings
Test across platforms: Voice availability varies significantly between operating systems
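The “match language first” advice can be expressed as a small ranking function. This is a sketch under assumptions: pickVoiceName and speakInLanguage are hypothetical helpers, the voice list comes from getVoices() (awaited here, which requires Manifest V3), and when no voice matches, the helper passes only lang and lets Chrome choose.

```javascript
// Hypothetical helper implementing "match language first":
// 1. the preferred voice, if it exists and fits the requested language,
// 2. any voice whose lang matches exactly (e.g. 'en-GB'),
// 3. any voice sharing the primary language (e.g. 'en'),
// 4. otherwise undefined, leaving the choice to Chrome.
function pickVoiceName(voices, lang, preferredName) {
  const primary = lang.split('-')[0].toLowerCase();
  const sameLanguage = (v) =>
    typeof v.lang === 'string' && v.lang.split('-')[0].toLowerCase() === primary;

  const preferred = voices.find((v) => v.voiceName === preferredName);
  if (preferred && (!preferred.lang || sameLanguage(preferred))) {
    return preferred.voiceName;
  }
  const exact = voices.find(
    (v) => typeof v.lang === 'string' && v.lang.toLowerCase() === lang.toLowerCase()
  );
  if (exact) return exact.voiceName;
  const partial = voices.find(sameLanguage);
  return partial ? partial.voiceName : undefined;
}

// Hypothetical usage inside an extension (Manifest V3, "tts" permission).
async function speakInLanguage(text, lang, preferredName) {
  const voices = await chrome.tts.getVoices();
  const voiceName = pickVoiceName(voices, lang, preferredName);
  // Name a voice when one was found; otherwise let Chrome pick by language
  const options = voiceName ? { voiceName } : { lang };
  await chrome.tts.speak(text, options);
}
```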

Controlling Speech Parameters

The Chrome TTS API provides three main parameters for controlling how text is spoken: rate, pitch, and volume. Understanding these parameters allows you to create natural-sounding speech output.

Speech Rate

The rate parameter controls how fast the text is spoken. The default rate is 1.0, which represents normal speaking speed. Values can range from 0.1 (very slow) to 10.0 (extremely fast):

// Slow, deliberate speech
chrome.tts.speak('This is spoken slowly', {
  rate: 0.5
});

// Fast speech
chrome.tts.speak('This is spoken quickly', {
  rate: 2.0
});

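Different voices may interpret rate values differently. Values below 0.1 or above 10.0 are rejected outright, and many voices cap the rate further: a given voice may not speak faster than about three times normal even if you request a higher value. Test the rates you expose to users.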

Voice Pitch

The pitch parameter adjusts the pitch of the voice on a scale from 0 (lowest) to 2 (highest). The default of 1.0 is the voice’s normal pitch:

// Higher pitch
chrome.tts.speak('Speaking with higher pitch', {
  pitch: 1.5
});

// Lower pitch
chrome.tts.speak('Speaking with lower pitch', {
  pitch: 0.5
});

Pitch adjustment is useful for creating distinct voices or emphasizing certain types of content. However, extreme pitch values can make speech sound unnatural.

Volume Control

The volume parameter controls the output volume. The default is 1.0 (maximum volume):

chrome.tts.speak('Speaking at reduced volume', {
  volume: 0.5
});

Note that volume control depends on the audio output device and may not work identically on all platforms.

Handling TTS Events

The Chrome TTS API provides comprehensive event handling that allows you to monitor and respond to speech synthesis events. This is crucial for building responsive extensions that need to coordinate speech with other actions.

Event Types

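Each event passed to onEvent has a type and a charIndex; in Chrome 74 and later it can also carry a length, and error events carry an errorMessage. The possible types are:

start: Fired when the engine begins speaking the utterance
word: Fired when a word boundary is reached (charIndex points at the start of the next word)
sentence: Fired when a sentence boundary is reached
marker: Fired when an SSML marker is reached
end: Fired when the engine finishes the utterance
interrupted: Fired when the utterance is cut off by stop() or a non-queued speak() call before finishing
cancelled: Fired when the utterance is removed from the queue before it starts speaking
error: Fired when an engine-specific error occurs
pause: Fired when speech is paused
resume: Fired when paused speech resumes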
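Not every voice sends every event type; each TtsVoice lists what it supports in its eventTypes array. A minimal sketch of two ways to deal with that: check the array yourself, or ask Chrome to pick a capable voice with the requiredEventTypes option. voiceSupportsEvents and speakWithWordEvents are hypothetical names.

```javascript
// Hypothetical helper: true only if the voice advertises every requested
// event type in its eventTypes array.
function voiceSupportsEvents(voice, requiredTypes) {
  const supported = Array.isArray(voice.eventTypes) ? voice.eventTypes : [];
  return requiredTypes.every((type) => supported.includes(type));
}

// Hypothetical alternative: let Chrome restrict voice selection to voices
// that can send word and end events.
function speakWithWordEvents(text, onEvent) {
  return chrome.tts.speak(text, {
    requiredEventTypes: ['word', 'end'],
    onEvent
  });
}
```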

Implementing Event Handlers

You can handle events through the onEvent option:

chrome.tts.speak('This is a longer text that will take some time to speak', {
  onEvent: function(event) {
    if (event.type === 'start') {
      console.log('Speech started');
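    } else if (event.type === 'word') {
      // charIndex marks the start of the next word; length needs Chrome 74+
      console.log('Word boundary:', event.charIndex, event.length);
    } else if (event.type === 'end') {
      console.log('Speech completed');
    } else if (event.type === 'interrupted' || event.type === 'cancelled') {
      console.log('Speech stopped early:', event.type);
    } else if (event.type === 'error') {
      console.error('TTS Error:', event.errorMessage);
    }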
  }
});
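Because speak() does not tell you when speech finishes, it is often convenient to wrap the event stream in a Promise. A sketch, assuming that end, interrupted, and cancelled all count as “done” for your purposes; speakAndWait is a hypothetical helper, and any onEvent you pass in is still called for every event.

```javascript
// Hypothetical helper: resolves with the final event type once the utterance
// ends, is interrupted, or is cancelled; rejects on an engine error or when
// Chrome refuses the request.
function speakAndWait(text, options = {}) {
  const terminal = ['end', 'interrupted', 'cancelled'];
  return new Promise((resolve, reject) => {
    chrome.tts.speak(text, {
      ...options,
      onEvent(event) {
        // Forward every event to the caller's own handler, if any
        if (typeof options.onEvent === 'function') options.onEvent(event);
        if (terminal.includes(event.type)) {
          resolve(event.type);
        } else if (event.type === 'error') {
          reject(new Error(event.errorMessage || 'TTS error'));
        }
      }
    }, () => {
      // Chrome reports invalid requests through lastError
      if (chrome.runtime.lastError) {
        reject(new Error(chrome.runtime.lastError.message));
      }
    });
  });
}
```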

Practical Event Handling Example

Here’s a more practical example that uses events to synchronize speech with visual feedback:

function speakWithProgress(text, onWord, onComplete) {
  chrome.tts.speak(text, {
    onEvent: function(event) {
      if (event.type === 'word' && onWord) {
        // Highlight the word being spoken (length is -1 or absent if the engine doesn't set it)
        onWord(event.charIndex, event.length);
      } else if (event.type === 'end' && onComplete) {
        onComplete();
      }
    }
  });
}

// Usage
speakWithProgress(
  'The quick brown fox jumps over the lazy dog',
  function(charIndex, length) {
    console.log('Currently speaking word at position:', charIndex);
  },
  function() {
    console.log('Finished speaking');
  }
);
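To highlight the current word you need the text the event points at. In this sketch, wordFromEvent is a hypothetical helper: it uses event.length when the engine provides a positive value and otherwise reads up to the next whitespace, which assumes space-separated text and will not suit languages written without spaces.

```javascript
// Hypothetical helper: returns the word a 'word' event refers to.
// event.length only exists in Chrome 74+ and may be -1 when the engine
// does not set it, so fall back to scanning until the next whitespace.
function wordFromEvent(text, event) {
  const start = event.charIndex;
  if (typeof start !== 'number' || start < 0 || start >= text.length) return '';
  if (typeof event.length === 'number' && event.length > 0) {
    return text.slice(start, start + event.length);
  }
  const match = text.slice(start).match(/^\S+/);
  return match ? match[0] : '';
}
```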

Managing Speech Queue

By default, chrome.tts.speak() interrupts any utterance in progress and flushes the queue. To play several utterances one after another, set the enqueue option to true.

Understanding the Queue

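When you pass enqueue: true while another utterance is in progress, the new utterance is added to the end of the queue:

chrome.tts.speak('First message');
chrome.tts.speak('Second message', { enqueue: true });
chrome.tts.speak('Third message', { enqueue: true });

These messages are spoken sequentially in the order they were queued. Without enqueue: true, each call would cut off the one before it, so you would typically hear only the last message in full.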

Controlling Queue Behavior

There is a single speech queue; the enqueue option decides whether a new utterance waits its turn or replaces everything:

// Wait behind whatever is already speaking or queued
chrome.tts.speak('Message A', { enqueue: true });

// Interrupt current speech and flush the queue (the default)
chrome.tts.speak('Urgent message', { enqueue: false });

Leaving enqueue unset or false suits urgent announcements. The utterance that was playing receives an interrupted event, and utterances that were still waiting receive cancelled events.
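Long documents are often easier to control when spoken sentence by sentence. In this sketch, splitSentences and speakInChunks are hypothetical helpers: the first chunk uses enqueue: false to replace anything already playing, and every later chunk uses enqueue: true to wait its turn.

```javascript
// Hypothetical helper: naive split after '.', '!' or '?' followed by whitespace.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Hypothetical helper: the first sentence interrupts current speech,
// the rest are queued behind it in order.
function speakInChunks(text, options = {}) {
  splitSentences(text).forEach((sentence, i) => {
    chrome.tts.speak(sentence, { ...options, enqueue: i > 0 });
  });
}
```

The regular expression is deliberately simple; abbreviations such as “Dr.” will be treated as sentence ends.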

Pausing, Resuming, and Stopping

The Chrome TTS API provides methods for controlling playback after speech has started.

Stopping Speech

To stop all speech immediately:

chrome.tts.stop();

This stops the current utterance, flushes any pending utterances from the queue, and un-pauses speech if it was paused, so the next speak() call plays normally.

Pausing and Resuming

Pause and resume functionality allows for temporary interruption:

chrome.tts.pause();

// Later...
chrome.tts.resume();

Whether pause and resume take effect depends on the speech engine, so don’t assume pause() worked on every platform. Keep a Stop control available as a fallback.

Checking State

You can check the current TTS state:

chrome.tts.isSpeaking(function(speaking) {
  if (speaking) {
    console.log('Currently speaking');
  } else {
    console.log('Not speaking');
  }
});
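chrome.tts has no query for whether speech is paused, so a pause/resume toggle has to track that state itself. A minimal sketch, assuming Manifest V3 (where isSpeaking() returns a Promise) and a single page controlling playback; togglePause, stopAll, and ttsPaused are hypothetical names.

```javascript
// Hypothetical pause state, tracked by the extension itself.
let ttsPaused = false;

// Hypothetical toggle: resumes if we paused earlier, pauses if something is
// speaking, and otherwise reports that nothing is playing.
async function togglePause() {
  if (ttsPaused) {
    chrome.tts.resume();
    ttsPaused = false;
    return 'resumed';
  }
  const speaking = await chrome.tts.isSpeaking();
  if (!speaking) return 'idle';
  chrome.tts.pause();
  ttsPaused = true;
  return 'paused';
}

// stop() also un-pauses, so the tracked state is reset as well.
function stopAll() {
  chrome.tts.stop();
  ttsPaused = false;
}
```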

Advanced SSML Support

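chrome.tts.speak() accepts a complete, well-formed SSML (Speech Synthesis Markup Language) document in place of plain text, which gives finer control over pronunciation, emphasis, and timing. Engines that do not support SSML strip the tags and speak the remaining text, so the effect depends on the voice.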

Using SSML Tags

const ssmlText = `
<speak>
  This is <emphasis level="moderate">important</emphasis>.
  The price is <say-as interpret-as="currency" format="USD">99.99</say-as>.
  <break time="500ms"/> Take a short pause here.
</speak>
`.trim();

// Pass the SSML document itself as the utterance
chrome.tts.speak(ssmlText);

Common SSML tags include:

<speak>: Root element
<break>: Insert pauses
<emphasis>: Add emphasis
<say-as>: Control pronunciation
<phoneme>: Specify phonetic pronunciation
<prosody>: Control rate, pitch, and volume

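SSML Engine Support

There is no separate SSML mode option in chrome.tts; you pass the SSML document as the utterance string. Keep these points in mind:

The document must be complete and well-formed, wrapped in a single <speak> root element
Engines that do not support SSML strip the tags and speak the remaining text
Which tags and say-as values are honored depends on the engine, so test with the voices your users have
The 32,768-character utterance limit applies to the whole string, markup included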
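When SSML is built from user or page text, characters such as & and < must be escaped or the document stops being well-formed. A sketch with hypothetical helpers escapeSsml and buildSsml, which join paragraphs with <break> pauses inside a single <speak> root:

```javascript
// Hypothetical helper: escapes the five XML special characters.
// '&' must be replaced first so later entities are not double-escaped.
function escapeSsml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds a complete <speak> document, inserting a
// pause of pauseMs milliseconds between paragraphs.
function buildSsml(paragraphs, pauseMs = 500) {
  const body = paragraphs
    .map((p) => escapeSsml(p))
    .join(`<break time="${pauseMs}ms"/>`);
  return `<speak>${body}</speak>`;
}
```

The result can be passed straight to chrome.tts.speak(); engines without SSML support will read only the escaped text.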

Building a Complete TTS Extension Example

Here’s a practical example of building a simple text-to-speech extension:

manifest.json

{
  "manifest_version": 3,
  "name": "Simple Text Reader",
  "version": "1.0",
  "permissions": ["tts", "activeTab", "scripting"],
  "action": {
    "default_popup": "popup.html",
    "default_icon": "icon.png"
  }
}

popup.html

<!DOCTYPE html>
<html>
<head>
  <style>
    body { width: 300px; padding: 20px; font-family: Arial, sans-serif; }
    textarea { width: 100%; height: 100px; margin-bottom: 10px; }
    button { padding: 10px 20px; margin-right: 5px; cursor: pointer; }
    select { padding: 5px; margin-bottom: 10px; width: 100%; }
  </style>
</head>
<body>
  <h3>Text Reader</h3>
  <select id="voiceSelect"></select>
  <textarea id="textInput" placeholder="Enter text to speak..."></textarea>
  <button id="speakBtn">Speak</button>
  <button id="stopBtn">Stop</button>
  <script src="popup.js"></script>
</body>
</html>

popup.js

document.addEventListener('DOMContentLoaded', function() {
  const voiceSelect = document.getElementById('voiceSelect');
  const textInput = document.getElementById('textInput');
  const speakBtn = document.getElementById('speakBtn');
  const stopBtn = document.getElementById('stopBtn');
  
  // Load available voices (getVoices() is asynchronous)
  function loadVoices() {
    chrome.tts.getVoices(function(voices) {
      voiceSelect.innerHTML = '';

      voices.forEach(function(voice) {
        const option = document.createElement('option');
        option.textContent = voice.voiceName + ' (' + (voice.lang || 'unknown') + ')';
        option.setAttribute('data-lang', voice.lang || '');
        option.setAttribute('data-name', voice.voiceName);
        voiceSelect.appendChild(option);
      });
    });
  }

  loadVoices();
  // onVoicesChanged exists only in Chrome 124 and later
  if (chrome.tts.onVoicesChanged) {
    chrome.tts.onVoicesChanged.addListener(loadVoices);
  }
  
  // Speak button
  speakBtn.addEventListener('click', function() {
    const text = textInput.value;
    if (!text) return;
    
    const selectedVoice = voiceSelect.selectedOptions[0];
    const options = {
      rate: 1.0,
      pitch: 1.0,
      volume: 1.0
    };
    
    if (selectedVoice) {
      options.voiceName = selectedVoice.getAttribute('data-name');
    }
    
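    chrome.tts.speak(text, options, function() {
      if (chrome.runtime.lastError) {
        console.error('TTS error:', chrome.runtime.lastError.message);
      }
    });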
  });
  
  // Stop button
  stopBtn.addEventListener('click', function() {
    chrome.tts.stop();
  });
});

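This example gives you a working text reader with voice selection and Speak and Stop buttons. It does not subscribe to onEvent yet, so it cannot show progress; the event-handling patterns from earlier in this guide can be added to its speak() options.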
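A common next step is reading the text the user has selected on the page, which is what the activeTab and scripting permissions in the manifest are for (tts is needed as well). The sketch below is a hypothetical addition to popup.js; speakPageSelection is an assumed name, and it only reads the main frame of the active tab.

```javascript
// Hypothetical addition to popup.js: speak the current page selection.
// Assumes Manifest V3 (Promise-returning APIs) and the "tts", "activeTab",
// and "scripting" permissions. Returns the text it spoke ('' if none).
async function speakPageSelection() {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (!tab || tab.id === undefined) return '';

  // Without frameIds/allFrames, only the main frame is injected, so the
  // result array holds a single InjectionResult.
  const [injection] = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: () => window.getSelection().toString()
  });
  const selection =
    injection && typeof injection.result === 'string' ? injection.result.trim() : '';
  if (selection) {
    await chrome.tts.speak(selection);
  }
  return selection;
}
```

Injection fails on pages extensions cannot script, such as chrome:// URLs, so production code should catch that rejection and tell the user.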

Best Practices and Common Pitfalls

When implementing the Chrome TTS API in your extensions, keep these best practices in mind:

Performance Considerations

Batch sensibly: Combine short sentences into one utterance rather than many tiny calls, but keep each utterance under the 32,768-character limit and queue longer content in pieces with enqueue: true
Handle errors gracefully: Always include error handling in your implementation
Clean up resources: Use chrome.tts.stop() when your extension is closed or no longer needs speech

User Experience

Provide visual feedback: Show users when speech is active
Respect user preferences: Remember the user’s chosen voice and settings
Offer controls: Allow users to pause, resume, and stop speech
Test with screen readers: Ensure your TTS implementation doesn’t conflict with assistive technologies
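Remembering the user’s rate, pitch, and volume takes only a few lines with chrome.storage. This sketch assumes the extension also declares the "storage" permission and runs under Manifest V3 (Promise-returning storage calls); the ttsSettings key and the helper names are hypothetical.

```javascript
// Hypothetical defaults and storage key for speech preferences.
const DEFAULT_SPEECH_SETTINGS = { rate: 1.0, pitch: 1.0, volume: 1.0 };

async function loadSpeechSettings() {
  // Passing a defaults object returns it for keys that were never saved
  const { ttsSettings } = await chrome.storage.sync.get({
    ttsSettings: DEFAULT_SPEECH_SETTINGS
  });
  // Merge so partially saved settings still get every field
  return { ...DEFAULT_SPEECH_SETTINGS, ...ttsSettings };
}

async function saveSpeechSettings(settings) {
  await chrome.storage.sync.set({ ttsSettings: settings });
}

// Apply the stored preferences to an utterance.
async function speakWithSavedSettings(text) {
  const settings = await loadSpeechSettings();
  await chrome.tts.speak(text, settings);
}
```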

Cross-Platform Compatibility

Test across platforms: Voice availability varies significantly
Provide fallbacks: Have a default voice if the preferred voice isn’t available
Handle missing features: Pause and resume depend on the speech engine, so keep a Stop control available as an alternative

Accessibility

Don’t rely solely on audio: Always provide visual alternatives
Consider cognitive accessibility: Allow users to adjust speed for easier comprehension
Support multiple languages: Use the lang parameter appropriately for multilingual content

Troubleshooting Common Issues

Here are solutions to common problems you might encounter:

No Voices Available

If getVoices() delivers an empty array, the voices might not have loaded yet. In Chrome 124 and later you can wait for the onVoicesChanged event and fetch the list again:

chrome.tts.onVoicesChanged.addListener(function() {
  chrome.tts.getVoices(function(voices) {
    console.log('Voices loaded:', voices.length);
  });
});
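Because onVoicesChanged only exists in Chrome 124 and later, a defensive loader can combine the event with polling. A sketch; waitForVoices is a hypothetical helper, and the timeout and polling interval are arbitrary defaults.

```javascript
// Hypothetical helper: resolves with the voice list once it is non-empty.
// Listens to onVoicesChanged when available (Chrome 124+) and also polls
// getVoices() as a fallback; after timeoutMs it resolves with whatever
// list getVoices() returns, possibly empty.
function waitForVoices(timeoutMs = 3000, pollMs = 250) {
  return new Promise((resolve) => {
    let done = false;
    let pollTimer;
    let deadline;

    function finish(voices) {
      if (done) return;
      done = true;
      clearInterval(pollTimer);
      clearTimeout(deadline);
      if (chrome.tts.onVoicesChanged) {
        chrome.tts.onVoicesChanged.removeListener(check);
      }
      resolve(voices);
    }

    function check() {
      chrome.tts.getVoices((voices) => {
        if (voices.length > 0) finish(voices);
      });
    }

    if (chrome.tts.onVoicesChanged) {
      chrome.tts.onVoicesChanged.addListener(check);
    }
    pollTimer = setInterval(check, pollMs);
    deadline = setTimeout(() => chrome.tts.getVoices(finish), timeoutMs);
    check();
  });
}
```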

Speech Not Working

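If speech isn’t working, first confirm that the "tts" permission is declared, then check both the request itself (chrome.runtime.lastError) and engine errors reported through onEvent:

chrome.tts.speak(text, {
  onEvent: function(event) {
    if (event.type === 'error') console.error('Engine error:', event.errorMessage);
  }
}, function() {
  if (chrome.runtime.lastError) {
    console.error('Request error:', chrome.runtime.lastError.message);
  }
});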

Intermittent Behavior

Rapid successive speak calls, such as one per keystroke, can produce choppy or intermittent output on some platforms. Debouncing them helps:

let speakTimeout;
function speakDebounced(text) {
  clearTimeout(speakTimeout);
  chrome.tts.stop();
  speakTimeout = setTimeout(function() {
    chrome.tts.speak(text);
  }, 100);
}

Conclusion

The Chrome Text-to-Speech API is a powerful tool that enables developers to create accessible, feature-rich extensions with voice capabilities. From simple text reading to complex SSML-based speech synthesis, this API provides the flexibility needed for various use cases.

Remember to test thoroughly across platforms, handle errors gracefully, and always prioritize user experience. With the techniques and best practices covered in this guide, you’re well-equipped to implement professional-grade text-to-speech functionality in your Chrome extensions.

Start experimenting with the Chrome TTS API today, and discover how voice synthesis can enhance your extension’s accessibility and user experience. The possibilities are virtually endless, from language learning tools to accessibility aids to innovative productivity applications.


Part of the Chrome Extension Guide by theluckystrike. Built at zovo.one.
