Chrome Extension Text to Speech: A Developer Guide

Text-to-speech functionality in Chrome extensions transforms written content into spoken audio, opening doors for accessibility tools, language learning applications, and productivity boosters. This guide covers the technical implementation, from Web Speech API integration to custom audio solutions.

Understanding Text-to-Speech Options in Chrome

Chrome provides two primary pathways for text-to-speech: the native Web Speech API and the Chrome ttsEngine API. The Web Speech API offers the quickest implementation path, working directly in content scripts without requiring additional permissions. The ttsEngine API gives you deeper control over speech synthesis, enabling custom voice options and advanced playback control.

The Web Speech API’s SpeechSynthesis interface is available in all modern Chrome versions, making it a reliable choice for basic implementations. For extensions requiring offline capability or premium voice quality, you’ll need to explore third-party TTS services or implement the ttsEngine API.

Building Your First Text-to-Speech Extension

Every Chrome extension begins with a manifest file. For text-to-speech functionality, you’ll need manifest V3 with specific permissions:

{
  "manifest_version": 3,
  "name": "Simple TTS Reader",
  "version": "1.0",
  "permissions": ["activeTab", "scripting"],
  "action": {
    "default_popup": "popup.html"
  },
  "background": {
    "service_worker": "background.js"
  }
}

The popup interface provides the user controls. Create a simple HTML file with play, pause, and stop buttons:

<!DOCTYPE html>
<html>
<head>
  <style>
    body { width: 200px; padding: 10px; font-family: system-ui; }
    button { width: 100%; margin: 5px 0; padding: 8px; }
  </style>
</head>
<body>
  <button id="speak">Read Page</button>
  <button id="stop">Stop</button>
  <script src="popup.js"></script>
</body>
</html>

The popup script handles the core TTS logic using the Web Speech API:

document.getElementById('speak').addEventListener('click', async () => {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  
  chrome.scripting.executeScript({
    target: { tabId: tab.id },
    function: () => {
      const text = document.body.innerText;
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.rate = 1.0;
      utterance.pitch = 1.0;
      window.speechSynthesis.speak(utterance);
    }
  });
});

document.getElementById('stop').addEventListener('click', () => {
  chrome.scripting.executeScript({
    target: { tabId: tab.id },
    function: () => window.speechSynthesis.cancel()
  });
});

This basic implementation reads the entire page content. For more refined control, consider extracting specific elements or providing text selection options.

Implementing Selective Text Reading

Power users often want to read specific paragraphs or selected text rather than entire pages. Modify your content script to handle text selection:

chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'speakSelected') {
    const utterance = new SpeechSynthesisUtterance(window.getSelection().toString());
    utterance.onend = () => sendResponse({ status: 'completed' });
    window.speechSynthesis.speak(utterance);
    return true;
  }
});

Add a context menu item in your background script to enable right-click access:

chrome.contextMenus.create({
  id: 'speakSelection',
  title: 'Speak Selected Text',
  contexts: ['selection']
});

chrome.contextMenus.onClicked.addListener((info, tab) => {
  if (info.menuItemId === 'speakSelection') {
    chrome.tabs.sendMessage(tab.id, { action: 'speakSelected' });
  }
});

This pattern allows users to highlight any text and have it read aloud immediately.

Working with Voice Options

The Web Speech API provides access to system voices through speechSynthesis.getVoices(). Different voices support different languages and have varying quality levels:

function populateVoiceList() {
  const voices = window.speechSynthesis.getVoices();
  voices.forEach(voice => {
    console.log(`${voice.name} (${voice.lang}) - ${voice.default ? 'default' : ''}`);
  });
}

window.speechSynthesis.onvoiceschanged = populateVoiceList;

For a production extension, build a voice selector that lets users choose from available options. Store the preference in chrome.storage for persistence across sessions:

chrome.storage.sync.get(['selectedVoice'], (result) => {
  const voices = window.speechSynthesis.getVoices();
  const selectedVoice = voices.find(v => v.name === result.selectedVoice) || voices[0];
  
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.voice = selectedVoice;
  window.speechSynthesis.speak(utterance);
});

Advanced: Custom Audio Generation

When the Web Speech API falls short—due to voice quality, offline requirements, or specific formatting needs—consider integrating external TTS services. Popular options include Google Cloud Text-to-Speech, Amazon Polly, and open-source solutions like Coqui TTS.

A practical approach uses a background worker to fetch audio and deliver it to content scripts:

// background.js
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'fetchAudio') {
    fetch('https://your-tts-api.com/synthesize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: request.text, voice: request.voice })
    })
    .then(response => response.arrayBuffer())
    .then(audioData => {
      chrome.tabs.sendMessage(sender.tab.id, { 
        action: 'playAudio', 
        audio: audioData 
      });
    });
  }
  return true;
});

This hybrid approach combines the simplicity of Web Speech API for basic needs with the power of external services for advanced requirements.

Testing and Debugging

Chrome provides useful debugging tools for TTS extensions. Access the console in your extension’s popup or background page to monitor speech synthesis events. The Web Speech API fires events for state changes:

utterance.onstart = () => console.log('Speech started');
utterance.onend = () => console.log('Speech finished');
utterance.onerror = (e) => console.error('Speech error:', e.error);

Test across different Chrome versions and platforms, as voice availability varies significantly between operating systems.

Performance Considerations

Text-to-speech can impact page performance if not managed carefully. Always cancel previous utterances before starting new ones:

window.speechSynthesis.cancel(); // Clear queue before new speech
window.speechSynthesis.speak(utterance);

For long texts, consider splitting content into chunks to prevent memory issues and provide better user control over playback.

Building a Chrome extension with text-to-speech functionality combines straightforward Web Speech API usage with the architectural flexibility Chrome extensions provide. Start with basic page reading, then expand based on user feedback—whether that means adding voice selection, offline support, or integration with external TTS services.

Built by theluckystrike — More at zovo.one