AI YouTube Summary Chrome Extension: Developer Guide

Building an AI-powered YouTube summary extension for Chrome unlocks powerful capabilities for consuming video content efficiently. This guide walks you through the technical implementation, API integrations, and practical patterns for creating a robust summary tool.

Why Build a YouTube Summary Extension

YouTube hosts millions of hours of content daily. Developers, researchers, and power users often need to extract key information quickly without watching entire videos. An AI-powered summary extension solves this by automatically generating concise summaries of video content.

The technical challenge lies in extracting reliable video data and processing it through AI services while maintaining performance and respecting YouTube’s structure.

Extension Architecture

A YouTube summary extension operates across multiple components:

Content script — Injected into YouTube pages to extract video metadata and transcript data
Background service worker — Handles API communication, caching, and state management
Popup interface — Provides users with summary controls and settings
AI processing layer — Communicates with language models to generate summaries

The critical first step is extracting the video transcript, which serves as the primary input for AI summarization.

Extracting YouTube Transcripts

YouTube provides captions for most videos. The challenge is accessing them programmatically. Here’s a reliable approach using the content script:

// content.js - Extract transcript from YouTube video
class TranscriptExtractor {
  constructor() {
    this.videoId = this.getVideoId();
  }

  getVideoId() {
    const urlParams = new URLSearchParams(window.location.search);
    return urlParams.get('v');
  }

  async findTranscriptButton() {
    // Wait for the transcript button to appear
    await new Promise(resolve => {
      const observer = new MutationObserver(() => {
        const button = document.querySelector('button[aria-label="Show transcript"]');
        if (button) {
          observer.disconnect();
          resolve(button);
        }
      });
      observer.observe(document.body, { childList: true, subtree: true });
      setTimeout(() => observer.disconnect(), 10000);
    });
  }

  async extractTranscriptSegments() {
    // Click transcript button if available
    const transcriptButton = document.querySelector('button[aria-label="Show transcript"]');
    if (transcriptButton) {
      transcriptButton.click();
      await new Promise(r => setTimeout(r, 1000));
    }

    // Extract transcript segments
    const segments = [];
    const transcriptPanel = document.querySelector('#transcript-panel');
    
    if (!transcriptPanel) {
      throw new Error('Transcript not available for this video');
    }

    const transcriptLines = transcriptPanel.querySelectorAll('ytd-transcript-segment-renderer');
    
    transcriptLines.forEach(line => {
      const timestamp = line.querySelector('.timestamp')?.textContent;
      const text = line.querySelector('.segment-text')?.textContent;
      
      if (timestamp && text) {
        segments.push({ timestamp: timestamp.trim(), text: text.trim() });
      }
    });

    return segments;
  }

  getFullTranscript() {
    return this.extractTranscriptSegments()
      .then(segments => segments.map(s => s.text).join(' '))
      .catch(err => {
        console.error('Transcript extraction failed:', err);
        return null;
      });
  }
}

This approach clicks the transcript button programmatically and scrapes the resulting panel. Note that this method works on videos where YouTube provides automatic or community captions.

Connecting to AI Services

Once you have the transcript, send it to an AI service for summarization. Here’s a pattern for communicating with OpenAI’s API:

// background.js - AI API communication
const AI_CONFIG = {
  provider: 'openai',
  model: 'gpt-4o-mini',
  maxTokens: 1000,
  apiKey: null // Set by user in extension settings
};

async function summarizeTranscript(transcript, userPrompt = null) {
  if (!AI_CONFIG.apiKey) {
    throw new Error('API key not configured');
  }

  const systemPrompt = `You are a helpful assistant that summarizes YouTube video transcripts. 
Provide a concise summary that captures the main points, key arguments, and important details.
Format the summary with clear sections and bullet points.`;

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${AI_CONFIG.apiKey}`
    },
    body: JSON.stringify({
      model: AI_CONFIG.model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt || `Summarize this transcript:\n\n${transcript}` }
      ],
      max_tokens: AI_CONFIG.maxTokens,
      temperature: 0.7
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.choices[0].message.content;
}

For cost-effective summarization, consider using smaller models like GPT-4o-mini or Claude Haiku. These handle transcript summarization effectively at a fraction of the cost.

Building the Summary Display

Display the generated summary directly on the YouTube page using a content script injection:

// content.js - Display summary overlay
class SummaryOverlay {
  constructor() {
    this.container = null;
  }

  create(summaryText, videoTitle) {
    // Remove existing overlay if present
    this.destroy();

    this.container = document.createElement('div');
    this.container.id = 'ai-summary-overlay';
    this.container.innerHTML = `
      <div class="summary-header">
        <h3>AI Summary</h3>
        <button class="close-btn">×</button>
      </div>
      <div class="summary-content">
        <h4>${videoTitle}</h4>
        <div class="summary-text">${summaryText}</div>
      </div>
      <div class="summary-footer">
        <button class="copy-btn">Copy to Clipboard</button>
      </div>
    `;

    // Add styles
    this.container.style.cssText = `
      position: fixed;
      top: 100px;
      right: 20px;
      width: 380px;
      max-height: 70vh;
      background: #1a1a1a;
      color: #fff;
      border-radius: 12px;
      box-shadow: 0 8px 32px rgba(0,0,0,0.4);
      z-index: 9999;
      overflow: hidden;
      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
    `;

    document.body.appendChild(this.container);

    // Attach event listeners
    this.container.querySelector('.close-btn').addEventListener('click', () => this.destroy());
    this.container.querySelector('.copy-btn').addEventListener('click', () => {
      navigator.clipboard.writeText(summaryText);
    });
  }

  destroy() {
    if (this.container && this.container.parentNode) {
      this.container.parentNode.removeChild(this.container);
    }
  }
}

This creates a sleek overlay that integrates naturally with YouTube’s dark theme.

Implementing Background Message Handling

Connect your content script and background worker through Chrome’s message passing API:

// content.js - Request summary from background
async function requestSummary() {
  const extractor = new TranscriptExtractor();
  const transcript = await extractor.getFullTranscript();
  
  if (!transcript) {
    alert('No transcript available for this video');
    return;
  }

  // Send to background script
  chrome.runtime.sendMessage({
    action: 'generateSummary',
    transcript: transcript,
    videoTitle: document.title.replace(' - YouTube', '')
  }, (response) => {
    if (response.success) {
      const overlay = new SummaryOverlay();
      overlay.create(response.summary, response.videoTitle);
    } else {
      alert(`Summary failed: ${response.error}`);
    }
  });
}

// background.js - Handle message requests
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'generateSummary') {
    summarizeTranscript(request.transcript)
      .then(summary => {
        sendResponse({ 
          success: true, 
          summary: summary,
          videoTitle: request.videoTitle 
        });
      })
      .catch(error => {
        sendResponse({ 
          success: false, 
          error: error.message 
        });
      });
    return true; // Keep message channel open for async response
  }
});

Manifest Configuration

Set up your manifest with the necessary permissions:

{
  "manifest_version": 3,
  "name": "AI YouTube Summary",
  "version": "1.0.0",
  "description": "Generate AI-powered summaries of YouTube videos",
  "permissions": [
    "activeTab",
    "scripting",
    "storage"
  ],
  "host_permissions": [
    "https://www.youtube.com/*",
    "https://*.youtube.com/*"
  ],
  "content_scripts": [{
    "matches": ["https://www.youtube.com/watch*"],
    "js": ["content.js"]
  }],
  "background": {
    "service_worker": "background.js"
  },
  "action": {
    "default_popup": "popup.html",
    "default_icon": "icon.png"
  }
}

The activeTab and scripting permissions enable your content script to interact with the page. Host permissions are restricted to YouTube domains for security.

User Settings and API Key Management

Power users need to configure their own API keys. Implement a settings system:

// popup.js - Handle API key storage
document.getElementById('saveKey').addEventListener('click', () => {
  const apiKey = document.getElementById('apiKeyInput').value.trim();
  
  if (!apiKey) {
    alert('Please enter an API key');
    return;
  }

  chrome.storage.sync.set({ openaiApiKey: apiKey }, () => {
    document.getElementById('status').textContent = 'API key saved!';
    setTimeout(() => {
      document.getElementById('status').textContent = '';
    }, 2000);
  });
});

// Load saved key on startup
chrome.storage.sync.get(['openaiApiKey'], (result) => {
  if (result.openaiApiKey) {
    document.getElementById('apiKeyInput').value = result.openaiApiKey;
  }
});

Performance Optimization

Transcript processing can be slow for long videos. Optimize with these strategies:

Chunk long transcripts — Split transcripts over 8000 characters into smaller segments
Cache summaries — Store results in chrome.storage to avoid regenerating
Use loading states — Show progress indicators during API calls
Debounce requests — Prevent multiple simultaneous summarization attempts

// Chunk transcript for long videos
function chunkTranscript(transcript, maxLength = 6000) {
  const chunks = [];
  const sentences = transcript.split(/[.!?]+/);
  let currentChunk = '';

  sentences.forEach(sentence => {
    if ((currentChunk + sentence).length > maxLength) {
      chunks.push(currentChunk);
      currentChunk = sentence;
    } else {
      currentChunk += sentence + '.';
    }
  });

  if (currentChunk) chunks.push(currentChunk);
  return chunks;
}

Conclusion

Building an AI YouTube summary Chrome extension requires combining web scraping, browser extension APIs, and AI service integration. The core implementation follows a clear pattern: extract transcript data, send to an AI model, and display the result to users.

Start with basic transcript extraction, add AI integration, then refine the UI and performance. This incremental approach helps you debug each component before adding complexity.

The result is a powerful tool that helps developers and power users consume YouTube content more efficiently—scanning hours of content in minutes.

Built by theluckystrike — More at zovo.one