Chrome Extension Key Points Extractor: Building a Content Summarization Tool

Chrome extensions that extract key points from web content have become essential tools for researchers, developers, and anyone who consumes large amounts of information. Building one requires understanding the Chrome extension architecture, content extraction techniques, and integration with AI or rule-based summarization services. This guide walks you through creating a functional key points extractor extension from scratch.

Understanding the Core Architecture

A key points extractor extension operates through a pipeline: content script injection, DOM parsing, text extraction, processing through a summarization engine, and display through a popup or side panel. The architecture must balance performance with accuracy, ensuring the extension doesn’t slow down page loads or consume excessive memory.

The manifest file defines the extension’s capabilities and permissions. For a key points extractor, you need activeTab permission for accessing page content, scripting permission for content script injection, and storage permission for saving extracted data:

{
  "manifest_version": 3,
  "name": "Key Points Extractor",
  "version": "1.0",
  "permissions": ["activeTab", "storage", "scripting"],
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [{
    "matches": ["<all_urls>"],
    "js": ["content.js"]
  }]
}

The content script runs in the context of the target page, extracting visible text while avoiding navigation elements, ads, and other non-content areas. This selective extraction improves the quality of the extracted key points by removing noise from the original content.

Content Extraction Strategies

Effective content extraction requires distinguishing between main article content and page clutter. The Document Object Model provides multiple approaches: semantic element targeting, text density analysis, and machine learning classifiers. For most implementations, a combination of semantic targeting and text density works well.

Target common article containers first:

// content.js
function extractPageContent() {
  const selectors = [
    'article',
    '[role="main"]',
    '.post-content',
    '.article-body',
    '.entry-content',
    'main'
  ];
  
  for (const selector of selectors) {
    const element = document.querySelector(selector);
    if (element && element.textContent.length > 500) {
      return cleanText(element.textContent);
    }
  }
  
  // Fallback: find largest text block
  return extractLargestTextBlock();
}

function cleanText(text) {
  return text
    .replace(/\s+/g, ' ')
    .replace(/[\n\r]+/g, '\n')
    .trim();
}

The fallback mechanism ensures the extension works even on pages without standard semantic markup, such as single-page applications or non-standard layouts.

Building the Summarization Engine

The summarization engine transforms extracted text into key points. Two primary approaches exist: extractive methods that identify and rank important sentences, and abstractive methods that generate new summaries using AI models.

For a pure JavaScript implementation without external API dependencies, implement a TF-IDF based extraction:

// summarizer.js
function extractKeyPoints(text, maxPoints = 5) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 20);
  
  // Calculate word frequencies
  const words = text.toLowerCase().split(/\W+/);
  const stopWords = new Set(['the', 'a', 'an', 'and', 'or', 'but', 'is', 'are', 'was', 'were', 'of', 'in', 'on', 'at', 'to', 'for']);
  const frequencies = {};
  
  words.forEach(word => {
    if (!stopWords.has(word) && word.length > 3) {
      frequencies[word] = (frequencies[word] || 0) + 1;
    }
  });
  
  // Score each sentence
  const scoredSentences = sentences.map(sentence => {
    const sentenceWords = sentence.toLowerCase().split(/\W+/);
    const score = sentenceWords.reduce((sum, word) => 
      sum + (frequencies[word] || 0), 0);
    return { sentence: sentence.trim(), score };
  });
  
  // Return top sentences as key points
  return scoredSentences
    .sort((a, b) => b.score - a.score)
    .slice(0, maxPoints)
    .map(s => s.sentence);
}

This approach works offline and requires no API keys, making it suitable for basic use cases. For higher quality summaries, integrate with an AI API like Anthropic or OpenAI.

Integrating AI Summarization

For production-quality key points, connect to a language model API. The extension sends extracted content to the API and receives generated summaries. Implement proper error handling and rate limiting:

// ai-summarizer.js
async function getAIKeyPoints(text, apiKey) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify({
      model: 'claude-3-haiku-20240307',
      max_tokens: 1024,
      messages: [{
        role: 'user',
        content: `Extract 5-7 key points from this content. Format each point as a concise bullet:\n\n${text.slice(0, 8000)}`
      }]
    })
  });
  
  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }
  
  const data = await response.json();
  return data.content[0].text.split('\n').filter(line => line.trim());
}

Store API keys securely using chrome.storage with the protected storage API, ensuring credentials never appear in plain text or version control.

User Interface Design

The popup interface displays extracted key points with options to copy, save, or regenerate. Design for clarity and quick access:

<!-- popup.html -->
<!DOCTYPE html>
<html>
<head>
  <style>
    body { width: 320px; font-family: system-ui, sans-serif; padding: 16px; }
    h3 { margin: 0 0 12px; font-size: 14px; color: #333; }
    ul { list-style: none; padding: 0; margin: 0; }
    li { padding: 8px 0; border-bottom: 1px solid #eee; font-size: 13px; line-height: 1.4; }
    button { margin-top: 12px; padding: 8px 16px; background: #0066cc; color: white; border: none; border-radius: 4px; cursor: pointer; }
    button:hover { background: #0055aa; }
    .loading { color: #666; font-style: italic; }
  </style>
</head>
<body>
  <h3>Key Points</h3>
  <ul id="points-list"></ul>
  <button id="extract-btn">Extract Key Points</button>
  <script src="popup.js"></script>
</body>
</html>

The popup script communicates with the content script to trigger extraction and display results:

// popup.js
document.getElementById('extract-btn').addEventListener('click', async () => {
  const list = document.getElementById('points-list');
  list.innerHTML = '<li class="loading">Extracting key points...</li>';
  
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  
  chrome.tabs.sendMessage(tab.id, { action: 'extract' }, async (response) => {
    if (chrome.runtime.lastError) {
      list.innerHTML = '<li>Unable to extract from this page</li>';
      return;
    }
    
    const keyPoints = await getAIKeyPoints(response.content);
    list.innerHTML = keyPoints.map(point => `<li>${point}</li>`).join('');
  });
});

Performance Optimization

Content extraction must complete quickly to avoid blocking the page. Use requestIdleCallback for non-critical processing, implement caching to avoid re-extracting content, and consider using web workers for heavy computation:

// Use requestIdleCallback for non-blocking extraction
function extractContentWhenIdle() {
  return new Promise((resolve) => {
    requestIdleCallback(() => {
      const content = extractPageContent();
      resolve(content);
    }, { timeout: 2000 });
  });
}

Implement a simple cache using chrome.storage to store previously extracted content, keyed by URL:

async function getCachedContent(url) {
  const result = await chrome.storage.local.get(url);
  return result[url] || null;
}

async function cacheContent(url, content) {
  await chrome.storage.local.set({ [url]: content });
}

Extension Testing and Debugging

Chrome provides excellent debugging tools for extension development. Load your extension in developer mode through chrome://extensions, use the service worker console for background script debugging, and inspect popup and content script contexts individually.

Test with various page types: news articles, blog posts, documentation pages, and single-page applications. Each has different structural patterns requiring robust extraction logic.

Build by theluckystrike — More at zovo.one