Turndown HTML to Markdown in Chrome Extensions: Complete Implementation Guide
Converting HTML content to Markdown is a common requirement for Chrome extension developers. Whether you’re building a web clipping tool, a content archival extension, or a note-taking app that needs to preserve formatted text, having reliable HTML to Markdown conversion is essential. Turndown, a powerful JavaScript library specifically designed for this purpose, has become the go-to solution for developers building Chrome extensions that need to transform web content into clean, portable Markdown format.
This comprehensive guide walks you through implementing Turndown in your Chrome extension from scratch. We’ll cover everything from initial setup to advanced customization options, ensuring you can build a robust content converter extension that handles even the most complex HTML structures.
Understanding Turndown and Its Role in Chrome Extensions
Turndown is a JavaScript library that converts HTML markup into Markdown. Developed by DOM manipulation expert Dom Christie, Turndown has gained widespread adoption in the web development community for its reliability and flexibility. Unlike simple regex-based converters, Turndown uses a proper HTML parser and maintains a sophisticated rule-based system for handling different HTML elements.
When building Chrome extensions, you often need to capture web page content and transform it for various purposes. Perhaps you’re building an extension that saves articles for offline reading, creates notes from web content, or exports web data in a portable format. In all these scenarios, Turndown proves invaluable by preserving the semantic structure of the original HTML while converting it to the lightweight Markdown syntax.
The library handles virtually every HTML element you might encounter, from simple paragraphs and headings to complex structures like tables, code blocks, and nested lists. It also provides extensive customization options, allowing you to control how specific elements convert, add custom rules for proprietary HTML, and fine-tune the output formatting to match your requirements.
Why Turndown Is the Best Choice for Chrome Extensions
Several factors make Turndown the optimal choice for Chrome extension development. First, it’s designed to work seamlessly in browser environments, with no Node.js dependencies that could complicate your extension’s architecture. The library runs entirely in the client’s browser, aligning perfectly with Chrome Extensions’ client-side execution model.
Turndown’s size is remarkably small despite its capabilities. The minified bundle weighs only around 15KB, making it lightweight enough for extensions where every kilobyte matters. This efficiency doesn’t come at the cost of functionality—the library handles edge cases and complex HTML structures with remarkable robustness.
The extensibility of Turndown deserves special mention. You can add custom rules to handle specific HTML elements, override default behavior for existing elements, and even create plugins that modify the conversion process. This flexibility proves invaluable when building specialized extensions that need to handle unique content types or follow specific formatting conventions.
Setting Up Turndown in Your Chrome Extension Project
Before implementing Turndown, you need to set up your Chrome extension project and install the necessary dependencies. Assuming you have a basic Chrome extension structure in place, the first step involves adding Turndown to your project.
If you’re using a build tool like Webpack or Rollup, you can install Turndown via npm:
npm install turndown
For simpler projects or those without a build system, you can include Turndown directly via a CDN in your extension’s HTML files:
<script src="https://unpkg.com/turndown/dist/turndown.js"></script>
However, for production extensions, bundling Turndown with your code is the recommended approach. This ensures your extension works offline and avoids dependency on external resources that might be blocked or unavailable.
Configuring Your Extension’s Manifest
When using Turndown in a Chrome extension, you need to ensure proper configuration in your manifest file. For extensions using content scripts that convert page content, you’ll need to specify the appropriate permissions. Here’s an example manifest configuration:
{
"manifest_version": 3,
"name": "HTML to Markdown Converter",
"version": "1.0",
"description": "Convert web page HTML to Markdown format",
"permissions": [
"activeTab",
"scripting"
],
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["content.js"]
}
]
}
Remember that with Manifest V3, content scripts run in an isolated world, meaning you’ll need to bundle Turndown directly with your content script or use a background service worker to handle the conversion. For most use cases, bundling Turndown with your content script provides the simplest architecture.
Implementing Basic HTML to Markdown Conversion
With Turndown installed, implementing basic HTML to Markdown conversion is straightforward. The core functionality revolves around creating a TurndownService instance and calling its turndown method with the HTML content you want to convert.
Here’s a basic implementation for a Chrome extension content script:
// content.js
import TurndownService from 'turndown';
const turndownService = new TurndownService();
// Convert a specific element's HTML to Markdown
function convertElementToMarkdown(element) {
const html = element.innerHTML;
const markdown = turndownService.turndown(html);
return markdown;
}
// Example: Convert the main content of a page
function convertPageContent() {
const mainContent = document.querySelector('article') || document.body;
const markdown = turndownService.turndown(mainContent);
console.log(markdown);
return markdown;
}
This simple implementation handles the core conversion task. When you pass HTML to the turndown method, it returns the equivalent Markdown string. The service automatically handles common HTML elements, converting headings to Markdown headers, links to Markdown links, images to Markdown images, and so on.
Handling Different HTML Sources
Chrome extensions often need to convert HTML from various sources, each with its own characteristics and challenges. Let’s explore how to handle the most common scenarios.
Converting Selected Content: When users select specific content on a page, you can capture the HTML of the selection and convert it:
function convertSelection() {
const selection = window.getSelection();
if (selection.rangeCount === 0) return null;
const range = selection.getRangeAt(0);
const container = document.createElement('div');
container.appendChild(range.cloneContents());
return turndownService.turndown(container.innerHTML);
}
Converting Entire Pages: For full-page conversions, you might want to exclude certain elements like navigation, footers, or advertising:
function convertFullPage() {
// Clone the document to avoid modifying the original
const clone = document.documentElement.cloneNode(true);
// Remove unwanted elements
const unwantedSelectors = [
'nav', 'footer', 'aside', '.advertisement',
'.sidebar', '#comments', '.nav', '.menu'
];
unwantedSelectors.forEach(selector => {
clone.querySelectorAll(selector).forEach(el => el.remove());
});
return turndownService.turndown(clone.innerHTML);
}
Converting Dynamic Content: For Single Page Applications and dynamically loaded content, you might need to wait for the content to load:
function convertAfterContentLoaded() {
return new Promise((resolve) => {
// Wait for network to be idle
if (document.readyState === 'complete') {
setTimeout(() => {
resolve(turndownService.turndown(document.body.innerHTML));
}, 2000);
} else {
window.addEventListener('load', () => {
setTimeout(() => {
resolve(turndownService.turndown(document.body.innerHTML));
}, 2000);
});
}
});
}
Advanced Turndown Configuration and Customization
While the default Turndown settings work well for most use cases, understanding the configuration options allows you to tailor the output precisely to your needs. The TurndownService constructor accepts an options object that controls various aspects of the conversion process.
Configuring Heading Style
Turndown can generate headings in either ATX or SETEXT style. ATX-style headings use hash symbols (# through ######), while SETEXT style uses underline equals signs (for h1) or dashes (for h2). ATX is the default and most widely supported:
const turndownService = new TurndownService({
headingStyle: 'atx' // or 'setext'
});
Managing Code Blocks
Code block handling is particularly important for technical content. Turndown supports both indented code blocks (the GitHub Flavored Markdown style) and fenced code blocks:
const turndownService = new TurndownService({
codeBlockStyle: 'fenced', // or 'indented'
preformattedCode: true // preserves whitespace in code blocks
});
The fenced code block style uses triple backticks, which provides better compatibility with various Markdown processors and syntax highlighters.
Customizing Link and Image Handling
You can control how links and images are processed during conversion. This is particularly useful when you need to handle relative URLs or process images differently:
const turndownService = new TurndownService({
linkStyle: 'inlined', // or 'referenced'
linkReferenceStyle: 'full', // or 'collapsed' or 'shortcut'
// Custom image handler
imageHandler: function(legacy) {
return function(node) {
const src = node.getAttribute('src');
const alt = node.getAttribute('alt') || '';
return ``;
};
}
});
Bullet List and Emphasis Options
Turndown provides fine-grained control over list formatting and text emphasis:
const turndownService = new TurndownService({
bulletListMarker: '-', // or '*' or '+'
emDelimiter: '*', // or '_'
strongDelimiter: '**', // or '__'
hrStyle: '---' // or '***' or '___'
});
These options allow you to match the output formatting with your preferred Markdown flavor or existing content conventions.
Creating Custom Turndown Rules
One of Turndown’s most powerful features is its rule-based system for handling HTML elements. You can add custom rules to handle proprietary HTML elements, modify the conversion of existing elements, or add special processing for specific content types.
Adding Custom Element Rules
Suppose your extension needs to handle a custom element or modify how a standard element converts. You can add custom rules using the addRule method:
// Custom rule for blockquotes with citations
turndownService.addRule('blockquoteCitation', {
filter: 'blockquote',
replacement: function(content, node) {
const cite = node.querySelector('cite');
const citation = cite ? ` — ${cite.textContent}` : '';
const innerContent = content.replace(/\n+$/, '') + citation;
return `> ${innerContent}\n\n`;
}
});
// Custom rule for callout boxes
turndownService.addRule('callout', {
filter: function(node) {
return node.classList && node.classList.contains('callout');
},
replacement: function(content, node) {
const type = node.dataset.type || 'note';
return `!!! ${type} "${type.toUpperCase()}"\n${content}\n\n`;
}
});
Handling Tables with Greater Control
Tables are among the more complex HTML structures to convert. Turndown provides basic table support, but you can enhance it with custom rules for better formatting:
turndownService.addRule('table', {
filter: 'table',
replacement: function(content, node) {
const rows = [];
const cells = node.querySelectorAll('th, td');
cells.forEach((cell, index) => {
const row = Math.floor(index / (cell.parentElement.children.length));
if (!rows[row]) rows[row] = [];
rows[row].push(cell.textContent.trim());
});
if (rows.length === 0) return '';
// Calculate column widths
const colWidths = [];
rows.forEach(row => {
row.forEach((cell, i) => {
colWidths[i] = Math.max(colWidths[i] || 0, cell.length);
});
});
// Build markdown table
let markdown = '';
rows.forEach((row, rowIndex) => {
const formattedRow = row.map((cell, i) =>
cell.padEnd(colWidths[i])
).join(' | ');
markdown += `| ${formattedRow} |\n`;
// Add header separator after first row
if (rowIndex === 0) {
markdown += `| ${colWidths.map(w => '-'.repeat(w)).join(' | ')} |\n`;
}
});
return markdown + '\n';
}
});
Processing Links for Offline Use
When building an extension that saves content for offline reading, you might want to process links differently:
turndownService.addRule('absoluteLinks', {
filter: 'a',
replacement: function(content, node) {
const href = node.getAttribute('href');
const title = node.getAttribute('title');
if (!href) return content;
// Check if it's an absolute URL
if (href.startsWith('http://') || href.startsWith('https://')) {
const titlePart = title ? ` "${title}"` : '';
return `[${content}](${href}${titlePart})`;
}
// Keep relative links as-is
return `[${content}](${href})`;
}
});
Building a Complete Content Converter Extension
Now that you understand Turndown’s capabilities let’s put everything together into a functional Chrome extension. We’ll create an extension that allows users to select content on any page and convert it to Markdown.
Project Structure
markdown-converter/
├── manifest.json
├── popup.html
├── popup.js
├── content.js
├── background.js
├── styles.css
└── lib/
└── turndown.js
The Manifest File
{
"manifest_version": 3,
"name": "HTML to Markdown Converter",
"version": "1.0",
"description": "Convert web page HTML to Markdown using Turndown",
"permissions": [
"activeTab",
"scripting"
],
"action": {
"default_popup": "popup.html",
"default_icon": "icon.png"
},
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["lib/turndown.js", "content.js"],
"run_at": "document_idle"
}
]
}
Content Script Implementation
// content.js
// Initialize Turndown service with custom configuration
const turndownService = new TurndownService({
headingStyle: 'atx',
codeBlockStyle: 'fenced',
bulletListMarker: '-',
emDelimiter: '*',
strongDelimiter: '**',
linkStyle: 'inlined'
});
// Add custom rules
turndownService.addRule('callout', {
filter: function(node) {
return node.classList && node.classList.contains('callout');
},
replacement: function(content) {
return `> **Note:** ${content}\n\n`;
}
});
// Listen for messages from popup or background
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
if (request.action === 'convertSelection') {
const selection = window.getSelection();
if (selection.rangeCount > 0) {
const range = selection.getRangeAt(0);
const container = document.createElement('div');
container.appendChild(range.cloneContents());
const markdown = turndownService.turndown(container.innerHTML);
sendResponse({ success: true, markdown: markdown });
} else {
sendResponse({ success: false, error: 'No selection' });
}
} else if (request.action === 'convertElement') {
const element = document.querySelector(request.selector);
if (element) {
const markdown = turndownService.turndown(element.innerHTML);
sendResponse({ success: true, markdown: markdown });
} else {
sendResponse({ success: false, error: 'Element not found' });
}
}
return true;
});
Popup Interface
<!-- popup.html -->
<!DOCTYPE html>
<html>
<head>
<title>Markdown Converter</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<div class="container">
<h2>HTML to Markdown</h2>
<div class="options">
<label>
<input type="checkbox" id="convertSelection" checked>
Convert selection only
</label>
</div>
<button id="convertBtn">Convert</button>
<div id="result" class="result">
<textarea id="output" placeholder="Markdown output will appear here..."></textarea>
<button id="copyBtn">Copy to Clipboard</button>
</div>
</div>
<script src="popup.js"></script>
</body>
</html>
Popup Script
// popup.js
document.addEventListener('DOMContentLoaded', () => {
const convertBtn = document.getElementById('convertBtn');
const copyBtn = document.getElementById('copyBtn');
const output = document.getElementById('output');
const convertSelection = document.getElementById('convertSelection');
convertBtn.addEventListener('click', async () => {
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
const action = convertSelection.checked ? 'convertSelection' : 'convertElement';
chrome.tabs.sendMessage(tab.id, { action: action }, (response) => {
if (response && response.success) {
output.value = response.markdown;
} else {
output.value = `Error: ${response?.error || 'Unknown error'}`;
}
});
});
copyBtn.addEventListener('click', () => {
output.select();
document.execCommand('copy');
copyBtn.textContent = 'Copied!';
setTimeout(() => {
copyBtn.textContent = 'Copy to Clipboard';
}, 2000);
});
});
Best Practices and Performance Optimization
When implementing Turndown in production Chrome extensions, several best practices ensure optimal performance and user experience.
Efficient DOM Manipulation
Avoid unnecessary DOM operations during conversion. Instead of converting entire pages, target specific content areas when possible. Use efficient selectors and consider the page structure when designing your conversion logic:
// Good: Target specific content areas
const articleContent = document.querySelector('article .content');
if (articleContent) {
markdown = turndownService.turndown(articleContent.innerHTML);
}
// Avoid: Converting entire page unnecessarily
// markdown = turndownService.turndown(document.documentElement.innerHTML);
Handling Large Content
For pages with extensive content, consider processing in chunks to avoid blocking the main thread:
async function convertLargeContent(html) {
const chunkSize = 50000;
let markdown = '';
for (let i = 0; i < html.length; i += chunkSize) {
const chunk = html.slice(i, i + chunkSize);
markdown += await new Promise(resolve => {
setTimeout(() => {
resolve(turndownService.turndown(chunk));
}, 0);
});
}
return markdown;
}
Error Handling and Fallbacks
Always implement proper error handling to ensure your extension remains functional even with problematic HTML:
function safeConvert(html) {
try {
// Sanitize input before conversion
const sanitized = html.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
return turndownService.turndown(sanitized);
} catch (error) {
console.error('Conversion error:', error);
// Return plain text fallback
const tempDiv = document.createElement('div');
tempDiv.innerHTML = html;
return tempDiv.textContent || tempDiv.innerText || '';
}
}
Conclusion
Implementing Turndown in your Chrome extension provides powerful HTML to Markdown conversion capabilities that can enhance a wide variety of extension types. From content clipping tools to note-taking applications, Turndown’s flexibility and reliability make it an excellent choice for any project requiring Markdown output.
Remember to consider your specific use case when configuring Turndown. The default settings work well for general purposes, but customizing rules and options ensures the output matches your requirements. Test your implementation with various HTML structures to verify that the conversion handles edge cases properly.
With the knowledge from this guide, you’re well-equipped to build robust content converter extensions that provide valuable functionality to users. Turn continue exploring Turndown’s documentation for additional customization options, and don’t hesitate to experiment with custom rules tailored to your specific needs.
The combination of Turndown’s powerful conversion engine and Chrome Extensions’ reach opens up numerous possibilities for building tools that help users capture, convert, and preserve web content in the convenient Markdown format.