markdown-for-agents
HTML to Markdown converter for AI agents. 90+% fewer tokens, one dependency, works everywhere.
markdown-for-agents
Runtime-agnostic HTML to Markdown converter built for AI agents. One dependency, works everywhere.
Convert any HTML page into clean, token-efficient Markdown — with built-in content extraction to strip away navigation, ads, and boilerplate. Inspired by Cloudflare's Markdown for Agents.
Try it in the playground — paste a URL or HTML and see the conversion live.

Audit any URL — no installation required:
npx @markdown-for-agents/audit https://docs.github.com/en/copilot/get-started/quickstart
HTML Markdown Savings
───────────────────────────────────────────────────
Tokens 138,550 9,364 -93.2%
Chars 554,200 37,456 -93.2%
Words 27,123 4,044
Size 541.3 KB 36.6 KB -93.2%
Features
- Runtime-agnostic — Node.js, Bun, Deno, Cloudflare Workers, Vercel Edge, browsers
- Content extraction — strip nav, footer, ads, sidebars, cookie banners automatically
- Framework middleware — drop-in support for Express, Fastify, Hono, Next.js, and any Web Standard server
- Content negotiation — respond with Markdown when clients send
Accept: text/markdown - Token estimation — built-in heuristic token counter for LLM cost planning, with support for custom tokenizers
- Plugin system — override or extend any element conversion with custom rules
- Single dependency — only htmlparser2 (no DOM required)
- ESM only — modern, tree-shakeable, with subpath exports
- Fully typed — written in TypeScript with complete type definitions
Install
# Core library
npm install markdown-for-agents
# Middleware (install only what you need)
npm install @markdown-for-agents/express
npm install @markdown-for-agents/fastify
npm install @markdown-for-agents/hono
npm install @markdown-for-agents/nextjs
npm install @markdown-for-agents/web
Python
Also available as a pure Python package with zero dependencies:
pip install markdown-for-agents
See the Python package docs for the full API, middleware (FastAPI, Flask, Django), and usage examples.
Quick Start
import { convert } from 'markdown-for-agents';
const html = `
<h1>Hello World</h1>
<p>This is a <strong>simple</strong> example.</p>
`;
const { markdown, tokenEstimate } = convert(html);
console.log(markdown);
// # Hello World
//
// This is a **simple** example.
console.log(tokenEstimate);
// { tokens: 12, characters: 46, words: 8 }
Content Extraction
Real-world HTML pages are full of navigation, ads, sidebars, and cookie banners. Enable extraction mode to get just the main content:
const { markdown } = convert(html, { extract: true });
This strips <nav>, <header>, <footer>, <aside>, <script>, <style>, ad-related elements, cookie banners, social widgets, and more. See the Content Extraction guide for full details.
Middleware
Serve Markdown automatically when AI agents request it via Accept: text/markdown. Each middleware is a separate package:
// Express
import { markdown } from '@markdown-for-agents/express';
app.use(markdown());
// Fastify
import { markdown } from '@markdown-for-agents/fastify';
fastify.register(markdown());
// Hono
import { markdown } from '@markdown-for-agents/hono';
app.use(markdown());
// Next.js (auto-unwraps /_next/image URLs)
import { withMarkdown } from '@markdown-for-agents/nextjs';
export default withMarkdown(handler);
// Any Web Standard server (Cloudflare Workers, Deno, Bun)
import { markdownMiddleware } from '@markdown-for-agents/web';
const mw = markdownMiddleware();
The middleware inspects the Accept header. Normal browser requests pass through untouched. When an AI agent sends Accept: text/markdown, the HTML response is automatically converted. See the Middleware guide for full details and the
Next.js example for a complete working app.
Caching
The middleware automatically sets headers to support proper HTTP caching:
Vary: Accept— ensures CDNs and proxies cache HTML and Markdown responses separately, preventing an AI agent from receiving a cached HTML response (or vice versa).ETag— a content hash of the Markdown output, enabling conditional requests viaIf-None-Match. CDNs can serve304 Not Modifiedwithout hitting your origin server.
For production deployments, add Cache-Control at your infrastructure layer to control how long responses are cached:
// Example: cache Markdown responses for 1 hour at the CDN
app.use((req, res, next) => {
if (req.headers.accept?.includes('text/markdown')) {
res.setHeader('cache-control', 'public, max-age=3600');
}
next();
});
app.use(markdown());
The contentHash is also available on the core convert() result for custom caching strategies:
const { markdown, contentHash } = convert(html);
// contentHash: "2f-1a3b4c5" — use as a cache key or ETag
Custom Rules
Override how any element is converted, or add support for custom elements:
import { convert, createRule } from 'markdown-for-agents';
const { markdown } = convert(html, {
rules: [
createRule(
node => node.name === 'div' && node.attribs.class?.includes('callout'),
({ convertChildren, node }) => `\n\n> **Note:** ${convertChildren(node).trim()}\n\n`
)
]
});
Custom rules have higher priority than defaults and are applied first. See the Custom Rules guide for the full API.
Options
All options are optional. Defaults are shown below:
convert(html, {
// Content extraction
extract: false, // true | ExtractOptions
// Custom conversion rules
rules: [], // Rule[]
// Base URL for resolving relative links and images
baseUrl: '', // "https://example.com"
// Heading style
headingStyle: 'atx', // "atx" (#) or "setext" (underline)
// Bullet character for unordered lists
bulletChar: '-', // "-", "*", or "+"
// Code block style
codeBlockStyle: 'fenced', // "fenced" or "indented"
// Fence character
fenceChar: '`', // "`" or "~"
// Strong delimiter
strongDelimiter: '**', // "**" or "__"
// Emphasis delimiter
emDelimiter: '*', // "*" or "_"
// Link style
linkStyle: 'inlined', // "inlined" or "referenced"
// Remove duplicate content blocks
deduplicate: false, // true | DeduplicateOptions
// Custom token counter (replaces built-in heuristic)
tokenCounter: undefined, // (text: string) => TokenEstimate
// Performance timing (populates convertDuration in result)
serverTiming: false // true to measure conversion duration
});
Server Timing
Enable serverTiming to measure conversion duration. The result includes convertDuration (in milliseconds), and middleware adapters set both a standard Server-Timing header and an x-markdown-timing header with the same value:
const { markdown, convertDuration } = convert(html, { serverTiming: true });
console.log(`Conversion took ${convertDuration}ms`);
// Middleware sets:
// Server-Timing: mfa.convert;dur=4.7;desc="HTML to Markdown"
// x-markdown-timing: mfa.convert;dur=4.7;desc="HTML to Markdown"
The x-markdown-timing header carries the same timing data as Server-Timing but survives CDN caching. Some CDNs strip the standard Server-Timing header from cached responses because the values are tied to a specific execution. The custom header preserves the timing from the
original render so it remains observable after caching.
The Next.js middleware additionally includes mfa.fetch duration for the proxy self-fetch. Both headers surface in browser devtools and are useful for production performance monitoring.
Custom Token Counter
By default, token estimation uses a fast heuristic (~4 characters per token). You can replace it with an exact tokenizer:
import { convert } from 'markdown-for-agents';
import { encoding_for_model } from 'tiktoken';
const enc = encoding_for_model('gpt-4o');
const { markdown, tokenEstimate } = convert(html, {
tokenCounter: text => ({
tokens: enc.encode(text).length,
characters: text.length,
words: text.split(/\s+/).filter(Boolean).length
})
});
The custom counter receives the final markdown string and must return a TokenEstimate object with tokens, characters, and words fields. It flows through to middleware as well — the x-markdown-tokens header will reflect your counter's value.
Deduplication Options
Pass deduplicate: true to use defaults, or pass a DeduplicateOptions object to customize behavior:
const { markdown } = convert(html, {
deduplicate: { minLength: 5 } // catch short repeated phrases like "Read more"
});
The minLength option (default: 10) controls the minimum block length eligible for deduplication. Blocks shorter than this are always kept. Lower it to catch short repeated phrases, raise it for more conservative deduplication.
Supported Elements
Block
| HTML | Markdown |
|---|---|
<h1>...<h6> |
# Heading (atx) or underline (setext) |
<p> |
Paragraph with blank lines |
<blockquote> |
> Quoted text |
<pre><code> |
Fenced code block with language |
<hr> |
--- |
<br> |
Trailing double-space line break |
<ul>, <ol>, <li> |
Lists with nesting and indentation |
<table> |
GFM pipe table with separator row |
<script>, <style>, <noscript>, <template> |
Stripped |
Inline
| HTML | Markdown |
|---|---|
<strong>, <b> |
**bold** |
<em>, <i> |
*italic* |
<del>, <s>, <strike> |
~~strikethrough~~ |
<code> |
`inline code` |
<a> |
[text](url) with title and baseUrl support |
<img> |
 with title and baseUrl support |
<sub> |
~subscript~ |
<sup> |
^superscript^ |
<abbr>, <mark> |
Pass-through (text preserved) |
Packages
TypeScript
Monorepo managed with pnpm workspaces:
| Package | Description |
|---|---|
markdown-for-agents |
Core HTML-to-Markdown converter |
@markdown-for-agents/audit |
CLI & library to audit token/byte savings |
@markdown-for-agents/express |
Express middleware |
@markdown-for-agents/fastify |
Fastify plugin |
@markdown-for-agents/hono |
Hono middleware |
@markdown-for-agents/nextjs |
Next.js middleware (with /_next/image URL unwrapping) |
@markdown-for-agents/web |
Web Standard middleware |
Python
| Package | Description |
|---|---|
markdown-for-agents |
Core converter - zero dependencies, FastAPI/Flask/Django middleware |
Subpath Exports
The core package provides fine-grained imports for tree-shaking:
import { convert } from 'markdown-for-agents';
import { extractContent } from 'markdown-for-agents/extract';
import { estimateTokens } from 'markdown-for-agents/tokens';
Documentation
View the full documentation | Playground
- Getting Started — installation, first conversion, common patterns
- Content Extraction — stripping non-content elements from web pages
- Middleware — Express, Fastify, Hono, Next.js, and Web Standard middleware
- Custom Rules — the rule system, priorities, and writing plugins
- API Reference — complete API documentation with all types
- Architecture — how the library works internally
- Contributing — development setup, testing, and contributing guidelines
Runtime Compatibility
| Runtime | Version | Status |
|---|---|---|
| Node.js | >= 22 | Tested |
| Bun | >= 1.0 | Tested |
| Deno | >= 2.0 | Tested |
| Cloudflare Workers | - | Compatible |
| Vercel Edge | - | Compatible |
| Browsers | ES2022+ | Compatible |
License
MIT
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi