opencode-parser

An opencode plugin that parses documents, spreadsheets, slides, images, archives, and other common file formats into structured text the LLM can work with.

Install

{
  "plugin": ["opencode-parser"]
}

Or install via CLI: opencode plugin opencode-parser -g

Or copy src/ into .opencode/tools/ for a local zero-config setup.

| Format | Extensions | Extracted |
| --- | --- | --- |
| PDF | `.pdf` | Text, metadata, pages |
| Word | `.docx` | Text, tables, metadata |
| Excel | `.xlsx`, `.xls`, `.csv`, `.tsv` | Text, tables, sheet names |
| PowerPoint | `.pptx`, `.ppt` | Slide text, speaker notes |
| Images | `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.tiff` | OCR text (opt-in) |
| EPUB | `.epub` | Full text with heading structure |
| HTML | `.html`, `.htm` | Body text, headings |
| XML | `.xml` | Stripped text content |
| Markdown | `.md` | Raw text |
| Jupyter | `.ipynb` | Code, markdown, outputs |
| ZIP | `.zip` | File listing with sizes |
| Archives | `.rar`, `.7z`, `.tar`, `.gz` | Listing (extraction notes) |
| Plain text | `.txt`, `.json`, `.yaml`, `.toml`, `.ini` | Raw content |
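
The mapping above can be read as a lookup from file extension to format. The following is a minimal sketch of that idea, not the plugin's actual code: `FORMAT_BY_EXTENSION` and `detectFormat` are hypothetical names, and the real handlers may be organized differently.

```typescript
// Hypothetical lookup table built from the supported-formats list above.
// The plugin's real internals may differ.
const FORMAT_BY_EXTENSION: Record<string, string> = {
  ".pdf": "PDF",
  ".docx": "Word",
  ".xlsx": "Excel", ".xls": "Excel", ".csv": "Excel", ".tsv": "Excel",
  ".pptx": "PowerPoint", ".ppt": "PowerPoint",
  ".png": "Images", ".jpg": "Images", ".jpeg": "Images", ".webp": "Images",
  ".gif": "Images", ".bmp": "Images", ".tiff": "Images",
  ".epub": "EPUB",
  ".html": "HTML", ".htm": "HTML",
  ".xml": "XML",
  ".md": "Markdown",
  ".ipynb": "Jupyter",
  ".zip": "ZIP",
  ".rar": "Archives", ".7z": "Archives", ".tar": "Archives", ".gz": "Archives",
  ".txt": "Plain text", ".json": "Plain text", ".yaml": "Plain text",
  ".toml": "Plain text", ".ini": "Plain text",
};

// Returns the format name for a path, or undefined when the extension
// is missing or not in the table.
function detectFormat(filePath: string): string | undefined {
  // Take the last path segment, accepting both "/" and "\" separators.
  const name = filePath.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  // Only the final extension is considered, so "x.tar.gz" is looked up as ".gz".
  return FORMAT_BY_EXTENSION[name.slice(dot).toLowerCase()];
}
```

The lookup is case-insensitive, so `detectFormat("Report.PDF")` resolves to the same format as `report.pdf`.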

Usage

Reference a file with @ in your prompt and ask for it to be parsed:

Parse @report.pdf and give me a summary

Parse the spreadsheet at @data.xlsx, but only the first 3 sheets

Parse @report.pdf and save the full output

| Option | Default | Description |
| --- | --- | --- |
| `filePath` | — | Path to the file (required) |
| `maxChars` | 50000 | Maximum number of output characters. Pass `-1` to get the full, untruncated document. |
| `extractTables` | true | Extract tables from documents and spreadsheets |
| `extractImages` | false | Enable OCR for images |
| `ocrLang` | "eng" | OCR language for tesseract.js (e.g. "eng", "fra", "ara") |
| `maxPages` | varies | Limit the number of pages, slides, or sheets processed |
| `save` | false | Save the full parsed output (no truncation) as a `.md` file alongside the original |
| `outputPath` | — | Custom path for the Markdown export (used instead of the default `save` location) |
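
To make the options concrete, here is a hedged sketch of how they could be typed and applied. The option names and defaults come from the table above; `ParseOptions`, `withDefaults`, `applyMaxChars`, and `resolveExportPath` are hypothetical helpers, not the plugin's API. Two details are assumptions: that `outputPath` on its own triggers an export, and that the default export name is the original file name with `.md` appended.

```typescript
// Option names and defaults mirror the options table; everything else
// in this block is illustrative only.
interface ParseOptions {
  filePath: string;        // required
  maxChars?: number;       // -1 means unlimited
  extractTables?: boolean;
  extractImages?: boolean; // OCR is opt-in
  ocrLang?: string;
  maxPages?: number;       // default varies by format, so none is set here
  save?: boolean;
  outputPath?: string;
}

const DEFAULTS = {
  maxChars: 50000,
  extractTables: true,
  extractImages: false,
  ocrLang: "eng",
  save: false,
};

// Fill in documented defaults for any option the caller left out.
function withDefaults(options: ParseOptions) {
  return { ...DEFAULTS, ...options };
}

// Truncate output to maxChars. -1 is the documented "unlimited" value;
// this sketch treats any negative number the same way.
function applyMaxChars(text: string, maxChars: number): string {
  if (maxChars < 0 || text.length <= maxChars) return text;
  return text.slice(0, maxChars);
}

// Decide where the Markdown export goes, if anywhere.
function resolveExportPath(
  options: Pick<ParseOptions, "filePath" | "save" | "outputPath">,
): string | undefined {
  // Assumption: a custom outputPath is enough to request an export.
  if (options.outputPath) return options.outputPath;
  if (!options.save) return undefined;
  // Assumption: the README only says "alongside the original"; the exact
  // file name used by the plugin is not documented.
  return `${options.filePath}.md`;
}
```

With these helpers, `withDefaults({ filePath: "report.pdf" })` yields a `maxChars` of 50000, and passing `maxChars: -1` to `applyMaxChars` returns the text unchanged.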

All 15+ format handlers return the same output structure, so the LLM gets consistent results regardless of file type.
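
As an illustration of what a shared output structure buys you, the sketch below defines one result type and two toy handlers that both return it. The README does not document the actual fields, so `ParseResult`, `parseMarkdown`, `parseXml`, and `render` are all hypothetical.

```typescript
// Hypothetical result shape; the plugin's real field names are not documented here.
interface ParseResult {
  format: string;
  text: string;
  metadata: Record<string, string>;
}

// Every handler has the same signature, whatever the input format.
type Handler = (raw: string) => ParseResult;

// Markdown is passed through as raw text.
const parseMarkdown: Handler = (raw) => ({
  format: "Markdown",
  text: raw,
  metadata: {},
});

// XML is reduced to its text content: tags are replaced with spaces and
// runs of whitespace are collapsed. (A simplification, not a real XML parser.)
const parseXml: Handler = (raw) => ({
  format: "XML",
  text: raw.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
  metadata: {},
});

// Because the shape is uniform, one renderer serves every format.
function render(result: ParseResult): string {
  return `[${result.format}]\n${result.text}`;
}
```

Downstream code such as `render` never needs to branch on file type, which is the consistency the paragraph above describes.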

Development

npm install
npm run typecheck

License

MIT