opencode-parser
skill
Fail
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Fail
- exec() — Shell command execution in src/parsers/pptx.ts
Permissions Pass
- Permissions — No dangerous permissions requested
Parse any file in opencode. Supports PDF, DOCX, XLSX, PPTX, images, EPUB, HTML, Markdown, Jupyter, archives, and plain text.
opencode-parser
An opencode plugin that parses any file into structured text the LLM can work with.
https://github.com/user-attachments/assets/ed9d6ee7-d30b-43d5-83e0-4e09dafaa422
Install
{
"plugin": ["opencode-parser"]
}
Or install via CLI: opencode plugin opencode-parser -g
Or copy src/ into .opencode/tools/ for a local zero-config setup.
Supported formats
| Format | Extensions | Extracted |
|---|---|---|
| PDF | .pdf | Text, metadata, pages |
| Word | .docx | Text, tables, metadata |
| Excel | .xlsx, .xls, .csv, .tsv | Text, tables, sheet names |
| PowerPoint | .pptx, .ppt | Slide text, speaker notes |
| Images | .png, .jpg, .jpeg, .webp, .gif, .bmp, .tiff | OCR text (opt-in) |
| EPUB | .epub | Full text with heading structure |
| HTML | .html, .htm | Body text, headings |
| XML | .xml | Stripped text content |
| Markdown | .md | Raw text |
| Jupyter | .ipynb | Code, markdown, outputs |
| ZIP | .zip | File listing with sizes |
| Archives | .rar, .7z, .tar, .gz | Listing (extraction notes) |
| Plain text | .txt, .json, .yaml, .toml, .ini | Raw content |
Usage
Parse @report.pdf and give me a summary
parse the spreadsheet at @data.xlsx but only the first 3 sheets
parse @report.pdf and save the full output
Options
| Option | Default | Description |
|---|---|---|
| `filePath` | none | Path to the file (required) |
| `maxChars` | 50000 | Maximum number of output characters. Pass -1 for no limit (the full document). |
| `extractTables` | true | Extract tables from documents and spreadsheets |
| `extractImages` | false | Enable OCR for images |
| `ocrLang` | "eng" | OCR language for tesseract.js (e.g. "eng", "fra", "ara") |
| `maxPages` | varies | Limit the number of pages, slides, or sheets processed |
| `save` | false | Save the full parsed output (no truncation) as a .md file alongside the original |
| `outputPath` | none | Custom path for the Markdown export (overrides the default save path) |
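As a rough illustration of how these options and their defaults fit together, here is a minimal TypeScript sketch. The field names and default values come from the table above; the `ParseOptions` interface, the `ResolvedOptions` type, and the `withDefaults` helper are hypothetical names used only for this example and may not match the plugin's actual source.

```typescript
// Hypothetical option shape; field names mirror the options table,
// but the type and helper names are assumptions for illustration.
interface ParseOptions {
  filePath: string;        // required
  maxChars?: number;       // -1 means unlimited
  extractTables?: boolean;
  extractImages?: boolean; // OCR is opt-in
  ocrLang?: string;
  maxPages?: number;       // default varies by format, so left unset here
  save?: boolean;
  outputPath?: string;
}

type ResolvedOptions = ParseOptions & {
  maxChars: number;
  extractTables: boolean;
  extractImages: boolean;
  ocrLang: string;
  save: boolean;
};

// Fill in the documented defaults for any option the caller omitted.
function withDefaults(opts: ParseOptions): ResolvedOptions {
  return {
    filePath: opts.filePath,
    maxChars: opts.maxChars ?? 50000,
    extractTables: opts.extractTables ?? true,
    extractImages: opts.extractImages ?? false,
    ocrLang: opts.ocrLang ?? "eng",
    maxPages: opts.maxPages,
    save: opts.save ?? false,
    outputPath: opts.outputPath,
  };
}

const resolved = withDefaults({ filePath: "report.pdf", maxChars: -1 });
console.log(resolved.maxChars, resolved.ocrLang);
```

Using `??` instead of object spread keeps an explicitly passed `undefined` from overriding a default.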
How it works
- File is verified by magic bytes, not just extension
- Type detection dispatches to the right parser
- Metadata is extracted (author, pages, sheet count, etc.)
- Tables become readable markdown
- Large content is truncated gracefully with a note to the LLM
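The first and last steps above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `sniffType` and `truncate` are hypothetical helper names, the signature list covers only a few well-known formats, and the wording of the truncation note is an assumption.

```typescript
// Hypothetical helpers; only the byte signatures themselves are standard.
type Sniffed = "pdf" | "zip" | "png" | "jpeg" | "gif" | "unknown";

const SIGNATURES: Array<{ kind: Sniffed; bytes: number[] }> = [
  { kind: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] },  // "%PDF"
  { kind: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] },  // "PK\x03\x04" (also DOCX, XLSX, PPTX, EPUB containers)
  { kind: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { kind: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { kind: "gif", bytes: [0x47, 0x49, 0x46, 0x38] },  // "GIF8"
];

// Compare the first bytes of a file against known signatures.
function sniffType(head: Uint8Array): Sniffed {
  for (const sig of SIGNATURES) {
    const matches =
      head.length >= sig.bytes.length &&
      sig.bytes.every((b, i) => head[i] === b);
    if (matches) return sig.kind;
  }
  return "unknown";
}

// Cut text to maxChars and append a note so the reader knows content is missing.
// A negative maxChars means "no limit", matching the -1 convention in Options.
function truncate(text: string, maxChars: number): string {
  if (maxChars < 0 || text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return (
    text.slice(0, maxChars) +
    `\n\n[Truncated: ${omitted} characters omitted. Pass maxChars: -1 for the full document.]`
  );
}

console.log(sniffType(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d])));
console.log(truncate("abcdef", 3));
```

Because Office documents and EPUB files are ZIP containers, a signature match on `zip` would still need a look inside the archive (or at the extension) to pick the final parser.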
All 15+ format handlers return the same output structure, so the LLM gets consistent results regardless of file type.
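One way to picture a shared output structure is a single result type that every handler returns, with a lookup table doing the dispatch. The sketch below is an assumption about the general pattern, not the plugin's actual types: `ParseResult`, `handlers`, `parseByExtension`, and `toMarkdownTable` are hypothetical names, and the CSV splitting is deliberately naive (no quoted fields).

```typescript
// Hypothetical common result shape shared by all handlers.
interface ParseResult {
  format: string;
  text: string;
  metadata: Record<string, string | number>;
}

type Handler = (content: string) => ParseResult;

// Render rows as a markdown table, treating the first row as the header.
function toMarkdownTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const [header, ...body] = rows;
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  const divider = `|${header.map(() => "---").join("|")}|`;
  return [line(header), divider, ...body.map(line)].join("\n");
}

// Dispatch table keyed by lowercase extension (two formats shown for brevity).
const handlers: Record<string, Handler> = {
  ".csv": (content) => {
    // Naive split: assumes no quoted fields or embedded commas.
    const rows = content.trim().split("\n").map((r) => r.split(","));
    return { format: "csv", text: toMarkdownTable(rows), metadata: { rows: rows.length } };
  },
  ".txt": (content) => ({
    format: "text",
    text: content,
    metadata: { chars: content.length },
  }),
};

function parseByExtension(ext: string, content: string): ParseResult {
  const handler = handlers[ext.toLowerCase()];
  if (!handler) throw new Error(`Unsupported extension: ${ext}`);
  return handler(content);
}

console.log(parseByExtension(".csv", "a,b\n1,2").text);
```

Whatever the real field names are, keeping one return type means the caller can apply truncation and saving in one place instead of once per format.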
Development
npm install
npm run typecheck
License
MIT