# ClipForge PAKT

Lossless-first prompt compression for JSON, YAML, CSV, and Markdown. Library, CLI, MCP server, desktop app, and browser extension. Structured payloads often see 30-50% token savings across the core lossless layers (L1-L3).
Stop paying for syntax. Every token should carry meaning.
## What is PAKT?
PAKT (Pipe-Aligned Kompact Text) is a lossless-first compression format that converts JSON, YAML, CSV, and mixed markdown content into a compact pipe-delimited syntax optimized for LLM token efficiency. Structured payloads often see 30-50% token savings, with higher gains on repetitive and tabular data, while preserving data fidelity across core lossless layers L1-L3. L4 is separately opt-in, budgeted, and lossy.
LLMs charge by the token. Structured data wastes tokens on syntax: braces, quotes, repeated keys, whitespace. PAKT eliminates the waste.
## About ClipForge
ClipForge is the product suite built around PAKT. In this repository, that means:
- @sriinnu/pakt -- The core library, CLI, and MCP server. This is the stable release surface for Node.js and TypeScript projects, plus agent hosts that need stdio tools for compress, auto, and inspect.
- ClipForge Playground -- A lightweight local web UI for trying JSON, YAML, CSV, and mixed markdown compression before wiring PAKT into a real workflow. It is a browser lab, not a release integration. Hosted playground: pakt-4f9.pages.dev.
- ClipForge Desktop -- A Tauri desktop shell for clipboard compression workflows. The current release validation is macOS menu bar first; Windows and Linux tray targets exist in source but are not part of the validated release path yet.
- ClipForge Browser Extension (experimental) -- A Chrome extension with a popup, context-menu actions, and input helpers for supported web LLM UIs such as ChatGPT, Claude, and Gemini. Site coverage is intentionally limited today.
The goal is simple: every token you send to an LLM should carry meaning, not syntax.
For agent workflows, the MCP server is the integration bridge. pakt serve --stdio exposes pakt_compress, pakt_auto, and pakt_inspect through the standard MCP transport, so stdio-based MCP clients can call the same toolset without custom protocol glue. The generic stdio path is verified in-repo; named hosts like Claude Desktop and Cursor are integration targets rather than a certification matrix. pakt_inspect is the recommended first call when deciding whether compression is worth it.
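For a stdio-based MCP client, wiring up the server is typically a one-line command entry. The snippet below is a hypothetical client configuration in the common `mcpServers` shape used by several MCP hosts; the exact file name and schema depend on your client, so treat this as a starting point rather than a verified recipe:

```json
{
  "mcpServers": {
    "pakt": {
      "command": "npx",
      "args": ["@sriinnu/pakt", "serve", "--stdio"]
    }
  }
}
```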
The app surfaces now align on shared layer profiles: Structure only (L1), Standard (L1+L2), Tokenizer-aware (L1+L2+L3), and opt-in Semantic (L1+L2+L3+L4). Semantic mode requires a positive semanticBudget and is explicitly lossy.
```text
JSON (28 tokens)                PAKT (15 tokens)
------------------------------  --------------------------
{                               @from json
  "users": [                    @dict
    { "name": "Alice",          $a: dev
      "role": "dev" },          @end
    { "name": "Bob",
      "role": "dev" }           users [2]{name|role}:
  ]                             Alice|$a
}                               Bob|$a
```
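The tabular part of this transformation can be sketched in standalone TypeScript. This is a simplified illustration of the Layer-1 idea (uniform object arrays become one header row plus pipe-delimited value rows), not the library's actual encoder; it skips nesting, escaping, and the dictionary layer:

```typescript
// Toy Layer-1 table encoding: a uniform array of objects becomes a
// header row plus pipe-delimited value rows, and decodes back losslessly.
type Row = Record<string, string>;

function encodeTable(name: string, rows: Row[]): string {
  const keys = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${keys.join("|")}}:`;
  const body = rows.map((r) => keys.map((k) => r[k]).join("|"));
  return [header, ...body].join("\n");
}

function decodeTable(text: string): { name: string; rows: Row[] } {
  const [header, ...lines] = text.split("\n");
  const m = header.match(/^(\w+)\[(\d+)\]\{([^}]+)\}:$/);
  if (!m) throw new Error("not a table header");
  const keys = m[3].split("|");
  const rows = lines.map((line) => {
    const vals = line.split("|");
    return Object.fromEntries(keys.map((k, i) => [k, vals[i]])) as Row;
  });
  return { name: m[1], rows };
}

const users = [
  { name: "Alice", role: "dev" },
  { name: "Bob", role: "dev" },
];
const packed = encodeTable("users", users);
// packed spans three lines: the header, then "Alice|dev", then "Bob|dev"
const { rows } = decodeTable(packed);
```

The round trip back through `decodeTable` recovers the original objects, which is the sense in which the core layers are lossless.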
## Monorepo Structure
This is a pnpm workspace monorepo.
```text
clipforge-PAKT/
  packages/
    pakt-core/       Core compression engine, CLI, and MCP server
  apps/
    playground/      Local web playground for trying PAKT inputs
    desktop/         ClipForge tray app (Tauri v2 + React)
    extension/       Experimental Chrome extension for supported LLM UIs
  docs/              Format spec and guides
  assets/
    pakt-logo.svg    Logo assets
```
## Packages
| Package | npm | Description |
|---|---|---|
| `@sriinnu/pakt` | [npm](https://www.npmjs.com/package/@sriinnu/pakt) | PAKT compression engine -- the core library with API and CLI |
## Quick Start
```bash
npm install @sriinnu/pakt
```

`@sriinnu/pakt` supports Node 18+. Monorepo development for this repository uses Node 22+.
```typescript
import { compress, decompress, detect } from '@sriinnu/pakt';

// Compress JSON to PAKT
const result = compress('{"users": [{"name": "Alice", "role": "dev"}, {"name": "Bob", "role": "dev"}]}');
console.log(result.compressed);
console.log(`Saved ${result.savings.totalPercent}% tokens`);

// Decompress back to JSON
const original = decompress(result.compressed, 'json');
console.log(original.text);

// Detect input format
const detected = detect('name,role\nAlice,dev');
console.log(detected.format); // 'csv'
```
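The library's `detect()` uses richer heuristics, but the core idea of format sniffing can be sketched standalone. The function below is an illustrative toy, not the actual implementation:

```typescript
// Naive format sniffing: try JSON first, then look for CSV-like structure.
// Illustrative only; the real detect() covers YAML, Markdown, and more.
function detectFormat(text: string): "json" | "csv" | "unknown" {
  const trimmed = text.trim();
  try {
    JSON.parse(trimmed);
    return "json";
  } catch {
    // not JSON; fall through to CSV check
  }
  const lines = trimmed.split("\n");
  const fieldCounts = lines.map((l) => l.split(",").length);
  // CSV-ish: multiple lines, each with the same number of fields (>1)
  if (lines.length > 1 && fieldCounts[0] > 1 && fieldCounts.every((c) => c === fieldCounts[0])) {
    return "csv";
  }
  return "unknown";
}
```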
See the pakt-core README for comprehensive API documentation, CLI usage, format specification, and examples.
Core CLI example for opt-in lossy packing:

```bash
npx @sriinnu/pakt compress data.json --semantic-budget 120
```
Release-facing benchmark numbers live in docs/BENCHMARK-SNAPSHOT.md.
For LLM round-trips, the core package now also exposes interpretModelOutput() so your app can auto-detect PAKT in a model response, repair minor syntax issues, and decompress valid replies back to JSON/YAML/CSV.
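`interpretModelOutput()` handles detection, repair, and decompression in one call. The detection step alone can be illustrated standalone using the `@from <format>` marker shown in the example above; this is a toy sketch, not the library's actual logic:

```typescript
// Toy extraction of a PAKT-marked block from a model reply.
// The real interpretModelOutput() also repairs minor syntax issues and
// decompresses; this sketch only locates the block and its declared format.
function findPaktBlock(reply: string): { format: string; block: string } | null {
  const lines = reply.split("\n");
  const start = lines.findIndex((l) => l.trim().startsWith("@from "));
  if (start === -1) return null;
  const format = lines[start].trim().slice("@from ".length);
  // Take everything from the marker to the next blank line (or end of reply).
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].trim() === "") {
      end = i;
      break;
    }
  }
  return { format, block: lines.slice(start, end).join("\n") };
}
```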
Try the hosted playground: pakt-4f9.pages.dev.
## Root Workspace Commands
From the repo root, you can install, build, and boot each surface directly:
```bash
pnpm install
pnpm build
pnpm build:all
pnpm build:core
pnpm build:playground
pnpm build:extension
pnpm build:desktop:web
pnpm build:desktop
pnpm build:apps
pnpm test:core
pnpm test:playground
pnpm dev:playground
pnpm dev:extension
pnpm dev:desktop:web
pnpm dev:desktop
pnpm start:mcp
```

Local surface entrypoints:

```bash
pnpm dev:playground   # local playground
pnpm dev:extension    # extension dev build
pnpm dev:desktop:web  # desktop frontend only
pnpm dev:desktop      # real Tauri desktop shell
pnpm start:mcp        # core MCP server over stdio
```
Playground notes for release testing:
- Mixed-content restores embedded structured blocks semantically; exact original formatting may normalize.
- CSV is not always a win; some already-compact CSV can expand.
- Compare mode now includes an auto-pack lab; table-aware variants unlock for top-level CSV and top-level JSON arrays.
- The playground runs locally in the browser session and does not upload payloads.
- For mixed-content decompress, paste the PAKT-marked output back into the input area, then run Decompress.
CLI/MCP note:
`semanticBudget` now cleanly opts into lossy L4; if you stay on L1-L3, the pipeline remains lossless. `pakt serve --stdio` now uses the official MCP SDK stdio transport, and embedders can register the same tools programmatically via `registerPaktTools()`.
## Key Features
- 4-layer compression pipeline -- Structural (L1), Dictionary (L2), Tokenizer-Aware (L3), and an opt-in budgeted Semantic layer (L4)
- Multi-format support -- JSON, YAML, CSV, Markdown, Plain Text with auto-detection
- Lossless data round-tripping -- L1-L3 preserve data fidelity on decompress; L4 is explicitly lossy
- Typical 30-50% token savings -- real BPE token counting via `gpt-tokenizer`
- CLI included -- `pakt compress`, `pakt decompress`, `pakt auto`, `pakt inspect`, `pakt detect`, `pakt tokens`, `pakt savings`
- MCP server included -- `pakt serve --stdio` exposes `pakt_compress`, `pakt_auto`, and `pakt_inspect` over the official MCP SDK stdio transport for agent workflows
- Embeddable MCP tools -- `registerPaktTools()` lets other MCP hosts add the same PAKT toolset without reimplementing schemas or handlers
- Small runtime dependency set -- `gpt-tokenizer`, the MCP SDK, and `zod`
- Full TypeScript support -- All types exported, dual ESM/CJS builds
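The Dictionary layer (L2) can also be illustrated standalone: repeated string values are hoisted into a `@dict` block and replaced by short `$` references, as in the format example above. The function below is a simplified sketch, not the library's real encoder, which also weighs token costs before substituting:

```typescript
// Toy Layer-2 dictionary pass: hoist values that repeat into $a, $b, ...
// references, mirroring the @dict block in the format example.
function dictPass(values: string[]): { dict: Record<string, string>; out: string[] } {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);

  const dict: Record<string, string> = {};
  let next = "a".charCodeAt(0);
  for (const [v, n] of counts) {
    // Only repeated values pay for a dictionary entry.
    if (n > 1) dict[`$${String.fromCharCode(next++)}`] = v;
  }
  const reverse = new Map(Object.entries(dict).map(([k, v]) => [v, k]));
  const out = values.map((v) => reverse.get(v) ?? v);
  return { dict, out };
}
```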
## Development

### Prerequisites

- Node 22+ (monorepo development; the published package supports Node 18+)
- pnpm

### Setup

```bash
git clone https://github.com/sriinnu/clipforge-PAKT.git
cd clipforge-PAKT
pnpm install
```
### Commands

```bash
# Build all packages
pnpm build

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run benchmarks
pnpm bench

# Clean build artifacts
pnpm clean
```
## Inspiration & Credits
PAKT would not exist without the prior work and ideas of these projects and researchers:
### TOON Format
PAKT's core pipe-delimited syntax (Layer 1) is directly inspired by TOON Format v1.3 -- the original compact notation for structured data, created by Nicholas Charlton (@nichochar). TOON demonstrated that structured data can be represented without the syntactic overhead of JSON while remaining unambiguous and machine-parseable. PAKT builds on this foundation by adding multi-format support, a dictionary compression layer, and guaranteed lossless round-tripping. TOON has implementations across Python, TypeScript, Go, Rust, .NET, Elixir, Java, and Julia -- a testament to the strength of its design.
### Research
- CompactPrompt (2025) -- Structured prompt compression for financial datasets, showing that redundant content in function-calling prompts can be safely removed.
- LLMLingua-2 (Microsoft, 2024) -- Task-agnostic prompt compression via data distillation, achieving high compression ratios with minimal accuracy loss.
- LTSC (2024) -- LLM-driven Token-level Structured Compression, combining structural and token-level techniques for long text workflows.
- LiteToken (2025) -- Lightweight token compression for efficient encoding of structured data in LLM contexts.
- Table Serialization Studies -- Research demonstrating that pipe-delimited formats consistently outperform JSON when presenting tabular data to LLMs.
## License
MIT -- Srinivas Pendela