tscg
Health: Warn
- License — MIT
- Description — Repository has a description
- Active repo — Last pushed within the past day
- Low visibility — Only 7 GitHub stars
Code: Warn
- process.env — Environment variable access in benchmark/harness/cli.ts
Permissions: Pass
- Permissions — No dangerous permissions requested
This is a deterministic tool-schema compiler designed for LLM agents. It compresses tool definitions to save tokens (50-72% savings) while maintaining or improving model accuracy, operating in under a millisecond with zero external dependencies.
Security Assessment
Overall Risk: Low. The tool does not request any dangerous permissions, execute shell commands, or contain hardcoded secrets. There is no evidence of unwanted network requests. The only flagged issue is a `process.env` access located strictly within a benchmark harness file (`benchmark/harness/cli.ts`), which is a standard practice for test configurations and poses no threat to production environments.
Quality Assessment
The codebase is lightweight (1,200 lines) and recently active, with the last push occurring today. It is properly licensed under the standard MIT license. The developer claims strong quality metrics, including a large automated test suite and zero runtime dependencies, which significantly reduces supply-chain attack risks. However, community visibility and adoption are currently very low, with only 7 GitHub stars. While the internal benchmark results provided in the README are highly detailed, they are entirely self-reported and lack independent external validation at this time.
Verdict
Safe to use, though adopters should be aware that it is a very new, low-visibility project with self-reported performance claims.
TSCG — Deterministic tool-schema compiler for LLM agents. 50-72% token savings, 50 tools in 2.4ms. Phi-4 recovers from 0% to 90% accuracy. 459 tests, zero dependencies, MIT.
TSCG -- Tool-Schema Compression Grammar
Deterministic tool-schema compiler that reduces LLM tool-definition overhead by 50--72% while improving accuracy.
1,200 LOC TypeScript. Zero dependencies. Sub-millisecond. 23KB ESM bundle.
Latest Findings (April 2026)
720-Call E2E Benchmark on Claude Models
Claude Opus 4.7 -- matches-or-beats baseline with 57-63% token savings:
| Tool Count | Baseline | TSCG Balanced | Δ Accuracy | Savings |
|---|---|---|---|---|
| 16 | 70.0% | 77.5% | +7.5pp | 56.9% |
| 43 | 77.5% | 80.0% | +2.5pp | 63.0% |
| 50 | 72.5% | 80.0% | +7.5pp | 62.8% |
Claude Sonnet 4 -- consistent 57-63% compression with robust accuracy:
| Tool Count | Baseline | TSCG Balanced | Δ Accuracy | Savings |
|---|---|---|---|---|
| 16 | 77.5% | 80.0% | +2.5pp | 56.9% |
| 43 | 85.0% | 80.0% | -5.0pp | 63.0% |
| 50 | 77.5% | 77.5% | ±0.0pp | 62.8% |
480-Call MCP Proxy Benchmark (v1.4.1)
480-call extended proxy benchmark (n=40 per cell, 2 seeds, 2 models x 3 tool counts):
| Model | Tools | Baseline | TSCG Proxy | Δ Accuracy | Token Savings |
|---|---|---|---|---|---|
| Opus 4.7 | 16 | 70.0% | 75.0% | +5.0pp | 53.1% |
| Opus 4.7 | 43 | 75.0% | 75.0% | ±0.0pp | 55.8% |
| Opus 4.7 | 50 | 77.5% | 77.5% | ±0.0pp | 55.5% |
| Sonnet 4 | 16 | 80.0% | 77.5% | -2.5pp | 53.1% |
| Sonnet 4 | 43 | 85.0% | 82.5% | -2.5pp | 55.8% |
| Sonnet 4 | 50 | 77.5% | 77.5% | ±0.0pp | 55.5% |
Opus 4.7 matches-or-beats baseline in all conditions; Sonnet 4 within expected CI (max -2.5pp). Both achieve 53-56% token savings.
Tool-Optimizer E2E validation (@tscg/tool-optimizer withTSCG() wrapper, 30 calls, Sonnet 4 @ 16 tools): withTSCG 86.7% vs baseline 80.0% (+6.7pp), 36.6% character savings.
Three Frontier-Model Operator Archetypes
TSCG compression response is model-specific. Three distinct archetypes observed:
- Opus 4.7 -- Operator-HUNGRY -- every operator contributes; balanced (all-8) is optimal
- Sonnet 4 -- Operator-ROBUST -- config-agnostic; 6 of 7 configs near-identical accuracy
- GPT-5.2 -- Operator-SENSITIVE -- CFL helps, CFO hurts; custom config optimal
External Validation -- 4 Independent Benchmarks
TSCG's internal benchmark (TAB -- Tool-Agentic Bench, ~19,000 calls) is independently corroborated by four external benchmarks, including industry-standard evaluation suites:
| Benchmark | Type | Result | Significance |
|---|---|---|---|
| BFCL (Berkeley Function Calling Leaderboard) | Industry standard | 108--181% ARR across 3 frontier models | Sonnet 4: 85.7%→93.2% (+7.5pp), GPT-4o: 31.7%→57.4% (+25.7pp), GPT-5.2: 61.9%→89.4% (+27.5pp) |
| ToolBench (Qin et al.) | Academic benchmark | +5.0pp (75.0%→80.0%) | Real-world tool catalog, 20 tools |
| API-Bank (Li et al.) | Academic benchmark | -5.0pp (80.0%→75.0%) | Honest negative result -- not all benchmarks improve |
| Real MCP Server (@modelcontextprotocol/server-filesystem) | Production endpoint | 100% syntactic validity | 30 tasks on live MCP server, server-acceptance 90--97% |
TAB → Real MCP Transfer (0.1pp): The internal TAB benchmark is not merely a self-constructed evaluation -- it demonstrably predicts real-world MCP behavior within 0.1 accuracy points. Sonnet 4 on 43-tool MCP: synthetic TAB delta = -1.6pp vs real MCP delta = -1.7pp. This tight transfer validates TAB as a reliable proxy for production MCP deployments.
Mean across the 3 external catalog benchmarks: +2.5pp (80.2%→82.7%).
See paper for full methodology and per-benchmark analysis.
The Problem
Every LLM agent framework sends full JSON Schema definitions for every registered tool on every API call. Claude Code injects ~50,000 tokens of tool definitions per subprocess. At production scale (100K calls/day), the schema overhead alone costs >$30,000/month.
Worse: small models (4B--14B) cannot parse JSON-format tool schemas reliably at scale -- achieving 0--49% accuracy with >15 tools. This locks agentic capabilities behind expensive frontier APIs.
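To make the cost arithmetic concrete, here is a back-of-the-envelope calculator. The function name and the example inputs (schema tokens per call, a 60% savings rate, a $3-per-million-token input price) are illustrative assumptions, not measured values; actual prices and savings vary by model and catalog:

```typescript
// Estimate monthly dollars saved by compressing tool-schema overhead.
// All inputs below are assumptions for illustration, not measured values.
function monthlySavingsUSD(
  schemaTokensPerCall: number, // tokens of tool definitions sent per call
  savingsRate: number,         // fraction of those tokens removed, e.g. 0.6
  callsPerDay: number,
  pricePerMillionTokens: number,
): number {
  const tokensSavedPerCall = schemaTokensPerCall * savingsRate;
  const tokensSavedPerMonth = tokensSavedPerCall * callsPerDay * 30;
  return (tokensSavedPerMonth / 1_000_000) * pricePerMillionTokens;
}

// 5,000 schema tokens/call, 60% savings, 100K calls/day, $3/M tokens:
console.log(monthlySavingsUSD(5_000, 0.6, 100_000, 3)); // 27000
```

Even modest per-call schema overhead compounds quickly at production call volumes, which is what drives the monthly figures quoted above.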
Key Results
Pareto Dominance: Better Accuracy AND Fewer Tokens
BFCL (Berkeley Function Calling Leaderboard) validation -- the industry standard for tool-calling evaluation:
| Model | Without TSCG | With TSCG | Improvement | Token Savings |
|---|---|---|---|---|
| Claude Sonnet 4 | 85.7% | 93.2% | +7.5pp | 46.8% |
| GPT-4o | 31.7% | 57.4% | +25.7pp (181% ARR) | 2.6% |
| GPT-5.2 | 61.9% | 89.4% | +27.5pp (144% ARR) | 8.3% |
Every model improves. TSCG achieves 108--181% Accuracy Retention Rate -- it doesn't just retain accuracy, it increases it.
Small Model Enablement
| Model | JSON Baseline (20 tools) | With TSCG | Recovery |
|---|---|---|---|
| Phi-4 14B | 0% | 84.4% | +84.4pp |
| Mistral 7B | 35% | 80.1% | +45.1pp |
| Gemma 3 4B | 49.9% | 67.0% | +17.1pp |
Seven small models (4B--14B) that achieve 0--49% accuracy on JSON tools recover to 65--90% with TSCG. The root cause: JSON format, not model capacity (R^2 = 0.88 against JSON baselines, collapses to 0.03 against text -- 97% of variance is format sensitivity).
Full Benchmark Summary
From ~19,000 API calls across 12 models (4B--32B + 3 frontier APIs), 5 scenarios:
| Finding | Detail |
|---|---|
| Token savings | 50--72% on tool schemas |
| BFCL validation | 108--181% Accuracy Retention Rate |
| Formal guarantee | >=51% savings on any well-formed schema (Theorem 3.1) |
| Predictive model | R^2 = 0.88 predicts TSCG benefit from single baseline measurement |
| Speed | 50 tools in 2.4ms (Node.js v24, commodity hardware) |
| Cost at scale | >$30,000/month savings at 100K calls/day |
Verified Performance (Fresh Install)
Independent reproduction on @tscg/core from npm:
| Metric | Measured |
|---|---|
| 5 realistic tools (Claude target) | 59.5% token savings |
| 50 tools | 66.6% savings in 2.4ms |
| Compression time (5 tools) | 0.9ms |
| Unit tests | 108 passing (core 47 + proxy 61) |
| Bundle | 34.7KB (11.7KB gzipped) |
| Dependencies | 0 |
What TSCG Does
TSCG applies 8 formally defined transforms grounded in how causal transformers process tokens:
| Principle | Full Name | What It Does |
|---|---|---|
| TAS | Tokenizer-Aligned Syntax | Optimizes for BPE boundaries |
| CFL | Constraint-First Layout | Exploits the attention sink at position 0 |
| CFO | Causal-Flow Ordering | Orders operations into causal chains |
| SDM | Semantic Density Maximization | Removes 104+ filler patterns |
| DRO | Delimiter-Role Optimization | Converts verbose phrases to compact delimiters |
| CCP | Closure-Context Preservation | Appends closure block for recency bias |
| CAS | Causal Access Scoring | Scores and reorders by parameter fragility |
| SAD-F | Selective Anchor Duplication | Budget-constrained anchor duplication |
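As a rough intuition for what one of these transforms does, here is a toy sketch of the SDM idea -- stripping filler phrases that cost tokens without adding constraint information. The pattern list and the `densify` function are invented for illustration and are not TSCG's actual implementation:

```typescript
// Toy illustration of the SDM (Semantic Density Maximization) idea:
// remove boilerplate phrases that add tokens but no constraint information.
// The real operator reportedly uses 104+ curated patterns; these three
// are invented examples.
const FILLERS: Array<[RegExp, string]> = [
  [/\bThis (?:tool|function) (?:is used to|allows you to|can be used to)\s*/gi, ""],
  [/\bin order to\b/gi, "to"],
  [/\bPlease note that\s*/gi, ""],
];

function densify(description: string): string {
  let out = description;
  for (const [pattern, replacement] of FILLERS) {
    out = out.replace(pattern, replacement);
  }
  // Collapse leftover double spaces and trim.
  return out.replace(/\s{2,}/g, " ").trim();
}

console.log(densify("This tool is used to get the current weather in order to plan trips."));
// "get the current weather to plan trips."
```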
Quick Start
All three @tscg/* packages use umbrella versioning -- same version number, released together.
```bash
npm install @tscg/core            # Core compression engine
npm install @tscg/mcp-proxy       # Transparent MCP middleware
npm install @tscg/tool-optimizer  # LangChain / Vercel AI SDK integrations
```
```typescript
import { compress } from '@tscg/core';

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name or coordinates' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const result = compress(tools, { model: 'claude-sonnet' });

console.log(result.compressed);
// => "get_weather(location:str units?:str[celsius|fahrenheit])|Get current weather"

console.log(`Saved ${result.metrics.tokens.savingsPercent}% tokens`);
// => "Saved 62.3% tokens"
```
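For intuition about the compact output format shown above, here is a naive renderer that produces a similar line from an OpenAI-style tool schema. It handles only string parameters with optional enums, does not compress descriptions, and is not the real TSCG grammar; `renderCompact` and its types are invented for this sketch:

```typescript
// Toy renderer approximating the compact line shown above. Handles only
// string-typed parameters with optional enums; the real grammar is richer
// and also compresses the description text itself.
interface ToolParam { type: string; enum?: string[] }
interface Tool {
  function: {
    name: string;
    description: string;
    parameters: { properties: Record<string, ToolParam>; required?: string[] };
  };
}

function renderCompact(tool: Tool): string {
  const { name, description, parameters } = tool.function;
  const required = new Set(parameters.required ?? []);
  const params = Object.entries(parameters.properties)
    .map(([pname, p]) => {
      const opt = required.has(pname) ? "" : "?";          // "?" marks optional
      const en = p.enum ? `[${p.enum.join("|")}]` : "";    // enum values inline
      return `${pname}${opt}:str${en}`;
    })
    .join(" ");
  return `${name}(${params})|${description}`;
}

const line = renderCompact({
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a location',
    parameters: {
      properties: {
        location: { type: 'string' },
        units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['location'],
    },
  },
});
// "get_weather(location:str units?:str[celsius|fahrenheit])|Get the current weather for a location"
```

Note that the Quick Start output also shortens the description ("Get current weather"); this sketch leaves descriptions untouched.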
Result Object
```typescript
const result = compress(tools, { model: 'claude-sonnet', profile: 'balanced' });

result.compressed                    // string — compressed tool definitions
result.metrics.tokens.original       // number — original token count
result.metrics.tokens.compressed     // number — compressed token count
result.metrics.tokens.savingsPercent // number — e.g. 62.3
result.metrics.compressionTimeMs     // number — e.g. 0.9
result.appliedPrinciples             // string[] — e.g. ['SDM', 'CAS', 'DRO', 'TAS']
result.metrics.perTool               // { name, originalTokens, compressedTokens, savingsPercent }[]
```
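Since perTool carries per-schema metrics, you can flag tools whose schemas compress poorly. `worstCompressed` and the `PerToolMetric` type below are our own names, modeled on the entry shape documented above:

```typescript
// Find the tool whose schema compresses worst, given the perTool entry
// shape documented above. `PerToolMetric` and `worstCompressed` are
// invented names for this sketch.
interface PerToolMetric {
  name: string;
  originalTokens: number;
  compressedTokens: number;
  savingsPercent: number;
}

function worstCompressed(perTool: PerToolMetric[]): PerToolMetric {
  // Assumes a non-empty array; reduce without an initial value throws on [].
  return perTool.reduce((worst, t) =>
    t.savingsPercent < worst.savingsPercent ? t : worst,
  );
}

const sample: PerToolMetric[] = [
  { name: 'get_weather', originalTokens: 120, compressedTokens: 45, savingsPercent: 62.5 },
  { name: 'search_docs', originalTokens: 300, compressedTokens: 210, savingsPercent: 30.0 },
];
console.log(worstCompressed(sample).name); // search_docs
```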
Options
```typescript
compress(tools, {
  model: 'claude-sonnet', // Target model: 'claude-sonnet' | 'gpt-4o' | 'gpt-4' | ...
  profile: 'balanced',    // Profile: 'conservative' | 'balanced' | 'aggressive' | 'auto'
});
```
Description-Only Mode (v1.4.0)
Compress only .description fields while preserving the full JSON Schema structure -- compatible with native tool-calling APIs (OpenAI, Anthropic, Google):
```typescript
import { compressDescriptions } from '@tscg/core';

const result = compressDescriptions(tools, { model: 'claude-sonnet' });
console.log(result.tools);                               // Tools with compressed descriptions
console.log(result.metrics.descriptions.savingsPercent); // ~25-40% description savings
```
Auto Profile (v1.4.0)
The auto profile selects compression principles based on catalog size. At >=30 tools, CFL/CFO are automatically disabled (they become harmful at scale per our 100-tool benchmark findings):
```typescript
compress(tools, { model: 'claude-sonnet', profile: 'auto' });
```
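The size-based gating can be pictured as follows; only the >=30-tool CFL/CFO cutoff comes from the text above, and `selectOperators` with its threshold logic is an illustrative sketch, not @tscg/core's actual selection code:

```typescript
// Illustrative sketch of size-based operator gating for an "auto" profile.
// Operator names come from the principles table; the selection function
// and its exact shape are invented for this sketch.
const ALL_OPERATORS = ["TAS", "CFL", "CFO", "SDM", "DRO", "CCP", "CAS", "SAD-F"];

function selectOperators(toolCount: number): string[] {
  if (toolCount >= 30) {
    // CFL/CFO reportedly hurt accuracy on large catalogs.
    return ALL_OPERATORS.filter((op) => op !== "CFL" && op !== "CFO");
  }
  return [...ALL_OPERATORS];
}

console.log(selectOperators(50)); // the 6 operators other than CFL and CFO
```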
Packages
| Package | Description | Install |
|---|---|---|
| `@tscg/core` | Core compression engine (8 operators) | `npm i @tscg/core` |
| `@tscg/mcp-proxy` | MCP stdio proxy -- transparent TSCG compression for any MCP server | `npm i @tscg/mcp-proxy` |
| `@tscg/tool-optimizer` | LangChain, MCP, Vercel AI SDK integrations | `npm i @tscg/tool-optimizer` |
CLI
```bash
# Compress tool schemas
npx tsx cli/tscg.ts compress --input tools.json --model claude-sonnet --profile balanced

# Run benchmarks
npx tsx cli/tscg.ts benchmark --model claude-sonnet

# Show compression info
npx tsx cli/tscg.ts info
```
MCP Proxy
@tscg/mcp-proxy sits between Claude Code (or any MCP client) and your MCP tool servers, transparently compressing tool schemas:
```bash
# Opus 4.7 -- 57-63% savings, +2.5 to +7.5pp accuracy
npx @tscg/mcp-proxy --target=claude-opus-4-7 --server=<your-mcp-command>

# Sonnet 4 -- 57-63% savings, robust accuracy
npx @tscg/mcp-proxy --target=claude-sonnet-4 --server=<your-mcp-command>
```
Setting --target automatically enables the full compression pipeline validated by our 720-call benchmark. No other flags required.
Legacy mode (backward compatible with v1.0.x):
```bash
npx @tscg/mcp-proxy --server=<your-mcp-command>
```
Integrations
LangChain:
```typescript
import { withTSCG } from '@tscg/tool-optimizer/langchain';

const optimizedAgent = withTSCG(agent);
```
Vercel AI SDK:
```typescript
import { tscgMiddleware } from '@tscg/tool-optimizer/vercel';
```
TSCG vs Other Approaches
| Property | TSCG | LLMLingua-2 | DSPy / SAMMO |
|---|---|---|---|
| Accuracy effect | Improves (108--181% ARR) | Degrades (-5 to -20%) | Degrades |
| Speed | 2.4ms / 50 tools | ~42s (GPU) | Minutes |
| Dependencies | None | GPU + ML framework | API calls |
| Deterministic | Yes | No | No |
| Formal guarantees | >=51% savings | None | None |
| Bundle size | 34.7KB | Requires PyTorch | Full stack |
| Works offline | Yes | GPU required | API required |
Who Benefits
- Claude Code / Cursor / Windsurf users: ~35K fewer tokens per subprocess
- Local LLM users (Ollama): 7B models become functional tool-use agents with 50+ tools
- Production API deployments: >$30,000/month savings at 100K calls/day
- Multi-agent orchestration: Savings multiply per sub-agent in the chain
- Edge / Mobile / Privacy: EU AI Act compliant local deployment becomes viable
Project Structure
```
packages/
  core/            # @tscg/core — compression engine (8 operators, 47 tests)
  mcp-proxy/       # @tscg/mcp-proxy — stdio proxy for MCP servers (61 tests)
  tool-optimizer/  # @tscg/tool-optimizer — LangChain, Vercel AI SDK integrations
paper/             # LaTeX source (arXiv version)
cli/               # Unified CLI (compress, benchmark, analyze, info)
benchmark/         # TAB benchmark harness, analysis code, raw data
integrations/      # Framework integration examples
docs/              # Technical documentation
```
Development
```bash
git clone https://github.com/SKZL-AI/tscg.git
cd tscg
npm install
npm run build
npm test           # 459 tests
npm run typecheck  # Type checking
```
Paper
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
Furkan Sakizli. 2026.
TSCG-paper.pdf -- arXiv preprint (full version, 12 models, ~19,000 API calls, 4-class taxonomy)
LaTeX source is available in paper/.
Citation
```bibtex
@article{sakizli2026tscg,
  title={TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments},
  author={Sakizli, Furkan},
  year={2026},
  note={arXiv preprint}
}
```
Contributing
See CONTRIBUTING.md for development setup, code style, and PR guidelines.
License
MIT