sift-gateway
Health Pass
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 26 GitHub stars
Code Fail
- rm -rf — Recursive force deletion command in .github/workflows/clawhub-sync.yml
Permissions Pass
- Permissions — No dangerous permissions requested
This tool acts as a reliability gateway for AI agents, intercepting outputs from other tools and CLI commands. It standardizes JSON schemas, sanitizes sensitive data, and manages pagination by storing large payloads in a local SQLite database, ultimately saving context window tokens.
Security Assessment
The tool explicitly handles potentially sensitive data by acting as a proxy to sanitize secrets before they reach the AI's context. Because it acts as a gateway, it inherently processes upstream network requests and captures shell command executions. No hardcoded secrets were found, and it does not request dangerous system permissions. The automated scan flagged an `rm -rf` command, but it is isolated within a GitHub Actions workflow (`clawhub-sync.yml`) rather than the application's core runtime code. Overall risk is rated as Medium because the tool, by design, intercepts and stores external command and network outputs.
Quality Assessment
The project is in excellent health. It is licensed under the standard MIT license and has a clear, detailed description. It appears to be actively developed with repository updates pushed as recently as today. With 26 GitHub stars, it has a small but growing level of community trust, and its public benchmarks show a professional approach to demonstrating value.
Verdict
Use with caution (beneficial for managing context windows, but requires trust as it intercepts all upstream tool outputs and executes shell commands).
Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON for MCP and CLI agents.
Sift
Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON.
Sift is a drop-in reliability layer for MCP and CLI tool output. It persists full payloads as artifacts, returns either inline payload (full) or compact references (schema_ref), and lets agents query what they need with Python code over stored data.
Benchmark summary: on 103 factual questions across 12 real JSON datasets, Sift improved accuracy from 33.0% to 99.0% while cutting input tokens by 95.4% (10,757,230 -> 489,655). Full details: benchmarks/README.md.
How it works
                         ┌─────────────────────┐
MCP tool call ──────────▶│                     │──────────▶ Upstream MCP server
CLI command   ──────────▶│        Sift         │──────────▶ Shell/API command
                         │                     │
                         │   ┌─────────────┐   │
                         │   │  Artifacts  │   │
                         │   │  (SQLite)   │   │
                         │   └─────────────┘   │
                         └─────────────────────┘
                                    │
                                    ▼
                      Small output -> `full` inline
                      Large output -> `schema_ref`
                      Agent queries artifacts with code
Flow:
- Execute upstream tool/command and capture JSON.
- Persist full output as an artifact in SQLite and deterministically map schema/root hints.
- Return `full` (small) or `schema_ref` (large/paginated).
- Continue pages explicitly until `pagination.retrieval_status == COMPLETE`.
- Run focused Python queries on one artifact or the full pagination chain.
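The persist-then-respond part of this flow can be sketched in a few lines of Python. This is an illustration only, not Sift's implementation: the table layout, the `INLINE_LIMIT_BYTES` threshold, the response keys other than the `full` and `schema_ref` mode names, and the helper names `store_and_respond` and `load_artifact` are all assumptions made for the example.

```python
import json
import sqlite3
import uuid

# Hypothetical size threshold; the rule Sift actually applies may differ.
INLINE_LIMIT_BYTES = 2_000


def store_and_respond(conn, payload):
    """Persist the full payload, then return it inline or as a compact reference."""
    raw = json.dumps(payload)
    artifact_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO artifacts (id, payload) VALUES (?, ?)", (artifact_id, raw)
    )
    conn.commit()

    if len(raw.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        # Small output: hand the payload back directly.
        return {"mode": "full", "artifact_id": artifact_id, "payload": payload}

    # Large output: return a reference plus a light description of the shape.
    first = payload[0] if isinstance(payload, list) and payload else {}
    fields = sorted(first) if isinstance(first, dict) else []
    return {
        "mode": "schema_ref",
        "artifact_id": artifact_id,
        "row_count": len(payload) if isinstance(payload, list) else 1,
        "fields": fields,
    }


def load_artifact(conn, artifact_id):
    """Read the losslessly stored payload back for follow-up queries."""
    row = conn.execute(
        "SELECT payload FROM artifacts WHERE id = ?", (artifact_id,)
    ).fetchone()
    return json.loads(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")

small = store_and_respond(conn, [{"name": "a"}])
large = store_and_respond(
    conn, [{"name": f"pod-{i}", "ready": i % 2 == 0} for i in range(500)]
)
print(small["mode"], large["mode"], large["row_count"])
```

In this sketch the model-facing response for the large payload carries only an identifier and field names, while the full 500 rows remain available through `load_artifact`.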
Main MCP pain points
These are recurring across MCP client issue trackers and protocol usage in production:
- Large tool definitions and large tool results consume context quickly.
- Upstream API pagination often sits outside MCP list-cursor flows, so agents can stop early and answer on partial data.
- Tool output shape differs across servers, which makes follow-up parsing brittle.
- Tool output is untrusted input and can contain sensitive values that should not re-enter model context.
- Raw outputs scroll away in chat history, so provenance and reproducibility degrade across multi-step runs.
Background and references: docs/why.md.
What Sift adds (without changing upstream servers)
- Artifact-backed outputs: keep full data out of prompt context while preserving it losslessly.
- Tool inspection helper: keep mirrored `tools/list` descriptions compact and pull full docs with `gateway.inspect_tool`.
- Schema-aware references: `schema_ref` returns query guidance for stable follow-up analysis.
- Exact structured retrieval: run Python against stored artifacts via `artifact(action="query", query_kind="code", ...)` (MCP) or `sift-gateway code` (CLI) instead of relying on prompt-sized payloads.
- Explicit pagination contract: continue with `artifact(action="next_page")` or `run --continue-from`.
- Completion signaling: do not stop until `pagination.retrieval_status == COMPLETE`.
- Pagination-chain analysis: query one artifact or all related pages (`scope="all_related"`; CLI default).
- Outbound secret redaction enabled by default before output returns to the model.
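To make the idea of outbound redaction concrete, here is a minimal sketch of masking secrets in a JSON-like structure before it is returned. It does not reflect Sift's actual rules: the key names in `SENSITIVE_KEYS`, the two patterns, the `[REDACTED]` marker, and the `redact` function are hypothetical choices for illustration.

```python
import re

# Hypothetical patterns for illustration; not Sift's actual rule set.
SECRET_PATTERNS = [
    re.compile(r"bearer\s+[a-z0-9._\-]+", re.IGNORECASE),  # bearer tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
]
SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}
REDACTED = "[REDACTED]"


def redact(value):
    """Recursively mask sensitive values in dicts, lists, and strings."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(REDACTED, value)
        return value
    return value  # numbers, booleans, and None pass through unchanged


sample = {
    "user": "alice",
    "token": "abc123",
    "headers": [{"Authorization": "Bearer eyJhbGciOi.payload.sig"}],
}
clean = redact(sample)
print(clean)
```

The sketch combines key-based and pattern-based masking because secrets can show up both under obvious field names and embedded inside free-form strings.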
MCP vs CLI positioning
- MCP: Sift is a reliability gateway for mirrored tool calls and artifact-based follow-up queries.
- CLI/OpenClaw: same artifact contract for command output (`sift-gateway run` + `sift-gateway code`).
- CLI pitfall: ad-hoc extraction can silently scope analysis to partial data (for example, inspecting only one row).
- CLI note: for one-off local extraction, plain `jq` can be enough. Sift is for repeatable, pagination-complete, policy-controlled workflows.
60-second quickstart
MCP clients
pipx install sift-gateway
sift-gateway init --from claude
Restart your MCP client, then use mirrored tools normally.
Supported --from shortcuts: claude, claude-code, cursor, vscode, windsurf, zed, auto, or an explicit config path.
CLI flow
# 1) Capture JSON output as an artifact
sift-gateway run --json -- kubectl get pods -A -o json
# 2) Query artifact data with Python
sift-gateway code --json <artifact_id> '$' --code "def run(data, schema, params): return {'rows': len(data)}"
Use $ when rows are at root. If nested, use metadata.usage.root_path from run --json (or metadata.queryable_roots in MCP schema_ref).
Pagination continuation
sift-gateway run --json --continue-from <artifact_id> -- <next-command-with-next-params-applied>
Do not claim completion until pagination.retrieval_status == COMPLETE.
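The completion rule can be expressed as a loop that keeps fetching until the status is `COMPLETE`. The sketch below is an assumption-laden illustration: `fetch_next` stands in for whatever issues the next `run --continue-from` or `artifact(action="next_page")` call, and the `artifact_id` key and the `PARTIAL` status value are invented for the example. Only `pagination.retrieval_status == COMPLETE` comes from the text above.

```python
def collect_all_pages(first_response, fetch_next, max_pages=100):
    """Follow a pagination chain until retrieval_status is COMPLETE."""
    responses = [first_response]
    while responses[-1]["pagination"]["retrieval_status"] != "COMPLETE":
        if len(responses) >= max_pages:
            # Guard against a chain that never signals completion.
            raise RuntimeError("pagination did not complete within max_pages")
        responses.append(fetch_next(responses[-1]["artifact_id"]))
    return responses


# Stub data standing in for real continuation calls (hypothetical shape).
_PAGES = {
    "a1": {"artifact_id": "a2", "pagination": {"retrieval_status": "PARTIAL"}},
    "a2": {"artifact_id": "a3", "pagination": {"retrieval_status": "COMPLETE"}},
}


def fake_fetch_next(artifact_id):
    """Return the page that follows the given artifact in the stub chain."""
    return _PAGES[artifact_id]


first = {"artifact_id": "a1", "pagination": {"retrieval_status": "PARTIAL"}}
chain = collect_all_pages(first, fake_fetch_next)
print([page["artifact_id"] for page in chain])
```

The `max_pages` guard is a design choice for the sketch: it turns a chain that never completes into an explicit error rather than an endless loop or a silent partial answer.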
Python codegen over all pages
For complex questions, generate Python once and run it over the entire pagination chain:
sift-gateway code --json --scope all_related <artifact_id> '$' --file ./analysis.py
CLI default is --scope all_related. Use --scope single for anchor-only analysis.
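A file such as `./analysis.py` might look like the sketch below. The `run(data, schema, params)` signature is taken from the CLI example earlier in this README. Everything else is an assumption: that `data` arrives as a list of dict rows, that each row has a `metadata.namespace` field as in `kubectl get pods` output, and the shape of the returned dict.

```python
from collections import Counter


def run(data, schema, params):
    """Count rows per namespace across all rows passed in."""
    counts = Counter()
    for row in data:
        # Assumes Kubernetes-style rows; missing fields fall back to "unknown".
        namespace = row.get("metadata", {}).get("namespace", "unknown")
        counts[namespace] += 1
    return {"total": len(data), "by_namespace": dict(counts)}


# Local smoke test with made-up rows, independent of Sift.
sample_rows = [
    {"metadata": {"name": "web-1", "namespace": "default"}},
    {"metadata": {"name": "web-2", "namespace": "default"}},
    {"metadata": {"name": "dns-1", "namespace": "kube-system"}},
]
result = run(sample_rows, schema=None, params={})
print(result)
```

Keeping the logic in a plain function makes it easy to test locally on a few rows before running it over a full pagination chain.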
Benchmarks
Tier 1 result (claude-sonnet-4-6):
| Condition | Accuracy | Input Tokens |
|---|---|---|
| Baseline (context-stuffed) | 34/103 (33.0%) | 10,757,230 |
| Sift | 102/103 (99.0%) | 489,655 |
That is +66.0 points accuracy with 95.4% fewer input tokens on the same question set.
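The headline figures follow directly from the table. As a quick check, this short script recomputes them from the raw counts; the variable names are arbitrary.

```python
# Raw counts copied from the Tier 1 table above.
baseline_correct, sift_correct, total = 34, 102, 103
baseline_tokens, sift_tokens = 10_757_230, 489_655

baseline_acc = 100 * baseline_correct / total
sift_acc = 100 * sift_correct / total

# Point gain is computed on the rounded percentages, as reported in the table.
point_gain = round(sift_acc, 1) - round(baseline_acc, 1)
token_reduction = 100 * (1 - sift_tokens / baseline_tokens)

print(f"accuracy: {baseline_acc:.1f}% -> {sift_acc:.1f}% (+{point_gain:.1f} points)")
print(f"input tokens: {token_reduction:.1f}% fewer")
```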
Methodology, scripts, and Tier 2 autonomous-agent results: benchmarks/README.md.
Documentation library
Start here: docs/README.md
Getting started
- Quick Start
- Installation
- Your first artifact (CLI)
- Your first artifact (MCP)
- Adding MCP servers after initial setup
- Troubleshooting
Core contracts
- API Contracts
- Mirrored Response Contract (`full` vs `schema_ref`)
- Response Mode Selection
- Pagination Metadata
- Code Query Contract
- CLI output contract
- CLI default scope (`all_related`)
Operations and security
- Deployment Guide
- Authentication tokens
- Outbound secret redaction
- Configuration Reference
- Code query runtime
- Error Contract
- Security policy
Patterns and deep dives
- Recipes
- Pagination chain (CLI)
- Pagination chain (MCP)
- Architecture
- Pagination model
- Observability
- Why Sift exists
- OpenClaw integration pack
- Upstream registration design
Security
See SECURITY.md for threat model and hardening guidance.
Development
git clone https://github.com/lourencomaciel/sift-gateway.git
cd sift-gateway
uv sync --extra dev
uv run python -m pytest tests/unit/ -q
Full contributor workflow: CONTRIBUTING.md
License
MIT - see LICENSE.