context-runtime
Health: Passed
- License — License: AGPL-3.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 86 GitHub stars
Code: Passed
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
- Permissions — No dangerous permissions requested
Context Runtime — a database query planner for LLM context. Decides what a model sees before it answers; plans it, runs it through reused substrate, and learns from the outcome.
Context Runtime
An efficiency optimizer for a fleet of apps — a database query planner for LLM
context. The application says "I need an answer"; the runtime decides what the model
sees — what to retrieve, compress, route, and verify — emits an inspectable,
replayable plan, and learns from the outcome. It does for AI context what query
planners did for SQL. See POSITIONING.md for the thesis.
It optimizes any app with (a) a decision point about what context/config to use and
(b) a measurable outcome. Eleven tenants are built and green (each number is the
learned-vs-baseline reward its offline examples/<tenant>.py prints):
| Tenant | Context Runtime tunes | Result |
|---|---|---|
| sidekick | which skills to recall · budget | drop-in for SkillStore; 67% acceptance vs 33% for the naive baseline |
| redevops-rag | pool · limit · threshold · rerank … per query | ContextRuntimeRetrieverTuner; 0.773 vs 0.323 for the fixed default |
| edge-sentinel (SOC) | which sources to pull per alert (CrowdSec · threat-intel · EDR) | tool-using + approval-gated; 0.900 vs 0.800 for the always-full baseline |
| growth-engine | which attribution window + source bundle per lead-source query | 7.851 vs 5.282 for the fixed window |
| control-tower | which Metabase query set per "ask anything" question | 5.326 vs 1.643 for the core query set |
| agentic-billing | which usage/invoice/dunning signals to pull per account | 4.122 vs 2.442 for full-stack |
| social-autopilot | which channel/timing/content strategy per goal | 3.875 vs 0.773 for the fixed strategy |
| agentic-support | which KB/tickets/account context to retrieve per ticket | 3.679 vs 2.394 for full-context |
| agentic-books | which ledgers/reports to pull per books question | 3.632 vs 2.430 for full-books |
| market-radar | which competitor watches to sweep per intel question | 3.611 vs 0.403 for full-sweep |
| agentic-compliance | which rule-family evidence to pull per finding | 3.562 vs 2.463 for full-evidence |
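The pattern behind every row (a decision point plus a measurable outcome) can be illustrated with a small epsilon-greedy bandit. This is a minimal sketch, not the runtime's learner: the strategy names, the acceptance rates, and the `run_bandit` / `always` helpers are all made up for illustration, and the environment is simulated.

```python
import random

# Hypothetical strategies for one decision point (not Context Runtime identifiers).
STRATEGIES = ["full_context", "top3_recall", "top1_recall"]

# Made-up acceptance probabilities for the simulated environment only.
TRUE_ACCEPT_RATE = {"full_context": 0.2, "top3_recall": 0.8, "top1_recall": 0.5}


def outcome(strategy: str, rng: random.Random) -> float:
    """Simulated measurable outcome: 1.0 if the answer was accepted, else 0.0."""
    return 1.0 if rng.random() < TRUE_ACCEPT_RATE[strategy] else 0.0


def run_bandit(rounds: int = 2000, epsilon: float = 0.1, seed: int = 0):
    """Epsilon-greedy: mostly exploit the best-known strategy, sometimes explore."""
    rng = random.Random(seed)
    pulls = {s: 0 for s in STRATEGIES}
    mean_reward = {s: 0.0 for s in STRATEGIES}
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(STRATEGIES)                # explore
        else:
            choice = max(STRATEGIES, key=mean_reward.get)  # exploit
        reward = outcome(choice, rng)
        pulls[choice] += 1
        # Incremental update of the running mean reward for this strategy.
        mean_reward[choice] += (reward - mean_reward[choice]) / pulls[choice]
        total += reward
    return total / rounds, mean_reward, pulls


def always(strategy: str, rounds: int = 2000, seed: int = 0) -> float:
    """Baseline: use one fixed strategy for every request."""
    rng = random.Random(seed)
    return sum(outcome(strategy, rng) for _ in range(rounds)) / rounds


learned, estimates, pulls = run_bandit()
baseline = always("full_context")
print(f"learned={learned:.3f} baseline={baseline:.3f} pulls={pulls}")
```

The printed pair has the same learned-vs-baseline shape as the Result column, but the numbers come from the toy environment and say nothing about the real tenants.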
PYTHONPATH=. python examples/sidekick_learning.py # discrete-strategy bandit
PYTHONPATH=. python examples/rag_tuning.py # numeric-knob tuning
PYTHONPATH=. python examples/soc_triage.py # tool-using cybersecurity tenant
Plus the ToolPlugin seam (context_runtime/tools/: how plans reach external systems,
with an approval-gated audit trail) and trace exporters
(context_runtime/observability/exporters.py: JSONL offline, or Langfuse /
OpenLLMetry-OTel when the extras are installed).
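To make "approval-gated audit trail" concrete, here is a minimal sketch of the idea under stated assumptions. `GatedTool`, `export_jsonl`, and the `block_ip` tool are hypothetical names invented for this example; the actual ToolPlugin interface lives in context_runtime/tools/ and may look quite different.

```python
import io
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GatedTool:
    """Hypothetical wrapper: runs `fn` only when `approve` agrees, logs every attempt."""
    name: str
    fn: Callable[[dict], dict]
    approve: Callable[[str, dict], bool]
    audit: list = field(default_factory=list)

    def call(self, args: dict) -> Optional[dict]:
        approved = self.approve(self.name, args)
        result = self.fn(args) if approved else None
        # Denied calls are recorded too, so the trail shows what was attempted.
        self.audit.append({"ts": time.time(), "tool": self.name, "args": args,
                           "approved": approved, "result": result})
        return result


def export_jsonl(records, stream) -> None:
    """Write one JSON object per line (the JSONL layout)."""
    for rec in records:
        stream.write(json.dumps(rec) + "\n")


# Toy policy (an assumption): only high-severity alerts may trigger the tool.
tool = GatedTool(
    name="block_ip",
    fn=lambda a: {"blocked": a["ip"]},
    approve=lambda name, a: a.get("severity") == "high",
)
tool.call({"ip": "203.0.113.7", "severity": "high"})  # approved, executed
tool.call({"ip": "198.51.100.9", "severity": "low"})  # denied, still audited

buf = io.StringIO()
export_jsonl(tool.audit, buf)
print(buf.getvalue())
```

In a real deployment the `approve` callback would more likely be a human-in-the-loop or policy check than a lambda.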
Status: v0.1 vertical slice. Runs fully offline with stub plugins; the real
redevops-rag retrieval and LiteLLM model bindings are wired and lazy-imported.
See SPEC.md §10 for the conformance checklist these tests assert against.
Install
pip install -e . # core (offline stub path, zero heavy deps)
pip install -e ".[litellm]" # real models across 100+ providers
pip install -e ".[rag]" # redevops-rag — single-hop hybrid retrieval
pip install -e ".[hipporag]" # HippoRAG — multi-hop graph retrieval (the planner picks per query)
Single-hop vs multi-hop is a per-query decision. The planner classifies intent and
routes: BM25/hybrid (redevops-rag) when the answer is in one chunk, graph (HippoRAG)
when it lives in the connections between documents — and the cost model only pays the
graph premium when it's warranted. python examples/hop_routing.py shows single-hop
missing the bridge document that multi-hop surfaces.
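The difference between the two retrieval modes can be shown on a three-document toy corpus. This sketch uses plain keyword overlap and a one-step term expansion; it is neither redevops-rag nor HippoRAG, it does not reproduce examples/hop_routing.py, and the corpus, `single_hop`, and `multi_hop` are assumptions made for illustration.

```python
import re

# Toy corpus (made up). The answer to the query sits in "ledger", which shares
# no terms with the query and is reachable only through "checkout".
DOCS = {
    "checkout": "The checkout service writes orders to the ledger-db cluster.",
    "ledger": "The ledger-db cluster runs in the eu-west datacenter.",
    "lunch": "The cafeteria menu changes every Monday.",
}
STOP = {"the", "to", "in", "every", "where", "does", "a"}


def terms(text: str) -> set:
    """Lowercased word set without stop words (hyphenated names stay whole)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower())) - STOP


def single_hop(query: str, k: int = 1) -> list:
    """Rank documents by how many query terms they contain; keep the top k hits."""
    q = terms(query)
    scored = [(len(q & terms(body)), doc_id) for doc_id, body in DOCS.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0][:k]


def multi_hop(query: str, k: int = 1) -> list:
    """Second hop: follow terms found in the first-hop hits to connected documents."""
    first = single_hop(query, k)
    hop_terms = set().union(*(terms(DOCS[d]) for d in first))
    second = sorted(d for d in DOCS
                    if d not in first and hop_terms & terms(DOCS[d]))
    return first + second


query = "Where does the checkout service run?"
print("single-hop:", single_hop(query))
print("multi-hop: ", multi_hop(query))
```

The second hop costs an extra pass over the corpus, which is the kind of premium a cost model would want to pay only when the query needs it.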
30-second tour
from context_runtime import ContextRuntime, SourceRef
rt = ContextRuntime.default(docs) # offline: stub model + in-memory store
# RUN — the core abstraction (plan → build_context → execute → verify)
res = rt.run("Explain why deployment X failed",
             sources=[SourceRef("docs", "docs")],
             constraints={"max_cost_usd": 2.0, "require_citations": True})
print(res.answer, res.cost_usd, res.trace)
# EXPLAIN — debug the plan like SQL (add analyze=True for EXPLAIN ANALYZE)
ex = rt.explain("Explain why deployment X failed")
print(ex.intent.bucket, len(ex.candidates), ex.chosen.score.total)
# SIMULATE — forecast cost/latency/tokens with confidence intervals, no execution
sim = rt.simulate("Explain why deployment X failed")
print(sim.expected_cost_usd, sim.expected_models, sim.based_on_samples)
Or from the CLI / config:
PYTHONPATH=. python examples/incident_review.py
context-runtime --corpus ./docs run "what's our incident process?"
context-runtime --config context_runtime.yaml explain --analyze "why did deploy X fail?"
What's implemented (v0.1)
| Seam (SPEC) | v0.1 implementation | Real binding (lazy) |
|---|---|---|
| Planner trio (intent/candidate/optimizer) | rule-table intent → candidate gen → heuristic cost model | — (the genuinely new core) |
| Cost model + statistics | PlanScore weighted utility + pg_statistic-style calibration | learned/neural (v0.3+) |
| Optimizer | knapsack / greedy-by-utility over the feasible set | OR-Tools CP-SAT (v0.2) |
| Execution Graph IR | linear graph carrying branch/loop/rollback kinds | full shapes (v0.4) |
| Scheduler | topo-sort waves | Dagster / cost-aware (v2) |
| Reasoner | SingleShotReasoner (one model) | mixtures: plan-worker-critic (v0.3+) |
| Model plugin | offline StubModel | LiteLLM + native cost-tiered routing |
| Retriever/Store | InMemoryStore (keyword) | redevops-rag (DuckDB+BM25+RRF+rerank) |
| Compression | sidekick clip structural pack | LLMLingua-2 semantic (v0.1 optional) |
| Verifier | citation/grounding check | RAGAS / Instructor |
| Observability | in-process Trace + JSON | OpenLLMetry → Langfuse |
| Plan Cache | null/always-miss stub | semantic cache (v0.2) |
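As a rough illustration of the Optimizer row, a greedy-by-utility pass over a cost budget might look like the sketch below. The `Step` type, the step names, and the utility and cost figures are invented for the example; only the `max_cost_usd` constraint name comes from the tour above, and the real optimizer's scoring is defined by PlanScore, not by this ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    utility: float   # assumed estimate of how much the step helps the answer
    cost_usd: float  # assumed estimate of what the step costs


def greedy_by_utility(steps, max_cost_usd: float):
    """Take steps in descending utility-per-dollar order while the budget allows."""
    chosen, spent = [], 0.0
    for step in sorted(steps, key=lambda s: s.utility / s.cost_usd, reverse=True):
        if spent + step.cost_usd <= max_cost_usd:
            chosen.append(step)
            spent += step.cost_usd
    return chosen, spent


# Made-up candidate steps for a single request.
STEPS = [
    Step("retrieve_docs", utility=5.0, cost_usd=0.5),
    Step("rerank", utility=2.0, cost_usd=0.4),
    Step("large_model", utility=6.0, cost_usd=1.5),
    Step("compress", utility=1.2, cost_usd=0.1),
]

chosen, spent = greedy_by_utility(STEPS, max_cost_usd=2.0)
print([s.name for s in chosen], round(spent, 2))
```

Greedy selection is a heuristic: with these numbers it skips `large_model`, although `large_model` plus `retrieve_docs` would fit the same budget with higher total utility. An exact solver can close that kind of gap.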
Architecture
The decision layer is thin; the substrate is reused. See:
- ARCHITECTURE.md — the layered design and the cost-based optimizer loop
- SPEC.md — the normative interface contracts (six plugin seams, IR, trace, plan-cache key)
- ROADMAP.md — v0.1 → v2 phasing with per-phase exit benchmarks
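One piece of the layered design, the scheduler's "topo-sort waves" listed under What's implemented, can be sketched with the standard library's `graphlib`. The step graph and the `waves` helper are hypothetical and are not taken from the Execution Graph IR; each wave holds the steps whose dependencies have all finished, so those steps could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each key depends on the steps in its set.
GRAPH = {
    "retrieve_docs": set(),
    "retrieve_tickets": set(),
    "compress": {"retrieve_docs", "retrieve_tickets"},
    "answer": {"compress"},
    "verify": {"answer"},
}


def waves(graph: dict) -> list:
    """Group steps into waves; every step in a wave has all its dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        result.append(ready)
        sorter.done(*ready)                 # unblocks the next wave
    return result


for i, wave in enumerate(waves(GRAPH), start=1):
    print(f"wave {i}: {wave}")
```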
Test
pip install -e ".[dev]" && pytest # 18 tests; test_conformance.py == SPEC §10