context-runtime

mcp
Guvenlik Denetimi
Gecti
Health Gecti
  • License — License: AGPL-3.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 86 GitHub stars
Code Gecti
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Gecti
  • Permissions — No dangerous permissions requested

Bu listing icin henuz AI raporu yok.

SUMMARY

Context Runtime — a database query planner for LLM context. Decides what a model sees before it answers; plans it, runs it through reused substrate, and learns from the outcome.

README.md

Context Runtime

An efficiency optimizer for a fleet of apps — a database query planner for LLM
context. The application says "I need an answer"; the runtime decides what the model
sees — what to retrieve, compress, route, and verify — emits an inspectable,
replayable plan, and learns from the outcome. It does for AI context what query
planners did for SQL. See POSITIONING.md for the thesis.

It optimizes any app with (a) a decision point about what context/config to use and
(b) a measurable outcome. Eleven tenants are built and green (each number is the
learned-vs-baseline reward its offline examples/<tenant>.py prints):

Tenant Context Runtime tunes Result
sidekick which skills to recall · budget drop-in for SkillStore; 67% vs 33% naive baseline acceptance
redevops-rag pool · limit · threshold · rerank … per query ContextRuntimeRetrieverTuner; 0.773 vs 0.323 vs fixed default
edge-sentinel (SOC) which sources to pull per alert (CrowdSec · threat-intel · EDR) tool-using + approval-gated; 0.900 vs 0.800 always-full baseline
growth-engine which attribution window + source bundle per lead-source query 7.851 vs 5.282 vs fixed window
control-tower which Metabase query set per "ask anything" question 5.326 vs 1.643 vs core query set
agentic-billing which usage/invoice/dunning signals to pull per account 4.122 vs 2.442 vs full-stack
social-autopilot which channel/timing/content strategy per goal 3.875 vs 0.773 vs fixed strategy
agentic-support which KB/tickets/account context to retrieve per ticket 3.679 vs 2.394 vs full-context
agentic-books which ledgers/reports to pull per books question 3.632 vs 2.430 vs full-books
market-radar which competitor watches to sweep per intel question 3.611 vs 0.403 vs full-sweep
agentic-compliance which rule-family evidence to pull per finding 3.562 vs 2.463 vs full-evidence
PYTHONPATH=. python examples/sidekick_learning.py   # discrete-strategy bandit
PYTHONPATH=. python examples/rag_tuning.py          # numeric-knob tuning
PYTHONPATH=. python examples/soc_triage.py          # tool-using cybersecurity tenant

Plus the ToolPlugin seam (context_runtime/tools/ — how plans reach external systems,
with an approval-gated audit trail) and trace exporters (context_runtime/observability/ exporters.py — JSONL offline, or Langfuse / OpenLLMetry-OTel when the extras are
installed).

Status: v0.1 vertical slice. Runs fully offline with stub plugins; the real
redevops-rag retrieval and LiteLLM
model bindings are wired and lazy-imported. See SPEC.md §10 for the
conformance checklist these tests assert against.

Install

pip install -e .                 # core (offline stub path, zero heavy deps)
pip install -e ".[litellm]"      # real models across 100+ providers
pip install -e ".[rag]"          # redevops-rag — single-hop hybrid retrieval
pip install -e ".[hipporag]"     # HippoRAG — multi-hop graph retrieval (the planner picks per query)

Single-hop vs multi-hop is a per-query decision. The planner classifies intent and
routes: BM25/hybrid (redevops-rag) when the answer is in one chunk, graph (HippoRAG)
when it lives in the connections between documents — and the cost model only pays the
graph premium when it's warranted. python examples/hop_routing.py shows single-hop
missing the bridge document that multi-hop surfaces.

30-second tour

from context_runtime import ContextRuntime, SourceRef

rt = ContextRuntime.default(docs)          # offline: stub model + in-memory store

# RUN — the core abstraction (plan → build_context → execute → verify)
res = rt.run("Explain why deployment X failed",
             sources=[SourceRef("docs", "docs")],
             constraints={"max_cost_usd": 2.0, "require_citations": True})
print(res.answer, res.cost_usd, res.trace)

# EXPLAIN — debug the plan like SQL (add analyze=True for EXPLAIN ANALYZE)
ex = rt.explain("Explain why deployment X failed")
print(ex.intent.bucket, len(ex.candidates), ex.chosen.score.total)

# SIMULATE — forecast cost/latency/tokens with confidence intervals, no execution
sim = rt.simulate("Explain why deployment X failed")
print(sim.expected_cost_usd, sim.expected_models, sim.based_on_samples)

Or from the CLI / config:

PYTHONPATH=. python examples/incident_review.py
context-runtime --corpus ./docs run "what's our incident process?"
context-runtime --config context_runtime.yaml explain --analyze "why did deploy X fail?"

What's implemented (v0.1)

Seam (SPEC) v0.1 implementation Real binding (lazy)
Planner trio (intent/candidate/optimizer) rule-table intent → candidate gen → heuristic cost model (the genuinely new core)
Cost model + statistics PlanScore weighted utility + pg_statistic-style calibration learned/neural (v0.3+)
Optimizer knapsack / greedy-by-utility over the feasible set OR-Tools CP-SAT (v0.2)
Execution Graph IR linear graph carrying branch/loop/rollback kinds full shapes (v0.4)
Scheduler topo-sort waves Dagster / cost-aware (v2)
Reasoner SingleShotReasoner (one model) mixtures: plan-worker-critic (v0.3+)
Model plugin offline StubModel LiteLLM + native cost-tiered routing
Retriever/Store InMemoryStore (keyword) redevops-rag (DuckDB+BM25+RRF+rerank)
Compression sidekick clip structural pack LLMLingua-2 semantic (v0.1 optional)
Verifier citation/grounding check RAGAS / Instructor
Observability in-process Trace + JSON OpenLLMetry → Langfuse
Plan Cache null/always-miss stub semantic cache (v0.2)

Architecture

The decision layer is thin; the substrate is reused. See:

  • ARCHITECTURE.md — the layered design and the cost-based optimizer loop
  • SPEC.md — the normative interface contracts (six plugin seams, IR, trace, plan-cache key)
  • ROADMAP.md — v0.1 → v2 phasing with per-phase exit benchmarks

Test

pip install -e ".[dev]" && pytest      # 18 tests; test_conformance.py == SPEC §10

Yorumlar (0)

Sonuc bulunamadi