SourceryKit

SourceryKit is the Python SDK for Provably. It adds a deterministic
eval layer (verifiable guardrails, distinct from proof verification)
to any Python agent: every outbound HTTP call is recorded, every claim
handed off to another agent is evaluated against a trusted Provably query
record, and the policy edge (which endpoints an agent is allowed to talk
to at all) is enforced before the request leaves the process.

Distribution name: provably-sdk (PyPI publish pending — see below)
Import name: provably
Source layout: src/provably/

What it does
Framework coverage
Install
Quick start
Configuration
The five pillars
Public API
Development
Tests
Docker
Security model
Status

What it does

flowchart LR
  agent[Your agent]
  patch[Provably interceptor<br/>monkey-patched requests + httpx]
  reg[("trusted_endpoints<br/><i>your Postgres</i>")]
  ext[(External API)]
  store[("provably_intercepts<br/><i>your Postgres</i>")]
  eval[Eval service<br/>verifiable guardrails]
  rec[("Provably query record<br/><i>Provably backend</i>")]

  agent -- "requests.get / httpx.post" --> patch
  patch -- "1 trusted?" --> reg
  reg -- "yes / BLOCKED" --> patch
  patch -- "2 request" --> ext
  ext -- "3 response" --> patch
  patch -- "4 store row" --> store
  patch -- "response" --> agent
  agent -- "post_handoff(HandoffPayload)" --> eval
  eval -- "evaluate_handoff:<br/>fetch query record" --> rec
  eval -- "PASS / CAUGHT" --> agent

The flow, in order:

1. Intercept + Police: every outbound requests / httpx / aiohttp
   call goes through the SDK's monkey-patched HTTP path. Inside the interceptor, before
   the request leaves the process, the URL is checked against the
   trusted_endpoints table. If the URL is not registered, the call is rejected
   with RuntimeError("BLOCKED: ...") and never reaches the network.
2. Capture + Store: if the endpoint is trusted, the request goes out, the
   response is captured (status + headers + raw body), canonicalized, and
   inserted by the interceptor into provably_intercepts. The agent only
   sees the response after the row is written.
3. Hand off: when an agent finishes its work it builds a typed
   HandoffPayload (one HandoffClaim per external call, describing what the
   agent claims about that response) and ships it to the next agent / service
   via post_handoff(...).
4. Eval: the receiving service runs evaluate_handoff(payload) (the
   SDK's verifiable-guardrails check, not to be confused with proof
   verification). For each claim the evaluator pulls the corresponding query
   record from the Provably backend and runs one of four deterministic
   comparisons (verbatim, field_extraction, schema_type,
   range_threshold); the result is PASS, CAUGHT, or ERROR per claim.

Nothing in this loop relies on a model self-evaluating its own output.
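
The ordering in the Intercept + Police and Capture + Store steps can be sketched with a toy model. This is not the SDK's interceptor: `guarded_get`, the in-memory `trusted` set, the `store` list, and the injected `fetch` callable are hypothetical stand-ins for the patched HTTP path, the trusted_endpoints table, and provably_intercepts.

```python
from typing import Callable


def guarded_get(
    url: str,
    trusted: set[str],
    fetch: Callable[[str], dict],
    store: list[dict],
) -> dict:
    """Toy model of the intercept order: policy check, request, store, return."""
    # Policy check comes first; an unregistered URL never reaches `fetch`.
    if url not in trusted:
        raise RuntimeError(f"BLOCKED: {url} is not a trusted endpoint")
    # The request goes out and the response is captured.
    response = fetch(url)
    # The row is written before the caller sees the response.
    store.append({"url": url, "response": response})
    return response


calls: list[str] = []


def fake_fetch(url: str) -> dict:
    """Stand-in for the network; records which URLs were actually requested."""
    calls.append(url)
    return {"status": 200, "body": "ok"}


rows: list[dict] = []
allowed = {"https://my-trusted-endpoint.example/data"}

result = guarded_get("https://my-trusted-endpoint.example/data", allowed, fake_fetch, rows)

try:
    guarded_get("https://unlisted.example/", allowed, fake_fetch, rows)
    blocked = False
except RuntimeError as exc:
    blocked = str(exc).startswith("BLOCKED:")
```

In this sketch the blocked call leaves both `calls` and `rows` untouched, which mirrors the claim that a blocked request reaches neither the network nor the store.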

Where things live

Component	Hosted by	Notes
`trusted_endpoints` table	You — sits in whatever Postgres `POSTGRES_URL` points to.	The SDK ships the schema (`ensure_trusted_endpoints_table`), the policy check, and CRUD helpers; it does not host the registry. Same DB instance as `provably_intercepts`.
`provably_intercepts` table	You — same Postgres as above.	Append-only. The interceptor inserts one row per outbound HTTP call, keyed by `query_record_id` so claims can be linked back.
Eval service	You — any HTTP service that calls `provably.evaluate_handoff(...)` on the incoming payload.	The SDK gives you the function; you decide where to host it.
Provably query record	Provably — fetched over HTTPS by the eval service using the `integration_api_key` from the handoff payload.	This is the source of truth the evaluator compares each claim against.

Framework coverage

The interceptor patches the central HTTP transport choke points, so coverage of
agent frameworks follows automatically from which library a framework uses
under the hood. As of v0.3.0:

Transport patches

Transport	Patched at
`requests`	module-level `get`/`post` + `Session.send`
`httpx`	module-level `get`/`post` + `Client.send` + `AsyncClient.send`
`aiohttp`	`ClientSession._request` (soft dep — patches only when `aiohttp` is importable)
`botocore` / `urllib3`	pending — see issue #10

Agent / LLM frameworks

Framework	Status	Notes
OpenAI SDK	✅	httpx
Anthropic SDK	✅	httpx
Pydantic AI	✅	delegates to AsyncOpenAI / AsyncAnthropic
LangChain	✅	delegates to provider SDKs
LangGraph	✅	same
LlamaIndex	✅	httpx via OpenAI SDK
AutoGen	✅	AsyncOpenAI
Haystack	✅	migrated to httpx (2024–25)
Phidata / Agno	✅	AsyncOpenAI / `httpx[http2]`
OpenAI Agents SDK	✅	httpx; e2e suite at tests/e2e/test_openai_agents_e2e.py; demo at examples/openai_agents/
Google GenAI	✅	httpx default + optional `aiohttp` extra
LiteLLM	✅	aiohttp transport (default since v1.71)
DSPy	✅	LiteLLM only
smolagents	✅	OpenAI SDK / HF / LiteLLM paths covered
CrewAI	⚠️	OpenAI/Anthropic ✅, LiteLLM fallback ✅, Bedrock provider ❌ (boto3)
AWS Strands	❌	boto3/botocore → urllib3; tracked in issue #10

Out of scope for the HTTP interception layer (separate shipping units):
MCP servers, in-process LLMs (transformers, mlx_lm), gRPC (Google ADK
A2A), websockets, raw sockets.

Install

Status: v0.2 — not yet published to PyPI. Install from source.

# from source (editable, recommended for now)
git clone git@github.com:ProvablyAI/provably-python-sdk.git
pip install -e ./provably-python-sdk

# or build a wheel
cd provably-python-sdk && python -m build
pip install dist/provably_sdk-0.2.0-py3-none-any.whl

When PyPI publishing lands the install will become:

pip install provably-sdk

The intended PyPI distribution name is provably-sdk. The import name is
provably. Requires Python 3.11+.

Quick start

import provably
import requests

# One-call bootstrap: initialize runtime + install interceptor + enable recording
provably.configure_indexing(True)

response = requests.get("https://my-trusted-endpoint.example/data")
record = response.json()

payload = provably.HandoffPayload(
    provably_org_id="my-org",
    integration_api_key="...",
    task="discharge_summary",
    claims=[
        provably.HandoffClaim(
            action_name="lookup_patient",
            claimed_value=record,
            query_record_id="qr_123",
        ),
    ],
)
provably.post_handoff("https://my-eval-service.example", payload)

On the eval-service side:

import provably

result = provably.evaluate_handoff(
    payload,
    provably_base_url="https://api.provably.ai",
)
# outcome is "PASS", "CAUGHT", or "ERROR"
assert result["outcome"] in ("PASS", "CAUGHT", "ERROR")

Configuration

The SDK reads configuration from environment variables. A typed
Provably(api_key=..., org_id=..., ...) client that replaces these globals is
planned (issue #2).

Getting `PROVABLY_API_KEY` and `PROVABLY_ORG_ID`

1. Sign up at app.provably.ai.
2. Create an organisation. Its id is what goes in PROVABLY_ORG_ID.
3. In the left-side menu, go to Integrations and create one. The generated key is your PROVABLY_API_KEY.

Full product docs: provably.ai/docs.

Variable	Used by	Required
`PROVABLY_API_KEY`	`initialize_runtime`, integration cache	yes
`PROVABLY_ORG_ID`	`initialize_runtime`, intercept allow-list	yes
`PROVABLY_RUST_BE_URL`	`initialize_runtime`, evaluator	yes
`POSTGRES_URL`	intercept storage, trusted endpoints, handoff preprocess	yes
`PROVABLY_APP_UI_URL`	optional UI deep-links	no
`PROVABLY_QUERY_RESOLVE_MAX_WAIT_S`	max seconds to wait for a query record to appear (default 15)	no

POSTGRES_URL is a hard dependency today. Three SDK modules open Postgres
directly (provably.intercept._storage, provably.trusted_endpoints,
provably.handoff._preprocess). Issue #1 tracks moving them onto a
caller-injected connection and making psycopg2-binary optional.
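
Because all four required variables are read from the environment, a host may want to fail fast at startup. The sketch below is one way to do that, using only the variable names and the default from the table above; the helper names `missing_required` and `query_resolve_max_wait` are hypothetical and not part of the SDK.

```python
import os
from collections.abc import Mapping

# Variables marked "yes" in the Required column of the table above.
REQUIRED_VARS = (
    "PROVABLY_API_KEY",
    "PROVABLY_ORG_ID",
    "PROVABLY_RUST_BE_URL",
    "POSTGRES_URL",
)


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def query_resolve_max_wait(env: Mapping[str, str] = os.environ) -> float:
    """Read the optional wait limit, falling back to the documented default of 15."""
    return float(env.get("PROVABLY_QUERY_RESOLVE_MAX_WAIT_S", "15"))


example_env = {"PROVABLY_API_KEY": "k", "PROVABLY_ORG_ID": "my-org"}
missing = missing_required(example_env)
wait_s = query_resolve_max_wait(example_env)
```

A host could call `missing_required()` before any SDK bootstrap call and refuse to start if the list is non-empty.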

The five pillars

`init`

import provably

# Option A: one-call bootstrap (recommended)
provably.configure_indexing(True)   # bootstrap + interceptor + enable
provably.configure_indexing(False)  # interceptor only, recording off

# Option B: step-by-step
provably.initialize_runtime()   # one-time bootstrap; idempotent per process
provably.init_interceptor()     # install monkey-patches for requests + httpx
provably.enable()               # turn recording on (default after init_interceptor)

initialize_runtime() registers a Provably middleware, onboards the configured
Postgres database, ensures the provably_intercepts collection exists, and
warms an in-memory cache with the integration API key.

configure_indexing(enable_indexing) is the recommended single-call entry point.
Pass True to enable full indexing (bootstrap + intercept + record); pass False
to install the interceptor in passthrough mode (patches installed, recording off).

`intercept`

provably.enable()                   # default after init_interceptor()
provably.disable()                  # stop recording (patch stays installed)
provably.is_enabled()               # bool

provably.set_interceptor_context(   # tag the next intercept rows
    agent_id="cluster_a",
    action_name="lookup_patient",
    intercept_index=0,
)

provably.set_intercept_body_hook(   # optional: (intercept_index, raw) -> what the caller sees
    lambda _idx, raw: {"user_edited": True},
)

provably.set_intercept_url_allowlist(   # scope simulation hook to specific URLs
    ["https://my-trusted-api.example/v1"],
)

The interceptor records every successful requests.get/post and httpx.get/post
into provably_intercepts. The original wire response is stored first; the hook
only affects the object returned to application code, not the stored row.

set_intercept_url_allowlist scopes the simulation body hook to an explicit set
of URLs. Internal SDK calls (bootstrap API, handoff transport) are never passed
to the hook regardless of this setting.

⚠ The interceptor monkey-patches the global requests and httpx modules.
This is intentional — every consumer in the process gets observed
automatically — but it means hosts that need a request-scoped opt-out should
wrap calls in disable() / enable() blocks.

`handoff`

from provably import HandoffPayload, HandoffClaim, post_handoff

payload = HandoffPayload(
    provably_org_id="my-org",
    integration_api_key="key",
    claims=[HandoffClaim(action_name="get", claimed_value=..., query_record_id="qr_1")],
)
post_handoff("https://my-eval-service.example", payload, headers={"x-trace-id": "abc"})

post_handoff POSTs canonical JSON to {base_url}/handoffs/receive and raises
on any non-2xx response.

Convenience builders

build_handoff_payload assembles a HandoffPayload automatically from the
interceptor's in-memory state — no manual claim construction needed:

from provably import build_handoff_payload, post_handoff

# fetch_and_claim is the raw JSON dict the LLM emitted
payload = build_handoff_payload(fetch_and_claim, run_id="run-001")
post_handoff("https://your-verifier.example.com", payload)

claim_contract generates the system-prompt text that tells an LLM how to
emit the correct HandoffClaim JSON shape:

from provably import claim_contract

system_prompt = claim_contract(
    action_names=["lookup_patient", "fetch_records"],
    wrapper_fields={"reasoning": "string"},
)

default_instructions and field_descriptions give you ready-made
instructions and per-field notes to embed in a receiving-agent prompt:

from provably import default_instructions, field_descriptions

# provably_indexing=True when you want the receiver to call the evaluator
instructions = default_instructions(provably_indexing=True)
guide = field_descriptions(provably_indexing=True)

`eval`

from provably import evaluate_handoff

result = evaluate_handoff(payload, provably_base_url="https://api.provably.ai")
# {"outcome": "PASS" | "CAUGHT" | "ERROR", "per_claim": [...], "errors": [...]}

Outcome semantics:

PASS — every claim's content matched its proven indexed value and every proof verified.
CAUGHT — at least one claim disagreed with the indexed value or a proof failed.
ERROR — the evaluator could not run (missing config, Provably backend unreachable, transient server error). Not evidence of tampering — the system was unhealthy, not the agent.
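
The three outcomes call for different handling, and ERROR in particular should not be treated as a failed check. A minimal sketch of a dispatcher over the result dict follows; the function name `next_step` and the returned labels are hypothetical, and only the `"outcome"` key and its three values come from the description above.

```python
def next_step(result: dict) -> str:
    """Map an evaluate_handoff-style result dict to a host-side action label."""
    outcome = result.get("outcome")
    if outcome == "PASS":
        return "accept"  # every claim matched
    if outcome == "CAUGHT":
        return "reject"  # at least one claim disagreed
    if outcome == "ERROR":
        return "retry"  # the evaluator could not run; not evidence of tampering
    raise ValueError(f"unexpected outcome: {outcome!r}")


decisions = [next_step({"outcome": o}) for o in ("PASS", "CAUGHT", "ERROR")]
```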

Getting `CAUGHT` when you don't expect it?

CAUGHT means the indexed value the evaluator pulled from provably_intercepts doesn't match the claim. In practice when this surprises you, it's almost always one of:

The tool body never ran. @function_tool (or any agent-framework decorator) only registers the function — you still need an agent loop (e.g. Runner.run(...)) to invoke it. Bare LLM calls don't execute tools.
intercept_context(...) was called without a with statement. It's a context manager; a bare call is a no-op (see the function's docstring).
agent_id mismatch. The agent_id you pass to intercept_context(...) inside the tool must match the intercept_agent_id you pass to build_handoff_payload(...) (default "fetch_and_claim"). Mismatch → the lookup misses → empty request_payload.
Wrong row-id helper. Use get_intercept_row_id(agent_id, action_name) to pick the row tagged with your action. take_last_intercept_row_id() returns the globally last insert (typically the final LLM POST), which is rarely what you want.

Comparison modes (the VerificationMode type):

Mode	Comparison
`verbatim`	Canonical-JSON equality between `claimed_value` and the indexed payload (or `json_path` slice).
`field_extraction`	Equality on the value at `json_path` only.
`schema_type`	`claimed_value` is ignored; the value at `json_path` is validated against `expected_json_schema`.
`range_threshold`	Numeric `claimed_value` must equal the indexed numeric and lie in `[range_min, range_max]`.

Outcome helpers

from provably import outcome_from_trace, aggregate_outcome

# Extract verdict from a raw evaluate_handoff result dict
verdict = outcome_from_trace(result)   # "PASS" | "CAUGHT" | None

# Roll up verification_results from a HandoffPayload
verdict = aggregate_outcome(payload)   # "PASS" | "CAUGHT"

`trusted_endpoints`

import psycopg2
from provably import (
    is_trusted_endpoint,
    list_trusted_endpoints,
    normalize_url_for_trust,
    ensure_trusted_endpoints_table,
)

conn = psycopg2.connect("...")
ensure_trusted_endpoints_table(conn)
ok = is_trusted_endpoint("https://api.example.com/v1/data", "my-org", conn)
rows = list_trusted_endpoints(conn, "my-org")

The registry is a single Postgres table (DDL embedded; created on first use).
URLs are normalized (lowercase scheme + host, default ports collapsed, trailing
slash dropped) before any read or write so that https://API.EXAMPLE.COM/x/
and https://api.example.com/x collide on the same row.
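
The normalization rules above can be approximated with the standard library. This is a sketch of the described behavior, not the SDK's normalize_url_for_trust: the function name `normalize_url` is hypothetical, and the handling of query strings, fragments, userinfo, and IPv6 hosts is an assumption because the text does not specify it.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop default ports, drop the trailing slash."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the scheme's default.
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/")
    # Assumption: query and fragment are passed through unchanged.
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


a = normalize_url("https://API.EXAMPLE.COM/x/")
b = normalize_url("https://api.example.com/x")
c = normalize_url("https://api.example.com:443/x")
d = normalize_url("http://api.example.com:8080/")
```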

Path-pattern entries

Concrete URLs match exactly. To authorize a family of URLs with a single entry —
useful for templated routes like /customers/{id} or runtime-generated ids —
register the URL with FastAPI/Express-style placeholders:

Placeholder	Matches	Example
`{name}`	exactly one path segment (no `/`)	`https://api.example.com/customers/{id}` matches `…/customers/42` but not `…/customers/42/orders`
`{name:path}`	any subtree (including `/` separators)	`https://api.example.com/customers/{rest:path}` matches both `…/customers/42` and `…/customers/42/orders`

The placeholder name (id, rest, …) is purely descriptive and does not affect
matching. Plain URLs without { characters keep exact-match semantics — no
behavior change for existing entries.

-- Register a templated route once instead of enumerating every concrete id
INSERT INTO trusted_endpoints (org_id, normalized_url, display_label, entry_type)
VALUES ('my-org', 'https://api.example.com/customers/{id}', 'Customers (by id)', 'endpoint');

is_trusted_endpoint and the snapshot tamper-check inside evaluate_handoff
both honor the same matching rules, so a claim against …/customers/42 will
pass both gates when only the templated entry is registered.
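
One way to implement the placeholder rules described in this section is to translate each registered entry into a regular expression. The sketch below is an assumption about how such matching could work, not the SDK's matcher; `pattern_to_regex` and `matches` are hypothetical names.

```python
import re

# Matches {name} or {name:path}; group 1 is set only for the :path form.
_PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*(:path)?\}")


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a registered entry into a regex, escaping the literal parts."""
    out: list[str] = []
    pos = 0
    for m in _PLACEHOLDER.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))
        # {name:path} spans any subtree; {name} is exactly one path segment.
        out.append(".+" if m.group(1) else "[^/]+")
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out))


def matches(pattern: str, url: str) -> bool:
    """True when the whole URL matches the entry; plain entries match exactly."""
    return pattern_to_regex(pattern).fullmatch(url) is not None


base = "https://api.example.com/customers"
single = matches(f"{base}/{{id}}", f"{base}/42")
single_deep = matches(f"{base}/{{id}}", f"{base}/42/orders")
subtree = matches(f"{base}/{{rest:path}}", f"{base}/42/orders")
```

Because the placeholder name is discarded during translation, `{id}` and `{customer}` produce the same regex, which matches the statement that the name is purely descriptive.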

Public API

All public symbols are re-exported from the top-level provably namespace. See
src/provably/__init__.py for the full list.

from provably import (
    # init
    initialize_runtime,
    configure_indexing,
    # intercept
    init_interceptor, enable, disable, is_enabled,
    set_interceptor_context, set_intercept_body_hook,
    set_intercept_url_allowlist, take_last_intercept_row_id,
    # handoff types
    HandoffPayload, HandoffClaim, HandoffProofAction, HandoffProofBundle,
    BenchmarkRow, Outcome, VerificationMode,
    # handoff transport
    post_handoff,
    # handoff builders
    build_handoff_payload, DEFAULT_HANDOFF_TASK,
    claim_contract, default_instructions, field_descriptions,
    # eval
    evaluate_handoff, extract_indexed_from_query_record,
    outcome_from_trace, aggregate_outcome,
    # trusted endpoints
    is_trusted_endpoint, list_trusted_endpoints,
    check_claim_endpoints_are_trusted, normalize_url_for_trust,
    ensure_trusted_endpoints_table,
)

Development

git clone git@github.com:ProvablyAI/provably-python-sdk.git
cd provably-python-sdk
uv sync --extra dev

uv run ruff check .
uv run pytest
python -m build              # wheel + sdist into ./dist/

The SDK has no fastapi, langgraph, or LLM-vendor dependencies, and CI
should keep it that way — see docs/architecture.md
for the dependency rules.

Tests

The suite is split into two layers:

tests/
  unit/    # fast, hermetic, mocks for httpx + psycopg2
  e2e/     # drives real requests + httpx against a loopback HTTP server

uv run pytest tests/unit       # ~0.2 s
uv run pytest tests/e2e        # ~5 s (real http.server on a loopback port)
uv run pytest                  # both
uv run pytest -m "not e2e"     # unit-equivalent inner loop

E2E tests register routes on a per-test FakeHttpServer and drive the real
requests / httpx patches against it. The Postgres-touching storage layer
is patched per-test, so the suite stays hermetic and runs without a live
database.

Docker

The repo ships a multi-stage Dockerfile. Three build targets are exposed:

Target	What it produces
`builder`	Wheel + sdist in `/dist` (used by the other stages, not run alone).
`test`	Wheel installed + dev tools; `CMD` runs `ruff check && pytest -q`.
`runtime`	Slim image with only the wheel; `CMD` smoke-imports `provably`.

Run the full lint + test suite in a container:

docker build --target test -t provably-sdk:test .
docker run --rm provably-sdk:test

Smoke-import the runtime image:

docker build --target runtime -t provably-sdk:runtime .
docker run --rm provably-sdk:runtime

Use docker-compose.yml for local development runs (no database required —
tests are hermetic):

docker compose run --rm sdk                  # ruff + pytest
docker compose run --rm sdk pytest -q -m e2e # only e2e tests

Security model

The interceptor monkey-patches the global requests.get/post and
httpx.get/post. Code running in the host process is observed automatically;
subprocesses and other languages are not.
Trusted-endpoint enforcement happens before any row is inserted into
provably_intercepts. A GET to an unlisted URL raises
RuntimeError("BLOCKED: ...") and never reaches Postgres.
The evaluator pulls query records over HTTPS using x-api-key from the
payload's integration_api_key. Revoking that key revokes eval access
for all in-flight handoffs.

Status

v0.2 — see CHANGELOG.md. License: Proprietary —
see LICENSE.md.
