Foundry

Health: Pass
- License: Apache-2.0
- Description: Repository has a description
- Active repo: Last push 0 days ago
- Community trust: 813 GitHub stars

Code: Fail
- `rm -rf`: Recursive force-deletion command in examples/approval/auto_classifier.py

Permissions: Pass
- No dangerous permissions requested
This tool is a production framework for building agentic AI systems. It provides a model-agnostic stack for LangChain and LangGraph agents powered entirely by MCP tools over HTTP/SSE.
Security Assessment
The tool requires no explicitly dangerous permissions. However, the automated code scan flagged a recursive force deletion command (`rm -rf`) within an example script (`examples/approval/auto_classifier.py`). While this is likely an innocuous file-cleanup mechanism limited to an example rather than core library code, developers should inspect the script before running it to ensure it doesn't accidentally delete unintended directories. No hardcoded secrets were identified. Because the framework functions over HTTP/SSE, it inherently makes network requests to communicate with external tools and models. Overall risk is rated as Medium due to the forceful file deletion command found in the codebase.
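If you want to vet example scripts yourself before running them, a quick static check along these lines flags `rm -rf`-style commands. This is a hypothetical helper for illustration, not part of Promptise Foundry:

```python
import re
from pathlib import Path

# Matches forceful recursive deletions such as "rm -rf", "rm -fr", "rm -rvf".
DANGEROUS = re.compile(r"rm\s+-\w*r\w*f|rm\s+-\w*f\w*r")

def flag_destructive(root: str) -> list[str]:
    """Return paths of Python files under `root` that contain rm -rf-style commands."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*.py")
        if DANGEROUS.search(p.read_text(errors="ignore"))
    )
```

Running it against a checkout's `examples/` directory gives a short list of files to inspect by hand before execution.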
Quality Assessment
The project demonstrates strong health and maintenance signals. It is licensed under the permissive Apache-2.0. With over 800 GitHub stars and repository activity as recent as today, it shows solid and growing community trust. The documentation highlights high-quality engineering standards, including strict typing, comprehensive test coverage (over 3,000 tests), and clean security scans (zero high-severity issues on Bandit).
Verdict
Use with caution—while the core framework is well-maintained and high-quality, you should review the example scripts for potentially destructive shell commands before executing them in your environment.
Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE.
Promptise Foundry
The production framework for agentic AI systems.
Every other framework gives you an LLM wrapper.
Promptise Foundry gives you the stack behind it.
Documentation · Quick Start · Showcase · Discussions
Agents that survive production need more than a prompt and a tool list.
They need MCP-native tool discovery. A reasoning engine you can shape. Memory you can trust. Guardrails that actually fire. Governance that enforces budgets. A runtime that recovers from crashes. Promptise Foundry ships all of it as one coherent framework — built for engineering teams who are done assembling AI infrastructure from ten half-finished libraries.
Get started in 30 seconds
```shell
pip install promptise
```

```python
import asyncio

from promptise import build_agent, PromptiseSecurityScanner, SemanticCache
from promptise.config import HTTPServerSpec
from promptise.memory import ChromaProvider

async def main():
    agent = await build_agent(
        model="openai:gpt-5-mini",
        servers={
            "tools": HTTPServerSpec(url="http://localhost:8000/mcp"),
        },
        instructions="You are a helpful assistant.",
        memory=ChromaProvider(persist_directory="./memory"),
        guardrails=PromptiseSecurityScanner.default(),
        cache=SemanticCache(),
        observe=True,
    )
    result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "What's the status of our pipeline?"}]
    })
    print(result["messages"][-1].content)
    await agent.shutdown()

asyncio.run(main())
```
Guardrails block injection and redact PII. Semantic cache serves similar queries instantly. Full observability.
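As an illustration of the redaction idea only: Promptise's guardrails use local ML models, but a minimal regex-based PII redactor conveys what "redact PII" means in practice. All names and patterns below are hypothetical, not the Promptise API:

```python
import re

# Two toy patterns: emails and US-style phone numbers. Real guardrails
# cover far more categories (credentials, NER entities, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a bracketed label, e.g. [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```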
Five pillars. One framework.
Each pillar replaces an entire category of libraries you would otherwise assemble yourself.
01 🤖 Agent
Turn any LLM into a production-ready agent with one function call.
Replaces: LangChain + a guardrails library + an output validator + a vector-store wrapper + a retry helper.

02 🧠 Reasoning Engine
Compose reasoning the way you compose code. Not a black box.
Replaces: hand-rolled LangGraph wiring, bespoke planner/executor loops, ReAct-from-scratch.

03 🔧 MCP Server SDK
Production server and native client for the Model Context Protocol.
Replaces: rolling your own tool server. What FastAPI is to REST, this is to MCP.

04 ⚡ Agent Runtime
The operating system for autonomous agents.
Replaces: Celery + cron + a state store + your own crash recovery + a governance layer.
5 trigger types (cron, webhook, file watch, event, message) · crash recovery via journal replay · 5 rewind modes · 14 lifecycle hooks · budget enforcement with tool costs · health monitoring (stuck, loop, empty, error rate) · mission tracking with LLM-as-judge · secret scoping with TTL and zero-fill revocation · 14 meta-tools for self-modifying agents · 37-endpoint REST API with typed client · live agent inbox · distributed multi-node coordination.

05 ✨ Prompt Engineering
Prompts built like software. Not strings.
Replaces: f-strings.
8 block types with priority-based token budgeting · conversation flows that evolve per phase · 5 composable strategies (…)
Why Promptise Foundry?
Honest comparison. ✅ native · ⚠️ partial or via adapter · ❌ not supported
| | Promptise | LangChain | LangGraph | CrewAI | AutoGen | PydanticAI |
|---|---|---|---|---|---|---|
| MCP-first tool discovery | ✅ Native | ⚠️ via adapter | ⚠️ via adapter | ⚠️ via adapter | ⚠️ via adapter | ⚠️ via adapter |
| Native MCP server SDK (auth · middleware · queue · audit) | ✅ Full | ❌ | ❌ | ❌ | ❌ | ❌ |
| Composable reasoning graph | ✅ 20 nodes · 7 patterns · agent-assembled | ❌ | ✅ Graph-native | ⚠️ Crew/Flow | ⚠️ GroupChat | ❌ |
| Semantic tool optimization (ML selects relevant tools per query) | ✅ 40–70% savings | ❌ | ❌ | ❌ | ❌ | ❌ |
| Local ML security guardrails (prompt-injection · PII · creds · NER · content) | ✅ 6 heads | ❌ external | ❌ external | ❌ | ❌ | ❌ |
| Semantic response cache | ✅ Per-user isolated | ⚠️ Basic (shared) | ⚠️ via LangChain | ❌ | ❌ | ❌ |
| Human-in-the-loop | ✅ 3 handlers + ML classifier | ⚠️ Basic | ✅ interrupt_before/after | ⚠️ human_input=True | ✅ UserProxyAgent | ❌ |
| Sandboxed code execution | ✅ Docker · seccomp · gVisor | ⚠️ PythonREPL | ❌ | ❌ | ✅ Docker executor | ❌ |
| Crash recovery / replay | ✅ 5 rewind modes | ❌ | ✅ Checkpointer | ❌ | ❌ | ❌ |
| Autonomous runtime (triggers · lifecycle · messaging) | ✅ Full OS | ❌ | ⚠️ Persistence only | ❌ | ❌ | ❌ |
| Budget / health / mission governance | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ |
| Live agent conversation (inbox · ask) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Orchestration REST API | ✅ 37 endpoints + typed client | ❌ | ❌ | ❌ | ❌ | ❌ |
Promptise unifies every row above — one dependency, one type-checked API, one runtime.
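To illustrate what "semantic tool optimization" means in general, here is a deliberately simplified sketch that ranks tools by the similarity of their descriptions to the query, using bag-of-words cosine as a stand-in for real embeddings. All names here are hypothetical, not the Promptise API:

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    """Tokenize into a bag-of-words vector (stand-in for an embedding)."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Keep only the top-k tools whose descriptions best match the query.

    Shipping fewer, more relevant tool schemas to the model is where the
    claimed token savings come from.
    """
    q = _vec(query)
    ranked = sorted(tools, key=lambda name: _cosine(q, _vec(tools[name])), reverse=True)
    return ranked[:k]
```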
Benchmarks
Apples-to-apples. Same model, same 40-tool MCP server, same prompts, fresh agent per run.
6 frameworks × 30 tasks × 5 repeats = 900 real agent runs
Promptise · LangGraph · LangChain · PydanticAI · CrewAI · AutoGen
— all driven by openai:gpt-5-mini at temperature=0.
| Tier | Measures | Count |
|---|---|---|
| T1 Direct lookup | Can the agent pick the right tool and quote the result? | 6 |
| T2 Multi-step | Can it chain 5–7 tools with state carried across calls? | 6 |
| T3 Synthesis | Can it reason across 3+ tool outputs? | 6 |
| T4 Tool selection | Can it disambiguate across 40 tools in 7 namespaces? | 6 |
| T5 Autonomous reasoning | Can it decompose a goal, branch on intermediate results, re-plan on failure, and synthesize evidence-grounded answers? | 6 |
We measure latency (median, p95), tokens (in/out), cost, tool-call count, tool precision, factual accuracy (LLM-as-judge, 0–5), and hallucination rate — for every framework, on every task, every time. The full trace (answers, tool calls, judge rationales) is committed as raw JSON under benchmarks/results/. Nothing is cherry-picked.
```shell
export OPENAI_API_KEY=sk-...
./benchmarks/reproduce.sh   # end-to-end: start server, run 900 agents, regenerate RESULTS.md
```
→ benchmarks/RESULTS.md · benchmarks/README.md (fairness protocol + honesty guarantees)
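If you want to aggregate your own medians and p95s from the raw JSON, a small script along these lines works. It assumes a hypothetical layout where each file holds a list of `{"framework": ..., "latency_s": ...}` runs; the actual schema under benchmarks/results/ may differ:

```python
import json
import statistics
from pathlib import Path

def latency_summary(results_dir: str) -> dict[str, dict[str, float]]:
    """Compute per-framework median and approximate p95 latency from raw runs."""
    by_framework: dict[str, list[float]] = {}
    for f in Path(results_dir).glob("*.json"):
        for run in json.loads(f.read_text()):
            by_framework.setdefault(run["framework"], []).append(run["latency_s"])
    return {
        name: {
            "median_s": statistics.median(vals),
            # 19th of 19 cut points at n=20 approximates the 95th percentile.
            "p95_s": statistics.quantiles(vals, n=20)[-1],
        }
        for name, vals in by_framework.items()
    }
```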
Model-agnostic
Any LLM, one string. Or any LangChain BaseChatModel. Or a FallbackChain across providers.
```python
build_agent(model="openai:gpt-5-mini", ...)
build_agent(model="anthropic:claude-sonnet-4-20250514", ...)
build_agent(model="ollama:llama3", ...)
build_agent(model="google:gemini-2.0-flash", ...)
```
Deploy autonomous agents
Triggers, budgets, health checks, missions, secrets — all in Python.
```python
from promptise.runtime import (
    AgentRuntime, ProcessConfig, TriggerConfig,
    BudgetConfig, HealthConfig, MissionConfig,
)

async with AgentRuntime() as runtime:
    await runtime.add_process("monitor", ProcessConfig(
        model="openai:gpt-5-mini",
        instructions="Monitor data pipelines. Escalate anomalies.",
        triggers=[
            TriggerConfig(type="cron", cron_expression="*/5 * * * *"),
            TriggerConfig(type="webhook", webhook_path="/alerts"),
        ],
        budget=BudgetConfig(max_tool_calls_per_day=500, on_exceeded="pause"),
        health=HealthConfig(detect_loops=True, detect_stuck=True, on_anomaly="escalate"),
        mission=MissionConfig(
            objective="Keep uptime above 99.9%",
            success_criteria="No P1 unresolved for more than 15 minutes",
            evaluate_every_n=10,
        ),
    ))
    await runtime.start_all()
```
Documentation
| Section | What it covers |
|---|---|
| Quick Start | Your first agent in 5 minutes |
| Key Concepts | Architecture, design principles, the five pillars |
| Building Agents | Step-by-step, simple to production |
| Reasoning Engine | Graphs, nodes, flags, patterns |
| MCP Servers | Production tool servers with auth and middleware |
| Agent Runtime | Autonomous agents with governance |
| Prompt Engineering | Blocks, strategies, flows, guards |
| Showcase | Working patterns, end-to-end |
| API Reference | Every class, method, parameter |
Ecosystem
Promptise plugs into what your team already runs.
Models
+ any LangChain BaseChatModel · FallbackChain for automatic failover
Memory & Vectors
Local embeddings · air-gapped model paths · prompt-injection mitigation built in
Conversation Storage
Session ownership enforced · per-user isolation for cache and guardrails
Observability
8 transporters: OTel · Prometheus · Slack · PagerDuty · Webhook · HTML · JSON · Console
Sandbox & Infrastructure
Docker + seccomp + gVisor + capability dropping · Kubernetes-native health probes
Protocols
stdio · streamable HTTP · SSE · HMAC-chained audit logs
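HMAC-chained audit logs, in general, bind each entry's MAC to the previous one so that tampering with or reordering any entry breaks the chain. A minimal sketch of the technique follows; this is not Promptise's actual log format:

```python
import hashlib
import hmac
import json

def append_entry(log: list[dict], key: bytes, event: dict) -> list[dict]:
    """Append an audit entry whose MAC covers the event plus the previous MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    payload = json.dumps(event, sort_keys=True) + prev_mac
    mac = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    log.append({"event": event, "mac": mac})
    return log

def verify_chain(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC in order; any edit or reorder fails verification."""
    prev_mac = ""
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True) + prev_mac
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```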