Name: starling
Author: jerkeyray

Event-sourced agent runtime for Go.

Replayable runs · Tamper-evident logs · Provider-neutral tools · Production debugging

Every run is an event log.

That's the whole pitch. Starling treats the agent loop as a stream of
typed, append-only events - every prompt, model chunk, tool call,
budget decision, and terminal state - committed to a BLAKE3 hash chain
with a Merkle root over the whole run. The log is the source of truth;
RunResult is just a convenience derived from it.
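
To make the hash-chain idea concrete, here is a minimal sketch of an append-only log in which every event commits to its predecessor. It is not Starling's implementation: the Event type, its fields, and the event kind strings are invented for illustration, and SHA-256 from the standard library stands in for BLAKE3 so the example has no dependencies.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Event is a hypothetical, simplified log entry; Starling's real event
// type and canonical encoding are not shown here.
type Event struct {
	Seq      int
	Kind     string
	Payload  string
	PrevHash [32]byte // hash of the previous event; zero for the first
	Hash     [32]byte // hash over this event's fields plus PrevHash
}

// hashEvent uses SHA-256 as a stand-in for BLAKE3.
func hashEvent(e Event) [32]byte {
	h := sha256.New()
	fmt.Fprintf(h, "%d\x00%s\x00%s\x00", e.Seq, e.Kind, e.Payload)
	h.Write(e.PrevHash[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// appendEvent links a new event to the current tail of the chain.
func appendEvent(log []Event, kind, payload string) []Event {
	e := Event{Seq: len(log), Kind: kind, Payload: payload}
	if len(log) > 0 {
		e.PrevHash = log[len(log)-1].Hash
	}
	e.Hash = hashEvent(e)
	return append(log, e)
}

// verifyChain returns the index of the first event that fails
// verification, or -1 if the whole chain is intact.
func verifyChain(log []Event) int {
	var prev [32]byte
	for i, e := range log {
		if e.Seq != i || e.PrevHash != prev || hashEvent(e) != e.Hash {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var log []Event
	log = appendEvent(log, "run_started", "prompt")
	log = appendEvent(log, "tool_call", "fetch_ticket")
	log = appendEvent(log, "run_completed", "done")
	fmt.Println("intact:", verifyChain(log))

	log[1].Payload = "tampered" // edit an earlier event in place
	fmt.Println("first bad event:", verifyChain(log))
}
```

Because each hash covers the previous hash, rewriting one event would require rewriting every event after it, which is what makes the log tamper-evident rather than merely append-only.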

That single decision is what gives you everything else:

Replay is reading the log back through the same agent wiring and
byte-comparing each re-emitted event. Divergence is a structured
error pointing at the first event that didn't reproduce.
Resume is appending to a chain that didn't reach a terminal
event. The hash chain enforces "nothing was lost in the gap."
Audit is the Merkle root on the terminal event committing to
every leaf - tampering with any earlier event invalidates the
commitment.
Cost control, observability, the inspector, replay tests -
all of them are projections of the same event stream.
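
The audit property in the list above can be sketched as a Merkle root computed over the encoded events. This is an illustration under assumptions, not Starling's tree: SHA-256 stands in for BLAKE3, the leaf/node prefixes and the odd-node promotion rule are one common convention, and the function names are local to the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use distinct prefixes so a leaf can never be
// confused with an interior node. SHA-256 stands in for BLAKE3.
func leafHash(data []byte) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, data...))
}

func nodeHash(l, r [32]byte) [32]byte {
	buf := make([]byte, 0, 65)
	buf = append(buf, 0x01)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// merkleRoot folds leaf hashes pairwise until one root remains. An odd
// node at the end of a level is promoted unchanged (an assumption of
// this sketch; the real tree shape may differ).
func merkleRoot(events [][]byte) [32]byte {
	if len(events) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(events))
	for i, e := range events {
		level[i] = leafHash(e)
	}
	for len(level) > 1 {
		next := make([][32]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 < len(level) {
				next = append(next, nodeHash(level[i], level[i+1]))
			} else {
				next = append(next, level[i])
			}
		}
		level = next
	}
	return level[0]
}

func main() {
	events := [][]byte{[]byte("run started"), []byte("tool call"), []byte("run completed")}
	committed := merkleRoot(events) // would be stored on the terminal event

	events[0] = []byte("run started (edited)")
	fmt.Println("root still matches:", merkleRoot(events) == committed)
}
```

A single root on the terminal event is enough to check the whole run: recompute it from the stored events and compare.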

If you've worked with event sourcing before, this should sound
familiar. If you've shipped LLM agents before, you know what it costs
to not have this.

What's included

Event-sourced execution: every meaningful runtime action is an event.
Deterministic replay: recorded runs can be replayed without calling the
model or re-running recorded side effects.
Durable event logs: in-memory, SQLite, and Postgres backends with schema
migration and validation helpers.
Provider adapters: OpenAI-compatible APIs, Anthropic, Gemini, Amazon
Bedrock, and OpenRouter.
MCP tools: stdio subprocess and streamable HTTP clients backed by the
official Go MCP SDK.
Tool safety: retries, transient error classification, typed tool errors,
max MCP output caps, and replay-safe side effects.
Hermetic tests: starlingtest ships a scripted provider and replay
assertions so agent tests run without an LLM.
Inspector: dependency-free browser UI for exploring runs and replay
divergence.
HTTP daemon helper: starlingd lets your own agent binary accept runs
over HTTP, stream SSE updates, expose metrics, and mount the inspector.
Observability: metrics wrappers, OpenTelemetry-friendly examples, and
opt-in structured slog output (silent by default; pass
Config.Logger = slog.New(...) to enable).
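
As a small illustration of the Observability item above, this sketch builds a structured logger with the standard library only. The helper names are made up for the example, and the assignment to Config.Logger appears only as a comment because the rest of the agent wiring is out of scope here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newJSONLogger returns a JSON slog.Logger that drops records below level.
func newJSONLogger(w io.Writer, level slog.Level) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level}))
}

// demo logs the same Info record through an Info-level and a Warn-level
// logger and returns what each one wrote.
func demo() (atInfo, atWarn string) {
	var a, b bytes.Buffer
	newJSONLogger(&a, slog.LevelInfo).Info("run started", "run_id", "example-run")
	newJSONLogger(&b, slog.LevelWarn).Info("run started", "run_id", "example-run")
	return a.String(), b.String()
}

func main() {
	logger := newJSONLogger(os.Stderr, slog.LevelInfo)
	logger.Info("logger ready")
	// In an agent setup this value would be assigned as described above:
	//   cfg.Logger = logger

	atInfo, atWarn := demo()
	fmt.Printf("info-level output: %d bytes, warn-level output: %d bytes\n", len(atInfo), len(atWarn))
}
```

Raising the handler level is the usual way to keep logging enabled in production without recording every Info line.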

Install

go get github.com/jerkeyray/starling@<tag>

Starling is a single Go module. The provider sub-packages
(provider/anthropic, provider/openai, provider/gemini,
provider/bedrock, provider/openrouter) come along with this go get;
no separate install is needed.

Pin a tag rather than tracking main - Starling is in beta and breaking
changes are permitted between beta cuts. See Release policy
and CHANGELOG.md.

Documentation

docs/getting-started.md - install, your
first agent, tools, durable storage, replay.
docs/mental-model.md - what a Run is, when
it terminates, when to use one Run versus many, what replay
actually checks.
docs/faq.md - quick answers to recurring questions.
Cookbook: branching,
manual writes,
multi-turn.
Reference: events,
step primitives,
cost model,
tools,
replay,
contracts,
metrics,
starlingd,
save file,
MCP server.

docs/README.md is the full index.

Quickstart

Single-turn, no tools - the most common shape:

package main

import (
	"context"
	"fmt"
	"os"

	starling "github.com/jerkeyray/starling"
	"github.com/jerkeyray/starling/eventlog"
	"github.com/jerkeyray/starling/provider/openai"
)

func main() {
	prov, err := openai.New(openai.WithAPIKey(os.Getenv("OPENAI_API_KEY")))
	if err != nil {
		panic(err)
	}

	log := eventlog.NewInMemory()
	a := &starling.Agent{
		Provider: prov,
		Log:      log,
		Config:   starling.Config{Model: "gpt-4o-mini"},
	}

	text, err := a.RunOnce(context.Background(), "Give me a three bullet incident summary.")
	if err != nil {
		panic(err)
	}

	fmt.Println(text)
}

RunOnce ignores any tools on the agent and caps the loop at one turn -
ideal for prompt-in/text-out use cases. For tool-using or multi-turn
flows, call Agent.Run (see examples/incident_triage).
Note that MaxTurns counts every model call, so a forced single-tool
flow needs MaxTurns >= 2 (turn 1 emits the tool_use, turn 2 lets the
model respond to the tool result).
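
The turn-counting rule can be seen in a toy loop. Everything here is hypothetical (the reply type, the scripted model, the run function); it only mirrors the stated rule that each model call consumes one turn.

```go
package main

import "fmt"

// reply is a hypothetical, simplified model response: either a request
// to call a tool or final text.
type reply struct {
	toolUse string
	text    string
}

// scriptedModel asks for a tool on its first call and answers once it
// has seen a tool result.
func scriptedModel(toolResults []string) reply {
	if len(toolResults) == 0 {
		return reply{toolUse: "fetch_ticket"}
	}
	return reply{text: "summary based on " + toolResults[0]}
}

// run counts every model call against maxTurns. It returns the final
// text and whether the run finished within the cap.
func run(maxTurns int) (string, bool) {
	var toolResults []string
	for turn := 1; turn <= maxTurns; turn++ {
		r := scriptedModel(toolResults)
		if r.toolUse == "" {
			return r.text, true
		}
		toolResults = append(toolResults, "result of "+r.toolUse)
	}
	return "", false // cap reached before the model produced final text
}

func main() {
	for _, n := range []int{1, 2} {
		text, ok := run(n)
		fmt.Printf("MaxTurns=%d finished=%v text=%q\n", n, ok, text)
	}
}
```

With a cap of 1 the loop ends right after the tool request; with 2 the model gets a second call to read the tool result and answer.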

Core Model

Agent.Run
  -> provider.Stream
  -> tool execution
  -> budget checks
  -> append-only event log
  -> replay / inspect / resume

Starling treats the event log as the source of truth. The runtime records model
requests, streaming chunks, tool calls, usage, budget decisions, terminal states,
and replay metadata as structured events. Backends validate event ordering,
schema versions, and hash continuity.

Durable Logs

Use SQLite or Postgres when runs must survive process restarts or be inspected
later.

log, err := eventlog.NewSQLite("starling.db")
if err != nil {
	panic(err)
}
defer log.Close()

Durable backends support schema preflight checks, migrations, validation, and
read-only inspection workflows.

Replay And Resume

Replay a recorded run against the same agent wiring:

if err := starling.Replay(ctx, log, runID, a); err != nil {
	if errors.Is(err, starling.ErrNonDeterminism) {
		// Inspect the log for the first diverging event.
	}
	panic(err)
}
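
What "first diverging event" means can be sketched as a byte-wise comparison of two event sequences. The types and the errDivergence sentinel are local to this example; the snippet above uses starling.ErrNonDeterminism for the real check.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errDivergence is a hypothetical sentinel for this sketch.
var errDivergence = errors.New("replay diverged")

// divergence describes the first event that failed to reproduce.
type divergence struct {
	Seq      int
	Recorded []byte
	Replayed []byte
}

func (d *divergence) Error() string {
	return fmt.Sprintf("seq %d: recorded %q, replayed %q", d.Seq, d.Recorded, d.Replayed)
}

func (d *divergence) Unwrap() error { return errDivergence }

// compareReplay byte-compares encoded events in order and stops at the
// first mismatch, including a difference in length.
func compareReplay(recorded, replayed [][]byte) error {
	for i := range recorded {
		if i >= len(replayed) {
			return &divergence{Seq: i, Recorded: recorded[i]}
		}
		if !bytes.Equal(recorded[i], replayed[i]) {
			return &divergence{Seq: i, Recorded: recorded[i], Replayed: replayed[i]}
		}
	}
	if len(replayed) > len(recorded) {
		return &divergence{Seq: len(recorded), Replayed: replayed[len(recorded)]}
	}
	return nil
}

// firstDivergence returns the sequence number of the first mismatch, or -1.
func firstDivergence(recorded, replayed [][]byte) int {
	var d *divergence
	if errors.As(compareReplay(recorded, replayed), &d) {
		return d.Seq
	}
	return -1
}

func main() {
	recorded := [][]byte{[]byte("prompt"), []byte("tool:fetch_ticket"), []byte("done")}
	replayed := [][]byte{[]byte("prompt"), []byte("tool:fetch_order"), []byte("done")}

	err := compareReplay(recorded, replayed)
	fmt.Println(errors.Is(err, errDivergence), err)
}
```

Wrapping a sentinel inside a structured error lets callers branch with errors.Is and still pull out the sequence number with errors.As.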

Resume continues from a persisted run while preserving call correlation and
budget accounting.

next, err := a.Resume(ctx, runID, "Continue with remediation steps.")

The starlingtest package wires the same machinery into Go tests
without touching a real model:

p := &starlingtest.ScriptedProvider{Scripts: scripts}
a := &starling.Agent{Provider: p, Log: eventlog.NewInMemory(), Config: cfg}
res, err := a.Run(ctx, "...")
if err != nil {
	t.Fatal(err)
}
p.Reset()
starlingtest.AssertReplayMatches(t, a.Log, res.RunID, a)

Providers

Provider	Package	Notes
OpenAI-compatible	`provider/openai`	OpenAI, Groq, Together, Ollama, vLLM, LM Studio, Azure OpenAI, and compatible APIs via custom `BaseURL`.
Anthropic	`provider/anthropic`	Messages API support, tool use, thinking/signatures, and prompt caching metadata.
Gemini	`provider/gemini`	Native Gemini adapter for Google models.
Amazon Bedrock	`provider/bedrock`	Native Bedrock ConverseStream adapter with AWS SDK auth, tool use, reasoning, and cache-aware usage.
OpenRouter	`provider/openrouter`	OpenRouter-specific convenience wrapper over the OpenAI-compatible path.

Provider behavior is covered by a conformance suite so adapters share the same
streaming, usage, tool-call, and error contracts.

MCP Tools

Starling can expose remote MCP tools as regular tool.Tool values.

client, err := toolmcp.NewCommand(ctx,
	exec.Command("uvx", "mcp-server-filesystem", "/tmp"),
	toolmcp.WithIncludeTools("read_file", "list_directory"),
	toolmcp.WithMaxOutputBytes(64<<10),
)
if err != nil {
	panic(err)
}
defer client.Close()

tools, err := client.Tools(ctx)
if err != nil {
	panic(err)
}

a := &starling.Agent{
	Provider: prov,
	Log:      log,
	Tools:    tools,
	Config:   starling.Config{Model: "gpt-4o-mini", MaxTurns: 8},
}

Supported transports:

toolmcp.NewCommand(ctx, cmd, opts...) for stdio subprocess servers.
toolmcp.NewHTTP(ctx, endpoint, httpClient, opts...) for streamable HTTP servers.
toolmcp.New(ctx, transport, opts...) for custom transports.

MCP tool calls are wrapped in step.SideEffect, so replay uses the recorded
result instead of contacting the remote MCP server again. Starling currently
supports MCP tools; resources, prompts, and sampling are intentionally deferred.
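
The record-then-replay behaviour described above can be illustrated with a tiny memoising wrapper. This is a conceptual sketch, not step.SideEffect: the recorder type, its map-backed storage, and the call IDs are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical stand-in for an event log that stores
// side-effect results by call ID.
type recorder struct {
	replaying bool
	results   map[string]string
}

var errMissingRecord = errors.New("no recorded result for call")

// sideEffect runs fn and records its result in live mode; in replay
// mode it returns the recorded result and never invokes fn.
func (r *recorder) sideEffect(callID string, fn func() (string, error)) (string, error) {
	if r.replaying {
		out, ok := r.results[callID]
		if !ok {
			return "", fmt.Errorf("%w: %s", errMissingRecord, callID)
		}
		return out, nil
	}
	out, err := fn()
	if err != nil {
		return "", err
	}
	r.results[callID] = out
	return out, nil
}

// demo records one call live, replays it, and reports how many times
// the underlying function actually ran.
func demo() (string, string, int) {
	calls := 0
	remote := func() (string, error) {
		calls++
		return "contents of /tmp/notes.txt", nil
	}
	rec := &recorder{results: map[string]string{}}
	live, _ := rec.sideEffect("read-1", remote)

	rec.replaying = true
	replayed, _ := rec.sideEffect("read-1", remote)
	return live, replayed, calls
}

func main() {
	live, replayed, calls := demo()
	fmt.Println(live == replayed, calls)
}
```

The remote function runs once; the replayed value comes from the record, which is why replay needs no network access.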

Budgets And Retries

Budgets can cap input tokens, output tokens, USD cost, and wall-clock runtime.

a := &starling.Agent{
	Provider: prov,
	Log:      log,
	Budget: &starling.Budget{
		MaxInputTokens:  20_000,
		MaxOutputTokens: 4_000,
		MaxUSD:          0.50,
		MaxWallClock:    30 * time.Second,
	},
	Config: starling.Config{Model: "gpt-4o-mini", MaxTurns: 8},
}
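
For intuition, a budget check reduces to comparing running totals against caps after each step. The sketch below is an assumption-laden illustration, not Starling's enforcement code: the limits and usage types are local, and treating a zero cap as "unlimited" is a choice made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// limits mirrors the four caps described above. The type is local to
// this sketch, and a zero value is treated as "no cap" (an assumption).
type limits struct {
	MaxInputTokens  int64
	MaxOutputTokens int64
	MaxUSD          float64
	MaxWallClock    time.Duration
}

// usage is the running total a runtime would accumulate per run.
type usage struct {
	InputTokens  int64
	OutputTokens int64
	USD          float64
	Elapsed      time.Duration
}

// exceeded returns the name of the first cap that has been passed, or
// "" if the run is still within budget.
func exceeded(l limits, u usage) string {
	switch {
	case l.MaxInputTokens > 0 && u.InputTokens > l.MaxInputTokens:
		return "input_tokens"
	case l.MaxOutputTokens > 0 && u.OutputTokens > l.MaxOutputTokens:
		return "output_tokens"
	case l.MaxUSD > 0 && u.USD > l.MaxUSD:
		return "usd"
	case l.MaxWallClock > 0 && u.Elapsed > l.MaxWallClock:
		return "wall_clock"
	}
	return ""
}

func main() {
	l := limits{MaxInputTokens: 20_000, MaxOutputTokens: 4_000, MaxUSD: 0.50, MaxWallClock: 30 * time.Second}

	within := usage{InputTokens: 12_000, OutputTokens: 900, USD: 0.08, Elapsed: 4 * time.Second}
	over := usage{InputTokens: 12_000, OutputTokens: 4_200, USD: 0.11, Elapsed: 9 * time.Second}

	fmt.Printf("%q\n", exceeded(l, within))
	fmt.Printf("%q\n", exceeded(l, over))
}
```

Returning the name of the cap, rather than a bare boolean, gives the runtime something concrete to record as a budget decision.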

Tool retries are explicit and replay-aware:

out, err := step.CallTool(ctx, step.ToolCall{
	CallID:      "fetch-ticket",
	TurnID:      turnID,
	Name:        "fetch_ticket",
	Args:        args,
	Idempotent:  true,
	MaxAttempts: 3,
})
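
The interaction between Idempotent and MaxAttempts can be pictured as a bounded retry loop. This is a generic sketch, not the body of step.CallTool: the errTransient sentinel, the linear backoff, and the helper names are all assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient marks failures worth retrying; anything else is treated
// as permanent. This classification is an assumption of the sketch.
var errTransient = errors.New("transient failure")

// callWithRetry invokes fn up to maxAttempts times. It retries only when
// the call is idempotent and the error is transient, and stops early if
// ctx is cancelled. It returns the result and the attempts used.
func callWithRetry(ctx context.Context, idempotent bool, maxAttempts int, backoff time.Duration,
	fn func(context.Context) (string, error)) (string, int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out, err := fn(ctx)
		if err == nil {
			return out, attempt, nil
		}
		lastErr = err
		if !idempotent || !errors.Is(err, errTransient) || attempt == maxAttempts {
			return "", attempt, lastErr
		}
		select {
		case <-ctx.Done():
			return "", attempt, ctx.Err()
		case <-time.After(backoff * time.Duration(attempt)): // linear backoff
		}
	}
	return "", maxAttempts, lastErr
}

// flaky returns a function that fails transiently `failures` times and
// then succeeds.
func flaky(failures int) func(context.Context) (string, error) {
	calls := 0
	return func(context.Context) (string, error) {
		calls++
		if calls <= failures {
			return "", fmt.Errorf("attempt %d: %w", calls, errTransient)
		}
		return "ticket #4521", nil
	}
}

// tryFlaky reports how many attempts were used and whether the call succeeded.
func tryFlaky(idempotent bool, maxAttempts, failures int) (int, bool) {
	_, attempts, err := callWithRetry(context.Background(), idempotent, maxAttempts, time.Millisecond, flaky(failures))
	return attempts, err == nil
}

func main() {
	fmt.Println(tryFlaky(true, 3, 2))  // retried until success
	fmt.Println(tryFlaky(false, 3, 2)) // not idempotent: no retry
}
```

Gating retries on an explicit idempotency flag keeps a non-repeatable tool from being invoked twice after an ambiguous failure.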

MCP server

starling-mcp exposes the recorded event log to AI assistants
(Claude Desktop, Cursor, Claude Code) over stdio. Read-only by
construction. Once wired into your MCP client, you ask normal
questions about your agent's runs and the model calls the
appropriate tool - list_runs, summarize_run, get_event,
diff_runs, search_runs, etc.

go install github.com/jerkeyray/starling/cmd/starling-mcp@latest

Add to your client config (Claude Desktop shape; Cursor / Claude
Code follow the same pattern):

{
  "mcpServers": {
    "starling": {
      "command": "starling-mcp",
      "args": ["/path/to/runs.db"]
    }
  }
}

Full reference: docs/reference/mcp-server.md.

HTTP Daemon

Use package starlingd when you want your own agent wiring exposed as
a private HTTP service. It provides a bounded in-process queue, async
POST /api/v1/runs, SSE progress streams, run/event read APIs,
/metrics, bearer auth, and an optional inspector mount.

if err := starlingd.Command(buildAgent).Run(os.Args[1:]); err != nil {
	panic(err)
}

Full reference: docs/reference/starlingd.md.

Inspector

go run ./cmd/starling-inspect starling.db

Loopback web UI: runs list with per-row totals, per-event timeline
with a syntax-highlighted JSON detail pane, a /sessions page that
groups runs by Config.SessionID, and a /diff page aligning any
two runs side-by-side by sequence number. Dark by default, theme
toggle in the topbar, hashes and run ids are click-to-copy, no CDN
or JS build step. Runs read-only - Append is impossible on the
inspector's DB handle.

[Screenshot: inspector run detail with timeline and JSON pane]

[Screenshot: inspector diff page]

CLI

go install github.com/jerkeyray/starling/cmd/starling@latest for the
stock binary, or build a dual-mode binary around
starling.InspectCommand / starling.ReplayCommand to wire your
own agent factory.

Subcommand	What it does
`validate <db> [<runID>]`	Hash-chain + Merkle check, one run or every run.
`export <db> <runID>`	Dump events as NDJSON (pipe into `jq`).
`prune [flags] <db>`	Delete old whole runs after an explicit dry-run.
`inspect [flags] <db>`	Read-only web inspector.
`replay <db> <runID>`	Headless replay. Dual-mode binaries only.
`contracts <db> <file> <runID...>`	Validate explicit runs against a YAML contract file.
`migrate <db>`	Apply pending schema migrations.
`schema-version <db>`	Print the on-disk schema version.
`doctor [<db>]`	Health check: env vars, schema, chain validation.
`--version`	Print the linked starling module version.
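
If you would rather post-process an export in Go than in jq, NDJSON is straightforward to read line by line. The sketch below counts events by a top-level string field; the field name "kind" and the sample lines are made up, so check a real export for the actual field names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countByField reads NDJSON (one JSON object per line) and counts how
// often each string value of the given top-level field appears.
func countByField(r io.Reader, field string) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow large event lines
	line := 0
	for sc.Scan() {
		line++
		if len(strings.TrimSpace(sc.Text())) == 0 {
			continue // skip blank lines
		}
		var obj map[string]any
		if err := json.Unmarshal(sc.Bytes(), &obj); err != nil {
			return nil, fmt.Errorf("line %d: %w", line, err)
		}
		if v, ok := obj[field].(string); ok {
			counts[v]++
		}
	}
	return counts, sc.Err()
}

// countSample runs countByField over an in-memory string and returns
// nil if any line fails to parse.
func countSample(ndjson, field string) map[string]int {
	counts, err := countByField(strings.NewReader(ndjson), field)
	if err != nil {
		return nil
	}
	return counts
}

func main() {
	// Made-up sample lines; the real export's field names may differ.
	sample := `{"seq":0,"kind":"run_started"}
{"seq":1,"kind":"tool_call"}
{"seq":2,"kind":"tool_call"}
`
	fmt.Println(countSample(sample, "kind"))
}
```

To use it on a real export, pass os.Stdin to countByField and pipe the export command's output into the program.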

Production Checklist

Run make check before release: format, vet, build, race tests, lint, and
vulnerability scan.
Pick a durable log backend for production runs: SQLite for single-node use,
Postgres for shared infrastructure.
Run eventlog preflight and migrations during deploys.
Protect inspector access behind your normal internal auth boundary.
Put starlingd behind TLS, rate limiting, and your normal service auth;
its built-in bearer token is a private-service guard, not a full auth system.
Set explicit budgets for tokens, cost, and wall-clock runtime.
Use idempotent retries and per-call timeouts for tools that touch external
systems.
Use replay regression tests for critical agent workflows.
Store raw provider responses only when your privacy and retention policy
allows it.
Export runs you must keep, then run
starling prune --older-than <duration> --confirm <db> as a scheduled
retention job. Without --confirm, prune is a dry-run report.

Examples

Example	What it shows
examples/hello	The smallest end-to-end agent (~50 lines). Start here.
examples/m1_hello	Dual-mode pattern: run / inspect / replay / reset / show.
examples/multi_turn	Chat-style workflow: one Run per user message.
examples/branching	`eventlog.ForkSQLite` to split a recorded run into a counterfactual branch.
examples/manual_writes	Writing events without `Agent.Run`, including the Merkle root.
examples/incident_triage	End-to-end production-style workflow with budgets, replay, resume, metrics, OTel, and durable logs.
examples/mcp_tools	MCP server tools adapted into Starling tools.
examples/m4_inspector_demo	Local run data for the inspector.

Code layout

package starling lives at the module root - that's a Go convention,
not a layout choice. The interesting parts are under sub-packages.

.
├── agent.go, config.go, errors.go, result.go,    Core API: Agent, Config,
│   stream.go, runstream.go, resume.go,           RunResult, Resume, replay
│   replay_api.go, metrics.go, version.go,        wrappers, sentinel errors,
│   *_command.go, *_test.go                       CLI command helpers, tests.
│
├── bench/             benchmarks
├── budget/            pricing tables, USD/token caps
├── cmd/
│   ├── starling/         stock CLI (validate / export / inspect / replay / migrate / doctor)
│   ├── starling-inspect/ standalone inspector binary
│   └── starling-mcp/     read-only MCP server over the recorded event log
├── docs/              prose docs (getting-started, mental-model, cookbook, reference)
├── event/             Event / Kind types, per-kind payload schemas
├── eventlog/          append-only log: in-memory / SQLite / Postgres
├── examples/          runnable agents (start with examples/hello)
├── inspect/           inspector server + UI templates + static assets
├── internal/          unexported helpers (cborenc, obs)
├── merkle/            public BLAKE3 Merkle helpers
├── provider/          OpenAI / Anthropic / Gemini / Bedrock / OpenRouter adapters
├── replay/            replay re-execution + Stream
├── starlingtest/      test helpers (ScriptedProvider, AssertReplayMatches)
├── starlingd/         HTTP daemon package + command helper
├── step/              step.Now / step.Random / step.SideEffect, CallTool
└── tool/              Tool interface, Typed[In,Out], Wrap middleware, MCP client

Development

make check

Useful targets:

make test      # race-enabled Go test suite
make lint      # golangci-lint
make vuln      # govulncheck
make inspect   # run the inspector locally
make smoke     # quick end-to-end smoke run

Release policy

Starling is in beta. Versions are tagged v0.x.y-beta.N and
distributed through the Go module proxy.

Pin a tag. Don't track main; the working branch may carry
breaking changes between beta cuts.
Breaking changes are permitted between beta tags. Each tag's
delta is recorded in CHANGELOG.md. Until GA there
is no API or wire-format compatibility promise.
Within a tag, breakage is a bug. A pinned beta is reproducible.
Schema versioning for the event log is documented under
Event log schema below - this is the one
surface that has its own forward/back-compat contract.
GA (v1.0.0) will land when the public API surface, event
schema, and replay contract are stable enough to commit to. No
date promised.

Event log schema

event.SchemaVersion is the format version of the events written
into the log. Resume and replay both read this field and refuse runs
written by an unknown schema (ErrSchemaVersionMismatch).

When the constant bumps. Whenever the wire-format of an event
payload, the set of event.Kind values, or the canonical-encoding
rules change in a way that affects the BLAKE3 hash chain.
What consumers must do. Re-pin to the matching beta tag, then
run starling migrate <db> (also exposed in-process as
starling.MigrateCommand) to bring on-disk logs forward. The
starling schema-version <db> command prints the current version.
Compatibility within a major-schema family. Minor bumps must
remain resume-compatible: an older agent binary should be able to
resume a run written by a newer one whenever the new schema is a
superset. Breaking format changes bump the major part and require
the explicit migrate step.
Migrations live in migrate_command.go. Each new on-disk
format ships its forward migration alongside the schema bump in
the same beta.
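
One way to read the compatibility rules above is as a check on a major.minor pair. The sketch is an interpretation, not Starling's code: the version struct is hypothetical (this document does not show how event.SchemaVersion is represented) and errSchemaMismatch is a local stand-in for ErrSchemaVersionMismatch.

```go
package main

import (
	"errors"
	"fmt"
)

// errSchemaMismatch is local to this sketch.
var errSchemaMismatch = errors.New("schema version mismatch")

// version is a hypothetical major.minor pair.
type version struct{ Major, Minor int }

// checkSchema applies the family rule described above: the same major
// version is accepted even when the minor differs, and any other major
// is refused until the log has been migrated.
func checkSchema(binary, onDisk version) error {
	if binary.Major != onDisk.Major {
		return fmt.Errorf("%w: binary %d.%d, log %d.%d (run migrate first)",
			errSchemaMismatch, binary.Major, binary.Minor, onDisk.Major, onDisk.Minor)
	}
	return nil
}

// compatible is a boolean convenience wrapper around checkSchema.
func compatible(binary, onDisk version) bool {
	return checkSchema(binary, onDisk) == nil
}

func main() {
	binary := version{Major: 2, Minor: 1}
	fmt.Println(checkSchema(binary, version{Major: 2, Minor: 3})) // newer minor, same family
	fmt.Println(checkSchema(binary, version{Major: 1, Minor: 9})) // different major
}
```

Failing early with a wrapped sentinel lets resume and replay callers detect the mismatch with errors.Is and point the operator at the migrate step.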

License

Apache 2.0. See LICENSE.