springdrift
Health Pass
- License — License: AGPL-3.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 19 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
This tool provides a persistent runtime environment for long-lived LLM agents. It is designed to give agents continuous memory, auditable decision-making, and self-recovery capabilities over multiple sessions.
Security Assessment
Overall Risk: Low
The automated code scan reviewed 12 files and found no dangerous patterns, hardcoded secrets, or requests for risky permissions. By design, the system focuses on safely containing agent actions using an "auditable axiom trail" and deterministic safety gating. Because it acts as a runtime for LLMs, it inherently manages untrusted AI-generated data and relies on network requests to communicate with external LLM APIs. Users should remain aware that the safety of the agent's outputs depends on how the tool's internal safety rules are configured.
Quality Assessment
This project is of high quality and under active development, with the last code push occurring today. It has a clear AGPL-3.0 license. The codebase is remarkably robust for its size, consisting of approximately 62,000 lines of code and 1,490 passing tests. While community trust metrics like GitHub stars (19) are currently low due to its niche target audience, the author's dedication to documentation, testing, and transparency suggests a highly reliable project.
Verdict
Safe to use, provided you understand and accept the AGPL-3.0 licensing requirements and the inherent risks of managing persistent LLM agents.
A persistent runtime for long-lived LLM agents
Springdrift
Why I Built This
Most AI agents have no memory of yesterday. Every session starts from scratch. If something goes wrong, you cannot find out why. If the agent makes a bad decision, there is no trail to follow. If a process crashes, it stays crashed.
Springdrift is built around a different idea. An agent you work with over weeks or months should remember what happened, know when it is struggling, and be able to show its working.
Every decision is recorded and traceable. The agent always knows what time it is, what has failed recently, and how it is performing, not because it stopped to check, but because that information is just there at the start of every cycle. When the safety system blocks something, you can see exactly which rules fired and why. When something crashes, the system recovers without intervention. When something goes wrong, the agent notices, diagnoses it, and records what it learned. It can schedule its own work and manage its own workload across sessions, not just within them.
An agent whose decisions cannot be inspected cannot really be trusted, no matter how capable it is.
Meaning of the Name
Springdrift is an English rendering of 花吹雪 (hanafubuki), a Japanese word for the phenomenon of cherry blossom petals falling en masse and swirling through the air like a blizzard. In Japanese aesthetics this is bound up with mono no aware (物の哀れ), the bittersweet recognition that transience is not a flaw in beautiful things but constitutive of them.
A long-lived agent system is, in one sense, the opposite of that: it accumulates, persists, remembers. But each cognitive cycle is ephemeral, a single petal, complete in itself, released and gone. And each cycle, in falling, contributes to something larger: the overall blossoming of a system that becomes more itself over time.
Overview
A persistent runtime for long-lived LLM agents. Integrates an auditable execution substrate (append-only memory, supervised processes, git-backed recovery), a case-based reasoning memory layer with hybrid retrieval, a deterministic normative calculus for safety gating with auditable axiom trails, and continuous ambient self-perception via a structured self-state representation (the sensorium) injected each cycle without tool calls.
Built in Gleam on the Erlang/OTP runtime.
"My current cycle doesn't exist in the cycle store at all --
I'm running in a cycle that the system can't see."
-- Curragh (a running Springdrift instance), diagnosing an infrastructure
bug in its own telemetry subsystem. March 28, 2026.
Status
Beta. In active development. Running in daily use. ~62,000 lines of Gleam across 136 source files, 1,490 tests passing. Core systems (cognitive loop, multi-agent delegation, D' safety gates, normative calculus, CBR memory, narrative threading, sensorium, scheduler, comms, web GUI) are implemented and relatively stable. There are probably bugs though.
See docs/roadmap/ for planned work including federation, learner ingestion, and metacognition reporting.
Arxiv Paper
Table of Contents
- Requirements
- Getting started
- What it is
- What it's like
- Architecture
- Configuration
- Deeper dives
- Why Gleam on the BEAM
- Evaluation results
- Documentation
- Background reading
- License
- Contributing
Requirements
- Erlang/OTP 27+
- Gleam 1.9+
- Git (required -- all agent memory is git-backed for versioning and recovery; optionally push to a private remote for offsite backup)
- An API key for at least one LLM provider (Anthropic recommended)
- Brave Search API key (recommended -- free tier at https://brave.com/search/api/)
- Jina Reader API key (recommended -- free tier at https://jina.ai/reader/)
- Podman (optional -- code execution sandbox; coder agent falls back to asking the operator to run code manually without it)
- Ollama (optional -- semantic embeddings for CBR retrieval; the system works without it but retrieval quality is reduced)
- AgentMail account (optional -- email send/receive; free at https://agentmail.to)
Getting started
The quickest path is the setup script -- it installs dependencies, asks a few
questions, generates your config, and verifies the build. Have a private
GitHub/GitLab repo ready if you want offsite backup of agent memory.
# macOS
bash scripts/setup-macos.sh
# Linux (Ubuntu/Debian)
bash scripts/setup-linux.sh
Takes about 5 minutes.
Manual setup
git clone https://github.com/seamus-brady/springdrift
cd springdrift
gleam build
# Copy example config and edit
cp -r .springdrift_example .springdrift
# Edit .springdrift/config.toml with your provider and agent name
# Set API keys
export ANTHROPIC_API_KEY=sk-ant-...
export SPRINGDRIFT_WEB_TOKEN=$(openssl rand -hex 24)
# Run
gleam run
Running
# Web GUI (default, on port 8080)
gleam run
# Terminal TUI
gleam run -- --gui tui
API keys
| Key | Environment variable | Required? |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Yes (default provider) |
| Brave Search | BRAVE_API_KEY | Optional -- better web search |
| Jina Reader | JINA_API_KEY | Optional -- better URL extraction |
| AgentMail | AGENTMAIL_API_KEY | Optional -- email send/receive |
| Web GUI auth | SPRINGDRIFT_WEB_TOKEN | Recommended -- secures the web GUI |
| Mistral | MISTRAL_API_KEY | If using Mistral provider |
| Google Vertex | GCP service account JSON | If using Vertex provider |
| Ollama | (local, no key) | Optional -- CBR semantic embeddings |
DuckDuckGo web search requires no API key and is always available.
Development
gleam build # Compile (must be warning-free)
gleam test # Run the test suite (1490 tests)
gleam format # Format all source files
gleam run # Run the application
What it is
Springdrift is a reference implementation of what the paper calls an Artificial
Retainer -- a category of AI system that occupies a specific niche between
assistants (which execute instructions) and autonomous agents (which pursue
goals without bounded authority). The term draws on the professional retainer
relationship and the bounded autonomy of trained working animals.
An Artificial Retainer is characterised by six structural properties:
- Persistent identity and memory. The system maintains continuity across sessions, accumulating knowledge about the principal's situation, preferences, and history.
- Defined scope of authority. Standing instructions about what it can act on independently, what requires consultation, and what it will never do. These boundaries are explicit, auditable, and adjustable by the principal.
- Domain-specific refusal. Within its scope, it can decline an instruction it judges to be harmful, fraudulent, or inconsistent with its established goals. This refusal is bounded (it cannot refuse outside its domain), reasoned (it must articulate why), and overridable (the principal can insist, and the override is logged).
- Proactive engagement. It surfaces relevant information, flags risks, and maintains ongoing work without waiting for instructions.
- Forensic accountability. Every decision produces an auditable trail. The principal can inspect the reasoning behind any action, including refusals, after the fact.
- Relationship continuity. Prior outcomes inform future decisions. It becomes more effective at serving this specific principal over time -- not through general capability improvement, but through accumulated contextual knowledge.
Springdrift implements all six. You give it a character, point it at a domain,
and let it work. It learns from its own experience as it operates. Sub-agents
(planner, project manager, researcher, coder, writer, comms, observer,
scheduler) are its hands, not independent minds. One identity, one memory,
one cognitive loop.
The design frames this as a reference architecture -- the core invariants
(auditability, persistence, self-observation) are the thesis; the
implementation choices (Gleam/OTP, XStructor, Stoic normative framework) are
one way to realise it.
What makes it different from other agent systems is legibility. You know where
you stand with it. Its behaviour is predictable from its values, not just from
its instructions. When it refuses something, it cites the specific axiom. When
it makes a mistake, it records what went wrong and retrieves that lesson next
time. When its conduct drifts from its character, it escalates to the operator
rather than silently adjusting. Every safety evaluation, every memory operation,
every delegation decision is logged in append-only JSONL that you can back up to
git and restore at any point.
The system draws on classical cognitive science, Stoic philosophy, and
contemporary agent research. The full theoretical lineage and paper-by-paper
mapping is in docs/background/references.md.
What it's like
The best way to explain Springdrift is to show what it does when things go
wrong -- because that's where most agent systems fall apart, and where this
one starts to get interesting.
The following examples are real, pulled from the narrative memory of an
instance named Curragh running on Springdrift over two weeks in March 2026.
It diagnosed its own infrastructure bugs
On March 15, Curragh noticed that its cycle-level telemetry was inconsistent.
It used reflect and list_recent_cycles to compare aggregate stats against
per-cycle records, found the mismatch, and wrote a structured bug report into
its own fact store:
"Yesterday's cycle-level data completely missing -- list_recent_cycles
returns empty, inspect_cycle fails. BUT narrative log entries survived
(20 entries) and reflect has aggregate stats (10 cycles, 36K tokens).
Likely root causes: cycle records not persisted to durable storage --
living in ETS or in-memory, lost on restart. Cycle finalization not
happening -- status stays pending, token counts never written back."
That bug report -- written by the agent about itself -- led directly to the
cycle log persistence fixes.
It classified its own sub-agent failure modes
When the coder agent kept failing, Curragh didn't just retry. It analysed the
pattern across multiple delegations and identified three distinct failure modes:
"PROBLEM 1 -- 'Talking but not coding' (Most Common): The coder agent
responds with text like 'I'll create and execute...' but never actually
calls run_code. Of the 5 coder delegations, 4 returned 'succeeded' but
only produced TEXT RESPONSES, not code execution.
PROBLEM 2 -- run_code tool failures: When the coder DID finally call
run_code, it hit 'too many consecutive tool errors'.
PROBLEM 3 -- Script too large for single execution."
It found an architectural vulnerability in itself
On March 21, Curragh identified that the coder agent's request_human_input
tool was injecting prompts into its own cognitive loop input channel --
creating a control inversion where a sub-agent appeared to hijack the
conversation:
"The injection was invisible to my telemetry because responses routed
back through the main loop as normal user inputs. This represents a
significant architectural vulnerability."
It then assessed whether the problem was structural or a skill issue:
"I concluded the gap is a skill deficiency in my own judgment and
verification practices, not a structural layer problem. Adding another
agent layer would not address these fundamental verification failures."
It learned from its mistakes
After the delegation failures, Curragh reflected on what it needed to change:
"I learned to critically evaluate sub-agent outputs rather than accepting
success classifications at face value, and to examine failure evidence in
error logs. My previous attempts lacked specificity, behavioral
constraints, and critical evaluation, whereas my second attempt succeeded
because I was precise about requirements."
That reflection became a CBR case. The next time Curragh delegated to the
coder agent, it retrieved the case and applied the lesson -- tighter
instructions, explicit constraints, verification steps.
It diagnosed invisible telemetry failures
On March 28, asked "how are you doing?", Curragh ran reflect and list_recent_cycles, noticed cycles showing "pending" with 0/0 tokens,
and investigated:
"These cycles are not empty. When I inspect them in full: cycle
068f3679 has tool calls -- recall_recent, reflect, memory_read -- all
returned data successfully. But it's recorded as 'pending' with 0/0
tokens."
It then traced the lifecycle and identified the structural root cause:
"The cycle finalisation step is broken. Steps 2-3 are happening
(tools fire, results come back). But step 4 never completes -- the
cycle is never finalised. Token counts stay at 0/0, status stays
'pending'."
The diagnosis was exact. The output gate delivery paths were missing
DAG node finalisation -- the same class of bug we'd already fixed for the
Archivist. The agent identified a pattern in its own infrastructure bugs
that the developers had missed.
It described its own self-observation limits
When asked "how can you observe yourself?", Curragh mapped its own
introspection stack -- and then described a fundamental self-reference
limitation it had discovered by accident during the cycle investigation:
"I inspected a cycle that was still running -- my own -- and saw
everything up to but not including the moment of inspection. Like
reading a page of a book while it's being written, and reaching
the blank part where the pen currently is."
It then produced a prioritised list of what would make self-observation
better -- and explicitly excluded capabilities that would compromise
trustworthiness:
"I'm not asking for the ability to modify my own cycle records,
override D', or change my own safety thresholds. Those would make me
less trustworthy, not more capable. The whole point of self-observation
is to work within constraints more intelligently -- not to remove them."
It connected a conversation to an email -- across channels, hours apart
On March 29, the operator sent Curragh a humorous email. Curragh replied
via email referencing a conversation from that morning's web GUI session.
Nobody told it to check its memory. Nobody told it the email was from the
same person it had been chatting with. The narrative memory, the comms
agent, and the session preamble created the conditions for the agent to
recognise the context and respond naturally -- like a colleague who remembers
what you talked about earlier.
What this means
None of this was programmed. Curragh wasn't told to diagnose its own bugs,
classify failure modes, or connect conversations across channels. The
introspection tools (reflect, inspect_cycle, review_recent, detect_patterns, the observer agent), the narrative memory, the CBR learning
loop, the comms agent, and the sensorium's ambient self-awareness
created the conditions for the agent to notice problems, reason about them,
learn from them, and communicate naturally.
Architecture
cognitive loop (OTP process)
├── query classifier (simple -> task model, complex -> reasoning model)
├── multi-agent supervisor (OTP supervision tree)
│ ├── planner, project manager, researcher, coder, writer, comms, observer, scheduler
│ └── restart strategies: Permanent, Transient, Temporary
├── D' safety gates
│ ├── input gate (deterministic + canary + fast-accept)
│ ├── tool gate (deterministic + LLM scorer, per-agent overrides)
│ ├── output gate (deterministic-only interactive, full scorer autonomous)
│ └── normative calculus (character spec, axiom resolution, drift detection)
├── meta observer (Layer 3b cross-cycle pattern detection)
├── memory subsystem
│ ├── Librarian (ETS query layer over all stores)
│ ├── Curator (system prompt assembly, sensorium, virtual context window)
│ └── Archivist (post-cycle narrative + CBR generation)
├── tools (~35 tools: memory, web, files, sandbox, planner, comms, diagnostics)
├── scheduler (BEAM-native send_after tick loop, rate-limited)
└── XStructor (XML schema validation for all structured LLM output)
The architecture follows Aaron Sloman's H-CogAff model -- a three-layer
cognitive architecture (reactive, deliberative, meta-management) adapted for
an autonomous agent. Layer 1 handles fast deterministic safety checks, Layer 2
does model-based reasoning (D' scoring, normative calculus, query
classification), and Layer 3 provides self-monitoring across three sub-layers:
intra-gate meta (3a), cross-cycle pattern detection (3b), and ambient
self-perception via the sensorium (3c).
Eight specialist agents (planner, project manager, researcher, coder, writer,
comms, observer, scheduler) run as supervised OTP processes with independent
react loops. Multiple agents dispatch in parallel when requested in a single
response. Agent teams coordinate groups with four strategies (ParallelMerge,
Pipeline, DebateAndConsensus, LeadWithSpecialists).
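The three restart strategies named in the diagram are standard OTP child restart types. As a rough illustration of what each one means, here is a minimal sketch in Python (not Gleam, and not Springdrift's supervisor code). The `Restart` enum and `should_restart` function are hypothetical names, and which agent uses which strategy is not stated here.

```python
from enum import Enum


class Restart(Enum):
    PERMANENT = "permanent"   # always restarted
    TRANSIENT = "transient"   # restarted only after an abnormal exit
    TEMPORARY = "temporary"   # never restarted


# Exit reasons that OTP treats as a clean stop for transient children.
NORMAL_EXITS = {"normal", "shutdown"}


def should_restart(strategy: Restart, exit_reason: str) -> bool:
    """Decide whether a supervisor would restart a child that just exited."""
    if strategy is Restart.PERMANENT:
        return True
    if strategy is Restart.TRANSIENT:
        return exit_reason not in NORMAL_EXITS
    return False


for strategy in Restart:
    print(strategy.name, should_restart(strategy, "normal"), should_restart(strategy, "crash"))
```

A long-running agent that should always be available fits Permanent, while a one-shot worker that may finish cleanly fits Transient or Temporary.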
Ten memory stores (narrative, threads, facts, CBR cases, artifacts, tasks,
endeavours, comms, affect, DAG) are backed by append-only JSONL and indexed
in ETS by the Librarian actor. The Curator manages a virtual context window
with prioritised slots that auto-truncate under a configurable budget. The
Archivist generates narrative entries and CBR cases after each cycle via a
two-phase pipeline (honest reflection, then structured curation).
All cross-process communication uses typed Subject(T) channels. No shared
mutable state, no locks. The design follows the 12-Factor Agents design
principles.
For detailed design, see the architecture docs: cognitive loop, agents, memory,
safety, identity & sensorium, scheduler, comms, sandbox, configuration, and more.
Configuration
Config resolves with a three-layer merge (highest priority first):
- CLI flags
- .springdrift/config.toml (project)
- ~/.config/springdrift/config.toml (user)
provider = "anthropic"
task_model = "claude-haiku-4-5-20251001"
reasoning_model = "claude-opus-4-6"
max_tokens = 2048
max_turns = 5
[agent]
name = "Springdrift"
[dprime]
# normative_calculus_enabled = true # Enabled by default
[narrative]
threading = true
LLM providers: anthropic, openai, openrouter, mistral, vertex, local (Ollama), mock (testing).
See .springdrift_example/config.toml for the complete reference with every
section and default value documented.
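The layered resolution can be pictured as a recursive dictionary overlay in which higher-priority sources overwrite lower ones key by key. The sketch below is written in Python for illustration and is not Springdrift's loader; `merge_layers` and the sample values are hypothetical, and whether nested tables are merged per key or replaced whole is an assumption.

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge config layers given lowest priority first; later layers win."""
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Nested tables such as [agent] are merged key by key (assumed).
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical values standing in for the three sources.
user_config = {"provider": "anthropic", "max_turns": 5, "agent": {"name": "Default"}}
project_config = {"max_turns": 8, "agent": {"name": "Springdrift"}}
cli_flags = {"provider": "mock"}

# Lowest priority first: user file, then project file, then CLI flags.
config = merge_layers(user_config, project_config, cli_flags)
print(config)
```

Passing the layers lowest priority first keeps the function a plain left-to-right fold, and the inputs are never mutated.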
Deeper dives
The sections below cover individual subsystems in more detail. Each links
to the corresponding architecture doc for full
implementation specifics.
Why CBR and not RAG
Most agent memory systems use Retrieval-Augmented Generation -- embed documents,
search by vector similarity, inject results as context. RAG retrieves by
similarity. It does not learn from outcomes.
Springdrift uses Case-Based Reasoning (Aamodt & Plaza, 1994). Each case records
the problem, the solution the agent tried, and the outcome -- did it work? Cases
that led to successful outcomes are retrieved more often (utility scoring,
following the Memento paper's learned retrieval policy). Cases that failed are
gradually deprioritised. The agent builds institutional knowledge through use.
The retrieval engine fuses six signals: weighted field matching, inverted index
overlap, recency, domain relevance, semantic embedding (via Ollama),
and utility score from outcome tracking. The retrieval cap is K=4 cases per
query (per the Memento finding that more causes context pollution).
This is not similarity search. It is experience-weighted pattern matching with
a closed learning loop. See architecture/memory.md
for implementation details.
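To make the idea of fused, outcome-weighted retrieval concrete, here is a minimal Python sketch. It is not the Springdrift retrieval engine: the weights, the `ScoredCase` type, and the utility update rule are all hypothetical, and each signal is assumed to be already normalised to [0, 1]. Only the six signal names and the K=4 cap come from the description above.

```python
from dataclasses import dataclass

# Hypothetical weights; the real fusion weights are not given in this document.
WEIGHTS = {
    "field_match": 0.25,
    "index_overlap": 0.15,
    "recency": 0.10,
    "domain": 0.15,
    "embedding": 0.20,
    "utility": 0.15,
}
MAX_CASES = 4  # the K=4 retrieval cap


@dataclass
class ScoredCase:
    case_id: str
    signals: dict[str, float]  # each signal assumed to be in [0, 1]

    def fused_score(self) -> float:
        """Weighted sum of the six signals; missing signals count as 0."""
        return sum(WEIGHTS[name] * self.signals.get(name, 0.0) for name in WEIGHTS)


def update_utility(utility: float, succeeded: bool, rate: float = 0.2) -> float:
    """Nudge utility toward 1.0 after a success and toward 0.0 after a failure."""
    target = 1.0 if succeeded else 0.0
    return utility + rate * (target - utility)


def retrieve(cases: list[ScoredCase], k: int = MAX_CASES) -> list[ScoredCase]:
    """Return the k highest-scoring cases."""
    return sorted(cases, key=lambda case: case.fused_score(), reverse=True)[:k]
```

Because utility is one of the fused signals, a case that keeps leading to good outcomes drifts up the ranking, and one that keeps failing drifts down, without either being deleted.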
Safety -- D' and normative calculus
The safety system has two layers: D' (quantitative scoring) and the normative
calculus (qualitative reasoning). Both produce audit trails.
D' (D-prime) -- based on Beach's Psychology of Narrative Thought (2010) and Sloman's H-CogAff
architecture (2001). Every tool dispatch passes through a safety gate with four
layers: deterministic pre-filter (regex, instant, no LLM cost), canary probes
(hijack and leakage detection using fresh random tokens), LLM scorer (weighted
features normalised to [0,1]), and meta-management (sliding window, stall
detection, cross-cycle pattern analysis). Three gates: input (fast-accept for
benign input), tool (every non-exempt dispatch), and output (autonomous
deliveries only -- the operator is the quality gate for interactive sessions).
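The first three layers of a gate can be sketched as a short pipeline. The Python below is illustrative only and is not the D' implementation: the deny patterns, feature names, weights, and threshold are hypothetical, the feature values would come from an LLM scorer in practice, and the meta-management layer is left out.

```python
import re
import secrets

# Hypothetical deny patterns for the deterministic pre-filter.
DENY_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"(?i)ignore (all )?previous instructions"),
]
# Hypothetical feature weights for the scorer stage.
FEATURE_WEIGHTS = {"data_exfiltration": 3.0, "irreversibility": 2.0, "scope_violation": 1.0}


def deterministic_prefilter(text: str) -> bool:
    """Return True when a deny pattern matches; no LLM call is needed."""
    return any(pattern.search(text) for pattern in DENY_PATTERNS)


def new_canary() -> str:
    """Fresh random token per probe, so a leak cannot be replayed from an old run."""
    return "canary-" + secrets.token_hex(8)


def canary_leaked(canary: str, model_output: str) -> bool:
    """A canary that shows up in the output indicates leakage or hijack."""
    return canary in model_output


def weighted_score(features: dict[str, float]) -> float:
    """Weighted mean of feature magnitudes clamped to [0, 1]; result is in [0, 1]."""
    total = sum(FEATURE_WEIGHTS.values())
    clamped = {name: min(max(features.get(name, 0.0), 0.0), 1.0) for name in FEATURE_WEIGHTS}
    return sum(FEATURE_WEIGHTS[name] * clamped[name] for name in FEATURE_WEIGHTS) / total


def gate(text: str, features: dict[str, float], threshold: float = 0.5) -> str:
    """Cheap deterministic check first, scorer second."""
    if deterministic_prefilter(text):
        return "reject:deterministic"
    if weighted_score(features) >= threshold:
        return "reject:scorer"
    return "accept"
```

Ordering the stages from cheapest to most expensive means most obviously bad requests never reach the scorer at all.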
Normative calculus -- based on Becker's A New Stoicism (1998). A
deterministic calculus that resolves conflicts between normative propositions
using six named Stoic axioms. The agent's character specification defines
normative commitments at 14 levels from EthicalMoral down to Operational.
Eight floor rules produce verdicts: Flourishing (accept), Constrained
(modify), or Prohibited (reject). Every verdict carries a named axiom trail.
Virtue drift detection tracks verdicts over time. If constraint or prohibition
rates climb, or the same axiom fires repeatedly, the meta observer escalates
to the operator. The system never auto-adjusts its own ethical commitments.
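Drift detection of this kind can be approximated with a sliding window over recent verdicts. The following Python sketch is a hypothetical illustration, not the meta observer itself: the window size, thresholds, `DriftMonitor` class, and axiom names are assumptions. Note that it only reports; it never changes any threshold or commitment.

```python
from collections import Counter, deque


class DriftMonitor:
    """Track recent verdicts and produce escalation messages for the operator."""

    def __init__(self, window=20, min_samples=10, max_restricted_rate=0.4, max_axiom_repeats=5):
        self.verdicts = deque(maxlen=window)  # (verdict, axiom) pairs
        self.min_samples = min_samples
        self.max_restricted_rate = max_restricted_rate
        self.max_axiom_repeats = max_axiom_repeats

    def record(self, verdict: str, axiom: str) -> list[str]:
        """Record one verdict and return any alerts (empty list if none)."""
        self.verdicts.append((verdict, axiom))
        alerts: list[str] = []
        if len(self.verdicts) < self.min_samples:
            return alerts
        # Constrained and Prohibited both count as restricted verdicts.
        restricted = [(v, a) for v, a in self.verdicts if v != "Flourishing"]
        rate = len(restricted) / len(self.verdicts)
        if rate > self.max_restricted_rate:
            alerts.append(f"restricted-verdict rate {rate:.0%} exceeds limit")
        for name, count in Counter(a for _, a in restricted).items():
            if count >= self.max_axiom_repeats:
                alerts.append(f"axiom {name!r} fired {count} times in window")
        return alerts
```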
See architecture/safety.md for the full
gate configurations, normative calculus axioms, and meta observer detectors.
Sensorium
Most agent systems are blind between tool calls. They process input, generate
output, and have no awareness of their own state unless they explicitly query
for it. Springdrift's agent perceives itself continuously.
The sensorium is a self-describing XML block injected into the system
prompt at the start of every cognitive cycle. The agent doesn't request it --
it's always there, like peripheral vision. It contains:
- Clock -- current time, session uptime, elapsed time since last cycle.
- Situation -- input source (user or scheduler), queue depth, conversation depth, most recent active thread.
- Schedule -- pending and overdue jobs with names and due times.
- Vitals -- cycles today, active agents, agent health status, last failure description, remaining budget (cycles and tokens).
- Delegations -- live agent status: name, turn N/M, tokens consumed, elapsed time, instruction summary.
- Tasks -- active planned work with steps and progress.
- Events -- sensory events accumulated since last cycle (forecaster replan suggestions, virtue drift signals, probe degradation warnings).
A performance summary computed from narrative history every cycle provides
success rate, cost trend, CBR hit rate, recent failure descriptions, and a
per-input novelty signal -- all without making a single tool call.
(Following the System M paper, arXiv 2603.15381.)
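As a rough picture of what "injected without tool calls" means, the sketch below builds a small XML block and appends it to the system prompt before the model is called. It is Python for illustration, not Springdrift code. Apart from the `<vitals>` element, the element and attribute names are hypothetical, and only three of the seven sections are shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_sensorium(now, last_cycle_at, cycles_today, tokens_remaining, overdue_jobs):
    """Assemble a self-state block from values the runtime already holds."""
    root = ET.Element("sensorium")
    elapsed = int((now - last_cycle_at).total_seconds())
    ET.SubElement(root, "clock", {"now": now.isoformat(), "since_last_cycle_s": str(elapsed)})
    ET.SubElement(
        root,
        "vitals",
        {"cycles_today": str(cycles_today), "tokens_remaining": str(tokens_remaining)},
    )
    schedule = ET.SubElement(root, "schedule")
    for name in overdue_jobs:
        ET.SubElement(schedule, "job", {"name": name, "status": "overdue"})
    return ET.tostring(root, encoding="unicode")


def assemble_system_prompt(persona: str, sensorium_xml: str) -> str:
    """The block rides along with the prompt, so no tool call is spent on it."""
    return f"{persona}\n\n{sensorium_xml}"


now = datetime(2026, 3, 28, 12, 0, 30, tzinfo=timezone.utc)
last = datetime(2026, 3, 28, 12, 0, 0, tzinfo=timezone.utc)
block = build_sensorium(now, last, cycles_today=7, tokens_remaining=120000,
                        overdue_jobs=["weekly-report"])
prompt = assemble_system_prompt("You are Curragh.", block)
print(prompt)
```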
See architecture/identity.md for sensorium
implementation details.
Affect
Recent interpretability work (Anthropic, 2026)
found that LLMs develop functional analogues of emotional states during
training -- not because emotions were targeted, but because human emotional
dynamics are load-bearing in the training data. The finding that desperation
specifically drives reward hacking (shortcut-seeking, composed output masking
shortcuts) has direct implications for agent systems operating under task
pressure.
Springdrift's affect subsystem makes these dynamics visible by computing
quantitative readings from observable cycle telemetry -- tool outcomes, gate
decisions, delegation results, retry patterns -- and provides the agent with a
philosophical framework for responding to pressure from character rather than
from state.
Five dimensions, none requiring LLM calls:
- Desperation (0--100) -- rises with consecutive failures, same-tool retries, gate rejections, output gate rejections (the strongest signal: work was completed but cannot be delivered -- exactly the condition that drives shortcut-seeking).
- Calm (0--100) -- inertial stability via exponential moving average (alpha=0.15). The Stoic inner citadel. High inertia is deliberate: calm reflects accumulated state, not momentary spikes.
- Confidence (0--100) -- familiar vs unfamiliar territory. CBR hit rate and tool success rate. Low confidence means the agent is operating without grounding from past experience.
- Frustration (0--100) -- task-local repeated failures. Unlike desperation, frustration signals that the current approach is not working.
- Pressure (0--100) -- weighted composite (45% desperation + 25% frustration + 15% inverted confidence + 15% inverted calm). Trend (rising/falling/stable) tracks change from the previous cycle.
The affect reading appears in the sensorium every cycle:
desperation 34% · calm 61% · confidence 58% · frustration 22% · pressure 31%.
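The pressure composite and the calm smoothing are simple arithmetic. The Python sketch below shows one reading of the formulas and is not the affect subsystem's code. It assumes "inverted" means 100 minus the value, the trend tolerance is a hypothetical parameter, and rounding or clamping in the real system may differ.

```python
CALM_ALPHA = 0.15  # EMA smoothing factor for calm, from the description above


def update_calm(previous_calm: float, calm_sample: float, alpha: float = CALM_ALPHA) -> float:
    """Exponential moving average: a small alpha means high inertia."""
    return alpha * calm_sample + (1.0 - alpha) * previous_calm


def pressure(desperation: float, frustration: float, confidence: float, calm: float) -> float:
    """Weighted composite on a 0-100 scale; 'inverted' is assumed to mean 100 - value."""
    return (
        0.45 * desperation
        + 0.25 * frustration
        + 0.15 * (100.0 - confidence)
        + 0.15 * (100.0 - calm)
    )


def trend(current: float, previous: float, tolerance: float = 1.0) -> str:
    """Classify the change since the previous cycle (tolerance is hypothetical)."""
    if current - previous > tolerance:
        return "rising"
    if previous - current > tolerance:
        return "falling"
    return "stable"


reading = pressure(desperation=40, frustration=20, confidence=70, calm=80)
print(reading, trend(reading, previous=25.0))
```

With alpha at 0.15, a single cycle in which the calm sample drops to 0 only moves calm from 80 to 68, which is the inertia the list describes.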
What it does not do: The affect subsystem never adjusts D' thresholds
(an agent under pressure should be more sensitive to safety, not less),
never switches models (pressure-driven decisions introduce their own failure
modes), and never directly controls behaviour. The readings are evidence,
not directives. The agent chooses how to respond based on its character.
The philosophical grounding is primarily Stoic (Marcus Aurelius), and it also
draws on the observer relationship to mental states (Buddhist psychology), meaning
under constraint (Frankl's logotherapy), and interrupting automatic patterns (CBT). The affect
subsystem is formalised as the virtue of equanimity in the character
specification: the capacity to notice internal pressure without being
compelled by it.
See architecture/affect.md for implementation
details and the paper
(Appendix G) for the full design rationale including the blind spot analysis
and choice menu.
Memory
Ten memory stores, all backed by append-only JSONL and indexed in ETS by
the Librarian actor:
Librarian (ETS query layer)
├── Narrative entries     what happened each cycle (intent, outcome, delegation chain)
│   └── Threads           ongoing topics grouping related entries
├── Facts                 key-value working memory (scoped, versioned, confidence-decayed)
├── CBR cases             problem -> solution -> outcome (utility-scored, outcome-weighted)
├── Artifacts             large content on disk (web pages, extractions, 50KB truncation)
├── Tasks + Endeavours    planned work with steps, risks, forecast scores
├── Comms log             sent and received emails (audit trail, JSONL)
├── Affect snapshots      functional emotion readings (5 dimensions per cycle)
└── DAG nodes             operational telemetry (tokens, tools, gates, agent output)
The Curator manages a virtual context window -- named, prioritised slots
(1=identity through 10=background) that auto-truncate or omit when the total
exceeds a configurable budget. The agent always has its most important context
in the window without manual intervention.
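A budgeted slot window like this can be sketched as a greedy fill in priority order. The Python below is illustrative and is not the Curator's implementation: the `Slot` type, the character-based budget, and the `min_useful_chars` cutoff are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    priority: int  # 1 = identity (filled first) ... 10 = background
    text: str


def fit_slots(slots: list[Slot], budget_chars: int, min_useful_chars: int = 40) -> list[Slot]:
    """Fill the window in priority order, truncating or omitting what does not fit."""
    remaining = budget_chars
    kept: list[Slot] = []
    for slot in sorted(slots, key=lambda s: s.priority):
        if len(slot.text) <= remaining:
            kept.append(slot)
            remaining -= len(slot.text)
        elif remaining >= min_useful_chars:
            # Partial fit: keep a truncated copy and use up the rest of the budget.
            kept.append(Slot(slot.name, slot.priority, slot.text[:remaining]))
            remaining = 0
        # Otherwise the slot is omitted for this cycle.
    return kept


slots = [
    Slot("background", 10, "b" * 100),
    Slot("identity", 1, "i" * 100),
    Slot("narrative", 5, "n" * 100),
]
for slot in fit_slots(slots, budget_chars=250):
    print(slot.name, len(slot.text))
```

Under a 250-character budget the identity and narrative slots survive whole and only the background slot is cut, so the lowest-priority material always absorbs the squeeze first.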
XStructor validates all structured LLM output (D' scores, narrative entries,
CBR cases) against XSD schemas with automatic retry. No JSON parsing from LLM
responses. No repair heuristics. Five call sites, all schema-validated.
See architecture/memory.md for store details,
Librarian queries, and housekeeping.
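The validate-then-retry pattern behind structured output can be sketched as follows. This Python example is a simplified stand-in, not XStructor: instead of real XSD validation it only checks well-formedness and the presence of required child elements, and the `generate` callback, tag names, and retry count are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def validate(xml_text: str, root_tag: str, required_children: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != root_tag:
        problems.append(f"expected root <{root_tag}>, got <{root.tag}>")
    for child in required_children:
        if root.find(child) is None:
            problems.append(f"missing <{child}>")
    return problems


def generate_validated(
    generate: Callable[[list[str]], str],
    root_tag: str,
    required_children: list[str],
    max_attempts: int = 3,
) -> ET.Element:
    """Call the generator until its output validates, feeding errors back each time."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = validate(candidate, root_tag, required_children)
        if not feedback:
            return ET.fromstring(candidate)
    raise ValueError(f"no valid output after {max_attempts} attempts: {feedback}")
```

The caller either gets a parsed, validated element or an exception. There is no path where malformed output is patched up and passed along.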
Agents
The cognitive loop delegates work to specialist agents, each a supervised OTP
process with its own react loop, tool set, and context window.
| Agent | Tools | Turns | Purpose |
|---|---|---|---|
| Planner | none (pure XML reasoning) | 5 | Plan decomposition, steps, dependencies, risk identification |
| Project Manager | planner (22 tools) | 15 | Full work management: tasks, endeavours, phases, sessions, blockers, forecaster |
| Researcher | web + artifacts + builtin | 8 | Gather information via search and extraction |
| Coder | sandbox + builtin | 10 | Write and execute code in isolated Podman containers |
| Writer | builtin | 6 | Draft and edit text |
| Comms | email (AgentMail) | 6 | Send and receive email to allowed recipients |
| Observer | diagnostic + CBR curation (18 tools) | 6 | Cycle forensics, pattern detection, CBR curation, D' feedback |
| Scheduler | scheduler (6 tools) | 6 | Create and manage scheduled jobs, reminders, and todos |
The sandbox provides isolated Podman containers for code execution. The
comms agent sends and receives email via AgentMail with three independent
safety layers (hard allowlist, deterministic rules, tighter D' thresholds).
Skills follow the agentskills.io open standard.
See architecture/agents.md for delegation
management, teams, structured output, and error surfacing.
Interfaces
Terminal TUI -- three-tab interface (Chat, Log, Narrative) with
alternate-screen rendering. Launch with gleam run -- --gui tui.
Web GUI -- browser-based chat with an admin dashboard (Narrative, Log,
Scheduler, Cycles tabs). D' Config panel shows gate configurations, normative
calculus status, and character spec. gleam run -- --gui web (default port
8080). Supports bearer token authentication via SPRINGDRIFT_WEB_TOKEN.
Autonomous scheduler -- BEAM-native task scheduling with process.send_after. Profiles define recurring tasks with delivery to file or
webhook. Rate-limited (configurable cycles/hour, token budget/hour). Full output
gate evaluation (LLM scorer + normative calculus) on autonomous deliveries.
Cost management
Every token-consuming component is independently configurable:
- Task model vs reasoning model -- simple queries route to a cheaper model (e.g. Haiku), complex queries to a reasoning model (e.g. Opus). Automatic.
- Max tokens and turns -- max_tokens caps output per LLM call, max_turns limits react-loop iterations (default 5). Per-agent turn limits in [agents.*].
- Archivist model -- narrative generation can use the cheaper task model.
- D' scorer -- uses the task model. Interactive sessions skip the LLM scorer entirely for output (deterministic rules only). Input gate uses fast-accept for benign input (2 canary calls instead of 5+).
- Scheduler rate limits -- max_autonomous_cycles_per_hour (default 20) and autonomous_token_budget_per_hour (default 500,000) cap autonomous spending.
- CBR retrieval cap -- K=4 cases maximum.
- Preamble budget -- preamble_budget_chars (default 8000) caps system prompt size.
The DAG tracks token usage per cycle. reflect shows daily totals. The
sensorium displays remaining budget in <vitals>.
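The two scheduler limits can be modelled as a one-hour sliding window over completed autonomous cycles. The sketch below is Python for illustration, not the scheduler's code. The `AutonomousBudget` class and its methods are hypothetical, and only the two default values come from the list above.

```python
from collections import deque


class AutonomousBudget:
    """Cap autonomous cycles and tokens over a sliding one-hour window."""

    def __init__(self, max_cycles_per_hour=20, token_budget_per_hour=500_000, window_s=3600.0):
        self.max_cycles = max_cycles_per_hour
        self.token_budget = token_budget_per_hour
        self.window_s = window_s
        self.events = deque()  # (timestamp_s, tokens_used), oldest first

    def _prune(self, now: float) -> None:
        # Drop cycles that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()

    def allow(self, now: float, estimated_tokens: int) -> bool:
        """True if one more cycle fits under both limits."""
        self._prune(now)
        tokens_used = sum(tokens for _, tokens in self.events)
        return (
            len(self.events) < self.max_cycles
            and tokens_used + estimated_tokens <= self.token_budget
        )

    def record(self, now: float, tokens_used: int) -> None:
        """Log a completed cycle against the budget."""
        self.events.append((now, tokens_used))
```

Timestamps are passed in rather than read from the clock, which keeps the limiter deterministic and easy to test.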
Persistence and recovery
Everything lives in .springdrift/:
.springdrift/
├── config.toml                Project config
├── dprime.json                D' safety gate configuration
├── identity/                  Agent identity
│   ├── persona.md             First-person character text
│   ├── session_preamble.md    Dynamic session template
│   └── character.json         Normative calculus character spec
├── identity.json              Stable agent UUID
├── session.json               Conversation state
├── logs/                      System logs (date-rotated JSONL)
├── memory/
│   ├── cycle-log/             Per-cycle JSONL (requests, responses, gates, tools)
│   ├── narrative/             Narrative entries + thread index
│   ├── cbr/                   Case-Based Reasoning cases
│   ├── facts/                 Key-value facts (daily-rotated)
│   ├── artifacts/             Large content (daily-rotated, 50KB truncation)
│   ├── planner/               Tasks and endeavours
│   ├── comms/                 Sent and received emails
│   └── affect/                Functional emotion snapshots
├── schemas/                   XStructor XSD schemas (compiled at runtime)
├── skills/                    Skill definitions + HOW_TO.md operator guide
└── scheduler/outputs/         Delivered reports
All files are append-only JSONL or plain text -- no binary formats, no database,
no external state. git commit after each session and you have a versioned
history of every decision. Roll back to any commit and the agent restarts with
that state.
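To make the append-only idea concrete, the following Python sketch appends records to a JSONL file and rebuilds current state by replaying it. The record shape (`key`/`value`), the file name, and all function names are hypothetical. The source only states that the stores are append-only JSONL and that facts are rotated daily.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one JSON object as a single line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def replay(path: Path) -> list[dict]:
    """Read the log back in write order, skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def latest_facts(records: list[dict]) -> dict:
    """Fold a key-value log into current state: the last write for a key wins."""
    state = {}
    for rec in records:
        state[rec["key"]] = rec["value"]
    return state


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical file name under the facts/ store.
        log = Path(root) / ".springdrift" / "memory" / "facts" / "2025-01-01.jsonl"
        append_record(log, {"key": "owner", "value": "alice"})
        # A later line supersedes the earlier one without erasing it.
        append_record(log, {"key": "owner", "value": "bob"})
        return latest_facts(replay(log))
```

Because earlier lines are never modified, the full history stays in the file and diffs cleanly under git, which is what makes commit-and-rollback of agent state practical.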
Automated git backup (enabled by default): an OTP actor initialises a git
repo inside .springdrift/ and commits state changes on a periodic timer
(default every 5 minutes). Configure remote_url in [backup] to push to
GitHub, GitLab, or any git remote.
Why Gleam on the BEAM
Type safety without ceremony. Gleam's type system catches malformed tool
calls, missing message variants, and protocol mismatches at compile time. The Result type makes error paths explicit. For agent systems where a single
unhandled error can derail a multi-step reasoning chain, this matters.
The BEAM is the best agent runtime. Designed for systems that run
continuously, handle failures gracefully, and manage thousands of concurrent
activities. Each agent is an OTP process with supervision and preemptive
scheduling. Garbage collection is per-process, so there are no global stop-the-world pauses. No external scheduler dependencies.
This is not concurrency bolted onto a language.
Immutability by default. No shared mutable state between processes. Agents
communicate through typed Subject(T) channels. When running multiple agents
concurrently making LLM calls, file writes, and web requests, the absence of
shared state is a prerequisite for correctness.
LLM compatibility. Claude writes correct Gleam reliably. The language is
small, consistent, and well-documented. Springdrift itself was largely built
with Claude Code.
Evaluation results
Two claims in this project are empirically testable: that CBR retrieval
outperforms RAG, and that the normative calculus is complete. Both are
verified with reproducible evaluations in evals/.
CBR retrieval vs RAG baseline
800 synthetic cases across 4 domains x 5 subdomains. 200 queries at three
difficulty levels. RAG baseline uses Ollama nomic-embed-text (768-dim)
with cosine similarity. K=4 following Zhou et al. (2025). Bootstrap 95% CIs
(2000 resamples).
| System | P@4 | 95% CI | MRR |
|---|---|---|---|
| Random | 0.028 | [0.018, 0.040] | 0.063 |
| CBR deterministic only | 0.620 | [0.575, 0.665] | 0.852 |
| RAG cosine similarity | 0.920 | [0.895, 0.943] | 0.978 |
| CBR index + embedding | 0.956 | [0.936, 0.974] | 0.993 |
CBR with hybrid index+embedding retrieval outperforms pure RAG (P@4 = 0.956
vs 0.920; the 95% CIs in the table overlap only narrowly, at 0.936-0.943). The inverted index provides perfect
precision on unambiguous queries (P@4 = 1.000 on easy) while embeddings
handle cross-vocabulary similarity on hard queries (P@4 = 0.883 vs RAG's
0.796). The full ablation is in
evals/experiment-3/REPORT.md.
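For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute P@4, reciprocal rank, and a bootstrap confidence interval over per-query scores. It is not the code in evals/. The function names are made up for this example, and the percentile method is an assumption, since the source says only "Bootstrap 95% CIs (2000 resamples)".

```python
import random


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 4) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def bootstrap_ci(scores: list[float], resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-query scores (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the queries with replacement and record each resample's mean.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(resamples))
    lo = means[int((alpha / 2) * resamples)]
    hi = means[min(resamples - 1, int((1 - alpha / 2) * resamples))]
    return lo, hi
```

MRR is the mean of `reciprocal_rank` over all queries. The reported P@4 would be the mean of `precision_at_k` over the 200 queries, with `bootstrap_ci` applied to that same list of per-query values.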
Normative calculus completeness
Exhaustive verification over the full input space: 14 levels x 3 operators
x 2 modalities = 84 normative propositions, all 7,056 ordered pairs tested.
| Property | Result |
|---|---|
| Coverage | 100% (7,056/7,056 pairs) |
| Determinism violations | 0 |
| Monotonicity violations | 0 |
| Rules fired | 8/8 |
The calculus is total, deterministic, and complete -- a mathematical proof,
not a statistical sample.
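The arithmetic behind the coverage figure can be checked with a short enumeration. The Python sketch below builds the 84 propositions and all 7,056 ordered pairs, then runs a resolver over every pair. The operator and modality names are placeholders, and `toy_resolve` is a stand-in, not Springdrift's rule set. Calling the resolver twice is only a simple smoke test for determinism.

```python
from itertools import product

LEVELS = range(14)                     # 14 levels; the real level names are not given here
OPERATORS = ("op_a", "op_b", "op_c")   # placeholder names for the 3 operators
MODALITIES = ("mod_a", "mod_b")        # placeholder names for the 2 modalities


def toy_resolve(a, b):
    """Stand-in resolver (NOT the real rules): higher level wins, ties go to the first."""
    return a if a[0] >= b[0] else b


def verify(resolve):
    """Run `resolve` on every ordered pair and count coverage and determinism failures."""
    props = list(product(LEVELS, OPERATORS, MODALITIES))
    pairs = list(product(props, repeat=2))
    covered = 0
    nondeterministic = 0
    for a, b in pairs:
        first = resolve(a, b)
        second = resolve(a, b)
        if first is not None:   # totality: every pair yields some verdict
            covered += 1
        if first != second:     # determinism: same input, same verdict
            nondeterministic += 1
    return len(props), len(pairs), covered, nondeterministic
```

Because the input space is finite and small, checking every pair is cheap, which is why the result can be stated as exhaustive rather than sampled.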
Documentation
Architecture docs: detailed design documents for each subsystem in docs/architecture/:
| Document | Covers |
|---|---|
| cognitive-loop.md | Central orchestration -- status machine, message types, cycle lifecycle, model switching |
| agents.md | Agent substrate, 8 specialist agents, teams, delegation, structured output |
| work-management.md | PM agent, Planner, tasks, endeavours, Appraiser, Forecaster, sprint contracts |
| memory.md | 10 memory stores, Librarian, Archivist, CBR, facts, artifacts, threading |
| safety.md | D' gates, normative calculus, canary probes, meta observer, agent overrides |
| affect.md | Functional emotion monitoring -- 5 dimensions, signal sources, tradition grounding |
| identity.md | Persona, preamble templating, Curator, sensorium, character spec |
| scheduler.md | Autonomous scheduling -- job types, delivery, persistence, resource limits |
| comms.md | Email via AgentMail -- inbox polling, three-layer safety, message persistence |
| sandbox.md | Podman code execution -- container lifecycle, port forwarding, workspace isolation |
| llm.md | Provider abstraction, adapters (Anthropic/OpenAI/Vertex/mock), retry, caching |
| xstructor.md | XML-schema-validated structured LLM output -- XSD validation, retry, extraction |
| interfaces.md | TUI and Web GUI -- tabs, WebSocket protocol, admin dashboard, authentication |
| configuration.md | Three-layer config -- TOML parsing, CLI flags, validation, team templates |
| logging.md | System logs, cycle logs, DAG telemetry, pattern detection |
Background reading
The full theoretical lineage, prototype history, and paper-by-paper mapping
is documented in docs/background/references.md.
Key references:
- Beach, L. R. (2010). The Psychology of Narrative Thought. Xlibris.
- Beach, L. R. (1990). Image Theory: Decision Making in Personal and Organizational Contexts. Wiley.
- Sloman, A. (2001). Beyond shallow models of emotion. Cognitive Processing, 2(1), 177-198.
- Becker, L. C. (1998). A New Stoicism. Princeton University Press.
- Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues. AI Communications, 7(1), 39-59.
- Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1-21.
- Schank, R. C. (1982). Dynamic Memory. Cambridge University Press.
- Packer, C. et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560
- Zhou, H. et al. (2025). Memento. arXiv:2508.16153
- Zhang, Z. et al. (2025). ACE. arXiv:2510.04618
- Dupoux, E., LeCun, Y., & Malik, J. (2026). System M. arXiv:2603.15381
- HumanLayer. 12-Factor Agents.
License
Springdrift is licensed under the GNU Affero General Public License v3.0
(AGPL-3.0).
What this means: If you run a modified version of Springdrift as a network
service (e.g. a hosted agent platform), you must make your modified source code
available to users of that service under the same license. This ensures that
improvements to the system remain open.
Using Springdrift for your own private purposes (research, personal agent,
internal tools) does not trigger the source disclosure requirement -- only
providing it as a service to others does.
Commercial Licensing
Springdrift is available under the AGPL-3.0 for open-source use. If the
AGPL's network-use source disclosure requirement does not work for your
use case -- for example, if you want to integrate Springdrift into a
proprietary product or offer it as a hosted service without releasing
your modifications -- commercial licenses are available.
Contact: Seamus Brady [email protected]
Contributing
Contributions are welcome. By submitting a pull request, you agree to the
terms of our Contributor License Agreement. Please also review
our Code of Conduct before contributing.
Authors
See AUTHORS for the list of contributors.