Yume (夢)

Yume is a programmable, explicit world model — built on Godot 4.6.1.
A world's entities and rules are written as pure JSON; a small interpreter
advances that world tick by tick; Godot projects the resulting state to pixels,
audio, HUD, or text. The engine ships a fixed set of primitives + interpreter
— no game-specific GDScript; you describe a world, never edit the engine
(Invariant #1 / #8; ADR 0021). Games are one use of this — not the only one.

Why "Yume"? 夢 (yume) is Japanese for "dream." You describe a world
in plain language and it materializes into something runnable — imagine it,
and it exists — without writing any per-world code. The name captures that
declarative "dream it into being" quality.

🤖 Built by Claude, for Claude

This repository was written entirely by Claude (Anthropic's AI) and is
designed to be read and operated by Claude — its conventions, build steps,
and run workflow live in CLAUDE.md and .claude/ for exactly that purpose.

The recommended way to use Yume is to open it in Claude Code
and ask, in plain English, for what you want — "generate a game about X",
"run the tests", "record a headless multiplayer video". Claude knows the
fiddly invocation details (syncing, --path ., asset imports, the visual-QA
gate). You can run the commands yourself, but it's easy to get them wrong;
letting Claude drive is the intended, lower-friction path. See
INSTALLATION.md to set up.

Status: pre-1.0 / experimental (0.x — the 7 primitives + JSON schema
aren't frozen yet; 1.0 will freeze the contract). Last updated: 2026-06-06.

Demos

Describe a world in plain English → an LLM writes the JSON → a fixed engine runs
it on Godot. A few things built (or rendered) this way:

🔫 `doomarena3d` — a first-person arena shooter

[Image: doomarena3d, first-person arena shooter]

Movement, shooting, enemies, deaths — all JSON rules. No game-specific engine code.

🏎️ An arcade racer, generated end-to-end from a prose pitch

[Image: arcade racer generated via /yume-design]

/yume-design "<pitch>" — code-drawn, no asset generation needed.

🌲 A walkable 3D scene — the text → 3D world pipeline

[Image: walkable 3D forest scene]

Built with /yume-create-scene.

🗿 Scene-generation algorithm (experimental)

[Image: scene-generation algorithm, text to placed 3D world]

Text → semantic map → placed 3D world. Rough, but end-to-end — an active research direction.

🧩 Sokoban — a committed 2D demo (runs on a fresh clone, no API keys)

[Image: sokoban 2D puzzle demo]

🎥 Camera-orbit trajectory — the 3D rendering / camera system

[Image: camera orbit trajectory]

🌐 Server-authoritative multiplayer (in development)

[Image: server-authoritative multiplayer]

What Yume is for

Yume is a programmable explicit world model, not just a game engine. A world
model is a transition function f(state, action) → next_state; Yume lets you
write f as JSON and run it:

JSON is the world-spec language — entities (the state) + rules
({trigger, query, effect}, the transition function).
The runtime is the interpreter — it executes that spec, ticking state forward.
Godot is the projection function — state → pixels / audio / HUD / text.
(Per ADR 0021: expose Godot, don't reimplement it.)
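
To make the f(state, action) → next_state framing concrete, here is a minimal Python sketch of an interpreter stepping a world that is described as plain data. The rule mirrors the {trigger, query, effect} triple above, but the field names inside it (`tag`, `verb`, `key`, `amount`) and the `step` function are illustrative assumptions, not Yume's actual schema or engine API.

```python
import copy

# Hypothetical world spec: field names are illustrative, not Yume's real schema.
WORLD = {
    "entities": [
        {"id": "player", "tags": ["actor"], "state": {"x": 0}},
        {"id": "rock", "tags": ["static"], "state": {"x": 5}},
    ],
    "rules": [
        {
            "trigger": {"type": "input", "action": "move_east"},
            "query": {"tag": "actor"},
            "effect": {"verb": "state_add", "key": "x", "amount": 1},
        }
    ],
}


def step(world, action):
    """Pure transition function: returns the next state, leaves the input untouched."""
    nxt = copy.deepcopy(world)
    for rule in nxt["rules"]:
        trig = rule["trigger"]
        if trig["type"] != "input" or trig["action"] != action:
            continue  # trigger does not match this action
        for ent in nxt["entities"]:
            if rule["query"]["tag"] in ent["tags"]:  # query selects entities
                eff = rule["effect"]
                if eff["verb"] == "state_add":  # effect changes the selected state
                    ent["state"][eff["key"]] += eff["amount"]
    return nxt


after = step(WORLD, "move_east")
print(after["entities"][0]["state"])  # the player moved; the rock and WORLD did not
```

Because `step` never mutates its input, the same (state, action) pair always yields the same next state, which is the property the rest of this README leans on.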

Games are one downstream consumer. The same substrate serves:

Use	How
🎮 Games	the Godot projection — a playable build
🤖 RL / agent-evaluation testbeds	deterministic, seedable, gym-like stepping (ADR 0060)
🏞️ Scene / world generation	prose → 3D scene pipelines (`/yume-create-scene`)
🧠 Training data for neural world models	roll a JSON world out, record `(state, action, next_state)` trajectories, train an implicit model (Dreamer/Genie-style) that approximates the same `f` at scale

That last row is the thesis: Yume aims to be the clean, authorable explicit
substrate that bridges to the implicit (neural) world-model world — interpret
it directly, and use it as a faucet of reproducible training data. The seven
primitives (Entity / Tag / Rule / Trigger / Effect / Query / Relation) are the
minimal universal vocabulary for describing discrete-time worlds — which is
why the engine refuses game-specific verbs (no damage / heal / attack;
those are content, expressed by composing primitives).

📖 Full vision: docs/guideline/00_what_yume_is.md.

Vision — a programmable, explicit world model

"Everything is a world model if you squint hard enough."
— Zihan "Zenus" Wang (@wzenus)

Two trends motivate Yume:

Programming is becoming unstructured. With LLMs (and VLMs like Qwen) you
increasingly describe what you want in natural language and the model writes
the code. The interface to software is shifting from syntax to intent.
World-model research is accelerating. Neural world models (Genie-style
generative-interactive-environment models, Dreamer-style latent models) are
trained on large action-labelled trajectory datasets (frames paired with the
actions that produced them). That data is expensive: studios pay annotators to
play games and label frames, or assemble reference→video corpora by hand.

Yume sits at the convergence: a programmable world model. Instead of learning
a world's dynamics implicitly from millions of frames, you author them
explicitly — set the physics, the rules, the goal — the way you'd program a game
(or, in a robotics framing, set up the sensors and a goal). The catch is that "just
write it in prose" is too unstructured to execute, so Yume's substrate is
structured JSON, and an LLM (Claude) is the compiler from prose → JSON. That
makes the world model programmable (authorable in language) and explicit
(it's code + a deterministic game engine you can read, seed, and diff — not opaque
weights).

Why "explicit"? Because Yume's transition function f(state, action) → next_state is literally code on a deterministic engine. (Half-joking,
half-serious: our own universe runs a small set of fixed physical laws,
deterministic given initial conditions — a lot like a game engine. Yume just
makes that analogy authorable.)

The two world-model worlds are complementary, not rival:

Explicit (Yume) — clean, authorable, deterministic; cheap to roll out and to
label (every transition is exact, by construction).
Implicit (neural) — scales to messy, photoreal, open-ended dynamics no one
wants to hand-author.

Yume aims to be the explicit substrate that bridges to the implicit one:
interpret a JSON world directly and use it as a reproducible faucet of
(state, action, next_state) training data.
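
As a rough illustration of the "faucet" idea, the sketch below rolls a world forward under a seeded random policy and logs each transition. `toy_step` is a stand-in transition function and `record_rollout` is a hypothetical helper; neither is part of Yume, and the JSONL layout is only one plausible choice.

```python
import json
import random


def toy_step(state, action):
    """Stand-in transition function (an assumption, not Yume's engine)."""
    delta = {"left": -1, "right": 1}.get(action, 0)
    return {"tick": state["tick"] + 1, "x": state["x"] + delta}


def record_rollout(step_fn, initial_state, n_steps, seed):
    """Roll a world forward under a seeded random policy and log transitions."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    state = initial_state
    trajectory = []
    for _ in range(n_steps):
        action = rng.choice(["left", "right", "wait"])
        next_state = step_fn(state, action)
        trajectory.append(
            {"state": state, "action": action, "next_state": next_state}
        )
        state = next_state
    return trajectory


run_a = record_rollout(toy_step, {"tick": 0, "x": 0}, n_steps=5, seed=42)
run_b = record_rollout(toy_step, {"tick": 0, "x": 0}, n_steps=5, seed=42)
# One JSON object per line (JSONL) is a common layout for training data.
lines = [json.dumps(t, sort_keys=True) for t in run_a]
print(len(lines), run_a == run_b)
```

Two runs with the same seed produce identical trajectories, so every label is exact and the dataset can be regenerated rather than stored.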

Research directions

Yume is built to assist game development and to give the research community a
concrete substrate to push on. Open questions it lets you probe:

Does an LLM need to generate a whole game — or just the world rules? Most
"AI makes a game" systems generate per-game engine code. Yume's bet — borne
out so far — is that the LLM only needs to understand the framework + the
world's rules and emit JSON; the fixed interpreter does the rest. Less to get
wrong, and far more that can be verified.
LLM-authored 3D structure. Some of Yume's Claude-authored meshes /
kit-of-parts composites already look good. With more training data, and tooling
like Blender's MCP bridge or models that emit procedural-generation code
(cf. 3DCodeBench), Claude may eventually generate solid 3D structure directly,
or excel at procedural generation. Yume is a place to try.
Scene generation. /yume-create-scene is an end-to-end
text→concept→semantic-map→placement→assets pipeline. It runs, but it is not
robust yet and has no aesthetic sense: a genuinely open research direction
(it's harnessed, not solved). A strong text→3D-world API (cf. WorldGen)
dropped into this slot would, I believe, complete the framework.
Assist, don't replace. Yume is not trying to replace game studios. Gameplay
is art; 3D is art; scene composition is art; story is art. A studio with
resources can plug in its own aesthetic (scene generation needn't be
Claude's), and a non-programmer can realize a game from a single strong idea in
any one of those dimensions — without ever touching engine internals.
A research substrate, broadly. Because a world is explicit JSON + a
deterministic engine, one artifact serves world-model training-data generation,
3D / scene / audio generation research, and game-AI (the ADR 0020
external-agent IPC seam + the ADR 0060 gym-like env). As scene / 3D /
asset-generation APIs mature, the framework's remaining gaps close from the
outside in.

This framework was written entirely by Claude; the author reviewed the code
but freely admits an LLM now manages a codebase this size faster than they could
alone. In the end the ideas matter most — Yume is a bet on where they point.

Features

What the framework can do today — JSON-authored unless noted; most rows map to
an ADR under docs/adr/.

Area	Capabilities
Core engine (primitives + interpreter — ADR 0001, 0021)	7 primitives (Entity / Tag / Rule / Trigger / Effect / Query / Relation); 8 trigger types (`tick`, `contact`, `signal`, `input`, `spawn`, `despawn`, `relation_changed`, `frame_tick`); ~60 effect verbs (`state_set/add/mul/clamp`, `spawn`/`remove`/`transform`, relations, tags, velocity ×4, arrays, `pathfind_to`, `raycast_hit`, zones, shell-lifecycle); formula evaluator (whitelisted math); deterministic fixed-rate tick + phase-flush scheduler; `@lib`/`$include` cross-game reuse (ADR 0027); rule macros; deterministic scatter/ring/grid placement.
Rendering & world	2D + 3D renderers; 8 camera modes (top-down / side-scroll / isometric / third- / first-person / fixed / free-cam, 2D & 3D); GLB meshes, kit-of-parts composites, MultiMesh decoration, trimesh static world; shaders-as-JSON; multi-biome ground from a semantic map; water surface; day/night; fog + procedural sky.
Gameplay system primitives (opt-in directors — mounted only if used)	party, schedule, class/occupation, zones, faction, tech-tree, dynasty, lifecycle/aging, vehicles, multi-actor + scripted-policy AI, pathfinding, procedural generation, grid/dynamic placement, animation.
"Complete game" layer	declarative screens/modals, save/load, tutorial overlays, settings schema, HUD-from-JSON, event→SFX audio, juice (shake/flash/particles).
Physics	Godot PhysicsServer + CharacterBody motion, AABB blockers, camera-relative WASD.
Generation pipelines (LLM-in-the-loop)	`/yume-design` is the single orchestrator — prose → full game, with `--scene` (generate a 3D world to play in) + `--with-assets` (AI textures/meshes) composing as three disjoint-ownership layers (World / Game / Assets — ADR 0067); no flags = key-free code-draw. Plus standalone authors: `/yume-create-scene`, `/yume-hud-author`, `/yume-screen-author`, `/yume-map-author`. 38 specialist skills; optional codegen + AI assetgen (textures via OpenAI/Gemini, meshes + rig via Tripo3D, shaders). See § Generation pipeline.
Networking & I/O (ADR 0060–0066)	deterministic gym-like stepping env (Python) + determinism oracle; lockstep; client-server (server-authoritative) with data-driven `net.json` replication; synced animation; record-then-replay smooth headless video of N-player synced sessions (`scripts/net_video.py` — normal window / `--linux` headless, GPU, grid, 60 fps).
Tooling & QA	25 static validators (sync gate); Playwright-style scenario tests; visual QA (Gemini + Claude vision); tech-director invariant gate.

Quick start

Set up once via INSTALLATION.md (Godot 4.6.1 + a Python
venv). Then — recommended — open the repo in Claude Code and just ask:

"Generate a game: a roguelike where vampires steal HP from light sources."
"Run the engine tests." · "Record a 4-player headless multiplayer video."

Claude runs the right pipeline and handles the invocation details.

Or drive it yourself (manual fallback; most demos are gitignored, so generate
or copy a demo_<name>/ first):

/yume-design "<your prose pitch>" --autonomous   # prose → full game
./scripts/play.sh <name>                         # play / capture (Windows binary)
./scripts/play.sh <name> --capture               # render a frame for visual QA

⚠️ Manual runs have sharp edges (the cd "$TEMPLATE_DST" && --path . trap, asset
--import, …) — see CLAUDE.md § "Running Godot". This is why letting Claude
drive is recommended.

The model in one paragraph

A game is a folder of JSON under godot/data/demo_<name>/. The engine loads
it into entities (dicts with tags/properties/state/position) and rules
({trigger, query, effect}). Each tick, the engine fires rules whose trigger
matches (a tick elapsed, a contact happened, an input arrived, a signal fired),
runs the rule's query to pick entities, and applies the effect (a primitive verb
like state_set / velocity_set / spawn). The renderer is a separate read-only
layer that draws entity state. Seven primitives: Entity, Tag, Rule, Trigger,
Effect, Query, Relation (ADR 0001).

Generation pipeline (prose → game)

/yume-design is the single orchestrator. It's a skill loaded into
Claude's own context (Tier 2.6 — no subagents); it walks specialist skills in
sequence, and each one writes one slice of the game's JSON. A game is composed
from three layers with disjoint file ownership, so they never clobber each
other (ADR 0067):

/yume-design "<pitch>"  [--scene]  [--with-assets]  [--autonomous]

 Phase W   (--scene)        yume-scene-class-catalog ─▶ compose_scene --no-shell
   WORLD                      └─ image-gen (gpt-image) ─▶ compose_world (3D scene:
                                 biome ground, water, heightmap, placed props)
                                 + compose_shell (walk/jump/sprint + camera + .tscn)
 Phases 1-4 (always)        game-designer ─▶ game-reviewer ─▶ game-planner ─▶
   GAME                     level-designer ─▶ [combining-logic / economy / story]* ─▶
                            systems-designer ─▶ content-designer ─▶
                            game-rules-designer ─▶ asset-designer
                            (+ soul skills: flavor-writer, audio, juice, lighting,
                             screen-flow, save-policy, tutorial — as the GDD needs)
 Phase A   (--with-assets)  tools.yume_assetgen  (gpt-image concepts + Tripo3D .glb,
   ASSETS                     patches visual.* in place)
 Phase QA  (always)         qa-tester ─▶ visual-designer / visual-tester ─▶
                            gdd-coverage-tracker  (GDD = contract; no silent drops)
 on demand                  tech-director  (gates engine / new-primitive / ADR changes)

* conditional — those run only if the GDD signals crafting, an economy, or a
story. Genre detection swaps in strict specialists where they exist
(shooter / merchant / racing designers + their reviewers) on top of the
generic game-designer / game-reviewer floor.

How the layers stay disjoint — World writes scene.json,
entities/auto_gen.json, assets/; Game writes entities/<slug>.json,
world/rules/*.json, hud.json, goals; Assets patches visual.* on existing
defs. The engine globs entities/*.json + world/rules/*.json and merges by
id, so the layers compose with no merge code. No flags = a key-free,
code-drawn single-player game (and the graceful-degrade target when API keys
are absent).
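
The merge-by-id composition described above can be sketched in a few lines of Python. This assumes each file holds a top-level JSON array of definitions with an `id` field, which is a guess about the file shape; `load_merged` is a hypothetical helper, not the engine's loader.

```python
import json
import tempfile
from pathlib import Path


def load_merged(game_dir, pattern):
    """Glob JSON files and merge their definitions into one dict keyed by id."""
    merged = {}
    for path in sorted(Path(game_dir).glob(pattern)):  # sorted: stable order
        for definition in json.loads(path.read_text()):
            # For a shared id, later files update earlier ones field by field.
            merged.setdefault(definition["id"], {}).update(definition)
    return merged


# Demo: two "layers" write separate files in a temporary game folder.
with tempfile.TemporaryDirectory() as tmp:
    entities = Path(tmp) / "entities"
    entities.mkdir()
    (entities / "auto_gen.json").write_text(
        json.dumps([{"id": "tree", "tags": ["prop"]}])
    )
    (entities / "mygame.json").write_text(
        json.dumps([{"id": "hero", "tags": ["actor"]}])
    )
    defs = load_merged(tmp, "entities/*.json")

print(sorted(defs))
```

Since each layer owns its own files, the loader only has to union them; no layer needs to know the others exist.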

Standalone authoring pipelines (outside `/yume-design`)

Slash command	Makes	Driver script(s)
`/yume-create-scene`	a walkable 3D scene / diorama (scene + walk shell)	`compose_scene` → `compose_world` + `compose_shell` (+ `compare_semantic` QA)
`/yume-hud-author`	`hud.json` fit to a wireframe	`wireframe_to_hud` (preprocess/postprocess)
`/yume-screen-author`	`screens.json` fit to a wireframe	`wireframe_to_screen`
`/yume-map-author`	a level's instances/patterns from a 2D sketch	`compose_map` + `wireframe_to_map`

Which skill drives which script

Most skills only write JSON — the orchestrator does the single sync + Godot
run. The skills that actually invoke a script:

Skill	Script(s) it runs
`yume-design`	`compose_scene --no-shell` (World), `tools.yume_assetgen` (Assets), `scripts/play.sh` (QA capture)
`yume-create-scene`	`compose_scene` → `compose_world` + `compose_shell`, `tools.yume_assetgen`, `compare_semantic`
`yume-scene-class-catalog`	authors the catalog `compose_world` consumes (no run)
`yume-asset-designer`	`tools.yume_assetgen`, `tools/validators/run_all.py`
`yume-hud-author` / `screen-author` / `map-author`	`wireframe_to_{hud,screen,map}`
`yume-qa-tester` · `playtest` · `visual-designer` · `visual-tester` · `lighting-designer`	`scripts/play.sh` (run + `--capture`)
`yume-tech-director`	invariant greps + the unit suite (gate, no content)
all other designers	write JSON only — no scripts

Pipeline-stability tiers live in .claude/rules/pipeline-stability.md: the 2D
HUD/screen pipelines are locked (ADR required to change the harness); the 3D
scene/world + level/map pipelines are active.

Repository layout

yume/
├── godot/                         ← the Godot project (engine + scaffolding)
│   ├── project.godot              ← autoloads: CaptureRunner, AudioBus, StdioStepDriver
│   ├── scenes/                    ← TRACKED scaffolding scenes only
│   │   ├── play.tscn              ← universal launcher (--game=<name>)
│   │   ├── test_main.tscn         ← engine unit tests
│   │   └── scenario_test.tscn     ← per-game scenario test runner
│   ├── scripts/engine/            ← THE ENGINE (see file map below)
│   ├── scripts/renderer_2d/       ← 2D entity renderer (read-only view)
│   ├── scripts/renderer_3d/       ← 3D entity renderer (read-only view)
│   └── data/
│       ├── lib/                   ← shared JSON libs (shapes, meshes, input, shaders) — TRACKED
│       └── demo_<name>/           ← per-game content — GITIGNORED, regenerated
├── scripts/
│   ├── play.sh                    ← run via the WINDOWS Godot binary (play/capture/tests)
│   └── run_linux.sh               ← run via the LINUX binary (headless tests + the stdio env)
├── tools/
│   ├── yume_env/                  ← ADR 0060 env: oracle.py, env.py (gym-like), test_env.py
│   ├── yume_codegen/ yume_assetgen/  ← optional authoring-time emitters
│   └── visual_layout/             ← text→2D/3D layout pipelines (HUD/screen/map)
├── docs/                          ← guideline/ (contract 30_*, architecture), adr/NNNN-*, per-game design
├── .claude/                       ← skills (yume-*) + path-scoped rules + plan/
└── CLAUDE.md                      ← full project instructions (read this for conventions)

Engine file map (`godot/scripts/engine/`)

The engine is organized into layers. Read top-to-bottom for a mental model:
core primitives → stores → the tick scheduler → effects → coordinators (boot +
per-frame subsystems) → directors (optional gameplay primitives) → ui →
io (input/output bridges) → qa.

`core/` — the seven primitives + the interpreter

File	Role
`entity.gd`	Primitive #1 Entity — id, tags, properties (static), state (dynamic), position.
`rule.gd`	Primitive #3 Rule — `{trigger, query, effect}`; `from_dict` parses + validates JSON.
`query.gd`	Primitive #6 Query — declarative entity matcher (tags/state/relations/`id`/radius).
`phase_scheduler.gd`	The tick loop: the input → decide → react phases + the effect write-buffer + flushes.
`effect_apply.gd`	Primitive #5 Effect — dispatches an effect dict to a handler in `effects/`.
`effects/effect_core.gd`	Foundational verbs: `state_set/add/mul/clamp`, `spawn`, `remove`, `relate`, …
`effects/effect_motion.gd`	Motion + spatial verbs: `velocity_set`, `velocity_add_relative`, queries.
`effects/effect_actor.gd`	Actor-control + party verbs (switch actor, etc.).
`effects/effect_shell.gd`	Engine↔shell boundary verbs: `transition_level/screen`, `save/load_state`, toasts.
`effects/effect_adr_extensions.gd`	Later-ADR primitives (build/place, tech, zones, …).
`effects/effect_resolution.gd`	Shared target/value resolution for effect handlers.
`formula.gd`	Evaluates `"a.state.x + 10"`-style formula strings (whitelisted math).
`world.gd`	Top-level orchestrator. Owns canonical state; runs `_process` (live) + `advance_one_tick`.
`engine_error.gd` · `scripted_policy.gd`	Structured engine errors · in-process scripted-JSON actor AI (ADR 0018).
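
To illustrate what "whitelisted math" can mean for a formula evaluator like the one `formula.gd` is described as, here is a small Python sketch built on the standard `ast` module. It is a different technique from Godot's Expression class and is not the engine's code; the allowed operators and functions are an arbitrary choice for the example.

```python
import ast
import operator

# Whitelist: only these operators and functions may appear in a formula.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_FUNCS = {"min": min, "max": max, "abs": abs}


def evaluate(formula, scope):
    """Evaluate an arithmetic formula against a scope of nested dicts."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return scope[node.id]
        if isinstance(node, ast.Attribute):  # a.state.x -> scope["a"]["state"]["x"]
            return walk(node.value)[node.attr]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        # Anything not explicitly allowed above is rejected.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(formula, mode="eval"))


print(evaluate("a.state.x + 10", {"a": {"state": {"x": 5}}}))  # 15
```

The evaluator walks the parsed tree instead of calling `eval`, so a formula can only do what the whitelist names.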

`stores/` — indices the query/relation system reads

relation_store.gd (Primitive #7 Relation — typed directed edges) ·
spatial_index.gd (grid-bucket hash for radius queries) ·
zone_store.gd (aggregated zone state, ADR 0031).
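
A grid-bucket hash of the kind `spatial_index.gd` is described as can be sketched as follows. This is a generic 2D uniform-grid index written for illustration; the class and method names are assumptions and do not mirror the GDScript file.

```python
import math
from collections import defaultdict


class SpatialIndex:
    """Uniform-grid bucket hash: entities are binned by cell for radius queries."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> ids in that cell
        self.positions = {}  # id -> (x, y)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, entity_id, x, y):
        self.remove(entity_id)  # re-inserting moves the entity
        self.positions[entity_id] = (x, y)
        self.cells[self._cell(x, y)].add(entity_id)

    def remove(self, entity_id):
        pos = self.positions.pop(entity_id, None)
        if pos is not None:
            self.cells[self._cell(*pos)].discard(entity_id)

    def query_radius(self, x, y, radius):
        """Return ids within `radius`, checking only the overlapping cells."""
        min_cx, min_cy = self._cell(x - radius, y - radius)
        max_cx, max_cy = self._cell(x + radius, y + radius)
        hits = []
        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                for entity_id in self.cells.get((cx, cy), ()):
                    px, py = self.positions[entity_id]
                    if math.hypot(px - x, py - y) <= radius:
                        hits.append(entity_id)
        return sorted(hits)  # sorted so results are order-stable


index = SpatialIndex(cell_size=4.0)
index.insert("a", 1.0, 1.0)
index.insert("b", 3.0, 2.0)
index.insert("c", 20.0, 20.0)
print(index.query_radius(0.0, 0.0, 4.0))  # ['a', 'b']
```

Sorting the result is a deliberate choice here: set iteration order should not leak into anything that feeds a deterministic simulation.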

`coordinators/` — boot + per-frame subsystems (extracted from world.gd)

world_boot.gd (the boot sequence load_data() runs — mounts directors, loads
JSON) · world_loader.gd (boot-time JSON parsers) · spawn_manager.gd (entity
spawn pipeline) · physics_body_builder.gd (builds CharacterBody3D etc. from a
def) · character_body_runner.gd (per-entity motion: velocity→position) ·
ground_constraint.gd / ground_renderer.gd / grass_renderer.gd (floor +
biome ground) · actor_manager.gd (which entity receives input, ADR 0016) ·
level_transition_coordinator.gd / save_load_coordinator.gd /
world_reset_coordinator.gd (the "pending" pipelines GameShell drains) ·
variant_overlay.gd (per-scenario variant overlays).

`directors/` — optional gameplay primitives (mounted only if content uses them)

animation, lifecycle/aging, schedule, party, faction, class/occupation,
tech-tree, dynasty, chunk-streaming, lighting (day/night), multimesh batching.
Each is one ADR; each is a no-op when no content references it.

`ui/` — Godot UI built from JSON (reads `hud.json` / `screens.json`)

game_shell.gd (the per-frame game-loop host: drives camera, HUD, and the
pending pipelines in live play) · screen_flow.gd (declarative screen/modal
stack; sets screen_freeze_world) · control_factory.gd (JSON→Godot Control
tree) · overlay.gd (tutorial overlays) · settings_manager.gd ·
widgets/ (hud_builder, camera_director, minimap_widget,
viewmodel_director, win_lose_widget, nameplate_renderer, bounds_renderer).

`io/` — input/output bridges (where the outside world meets the engine)

File	Role
`input_registrar.gd`	Registers `ui/input.json` actions into Godot's InputMap; `poll()` reads action state → `scheduler.queue_input`. The one input seam.
`capture_runner.gd`	Autoload — `--capture-*`: render a frame / run a step-script, save PNG, quit.
`stdio_step_driver.gd`	Autoload — `--stdio-step`: the ADR 0060 env. stdin actions → 1 tick → stdout state (+ optional frame file).
`determinism_hash.gd`	`canonical_state_hash` — SHA-256 of canonical world state (the determinism contract).
`save_state.gd` · `audio_bus.gd`	Persistence (ADR 0010) · procedural SFX bus.

`qa/` — test/automation drivers (all share one execution path)

File	Role
`step_runner.gd`	The canonical scenario engine. Interprets `steps[]` verbs (press/hold/click/wait/tick/expect/…). Used by scenario tests, captures, AND the env.
`scenario_runner.gd`	`scenes/scenario_test.tscn` entry — loads `tests.json`, runs each scenario through StepRunner.
`screen_smoke_runner.gd`	Per-screen headless playability smoke.

`libs/` + `util/`

lib_resolver.gd (@lib.* / $include cross-game JSON reuse, ADR 0027) ·
macro_expander.gd (rule macros) · shape_lib/mesh_lib (code-draw libraries) ·
instance_patterns.gd (declarative scatter/ring/grid placement — deterministic
per-pattern RNG) · grid_snap · pathfinding · build_validators · vec3_util.
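
The "deterministic per-pattern RNG" idea can be sketched like this: each pattern derives its own seed, so editing one pattern does not reshuffle the others. The pattern fields (`id`, `kind`, `count`, `radius`, `cols`, `size`) and the seeding scheme are assumptions made for the example, not the format `instance_patterns.gd` reads.

```python
import math
import random
import zlib


def expand_pattern(pattern, world_seed):
    """Expand a placement pattern into 2D positions, deterministically."""
    # Per-pattern RNG: the seed comes from the world seed plus the pattern id.
    # crc32 is used because Python's built-in str hash varies between processes.
    seed = zlib.crc32(f"{world_seed}:{pattern['id']}".encode())
    rng = random.Random(seed)
    kind, count = pattern["kind"], pattern["count"]
    if kind == "ring":
        r = pattern["radius"]
        angles = [2 * math.pi * i / count for i in range(count)]
        return [(r * math.cos(a), r * math.sin(a)) for a in angles]
    if kind == "grid":
        cols = pattern["cols"]
        return [(float(i % cols), float(i // cols)) for i in range(count)]
    if kind == "scatter":
        w, h = pattern["size"]
        return [(rng.uniform(0, w), rng.uniform(0, h)) for _ in range(count)]
    raise ValueError(f"unknown pattern kind: {kind}")


trees = {"id": "trees", "kind": "scatter", "count": 3, "size": [10, 10]}
crates = {"id": "crates", "kind": "grid", "count": 4, "cols": 2}
print(expand_pattern(trees, 7) == expand_pattern(trees, 7))
print(expand_pattern(crates, 7))
```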

`renderer_2d/` + `renderer_3d/`

entity_sprite_2d.gd / entity_mesh_3d.gd — read-only views: each reads an
entity's state.position + visual block and draws it. Three-tier fallback
(authored mesh/sprite → shape-lib primitive → colored box). The renderer never
writes engine state.
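
The three-tier fallback reads naturally as a chain of lookups. The sketch below assumes a `visual` block with hypothetical `mesh`, `shape`, and `color` keys; the real block's field names may differ.

```python
def resolve_visual(visual, authored_assets, shape_lib):
    """Pick the first available tier: authored asset, shape-lib primitive, box."""
    asset = visual.get("mesh")
    if asset in authored_assets:
        return ("authored", asset)
    shape = visual.get("shape")
    if shape in shape_lib:
        return ("shape", shape)
    # Last resort: a colored box, so a missing asset is visible rather than fatal.
    return ("box", visual.get("color", "#ff00ff"))


assets = {"hero.glb"}
shapes = {"cone", "sphere"}
print(resolve_visual({"mesh": "hero.glb"}, assets, shapes))
print(resolve_visual({"mesh": "missing.glb", "shape": "cone"}, assets, shapes))
print(resolve_visual({"color": "#00ff00"}, assets, shapes))
```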

How a game runs — the flows

Boot

scenes/play.tscn (or <game>_3d.tscn)  →  World._ready()
  → (auto_start) start() → load_data()
      → world_boot.run():  mount directors · register ui/input.json actions
        · load entities/ + world/rules/ + scene.json + screens.json
        · spawn initial_instances (+ expand instance_patterns)
  → GameShell + ScreenFlow + 12 other directors mounted as siblings of World

Per frame (live play) — `world.gd::_process(delta)`

_poll_input()                  → InputRegistrar.poll → queue this frame's input
_ground_constraint.apply()
scheduler.fire_frame_tick()    → ADR 0050 per-frame content rules
if _tick_due(delta):           → drain the real-time accumulator (fixed sim rate)
    advance_one_tick()
GameShell (separate _process)  → drains pending pipelines (level/save/reset) + camera + HUD
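
The accumulator step in the frame loop above is the standard fixed-timestep pattern. A minimal Python sketch, assuming a hypothetical `FixedTicker` class (the real logic lives in `world.gd` and may differ in detail):

```python
class FixedTicker:
    """Accumulate variable frame time and emit fixed-size simulation ticks."""

    def __init__(self, tick_seconds):
        self.tick_seconds = tick_seconds
        self.accumulator = 0.0
        self.tick_count = 0

    def process(self, delta, advance_one_tick):
        """Call once per rendered frame with that frame's elapsed seconds."""
        self.accumulator += delta
        ticks = 0
        # Drain the accumulator: a long frame may run several ticks,
        # a short frame may run none.
        while self.accumulator >= self.tick_seconds:
            self.accumulator -= self.tick_seconds
            advance_one_tick()
            self.tick_count += 1
            ticks += 1
        return ticks


ticker = FixedTicker(tick_seconds=0.25)
fired = []
counts = [ticker.process(d, lambda: fired.append(1)) for d in (0.125, 0.125, 0.75)]
print(counts)  # [0, 1, 3]
```

The simulation therefore advances in identical steps regardless of frame rate, which is what lets a live session and a scripted run share one tick body.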

Per tick — `world.gd::advance_one_tick()` (the canonical tick body)

_tick_count++                              (ws._tick — read by velocity auto-reset)
actor_manager.tick_policies()              AI (ADR 0018), if an actor_manager exists
scheduler.tick()                           ← the phased rule loop (below)
lifecycle/chunk directors
trajectory + determinism-hash log          (opt-in)

The phase loop — `phase_scheduler.gd::tick()`

input  phase: fire rules whose trigger == {input, action} for queued actions → flush
decide phase: fire tick rules (interval) → flush
react  phase: fire signal/contact rules → flush

Effects don't mutate state directly; they go to a write-buffer that flushes
at phase boundaries, so a later phase sees the earlier phase's results (the
blocker-pattern guarantee, tech-director Invariant #9).
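
A toy version of the write-buffer makes the visibility rule easy to see: writes made during a phase are invisible to other rules in that phase and become visible to the next one. The class below is an illustrative sketch with assumed names, not a port of `phase_scheduler.gd`.

```python
class PhaseScheduler:
    """Effects queue into a write-buffer; state changes only at flush."""

    PHASES = ("input", "decide", "react")

    def __init__(self, state):
        self.state = state
        self.buffer = []  # pending (key, value) writes
        self.rules = {phase: [] for phase in self.PHASES}

    def add_rule(self, phase, rule_fn):
        self.rules[phase].append(rule_fn)

    def write(self, key, value):
        self.buffer.append((key, value))  # deferred: not visible yet

    def flush(self):
        for key, value in self.buffer:
            self.state[key] = value
        self.buffer.clear()

    def tick(self):
        for phase in self.PHASES:
            for rule_fn in self.rules[phase]:
                rule_fn(self)  # rules read self.state, write via self.write
            self.flush()  # phase boundary: later phases see these writes


sched = PhaseScheduler({"door_open": False})
sched.add_rule("input", lambda s: s.write("door_open", True))
sched.add_rule("input", lambda s: s.write("same_phase_saw", s.state["door_open"]))
sched.add_rule("decide", lambda s: s.write("next_phase_saw", s.state["door_open"]))
sched.tick()
print(sched.state)
```

The second input rule still sees the door closed, while the decide rule sees it open, so rule order within a phase cannot change what any rule reads.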

The input flow (live vs scripted vs Python env)

All three input sources converge at the polled action state → InputRegistrar.poll
→ scheduler.queue_input. They are NOT unified as synthesized InputEvents —
the seam is one level below events (the Input.is_action_pressed / _just_pressed
state that real keys and Input.action_press() both set). See ADR 0060 Phase 1
notes for why (parse_input_event needs fragile frame-timing in the synchronous
stepping loop; action_press is immediate).

 (A) LIVE PLAYER
   physical key ──► Godot InputEventKey ──► updates ┐
                                                     │
 (B) SCENARIO / CAPTURE  (StepRunner)                │   ┌───────────────────────┐
   {press/hold:"X"} → Input.action_press("X") ───────┼──►│ Godot action STATE     │
                                                     │   │ is_action_pressed /    │
 (C) PYTHON ENV  (stdio_step_driver)                 │   │ is_action_just_pressed │
   stdin {"actions":["X"]} → Input.action_press("X")─┘   └──────────┬────────────┘
                                                                     ▼
   world.gd::_poll_input() [live]  /  StepRunner._drive_poll() [B,C]
        → InputRegistrar.poll(scheduler, actor_id, press_list, hold_list, …)
              press-edge action → is_action_just_pressed     hold action → is_action_pressed
                                                                     ▼
        scheduler.queue_input("X", {"actor": <resolved actor id>})   (deduped per tick)
                                                                     ▼
   scheduler.tick() input phase: rules with trigger {input, action:"X"} fire
                                                                     ▼
        rule.query picks entities (binds self/actor) → rule.effect runs
                                                                     ▼
        e.g. velocity_set(self, …)  →  character_body_runner integrates velocity→position

Two JSON maps make this game-agnostic:

data/<game>/ui/input.json: key → action name (+ edge: press|hold).
InputRegistrar registers these into Godot's InputMap and into the engine's
input_actions_press / input_actions_hold poll lists.
data/<game>/world/rules/*.json: action → effect, via a rule
{trigger:{type:"input", action:"X"}, query:{…}, effect:[…]}.

So the engine is the interpreter: it never knows what "move_north" means — it
just routes the action to whatever rule the JSON wired to it.

Why three drivers, one seam: live play drives ticks from _process (per
frame); scenario/capture/env disable _process and drive ticks themselves
(sole driver), so they set the action state via action_press() and call the
same poll(). Below the poll seam, a scripted key is identical to a real one.
(Caveat: scripted input does not flow through Godot's _input/_gui_input
event pipeline — UI buttons are driven by StepRunner's click verb instead.)
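
The single seam can be modelled as a poller over shared action state: whoever sets the state (a real key or a scripted driver), the poll result is the same. This Python sketch stands in for Godot's `is_action_pressed` / `is_action_just_pressed` with two sets; `InputPoller` and its methods are hypothetical names.

```python
class InputPoller:
    """Poll action state once per tick; press fires on the edge, hold while down."""

    def __init__(self, press_actions, hold_actions):
        self.press_actions = press_actions
        self.hold_actions = hold_actions
        self.down = set()  # actions currently held (the shared action state)
        self.prev_down = set()  # snapshot taken at the previous poll

    def action_press(self, action):  # called by a real key or a scripted driver
        self.down.add(action)

    def action_release(self, action):
        self.down.discard(action)

    def poll(self):
        queued = []
        for action in self.press_actions:
            if action in self.down and action not in self.prev_down:
                queued.append(action)  # just pressed since the last poll
        for action in self.hold_actions:
            if action in self.down and action not in queued:  # dedupe per poll
                queued.append(action)
        self.prev_down = set(self.down)
        return queued


poller = InputPoller(press_actions=["jump"], hold_actions=["move_north"])
poller.action_press("jump")
poller.action_press("move_north")
first = poller.poll()
second = poller.poll()
print(first, second)
```

A held "jump" fires once on its press edge, while "move_north" keeps firing every poll until it is released.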

The three runtimes

Runtime	Binary	Used for
`scripts/play.sh`	Windows Godot (`.exe` via WSL)	play, captures, unit/scenario tests — the authoritative path
`scripts/run_linux.sh`	native Linux Godot	fast headless tests + the env; no `/mnt/c` round-trip, no WSL-pipe flakiness
`tools/yume_env/env.py`	native Linux Godot	the ADR 0060 stepping env (Python orchestrator)

Both binaries are the same build (14d19694e); the Linux one exists because
Windows-via-WSL can't do reliable stdin/stdout piping or /dev/fd. See CLAUDE.md
for the exact invocation (always cd "$TEMPLATE_DST" + --path ., never an
absolute --path, which silently aborts).

The Python env (ADR 0060)

from tools.yume_env.env import YumeEnv
env = YumeEnv("demo_sokoban")          # or frames=True for pixel observations
env.reset()
obs = env.step(["move_north"])         # {"tick","hash","state"} (+ "frame" if frames=True)
env.close()

One stdin JSON line = exactly one tick = one stdout state line. Two separate
observe channels: state (JSON + canonical hash, on stdout — the evaluator's
"ground truth") and frame (raw RGBA8 to a file, opt-in — a future pixel
agent's "eyes"). Determinism is verifiable: tools/yume_env/oracle.py runs a
demo twice and diffs per-tick hashes.
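
The run-twice-and-diff check can be sketched without the engine. Below, `run_demo` is a stand-in rollout and `first_divergence` a hypothetical helper; the canonical encoding (sorted keys, compact separators) is one reasonable choice and may not match what `determinism_hash.gd` does.

```python
import hashlib
import json


def canonical_state_hash(state):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def run_demo(actions):
    """Stand-in for one env rollout; returns the list of per-tick hashes."""
    state = {"tick": 0, "x": 0}
    hashes = []
    for action in actions:
        state = {
            "tick": state["tick"] + 1,
            "x": state["x"] + (1 if action == "move_east" else 0),
        }
        hashes.append(canonical_state_hash(state))
    return hashes


def first_divergence(hashes_a, hashes_b):
    """Return the first index where two runs differ, or None if identical."""
    for index, (a, b) in enumerate(zip(hashes_a, hashes_b)):
        if a != b:
            return index
    if len(hashes_a) != len(hashes_b):
        return min(len(hashes_a), len(hashes_b))
    return None


script = ["move_east", "wait", "move_east"]
print(first_divergence(run_demo(script), run_demo(script)))  # None
```

Comparing hashes per tick, rather than only the final state, pinpoints the first step where two runs part ways.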

Data layer (per game, `godot/data/demo_<name>/`)

entities/*.json     definitions (tags/properties/state_init/visual) + initial_instances
world/rules/*.json  the mechanics: {trigger, query, effect} rules
world/state.json    initial _engine/world_state singletons
game/goals.json     win/lose/score · game/flow.json multi-level progression
scene.json          camera + lighting + ground + tick_seconds
ui/input.json       key→action map      hud.json   HUD layout      screens.json  menus/modals
audio/cues.json     event→SFX           tests.json scenario tests (steps[])

data/lib/ holds shared, TRACKED JSON (shapes.json, meshes.json,
input/universal.json, shaders) that games $include via @lib.* (ADR 0027).

Known gaps / what's lacking

An honest list of where the framework is thin or demo-grade.

Area	Gap
Networking (newest, most demo-grade)	Pure server-authoritative → no client-side prediction (your own character has round-trip input latency), no lag compensation. LAN/localhost only — no NAT/relay/matchmaking, no reconnection, no persistence of multiplayer state. Only the walk-shell is networked; the real genre games (merchant, shooter, …) aren't net-tested.
Headless-render fidelity	The truly-windowless `--headless-render` path is a custom Godot patch on 4.7-beta, so it can't load our 4.6.1 assets (renders boxes) — needs porting to 4.6.1. The Xvfb path works (real meshes, no window) but is GPU-readback-bound in WSL (fast on a native-GPU Linux box).
Animation	No validator that a declared `animation_clip` exists in the mesh (mismatch silently falls back → "laggy"; bit us in `tiny_village`). `anim_phase` is fixed-cadence, not speed-proportional → foot-sliding at speed. No blend trees, IK, or root motion.
Formula / query	Ternary `a if c else b` is broken in Godot 4.6.1's Expression. `self.nearest({…})` is not implemented — no spatial query inside formulas (use a `contact` trigger instead).
Pipelines	The 3 generation layers no longer clobber (ADR 0067 — `/yume-design --scene` reuses `compose_world` cleanly). Remaining: re-running `compose_world`/`compose_shell` regenerates the World/shell files, so hand-edits to those (not the game's own `entities/<slug>.json` / `world/rules/`) are overwritten; and `/yume-create-scene`'s walk-shell still conflicts if run on a `/yume-design` game folder (use `--scene` instead).
Authoring / UX	No in-engine visual editor (everything is JSON + skills); input is keyboard/mouse/gamepad — no touch/mobile path.
AI / audio	LLM-driven NPC behavior (ADR 0020 external-agent IPC) is a seam, not a shipped feature; audio is procedural SFX + cues — music/BGM is thin.

Roadmap

Direction, not a promise — only Now is version-pinned (0.1.x); the buckets
below are priority tiers, not version numbers (what lands in 0.2 vs later
isn't decided yet, deliberately). Several Later items are the Known gaps above,
sequenced.

Now (0.1.x — polish): convex-hull colliders for terrain props (today's box
is mesh-fit but can clip on steep slopes; see ADR 0067 § colliders) ·
scatter-count gate (compose_world's scatter_in_mask should cap at the
catalog's expected_count, enforced in code, not prose) · CI (a GitHub
Action running the engine unit suite on push).
Next (recording · reuse · RL ergonomics): auto game recording (a full
single-player session → video, no manual stepping) · multiplayer-recording
hardening (the server-authoritative path + net_video.py replay are
implemented but need more testing + real-GPU validation — see Development
environment below) · play-mode inheritance — lift the walk/jump/sprint/camera
shell into data/lib/play_modes/<mode>/ so a same-type game is just assets +
scene + a few rules (ADR 0027 / ADR 0043) · gym.Env first-class (ADR 0060:
Spaces, batching, in-process reset()) · animation fidelity (validate a
declared animation_clip exists; speed-proportional anim_phase).
Later: multiplayer hardening (client-side prediction, lag compensation,
NAT/relay/matchmaking, reconnection, persistence) · headless-render fidelity
(port the windowless --headless-render path to 4.6.1) · audio depth (music/BGM
beyond procedural SFX + cues). (See Known gaps above for the why.)
Someday: in-engine visual editor (everything is JSON + skills today) ·
touch / mobile input · LLM-driven NPC behavior as a shipped feature
(ADR 0020 IPC seam exists).
1.0 (the stability promise): freeze the 7 primitives + the JSON schema.
Until then, expect schema churn between minor versions, which is why the
version stays at 0.x.
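
The scatter-count gate from the Now bucket above ("enforced in code, not prose") could look something like the following. This is a hedged sketch only: `cap_scatter` and its `seed` parameter are hypothetical names, and the real `scatter_in_mask` signature is not shown in this document. Seeded selection is assumed here because it fits the project's deterministic design.

```python
import random


def cap_scatter(points, expected_count, seed=0):
    """Return at most `expected_count` points, chosen deterministically.

    `points` is any sequence of scatter positions. When there are too many,
    a seeded sample keeps the result reproducible across runs, and the
    original ordering of the surviving points is preserved.
    """
    if expected_count < 0:
        raise ValueError("expected_count must be non-negative")
    if len(points) <= expected_count:
        return list(points)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(points)), expected_count))
    return [points[i] for i in keep]
```

Calling `cap_scatter(candidates, expected_count)` as the last step of scattering would make the cap a hard guarantee rather than an instruction the generator may or may not follow.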

Prior art & acknowledgments

Yume studied these two projects closely and used them as references while
shaping its own design:

Donchitos/Claude-Code-Game-Studios
— a Claude-Code "studio" of specialist agents organized with path-scoped rules
and automated quality gates. Yume's .claude/rules/ + .claude/skills/
architecture comes directly from studying it.
htdt/godogen — an autonomous game generator
(Godot / Bevy / Babylon.js, driven by Claude Code / Codex, iterating on
screenshots). It showed the autonomous generate → run → screenshot → fix loop
works in practice.

Thanks to both — they showed LLM-driven game generation is real, and Yume stands
on what they figured out.

How Yume is different. Both of those — and most "AI game" systems — generate
per-game engine code for a mainstream engine (C# / GDScript, scene trees,
scripts) and then iterate on that code. Yume takes the opposite stance:

The engine is fixed; the game is data. A primitives-plus-interpreter engine
ships once (ADR 0001 / ADR 0021); the LLM emits JSON world rules, never engine
code (Invariant #1/#8). Nothing per-game is compiled, which is exactly what
makes research direction #1 (above) testable.
It's a world model, not just a game maker. Games are one projection of the
substrate; RL testbeds, scene generation, and neural-world-model training data
are equal first-class uses.
Research-first. The explicit + deterministic design exists to advance
world-model, 3D-generation, scene-generation, audio-generation, and game-AI
research — not only to ship a finished title.
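
To make "the engine is fixed; the game is data" concrete, here is a toy Python sketch of the general pattern. It is purely illustrative: the JSON keys (`entities`, `rules`, `target`, `set`, `add_from`) and the `tick` function are invented for this example and are not Yume's actual schema, primitives, or interpreter.

```python
import json

# A hypothetical world description: entities and rules are plain data.
WORLD_JSON = """
{
  "entities": {"player": {"x": 0, "vx": 2}},
  "rules": [
    {"target": "player", "set": "x", "add_from": "vx"}
  ]
}
"""


def tick(world):
    """Apply every rule once. This interpreter never changes per world."""
    for rule in world["rules"]:
        entity = world["entities"][rule["target"]]
        entity[rule["set"]] += entity[rule["add_from"]]
    return world


world = json.loads(WORLD_JSON)
for _ in range(3):
    tick(world)
print(world["entities"]["player"]["x"])  # prints 6
```

Changing the game means editing the JSON string; `tick` stays untouched, which is the property the stance above depends on.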

References

WorldGen: From Text to Traversable and Interactive 3D Worlds.
Wang et al., 2025. arXiv:2511.16825. The kind of text→3D-world API that would
slot directly into Yume's scene-generation stage (research direction #3).
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling via Code.
Gao et al., 2026. arXiv:2606.01057. Benchmarks VLMs emitting procedural-3D
code, the path toward LLM-authored 3D structure (research direction #2).
Neural world models for context: Genie (generative interactive environments)
and Dreamer (latent world models), the implicit counterpart that Yume's
explicit substrate is designed to bridge to.

Development environment

Yume was developed on a Dell XPS 15 7590 laptop with no dedicated GPU
available for testing. GPU-bound paths (windowless headless rendering and
multiplayer video capture) are therefore untested at scale; under WSL they're
GPU-readback-bound and should run substantially faster on a native-GPU Linux box.
If you have real GPU hardware, the recording / render paths are where you'll see
the biggest speedup, and where help validating performance is most welcome.

Where to read more

docs/guideline/30_framework_primitives.md — the engine contract (invariant-bearing)
docs/adr/ — architectural decisions (e.g.
ADR 0021 expose-don't-reimplement,
ADR 0039 step-runner,
ADR 0060 deterministic I/O + env)
.claude/rules/ — path-scoped invariants (engine-scripts, data-demo, …)
CLAUDE.md — conventions, the run workflow, the post-mortem ritual