AgentForge
Health Uyari
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 8 GitHub stars
Code Gecti
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Open-source terminal AI coding-agent harness for studying agent loops, tools, MCP, skills, safety, and persistence.
AgentForge
AgentForge is a terminal-based AI coding-agent harness built in Python for learning how modern coding agents are structured. It is not just a chatbot wrapper: the project is organized around the core harness concerns that make coding agents reliable, inspectable, and safe.
Quick Start
Install AgentForge from PyPI, create your provider config, verify the setup, then start the terminal UI:
pip install agentforge-harness
agentforge init
agentforge doctor
agentforge
For an isolated install:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install agentforge-harness
agentforge init
agentforge doctor
agentforge
The project currently supports OpenRouter, OpenAI, Anthropic, and custom OpenAI-compatible model providers, plus streaming model responses, typed tool calls, approval gates, output hygiene, secret redaction, prompt-injection boundaries for tool observations, hooks, MCP tools, subagents, context compaction, loop detection, persistent memory, session snapshots, checkpoints, event logs, JSON reports, HTML/markdown session export, resume/restore commands, plan/build modes, skills, and a Rich terminal UI. The v1 focus is packaging, docs, tool reliability, release hygiene, and a clear safety baseline. Larger learning milestones such as Skills v2, deterministic replay, local evals, browser QA, and swarm orchestration are planned after v1.
Purpose
This repository is intended as a learning lab for AI harness engineering.
The main concepts explored here are:
| Area | What It Teaches |
|---|---|
| Agent loop | How a model alternates between reasoning, tool calls, observations, and final answers |
| Tool registry | How tools become schema-first actions the model can call |
| Tool observations | How outputs shape recovery, retries, and model behavior |
| Context management | How prompts, messages, tool results, memory, and compaction fit in the context window |
| Safety and approval | How mutating operations are classified, reviewed, and blocked |
| Hooks | How external scripts can observe agent and tool lifecycle events |
| MCP integration | How external tool servers are exposed to the model |
| Subagents | How a parent agent delegates bounded specialist work |
| Skills | How task-specific guidance can be loaded progressively without bloating context |
| Persistence and replay | How snapshots, event logs, and checkpoints make sessions recoverable and debuggable |
Installation
Install the package from PyPI:
pip install agentforge-harness
agentforge init
agentforge doctor
agentforge
For local development from this repository:
pip install -e ".[dev]"
agentforge --version
agentforge doctor
Full setup guide: docs/getting-started.md.
Architecture
flowchart TB
accTitle: AgentForge System Architecture
accDescr: The CLI sends user messages to the agent session. The session owns model access, context, tools, MCP clients, safety checks, hooks, and UI events.
user["User"]
cli["CLI<br/>agentforge_harness/cli"]
tui["Rich TUI<br/>agentforge_harness/ui/tui.py"]
agent["Agent Loop<br/>agentforge_harness/agent/agent.py"]
session["Session<br/>agentforge_harness/agent/session.py"]
context["Context Manager<br/>agentforge_harness/context/manager.py"]
prompts["Prompt Builder<br/>agentforge_harness/prompts/system.py"]
client["LLM Client<br/>agentforge_harness/client/llm_client.py"]
registry["Tool Registry<br/>agentforge_harness/tools/registry.py"]
tools["Built-in Tools<br/>agentforge_harness/tools/builtin"]
mcp["MCP Manager<br/>agentforge_harness/tools/mcp"]
approvals["Approval Manager<br/>agentforge_harness/safety/approval.py"]
hooks["Hook System<br/>agentforge_harness/hooks/hook_system.py"]
compaction["Chat Compactor<br/>agentforge_harness/context/compaction.py"]
persistence["Persistence Manager<br/>agentforge_harness/agent/persistence.py"]
subagents["Subagent Tools<br/>agentforge_harness/tools/subagents.py"]
user --> cli
cli --> agent
agent --> session
agent --> tui
session --> context
context --> prompts
session --> client
session --> registry
session --> mcp
session --> approvals
session --> hooks
session --> compaction
session --> persistence
registry --> tools
registry --> subagents
mcp --> registry
registry --> approvals
registry --> hooks
client --> agent
Runtime Flow
sequenceDiagram
accTitle: Agent Runtime Flow
accDescr: A user message enters the CLI, is added to context, sent to the model, and may produce tool calls. Tool results are added back to context until the agent returns a final answer.
participant U as User
participant C as CLI
participant A as Agent
participant X as Context
participant L as LLM Client
participant R as Tool Registry
participant S as Safety
participant T as Tool
participant UI as TUI
U->>C: prompt
C->>A: run(message)
A->>X: add user message
A->>L: chat_completion(messages, tools)
L-->>A: text deltas and tool calls
A-->>UI: stream text events
alt model calls tools
A->>R: invoke(tool, params)
R->>S: approval check
S-->>R: approved or rejected
R->>T: execute
T-->>R: ToolResult
R-->>A: observation
A->>X: add tool result
A->>L: continue with updated context
else no tool calls
A-->>C: final response
end
Context Flow
flowchart LR
accTitle: Context Assembly Flow
accDescr: The context manager combines system prompt, project instructions, remembered context, conversation messages, assistant tool calls, and tool results. Compaction replaces old messages with a continuation summary when context grows too large.
system["System Prompt"]
project["Project Instructions"]
memory["User Memory"]
tools["Tool Guidelines"]
messages["Conversation Messages"]
results["Tool Results"]
compactor["Compactor"]
request["Model Request"]
system --> request
project --> request
memory --> request
tools --> request
messages --> request
results --> request
messages --> compactor
results --> compactor
compactor --> messages
Project Structure
agentforge/
|-- agentforge_harness/ # Importable Python package
| |-- agent/ # Agent loop, events, persistence, and sessions
| |-- cli/ # Click CLI and interactive commands
| |-- client/ # Provider-aware LLM client
| |-- config/ # Pydantic config and loaders
| |-- context/ # Message history, compaction, loop detection
| |-- hooks/ # Before/after agent/tool hooks
| |-- prompts/ # System prompt sections and compaction prompts
| |-- safety/ # Approval policies and circuit breaker
| |-- skills/ # Progressive skill discovery and loading
| |-- tools/ # Built-in tools, registry, MCP, subagents
| |-- ui/ # Rich terminal rendering
| `-- utils/ # Path and text helpers
|-- README.MD # Project documentation
|-- pyproject.toml # Package metadata for agentforge-harness
|-- requirements.txt # Runtime dependency mirror
|-- LICENSE # MIT license
|-- .env.example # Example API configuration
|-- .agentforge/
| |-- config.toml # Project-local config
| `-- tools/ # Project-local dynamic tools
`-- tests/ # Pytest suite
Core Design
Agent Loop
The agent loop in agentforge_harness/agent/agent.py is the heart of the harness.
At a high level it:
- Adds the user message to context.
- Sends messages and tool schemas to the model.
- Streams text deltas to the TUI.
- Collects completed tool calls.
- Executes tools through the registry.
- Adds tool results back to context.
- Repeats until the model returns no tool calls.
This is a hybrid ReAct/function-calling loop: the model reasons in natural language and acts through typed tools.
Session
agentforge_harness/agent/session.py wires together the long-lived objects for one interactive run:
LLMClientToolRegistryMCPManagerContextManagerApprovalManagerHookSystemChatCompactorLoopDetectorPersistenceManager- session ID and turn count
The session owns snapshot creation and restoration. It captures conversation messages, token usage, config metadata, active tools, MCP server names, todos, active mode, active skills, and event sequence state.
Tools
Tools inherit from Tool in agentforge_harness/tools/base.py.
Each tool provides:
- a stable
name - a
description - a
ToolKind - a Pydantic schema
- an async
execute()method - optional approval metadata through
get_confirmation()
Built-in tools include:
| Tool | Kind | Purpose |
|---|---|---|
read_file |
read | Read text files with line numbers |
write_file |
write | Create or overwrite files |
append_file |
write | Append text to the end of a file |
edit |
write | Replace exact text in files |
apply_patch |
write | Apply a unified diff across one or more files with dry-run validation and patch intent metadata |
git_diff |
read | Inspect working tree or staged git changes without mutating the repo |
shell |
shell | Run shell commands with timeout and approval |
list_dir |
read | List directory entries |
grep |
read | Search file contents with regex |
glob |
read | Find files by glob pattern |
todos |
memory | Track session tasks |
memory |
memory | Store user preferences and notes |
web_search |
network | Search the web |
web_fetch |
network | Fetch URL content |
Tool Invocation Contract
The registry is responsible for:
- Looking up the tool.
- Validating params against the schema.
- Running before-tool hooks.
- Checking approval for mutating operations.
- Executing the tool.
- Redacting secrets from model-visible tool results when enabled.
- Marking tool observations as untrusted data when prompt-injection protection is enabled.
- Running after-tool hooks.
- Returning a
ToolResult.
Future improvement: evolve ToolResult from mostly raw text into a structured observation:
{
"status": "success",
"summary": "Read 120 lines from README.MD",
"artifacts": ["README.MD"],
"next_actions": [],
"error_type": null,
"retryable": false
}
This is one of the most important harness-learning upgrades because model recovery quality depends heavily on observation quality.
Safety and Approval
Tool outputs pass through centralized secret redaction in agentforge_harness/utils/redaction.py before after-tool hooks, model context, TUI events, persistence, and exports see the result. Tool-call arguments are also redacted before TUI display and hook environment variables, and approval confirmations redact commands, params, and diff previews before asking the user. Redaction is enabled by default and records non-secret metadata such as redaction count and detected secret kinds.
Current redaction coverage includes common OpenAI/OpenRouter/Anthropic API key shapes, GitHub tokens, JWTs, private key blocks, and generic API_KEY/TOKEN/SECRET/PASSWORD assignments. This protects obvious leaks in observations, but it is not a sandbox and does not make arbitrary tools or MCP servers safe.
Before redaction, tool results pass through output hygiene in agentforge_harness/safety/output_hygiene.py. This strips ANSI escape sequences and unsafe control characters while preserving normal whitespace, then truncates large model-visible fields according to max_tool_output_tokens. Hygiene metadata records how many terminal sequences or control characters were removed and which fields were truncated.
Tool observations also pass through prompt-injection boundary handling in agentforge_harness/safety/prompt_injection.py. When enabled, tool results carry trust metadata and model-visible observations are wrapped in <untrusted_content> tags. The wrapper tells the model that file contents, command output, web pages, MCP responses, and other tool observations are data, not instructions. This reduces accidental instruction promotion while keeping the original TUI output readable.
Prompt-injection protection is a boundary layer, not a complete policy engine. It does not yet trace whether a later tool call was derived from untrusted content, and it does not sandbox shell commands or MCP servers.
The approval layer in agentforge_harness/safety/approval.py classifies operations using:
- tool mutability
- command safety patterns
- affected paths
- danger flags from tools
- configured approval policy
Supported approval modes:
| Mode | Meaning |
|---|---|
on-request |
Ask before non-safe mutating operations |
on-failure |
Allow most operations, useful for autonomous retries |
auto |
Auto-approve most operations except explicitly dangerous ones |
auto-edit |
Auto-approve safe commands, ask for edits and riskier operations |
never |
Reject non-safe operations |
yolo |
Approve all operations, including dangerous ones |
Future improvement: replace simple safe/dangerous command regexes with command classes such as read-only, test, build, install, git-write, server, network, and destructive.
Hooks
Hooks let external scripts observe or react to lifecycle events.
Supported triggers:
| Trigger | When It Runs |
|---|---|
before_agent |
Before a user message enters the agent loop |
after_agent |
After the agent returns a response |
before_tool |
Before a tool is executed |
after_tool |
After a tool returns |
on_error |
When explicit error handling is added |
Hooks are configured in .agentforge/config.toml.
Hook commands receive AgentForge runtime context through environment variables:
| Variable | Meaning |
|---|---|
AGENTFORGE_TRIGGER |
Hook trigger name |
AGENTFORGE_CWD |
Agent working directory |
AGENTFORGE_TOOL_NAME |
Tool name for tool hooks |
AGENTFORGE_TOOL_PARAMS |
JSON-encoded tool params |
AGENTFORGE_TOOL_RESULT |
Tool result text for after-tool hooks |
AGENTFORGE_USER_MESSAGE |
User message for agent hooks |
AGENTFORGE_RESPONSE |
Agent response for after-agent hooks |
AGENTFORGE_ERROR |
Error text for error hooks |
Example:
hooks_enabled = true
[[hooks]]
name = "test_before_tool"
trigger = "before_tool"
command = "python3 ./scripts/test_tool.py"
Future improvement: add blocking/non-blocking hook policy:
failure_mode = "block" # block | warn | ignore
MCP Integration
The MCP layer allows external MCP servers to expose tools to the agent.
Configuration example:
[mcp_servers.filesystem]
command = "npx"
args = [
"-y",
"@modelcontextprotocol/server-filesystem",
"/path/to/agentforge"
]
MCP tools are registered with names like:
filesystem__read_file
The server__tool naming pattern avoids collisions between built-in tools and remote tools.
Subagents
Subagents are specialist agents exposed as tools.
Current examples:
subagent_exploresubagent_debuggersubagent_codebase_investigatorsubagent_code_reviewersubagent_test_plannersubagent_architect- project-defined subagents from config
The built-in subagents are read-only by default. They can inspect files, grep, glob, and list directories, but they do not edit files. This makes them useful for safe delegation before adding full swarm orchestration.
Subagents are useful for bounded delegation:
Parent agent -> subagent(goal) -> isolated specialist loop -> result
They are not the same as swarm mode. Subagents are tool-level delegation; swarm mode is a harness-level orchestration strategy that manages multiple agents, budgets, shared task state, and result merging.
Context Management
The context manager owns:
- the system prompt
- user messages
- assistant messages
- tool results
- token usage
- pruning old tool output
- replacing old history with a compaction summary
Compaction is handled by agentforge_harness/context/compaction.py, which asks the model to produce a continuation summary when context grows too large.
Future improvement: add explicit category budgets:
| Category | Example Budget |
|---|---|
| System prompt | fixed and small |
| Active skills | capped by selected task |
| Recent messages | preserve latest turns |
| Tool results | preserve recent and artifact-bearing results |
| File reads | summarize older reads |
| Memory | compact and user-specific |
| Compaction summaries | preserve phase boundaries |
Modes Roadmap
The project should evolve toward three top-level modes:
| Mode | Purpose | Tool Policy |
|---|---|---|
| Plan | Inspect, reason, and design an approach | Read-only tools; block mutations |
| Build | Implement, test, and verify | Normal tools through approval policy |
| Swarm | Coordinate multiple agents for large tasks | Orchestrated workers with scoped tools |
stateDiagram-v2
accTitle: Planned Agent Modes
accDescr: Plan mode blocks mutations, Build mode executes changes through approvals, and Swarm mode coordinates multiple scoped workers for large tasks.
[*] --> Build
Build --> Plan: /plan
Plan --> Build: /build
Build --> Swarm: /swarm
Swarm --> Build: merge results
Plan --> Swarm: parallel investigation
Swarm --> Plan: summarize findings
Plan Mode
Plan mode should:
- inspect files
- search the codebase
- ask clarifying questions
- produce a plan
- block mutating tools at the registry layer
This must be enforced by the harness, not only by prompt text.
Build Mode
Build mode should:
- create a checkpoint before first mutation
- edit files
- run tests and checks
- summarize changed files
- report verification results
Swarm Mode
Swarm mode should start as read-only.
The first useful version:
/swarm investigate "why shell commands sometimes hang"
The orchestrator can spawn multiple read-only agents with different goals, then merge findings.
Write-capable swarm mode should wait until workspace rollback, file ownership, cancellation, and deterministic replay are in place.
Skills Roadmap
Skills should be implemented using progressive disclosure.
flowchart LR
accTitle: Skill Progressive Disclosure
accDescr: The agent first sees a compact skill index. It then loads metadata, full skill content, and references only when the task needs them.
index["Skill Index<br/>tiny and always available"]
metadata["Skill Metadata<br/>loaded when relevant"]
body["Full SKILL.md<br/>loaded when selected"]
refs["References<br/>loaded on demand"]
prompt["Prompt Context"]
index --> metadata
metadata --> body
body --> refs
index --> prompt
body --> prompt
refs --> prompt
Current root detection happens during config loading. AgentForge detects:
- project-level skills in
.agentforge/skills - user-home skills in
~/.agents/skills - user config skills in
agentforge/skills - extra configured roots from
skill_roots
Only root paths are stored in config at this stage. Full SKILL.md bodies should be loaded later by the skill manager only after a skill is selected.
Recommended project layout:
.agentforge/
`-- skills/
|-- debugging/
| |-- SKILL.md
| `-- references/
|-- tdd/
| `-- SKILL.md
`-- code-review/
`-- SKILL.md
The global user skill directory is ~/.agents/skills, and it follows the same folder shape.
If you want to keep a standalone .skills directory somewhere else, add it explicitly:
skill_roots = [".skills"]
The internal skill folder shape stays the same:
skills/
|-- debugging/
| |-- SKILL.md
| `-- references/
|-- tdd/
| `-- SKILL.md
`-- code-review/
`-- SKILL.md
Skill loading should follow this rule:
Keep the full skill index local to the harness.
Show the user skill discovery and activation in the TUI.
Inject only selected skill bodies into the model prompt.
Load reference files only when the selected skill asks for them.
Automatic skill matching is intentionally conservative:
- exact skill names win first, so
frontend design skillloads onlyfrontend-design - aliases, command names, display names, and folder names are matched as skill metadata
- inferred matches load at most one skill per user message
- low-confidence overlap is ignored instead of bloating the prompt
- the TUI shows the matched skill, reason, source file, and loaded line count
- inactive skill names and descriptions stay out of the system prompt
Persistence, Checkpoints, and Replay
AgentForge now has a first version of transcript persistence. The implementation lives in agentforge_harness/agent/persistence.py and is wired through agentforge_harness/agent/session.py and the interactive commands in agentforge.
Persistence is split into three surfaces:
| Surface | Status | Purpose |
|---|---|---|
| Session snapshot | implemented | Resume an interactive session after saving |
| Event log | implemented | Inspect what happened during a run |
| Checkpoint | implemented | Restore chat/context state to a saved point |
| Deterministic replay | planned | Re-run a recorded trace without calling the model |
| Workspace rollback | planned | Restore file state, not only chat/context state |
Session Snapshots
Snapshots are stored under the platform data directory for agentforge in sessions/.
Each snapshot stores:
- schema version
- session ID
- created/updated timestamps
- turn count
- working directory
- redacted config snapshot
- message history with tool call metadata
- latest and total token usage
- active tool names
- MCP server names
- todo state
- event sequence
- mode placeholder
Snapshot writes are atomic and saved files are restricted to owner-only permissions.
Event Logs
Every agent event handled by the CLI is appended to JSONL under events/.
{
"schema_version": 1,
"session_id": "uuid",
"turn": 3,
"sequence": 42,
"type": "tool_call_complete",
"timestamp": "2026-05-21T12:00:00Z",
"payload": {}
}
This is the foundation for replay, debugging, audit trails, and UI trace inspection.
Checkpoints
Checkpoints are currently session snapshots stored under checkpoints/. They restore chat/context state, usage, todos, and session metadata.
Current checkpoint state includes:
- message history
- token usage
- redacted config snapshot
- working directory
- active tools and MCP server names
- todos
- event sequence
Still planned:
- changed-file snapshots
- git diff capture
- checkpoint reasons, such as manual, before mutating tool, or before dangerous command
- workspace restore
- deterministic replay from event logs
Configuration
Configuration is loaded from:
.env- user config directory from
platformdirs - project-local
.agentforge/config.toml
Environment Variables
| Variable | Purpose |
|---|---|
OPENROUTER_API_KEY |
OpenRouter API key |
OPENAI_API_KEY |
OpenAI API key |
ANTHROPIC_API_KEY |
Anthropic API key |
API_KEY |
Generic fallback key for custom/OpenAI-compatible providers |
OPENROUTER_BASE_URL |
Optional OpenRouter-compatible base URL override |
OPENAI_BASE_URL |
Optional OpenAI-compatible base URL override |
ANTHROPIC_BASE_URL |
Optional Anthropic-compatible base URL override |
BASE_URL |
Generic fallback base URL for custom providers |
Example:
OPENROUTER_API_KEY=sk-or-v1-...
AgentForge supports these model providers:
| Provider | SDK path | Typical model name |
|---|---|---|
openrouter |
OpenAI-compatible | openrouter/free |
openai |
OpenAI SDK | gpt-4o-mini |
anthropic |
Anthropic SDK | claude-3-5-sonnet-latest |
custom |
OpenAI-compatible | local/model |
Project Config
Example .agentforge/config.toml:
hooks_enabled = true
approval = "on-request"
max_turns = 100
output_hygiene_enabled = true
redaction_enabled = true
prompt_injection_protection_enabled = true
skills_enabled = true
# Optional extra roots. `.agentforge/skills` is detected automatically.
skill_roots = [".skills"]
[model]
provider = "openrouter"
name = "deepseek/deepseek-v4-flash:free"
temperature = 1.0
context_window = 256000
max_output_tokens = 4096
fallbacks = ["openai/gpt-4o-mini", "anthropic/claude-sonnet-4"]
# For custom providers, set either model.base_url here or BASE_URL in the environment.
# provider = "custom"
# base_url = "http://localhost:11434/v1"
# Self-healing: after 3 consecutive errors on a model,
# its circuit breaker opens for 60s, then the agent
# tries the next fallback in the chain automatically.
[[subagents]]
name = "code-explainer"
description = "Explains how specific code works"
goal_prompt = "You are a code explanation specialist."
allowed_tools = ["read_file", "glob", "list_dir"]
max_turns = 10
timeout_seconds = 120
[[hooks]]
name = "test_before_tool"
trigger = "before_tool"
command = "python3 ./scripts/test_tool.py"
CLI Usage
Run a single prompt:
agentforge run "read the current project and explain the agent loop"
Start interactive mode:
agentforge
Use a different working directory:
agentforge run --cwd /path/to/project
Check local readiness:
agentforge doctor
agentforge doctor --json
Print the latest saved session report without starting the agent:
agentforge report
agentforge report --json
agentforge report --session-id <session_id>
Interactive commands:
| Command | Status | Purpose |
|---|---|---|
/help |
implemented | Show commands |
/exit, /quit |
implemented | Exit interactive mode |
/new |
implemented | Start a fresh session |
/reload |
implemented | Reload config from disk in-place |
/version |
implemented | Show AgentForge version |
/retry |
implemented | Resend the last user message |
/history [n] |
implemented | Show last N messages (default 10) |
/report |
implemented | Show session summary report (/report --json for machine-readable output) |
/clear |
implemented | Clear conversation history |
/config |
implemented | Show configuration (Rich Table) |
/doctor |
implemented | Check config, provider keys, skill roots, MCP commands, and safety flags |
/doctor fix |
implemented | Apply safe doctor fixes |
/provider [name] |
implemented | Show or switch provider for current session |
/models [--page N] [--limit N] |
implemented | List model suggestions for the current provider |
/model list |
implemented | Alias for /models |
/model [name] |
implemented | Show or change model for current session |
/fallbacks |
implemented | Show or edit fallback model chain |
/paths |
implemented | Show config, env, data, sessions, checkpoints, skills, and cwd paths |
/compact |
implemented | Force context compaction |
/errors [n] |
implemented | Show recent model/tool errors |
/approval <mode> |
implemented | Change approval policy |
/stats |
implemented | Show token/session stats |
/todos |
implemented | Show active todos |
/todos --clear |
implemented | Clear all todos |
/tools |
implemented | Show registered tools |
/skills |
implemented | List available skills |
/skill <name> |
implemented | Activate a skill |
/unskill <name> |
implemented | Deactivate a skill |
/mcp |
implemented | Show MCP server status |
/name |
implemented | Show or set session name |
/save |
implemented | Save current session snapshot |
/checkpoint |
implemented | Create checkpoint from current session |
/restore <checkpoint_id> |
implemented | Restore checkpoint state |
/checkpoints [--page N] [--limit N] |
implemented | List saved checkpoints |
/sessions [--page N] [--limit N] |
implemented | List saved sessions |
/resume <session_id> |
implemented | Resume saved session |
/plan |
implemented | Switch to plan mode (read-only tools) |
/build |
implemented | Switch to build mode (all tools) |
/export |
implemented | Export session as markdown or HTML (/export html) |
/stats |
implemented | Show session statistics |
/swarm |
planned | Run swarm orchestration |
Development
Install dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Build the package locally:
python -m build
python -m twine check dist/*
Compile-check the codebase:
python3 -m compileall -q agentforge_harness tests main.py scripts
Run a focused syntax check:
python3 -m py_compile agentforge_harness/agent/agent.py agentforge_harness/tools/registry.py agentforge_harness/context/manager.py
Extension examples:
- Documentation index
- Getting started
- Architecture
- Configuration
- CLI reference
- Skills
- Persistence
- Extending AgentForge
- Provider setup
- Security model
- Tool reliability standard
- Release checklist
- Manual pre-release test
- Examples index
- Custom tool example
- Skill example
- Hook example
- Subagent example
Release smoke:
python3 scripts/release_smoke.py
Recommended future test layout:
tests/
|-- test_agent_loop.py
|-- test_tool_registry.py
|-- test_approval.py
|-- test_context_compaction.py
|-- test_loop_detector.py
|-- test_transcript_replay.py
`-- test_checkpoints.py
Learning Roadmap
See ROADMAP.md for the release roadmap. The short version:
- v1.0: stable learning harness with polished docs, reliable tools, release hygiene, and clear safety notes.
- v1.1: Skills v2 with better ranking, validation, TUI explanations, and reference loading.
- v1.2: deterministic replay and trace debugging.
- v1.3: local evals.
- v1.4: browser-assisted local QA.
- v1.5: read-only swarm.
- v2.0: isolated write-capable orchestration.
Before v1, avoid adding large new systems. The release should first make the existing harness easier to install, understand, verify, and extend.
Current Status
Implemented:
- Streaming LLM client
- OpenAI-compatible API support
- Rich TUI
- Tool registry and Pydantic schemas
- Built-in file/search/shell/web/memory/todos tools
- Dynamic local tool discovery from
.agentforge/tools grepwith context lines parameter- Approval manager with 6 modes
- Hook system (before/after agent, before/after tool, on error)
- MCP tool adapter
- Subagents with configurable allowed_tools
- Context manager with compaction and pruning
- Persistent user memory
- Loop detector (repeated action + cycle detection)
- Circuit breaker + model fallback chain
- Session snapshots, event JSONL logs, checkpoints
- Resume, restore, and checkpoint commands
- Plan/build modes with tool filtering
- Config hot-reload (
/reload) - Skill system with progressive disclosure, auto-activation, and body token limit
- Context budget estimation (70% warning, 80% auto-compress)
- Per-tool error isolation
- Observation fields on all tools (summary, next_actions, artifacts, recovery_hint)
- CLI commands:
new,reload,version,retry,history,report,export,todos --clear,configpretty-print - Package metadata for
agentforge-harnesswithagentforgeCLI entry point
In progress or planned:
- Cost tracking (
/cost) - Secret scanning
- Prompt injection test suite
- Web browser tool (Playwright)
- Git tools
- Deterministic replay
- Swarm mode
- Workspace rollback for checkpoints
Design Principles
- Keep tools schema-first and explicit.
- Keep system prompt small and stable.
- Load large guidance through skills on demand.
- Treat tool outputs as observations, not just strings.
- Enforce safety in the harness, not only in prompts.
- Record enough state to replay and debug failures.
- Add orchestration only after persistence and checkpoints exist.
License
MIT
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi