evo

agent
Security Audit
Warning
Health: Passed
  • License — License: Apache-2.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 362 GitHub stars
Code: Warning
  • process.env — Environment variable access in plugins/evo/skills/discover/references/inline_instrumentation.js
Permissions: Passed
  • Permissions — No dangerous permissions requested
Purpose
This agent acts as a plugin for Claude Code and Codex that autonomously optimizes your codebase. It discovers metrics to measure, instruments benchmarks, and runs parallel experiments using a tree-search structure to keep only the code changes that improve performance.

Security Assessment
Risk: Medium. Because this tool is an autonomous agent designed to modify code, it inherently executes shell commands and heavily alters your local file system. It actively manages git worktrees, runs experiments, and changes code without requiring step-by-step manual approval. The scanner flagged a minor warning regarding environment variable access inside an instrumentation script, but no hardcoded secrets or dangerous broad permissions were found. The primary risk stems from granting an autonomous system the ability to execute commands and alter your repository.

Quality Assessment
The project appears active and maintained: the last push was within the past day, which indicates ongoing development. Its 362 GitHub stars suggest a reasonable level of community interest, though stars are not a measure of code quality. It is released under the Apache-2.0 license, a permissive license that allows both personal and commercial use.

Verdict
Use with caution. The tool is a legitimate open-source project, but its core function requires granting an autonomous agent broad permissions to run commands and rewrite code. Run it only in isolated or non-production environments.
SUMMARY

A plugin for Claude Code and Codex that turns your codebase into an autoresearch loop: it discovers what to measure, instruments the benchmark, then runs tree search with parallel subagents.

README.md

evo

A plugin for Claude Code and Codex that optimizes code through experiments. You give it a codebase. It discovers metrics to optimize, sets up the evaluation, and starts running experiments in a loop -- trying things, keeping what improves the score, throwing away what doesn't.

Evo is inspired by Karpathy's autoresearch, in which an LLM runs training experiments autonomously to beat its own best score. Autoresearch is a pure hill climb: try something, keep or revert, repeat on a single branch. Evo adds structure on top of that idea:

  • Tree search over greedy hill climb. Multiple directions can fork from any committed node, so exploration doesn't collapse to one path.
  • Parallel semi-autonomous agents. Spawn multiple subagents and run them simultaneously, each in its own git worktree. Each subagent reads traces, formulates hypotheses, and can run multiple iterations within its branch.
  • Shared state. Failure traces, annotations, and discarded hypotheses are accessible to every agent before it decides what to try next.
  • Gating. Regression tests or safety checks can be wired up as a gate. Experiments that don't pass get discarded.
  • Observability. A dashboard to monitor your experiments.
  • Benchmark discovery. evo:discover explores the repo, figures out what to measure, and instruments the evaluation.
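
To make the contrast between a single-branch hill climb and tree search concrete, here is a minimal Python sketch. It is not evo's implementation: the `Node` class, the `propose` callback, and the random choice of parent are all assumptions made for illustration. The only property taken from the list above is that a new experiment may fork from any committed node, not just the most recent one.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass(eq=False)
class Node:
    """One committed experiment in the search tree (hypothetical structure)."""
    score: float
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)


def committed_nodes(root: Node) -> list[Node]:
    """Collect every committed node, since any of them may be forked."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(node.children)
    return found


def tree_search(
    root: Node,
    propose: Callable[[Node], float],
    rounds: int,
    rng: random.Random,
) -> Node:
    """Fork from a randomly chosen committed node each round; keep improvements."""
    for _ in range(rounds):
        parent = rng.choice(committed_nodes(root))  # assumed selection policy
        candidate_score = propose(parent)           # stands in for a real experiment
        if candidate_score > parent.score:          # assumes higher is better
            parent.children.append(Node(score=candidate_score, parent=parent))
        # Otherwise the candidate is discarded and never enters the tree.
    return max(committed_nodes(root), key=lambda n: n.score)
```

A hill climb is the special case where the parent is always the most recently committed node. Letting the parent be any committed node is what allows several directions to stay alive at once.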

Install

Common requirements: Python 3.12+, git, uv.

Claude Code

/plugin marketplace add evo-hq/evo
/plugin install evo@evo-hq-evo

Reload Claude Code. /evo:discover and /evo:optimize become available in any repo.

Codex

Codex needs the evo CLI installed globally. Install once, outside Codex:

uv tool install evo-hq-cli
# or: pipx install evo-hq-cli
evo --version   # should print: evo-hq-cli x.x.x

Then add the plugin. The marketplace add command requires Codex 0.121.0-alpha.2 or newer; if you're still on 0.120.0 stable, upgrade first with npm install -g @openai/codex@alpha.

codex marketplace add evo-hq/evo

Open Codex, run /plugins, find evo, install. Skills become available as $evo discover and $evo optimize.

Usage

Two skills:

  • evo:discover -- explores the repo, instruments the benchmark, runs baseline
  • evo:optimize -- runs the optimization loop with parallel subagents until interrupted

Invocation syntax differs by host:

  • Claude Code: /evo:discover, /evo:optimize
  • Codex: $evo discover, $evo optimize

evo:optimize accepts optional parameters:

Parameter   Default   Description
subagents   5         Number of parallel subagents per round
budget      5         Max iterations each subagent can run within its branch
stall       5         Consecutive rounds with no improvement before auto-stopping

Example: /evo:optimize subagents=3 budget=10 stall=3 (or $evo optimize subagents=3 budget=10 stall=3 on Codex).
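
The key=value form of these parameters can be illustrated with a short Python sketch. How evo itself interprets the arguments is not documented here, so the function name `parse_optimize_args` and the validation rules (positive integers only, unknown keys rejected) are assumptions; only the three parameter names and their defaults come from the table above.

```python
# Defaults as documented for evo:optimize.
DEFAULTS = {"subagents": 5, "budget": 5, "stall": 5}


def parse_optimize_args(arg_string: str) -> dict[str, int]:
    """Parse 'key=value' tokens, falling back to the documented defaults."""
    params = dict(DEFAULTS)
    for token in arg_string.split():
        key, sep, value = token.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unrecognized argument: {token!r}")
        number = int(value)  # raises ValueError on non-integer input
        if number < 1:       # assumed rule: all three must be positive
            raise ValueError(f"{key} must be a positive integer, got {number}")
        params[key] = number
    return params
```

For instance, `parse_optimize_args("subagents=3 budget=10 stall=3")` yields the values from the example above, and an empty string yields the defaults.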

Typical flow:

you: evo:discover
evo: explores repo, instruments benchmark, runs baseline

you: evo:optimize
evo: spawns 5 subagents in parallel, each exploring a different direction
     each subagent can run up to 5 iterations within its branch
     orchestrator collects results, prunes dead branches, adjusts strategy
     repeats until interrupted or stalled
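
The "repeats until interrupted or stalled" step can be read as a counter of consecutive rounds without improvement. The Python sketch below illustrates that reading under stated assumptions: `run_round` is a hypothetical callback returning the best score from one round of subagents, higher scores are assumed to be better, and `max_rounds` is an added safety cap that evo does not document.

```python
from typing import Callable


def run_rounds(
    run_round: Callable[[], float],
    baseline: float,
    stall: int = 5,
    max_rounds: int = 100,  # assumption: a cap so the sketch always terminates
) -> tuple[float, int]:
    """Run rounds until `stall` consecutive rounds fail to beat the best score."""
    best = baseline
    rounds_without_improvement = 0
    rounds_run = 0
    while rounds_without_improvement < stall and rounds_run < max_rounds:
        round_best = run_round()  # best score reported by this round's subagents
        rounds_run += 1
        if round_best > best:     # assumes higher is better
            best = round_best
            rounds_without_improvement = 0  # any improvement resets the counter
        else:
            rounds_without_improvement += 1
    return best, rounds_run
```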

Under the hood, each experiment gets its own git worktree branching from its parent. If the score improves and the gate passes, the experiment is committed. Otherwise it's discarded and the worktree is cleaned up.
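
The lifecycle described above (fork a worktree from the parent, then commit or discard) might look roughly like the following Python sketch. The git subcommands used are standard (`git worktree add -b`, `git worktree remove --force`), but the function names, the commit message format, and the assumption that a higher score is better are illustrative and not taken from evo's source.

```python
import subprocess
from pathlib import Path


def should_keep(score: float, parent_score: float, gate_passed: bool) -> bool:
    """Keep an experiment only if it beats its parent and passes the gate."""
    return gate_passed and score > parent_score  # assumes higher is better


def add_worktree_cmd(path: Path, branch: str, parent_ref: str) -> list[str]:
    """Command that creates a new branch from parent_ref in its own worktree."""
    return ["git", "worktree", "add", "-b", branch, str(path), parent_ref]


def remove_worktree_cmd(path: Path) -> list[str]:
    """Command that deletes a discarded experiment's worktree."""
    return ["git", "worktree", "remove", "--force", str(path)]


def finish_experiment(
    repo: Path, path: Path, score: float, parent_score: float, gate_passed: bool
) -> str:
    """Commit the experiment's changes, or clean up its worktree."""
    if should_keep(score, parent_score, gate_passed):
        subprocess.run(["git", "add", "-A"], cwd=path, check=True)
        # Note: `git commit` exits non-zero if the experiment changed nothing.
        subprocess.run(
            ["git", "commit", "-m", f"experiment: score {score}"],
            cwd=path,
            check=True,
        )
        return "committed"
    subprocess.run(remove_worktree_cmd(path), cwd=repo, check=True)
    return "discarded"
```

Because each experiment lives in a separate worktree, discarding one never touches the files of its siblings or its parent.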

Architecture

Orchestrator (main agent)
  - reads state, identifies failure patterns that cut across the tree
  - writes a structured brief per subagent (objective, parent, boundaries, pointer traces)
  - collects results, prunes dead branches, adjusts strategy for next round

  Subagent 1 (background, budget: 5 iterations)
    - reads traces, analyzes failures in its focus area
    - formulates hypothesis, edits target, runs benchmark
    - if budget remains and sees a follow-up, iterates on its branch
    - returns: what it tried, what worked, what it learned

  Subagent 2 (background, budget: 5 iterations)
    ...up to N subagents in parallel
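
One way to picture the messages exchanged in this architecture is as two small records: a brief going down to each subagent and a report coming back. The Python sketch below is purely illustrative. The field names mirror the wording above (objective, parent, boundaries, pointer traces; what it tried, what worked, what it learned), but the class names, types, and the `make_briefs` helper are assumptions, not evo's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Brief:
    """What the orchestrator hands to one subagent (hypothetical shape)."""
    objective: str                    # the direction this subagent should explore
    parent: str                       # identifier of the committed node to fork from
    boundaries: tuple[str, ...] = ()  # things the subagent must not change
    pointer_traces: tuple[str, ...] = ()  # traces worth reading first
    budget: int = 5                   # max iterations within its branch


@dataclass
class Report:
    """What a subagent returns when it finishes (hypothetical shape)."""
    tried: list[str] = field(default_factory=list)
    worked: list[str] = field(default_factory=list)
    learned: list[str] = field(default_factory=list)


def make_briefs(objectives: list[str], parent: str, budget: int = 5) -> list[Brief]:
    """Create one brief per objective, all forking from the same parent."""
    return [Brief(objective=o, parent=parent, budget=budget) for o in objectives]
```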

Dashboard

The dashboard starts automatically when you run evo:discover (or evo init). When it comes up, the agent surfaces the URL in the chat:

Dashboard live: http://127.0.0.1:8080 (pid 12345)

If 8080 is busy, evo auto-increments (8081, 8082, ...) and prints the actual port. You can also start it manually:

uv run --project /path/to/evo evo dashboard --port 8080

The chosen port is persisted to .evo/dashboard.port so repeat runs re-use it.
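
The port behaviour described above (start at 8080, increment while busy, remember the result) can be sketched in Python as follows. This is an assumption-laden illustration rather than evo's code: the function names are invented, the format of the port file is assumed to be a plain integer, and probing by binding a socket is one common technique among several.

```python
import socket
from pathlib import Path


def find_free_port(start: int = 8080, host: str = "127.0.0.1", attempts: int = 50) -> int:
    """Return the first port at or above `start` that can be bound on `host`."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy; try the next one
            return port   # the probe socket is closed on the way out
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


def choose_port(state_dir: Path, default: int = 8080) -> int:
    """Prefer the persisted port, fall back to `default`, then record the result."""
    port_file = state_dir / "dashboard.port"
    preferred = default
    if port_file.exists():
        text = port_file.read_text().strip()
        if text.isdigit():  # assumed file format: a bare port number
            preferred = int(text)
    port = find_free_port(preferred)
    state_dir.mkdir(parents=True, exist_ok=True)
    port_file.write_text(f"{port}\n")
    return port
```

A probe like this is inherently racy, since another process can take the port between the check and the real server binding it, so a production server would normally handle the bind error itself as well.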

Dev install

For working on evo itself (not just using it):

git clone https://github.com/evo-hq/evo
uv run --project /path/to/evo evo status

uv run resolves dependencies on first use -- no pip install step.

TODO

  • Distributed evaluation via Harbor -- run benchmarks in containers instead of locally, use Harbor's cloud providers to parallelize.

License

Licensed under the Apache License 2.0.
