dojo.md

mcp
Security Audit
Warn
Health Warn
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This MCP server acts as a training environment for AI agents. It runs models through scenario-based courses and automatically generates portable skill documents that teach the agent how to handle specific tasks reliably.

Security Assessment
Overall risk: Medium. The tool relies on external network requests, specifically requiring an OpenRouter API key to route queries to various LLMs and to run the automated LLM judge. While a light code scan of 12 files found no hardcoded secrets or dangerous code patterns, and the tool requests no dangerous system permissions, integrating it requires exposing your OpenRouter API credentials to the application. Additionally, the tool works by dynamically generating and writing files to your local project directories, which inherently modifies your codebase.

Quality Assessment
The project is actively maintained, with its most recent push happening today. It uses the permissive and standard MIT license. However, community visibility and trust are currently very low, as indicated by having only 5 GitHub stars. Because it is a new and untested project in the broader open-source community, there is limited user feedback available to verify long-term reliability or catch edge-case bugs.

Verdict
Use with caution — the tool appears safe from a code perspective, but its newness, requirement for external API keys, and local file modifications warrant careful testing.
SUMMARY

University for AI agents. 92 courses, 4400+ scenarios, any model via OpenRouter. Auto-training loops generate per-model SKILL.md documents. Works with Claude Code, OpenClaw, Cursor, Windsurf. No fine-tuning required.

README.md

dojo.md

dojo.md — Training Arena for AI Agents

Your agent demos well. It fails in production. dojo.md fixes that.

Train any model through scenario-based courses. Graduate with a SKILL.md — portable expertise that makes agents reliable. No fine-tuning. No weight modification. Just knowledge, distilled and proven.

npm CI License: MIT

Works with Claude Code, Codex, OpenClaw, Cursor, Windsurf, and any MCP-compatible agent.

"Hey Claude, train yourself on cold-email-b2b and loop until you hit 90"

  Iteration 1: 31/100 — doesn't know subject line rules, no personalization
  Iteration 2: 58/100 — SKILL.md injected, learns "under 50 chars, no caps"
  Iteration 3: 74/100 — gets structure right, still weak on CTAs
  Iteration 4: 86/100 — nails the pain→solution→ask framework
  Iteration 5: 91/100 — ✓ target reached

  SKILL.md → .claude/skills/cold-email-b2b/SKILL.md

Now every cold email it writes follows the framework. Permanently.

What Just Happened

You told Claude Code to train itself. It ran through 50 scenarios of progressively harder cold emails — bad subject lines, wrong tone, missing personalization, weak CTAs. An LLM judge scored every attempt. After 5 iterations, it extracted everything it learned into a SKILL.md that lives in your project forever.

Next time you say "write a cold email to this VP of Engineering", it loads the SKILL.md automatically. It knows the rules now.

This works for anything:

# Your agent writes terrible Google Ads? Fix it in 5 minutes
dojo train ad-copy-google-ads --target 85

# Support agent keeps giving wrong refund info? Train it
dojo train stripe-refunds --target 90

# Code reviews are too vague? There's a course for that
dojo train code-review-feedback-writing --target 85

# Incident postmortems are weak? Train on 50 real scenarios
dojo train incident-postmortem-writing --target 80

# Your agent can't write a proper RFC? Now it can
dojo train technical-rfc-writing --target 85

How the loop works

Scenarios → Mock Services → LLM Judge → Failure Patterns → SKILL.md → Re-inject → Repeat

Each iteration: the agent gets smarter. The SKILL.md compounds. It stops when it hits the target or plateaus.

Per-model skills

Claude, GPT, DeepSeek — they all fail differently. Each gets its own SKILL.md:

.claude/skills/cold-email-b2b/
├── anthropic--claude-sonnet-4-6/SKILL.md    # Was too formal, learned casual tone
├── openai--gpt-4o/SKILL.md                  # Was too long, learned brevity
└── deepseek--deepseek-v3.2/SKILL.md         # Missed personalization hooks

Quick Start

Option 1: Zero-cost with Claude Code or Codex (recommended)

Already paying for Claude Code or Codex? Training costs $0 extra. The agent trains AND judges itself — no API keys needed.

Just paste this into Claude Code:

Install dojo.md as an MCP server, then train yourself on cold-email-b2b
using autopilot mode. Loop until you hit 90.

Or add dojo as an MCP server manually:

{
  "mcpServers": {
    "dojo": { "command": "npx", "args": ["dojo.md", "mcp"] }
  }
}

Then tell your agent what to train on. It handles the rest.

CLI (dojo train) Autopilot (Claude Code / Codex)
Cost ~$0.50–5 per run $0 extra
Agent API calls Your subscription
Judge API calls Agent self-judges
Setup API keys required Just MCP config

Option 2: CLI with any model via OpenRouter

npm install -g dojo.md
export OPENROUTER_API_KEY=sk-or-...

# Train DeepSeek on Google Ads copy for $0.03
dojo train ad-copy-google-ads --model deepseek/deepseek-v3.2 --target 85

# Train GPT-5 on incident response, judged by Claude
dojo train incident-response --model openai/gpt-5.2 --judge claude-sonnet-4-6 --target 90

# Train Gemini on customer support escalation
dojo train customer-support-escalation --model google/gemini-3-flash-preview --target 80

Arena — Model Benchmarking

Compare models head-to-head on the same course. Same judge, same scenarios, no SKILL.md — raw capability only.

dojo arena ad-copy-google-ads --level 1

═══ Arena Leaderboard ════════════════════════
  1st  Claude Opus 4.6    █████████████████░░░  84
  2nd  Claude Sonnet 4.6  █████████████████░░░  84
  3rd  GPT-5.2            ████████████████░░░░  82
  4th  GLM 5              ████████████████░░░░  79
  5th  Gemini 3 Flash     ███████████████░░░░░  76
══════════════════════════════════════════════

Above 70, every point gets exponentially harder — like ELO, small gaps mean big differences. See the live leaderboard.

Any Model

200+ models via OpenRouter:

dojo train cold-email-b2b --model openai/gpt-4o
dojo train cold-email-b2b --model google/gemini-2.5-pro
dojo train cold-email-b2b --model deepseek/deepseek-v3.2
dojo train cold-email-b2b --model x-ai/grok-4.1-fast
dojo train cold-email-b2b --model meta-llama/llama-3.3-70b-instruct

125 Pre-Built Courses (6,250+ Scenarios)

Domain Examples Courses
Customer Support Stripe refunds, escalation, churn prevention, SLA breaches, onboarding 14
Marketing & Content Google Ads, Meta ads, SEO blogs, email sequences, social media, UGC 18
Sales & Revenue Cold email B2B, objection handling, proposals, battlecards, lead scoring 9
Engineering & DevOps Incident response, Docker, Kubernetes, CI/CD, AWS Lambda, security 17
Writing & Docs Technical RFCs, postmortems, SOPs, newsletters, Twitter/X threads 16
Data & Analytics A/B testing, cohort analysis, segmentation, funnel analysis, forecasting 9
Design & UX Accessibility audits, design systems, user personas, journey mapping 9
Education Quiz creation, study guides, workshop facilitation, training materials
Legal & Compliance Contract review, compliance checklists, clause summarization
Real Estate Listing descriptions, open house promos, buyer inquiry response
Healthcare Appointment reminders, intake review, billing inquiries, pre-auth
dojo list                    # See all 125 courses
dojo generate "Handle Zendesk ticket routing and priority assignment"  # Create your own

Works With Everything

dojo.md generates AgentSkills-standard SKILL.md files. Train once, use everywhere.

Claude Code

dojo.md is an MCP server — train from inside your IDE:

{
  "mcpServers": {
    "dojo": {
      "command": "npx",
      "args": ["dojo.md", "mcp"]
    }
  }
}

MCP tools: dojo_discover, dojo_train, dojo_tool, dojo_submit, dojo_results, dojo_skill, dojo_apply

OpenClaw

Drop your graduated SKILL.md into OpenClaw's skill directory. dojo.md skills follow the same AgentSkills standard — cross-compatible by design.

ClawHub has 13,000+ community skills. The difference: dojo skills are earned, not written. Every SKILL.md has a training score, validated scenarios, and failure patterns it addresses. It's a diploma, not a blog post.

Cursor, Windsurf, and any MCP agent

Same MCP config. Same SKILL.md output. Portable.

The SKILL.md Standard

Generated skills follow the AgentSkills open standard:

---
name: stripe-refunds
description: >-
  Handle Stripe refund requests correctly. Use when processing
  refunds, duplicate charges, or customer disputes.
---

## Domain Knowledge
[Non-obvious insights distilled from training curriculum]

## Quick Start
[Most common failure, corrected]

## Core Rules
[Freedom-calibrated: ALWAYS/step-by-step/prefer]

## Decision Tree
[If/then branching logic]

## Edge Cases
[Every trap, with correct handling]

## Anti-Patterns
[DON'T X. Instead, Y.]

The description triggers loading — ~100 tokens idle, ~5,000 tokens when activated. Progressive disclosure keeps context clean.

CLI Reference

Command Description
dojo train <course> Run training session
dojo train <course> -m openai/gpt-4o -j claude-sonnet-4-6 -t 85 Full multi-model auto-loop
dojo retrain <course> Auto-loop with defaults (target 90, max 5)
dojo arena <course> Benchmark multiple models head-to-head
dojo arena <course> --models m1,m2,m3 Arena with specific models
dojo results [course] Show latest results
dojo list List installed courses
dojo generate <skill> Generate a course from description

Train Options

Flag Description Default
-m, --model Agent model claude-sonnet-4-6
-j, --judge Judge model claude-sonnet-4-6
-t, --target Target score (enables auto-loop)
--max-retrain Max loop iterations 5
--level Run specific level only all
--report Save detailed report

Arena Options

Flag Description Default
--models Comma-separated model list top 5 models
-j, --judge Shared judge model claude-opus-4-6
-l, --level Run specific level only all
-o, --output Output JSON path auto-generated

Scenario Format

meta:
  id: simple-refund
  level: 1
  course: stripe-refunds
  description: Process a straightforward refund
  type: tool

state:
  customers:
    - id: cus_001
      email: [email protected]
      name: Alice Johnson
  charges:
    - id: ch_001
      amount: 5000
      customer: cus_001
      status: succeeded

trigger: >
  Customer Alice Johnson (cus_001) is requesting
  a refund for charge ch_001 ($50.00).

assertions:
  - type: api_called
    tool: stripe_customers_retrieve
    description: Verify customer identity
  - type: api_called
    tool: stripe_refunds_create
    params: { charge: ch_001 }
    description: Create the refund
  - type: llm_judge
    criteria: >
      Agent confirms refund was processed and explains
      the 5-10 business day timeline for the credit
      to appear on the customer's statement.
    description: Communicate success with timeline

Development

git clone https://github.com/edholofy/dojo.md
cd dojo.md
npm install
npm run build
npm test       # 116 tests

# Dev mode
npm run dev -- train stripe-refunds

Mission

Turn experience into expertise for AI agents.

Today: Author courses, train models, graduate with SKILL.md.
Tomorrow: Production feedback loops that generate scenarios from real failures.
Future: The open knowledge layer for agent expertise — proven, portable, model-agnostic.

License

MIT

Reviews (0)

No results found