dojo.md
Health: Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code: Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Pass
- Permissions — No dangerous permissions requested
This MCP server acts as a training environment for AI agents. It runs models through scenario-based courses and automatically generates portable skill documents that teach the agent how to handle specific tasks reliably.
Security Assessment
Overall risk: Medium. The tool relies on external network requests, specifically requiring an OpenRouter API key to route queries to various LLMs and to run the automated LLM judge. While a light code scan of 12 files found no hardcoded secrets or dangerous code patterns, and the tool requests no dangerous system permissions, integrating it requires exposing your OpenRouter API credentials to the application. Additionally, the tool works by dynamically generating and writing files to your local project directories, which inherently modifies your codebase.
Quality Assessment
The project is actively maintained, with its most recent push happening today. It uses the permissive and standard MIT license. However, community visibility and trust are currently very low, as indicated by having only 5 GitHub stars. Because it is a new and untested project in the broader open-source community, there is limited user feedback available to verify long-term reliability or catch edge-case bugs.
Verdict
Use with caution — the tool appears safe from a code perspective, but its newness, requirement for external API keys, and local file modifications warrant careful testing.
University for AI agents. 92 courses, 4400+ scenarios, any model via OpenRouter. Auto-training loops generate per-model SKILL.md documents. Works with Claude Code, OpenClaw, Cursor, Windsurf. No fine-tuning required.
Your agent demos well. It fails in production. dojo.md fixes that.
Train any model through scenario-based courses. Graduate with a SKILL.md — portable expertise that makes agents reliable. No fine-tuning. No weight modification. Just knowledge, distilled and proven.
Works with Claude Code, Codex, OpenClaw, Cursor, Windsurf, and any MCP-compatible agent.
"Hey Claude, train yourself on cold-email-b2b and loop until you hit 90"
Iteration 1: 31/100 — doesn't know subject line rules, no personalization
Iteration 2: 58/100 — SKILL.md injected, learns "under 50 chars, no caps"
Iteration 3: 74/100 — gets structure right, still weak on CTAs
Iteration 4: 86/100 — nails the pain→solution→ask framework
Iteration 5: 91/100 — ✓ target reached
SKILL.md → .claude/skills/cold-email-b2b/SKILL.md
Now every cold email it writes follows the framework. Permanently.
What Just Happened
You told Claude Code to train itself. It ran through 50 scenarios of progressively harder cold emails — bad subject lines, wrong tone, missing personalization, weak CTAs. An LLM judge scored every attempt. After 5 iterations, it extracted everything it learned into a SKILL.md that lives in your project forever.
Next time you say "write a cold email to this VP of Engineering", it loads the SKILL.md automatically. It knows the rules now.
This works for anything:
# Your agent writes terrible Google Ads? Fix it in 5 minutes
dojo train ad-copy-google-ads --target 85
# Support agent keeps giving wrong refund info? Train it
dojo train stripe-refunds --target 90
# Code reviews are too vague? There's a course for that
dojo train code-review-feedback-writing --target 85
# Incident postmortems are weak? Train on 50 real scenarios
dojo train incident-postmortem-writing --target 80
# Your agent can't write a proper RFC? Now it can
dojo train technical-rfc-writing --target 85
How the loop works
Scenarios → Mock Services → LLM Judge → Failure Patterns → SKILL.md → Re-inject → Repeat
Each iteration: the agent gets smarter. The SKILL.md compounds. It stops when it hits the target or plateaus.
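The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not dojo.md's actual implementation: `run_iteration` is a hypothetical callback standing in for the scenarios, mock services, and LLM judge, and the plateau rule (stop when the score improves by less than `min_gain`) is a guess, since the README only says the loop stops when it plateaus.

```python
from typing import Callable, List, Tuple


def train_loop(
    run_iteration: Callable[[str], Tuple[int, List[str]]],
    target: int = 90,
    max_iterations: int = 5,
    min_gain: int = 1,
) -> Tuple[str, List[int]]:
    """Iterate until the target is hit, the score plateaus, or the cap is reached.

    run_iteration(skill) is hypothetical: it runs every scenario with the
    current skill text injected and returns (score, failure_patterns).
    """
    skill = ""
    scores: List[int] = []
    for _ in range(max_iterations):
        score, failures = run_iteration(skill)
        scores.append(score)
        if score >= target:
            break  # target reached
        if len(scores) > 1 and score - scores[-2] < min_gain:
            break  # assumed plateau rule: no meaningful improvement
        # Fold the observed failure patterns into the skill text so the
        # next iteration starts with them injected.
        skill += "".join(f"- Avoid: {pattern}\n" for pattern in failures)
    return skill, scores


def make_replay(recorded_scores: List[int]):
    """Stub iteration that replays fixed scores (for demonstration only)."""
    remaining = iter(recorded_scores)

    def run(skill: str) -> Tuple[int, List[str]]:
        return next(remaining), ["weak CTA"]

    return run


skill, history = train_loop(make_replay([31, 58, 74, 86, 91]), target=90)
print(history)
```

Replaying the scores from the example session stops after the fifth iteration, when 91 clears the target of 90.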
Per-model skills
Claude, GPT, DeepSeek — they all fail differently. Each gets its own SKILL.md:
.claude/skills/cold-email-b2b/
├── anthropic--claude-sonnet-4-6/SKILL.md # Was too formal, learned casual tone
├── openai--gpt-4o/SKILL.md # Was too long, learned brevity
└── deepseek--deepseek-v3.2/SKILL.md # Missed personalization hooks
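The directory names in this tree look like OpenRouter-style model ids with `/` replaced by `--` (for example `openai/gpt-4o` becomes `openai--gpt-4o`). A small sketch of that mapping, assuming this naming rule holds in general; `skill_path` is a hypothetical helper, not part of dojo.md:

```python
from pathlib import PurePosixPath


def skill_path(course: str, model_id: str, root: str = ".claude/skills") -> PurePosixPath:
    """Map a 'vendor/model' id to a per-model SKILL.md path.

    Assumption: the directory name is the model id with '/' replaced
    by '--', as inferred from the tree shown in the README.
    """
    model_dir = model_id.replace("/", "--")
    return PurePosixPath(root) / course / model_dir / "SKILL.md"


print(skill_path("cold-email-b2b", "openai/gpt-4o"))
```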
Quick Start
Option 1: Zero-cost with Claude Code or Codex (recommended)
Already paying for Claude Code or Codex? Training costs $0 extra. The agent trains AND judges itself — no API keys needed.
Just paste this into Claude Code:
Install dojo.md as an MCP server, then train yourself on cold-email-b2b
using autopilot mode. Loop until you hit 90.
Or add dojo as an MCP server manually:
{
  "mcpServers": {
    "dojo": { "command": "npx", "args": ["dojo.md", "mcp"] }
  }
}
Then tell your agent what to train on. It handles the rest.
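If you already have other MCP servers configured, the `dojo` entry has to be merged into the existing `mcpServers` object rather than replacing the file. A minimal sketch of that merge; the `other` server in the example is made up, and where your client stores its MCP config file is not covered here:

```python
import json

# The server entry exactly as given in the manual config above.
DOJO_SERVER = {"command": "npx", "args": ["dojo.md", "mcp"]}


def add_dojo_server(config_text: str) -> str:
    """Return config JSON with the dojo server added, keeping existing servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dojo"] = DOJO_SERVER
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}'
print(add_dojo_server(existing))
```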
| | CLI (dojo train) | Autopilot (Claude Code / Codex) |
|---|---|---|
| Cost | ~$0.50–5 per run | $0 extra |
| Agent | API calls | Your subscription |
| Judge | API calls | Agent self-judges |
| Setup | API keys required | Just MCP config |
Option 2: CLI with any model via OpenRouter
npm install -g dojo.md
export OPENROUTER_API_KEY=sk-or-...
# Train DeepSeek on Google Ads copy for $0.03
dojo train ad-copy-google-ads --model deepseek/deepseek-v3.2 --target 85
# Train GPT-5 on incident response, judged by Claude
dojo train incident-response --model openai/gpt-5.2 --judge claude-sonnet-4-6 --target 90
# Train Gemini on customer support escalation
dojo train customer-support-escalation --model google/gemini-3-flash-preview --target 80
Arena — Model Benchmarking
Compare models head-to-head on the same course. Same judge, same scenarios, no SKILL.md — raw capability only.
dojo arena ad-copy-google-ads --level 1
═══ Arena Leaderboard ════════════════════════
1st Claude Opus 4.6 █████████████████░░░ 84
2nd Claude Sonnet 4.6 █████████████████░░░ 84
3rd GPT-5.2 ████████████████░░░░ 82
4th GLM 5 ████████████████░░░░ 79
5th Gemini 3 Flash ███████████████░░░░░ 76
══════════════════════════════════════════════
Above 70, every point gets exponentially harder: as with Elo ratings, small gaps mean big differences. See the live leaderboard.
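The bars in the sample leaderboard appear to be 20 cells wide, with the filled portion rounded to the nearest cell (84 fills 17 cells, 82 fills 16, 76 fills 15). A sketch that reproduces that rendering; the width and rounding are inferred from the sample output, not taken from dojo.md's source:

```python
def render_bar(score: int, width: int = 20) -> str:
    """Render a 0-100 score as a fixed-width bar of filled and empty cells."""
    filled = round(score / 100 * width)
    filled = max(0, min(width, filled))  # clamp out-of-range scores
    return "█" * filled + "░" * (width - filled)


for name, score in [("Claude Opus 4.6", 84), ("GPT-5.2", 82), ("Gemini 3 Flash", 76)]:
    print(f"{name:<18} {render_bar(score)} {score}")
```

Because each cell covers five points, models a few points apart can share the same bar length, so the number is the thing to read.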
Any Model
200+ models via OpenRouter:
dojo train cold-email-b2b --model openai/gpt-4o
dojo train cold-email-b2b --model google/gemini-2.5-pro
dojo train cold-email-b2b --model deepseek/deepseek-v3.2
dojo train cold-email-b2b --model x-ai/grok-4.1-fast
dojo train cold-email-b2b --model meta-llama/llama-3.3-70b-instruct
125 Pre-Built Courses (6,250+ Scenarios)
| Domain | Examples | Courses |
|---|---|---|
| Customer Support | Stripe refunds, escalation, churn prevention, SLA breaches, onboarding | 14 |
| Marketing & Content | Google Ads, Meta ads, SEO blogs, email sequences, social media, UGC | 18 |
| Sales & Revenue | Cold email B2B, objection handling, proposals, battlecards, lead scoring | 9 |
| Engineering & DevOps | Incident response, Docker, Kubernetes, CI/CD, AWS Lambda, security | 17 |
| Writing & Docs | Technical RFCs, postmortems, SOPs, newsletters, Twitter/X threads | 16 |
| Data & Analytics | A/B testing, cohort analysis, segmentation, funnel analysis, forecasting | 9 |
| Design & UX | Accessibility audits, design systems, user personas, journey mapping | 9 |
| Education | Quiz creation, study guides, workshop facilitation, training materials | — |
| Legal & Compliance | Contract review, compliance checklists, clause summarization | — |
| Real Estate | Listing descriptions, open house promos, buyer inquiry response | — |
| Healthcare | Appointment reminders, intake review, billing inquiries, pre-auth | — |
dojo list # See all 125 courses
dojo generate "Handle Zendesk ticket routing and priority assignment" # Create your own
Works With Everything
dojo.md generates AgentSkills-standard SKILL.md files. Train once, use everywhere.
Claude Code
dojo.md is an MCP server — train from inside your IDE:
{
  "mcpServers": {
    "dojo": {
      "command": "npx",
      "args": ["dojo.md", "mcp"]
    }
  }
}
MCP tools: dojo_discover, dojo_train, dojo_tool, dojo_submit, dojo_results, dojo_skill, dojo_apply
OpenClaw
Drop your graduated SKILL.md into OpenClaw's skill directory. dojo.md skills follow the same AgentSkills standard — cross-compatible by design.
ClawHub has 13,000+ community skills. The difference: dojo skills are earned, not written. Every SKILL.md has a training score, validated scenarios, and failure patterns it addresses. It's a diploma, not a blog post.
Cursor, Windsurf, and any MCP agent
Same MCP config. Same SKILL.md output. Portable.
The SKILL.md Standard
Generated skills follow the AgentSkills open standard:
---
name: stripe-refunds
description: >-
Handle Stripe refund requests correctly. Use when processing
refunds, duplicate charges, or customer disputes.
---
## Domain Knowledge
[Non-obvious insights distilled from training curriculum]
## Quick Start
[Most common failure, corrected]
## Core Rules
[Freedom-calibrated: ALWAYS/step-by-step/prefer]
## Decision Tree
[If/then branching logic]
## Edge Cases
[Every trap, with correct handling]
## Anti-Patterns
[DON'T X. Instead, Y.]
The description triggers loading — ~100 tokens idle, ~5,000 tokens when activated. Progressive disclosure keeps context clean.
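Progressive disclosure depends on the file having two parts: a short front matter block (name and description) that is always visible, and a longer body that is read only when the skill activates. A minimal sketch of splitting the two at the `---` delimiters; how a given agent host actually loads skills is not described here, and `split_skill` is a hypothetical helper:

```python
from typing import Tuple


def split_skill(text: str) -> Tuple[str, str]:
    """Split a SKILL.md into (front_matter, body) at the '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter, treat everything as body


SAMPLE = """---
name: stripe-refunds
description: >-
  Handle Stripe refund requests correctly.
---
## Core Rules
[rules]
"""

front, body = split_skill(SAMPLE)
print("Always visible:\n" + front)
print("Loaded on activation:\n" + body)
```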
CLI Reference
| Command | Description |
|---|---|
| `dojo train <course>` | Run training session |
| `dojo train <course> -m openai/gpt-4o -j claude-sonnet-4-6 -t 85` | Full multi-model auto-loop |
| `dojo retrain <course>` | Auto-loop with defaults (target 90, max 5) |
| `dojo arena <course>` | Benchmark multiple models head-to-head |
| `dojo arena <course> --models m1,m2,m3` | Arena with specific models |
| `dojo results [course]` | Show latest results |
| `dojo list` | List installed courses |
| `dojo generate <skill>` | Generate a course from description |
Train Options
| Flag | Description | Default |
|---|---|---|
| `-m, --model` | Agent model | claude-sonnet-4-6 |
| `-j, --judge` | Judge model | claude-sonnet-4-6 |
| `-t, --target` | Target score (enables auto-loop) | — |
| `--max-retrain` | Max loop iterations | 5 |
| `--level` | Run specific level only | all |
| `--report` | Save detailed report | — |
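When scripting several training runs, it can help to build the command line from the options in the table instead of concatenating strings. A sketch that uses only flags listed above; `build_train_argv` is a hypothetical helper, and the 0 to 100 range check assumes scores are out of 100, as in the examples:

```python
from typing import List, Optional


def build_train_argv(
    course: str,
    model: Optional[str] = None,
    judge: Optional[str] = None,
    target: Optional[int] = None,
    max_retrain: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `dojo train` from the documented flags."""
    if target is not None and not 0 <= target <= 100:
        raise ValueError("target must be between 0 and 100")
    argv = ["dojo", "train", course]
    if model:
        argv += ["--model", model]
    if judge:
        argv += ["--judge", judge]
    if target is not None:
        argv += ["--target", str(target)]  # setting a target enables the auto-loop
    if max_retrain is not None:
        argv += ["--max-retrain", str(max_retrain)]
    return argv


print(build_train_argv("stripe-refunds", model="openai/gpt-4o", target=90))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues.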
Arena Options
| Flag | Description | Default |
|---|---|---|
| `--models` | Comma-separated model list | top 5 models |
| `-j, --judge` | Shared judge model | claude-opus-4-6 |
| `-l, --level` | Run specific level only | all |
| `-o, --output` | Output JSON path | auto-generated |
Scenario Format
meta:
  id: simple-refund
  level: 1
  course: stripe-refunds
  description: Process a straightforward refund
  type: tool
state:
  customers:
    - id: cus_001
      email: [email protected]
      name: Alice Johnson
  charges:
    - id: ch_001
      amount: 5000
      customer: cus_001
      status: succeeded
trigger: >
  Customer Alice Johnson (cus_001) is requesting
  a refund for charge ch_001 ($50.00).
assertions:
  - type: api_called
    tool: stripe_customers_retrieve
    description: Verify customer identity
  - type: api_called
    tool: stripe_refunds_create
    params: { charge: ch_001 }
    description: Create the refund
  - type: llm_judge
    criteria: >
      Agent confirms refund was processed and explains
      the 5-10 business day timeline for the credit
      to appear on the customer's statement.
    description: Communicate success with timeline
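To make the `api_called` assertions concrete, here is one way they could be checked against a log of the agent's tool calls. This is a sketch under assumptions: the call-log shape (`tool` plus `params`) and the subset-matching rule for `params` are hypothetical, and `llm_judge` assertions are left out because they need a model call.

```python
from typing import Any, Dict, List


def check_api_called(assertion: Dict[str, Any], calls: List[Dict[str, Any]]) -> bool:
    """True if some recorded call matches the tool name and every asserted param."""
    expected = assertion.get("params", {})
    for call in calls:
        if call["tool"] != assertion["tool"]:
            continue
        params = call.get("params", {})
        # Assumed rule: asserted params must be a subset of the call's params.
        if all(params.get(key) == value for key, value in expected.items()):
            return True
    return False


# The two api_called assertions from the scenario above.
assertions = [
    {"type": "api_called", "tool": "stripe_customers_retrieve"},
    {"type": "api_called", "tool": "stripe_refunds_create", "params": {"charge": "ch_001"}},
]

# Hypothetical log of what the agent did against the mock service.
calls = [
    {"tool": "stripe_customers_retrieve", "params": {"id": "cus_001"}},
    {"tool": "stripe_refunds_create", "params": {"charge": "ch_001", "amount": 5000}},
]

results = [check_api_called(a, calls) for a in assertions]
print(results)
```

An agent that looked up the customer but never created the refund would fail the second assertion.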
Development
git clone https://github.com/edholofy/dojo.md
cd dojo.md
npm install
npm run build
npm test # 116 tests
# Dev mode
npm run dev -- train stripe-refunds
Mission
Turn experience into expertise for AI agents.
Today: Author courses, train models, graduate with SKILL.md.
Tomorrow: Production feedback loops that generate scenarios from real failures.
Future: The open knowledge layer for agent expertise — proven, portable, model-agnostic.
License
MIT