hplan


SUMMARY

Product Build Gate (hplan) + 5 agent-PM lifecycle plugins — 6 plugins, 43 skills, 18 commands for PMs who decide what, why, and how to build AI agents

README.md

AI_PM_Skills

43 skills for PMs who decide what, why, and how to build AI agents — starting from the evidence gate that runs before everything


🆕 v0.5 — hplan plugin added: the evidence + COGS + decision gate that runs before you commit to building. Mandatory interview evidence, executable COGS sentinel, append-only Do-Not-Build registry, self-eval decision log, and multi-target handoff to Spec-Kit / Kiro / GStack / Claude Code.

If you're a PM building AI agents, star this repo — it's the only skillset designed for the full agent product lifecycle.

AI_PM_Skills Demo — opp-tree skill auto-triggered

🆕 New to Claude Code? forge/claude-md scans your project, auto-generates CLAUDE.md, and recommends the right AI_PM_Skills plugins. It is the fastest way to onboard.


The Problem

In 2026, PMs are being asked to "build an agent" — but existing PM skills don't prepare you for that.

General PM skills teach you to use AI as a tool — write PRDs faster, generate OKRs, analyze competitors. But when you're building agents as products, the questions are fundamentally different:

  • "What would it cost to run this agent at 1,000 users/day?"
  • "How does an agent recover from hallucination?"
  • "How do I orchestrate multiple agents together?"
  • "How do I encode 3 months of operational judgment into the agent's instructions?"

This project turns those questions into 43 production-grade skills across the full agent lifecycle.


Quick Start (30 seconds)

# 1. Install the plugin
/plugin marketplace add kimsanguine/AI_PM_Skills
/plugin install oracle@kimsanguine-AI_PM_Skills

# 2. Just describe your task — the right skill loads automatically
"We handle 500 support tickets/day. Which parts should an agent handle?"
# → opp-tree skill auto-loads → opportunity mapping starts

The Agent PM Journey — 6 Stages

This isn't a random collection of skills. It's a complete lifecycle — the same path every agent PM walks. Starting in v0.5, hplan is Stage 0 — the evidence gate that decides whether the thing should be built at all.

  Gate    →   Discover   →   Architect   →    Ship     →   Operate   →   Learn
  hplan        oracle          atlas          forge         argus        muse
 7 skills     6 skills        7 skills      12 skills      8 skills     3 skills
    ↑                                                                     │
    └───────────── Accumulated TK feeds back into next agent ─────────────┘
| Stage | Plugin | The Question | Key Skills |
|---|---|---|---|
| Gate | hplan | "Should we build this at all?" | evidence-rubric · interview-synthesis · exclusions · cogs-sentinel · ost · decision-log · handoff |
| Discover | oracle | "What agent should we build?" | opp-tree · assumptions · build-or-buy · cost-sim · hitl · agent-gtm |
| Architect | atlas | "How should we structure it?" | 3-tier · orchestration · router · memory-arch · moat · growth-loop · biz-model |
| Ship | forge | "How to spec and ship it?" | claude-md · prd · instruction · prompt · ctx-budget · okr · stakeholder-map · agent-plan-review + 4 comms tools |
| Operate | argus | "How to measure and improve?" | kpi · reliability · premortem · burn-rate · north-star · agent-ab-test · cohort · incident |
| Learn | muse | "How to make agents smarter over time?" | pm-framework · pm-decision · pm-engine |

What makes hplan different from the other 5

The other five plugins are prompt-driven thinking: the LLM ponders, you decide.
hplan adds deterministic measurement — Python scripts calculate p50/p90 COGS margins, append-only registries persist exclusions and decisions across runs, an MCP server lets Cursor/Windsurf/Kiro/Codex call hplan primitives, and a PreToolUse hook blocks PRD/spec writes until the human approves the gate. It is paired with oracle/atlas/forge/argus/muse, not a replacement.
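
The README does not show hplan's scripts, so the following is only a minimal sketch of what a p50/p90 COGS margin calculation could look like. The function name `margin_percentiles`, the price, and the cost samples are hypothetical and chosen for illustration.

```python
from statistics import quantiles


def margin_percentiles(price: float, costs: list[float]) -> dict[str, float]:
    """Return gross margin at the median (p50) and 90th-percentile (p90) cost."""
    # quantiles(n=10) returns the 9 decile cut points:
    # index 4 is the 50th percentile, index 8 is the 90th.
    deciles = quantiles(costs, n=10, method="inclusive")
    p50_cost, p90_cost = deciles[4], deciles[8]
    return {
        "p50_margin": (price - p50_cost) / price,
        "p90_margin": (price - p90_cost) / price,
    }


# Hypothetical per-request costs in USD (illustrative only).
sample_costs = [0.010, 0.012, 0.011, 0.015, 0.030, 0.013,
                0.012, 0.014, 0.011, 0.050, 0.012]
result = margin_percentiles(price=0.05, costs=sample_costs)
print(result)
```

Looking at p90 as well as p50 matters because a few expensive requests can erase a margin that looks healthy at the median.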

Each skill auto-loads from natural language — describe your task and the right skill fires. Skills also route across plugins: burn-rate (argus) detects a cost spike → suggests router (atlas) for model change → triggers cost-sim (oracle) for re-simulation.


Why This Is Different — 6 Things No Other Skillset Does

① Complete Agent Lifecycle, Not Random Tools

43 skills map to the 6 stages of agent product development. This isn't "AI tools for PMs"; it's a structured methodology for building agents as products, from the evidence gate and discovery through to self-improving agents.

② Two-Layer Architecture — Platform and Content Separation

We separate how Claude finds skills (Platform Layer — Skills 2.0 spec) from what goes inside each skill (Content Layer). The Content Layer defines the Trigger Gate (Use/Route/Boundary) pattern that prevents skill collisions, plus domain-specific context in each skill's context/domain.md. Result: 97.9% trigger accuracy across 96 test queries.

┌─ Platform Layer ──── Skills 2.0 Spec ──────────────────────┐
│  frontmatter · auto-invocation · subagent · hooks · evals   │
├─ Content Layer ──── AI_PM_Skills Pattern ──────────────────┤
│  Core Goal → Trigger Gate → Failure Handling                │
│  → Quality Gate → Examples · context/domain.md              │
└─────────────────────────────────────────────────────────────┘

③ Data Flywheel — PM Tacit Knowledge That Accumulates

muse is the moat. It structures your operational judgment into TK (Tacit Knowledge) units, then injects them into agent instructions. The more you use it, the smarter your agents get — and that knowledge stays yours.

PM judgment log → /extract → structured as TK-NNN → accumulated in PM-ENGINE-MEMORY.md
  → /tk-to-instruction → agent system prompt updated → repeat

This creates switching cost: competitors can copy the framework, but they can't copy your accumulated TK.
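
As a rough illustration of the flow above, a TK unit can be modeled as a small record that is rendered into instruction text. This is a sketch under assumptions: the field names, the `TKUnit` class, and the output format are hypothetical and are not the actual PM-ENGINE-MEMORY.md schema.

```python
from dataclasses import dataclass


@dataclass
class TKUnit:
    tk_id: str            # follows the TK-NNN pattern, e.g. "TK-001"
    judgment: str         # the operational rule in the PM's own words
    activate_when: str    # condition under which the rule applies
    deactivate_when: str  # condition under which the rule should be skipped


def to_instruction(units: list[TKUnit]) -> str:
    """Render TK units as a bullet block that could be appended to a system prompt."""
    lines = ["Operational judgment (from PM-ENGINE-MEMORY.md):"]
    for u in units:
        lines.append(
            f"- [{u.tk_id}] {u.judgment} "
            f"(applies when: {u.activate_when}; skip when: {u.deactivate_when})"
        )
    return "\n".join(lines)


# Hypothetical seed entry for illustration.
units = [
    TKUnit(
        tk_id="TK-001",
        judgment="Escalate refunds above the auto-approve limit to a human",
        activate_when="a refund request is detected",
        deactivate_when="the ticket is already escalated",
    )
]
block = to_instruction(units)
print(block)
```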

④ Eval-Driven ROI — Proof, Not Promises

Every skill is measured. 10 quality tests with 54 assertions prove what skills add vs base Claude. Result:

| | With Skill | Without Skill | Delta |
|---|---|---|---|
| Pass Rate | 100% | 88% | +12 pp |

pm-framework without skill drops to 40%. cost-sim with skill adds +46.6% output. This is data-driven proof that the skills work.

⑤ Good/Bad Examples for Data-Driven Improvement

Every skill includes examples/good-01.md and examples/bad-01.md — concrete right/wrong output pairs. Plus references/test-cases.md with edge case tables. These aren't decorative; they're training signals that make skill quality measurable and continuously improvable.

⑥ Skills 2.0 Full Spec + Instant Onboarding

Built on Claude Code's latest platform spec: auto-invocation, context: fork, allowed-tools, model field, dynamic !command injection, marketplace, and eval system. New users start with the PM-ENGINE-MEMORY Starter Kit — 5 seed TK entries so the value is immediate, not "someday when I accumulate enough data."


Plugins — Full Skill List

1. oracle — What agent to build? (6 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| opp-tree | Build an opportunity tree scored by repeat frequency, automation fit, and judgment dependency | "We have 10 automation candidates. Which one first?" |
| assumptions | Extract riskiest assumptions across 4 axes (Value/Feasibility/Reliability/Ethics) and design 2-day validation experiments | "What's the biggest risk before we start building?" |
| build-or-buy | Score Build vs Buy vs No-code across 6 axes (differentiation, speed, cost, customization, maintenance, domain) | "Should we use Intercom's bot or build our own agent?" |
| hitl | Set automation levels (1-5) and escalation triggers via reversibility × error-impact matrix | "Can the agent decide refunds, or must a human approve?" |
| cost-sim | Simulate monthly costs at 1→10→100→1,000 users by model pricing × call patterns | "Sonnet at 500 calls/day: what's the monthly bill?" |
| agent-gtm | Score beachhead segments (5 criteria) + design Shadow→Co-pilot→Auto→Delegation trust sequence | "How do we roll this agent out to B2B customers?" |

Commands: /discover · /validate
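
The cost-sim skill multiplies model pricing by call patterns at several user counts. The arithmetic can be sketched as below. This is not the skill's implementation: the function name, token counts, call frequency, and per-million-token prices are placeholder assumptions, not any vendor's actual rates.

```python
def monthly_cost(
    users: int,
    calls_per_user_per_day: float,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    days: int = 30,
) -> float:
    """Estimate monthly model spend in USD for a given user count."""
    calls = users * calls_per_user_per_day * days
    cost_per_call = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return calls * cost_per_call


# Placeholder call pattern and prices (assumptions for illustration).
simulation = {
    users: monthly_cost(
        users=users,
        calls_per_user_per_day=5,
        input_tokens=2_000,
        output_tokens=500,
        input_price_per_mtok=3.0,
        output_price_per_mtok=15.0,
    )
    for users in (1, 10, 100, 1_000)
}
for users, cost in simulation.items():
    print(f"{users:>5} users: ${cost:,.2f}/month")
```

This simple model scales linearly with users. A fuller simulation would also vary the call pattern per tier of usage, which is where the 1→10→100→1,000 comparison becomes informative.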

2. atlas — How to architect it? (7 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| 3-tier | Design Prometheus (strategy) → Atlas (coordination) → Worker (execution) roles, comms, and delegation | "I need 5 agents. Who controls whom?" |
| orchestration | Compare Sequential/Parallel/Router/Hierarchical patterns by latency, error rate, and cost | "Should my doc pipeline run serial or parallel?" |
| biz-model | Design per-use / subscription / outcome-based pricing + variable cost analysis targeting >70% margin | "Per-API-call billing or monthly flat fee?" |
| router | Auto-route tasks to T1-T4 models by complexity + fallback chains for 40-80% cost reduction | "Simple FAQ → Haiku, complex analysis → Opus: auto?" |
| memory-arch | Design Working/Episodic/Semantic/Procedural memory layers + token-budget-aware retrieval | "How does today's session recall yesterday's context?" |
| moat | Diagnose 6 moat types: data flywheel, workflow lock-in, network effects, switching costs, specialization, brand | "A competitor ships a GPT clone. What's our defense?" |
| growth-loop | Design usage→data→improvement→re-use loops + cold-start solutions + anti-loop identification | "How do we make recommendations improve with every use?" |

Commands: /architecture · /strategy-review
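
The router skill describes tiered routing with fallback chains. One way to picture that is the sketch below. The thresholds, the direction of fallback (escalating to the next more capable tier), and the `call_model` callback are assumptions made for illustration. The README does not specify them.

```python
from typing import Callable

TIERS = ["T1", "T2", "T3", "T4"]  # assumed order: cheapest to most capable


def pick_tier(complexity: float) -> str:
    """Map a 0..1 complexity score to a tier; the even split is illustrative."""
    index = min(max(int(complexity * len(TIERS)), 0), len(TIERS) - 1)
    return TIERS[index]


def route(
    task: str,
    complexity: float,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the chosen tier first, then fall back to each higher tier in turn."""
    start = TIERS.index(pick_tier(complexity))
    last_error: Exception | None = None
    for tier in TIERS[start:]:
        try:
            return tier, call_model(tier, task)
        except RuntimeError as err:
            last_error = err
    raise RuntimeError("all tiers failed") from last_error


def flaky_model(tier: str, task: str) -> str:
    # Stand-in for a real model call: the cheapest tier always fails here.
    if tier == "T1":
        raise RuntimeError("T1 unavailable")
    return f"{tier} answered: {task}"


used_tier, answer = route("Where is my order?", 0.1, flaky_model)
print(used_tier, "|", answer)
```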

3. forge — How to spec and ship it? (12 skills, 3 commands)

Onboarding (1): claude-md
Core Spec (7): instruction · prd · prompt · ctx-budget · okr · stakeholder-map · agent-plan-review
Communication (4): gemini-image-flow · infographic-gif-creator · pptx-ai-slide · agent-demo-video

| Skill | What it does | When to use |
|---|---|---|
| claude-md | Scan project structure → auto-generate CLAUDE.md → recommend matching AI_PM_Skills plugins | "New project: set up Claude Code context and find the right skills" |
| instruction | Define Role/Context/Goal/Tools/Memory/Output/Failure with least-privilege tool access | "What goes in (and out of) the system prompt?" |
| prd | 7-section agent spec: Instruction/Tools/Memory/Triggers/Output/Failure with dual narrative (tech + biz) | "I need a PRD that covers hallucination recovery and tool permissions" |
| prompt | CRISP framework (Context/Role/Instruction/Scope/Parameters) + Why-First principle + 7 failure pattern avoidance | "Longer prompts make my agent behave worse" |
| ctx-budget | Estimate per-file token usage → classify Essential/Conditional/Excluded → 70% threshold alerts | "How do I fit 5 RAG docs + chat history into 128K?" |
| okr | Dual-axis OKRs: Business Impact + Operational Health with mandatory cost KR | "Is 95% accuracy enough, or do I need cost metrics too?" |
| stakeholder-map | Power-Interest matrix + blocker response strategies + internal champion cultivation | "Legal is blocking the agent rollout. How do I get buy-in?" |
| agent-plan-review | 4-axis review + failure mode matrix (5+ types) + Mermaid output | "Find the holes in this design before we start coding" |
| gemini-image-flow | End-to-end Gemini API image pipeline with model tier selection | "Build a sketch→code pipeline" |
| infographic-gif-creator | Convert architecture / workflow into HTML/CSS → GIF/MP4 animations | "Show the multi-agent flow to execs" |
| pptx-ai-slide | Story-driven slide decks (pitch / review / investor variants) | "Board presentation, 10 slides max" |
| agent-demo-video | Screen recordings + animations + narration via Remotion | "Show non-technical stakeholders what the agent does" |

Commands: /write-prd · /set-okr · /sprint
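
The ctx-budget row describes estimating tokens per file, classifying files, and alerting at a 70% threshold. A minimal sketch of that idea follows. The four-characters-per-token heuristic, the function names, and the sample files are assumptions. A real implementation would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def budget_report(
    files: dict[str, tuple[str, str]],
    window: int,
    alert_ratio: float = 0.70,
) -> dict:
    """files maps name -> (label, text); label is Essential, Conditional, or Excluded."""
    per_file = {
        name: estimate_tokens(text)
        for name, (label, text) in files.items()
        if label != "Excluded"  # excluded files never enter the context
    }
    total = sum(per_file.values())
    ratio = total / window
    return {"per_file": per_file, "total": total, "ratio": ratio, "alert": ratio > alert_ratio}


# Hypothetical files with synthetic content sized for easy arithmetic.
files = {
    "CLAUDE.md": ("Essential", "x" * 2000),     # ~500 tokens
    "history.md": ("Conditional", "y" * 1200),  # ~300 tokens
    "archive.md": ("Excluded", "z" * 8000),     # not counted
}
tight = budget_report(files, window=1000)
roomy = budget_report(files, window=2000)
print(tight["ratio"], tight["alert"], roomy["ratio"], roomy["alert"])
```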

4. argus — How to measure and improve? (8 skills, 2 commands)
| Skill | What it does | When to use |
|---|---|---|
| kpi | Define 5-7 operational + business metrics with leading/lagging split | "What goes on the agent dashboard?" |
| reliability | Quantify P95/P99 worst cases + design safeguards + set SLA tiers | "3 out of 100 responses hallucinate: acceptable?" |
| premortem | Score 10-15 failure modes by Severity × Likelihood × Detection Difficulty | "Give me a 'this must not break' list" |
| burn-rate | Visualize token costs by model/task + spike detection + budget caps | "Token costs jumped 40%. What caused it?" |
| north-star | Select one metric via 5 criteria + set anti-metrics | "Team doesn't know which KPI matters most" |
| agent-ab-test | Calculate MDE + concurrent experiments + control for LLM nondeterminism | "Prompt A vs B: real difference or noise?" |
| cohort | Track performance by deployment cohort (4-week minimum, n≥100) | "Did v2.1 actually improve over v2.0?" |
| incident | Detect silent failures + triage + contain blast radius + 5 Whys | "Agent silent for 30 min, no alerts fired" |

Commands: /health-check · /cost-review
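
The premortem skill scores failure modes by Severity × Likelihood × Detection Difficulty. The ranking step can be sketched as follows, assuming a 1-5 scale for each factor. The scale, the `FailureMode` class, and the example failure modes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int              # 1 (minor) to 5 (critical), assumed scale
    likelihood: int            # 1 (rare) to 5 (frequent), assumed scale
    detection_difficulty: int  # 1 (obvious) to 5 (silent), assumed scale

    @property
    def score(self) -> int:
        return self.severity * self.likelihood * self.detection_difficulty


# Hypothetical failure modes for illustration.
modes = [
    FailureMode("Hallucinated refund policy", 5, 2, 4),
    FailureMode("Token cost spike", 3, 4, 2),
    FailureMode("Silent tool timeout", 4, 3, 5),
]

# Highest score first: this becomes the "must not break" list.
ranked = sorted(modes, key=lambda m: m.score, reverse=True)
for mode in ranked:
    print(f"{mode.score:>3}  {mode.name}")
```

Including detection difficulty pushes silent failures up the list even when their raw severity is not the highest.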

5. muse ⭐ — Turn PM tacit knowledge into agent assets (3 skills, 3 commands)
| Skill | What it does | When to use |
|---|---|---|
| pm-framework | Convert implicit judgment into TK-NNN units with activation/deactivation conditions + knowledge graph linking | "3 years of agent ops experience is stuck in my head" |
| pm-decision | Build a pattern library of recurring PM decisions with context, criteria, and known failures | "I've seen this situation before. Why did I decide that way?" |
| pm-engine | Agents dynamically query TK knowledge graph at runtime + auto-extract 1 TK/day + auto-update instructions | "I want my agents to leverage my operational know-how automatically" |

Commands: /extract · /decide · /tk-to-instruction

Start with the PM-ENGINE-MEMORY Starter Kit — 5 seed TK entries to get going immediately.

The framework is open-source; your data (PM-ENGINE-MEMORY.md) is your own asset.


Installation

Option 1: GitHub Marketplace (Recommended)

/plugin marketplace add kimsanguine/AI_PM_Skills
/plugin install oracle@kimsanguine-AI_PM_Skills   # or atlas, forge, argus, muse

Option 2: Clone Locally

git clone https://github.com/kimsanguine/AI_PM_Skills.git
claude --plugin-dir ./AI_PM_Skills/oracle   # pick what you need

First time with Claude Code? → Start with forge/claude-md — it scans your project and recommends the right skills.
Not sure which agent to build yet? → Start with oracle.
Already know what to build? → Start with forge.

Other AI Tools

| Tool | Skills | Commands | How to use |
|---|---|---|---|
| Gemini CLI | | | Copy to .gemini/skills/ |
| Cursor | | | Copy to .cursor/skills/ |
| Codex CLI | | | Copy to .codex/skills/ |
| Kiro | | | Copy to .kiro/skills/ |

📐 Architecture Deep-Dive — Two Layers, Skills 2.0, Trigger Gate, Commands

Auto-Invocation

You don't call skills by name. Describe your task in natural language, and Claude matches it against each SKILL.md's description field to auto-load the best fit. Trigger accuracy: 97.9% across 96 test queries.
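
The matching itself is done by Claude, so it cannot be reproduced here, but the accuracy figure can be sketched. The snippet below shows how a trigger accuracy could be computed from expected versus loaded skills, and checks which hit count out of 96 is consistent with the reported 97.9%. The README does not give the raw count. The function name and demo queries are hypothetical.

```python
def trigger_accuracy(expected: list[str], loaded: list[str]) -> float:
    """Fraction of queries where the loaded skill matches the expected one."""
    if len(expected) != len(loaded):
        raise ValueError("expected and loaded must be the same length")
    hits = sum(e == got for e, got in zip(expected, loaded))
    return hits / len(expected)


# Which hit counts out of 96 round to the reported 97.9%?
TOTAL_QUERIES = 96
consistent_hits = [
    hits
    for hits in range(TOTAL_QUERIES + 1)
    if round(100 * hits / TOTAL_QUERIES, 1) == 97.9
]

# Tiny illustrative run with made-up results: one of four queries misfires.
demo = trigger_accuracy(
    ["opp-tree", "cost-sim", "router", "kpi"],
    ["opp-tree", "cost-sim", "router", "burn-rate"],
)
print(consistent_hits, demo)
```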

Cross-Plugin Routing

The Trigger Gate's "Route" field enables routing between plugins:

| From | Trigger Condition | Route To |
|---|---|---|
| opp-tree | "Validate assumptions for top opportunity" | assumptions |
| burn-rate | "Need model routing change" | router |
| prd | "Need instruction design" | instruction |
| pm-framework | "Convert TK to agent instruction" | pm-engine |

Command Chaining

| Command | Chained Skills | Plugin |
|---|---|---|
| /discover | opp-tree → assumptions → build-or-buy | oracle |
| /architecture | orchestration → 3-tier → memory-arch | atlas |
| /write-prd | prd → instruction → ctx-budget | forge |
| /health-check | kpi → reliability → burn-rate | argus |
| /tk-to-instruction | pm-engine → instruction | muse+forge |
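
To make the chaining concrete, the commands listed above can be written down as data and checked against the plugin skill lists given earlier in this README. This is only an illustrative sketch. The dictionary layout and helper names are hypothetical and do not reflect how the plugins store their commands.

```python
# Skill ownership, copied from the plugin skill lists in this README.
PLUGIN_SKILLS = {
    "oracle": {"opp-tree", "assumptions", "build-or-buy", "hitl", "cost-sim", "agent-gtm"},
    "atlas": {"3-tier", "orchestration", "biz-model", "router", "memory-arch", "moat", "growth-loop"},
    "forge": {"claude-md", "instruction", "prd", "prompt", "ctx-budget", "okr",
              "stakeholder-map", "agent-plan-review", "gemini-image-flow",
              "infographic-gif-creator", "pptx-ai-slide", "agent-demo-video"},
    "argus": {"kpi", "reliability", "premortem", "burn-rate", "north-star",
              "agent-ab-test", "cohort", "incident"},
    "muse": {"pm-framework", "pm-decision", "pm-engine"},
}

# Command chains as listed in the README.
CHAINS = {
    "/discover": ["opp-tree", "assumptions", "build-or-buy"],
    "/architecture": ["orchestration", "3-tier", "memory-arch"],
    "/write-prd": ["prd", "instruction", "ctx-budget"],
    "/health-check": ["kpi", "reliability", "burn-rate"],
    "/tk-to-instruction": ["pm-engine", "instruction"],
}


def owning_plugin(skill: str) -> str:
    """Return the plugin that ships a given skill."""
    for plugin, skills in PLUGIN_SKILLS.items():
        if skill in skills:
            return plugin
    raise KeyError(f"unknown skill: {skill}")


def chain_plugins(command: str) -> list[str]:
    """List the owning plugin for each step of a command's chain, in order."""
    return [owning_plugin(skill) for skill in CHAINS[command]]


print(chain_plugins("/tk-to-instruction"))
```

The `/tk-to-instruction` chain is the one that crosses plugins: it starts in muse and ends in forge.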

Skills 1.0 vs Skills 2.0

| Feature | 1.0 (2025) | 2.0 (2026) | AI_PM_Skills |
|---|---|---|---|
| Auto-invocation | | | ✅ 97.9% |
| Subagent (context: fork) | | | ✅ 5 skills |
| Tool restriction | | | ✅ 3-tier |
| Marketplace + Evals | | | ✅ Full |
| Dynamic injection | | | ✅ 5 skills |
| Hooks | | | ⚠️ Spec-ready |

⚠️ Hooks have a known issue (#17688). Fallback validate_*.sh scripts are available in references/.

File Structure

AI_PM_Skills/
├── oracle/           # Discovery (6 skills, 2 commands)
├── atlas/            # Architecture (7 skills, 2 commands)
├── forge/            # Execution (12 skills, 3 commands)
├── argus/            # Monitoring (8 skills, 2 commands)
├── muse/             # Knowledge (3 skills, 3 commands)
├── evals/            # Quality + trigger evals
├── docs/images/      # Diagrams
├── validate_plugins.py
└── CONTRIBUTING.md

Skill Anatomy — What's Inside Each Skill

Every skill follows a consistent internal structure. This isn't just Skills 2.0 spec compliance — it's a content architecture designed for measurable quality and continuous improvement:

oracle/skills/opp-tree/           ← example skill
├── SKILL.md                      ← Core: frontmatter (name, description,
│                                    argument-hint, allowed-tools) +
│                                    Trigger Gate (Use/Route/Boundary) +
│                                    Failure Handling + Quality Gate
├── context/
│   └── domain.md                 ← Domain knowledge injected at runtime
│                                    (e.g., agent economics, industry benchmarks)
├── examples/
│   ├── good-01.md                ← ✅ Reference output — "this is what great looks like"
│   └── bad-01.md                 ← ❌ Anti-pattern — "this is what to avoid and why"
└── references/
    ├── test-cases.md             ← Edge cases, boundary conditions, eval criteria
    └── troubleshooting.md        ← Common failures + recovery patterns

Why this matters:

| Component | Purpose | Impact |
|---|---|---|
| SKILL.md Trigger Gate | Use/Route/Boundary → prevents wrong skill from firing | 97.9% trigger accuracy |
| context/domain.md | Domain expertise Claude doesn't have natively | +12~46% output quality |
| examples/good-01.md | Concrete "gold standard" output | Anchors Claude's generation |
| examples/bad-01.md | Explicit anti-patterns with explanations | Prevents common failures |
| references/test-cases.md | Edge cases + assertions | Powers eval system (54 assertions) |

This pattern repeats across all 36 skills — 130+ supporting files that make each skill measurable, testable, and improvable.
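
Because the layout is consistent, a structure check is straightforward to sketch. The snippet below is not the repository's validate_plugins.py, whose contents are not shown here. It is a hypothetical checker that reports which of the files from the opp-tree example tree are missing from a skill directory.

```python
import tempfile
from pathlib import Path

# Expected files, taken from the opp-tree example tree in this README.
EXPECTED = [
    "SKILL.md",
    "context/domain.md",
    "examples/good-01.md",
    "examples/bad-01.md",
    "references/test-cases.md",
    "references/troubleshooting.md",
]


def missing_files(skill_dir: Path) -> list[str]:
    """Return the expected relative paths that do not exist under skill_dir."""
    return [rel for rel in EXPECTED if not (skill_dir / rel).is_file()]


# Demo on a temporary skill folder that lacks troubleshooting.md.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "opp-tree"
    for rel in EXPECTED[:-1]:
        path = skill / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")
    gaps = missing_files(skill)

print(gaps)
```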

📐 Plugin Lifecycle Diagram

Agent Product Lifecycle


Contributing

See CONTRIBUTING.md for guidelines. New skills, improvements, and translations (EN↔KO) are all welcome.


Author

Sanguine Kim — 20-year PM veteran, AI Agent Builder & Educator

Built and scaled AI Dubbing and AI Avatar products, then led Agentic AI product development. Currently exploring the path of AI Agent PM educator — helping PMs navigate the shift from "using AI" to "building agents as products."

📬 For training, consulting, or workshop inquiries: [email protected]

If you're using this project for corporate training or educational content, I'd appreciate a quick note. Customized consulting and co-teaching are welcome.

  • References: Teresa Torres (Continuous Discovery Habits), Anthropic ("Building Effective Agents"), Steve Yegge (Gas Town parallel agent design), Byeonghyeok Kwak (MCP-Skills hierarchy), Michael Polanyi (The Tacit Dimension)

Related

| Repo | What | Link |
|---|---|---|
| AI_PM | Claude Code guide for PMs: learn the why and how | github.com/kimsanguine/AI_PM |
| AI_PM_Skills | Ready-to-use agent skillset: the tools (this repo) | github.com/kimsanguine/AI_PM_Skills |

AI_PM teaches the thinking. AI_PM_Skills gives you the tools.


License

MIT — LICENSE
