Platform Skills

A production-grade field handbook for platform, DevOps, SRE, and cloud engineers covering Kubernetes, Flux CD, Terraform, GitHub Actions, AWS, OPA/Rego, KEDA, Karpenter, supply chain security, Falco, observability, and more. Use it on GitHub, as a local reference, or with Claude, Codex, Cursor, and Copilot for interactive guidance with blast radius, validation steps, and rollback plans built in.

Works With

Tool	What you get
Claude Code	40 slash commands (`/platform-skills:preflight`, `/platform-skills:kubernetes`, `/platform-skills:secrets`, and more), interactive guidance, automatic activation on relevant files
Codex	Skill invocation with `$platform-skills`, loaded on demand in any Codex session
Cursor	Project rules for Chat and Agent — platform review and generation in every file context
GitHub Copilot plugin	Interactive slash commands in Copilot Chat — install once per user via `copilot plugin install`
GitHub Copilot (team)	Chat instructions committed to your repo — available to your whole team without individual installs
GitHub (no AI tool)	Browse `references/` and `examples/` directly — a standalone field handbook

If this handbook saves you time, give it a star — it helps others find it.
Found a gap or a better pattern? Contributions are welcome — open an issue, improve a reference guide, or add an example.

Why Platform Skills

Platform teams keep rediscovering the same hard lessons: unclear ownership, unsafe IAM, weak Kubernetes defaults, drifting GitOps overlays, CI checks that run too late, and rollback plans that only appear after an incident. Platform Skills turns those lessons into reusable guidance for the tools engineers already use.

Use it when you need a second brain for production platform work:

Review a Terraform, Helm, Kubernetes, Flux, GitHub Actions, or AWS change before it merges
Generate platform assets with security, observability, validation, and rollback already considered
Debug incidents with evidence-first troubleshooting instead of guesswork
Give every developer the same platform engineering baseline in Claude, Codex, Cursor, and Copilot

Install In 60 Seconds

Clone once, then install the integration your team uses:

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills

Tool	Best for	Quick install
Claude Code	Interactive plugin workflows and slash commands	`claude plugin marketplace add https://github.com/nitinjain999/platform-skills && claude plugin install platform-skills`
GitHub Copilot plugin	Copilot Chat — interactive slash commands	`copilot plugin marketplace add nitinjain999/platform-skills && copilot plugin install platform-skills@platform-skills`
Codex	Local skill invocation with `$platform-skills`	`./install.sh --codex`
Cursor	Project rules for Chat and Agent	`./install.sh --cursor --target ../your-project`
GitHub Copilot (team)	Team-wide chat instructions committed to the repo	`./install.sh --copilot --target ../your-project`
Everything	Local all-agent setup	`./install.sh --all --target ../your-project`

Need manual setup, global editor rules, or troubleshooting? See INSTALLATION.md.

Try It On Your Repo

See BEFORE_AFTER.md for side-by-side before/after examples across Kubernetes, Terraform, Flux CD, GitHub Actions, OPA/Rego, and PR triage. More copy-paste workflows in PROMPTS.md.

Use $platform-skills to review this Terraform change for IAM scope, replacement risk, validation, and rollback.

Review this Kubernetes Deployment for production readiness: securityContext, resources, probes, HPA, PDB, and NetworkPolicy.

My Flux Kustomization is stuck NotReady. Walk me from evidence to fix to rollback.

Generate a production-ready GitHub Actions workflow with OIDC, pinned actions, cache safety, and least privilege.

What is this?

This repository is a reference handbook for developers, DevOps engineers, SREs, cloud engineers, and platform teams. It is structured in independent layers:

Handbook — references/ and examples/ are the main product. Every domain has a deep-dive guide and working example assets you can copy directly into your project. Use it on GitHub, from a local clone, or as a team knowledge base.
Claude plugin — SKILL.md and .claude-plugin/marketplace.json add an optional routing layer so Claude surfaces the right section of the handbook when you ask platform engineering questions interactively.
Codex skill — the repo root is a self-contained skill folder: SKILL.md provides routing, agents/openai.yaml provides Codex UI metadata, and references/ plus examples/ are loaded on demand.
Cursor rules — .cursorrules and .cursor/rules/*.mdc give Cursor project-level and scoped file rules for platform engineering reviews and generation.
Copilot instructions — .github/copilot-instructions.md lets teams commit the baseline into application and platform repositories.

All layers work independently. Agent integrations are optional.

Navigate

I want to...	Go to
Get started in 5 minutes	QUICKSTART.md
Understand how AI agents and skills work	HOW_IT_WORKS.md
Full installation guide and troubleshooting	INSTALLATION.md
Read a domain guide	references/
Copy a working example	examples/
Copy prompts for Claude, Codex, Cursor, or Copilot	PROMPTS.md
Install as a Claude plugin	Installation
Install as a Codex skill	Installation
Add Cursor rules	Editor integrations
Learn how to use each slash command	COMMANDS.md
Set up VSCode, Copilot, or Cursor	EDITOR_INTEGRATIONS.md
Contribute a pattern	CONTRIBUTING.md

Domains

Domain	Reference guide	What it covers
Kubernetes	references/kubernetes.md	Cluster baseline, workload patterns, network policy, RBAC, pod security — `/platform-skills:kubernetes`
🛡️ Kyverno	references/kyverno.md	Validate/mutate/generate/verifyImages policies, Audit→Deny promotion, PolicyException, PolicyReport, kyverno-cli testing, PSP/Gatekeeper migration
OpenShift	references/openshift.md	SCC diagnosis, Routes TLS, OpenShift GitOps delivery, cluster upgrade validation — `/platform-skills:openshift`
Argo CD	references/argocd.md	App-of-apps design, ApplicationSet, sync control, promotion flows
Flux CD	references/fluxcd.md	Monorepo structure, reconciliation, multi-tenancy, image automation. Deep-dive guides: Sources, HelmRelease, Kustomization, Notifications, Operator, Security, Troubleshooting, Migration, ResourceSets, MCP, Terraform bootstrap — `/platform-skills:fluxcd`
AWS	references/aws.md	IAM least-privilege, IRSA, EKS, resource tagging, cost allocation
AWS CloudFront	references/aws-cloudfront.md	Distributions, OAC, cache policies, security headers, Lambda@Edge, CloudFront Functions, multi-account
AWS WAF	references/aws-waf.md	Web ACLs, managed rule groups, rate limiting, Bot Control, Firewall Manager, Shield Advanced
Azure	references/azure.md	Workload identity, AKS, RBAC, resource tagging, Azure Policy — `/platform-skills:azure`
Terraform	references/terraform.md	Module design, state management, testing, CI/CD integration
Checkov	references/checkov.md	Bootstrap, static + plan scanning, multi-cloud provider detection, fix mode, custom checks
🔍 Trivy	references/trivy.md	Image, fs, repo, SBOM, and cluster CVE scanning; severity gates; Trivy Operator via Flux
GitHub Actions	references/github-actions.md	Security hardening, OIDC, SHA pinning, reusable workflows — `/platform-skills:github-actions`
Composite GitHub Actions	references/composite-actions.md	Composite action scaffolding, review, hardening, testing, release, private repo access
🗺️ Platform model	references/platform-operating-model.md	Ownership boundaries, promotion flows, cross-tool design
🔐 Secrets	references/secrets.md	External Secrets Operator, Sealed Secrets, provider setup, troubleshooting — `/platform-skills:secrets`
Linkerd	references/linkerd.md	mTLS, proxy injection, AuthorizationPolicy, observability, multi-cluster
Linux & Networking	references/linux-networking.md	Linux admin, DNS, load balancing, VPC/VNet design, connectivity troubleshooting
🧠 Platform Mindset	references/platform-mindset.md	DevEx, friction audits, RFC/ADR, incident comms, post-mortems, capacity planning
🔒 Compliance	references/compliance.md	SOC 2 Trust Services Criteria in Terraform: IAM, encryption, detection, audit logging, backup, Checkov enforcement
Helm	references/helm.md	Chart scaffolding, values design, template patterns, security hardening, lint/validation pipeline, GitOps integration
🔌 MCP	references/mcp.md	Model Context Protocol server/client development, TypeScript and Python SDKs, stdio/SSE transports, security, testing
☁️ AWS MCP Profiles	references/aws-mcp-profiles.md	Multi-account AWS MCP server management — SSO, Granted, credential_process, profile discovery, VS Code and Claude Code config generation
Observability	references/observability.md	Structured logging, Prometheus metrics, OpenTelemetry tracing, Grafana dashboards, alerting rules, k6 load testing, capacity planning
📝 Documentation	references/documentation.md	Docstrings (Google/NumPy/JSDoc), OpenAPI 3.1 specs, doc sites (MkDocs/TypeDoc), developer guides
Datadog	references/datadog.md	Agent Helm setup, APM instrumentation, log management, monitors/dashboards/SLOs as Terraform, pup CLI, Datadog Labs skills
🤖 LLM Observability	references/llm-observability.md	Datadog LLMObs instrumentation (Python/Node.js), eval bootstrap, trace RCA, experiment analysis
Dynatrace	references/dynatrace.md	OneAgent Kubernetes Operator, custom metrics, SLOs, dashboards and alerting via Terraform provider
Conventional Commits	references/conventional-commits.md	Message structure, type classification, atomic staging, commitlint/husky/semantic-release tooling
📋 OPA / Conftest	references/opa.md	Rego v1 syntax, rule types, unit tests, fmt/regal/verify validation pipeline, GitHub Actions integration
🔍 PR Review	references/pr-review.md	Cost impact, environment drift, ownership gaps, SOC 2 compliance, deprecated API / version hygiene, rollback feasibility
🧵 PR Comment Triage	commands/triage.md	`/platform-skills:triage` classifies PR comments, applies valid fixes, replies, and resolves review threads
⚡ KEDA	references/keda.md	ScaledObject, ScaledJob, TriggerAuthentication, Prometheus/SQS/Kafka/Redis/Cron/HTTP/Azure scalers, scale-to-zero, IRSA, GitOps integration, troubleshooting — `/platform-skills:keda`
⚙️ Karpenter	references/karpenter.md	EKS node autoscaling — NodePool, EC2NodeClass, NodeClaim, Spot diversity, disruption budgets, ODCR, private clusters, Fargate coexistence, FinOps, CA migration, v0→v1 upgrades — `/platform-skills:karpenter`
🤖 Agent Self-Improvement	references/agent-self-improve.md	`.learnings/` directory setup, LRN/ERR/FEAT entry lifecycle, WAL protocol, working buffer, VFM scoring, ADL decision logic, Six Operating Pillars, heartbeat, reverse prompting, proactive agent behavior — `/platform-skills:self-improve`
🔗 Supply Chain Security	references/supply-chain.md	Cosign keyless signing, Syft SBOM generation and attestation, Trivy/Grype CVE scanning with severity gates, SLSA Level 2 provenance, Kyverno ImageValidatingPolicy enforcement — `/platform-skills:supply-chain`
🦅 Runtime Security	references/runtime-security.md	Falco eBPF deployment on EKS/GKE, custom rule authoring, Falcosidekick alert routing, rule debugging, bridging Falco signals to Kyverno admission enforcement — `/platform-skills:runtime-security`
💥 Chaos Engineering	references/chaos.md	Litmus Chaos v3 and Chaos Mesh v2 fault injection, steady-state hypothesis (httpProbe/promProbe), blast radius scoping, GameDay workflow, recurring schedules, DORA feedback loop — `/platform-skills:chaos`
📊 DORA Metrics	references/dora.md	Deployment Frequency, Lead Time, Change Failure Rate, MTTR — GitHub Actions + Prometheus Pushgateway instrumentation, recording rules, Grafana dashboards, SaaS decision matrix, anti-pattern detection — `/platform-skills:dora`
✨ Awesome Docs	references/awesome-docs.md	Animated GitHub-safe Markdown document generation — any doc type (README, architecture guide, runbook, tutorial, API reference, RFC, post-mortem, or custom), 4 SVG patterns, convert existing docs, diff for staleness, audit quality, local preview, multi-platform export — `/platform-skills:awesome-docs`
🔄 Renovate	references/renovate.md	Dependency update automation — scan repo and generate renovate.json per ecosystem, private registry auth (ECR/GCR/ACR/Harbor/Helm OCI), custom regex managers for internal GitHub modules and private Terraform registries, pre-commit hook, GitHub Actions validation workflow — `/platform-skills:renovate`
🤖 Setup Agents	references/setup-agents.md	Multi-agent AI scaffold for any repo — ranked scan, interview-driven generate/upgrade/add/review modes, GitHub Copilot, Claude Code, Cursor, Codex, and Windsurf configs — `/platform-skills:setup-agents`

Core principles

Every pattern in this handbook follows the same ground rules:

Production-first — patterns are battle-tested, not theoretical
Root-cause over symptom — troubleshooting works backwards from evidence to fix
Explicit blast radius — every risky operation documents scope and rollback
Security by default — least-privilege IAM, restricted pod security, SHA-pinned actions
Rollback plans are mandatory — if you cannot safely undo it, the guide is incomplete

Troubleshooting structure

Every troubleshooting section in the handbook follows this consistent framework — from quick diagnosis to safe resolution:

Step	What it answers
Symptom	Exact error and observable behavior
Evidence	Commands to run: logs, events, status
Hypothesis	Most likely root cause
Diagnosis	Commands that confirm or rule out the hypothesis
Fix	Specific change with justification
Validation	Post-fix verification steps
Prevention	How to avoid it next time
Rollback	Safe undo path if the fix makes things worse

Installation

Browse on GitHub

No installation needed. Navigate directly:

examples/ — copy-paste examples for all domains
references/ — deep-dive domain guides
SKILL.md — core patterns and routing logic

Clone for local templates

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills

# Copy examples directly into your project
cp -r examples/flux/basic-monorepo/*          your-gitops-repo/
cp -r examples/terraform/eks-cluster/*         your-terraform-modules/
cp    examples/kubernetes/deployment-baseline.yaml  your-k8s-manifests/

Install as a Claude plugin

The plugin adds interactive guidance on top of the handbook. Claude will reference the right section automatically when you ask platform engineering questions in your editor, terminal, or browser.

From marketplace:

claude plugin marketplace add https://github.com/nitinjain999/platform-skills
claude plugin install platform-skills

From local clone (for customisation):

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills
claude plugin install .

Upgrade to latest version:

claude plugins marketplace update platform-skills
claude plugins remove platform-skills
claude plugins install platform-skills

Install as a Codex skill

Codex discovers skills from the local skills directory. Clone this repository as the skill folder so SKILL.md, agents/openai.yaml, references/, and examples/ stay together:

mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/nitinjain999/platform-skills.git "${CODEX_HOME:-$HOME/.codex}/skills/platform-skills"

Then ask Codex naturally:

Use $platform-skills to review this Terraform change for ownership, blast radius, validation, and rollback.

Install as a GitHub Copilot plugin

Install from the Copilot plugin marketplace to get platform-skills guidance in GitHub Copilot Chat:

copilot plugin marketplace add nitinjain999/platform-skills
copilot plugin install platform-skills@platform-skills

Verify:

copilot plugin list
# platform-skills  enabled

Upgrade:

copilot plugin uninstall platform-skills
copilot plugin install platform-skills@platform-skills

Install Cursor rules

Copy the Cursor-native rules into your project so every developer gets the same platform guidance in Cursor Chat and Agent:

cp platform-skills/.cursorrules your-project/.cursorrules
mkdir -p your-project/.cursor/rules
cp platform-skills/.cursor/rules/*.mdc your-project/.cursor/rules/

For VSCode, Copilot, Cursor, and JetBrains setup — project level and global level — see EDITOR_INTEGRATIONS.md.

Repository structure

platform-skills/
├── references/                         # Deep-dive guides — one per domain
│   ├── platform-operating-model.md
│   ├── kubernetes.md
│   ├── kyverno.md                      # Kyverno admission policies (v1.11.0)
│   ├── openshift.md
│   ├── argocd.md
│   ├── flux.md
│   ├── aws.md
│   ├── azure.md
│   ├── terraform.md
│   ├── github-actions.md
│   ├── secrets.md
│   ├── linkerd.md
│   ├── linux-networking.md
│   ├── platform-mindset.md
│   ├── compliance.md                   # SOC 2 controls in Terraform (v1.6.0)
│   ├── helm.md                         # Helm chart patterns, lint pipeline, values design
│   ├── pr-review.md                    # PR review: cost, drift, ownership, compliance, upgrade, rollback (v1.12.0)
│   ├── keda.md                         # KEDA event-driven autoscaling (v1.14.0)
│   ├── karpenter.md                    # Karpenter EKS node autoscaling (v1.29.0)
│   ├── llm-observability.md                # Datadog LLMObs: instrumentation, evals, trace RCA (v1.20.0)
│   └── awesome-docs.md                     # Animated SVG doc generation — 4 patterns, GitHub-safe CSS (v1.21.0)
│
├── examples/                           # Working examples and handbook snippets
│   ├── flux/basic-monorepo/            # Complete Flux CD monorepo structure
│   ├── kubernetes/                     # Namespace, deployment, network policy, PDB
│   ├── kyverno/                        # ValidatingPolicy, GeneratingPolicy examples + kyverno-cli test manifest (v1.11.0)
│   ├── openshift/                      # Route, ResourceQuota, LimitRange
│   ├── argocd/app-of-apps/             # Root application manifest
│   ├── aws/iam/                        # Least-privilege IAM policy examples
│   ├── azure/workload-identity/        # Managed identity + federated credential
│   ├── terraform/eks-cluster/          # Production EKS Terraform module
│   ├── github-actions/                 # CI/CD, Flux sync, container build workflows
│   ├── helm/web-service/               # Production Helm chart: Deployment, HPA, PDB, NetworkPolicy, schema
│   ├── triage/                         # PR comment triage scenarios and fixtures (v1.13.0)
│   ├── keda/                           # ScaledObject, ScaledJob, TriggerAuthentication examples (v1.14.0)
│   ├── awesome-docs/                       # Animated SVG templates: arch-flow, lifecycle-loop, field-carousel, timeline-phases (v1.21.0)
│   └── compliance/                     # SOC 2 Terraform examples (v1.6.0)
│       ├── checkov-config.yaml         # Checkov config grouped by SOC 2 criterion
│       ├── .pre-commit-checkov.yaml    # Pre-commit hook template
│       ├── checkov-terraform-plan.sh   # Plan-mode scan script
│       └── custom-checks/             # Custom check scaffold
│       ├── iam/                        # CC6.1/CC6.2: IAM, IRSA, OIDC, SCPs
│       ├── logging/                    # CC7.2: CloudTrail, Config, VPC flow logs
│       ├── network/                    # CC6.6: WAF, security groups, flow logs
│       ├── encryption-data-services/   # CC6.7: DynamoDB, ECR, ElastiCache, OpenSearch, Kinesis, EFS, Redshift
│       ├── vulnerability/              # CC6.8: Inspector v2, ECR scanning, SSM patching
│       ├── detection/                  # CC7.1: GuardDuty, CIS CloudWatch alarms, Security Hub
│       ├── incident-response/          # CC7.3: SNS, EventBridge, PagerDuty
│       └── backup/                     # A1.2/A1.3: Backup Plan, vault lock, cross-region DR
│
├── SKILL.md                            # Agent skill routing and patterns
├── agents/openai.yaml                  # Codex skill UI metadata
├── .cursorrules                        # Cursor project-level rules
├── .cursor/rules/                      # Cursor scoped file rules
├── .claude-plugin/marketplace.json     # Marketplace metadata
├── .github/workflows/                  # Validation and release automation
├── tests/validate-skill.sh             # Skill structure consistency checks
└── renovate.json                       # Automated dependency updates

Roadmap

Current release: v1.36.0 — 40 commands, 40 domain reference guides, 50+ wiki pages.

Full version history is in CHANGELOG.md.

Planned

GCP: landing zone, GKE, Workload Identity, and IAM patterns
Istio: traffic management, mTLS, telemetry (counterpart to Linkerd domain)
SOC 2 for Kubernetes: Kyverno policies mapped to TSC criteria, pod security admission, kube-bench CIS Benchmark integration
OpenShift operator lifecycle: OLM, CatalogSource, operator upgrade patterns
Argo CD ApplicationSet fleet patterns: cluster generators, matrix strategies, progressive rollout
Multi-cloud networking: Transit Gateway, VNet peering, PrivateLink, cross-cloud DNS

Contributing

See CONTRIBUTING.md for how to propose new patterns, the development workflow, and release guidelines.

Related resources

Sponsor

If Platform Skills saves you time, consider sponsoring to help keep it maintained and growing.

Every sponsor directly supports new domains, pattern updates, and the time spent validating every example in real environments.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

_{Nitin Jain}
💻 📖 🚧

_geetika-sv
💻 📖

This project follows the all-contributors specification. Contributions of any kind welcome!

Star History

License

Apache-2.0. See LICENSE for the full text and NOTICE for attribution.

If you create derivative works based on this project, retain the Apache 2.0 license text, existing copyright and attribution notices, and clearly mark any files you changed.

platform-skills