Governed Agentic AI for ITSM — A Practical Blueprint
Start with the control plane, not the agent. The agent is the easy part.
Status: v0.2 — reference blueprint / starter kit. This is a working collection of schemas, policies, diagrams, and examples. It is not a runnable demo stack or production-ready framework. See What this repo is / is not.
A vendor-neutral reference architecture and ready-to-use engineering artefacts for shipping governed agentic AI in IT Service Management — without joining the 40% of agentic AI projects Gartner predicts will be cancelled by 2027.
Who this is for
- ITSM architects and leads designing agentic automation programmes
- Platform engineers building tool contracts and policy gates for AI agents
- Engineering managers evaluating governance requirements before enabling autonomous remediation
If you're looking for a vendor-specific implementation guide (ServiceNow, JSM, BMC), this isn't it — but the artefacts here work with any of those platforms.
What this repo is / is not
| This repo IS | This repo IS NOT |
|---|---|
| A reference architecture with diagrams | A runnable demo or deployed stack |
| Typed tool contracts with safety metadata | A framework or SDK |
| Executable OPA/Rego policies you can test | A complete policy library for all scenarios |
| Example incidents and evidence bundles | Production-ready configurations |
| A maturity model tying autonomy to controls | A certification or compliance checklist |
What's inside
| Artefact | Path | Purpose |
|---|---|---|
| Reference architecture | `diagrams/reference-architecture.mermaid` | Agents, control plane, tool layer, evidence store |
| Auto-remediation pipeline | `diagrams/auto-remediation-pipeline.mermaid` | Detect → triage → plan → risk-score → approve → execute → validate → rollback |
| Governance layer stack | `diagrams/governance-layers.mermaid` | ISO 42001 → NIST AI RMF → EU AI Act → runtime enforcement |
| Workflow declaration | `examples/workflow.yaml` | Autonomy boundaries, guardrails, evidence requirements |
| Tool contracts (8) | `schemas/tool-*.json` | Typed schemas with safety metadata for restart, certificate renewal, drain, rollback, validation, notification, change record |
| Governance schemas (3) | `schemas/*.schema.json` | Evidence bundle, policy decision, validation result |
| Tool contract template | `schemas/tool-schema-template.json` | Starting point for your own tool contracts |
| OPA/Rego policies (8) | `policies/` | Main guardrails + 7 focused modules: prohibited tools, risk, blast radius, dry-run, change window, environment, budgets |
| Change risk prompt | `examples/change-risk-prompt.md` | Structured prompt template for LLM-driven change risk assessment |
| Example evidence bundle | `examples/evidence/` | Complete evidence package for a cert-renewal auto-remediation |
| Test incidents (3) | `examples/test-incidents/` | Low-, medium-, and high-risk scenarios for policy testing |
| Maturity model | `MATURITY.md` | L0–L4 progression from manual ITSM to governed autonomy, with repo artefact mapping |
| CI validation | `.github/workflows/validate.yml` | JSON/YAML/Rego syntax, workflow schema, evidence structure, and semantic decision assertions |
Architecture overview
```mermaid
flowchart TB
  subgraph DP[Data plane]
    TRIAGE[Triage Agent] --> DIAG[Diagnostics Agent] --> PLAN[Remediation Planner]
    PLAN --> RUN[Tool Runner] --> MCP[MCP Tool Servers] --> SYS[Enterprise Systems]
    SYS --> VAL[Validation] -->|pass| CLOSE[Resolve]
    VAL -->|fail| ROLL[Rollback]
  end
  subgraph CP[Control plane]
    POL[Policy Engine] --> APP[Human Approval]
    POL --> AUD[Audit Log]
    POL --> EVAL[Eval Harness]
  end
  PLAN --> POL
  POL -->|allow| RUN
  style CP fill:#e8f0fe,stroke:#2e75b6
  style DP fill:#f5f5f5,stroke:#999
```
Full Mermaid diagrams with complete detail are in diagrams/. GitHub renders .mermaid files natively.
Quick start
1. Review the workflow declaration
examples/workflow.yaml defines autonomy boundaries for a starter scope (certificate expiry, service restart, DNS misconfiguration). Adapt scope.services and scope.allowed_categories to your environment.
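As a rough sketch of the shape (the authoritative file is `examples/workflow.yaml`; only `scope.services` and `scope.allowed_categories` are named above, so the service names and all other keys below are illustrative assumptions, not the repo's schema):

```yaml
# Illustrative only: adapt scope.services and scope.allowed_categories
# to your environment. Keys outside scope.* are assumptions.
scope:
  services:
    - payments-api          # hypothetical service names
    - internal-dns
  allowed_categories:
    - certificate_expiry
    - service_restart
    - dns_misconfiguration
autonomy:
  max_level: supervised     # assumed field: keep human approval gates on
evidence:
  required: true            # assumed field: every run emits an evidence bundle
```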
2. Define your tool contracts
Use schemas/tool-restart-service.json as a template. For each tool your agents can invoke:
- Define typed `input_schema` and `output_schema`
- Add `safety` metadata: `idempotent`, `supports_dry_run`, `max_calls_per_incident`, `rollback_tool`
- Set `dry_run: true` as the default
The repo includes 8 tool contracts covering the most common ITSM operations. Use schemas/tool-schema-template.json to add your own.
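For orientation, a contract combining those pieces might look like the sketch below. The field names come from the bullet list above; the values and nested layout are illustrative assumptions, and the real layout is defined by `schemas/tool-restart-service.json` and `schemas/tool-schema-template.json`:

```json
{
  "name": "restart_service",
  "input_schema": {
    "type": "object",
    "properties": {
      "service": { "type": "string" },
      "dry_run": { "type": "boolean", "default": true }
    },
    "required": ["service"]
  },
  "output_schema": {
    "type": "object",
    "properties": { "status": { "type": "string" } }
  },
  "safety": {
    "idempotent": true,
    "supports_dry_run": true,
    "max_calls_per_incident": 2,
    "rollback_tool": "restart_service_rollback"
  }
}
```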
3. Test policies against sample incidents
```bash
# Install OPA (https://www.openpolicyagent.org/docs/latest/#running-opa)
# macOS:
brew install opa

# Linux:
curl -L -o opa https://openpolicyagent.org/downloads/v1.4.2/opa_linux_amd64_static
chmod 755 opa && sudo mv opa /usr/local/bin/

# Test: low-risk incident (should auto-approve)
opa eval \
  --data policies/ \
  --input examples/test-incidents/low-risk-cert-expiry.json \
  "data.itsm.guardrails.decision"

# Test: medium-risk incident (should require approval)
opa eval \
  --data policies/ \
  --input examples/test-incidents/medium-risk-multi-service.json \
  "data.itsm.guardrails.decision"

# Test: high-risk incident (should require approval — contains "rollback_deployment" tool)
opa eval \
  --data policies/ \
  --input examples/test-incidents/high-risk-deploy.json \
  "data.itsm.guardrails.decision"
```
4. Calibrate thresholds with offline replay
Before enabling execution in production:
- Collect 2–4 weeks of historical incidents for your target categories
- Run each through the policy engine with the artefacts in this repo
- Measure: approval rate, false-approval rate, missed-automation rate
- Adjust `risk_score` thresholds and allowlists based on the data
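The three metrics from the replay loop can be computed with a few lines of Python. This is a minimal sketch, not part of the repo: `ReplayResult`, the decision strings, and the `safe_to_automate` ground-truth label are all assumed names for whatever your replay harness records.

```python
from dataclasses import dataclass

@dataclass
class ReplayResult:
    """One historical incident replayed through the policy engine.

    `decision` is what the policies returned ("auto_approve" or
    "require_approval"); `safe_to_automate` is the ground-truth label
    assigned during manual review of the historical record.
    """
    decision: str
    safe_to_automate: bool

def calibration_metrics(results):
    """Compute approval, false-approval, and missed-automation rates."""
    total = len(results)
    auto = [r for r in results if r.decision == "auto_approve"]
    gated = [r for r in results if r.decision == "require_approval"]
    return {
        # Share of incidents the policies would have auto-approved
        "approval_rate": len(auto) / total,
        # Auto-approved incidents that were NOT actually safe (target ~0)
        "false_approval_rate": sum(1 for r in auto if not r.safe_to_automate) / total,
        # Safe incidents needlessly sent to a human (missed automation)
        "missed_automation_rate": sum(1 for r in gated if r.safe_to_automate) / total,
    }

metrics = calibration_metrics([
    ReplayResult("auto_approve", True),
    ReplayResult("auto_approve", False),      # a false approval
    ReplayResult("require_approval", True),   # missed automation
    ReplayResult("require_approval", False),
])
print(metrics)
```

Tightening `risk_score` thresholds should push `false_approval_rate` down at the cost of `missed_automation_rate`; the replay data tells you where to set the trade-off.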
5. Instrument with OpenTelemetry
Propagate `traceparent` through every tool call (the tool contracts include a `traceparent` field for this). Emit stage-level metrics: `triage_duration`, `policy_decision`, `tool_execution_time`, `validation_result`, `rollback_count`.
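If you are not yet running the OpenTelemetry SDK, the `traceparent` value itself is just a W3C Trace Context header (`version-traceid-parentid-flags`), which you can generate and thread through tool calls directly. A sketch under that assumption; `call_tool` and its payload shape are hypothetical stand-ins for your tool-runner:

```python
import secrets

def make_traceparent(trace_id=None):
    """Build a W3C traceparent header: version-traceid-parentid-flags."""
    trace_id = trace_id or secrets.token_hex(16)   # 32 hex chars
    span_id = secrets.token_hex(8)                 # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def call_tool(name, args, traceparent):
    """Hypothetical tool invocation: the contract's traceparent field
    carries the caller's trace context into the tool server."""
    payload = {"tool": name, "args": args, "traceparent": traceparent}
    # ...send payload to the MCP tool server here...
    return payload

tp = make_traceparent()
payload = call_tool("restart_service",
                    {"service": "payments-api", "dry_run": True}, tp)
print(payload["traceparent"])
```

Reusing the same `trace_id` for every call within an incident lets you stitch triage, policy decision, execution, and validation into one trace.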
Policy pack
The policies/ directory contains a main guardrails policy and 7 focused modules:
| Module | File | What it enforces |
|---|---|---|
| Main guardrails | `guardrails.rego` | Primary decision policy; aggregates violations from all modules when loaded with `--data policies/`; includes fallback logic for standalone use |
| Prohibited tools | `deny_prohibited_tools.rego` | Hard-block on `delete_data`, `disable_audit`, `mass_restart` |
| Risk threshold | `require_approval_by_risk.rego` | Approval if `risk_score >= 0.60` |
| Blast radius | `require_approval_by_blast_radius.rego` | Approval if `services_affected > 1` |
| Dry-run enforcement | `enforce_dry_run.rego` | Verify dry-run precedes live execution |
| Change window | `enforce_change_window.rego` | Block/escalate outside allowed windows |
| Environment exclusions | `enforce_environment_exclusions.rego` | Hard-block on excluded environments |
| Tool call budget | `enforce_tool_call_budget.rego` | Cap tool calls, actions, and runtime per incident |
Governance mapping
| Framework | What it gives you | Where it shows up in this repo |
|---|---|---|
| ISO/IEC 42001 | AI management system: roles, lifecycle, continual improvement | Workflow declaration (lifecycle), evidence bundles (documentation) |
| NIST AI RMF | Risk vocabulary: reliability, safety, transparency, controllability | Policy engine (risk scoring), validation contracts (reliability) |
| EU AI Act | Legal requirements: traceability, oversight, penalties | Audit log (traceability), human approval gates (oversight) |
See MATURITY.md for how governance requirements scale with autonomy level.
Repo structure
```text
governed-agentic-itsm-blueprint/
├── README.md
├── MATURITY.md
├── CONTRIBUTING.md
├── SECURITY.md
├── CHANGELOG.md
├── LICENSE (Apache 2.0)
├── .gitignore
├── .github/
│   ├── workflows/
│   │   └── validate.yml (CI: JSON + YAML + Rego validation)
│   └── ISSUE_TEMPLATE/
│       └── adaptation-report.md
├── diagrams/
│   ├── reference-architecture.mermaid
│   ├── auto-remediation-pipeline.mermaid
│   └── governance-layers.mermaid
├── schemas/
│   ├── evidence-bundle.schema.json
│   ├── policy-decision.schema.json
│   ├── validation-result.schema.json
│   ├── tool-restart-service.json
│   ├── tool-restart-service-rollback.json
│   ├── tool-validate-health.json
│   ├── tool-renew-certificate.json
│   ├── tool-drain-connections.json
│   ├── tool-rollback-deployment.json
│   ├── tool-notify-stakeholders.json
│   ├── tool-open-change-record.json
│   └── tool-schema-template.json
├── policies/
│   ├── guardrails.rego
│   ├── deny_prohibited_tools.rego
│   ├── require_approval_by_risk.rego
│   ├── require_approval_by_blast_radius.rego
│   ├── enforce_dry_run.rego
│   ├── enforce_change_window.rego
│   ├── enforce_environment_exclusions.rego
│   └── enforce_tool_call_budget.rego
└── examples/
    ├── workflow.yaml
    ├── change-risk-prompt.md
    ├── evidence/
    │   └── example-evidence-bundle.json
    └── test-incidents/
        ├── low-risk-cert-expiry.json
        ├── medium-risk-multi-service.json
        └── high-risk-deploy.json
```
Contributing
See CONTRIBUTING.md for guidelines. The most valuable contributions are adaptation reports — real-world feedback on what worked and what didn't.
Related work
- Model Context Protocol (MCP) Specification
- Agent2Agent Protocol (A2A)
- Open Policy Agent (OPA)
- OpenTelemetry
- ISO/IEC 42001 Explained
- NIST AI RMF 1.0
- EU AI Act — Entry into force
License
Apache 2.0 — see LICENSE.
Author: Denis Prilepskiy — Agentic AI architect specialising in production-grade multi-agent systems for regulated industries. Senior Enterprise Architect at NTT Data (London). Published in HackerNoon (Top Story) and MIPT Digital (Habr), with IEEE and HBR submissions under review.
HackerNoon @denisp · LinkedIn