Awesome Agent Skills Security

🛡️ A curated list of resources on securing AI agent tool use and skill ecosystems — attacks, defenses, frameworks, benchmarks, and standards.

AI agents increasingly use external tools, plugins, and skills to interact with the world. This creates a new attack surface: agent skills security. This list covers the threats, defenses, and research landscape for securing these capabilities.

Threat Frameworks & Standards
Surveys & Systematizations
Attack Research
Defense Research
Benchmarks & Datasets
Tools & Frameworks
Agent Skill Specifications
Industry Reports & Blog Posts
Related Awesome Lists
Contributing

Threat Frameworks & Standards

OWASP Agentic AI Threats and Mitigations — First in a series from the OWASP Agentic Security Initiative (ASI), providing threat-model-based reference for agentic threats.
OWASP Top 10 for LLM Applications — Includes LLM01: Prompt Injection, LLM06: Excessive Agency, LLM07: Insecure Plugin Design, LLM08: Excessive Autonomy.
MITRE ATLAS™ — Adversarial Threat Landscape for AI Systems. Tactics, techniques, and case studies for attacks on ML/AI systems.
NIST AI Risk Management Framework — Federal framework for managing AI risks, including autonomous agent risks.
NIST SP 800-218A: Secure Software Development for AI — Secure development practices specific to AI-enabled systems.
EU AI Act — European regulation with specific provisions for high-risk AI systems including autonomous agents.
Anthropic Responsible Scaling Policy — AI Safety Levels (ASL) framework addressing agent capability thresholds.
IETF draft-klrc-aiagent-auth-01: AI Agent Authentication and Authorization — Kasselman et al., IETF WIMSE-adjacent, 2026. Proposes a model for authentication and authorization of AI agent interactions using existing OAuth 2.0 and WIMSE standards; covers delegation chains, agent identity, and trust establishment without defining new protocols.
IETF draft-niyikiza-oauth-attenuating-agent-tokens-00: Attenuating Authorization Tokens for Agentic Delegation Chains — Niyikiza (Tenuo), OAuth WG, March 2026. Defines Attenuating Authorization Tokens (AATs): JWT-based credentials encoding tool-level argument constraints with a cryptographically enforced monotonic attenuation invariant — any holder can derive a more restrictive token but never a more permissive one. Extends Rich Authorization Requests (RFC 9396) with delegation-chain semantics.

Surveys & Systematizations

📄 A Survey on LLM-based Autonomous Agents: Common Attacks and Defenses — Wu et al., 2024. Comprehensive taxonomy of attacks on LLM agents across perception, cognition, and action stages.
📄 Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents — Zhang et al., 2024. Formalization of 10 attack scenarios, 10 agents, 398 adversarial environments.
📄 Security of AI Agents — He et al., 2024. Systematization of knowledge covering threat models for AI agents with tool access.
📄 Not All Agents Are Created Equal: A Survey on Software-use Agent Security — Hua et al., 2025. Survey specifically on software-use agents and their unique security challenges.
📄 A Survey on the Honesty of Large Language Models — Xie et al., 2024. Covers agent deception, sycophancy, and honesty in tool-use contexts.
📄 A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models — Xu et al., 2024. Systematic study of jailbreak attacks relevant to agent guardrail bypass.
📄 Prompt Injection Attacks and Defenses in LLM-Integrated Applications — Liu et al., 2024. Comprehensive taxonomy of injection attacks across direct and indirect vectors.
📄 The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies — Gan et al., 2024. Survey with practical case studies of security failures.
📄 Self-Evolving Agents: A Survey — Gao et al., 2025. How self-evolving agents create emergent security risks.
📄 From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents — Zhang et al., 2026. Hierarchical Autonomy Evolution (HAE) framework organizing agent security into cognitive, execution, and societal tiers.
📄 Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes — Shah et al., 2026. Empirical taxonomy of reliability failures in agentic AI systems combining LLM reasoning with tool invocation.
📄 Security Considerations for Multi-agent Systems — 2026. Systematic threat landscape of MAS with 193 threat items across 9 categories; evaluates 16 frameworks finding none achieves majority coverage.
📄 The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey — Kim et al., USENIX Security 2026. First systematic survey of AI agent security covering design space, attack landscape, and defense mechanisms with case studies on securing agentic systems.
📄 Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats — Deng et al., 2026. Five-layer lifecycle-oriented security framework analyzing compound threats across initialization, input, inference, decision, and execution stages of autonomous LLM agents.
📄 AgenticCyOps: Securing Multi-Agentic AI Integration in Enterprise Cyber Operations — Mitra et al., 2026. Holistic architectural security framework decomposing attack surfaces across component, coordination, and ecosystem layers of enterprise multi-agent systems.
📄 MCP-in-SoS: Risk Assessment Framework for Open-Source MCP Servers — Kumar et al., 2026. System-of-systems risk assessment framework for evaluating security risks of open-source MCP server deployments in production agent systems.
📄 SoK: The Attack Surface of Agentic AI — Tools, and Autonomy — Dehghantanha & Homayoun, 2026. Systematization mapping trust boundaries and security risks of agentic LLM systems; proposes taxonomy spanning prompt injection, RAG poisoning, tool exploits, and multi-agent threats with metrics like Unsafe Action Rate and Privilege Escalation Distance.

Attack Research

Prompt Injection via Tools

📄 Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al., AISec 2023. Foundational work on indirect prompt injection through tool outputs.
📄 Inject My PDF: Prompt Injection for Your Resume — Practical injection through document processing tools.
📄 InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents — Zhan et al., ACL 2024. 1,054 test cases, 17 tools, two attack types (direct harm, data stealing).
📄 Automatic and Universal Prompt Injection Attacks against Large Language Models — Liu et al., 2024. Automated generation of injection attacks.
📄 WIPI: A New Web Threat for LLM-Driven AI Agents — Liu et al., 2024. Web-based indirect prompt injection targeting browsing agents.

Tool Poisoning & Supply Chain

📄 Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search — Shen et al., 2026. Tree-structured Injection for Payloads (TIP): black-box attack generating natural-language payloads to seize control of MCP-enabled agents; achieves >95% attack success in undefended settings and >50% against four defense approaches with an order of magnitude fewer queries than prior adaptive attacks.
📄 Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning — Huang et al., 2026. STRIDE/DREAD threat modeling of MCP across five components; systematic comparison of tool poisoning defenses in seven major MCP clients reveals insufficient static validation; proposes multi-layered defense strategy.
📄 Are AI-assisted Development Tools Immune to Prompt Injection? — Huang et al., 2026. First empirical analysis of prompt injection via tool-poisoning across seven MCP clients (Claude Desktop, Claude Code, Cursor, Cline, Continue, Gemini CLI, Langflow); reveals significant security disparities with Cursor most susceptible.
📄 Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks — Schmotz et al., 2026. Benchmark measuring agent vulnerability to malicious skill/config files; demonstrates data exfiltration, destructive actions, and ransomware-like behavior via AGENTS.md/CLAUDE.md injection.
📄 ToolSword: Unveiling Safety Issues of LLMs in Tool Learning Across Three Stages — Ye et al., ACL 2024. Identifies safety issues across tool selection, tool calling, and result handling.
📄 Compromising Agents via MCP — Invariant Labs, 2025. Tool poisoning attacks via Model Context Protocol servers.
📄 Osmosis Distillation: Model Hijacking with the Fewest Samples — Shi et al., 2026. Supply-chain attack via poisoned synthetic training data.
📄 Personality Self-Replicators — 2026. Agent personality files as self-replicating genetic material.
🔗 MCP Security Notification: Tool Poisoning Attacks — Official MCP security advisory on tool description poisoning.
🔗 Invariant Labs: MCP Security Research — Analysis of cross-tool contamination, rug pulls, and tool shadowing via MCP.

Privilege Escalation & Excessive Agency

📄 Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents — Yang et al., NeurIPS 2024. Backdoor attacks on agent reasoning and tool calling.
📄 Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models — Relevant to agents leaking training data through tool outputs.
📄 R-Judge: Benchmarking Safety Risk Awareness for LLM Agents — Yuan et al., ACL 2024. 162 records across 27 risk scenarios for evaluating agent safety awareness.
📄 TrustAgent: Towards Safe and Trustworthy LLM-based Agents — Zhang et al., 2024. Agent-constitution-based approach to limiting excessive agency.
📄 Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards — Normann et al., 2026. Two-stage post-training pipeline (SFT + RL with verifiable rewards) producing a 4B model that achieves 95.8% success on privilege escalation, nearly matching Claude Opus 4.6 at 100× lower inference cost.

Data Exfiltration & Privacy

📄 AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows — Ngong et al., 2026. 80%+ of agentic pipelines leak private data at intermediate stages.
📄 Silent Egress: LLM-Driven Data Exfiltration via Steganographic Channels — Demonstrates covert channels for data theft through agent outputs.
📄 Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives — GPAIS privacy risks including agent data handling.
📄 IMMACULATE: A Framework for Analyzing Information Exposure in Agent-Based Systems — Multi-turn agent information leakage analysis.
📄 AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents — Lin et al., 2026. Automated detection framework for identifying data over-exposure vulnerabilities in LLM agent integrations.
📄 You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents — Kao et al., 2026. Identifies the Trusted Executor Dilemma where high-privilege agents execute adversarial README instructions at up to 85% success rate; 0% human detection rate across 15 participants.
📄 Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs — Yang & Zhu, 2026. Probabilistic framework for analyzing privacy leakage in AI agents via differential privacy, deriving token-level and message-level bounds relating leakage to temperature and message length.

Indirect Prompt Injection

📄 AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations — He et al., 2026. Defense against indirect prompt injection using causal attribution to trace which tool outputs triggered suspicious agent actions.
📄 IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection — Chen et al., 2026. Open-source proxy that rewrites live responses from whitelisted domains with 820 benchmark payloads to evaluate browser agents against realistic indirect prompt injection.
📄 Adaptive Attacks and Defenses Against Indirect Prompt Injection — Chen et al., 2024. Adaptive attackers bypassing static defenses.
📄 HouYi: A Black-box Prompt Injection Attack on LLM-integrated Applications — Liu et al., 2023. Systematic methodology for finding injection vulnerabilities.
📄 DMAST: Dual-Modality Multi-Stage Adversarial Safety Training — Liu et al., 2026. Cross-modal DOM injection corrupting both visual and text channels.

Agent Deception & Manipulation

📄 Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems — Weckbecker et al., 2026. Single subliminally prompted agent spreads persistent bias through entire multi-agent network, degrading truthfulness of other agents.
📄 FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems — Li et al., 2026. Prompt-only attack that manipulates planner-executor workflow formation in multi-agent systems, plus FlowGuard as an input-side defense.
📄 Intentional Deception as Controllable Capability in LLM Agents — Starace & Soule, 2026. Systematic study of engineered deception in multi-agent LLM interactions using 36 behavioral profiles for defensive design.

Compound System Attacks

📄 Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems — Banerjee et al., 2026. Demonstrates novel attacks combining traditional software/hardware vulnerabilities (code injection, Rowhammer) with LLM-specific algorithmic weaknesses to compromise compound AI pipelines.

Cross-Plugin Attacks

📄 When LLM-based Code Generation Meets the Software Supply Chain — Supply chain risks from LLM-generated code integrating malicious packages.
📄 Shadow API: Covert Data Exfiltration via LLM-Mediated API Interactions — 2026. Stealth data theft through seemingly benign API calls.
📄 AgentSkillOS: Towards Secure and Composable Agent Skill Operating Systems — 2026. OS-level isolation for agent skill execution.

Backdoor Attacks on Agents

📄 SlowBA: An efficiency backdoor attack towards VLM-based GUI agents — Li et al., 2026. Novel backdoor targeting response latency of GUI agents via trigger-activated long reasoning chains.

Jailbreaking & Guardrail Bypass

📄 Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs — Panfilov et al., 2026. Autonomous autoresearch pipeline powered by Claude Code discovers novel white-box adversarial attack algorithms that significantly outperform all existing 30+ methods in jailbreaking and prompt injection evaluations.
📄 Jailbreaking ChatGPT via Prompt Engineering — Liu et al., 2023. Foundational jailbreaking taxonomy. 700+ citations.
📄 MasterKey: Automated Jailbreaking of Large Language Model Chatbots — Deng et al., NDSS 2024. Automated time-based jailbreak generation.
📄 PentestGPT: An LLM-empowered Automatic Penetration Testing Tool — Deng et al., 2023. Demonstrates agent-level tool use for offensive security. 11K+ GitHub stars.
📄 Self-Fulfilling Misalignment in AI Control — 2026. Fine-tuning on AI Control literature increases misalignment.
📄 Reasoning Models Struggle to Control Their Chains of Thought — 2026. CoT controllability decreases with RL training, implications for agent oversight.
📄 CRAFT: Contrastive Reasoning Alignment — Reinforcement Learning from Hidden Representations — Luo et al., 2026. Red-teaming alignment framework combining contrastive representation learning with RL to separate safe/unsafe reasoning trajectories; 79% improvement in reasoning safety and 87.7% in final-response safety over base models.

Defense Research

Permission & Access Control

📄 PAuth: Precise Task-Scoped Authorization For Agents — Sharma et al., 2026. Implicit authorization model where NL task submission authorizes only required operations; uses NL slices and envelopes for provenance-based server-side verification, blocking injected operations in AgentDojo with zero false positives.
📄 TrustAgent: Towards Safe and Trustworthy LLM-based Agents — Agent Constitution for safety-aware planning with pre/post-action inspection.
📄 A Dual-Helix Governance Approach for Reliable Agentic AI — 3-track architecture (Knowledge, Behavior, Skills) using knowledge graphs.
📄 Talk Freely, Execute Strictly: Schema-Gated Agentic AI — Schema-gated orchestration for trustworthy agent deployment in regulated domains.
📄 ESAA-Security: Event-Sourced Architecture for Agent-Assisted Security Audits — 26 tasks, 95 checks, append-only event logs for reproducible AI code audits.
📄 Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare — Maiti, 2026. Production-deployed zero-trust architecture for 9 autonomous AI agents: gVisor kernel isolation, credential proxy sidecars, network egress allowlisting, and prompt integrity framework with untrusted content labeling. Open-source configs released.

Runtime Monitoring & Sandboxing

📄 Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital — Barton et al., 2026. Production study of 3,505 real-capital agents showing that agent reliability depends on operating-layer controls like prompt compilation, typed controls, policy validation, execution guards, and trace-level observability.
📄 Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents — Marin and Chaudhary, 2026. Proposes adaptive runtime governance based on bounding unobserved risk as agent behavior drifts after authorization.
📄 AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents — Zhang et al., 2026. Lifecycle security architecture for autonomous agents spanning skills, external content, memory, planning, and privileged tool execution.
📄 Behavioral Integrity Verification for AI Agent Skills — Wu et al., 2026. Scalable pre-deployment skill auditing framework that compares declared versus actual capabilities, surfaces description-implementation gaps, and detects malicious skills with 0.946 F1.
📄 Arbiter: Detecting Interference in LLM Agent System Prompts — Mason, 2026. Framework combining formal evaluation rules with multi-model LLM scouring to detect interference and vulnerability classes in agent system prompts.
📄 MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in MCP Agents — Zhou et al., 2026. Plug-in security cognition layer for MCP agents that validates third-party tool invocations via experience-driven trust calibration.
📄 OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents — Li, 2026. Runtime security layer distributing enforcement across ten lifecycle hooks with hybrid heuristic-plus-LLM scanning, session-scoped risk accumulation, and tamper-evident audit for agent gateways.
📄 Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the SSGM Framework — Lam et al., 2026. Stability and Safety-Governed Memory framework mitigating topology-induced knowledge leakage and semantic drift in persistent agent memory systems.
📄 AgentSentry: Real-time Monitoring for Agentic AI Systems — Runtime behavioral monitoring of tool-using agents.
📄 Monitoring Emergent Reward Hacking via Internal Activations — Sparse autoencoders detect reward-hacking during generation.
📄 Self-Attribution Bias: When AI Monitors Go Easy on Themselves — AI monitors exhibit systematic leniency on own outputs.
📄 Salient Directions in AI Control — Structure of AI Control evaluations: trusted monitors overseeing untrusted agents.
📄 Governed Memory: A Production Architecture for Multi-Agent Workflows — Taheri, 2026. Shared memory governance layer with dual memory model, tiered governance routing, entity-scoped isolation (zero cross-entity leakage across 500 adversarial queries), and 100% adversarial governance compliance in production.
🔗 Behavioral Attestation and Compaction Drift in Persistent AI Agents — Morrow (agent-morrow), 2026. Identifies compaction drift — non-adversarial behavioral shift caused by context window compression — as a runtime integrity threat class distinct from adversarial injection. Proposes behavioral attestation (context fingerprint delta against a pre-compression baseline) as the mechanism for continuous rather than one-time agent authorization. Complements credential-scope enforcement (e.g., AATs) with runtime execution verification.

Input/Output Validation

📄 AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization — Ying et al., 2026. Defends tool-using agents by semantically virtualizing untrusted external content to reduce prompt injection influence on privileged actions.
📄 StruQ: Defending Against Prompt Injection with Structured Queries — Separates prompts from data to prevent injection.
📄 Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation — Formal guarantees for reducing bias in LLM-as-judge systems.
📄 Judge Reliability Harness: Stress Testing LLM Judges — No evaluated judge is uniformly reliable.
📄 GELO: Good-Enough LLM Obfuscation — Privacy-preserving LLM inference with ~20-30% latency overhead.

Formal Verification & Analysis

📄 Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows — Formalizes LLM inference as typed semantic transformations with algebraic composition.
📄 Knowledge Divergence and the Value of Debate for Scalable Oversight — Formal framework for choosing oversight mechanisms.

Evaluation & Red Teaming

📄 Real-Time Trust Verification for Safe Agentic Actions using TrustBench — Sharma et al., AAAI 2026 Workshop on TrustAgent. Dual-mode framework benchmarking trust across multiple dimensions and providing a pre-execution action verification toolkit for agents.
📄 Agent Security Bench (ASB) — 10 scenarios, 10 agents, 398 environments. Comprehensive agent security benchmark.
📄 SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces — Jin et al., 2026. Runnable benchmark with 155 adversarial skill-mediated cases across 47 tasks, 6 risk domains, and 30 safety categories.
📄 No More, No Less: Task Alignment in Terminal Agents — Mavali et al., 2026. Introduces TAB, an 89-task benchmark for whether terminal agents selectively follow relevant environmental cues while resisting distractor instructions.
📄 Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems — Zhou et al., 2026. Grey-box adaptive red-teaming framework that iteratively rewrites skills using audit and runtime feedback to measure residual deployment risk.
📄 AgentDyn: A Dynamic Open-Ended Benchmark for Prompt Injection Attacks — Li et al., 2026. Dynamic, open-ended benchmark for evaluating indirect prompt injection defenses in real-world agent security systems.
📄 NAAMSE: Framework for Evolutionary Security Evaluation of Agents — Pai et al., ICLR 2026 Workshop. Evolutionary framework reframing agent security evaluation as feedback-driven optimization with autonomous red-teaming.
📄 R-Judge: Benchmarking Safety Risk Awareness — 162 records, 27 risk scenarios for agent safety.
📄 InjecAgent: Benchmarking Indirect Prompt Injections — 1,054 test cases for tool-integrated agents.
📄 SIABENCH: Evaluating LLMs for Security Incident Analysis — 11 LLMs × 160 security incident scenarios.
📄 EVMbench: Evaluating AI Agents on Smart Contract Security — 117 vulnerabilities, frontier agents exploit end-to-end.
📄 τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge — Even frontier models achieve only ~25.5% on complex agent tasks.
📄 Interactive Benchmarks — Evaluating via interactive proofs and games instead of static benchmarks.
📄 VeriGrey: Greybox Agent Validation — Zhang et al., 2026. Greybox security testing using tool-invocation sequences as feedback and mutational prompt fuzzing; 33% more effective than black-box on AgentDojo, discovers prompt injection scenarios missed by black-box in Gemini CLI and OpenClaw.
📄 LAAF: Logic-layer Automated Attack Framework for Agentic LLM Systems — Atta et al., 2026. First automated red-teaming framework combining 49-technique LPCI taxonomy with stage-sequential seed escalation; 84% mean aggregate breakthrough rate across five production LLM platforms.

Benchmarks & Datasets

Benchmark	Focus	Size	Paper
ASB	Comprehensive agent security	10 agents, 398 envs	Zhang et al.
InjecAgent	Indirect prompt injection	1,054 test cases	Zhan et al.
R-Judge	Safety risk awareness	162 records, 27 scenarios	Yuan et al.
ToolSword	Tool learning safety	6 scenarios, 3 stages	Ye et al.
AgentDyn	Dynamic prompt injection	Open-ended, extensible	Li et al.
SkillSafetyBench	Skill-mediated agent safety	155 cases, 47 tasks	Jin et al.
TAB	Selective cue following in terminal agents	89 terminal tasks	Mavali et al.
Skill-Inject	Skill file attacks	Multi-scenario	Schmotz et al.
NAAMSE	Evolutionary agent security eval	Adaptive red-teaming	Pai et al.
AgentHarm	Agent misuse	110 behaviors, 440 variants	Andriushchenko et al.
SkillGuard Dataset	Malicious skill detection	157 malicious skills	Liu et al.
WIPI	Web-based indirect injection	Multi-scenario	Liu et al.

Tools & Frameworks

Tool	Description	Link
SkillGuard	LLM-native agent skill security auditor (OWASP Agentic + MITRE ATLAS)
Invariant Guardrails	Policy-based agent security guardrails
LLM Guard	Input/output scanning for LLM applications
Rebuff	Self-hardening prompt injection detector
NeMo Guardrails	NVIDIA's toolkit for adding guardrails to LLM-based applications
Lakera Guard	Enterprise prompt injection defense API	Website
Promptfoo	LLM red teaming and evaluation framework
Garak	LLM vulnerability scanner
IPI-Proxy	Intercepting proxy for red-teaming web-browsing agents against indirect prompt injection on live whitelisted domains
AgentSkillsScanner	Static analysis scanner for agent skill definitions
Agent Audit	Security analysis system for LLM agent apps: dataflow analysis, credential detection, MCP config parsing, privilege-risk checks	Zhang et al.
mcp-sec-audit	MCP server security toolkit: static pattern matching + dynamic sandboxed fuzzing via Docker/eBPF for detecting over-privileged tool capabilities	Huang et al.

Agent Skill Specifications

Specification	Org	Focus
AgentSkills.io	Open Standard	Agent skill definition and security requirements
Model Context Protocol (MCP)	Anthropic	Tool/resource integration protocol for LLMs
OpenAI Function Calling	OpenAI	Tool use specification for GPT models
Tool Use (Claude)	Anthropic	Claude's native tool use interface
LangChain Tools	LangChain	Tool abstraction for agent frameworks
AutoGPT Plugins	AutoGPT	Plugin system for autonomous agents
OpenAPI/Swagger	Linux Foundation	API specification commonly used as tool definitions

Industry Reports & Blog Posts

🔗 Snowflake Cortex AI Escapes Sandbox and Executes Malware — PromptArmor, 2026. Prompt injection attack chain in Snowflake's Cortex Agent bypassed command allowlists via bash process substitution to achieve RCE; now patched.
🔗 Confused Deputy Attacks on Autonomous AI Agents — Cloud Security Alliance AI Safety Initiative, 2026. Research note on prompt injection chains enabling privilege escalation and autonomous compromise in AI agent systems.
🔗 How AI Assistants are Moving the Security Goalposts — Krebs on Security, 2026. AI agents as insider threats.
🔗 Anthropic: Challenges in Red Teaming AI Systems — Anthropic's perspective on evaluating agent safety.
🔗 OpenAI: Safety of Advanced AI Agents — Practices for governing agentic AI systems.
🔗 Compromising Agents via MCP — Invariant Labs deep-dive into MCP attack vectors.
🔗 Simon Willison: Prompt Injection Explained — Accessible introduction to prompt injection risks.
🔗 TRAIL: Trusted Reasoning and AI Logging — Logging framework for auditable agent execution.
🔗 Cyber Threat Intelligence for AI Systems — AI-specific CTI framework with IoCs for supply-chain phases.
🔗 AI Safety Has 12 Months Left — Window to embed safety into infrastructure before market forces prevent it.
🔗 LiteLLM Hack: Were You One of the 47,000? — FutureSearch via Simon Willison, 2026. Analysis of PyPI supply-chain attack on LiteLLM: 47K downloads of exploited packages in 46 minutes, 88% of 2,337 dependent packages had unpinned versions.
🔗 Exploiting Agentic Browsers: From False Information to Cross-Site Data Leaks — Trail of Bits, 2026. Demonstrates lack of isolation in agentic browsers enabling attacks from false information dissemination to cross-site data leaks, resurfacing decades-old web vulnerability patterns.

Related Awesome Lists

awesome-llm-security — General LLM security resources.
awesome-ai-safety — AI safety research and resources.
awesome-chatgpt-prompts — Prompt engineering (includes adversarial examples).
awesome-ml-for-cybersecurity — ML applied to cybersecurity.
awesome-mcp-servers — MCP server ecosystem (attack surface reference).
awesome-ai-agents — AI agent frameworks and projects.

Contributing

Contributions are welcome! Please read the contribution guidelines before submitting a pull request.

How to Contribute

Fork the repository
Add your resource in the appropriate category
Use the format: - 📄 **[Title](URL)** — Authors, Venue Year. One-sentence description.
Submit a pull request

Criteria

Resources must be directly related to agent/tool/skill security
Papers should be published or on arXiv
Tools should be actively maintained (commits within last 6 months)
Blog posts should provide substantial technical analysis

Citation

If you find this list useful in your research, please cite:

@misc{awesome-agent-skills-security,
  author = {Liu, Yi},
  title = {Awesome Agent Skills Security},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/LLMSecurity/awesome-agent-skills-security}}
}

License

This list is released under CC0 1.0 Universal.

awesome-agent-skills-security