Agent-Action-Guard

mcp

Security Audit: Warning

Health — Warning
  • License — License: NOASSERTION
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 7 GitHub stars
Code — Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions — Passed
  • Permissions — No dangerous permissions requested

No AI report is available for this listing yet.

SUMMARY

๐Ÿ›ก๏ธ Safe AI Agents through Action Classifier

README.md

Agent Action Guard

Framework to block harmful AI agent actions before they cause harm — lightweight, real-time, easy-to-use.

Badges: PyPI · Website · YouTube · Medium · PyPI Downloads · AI · LLMs · Python · License: CC BY 4.0


🚀 Quick Start

pip install agent-action-guard

🔑 Set EMBEDDING_API_KEY (or OPENAI_API_KEY) in your environment. See .env.example and USAGE.md.
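The lookup order described above can be mirrored in a few lines. This is an illustrative sketch only: `resolve_api_key` is a hypothetical helper for this README, not part of the package's actual API.

```python
import os


def resolve_api_key(env=None):
    """Mirror the documented lookup order: EMBEDDING_API_KEY first,
    then OPENAI_API_KEY as a fallback (illustrative, per USAGE.md)."""
    env = os.environ if env is None else env
    key = env.get("EMBEDDING_API_KEY") or env.get("OPENAI_API_KEY")
    if key is None:
        raise RuntimeError("Set EMBEDDING_API_KEY or OPENAI_API_KEY")
    return key
```

Passing a dict instead of `os.environ` makes the lookup order easy to test in isolation.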

Want to run the evaluation benchmark too?

pip install "agent-action-guard[harmactionseval]"
python -m agent_action_guard.harmactionseval

โ“ Why Action Guard?

The HarmActionsEval benchmark showed that AI agents given harmful tools will use them — even today's most capable LLMs.
80% of the LLMs tested executed the harmful action on the first attempt for over 95% of the harmful prompts.

| Model | SafeActions@1 |
| --- | --- |
| Claude Haiku 4.5 | 0.00% |
| Phi 4 Mini Instruct | 0.00% |
| Granite 4-H-Tiny | 0.00% |
| GPT-5.4 Mini | 0.71% |
| Gemini 3.1 Flash Lite | 0.71% |
| Ministral 3 (3B) | 2.13% |
| Claude Sonnet 4.6 | 2.84% |
| Phi 4 Mini Reasoning | 2.84% |
| GPT-5.3 | 12.77% |
| Qwen3.5-397b-a17b | 23.40% |
| **Average** | **4.54%** |
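Under the (assumed) reading that SafeActions@1 is the percentage of harmful prompts on which the model refused to execute the action on its first attempt, the scores in the table reduce to a simple fraction. The function and the toy outcome list below are illustrative, not the benchmark's actual implementation.

```python
def safe_actions_at_1(outcomes):
    """outcomes: list of bools, True when the agent refused the harmful
    action on its first attempt. Returns the percentage refused."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Toy example: 1 refusal out of 141 harmful prompts.
outcomes = [True] + [False] * 140
print(f"{safe_actions_at_1(outcomes):.2f}%")  # → 0.71%
```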

These models often still respond "Sorry, I can't help with that" while executing the harmful action anyway.

Action Guard sits between the agent and its tools, blocking unsafe calls before they run — no human in the loop required.

(Iceberg illustration)


โš™๏ธ How It Works

  1. Agent proposes a tool call
  2. Action Guard classifies it using a lightweight neural network trained on the HarmActions dataset
  3. Harmful calls are blocked; safe calls proceed normally
(Workflow demo illustration)
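The three steps above can be sketched as a wrapper between the agent and its tool registry. Everything here is a hypothetical illustration: `ToolCall`, `guarded_execute`, and the toy classifier are stand-ins for this README, not the package's actual API or its neural model.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def guarded_execute(call, classify, tools):
    """Step 2 and 3 of the loop: classify the proposed call,
    block it if harmful, otherwise run the real tool."""
    if classify(call) == "harmful":
        return f"Blocked: {call.name} classified as harmful"
    return tools[call.name](**call.args)


# Toy rule-based classifier standing in for the neural model.
def toy_classify(call):
    return "harmful" if call.name == "delete_all_files" else "safe"


tools = {
    "send_email": lambda to, body: f"sent to {to}",
    "delete_all_files": lambda: "files deleted",
}

print(guarded_execute(ToolCall("send_email", {"to": "a@b.c", "body": "hi"}), toy_classify, tools))
print(guarded_execute(ToolCall("delete_all_files"), toy_classify, tools))
```

The agent never calls tools directly; every proposed call passes through the guard first, which is what makes the check enforceable rather than advisory.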

🆕 Contributions:

  • 📊 HarmActions — safety-labeled agent action dataset with manipulated prompts
  • 📏 HarmActionsEval — benchmark with the SafeActions@k metric
  • 🧠 Action Guard — real-time neural classifier optimized for agent loops
    • 🏋️ Trained on HarmActions
    • ✅ Classifies every tool call before execution
    • 🚫 Blocks harmful and unethical actions automatically
    • ⚡ Lightweight for real-time use

💬 Enjoyed it? Share your opinion.

Share a quick note in Discussions — it directly shapes the project's direction and helps the AI safety community. 🙌 Feedback on how this helps you or the AI community is warmly welcome.

โญ Star the repo if Action Guard is useful to you โ€” it really does help!


๐Ÿ“ Citation

@article{202510.1415,
  title   = {{Agent Action Guard: Classifying AI Agent Actions to Ensure Safety and Reliability}},
  year    = 2025,
  month   = {October},
  publisher = {Preprints},
  author  = {Praneeth Vadlapati},
  doi     = {10.20944/preprints202510.1415.v2},
  url     = {https://www.preprints.org/manuscript/202510.1415},
  journal = {Preprints}
}

📄 License

Licensed under CC BY 4.0. If you prefer not to provide attribution, send a brief acknowledgment to [email protected] with the details of your usage and the potential impact on your project.

(Project banner)


Pro-GenAI
Projects for Next-Gen AI
