harness-eval-lab

agent
Security Audit
Warning
Health: Warning
  • License: Apache-2.0
  • Description: repository has a description
  • Active repo: last push 0 days ago
  • Low visibility: only 6 GitHub stars
Code: Passed
  • Code scan: scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
  • Permissions: no dangerous permissions requested


SUMMARY

Evaluates an AI code agent setup and scores the full configuration (CLAUDE.md, skills, commands, hooks, agents). Available as a CLI and a Claude Code plugin.

README.md

setup-eval

CI | PyPI | Python 3.11+ | License: Apache 2.0

Evaluate AI code agent setups for best practices, redundancy, security, and cross-component issues.

Available as a CLI tool, a Claude Code plugin, and Cursor commands.

Supports Claude Code and Cursor projects. Auto-detects which tool(s) a project uses.

What it does

Most tools test whether a skill produces correct output. This tool checks the setup itself: CLAUDE.md, skills, commands, hooks, MCP configs, agents, .cursor/rules/*.mdc, .cursorrules.
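
As a rough illustration of what auto-detection could look like, the sketch below checks a project root for the marker files named above. The function name `detect_tools`, the labels it returns, and the choice of markers are assumptions made for this example; the tool's actual detection logic may differ.

```python
from pathlib import Path


def detect_tools(project_root):
    """Return the set of tools whose marker files exist under project_root.

    Hypothetical helper: the markers come from the file list in this README,
    but the real detection rules used by setup-eval are not documented here.
    """
    root = Path(project_root)
    found = set()

    # Assumption: a top-level CLAUDE.md signals a Claude Code project.
    if (root / "CLAUDE.md").is_file():
        found.add("claude-code")

    # Assumption: Cursor is signalled by .cursor/rules/*.mdc or .cursorrules.
    has_mdc_rules = any(root.glob(".cursor/rules/*.mdc"))
    if has_mdc_rules or (root / ".cursorrules").is_file():
        found.add("cursor")

    return found
```

A project that contains both kinds of files yields both labels, which matches the "tool(s)" wording above.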

Four commands, same engine:

| Command | What it does | LLM in CLI | LLM in Claude Code / Cursor |
|---|---|---|---|
| setup-eval-lint | 43 deterministic rules + system analysis (token budget, trigger overlaps, dependencies). Fast, CI-suitable. | No | No |
| setup-eval-review | Per-component rubric review with 0-3 scoring per dimension, 21 cross-type checks. KEEP/REVIEW/REMOVE verdicts. | Yes (API key) | Yes (in-session) |
| setup-eval-security | All security rules + YARA + CVE lookups + semantic review. SAFE/CAUTION/UNSAFE. | Scan: no. Semantic review: --review flag | Yes (in-session) |
| eval-skill | Deep-evaluate one skill individually and in context of the full setup. | Lint: no. Rubric: --rubric flag | Yes (in-session) |

Install

CLI tool

Install from PyPI and run from the terminal:

pip install setup-eval

setup-eval setup-eval-lint .
setup-eval setup-eval-lint . --watch     # re-run lint automatically on file changes
setup-eval setup-eval-review . --provider gemini
setup-eval setup-eval-security . --review
setup-eval eval-skill ./skills/my-skill --context . --rubric

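The LLM-backed steps (setup-eval-review, setup-eval-security with --review, eval-skill with --rubric) require GEMINI_API_KEY or ANTHROPIC_API_KEY to be set. setup-eval-lint and the deterministic parts of the other commands run without a key.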
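
A wrapper script or CI job can check for a key before calling the CLI. The sketch below is a minimal example that assumes the LLM steps are exactly the ones marked in the command table (review always, security with --review, eval-skill with --rubric). The helper names `needs_api_key` and `missing_key` are hypothetical and not part of setup-eval.

```python
import os

# Flags that switch on an LLM step in the CLI, taken from the command table.
LLM_FLAGS = {"setup-eval-security": "--review", "eval-skill": "--rubric"}
# Environment variables named in this README.
KEY_VARS = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def needs_api_key(command, args):
    """Return True when this CLI invocation would call an LLM (assumed mapping)."""
    if command == "setup-eval-review":
        return True  # the table lists review as always LLM-backed in the CLI
    flag = LLM_FLAGS.get(command)
    return flag is not None and flag in args


def missing_key(command, args, env=None):
    """Return True when an LLM step is requested but no key variable is set."""
    env = os.environ if env is None else env
    has_key = any(env.get(name) for name in KEY_VARS)
    return needs_api_key(command, args) and not has_key
```

For example, `missing_key("setup-eval-security", [".", "--review"])` is true in an environment without either variable, while the same command without `--review` never needs a key.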

setup-eval-security supports optional YARA malware signature scanning. To enable it, install the yara extra: pip install "setup-eval[yara]" (the quotes keep shells such as zsh from expanding the brackets).

Claude Code plugin

No pip install needed. Install directly from within Claude Code:

/plugin marketplace add redhat-community-ai-tools/harness-eval-lab
/plugin install setup-eval@setup-eval
/reload-plugins

The 4 commands appear in the / menu:

  • /setup-eval:setup-eval-lint
  • /setup-eval:setup-eval-review
  • /setup-eval:setup-eval-security
  • /setup-eval:eval-skill

No API key needed. Claude evaluates in-session.

Updating: Re-run the install command to get the latest rules.

Cursor commands

Requires the CLI tool to be installed first (the Cursor commands call it for the deterministic scan):

pip install setup-eval

Then copy .cursor/commands/ from this repo into your project. The 4 commands appear in Cursor's command palette:

  • /setup-eval-lint
  • /setup-eval-review
  • /setup-eval-security
  • /eval-skill

No API key needed for review/security/skill. Cursor evaluates in-session.

Inspection Rules (43)

| Category | Rules | What they check |
|---|---|---|
| Structural | 1 | SKILL.md exists |
| Frontmatter | 3 | Description required/quality, format valid |
| Content | 4 | Duplicate detection (TF-IDF), broken references, circular references, token budget |
| Security | 9 | Credential access, prompt injection (17 patterns), data exfiltration, obfuscation, reverse shells, AST analysis, taint tracking, MCP least-privilege, tool poisoning |
| Security (opt-in) | 2 | YARA signatures, CVE lookups via OSV.dev |
| Commands | 8 | Description, script exists, duplicates, credentials, injection, skill overlap, shadows built-in, references nonexistent skill |
| CLAUDE.md | 3 | Exists, skill duplication, generic advice detection |
| Hooks | 1 | Structure validation, dangerous patterns, network access |
| Agents | 9 | Description, skills exist, tool format, constraint matching, credentials, injection, exfiltration, obfuscation, reverse shells |
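
To give a feel for TF-IDF duplicate detection, the following sketch scores pairs of documents by cosine similarity of their TF-IDF vectors. It is an illustration of the general technique only: the whitespace tokenizer, the smoothed IDF formula, the 0.8 threshold, and all function names are assumptions, not the tool's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a sparse TF-IDF vector (term -> weight)."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in tokens.values() for term in set(words))
    n = len(docs)
    vectors = {}
    for name, words in tokens.items():
        tf = Counter(words)
        # Smoothed IDF keeps terms shared by every document at a small positive weight.
        vectors[name] = {
            term: (count / len(words)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def near_duplicates(docs, threshold=0.8):
    """Return name pairs whose similarity meets the (assumed) threshold."""
    vecs = tfidf_vectors(docs)
    names = sorted(vecs)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if cosine(vecs[x], vecs[y]) >= threshold
    ]
```

Two skills with identical instructions score 1.0 and are reported as a pair, while skills that share only common words such as "the" score close to 0.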

Four presets: recommended (default), strict, security, pre-workflow.
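
The circular-reference check listed under the Content rules can be pictured as cycle detection on a graph in which each component points to the components it references. The sketch below uses a depth-first search. The input shape (a dict of name to referenced names) and the function name `find_cycle` are assumptions made for illustration.

```python
def find_cycle(refs):
    """Return one reference cycle as a list of names, or None if there is none.

    refs maps a component name to the names it references (hypothetical shape).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in refs.get(node, []):
            if state.get(nxt, WHITE) == GREY:
                # nxt is already on the current path, so the path closes a loop.
                return path[path.index(nxt):] + [nxt]
            if state.get(nxt, WHITE) == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in refs:
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Names that are referenced but missing from `refs` are treated as leaves here; reporting them would be the job of a separate broken-reference check.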

Contributing

See CONTRIBUTING.md for adding rules and submitting PRs.

Changelog

See CHANGELOG.md for release history.

Future Plans

See future-plans/ for planned improvements (SARIF output, security benchmarks, runner abstraction, dynamic workflows, impact measurement).
