AI-skills-bank

agent
Security Audit
Fail
Health Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Fail
  • spawnSync — Synchronous process spawning in bin/skills-bank.js
  • process.env — Environment variable access in bin/skills-bank.js
Permissions Pass
  • Permissions — No dangerous permissions requested


SUMMARY

AI Skills Bank is a unified, multi-tool platform designed to aggregate, manage, and route AI skills across various workflows and AI assistants (such as Antigravity, Claude Code, Cursor, and Copilot).

README.md

skills-bank

High-performance skill aggregation, classification & routing platform for AI agents.



Prerequisites

  • Rust 1.70+
  • Git (for repository cloning)
  • ~2GB disk space (for aggregated skills cache)

📖 Overview

skills-bank aggregates skills (workflows, tasks, specialized agents) from 100+ distributed repositories and provides a unified routing system for AI agents to discover, load, and invoke them efficiently.

Core Design Principles

  • Source-of-Truth Loading: Agents load canonical SKILL.md files directly from source repositories, not from catalogs. This eliminates hallucination risks and optimizes token usage.
  • Hybrid Classification: A dual-stage pipeline combines fast keyword rules (Step A) with LLM-powered semantic classification (Step B) to route skills into 12 domain hubs and 40+ sub-hubs.
  • Smart Deduplication: Skills are deduplicated by name OR description — catching both exact collisions and cross-repo clones with different names but identical content.
  • Multi-Tool Support: Skills sync to major AI tools including GitHub Copilot, Claude Code, free-code (claude-code), Hermes, Cursor, Gemini, Antigravity, OpenCode, Codex, and Windsurf.
  • Token Efficiency: Load minimal metadata first, then load source files on demand rather than batch-loading entire catalogs.
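
The Smart Deduplication principle above (drop a skill when its name OR its description has already been seen) can be sketched with two hash sets. This is a minimal illustration, not the project's actual code: the `Skill` struct, the `normalize` helper, and the choice to ignore empty descriptions are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical minimal skill record; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Skill {
    pub name: String,
    pub description: String,
}

/// Collapse whitespace and lowercase so trivial formatting differences
/// do not defeat matching (an assumed normalization, not from the README).
fn normalize(s: &str) -> String {
    s.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Keep the first skill seen for each name and each description.
/// A later skill is dropped if EITHER key was already recorded.
pub fn dedup_skills(skills: Vec<Skill>) -> Vec<Skill> {
    let mut seen_names: HashSet<String> = HashSet::new();
    let mut seen_descriptions: HashSet<String> = HashSet::new();
    let mut kept = Vec::new();

    for skill in skills {
        let name_key = normalize(&skill.name);
        let desc_key = normalize(&skill.description);

        let name_dup = seen_names.contains(&name_key);
        // Empty descriptions are not treated as a match (assumption),
        // otherwise every undocumented skill would collide.
        let desc_dup = !desc_key.is_empty() && seen_descriptions.contains(&desc_key);
        if name_dup || desc_dup {
            continue;
        }

        seen_names.insert(name_key);
        if !desc_key.is_empty() {
            seen_descriptions.insert(desc_key);
        }
        kept.push(skill);
    }
    kept
}
```

Checking both sets before inserting into either means a dropped duplicate never registers its other key, so the first occurrence stays the only reference point for later comparisons.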

🚀 Quick Start

1. Build the CLI

cd skills-bank/
cargo build --release
cargo run --release -- aggregate

2. Run the Full Pipeline

# Interactive setup (first run)
cargo run --release

# Or run all steps in sequence
cargo run --release -- run

Example Workflows

First-time setup:

cargo run --release -- setup
cargo run --release -- run

Validate before production sync:

cargo run --release -- doctor
cargo run --release -- release-gate
cargo run --release -- sync

The setup command launches an interactive wizard to configure:

  • Where skills should be synced (global, workspace, or both)
  • Which AI tools to sync to
  • Repository URLs to clone and aggregate
  • Excluded categories

🎮 Commands Reference

Core Pipeline Commands

Command | Purpose | When to Use
--- | --- | ---
aggregate | Collect, deduplicate, classify, and route skills from configured repositories to skills-aggregated/ | First run or when repositories change
sync | Distribute aggregated skills to configured AI tool directories | After aggregation completes
run | Execute the full pipeline (aggregate → sync) in sequence | Daily updates or automated workflows
setup | Configure sync targets, repositories, and exclusions interactively | Initial setup only
add-repo <URL> | Add a new skill repository to the configuration | When onboarding new sources
doctor | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection
release-gate | Validate aggregation output integrity | Before releases or production sync
cleanup-legacy-duplicates | Remove legacy repository folders from src/ or repos/ (only if matching lib/ exists) | Migration from older versions

📁 Project Structure

Source Code & Configuration

  • src/ — Rust source code: TUI, fetcher, aggregator, sync engine, classification logic
  • Cargo.toml — Rust manifest (dependencies, metadata, build targets)
  • .skills-bank-cli-config.json — User configuration file (generated by setup, contains sync targets and repository URLs)
  • .env-example — Environment variable template

Generated Outputs (After Aggregation)

  • skills-aggregated/ — Single source of truth containing:
    • routing.csv — Skill-to-hub/sub-hub routing table
    • subhub-index.json — Hub and sub-hub registry
    • hub-manifests.csv — Master index of all skills
    • .skill-lock.json — Aggregation metadata and timestamps
    • Per-hub directories with skills-manifest.json files

Repository Cache

  • lib/ — Canonical cache for cloned skill repositories (populated by aggregate command)

Testing & Documentation

  • tests/ — Integration test suite for pipeline and TUI
  • archive/ — Legacy PowerShell scripts (original PoC phase)
  • package.json — Node.js manifest for npx distribution
  • readme.md — This file

📁 Repository Management

Cloning & Caching

Cache Location: lib/ (not src/) — This is the canonical directory for all cloned repositories.

Clone Strategy:

  • First clone: Shallow clone with git clone --depth 1 --single-branch --no-tags (faster, smaller disk footprint)
  • Subsequent runs: git pull in existing directories (avoid re-cloning)
  • Deduplication: Normalized remote URLs and repository names prevent duplicate clones
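
To illustrate how normalized remote URLs can prevent duplicate clones, the sketch below reduces HTTPS, SSH, and scp-style remotes to one comparable key. The function name `normalize_remote` and its exact rules are assumptions for illustration; the README does not describe how the CLI normalizes URLs. Lowercasing the whole URL assumes a case-insensitive host such as GitHub, and URLs with explicit ports are not handled.

```rust
/// Reduce a Git remote URL to a comparable "host/owner/repo" key.
/// Hypothetical helper: the normalization rules here are assumptions.
pub fn normalize_remote(url: &str) -> String {
    let mut s = url.trim().to_lowercase();

    // Strip a known scheme prefix, if any.
    for prefix in ["https://", "http://", "ssh://", "git://"] {
        let stripped = s.strip_prefix(prefix).map(str::to_string);
        if let Some(rest) = stripped {
            s = rest;
            break;
        }
    }

    // Drop a "user@" part that appears before the first '/'.
    let first_slash = s.find('/').unwrap_or(s.len());
    if let Some(at) = s.find('@') {
        if at < first_slash {
            s = s[at + 1..].to_string();
        }
    }

    // Turn scp-style "host:owner/repo" into "host/owner/repo".
    let first_slash = s.find('/').unwrap_or(s.len());
    if let Some(colon) = s.find(':') {
        if colon < first_slash {
            s.replace_range(colon..=colon, "/");
        }
    }

    // Remove trailing slashes and a trailing ".git".
    let trimmed = s.trim_end_matches('/');
    let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
    trimmed.trim_end_matches('/').to_string()
}
```

Two configured URLs that produce the same key can then share a single directory under lib/ instead of being cloned twice.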

Speed Optimization:

  • Parallel cloning via configurable PARALLEL_JOBS
  • Shallow clones reduce disk I/O by ~80% vs. full clones
  • Incremental updates via git pull

Legacy Repository Cleanup

If you have repositories in older locations (src/ or repos/), migrate them:

# Inspect current state
cargo run --release -- doctor

# Remove legacy folders (safe: only deletes if matching lib/ exists and Git remote matches)
cargo run --release -- cleanup-legacy-duplicates

⚠️ Warning: This is destructive. Always run doctor first to inspect repository state.

⚙️ Output Files & Configuration

Generated during aggregation into skills-aggregated/:

File | Purpose
--- | ---
routing.csv | Skill-to-hub/sub-hub mappings (name, hub, sub-hub, src_path)
subhub-index.json | Complete hub and sub-hub registry
hub-manifests.csv | Master index of all skills across all hubs
.skill-lock.json | Aggregation metadata (timestamps, repo revisions, dedup stats)
[hub]/[sub-hub]/skills-manifest.json | Per-sub-hub skill metadata and LLM classification triggers

These files are used by agents and the TUI for discovery and routing.
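
As a rough illustration of how a consumer might read routing.csv, the sketch below parses the four columns listed above (name, hub, sub-hub, src_path) and filters by sub-hub. It assumes a header row, comma separators, and no quoted fields; the real file layout may differ, and the `Route` struct and both function names are hypothetical.

```rust
/// One row of routing.csv, using the four columns named in the README.
#[derive(Debug, PartialEq)]
pub struct Route {
    pub name: String,
    pub hub: String,
    pub sub_hub: String,
    pub src_path: String,
}

/// Parse routing.csv content. Assumes a header row and unquoted fields;
/// rows with fewer than four columns are skipped.
pub fn parse_routing(csv: &str) -> Vec<Route> {
    csv.lines()
        .skip(1) // assumed header row
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| {
            // splitn(4, ..) keeps any further commas inside src_path.
            let cols: Vec<&str> = line.splitn(4, ',').map(str::trim).collect();
            if cols.len() == 4 {
                Some(Route {
                    name: cols[0].to_string(),
                    hub: cols[1].to_string(),
                    sub_hub: cols[2].to_string(),
                    src_path: cols[3].to_string(),
                })
            } else {
                None
            }
        })
        .collect()
}

/// Return the routes that belong to one hub/sub-hub pair.
pub fn skills_in_sub_hub<'a>(routes: &'a [Route], hub: &str, sub_hub: &str) -> Vec<&'a Route> {
    routes
        .iter()
        .filter(|r| r.hub == hub && r.sub_hub == sub_hub)
        .collect()
}
```

An agent following the source-of-truth principle would use `src_path` from the matching rows to load the canonical SKILL.md file rather than a copied catalog entry.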


🌐 Environment Variables

Copy .env-example to .env to override defaults:

cp .env-example .env

See .env-example for all available options.


🎯 Tool Integration Targets

Sync skills to any of these destinations:

Tool | Project | Global
--- | --- | ---
Claude | .claude/skills/ | ~/.claude/skills/
free-code (claude-code) | .free-code-config/skills/ | ~/.free-code-config/skills/
Hermes | .hermes/skills/ | ~/.hermes/skills/
Code (Codex) | .agents/skills/ | ~/.agents/skills/
GitHub Copilot | .github/skills/ | ~/.copilot/skills/
Cursor | .cursor/skills/ | ~/.cursor/skills/
Gemini | .gemini/skills/ | ~/.gemini/skills/
Antigravity | .agent/skills/ | ~/.gemini/antigravity/skills/
OpenCode | .opencode/skills/ | ~/.config/opencode/skills/
Windsurf | .windsurf/skills/ | ~/.codeium/windsurf/skills/

🏗️ Classification Architecture

The aggregation pipeline processes 8000+ SKILL.md files through a multi-stage classification system:

 SKILL.md files (8000+)
        │
        ▼
 ┌──────────────┐
 │  YAML Parse   │  Extract name, description, triggers
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Keyword      │  Fast token-based routing to hub/sub-hub
 │  Rules        │  (fallback if LLM unavailable)
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Dedup        │  Name OR Description HashSet
 │  (two-key)    │  Catches cross-repo clones
 └──────┬───────┘
        │
        ▼
 ┌──────────────────────────────────┐
 │  Hybrid Exclusion + LLM Classify │
 │  Step A: Keyword pre-filter      │
 │  Step B: LLM semantic classify   │
 │         (can return "excluded")  │
 └──────┬───────────────────────────┘
        │
        ▼
 ┌──────────────┐
 │  Output       │  routing.csv, per-hub manifests,
 │  Artifacts    │  subhub-index.json
 └──────────────┘

🔍 Classification Improvements (v2.0+)

The keyword-based classification system includes three critical enhancements to eliminate false negatives and resolve sub-hub conflicts:

1. Repository Name Extraction (Substring Matching)

Problem: Repository names like mukul975-anthropic-cybersecurity-skills were not being matched because the system used exact token matching (e.g., only matching the token "security", not the full repo name).

Solution: Introduced the infer_hub_from_repo_name() function, which:

  • Extracts the repository directory name from the path (the segment right after lib/ or src/)
  • Uses substring matching to catch domain signals (e.g., "cybersecurity-skills" → matches "security")
  • Runs before other inference logic (highest priority)

Confidence Score: 98% (near-deterministic, reflects author intent)
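
The repository-name extraction described above might look roughly like the sketch below. Only the function name `infer_hub_from_repo_name`, the "security" signal, the ("code-quality", "security") result, and the 98 score come from this README; the `REPO_NAME_SIGNALS` table, the `repo_dir_name` helper, and the return type are assumptions for illustration.

```rust
/// Hypothetical substring -> (hub, sub_hub) table. Only the "security"
/// entry is taken from the README's example; a real table would be longer.
const REPO_NAME_SIGNALS: &[(&str, (&str, &str))] = &[
    ("security", ("code-quality", "security")),
];

/// Confidence assigned to path-based inference, as stated in the README.
const PATH_CONFIDENCE: u8 = 98;

/// Return the path segment right after the first "lib" or "src" segment.
pub fn repo_dir_name(path: &str) -> Option<&str> {
    let mut parts = path
        .split(|c: char| c == '/' || c == '\\')
        .filter(|p| !p.is_empty());
    while let Some(part) = parts.next() {
        if part == "lib" || part == "src" {
            return parts.next();
        }
    }
    None
}

/// Substring-match the repository directory name against known signals.
/// Returns (hub, sub_hub, confidence) on the first match.
pub fn infer_hub_from_repo_name(path: &str) -> Option<(&'static str, &'static str, u8)> {
    let repo = repo_dir_name(path)?.to_lowercase();
    REPO_NAME_SIGNALS
        .iter()
        .find(|(needle, _)| repo.contains(*needle))
        .map(|(_, (hub, sub_hub))| (*hub, *sub_hub, PATH_CONFIDENCE))
}
```

Because the match is on substrings, "cybersecurity" satisfies the "security" signal even though no whole token equals it, which is the false negative the exact-token approach produced.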

2. Sub-Hub Conflict Resolution

Problem: When a skill matched multiple sub-hubs (e.g., python AND security simultaneously), language hubs often won due to their anchor keywords, defeating domain-specialist classification.

Solution: Introduced a conflict resolution table (CONFLICT_RESOLUTION) that:

  • Defines precedence rules when multiple sub-hubs match: (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub)
  • Ensures domain specialists always win over languages:
    • security > python | javascript | typescript | rust | golang | java
    • testing-qa > python | javascript | typescript | rust
    • code-review > python | javascript
  • Applied in resolve_conflict() function when multiple candidates score within 5 points of the top score
  • Fallback: hub priority ordering if no explicit rule applies
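
The precedence rules above can be sketched as a small lookup over candidates that score within 5 points of the top score. The names `CONFLICT_RESOLUTION` and `resolve_conflict`, the tuple layout, and the 5-point window come from this README; the `Candidate` struct, the "languages" hub name, the two sample rules, and the `HUB_PRIORITY` order are assumptions chosen for the example.

```rust
use std::cmp::Reverse;

/// Hypothetical scored classification candidate.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub hub: &'static str,
    pub sub_hub: &'static str,
    pub score: u32,
}

/// (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub).
/// Sample entries only; the "languages" hub name is an assumption.
const CONFLICT_RESOLUTION: &[(&str, &str, &str, &str)] = &[
    ("languages", "python", "code-quality", "security"),
    ("languages", "rust", "code-quality", "security"),
];

/// Assumed fallback order: earlier hubs win when no explicit rule applies.
const HUB_PRIORITY: &[&str] = &["code-quality", "languages"];

/// Candidates within this many points of the top score count as a conflict.
const SCORE_WINDOW: u32 = 5;

fn hub_rank(hub: &str) -> usize {
    HUB_PRIORITY
        .iter()
        .position(|h| *h == hub)
        .unwrap_or(HUB_PRIORITY.len())
}

/// Pick a winner among near-tied candidates.
pub fn resolve_conflict(candidates: &[Candidate]) -> Option<Candidate> {
    let top = candidates.iter().map(|c| c.score).max()?;
    let mut contenders: Vec<Candidate> = candidates
        .iter()
        .filter(|c| c.score + SCORE_WINDOW >= top)
        .cloned()
        .collect();

    // Apply explicit rules: if the winner is present, remove the loser.
    for &(lose_hub, lose_sub, win_hub, win_sub) in CONFLICT_RESOLUTION {
        let winner_present = contenders
            .iter()
            .any(|c| c.hub == win_hub && c.sub_hub == win_sub);
        if winner_present {
            contenders.retain(|c| !(c.hub == lose_hub && c.sub_hub == lose_sub));
        }
    }

    // Fallback: hub priority first, then the higher score.
    contenders
        .into_iter()
        .min_by_key(|c| (hub_rank(c.hub), Reverse(c.score)))
}
```

With a python candidate at 92 and a security candidate at 90, both fall inside the window, the first rule removes python, and security is returned. If python scored 97 instead, security would be outside the window and python would win unchallenged.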

3. Confidence Boost for Path-Based Inference

Problem: Repository name signals (inferred from path) were scored 95%, allowing lower-confidence LLM results (80%) to potentially override them.

Solution: Raised the confidence score for path-based inference from 95% to 98%.

  • Score 98 is now treated as near-deterministic (same tier as explicit canonicalize_assignment logic at 100)
  • Only scores ≥ 100 can override it
  • Prevents low-confidence LLM results from contradicting repository metadata
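
The override rule above (a path-based result at 98 yields only to a score of 100 or more) can be expressed as a short decision function. The `Assignment` struct, the `resolve_assignment` name, and the constant name are hypothetical; only the 98 and 100 values come from this README.

```rust
/// Hypothetical classification result with a 0-100 confidence score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Assignment {
    pub hub: &'static str,
    pub sub_hub: &'static str,
    pub confidence: u8,
}

/// Minimum score another source needs to override a path-based result.
pub const OVERRIDE_THRESHOLD: u8 = 100;

/// Choose between a path-based assignment and one from another source
/// (for example an LLM result or an explicit canonical mapping).
pub fn resolve_assignment(
    path_based: Option<Assignment>,
    other: Option<Assignment>,
) -> Option<Assignment> {
    match (path_based, other) {
        // Both present: the other source wins only at or above the threshold.
        (Some(path), Some(alt)) => {
            if alt.confidence >= OVERRIDE_THRESHOLD {
                Some(alt)
            } else {
                Some(path)
            }
        }
        // Only a path-based result exists.
        (Some(path), None) => Some(path),
        // No path-based signal: fall through to whatever else is available.
        (None, alt) => alt,
    }
}
```

Under these assumptions an LLM result at 80 never displaces a repository-name match, while an explicit mapping scored at 100 still can.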

📊 Example Classification Flow

For a skill in lib/mukul975-anthropic-cybersecurity-skills/:

1. apply_rules() called
   ↓
2. canonicalize_assignment() → no match (0% confidence)
   ↓
3. infer_from_path() called
   ├─ infer_hub_from_repo_name() extracts "mukul975-anthropic-cybersecurity-skills"
   ├─ Finds substring match: "cybersecurity"
   └─ Returns ("code-quality", "security") with 98% confidence
   ↓
4. ✓ Final assignment: code-quality / security
   ✗ LLM classification skipped (98% > 80% threshold)

🔧 Troubleshooting

Issue: Skills not aggregating or taking too long

Check repository state:

cargo run --release -- doctor

This validates all repositories, checks Git remotes, and reports cache status.

Increase parallelism:

export PARALLEL_JOBS=16
cargo run --release -- aggregate

Issue: Sync failing with "junction or symlink" errors

Cause: Existing junctions in sync target directories.

Solution: The sync command automatically skips existing junctions. If conflicts persist:

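# Inspect sync targets (Windows, Command Prompt)
dir "%USERPROFILE%\.claude\skills"
# Inspect sync targets (macOS/Linux)
ls ~/.claude/skills

# Remove conflicting junctions/symlinks manually (Windows, Command Prompt)
rmdir /s "%USERPROFILE%\.claude\skills\[hub-name]"
# Remove conflicting junctions/symlinks manually (macOS/Linux)
rm -rf ~/.claude/skills/[hub-name]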

# Retry sync
cargo run --release -- sync

Issue: "Release gate" validation fails

Check output integrity:

cargo run --release -- release-gate

This validates:

  • All SKILL.md files were processed
  • No orphaned or missing references in routing.csv
  • Deduplication stats match cache state

If failures are reported, delete the output directory and re-run aggregation:

rm -rf skills-aggregated/
cargo run --release -- aggregate

📈 Performance Characteristics

Operation | Time | Dependencies
--- | --- | ---
First aggregate (120+ repos, 8000+ skills) | 10-20 min | Network speed, CPU count, LLM latency
Incremental aggregate (repos already cached) | 2-5 min | LLM classification speed (can skip with --skip-llm)
Sync to tools (10 tools, all hubs) | 30-60 sec | Disk I/O, junction creation speed
LLM classification (8000 skills) | 3-8 min | Batch size, LLM throughput

Optimization Tips:

  • Use PARALLEL_JOBS=auto for optimal CPU utilization
  • Set LLM_BATCH_SIZE=100 for faster LLM processing (requires more GPU/API quota)
  • Run on an SSD for 2-3x faster repository cloning
  • Use shallow clones (default) to reduce disk bandwidth
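
For readers curious how a value such as PARALLEL_JOBS=auto could be interpreted, the sketch below resolves the setting to a worker count using the standard library. The function name and the fallback behaviour (unset, empty, zero, or unparsable values fall back to the CPU count) are assumptions; the README does not specify how the CLI handles these cases.

```rust
use std::thread;

/// Resolve a PARALLEL_JOBS-style setting to a worker count.
/// Hypothetical helper: the fallback rules are assumptions.
pub fn resolve_parallel_jobs(raw: Option<&str>) -> usize {
    // Number of threads the OS reports as available, or 1 if unknown.
    let cpu_count = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    match raw.map(str::trim) {
        // Unset, empty, or "auto": use the detected CPU count.
        None | Some("") | Some("auto") => cpu_count,
        // Otherwise accept a positive integer, falling back on bad input.
        Some(value) => value
            .parse::<usize>()
            .ok()
            .filter(|n| *n > 0)
            .unwrap_or(cpu_count),
    }
}
```

A caller could pass the environment value with `resolve_parallel_jobs(std::env::var("PARALLEL_JOBS").ok().as_deref())`.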

Reporting Issues

When reporting bugs, include:

  1. Output of cargo run --release -- doctor
  2. Contents of .skills-bank-cli-config.json (redact sensitive URLs if needed)
  3. Error message and stack trace (if any)
  4. Steps to reproduce

Extending Classification

To add new domain keywords or refine sub-hub routing:

  1. Edit the CONFLICT_RESOLUTION table or keyword rules in src/classify.rs
  2. Add test cases in tests/
  3. Run cargo test and cargo run --release -- aggregate
  4. Submit PR with classification examples

📄 License

MIT — See package.json for details.
