SkillLoop
Standalone self-improvement harness for agent traces, memory, skills, evaluation, and fine-tuning exports
SkillLoop is a standalone self-improvement harness for agent systems.
It sits beside an agent runtime, ingests completed agent traces, evaluates them, proposes durable memory and reusable skill updates, and exports fine-tuning-ready datasets. It is deliberately separated from Hermes or any other runtime so it can be reviewed, exported, versioned, or embedded without mutating an existing agent installation.
Why this exists
Most agents can execute tasks, but their learning loop is usually either missing or tightly coupled to one runtime. SkillLoop keeps that loop explicit:
- ingest traces from an agent
- normalize them into a stable schema
- evaluate quality and learning signals
- distill candidate memory and skill updates
- require review before applying anything
- export curated SFT/DPO data for model improvement
The MVP is local-first, stdlib-first, and review-first.
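The normalize step in the loop above can be pictured with a minimal stdlib-only sketch. The NormalizedTrace class, its field names, and the SCHEMA_VERSION value are illustrative assumptions, not SkillLoop's actual schema; docs/trace-schema.md describes the real format.

```python
import hashlib
import json
from dataclasses import dataclass

SCHEMA_VERSION = "1"  # hypothetical version tag, not SkillLoop's real value


@dataclass
class NormalizedTrace:
    """Illustrative normalized trace; field names are assumptions."""
    trace_id: str
    schema_version: str
    messages: list[dict[str, str]]
    raw_sha256: str
    normalized_sha256: str


def normalize_jsonl(trace_id: str, raw_text: str) -> NormalizedTrace:
    """Parse one JSONL trace into a stable shape and hash both forms."""
    messages = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the raw export
        event = json.loads(line)
        messages.append({
            "role": str(event.get("role", "unknown")),
            "content": str(event.get("content", "")),
        })
    # Canonical JSON makes the normalized hash independent of raw formatting.
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return NormalizedTrace(
        trace_id=trace_id,
        schema_version=SCHEMA_VERSION,
        messages=messages,
        raw_sha256=hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
        normalized_sha256=hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    )
```

Keeping two hashes means a change in raw formatting (extra blank lines, key order) shows up in the raw hash only, while a change in content shows up in both.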
What SkillLoop does
- Normalizes generic JSONL, Hermes-style exports, and Hermes state.db sessions
- Stores traces, evaluations, and proposals in local SQLite
- Uses a versioned normalized trace schema with runtime/adapter metadata
- Preserves raw trace inputs and records raw + normalized content hashes
- Records span-ready tool-call metadata such as IDs, timings, exit codes, status, error type, and artifact references
- Scores traces through a registered evaluator with versioned provenance and structured evidence
- Detects durable user preferences, corrections, success signals, and reusable workflows
- Creates deduplicated memory and skill proposals instead of silently mutating global state
- Tracks proposal lifecycle from pending to approved to applied
- Applies approved proposals only into the selected project directory
- Exports SFT JSONL and DPO JSONL datasets with optional score gates, split files, manifests, provenance, and count/token stats
- Replays traces through evaluator versions to benchmark score/evidence changes before training
- Generates reviewed training config artifacts for Unsloth, TRL, and Axolotl without running training
- Redacts common secret patterns during ingestion/export
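As a rough illustration of the redaction item above, the sketch below masks a few secret-shaped strings with regular expressions. The pattern list, the redact function, and the placeholder text are assumptions made for this example; the patterns SkillLoop actually ships are not documented here.

```python
import re

# Illustrative patterns only; a real list would be broader and maintained.
_SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # API-key-like tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key ID shape
    re.compile(r"(?i)(password|token|secret)\s*[=:]\s*\S+"),   # key=value assignments
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with the placeholder.

    Note that the key=value pattern masks the key name together with the value.
    """
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Pattern-based redaction only catches secrets that look like the patterns, so it complements review rather than replacing it.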
What SkillLoop does not do in v1
- It does not replace an agent runtime
- It does not fine-tune a model directly
- It does not write into ~/.hermes/memories, ~/.hermes/skills, or global agent config
- It does not require cloud services
- It does not store credentials
Install for local development
git clone <repo-url>
cd skillloop
python -m pip install -e '.[dev]'
SkillLoop requires Python 3.11+.
Quickstart
Run the sample workflow from the repository root:
python -m skillloop.cli --path . init
python -m skillloop.cli --path . ingest generic examples/traces/simple_trace.jsonl
python -m skillloop.cli --path . traces list
python -m skillloop.cli --path . eval latest --evaluator rubric
python -m skillloop.cli --path . distill latest
python -m skillloop.cli --path . review list --verbose
python -m skillloop.cli --path . export sft --out data/sft.jsonl --min-score 70 --splits train=0.8,validation=0.1,test=0.1
python -m skillloop.cli --path . export dpo --out data/dpo.jsonl --min-score 70
The review list output shows proposal IDs. To test the approval/apply path, approve a listed proposal by full ID or unique prefix, then run apply:
python -m skillloop.cli --path . review approve <proposal-id-or-prefix>
python -m skillloop.cli --path . apply
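Approving by unique prefix can be understood through a small sketch. The resolve_prefix helper below is hypothetical and only shows one reasonable rule: an exact ID wins, otherwise the prefix must match exactly one proposal.

```python
def resolve_prefix(prefix: str, proposal_ids: list[str]) -> str:
    """Return the single proposal ID matching the prefix, or raise."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    if prefix in proposal_ids:
        return prefix  # a full ID always resolves to itself
    matches = [pid for pid in proposal_ids if pid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no proposal matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous: {sorted(matches)}")
    return matches[0]
```

Refusing ambiguous prefixes keeps an approval from landing on a proposal the reviewer did not intend.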
You can also use the console script after installation:
skillloop --path . init
skillloop --path . ingest generic examples/traces/simple_trace.jsonl
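To make the --min-score gate in the export commands above more concrete, here is a hedged sketch of a score-gated SFT export. The export_sft function, the input keys, and the record layout (prompt, completion, metadata) are assumptions for illustration; SkillLoop's real record format may differ.

```python
import io
import json
from typing import Iterable, TextIO


def export_sft(scored_traces: Iterable[dict], out: TextIO, min_score: float = 0.0) -> int:
    """Write one JSON line per trace whose score passes the gate; return the count."""
    written = 0
    for item in scored_traces:
        if item["score"] < min_score:
            continue  # below the gate: keep it out of the training set
        record = {
            "prompt": item["prompt"],
            "completion": item["completion"],
            # Provenance travels with each record (hypothetical layout).
            "metadata": {"trace_id": item["trace_id"], "score": item["score"]},
        }
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    sample = [
        {"trace_id": "t1", "score": 85, "prompt": "Fix the bug", "completion": "Patched."},
        {"trace_id": "t2", "score": 40, "prompt": "Fix the bug", "completion": "Not sure."},
    ]
    buffer = io.StringIO()
    print(export_sft(sample, buffer, min_score=70), "record(s) written")
```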
CLI overview
skillloop --path <project-root> init
skillloop --path <project-root> setup --connect hermes [--start] [--auto-export]
skillloop --path <project-root> status [--json]
skillloop --path <project-root> ingest generic <jsonl-path>
skillloop --path <project-root> ingest hermes <json-path>
skillloop --path <project-root> ingest hermes-db --latest [--db-path ~/.hermes/state.db]
skillloop --path <project-root> ingest hermes-db --session-id <id> [--db-path ~/.hermes/state.db]
skillloop --path <project-root> traces list
skillloop --path <project-root> traces show <trace-id|latest>
skillloop --path <project-root> eval <trace-id|latest> [--evaluator rubric]
skillloop --path <project-root> distill <trace-id|latest>
skillloop --path <project-root> review list [--verbose]
skillloop --path <project-root> review approve <proposal-id-prefix>
skillloop --path <project-root> review reject <proposal-id-prefix>
skillloop --path <project-root> apply
skillloop --path <project-root> export sft --out <path> [--min-score N] [--splits train=0.8,validation=0.1,test=0.1] [--manifest-out manifest.json]
skillloop --path <project-root> export dpo --out <path> [--min-score N] [--splits train=0.8,validation=0.1,test=0.1] [--manifest-out manifest.json]
skillloop --path <project-root> benchmark [--baseline rubric_legacy] [--candidates rubric] [--out benchmark.json]
skillloop --path <project-root> training-config trl|unsloth|axolotl --dataset-manifest manifest.json --base-model <model> --output-dir <dir> --config-dir <dir>
skillloop --path <project-root> controller run
skillloop --path <project-root> controller history [--limit N]
skillloop --path <project-root> controller show <run-id-or-prefix>
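The --splits value accepted by the export commands follows a name=fraction,... shape. The sketch below shows one way such a spec could be parsed and applied deterministically; parse_splits and assign_split are hypothetical helpers, and hash-based assignment is an assumption rather than SkillLoop's documented method.

```python
import hashlib


def parse_splits(spec: str) -> dict[str, float]:
    """Parse 'train=0.8,validation=0.1,test=0.1' into a name -> fraction map."""
    splits: dict[str, float] = {}
    for part in spec.split(","):
        name, _, value = part.partition("=")
        splits[name.strip()] = float(value)  # raises ValueError on malformed parts
    if abs(sum(splits.values()) - 1.0) > 1e-6:
        raise ValueError("split fractions must sum to 1.0")
    return splits


def assign_split(record_id: str, splits: dict[str, float]) -> str:
    """Map a record ID to a split using a stable hash, so reruns agree."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for name, fraction in splits.items():
        cumulative += fraction
        if point < cumulative:
            return name
    return list(splits)[-1]  # guard against floating-point rounding
```

Because the assignment depends only on the record ID, a record stays in the same split across repeated exports, which avoids train/test leakage when a dataset is regenerated.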
Clean export boundary
SkillLoop writes only under the selected project root by default:
- local state: .skillloop/skillloop.db
- preserved raw trace inputs: .skillloop/raw_traces/*
- approved memory exports: .skillloop/approved/memory/*.md
- approved skill exports: .skillloop/approved/skill/*.md
- training data exports: user-selected paths such as data/sft.jsonl
- dataset manifests: default <out>.manifest.json or --manifest-out <path>
This is intentional. The first version is a clean export layer, not a global self-mutating runtime.
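A boundary like this is commonly enforced by resolving every write target and checking that it stays under the project root. The sketch below shows that check; resolve_inside is a hypothetical helper, not a SkillLoop function.

```python
from pathlib import Path


def resolve_inside(project_root: Path, relative: str) -> Path:
    """Resolve a path under project_root, rejecting anything that escapes it."""
    root = project_root.resolve()
    target = (root / relative).resolve()  # collapses '..' and follows symlinks
    if not target.is_relative_to(root):
        raise ValueError(f"{relative!r} resolves outside {root}")
    return target
```

For example, resolve_inside(Path("."), ".skillloop/approved/memory/note.md") returns an absolute path under the root, while resolve_inside(Path("."), "../elsewhere.md") raises ValueError.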
Repository layout
skillloop/
  adapters/            Trace ingestion adapters
  apply/               Review-approved filesystem exports
  distill/             Memory and skill proposal generation
  dataset.py           Dataset split, manifest, provenance, and stats helpers
  eval/                Evaluator registry, deterministic rubric, and structured evidence helpers
  export/              SFT and DPO dataset exporters
  review/              Proposal review queue helpers
  cli.py               Command-line interface
  schema.py            Normalized trace/eval/proposal dataclasses
  store.py             SQLite persistence layer
  training_config.py   Unsloth/TRL/Axolotl config generation only
examples/
  traces/              Sample input traces
tests/                 Pytest coverage for the MVP
docs/                  Architecture and usage documentation
Safety model
SkillLoop is review-first:
- Ingested traces are stored locally
- Raw traces are preserved locally with hashes for provenance
- Evaluations carry evaluator name, evaluator version, evidence, and trace schema version
- Distillation creates proposals, not global mutations
- Duplicate active proposals are skipped by content hash
- Human approval is required before apply
- Applied proposals are marked applied with an application timestamp
- Dataset exports include trace/evaluation/proposal provenance in record metadata and manifest summaries
- Approved exports stay inside .skillloop/approved/
- .env, .env.*, generated datasets, and local state are gitignored
See docs/safety.md for details.
Development checks
python -m pytest tests/ -q
python -m compileall skillloop tests -q
python -m pip wheel . --no-deps -w /tmp/skillloop-wheel-check
Expected MVP result: all tests pass and the sample workflow exports at least one SFT record.
Proof-of-work status
This repository is an initial proof-of-work for the SkillLoop architecture. It already demonstrates the core loop:
trace ingestion → evaluation → memory/skill proposals → human review → safe local apply → fine-tuning data export
The current proof-of-work also includes the first trustworthy-data layer needed before model training becomes meaningful:
- schema-versioned traces with backward compatibility for old traces
- runtime and adapter metadata on traces
- span-ready tool-call schema
- raw trace preservation and content hashes
- evaluator provenance and structured evidence
- evaluator registry for versioned scoring strategies
- proposal deduplication and applied lifecycle tracking
- dataset manifests, split exports, export metadata, provenance summaries, and deterministic token/count stats
- replay benchmark reports that compare evaluator versions before training
- Unsloth, TRL, and Axolotl config generation with explicit no-auto-training safety flags
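The evaluator registry and replay benchmark items above fit a common pattern: register scoring functions by name, then rerun stored traces through two of them and report the differences. The sketch below uses toy evaluators named length_v1 and length_v2; these names, the register decorator, and the replay function are assumptions and do not reflect SkillLoop's rubric evaluators.

```python
from typing import Callable

Evaluator = Callable[[list[str]], float]
REGISTRY: dict[str, Evaluator] = {}


def register(name: str) -> Callable[[Evaluator], Evaluator]:
    """Decorator that records an evaluator under a versioned name."""
    def decorator(fn: Evaluator) -> Evaluator:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("length_v1")
def length_v1(messages: list[str]) -> float:
    # Toy rule: 10 points per message, capped at 100.
    return min(100.0, 10.0 * len(messages))


@register("length_v2")
def length_v2(messages: list[str]) -> float:
    # Toy revision: empty messages no longer earn points.
    non_empty = [m for m in messages if m.strip()]
    return min(100.0, 10.0 * len(non_empty))


def replay(traces: dict[str, list[str]], baseline: str, candidate: str) -> list[dict]:
    """Score every trace with both evaluators and report the per-trace delta."""
    rows = []
    for trace_id, messages in sorted(traces.items()):
        base = REGISTRY[baseline](messages)
        cand = REGISTRY[candidate](messages)
        rows.append({"trace_id": trace_id, "baseline": base,
                     "candidate": cand, "delta": cand - base})
    return rows
```

Reviewing the deltas before training shows which traces would cross a score gate under the new evaluator, and in which direction.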
See:
- docs/architecture.md for system design
- docs/cli.md for commands
- docs/safety.md for safety boundaries
- docs/trace-schema.md for data format
License
Apache-2.0. See LICENSE.