workflow-tracker
Automatic experiment & workflow tracker for ML/AI projects — no manual logging needed. Detects experiments in real time, stores structured records, generates progress reports on demand.
Why?
Common pains in ML experimentation:
- Ran dozens of experiments, forgot the parameters and conclusions two weeks later
- Notes scattered across chat logs, terminal output, and memory — impossible to aggregate
- Writing weekly reports means digging through all experiment records from scratch
- Paper CHANGELOGs and experiment logs have inconsistent formats
workflow-tracker addresses these by detecting experiment activity and recording it silently, with no manual logging step.
Install
npx skills add workflow-tracker -g
Or from a local clone:
git clone https://github.com/MarkD1Zzz/workflow-tracker.git
npx skills add ./workflow-tracker -g
Requires: Node.js ≥ 18, Claude Code or compatible agent.
Features
Auto-Detect & Silent Logging
Triggers automatically on these signals, without interrupting your workflow:
- Running training/evaluation scripts
- Metric changes (accuracy, F1, loss, etc.)
- Parameter changes ("change lr to 0.001")
- Verbal experiment conclusions ("tried X, result was Y")
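A minimal sketch of how such signals might be recognised in a message. The keyword list, the regex, and the `detect_signal` name are assumptions made for illustration; the skill's actual trigger logic is not documented here.

```python
import re
from typing import Dict, Optional

# Hypothetical keyword list; the real trigger vocabulary is not documented.
EXPERIMENT_KEYWORDS = ("accuracy", "f1", "loss", "lr", "ablation", "tried", "swapped")

# Matches "93.75% → 94.79%" or "0.91 -> 0.93": old value, arrow, new value.
CHANGE_PATTERN = re.compile(
    r"(?P<old>\d+(?:\.\d+)?)\s*%?\s*(?:→|->)\s*(?P<new>\d+(?:\.\d+)?)\s*%?"
)


def detect_signal(message: str) -> Optional[Dict]:
    """Return a small record if the message looks like an experiment update."""
    lowered = message.lower()
    keywords = [
        k for k in EXPERIMENT_KEYWORDS
        if re.search(rf"\b{re.escape(k)}\b", lowered)
    ]
    change = CHANGE_PATTERN.search(message)
    if not keywords and change is None:
        return None  # nothing experiment-like in this message
    record: Dict = {"keywords": keywords}
    if change is not None:
        old, new = float(change["old"]), float(change["new"])
        record.update(baseline=old, new=new, delta=round(new - old, 2))
    return record
```

Calling `detect_signal("Accuracy: 93.75% → 94.79%")` would yield a record with the baseline, the new value, and their difference, while an unrelated message returns `None`.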
Dual-Mode Output
| Project Type | Detection Signal | Output Files |
|---|---|---|
| Engineering | data/train/, train.py, pipeline | workflow.json + workflow.md |
| Paper | tex/, manuscript, figures/ | CHANGELOG.md + experiment_log.md |
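The table can be read as a small lookup from directory markers to a mode. The sketch below is an illustration only: the function names, the marker sets, and the rule that paper markers win when both kinds are present are assumptions, not the skill's documented behavior.

```python
from pathlib import Path
from typing import Iterable

# Marker sets taken from the table above; treating them as exact path
# names is an assumption ("manuscript" and "pipeline" are left out
# because it is unclear whether they are paths or keywords).
PAPER_MARKERS = {"tex", "figures"}
ENGINEERING_MARKERS = {"data/train", "train.py"}

OUTPUT_FILES = {
    "engineering": ("workflow.json", "workflow.md"),
    "paper": ("CHANGELOG.md", "experiment_log.md"),
}


def classify_entries(entries: Iterable[str]) -> str:
    """Classify a project from relative path names found at its root."""
    names = {e.replace("\\", "/").strip("/") for e in entries}
    if names & PAPER_MARKERS:  # assumed precedence: paper first
        return "paper"
    if names & ENGINEERING_MARKERS:
        return "engineering"
    return "unknown"


def detect_project_type(root: str) -> str:
    """Check which known markers exist under root, then classify."""
    base = Path(root)
    present = [m for m in PAPER_MARKERS | ENGINEERING_MARKERS if (base / m).exists()]
    return classify_entries(present)
```

`OUTPUT_FILES[detect_project_type(".")]` would then give the pair of files to update, once the `"unknown"` case is handled by the caller.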
Three-Level Structure
Phase → Task → Experiment
Each experiment auto-extracts: Hypothesis / Method / Parameters / Results (with delta) / Conclusion / Tags
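One way to picture the three levels is as nested records. The dataclasses below are a sketch: the field names follow the workflow.json example in this README where possible, and `hypothesis` is added as an assumed field because the example file does not show it.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    date: str
    title: str
    method: str
    params: Dict
    results: Dict          # e.g. {"baseline": ..., "new": ..., "delta": ...}
    conclusion: str
    hypothesis: str = ""   # assumed field name, not shown in workflow.json
    tags: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    status: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Phase:
    name: str
    status: str
    tasks: List[Task] = field(default_factory=list)
```

`asdict(phase)` converts the whole Phase → Task → Experiment tree into plain dictionaries that can be passed to `json.dumps`.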
Report Generation
Say "generate report" to produce:
- Paper project: Update CHANGELOG.md + experiment_log.md
- Engineering project: Generate .docx.json + .pptx.json intermediate format (render with any tool later)
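The schema of the intermediate files is not specified in this README, so the following is only a sketch of the idea: select recent experiments from a workflow.json-shaped dictionary and emit a JSON-serialisable document. The `sections`/`heading`/`body` layout and both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def collect_recent(workflow: Dict, today: date, days: int = 14) -> List[Dict]:
    """Flatten Phase -> Task -> Experiment and keep experiments in the window."""
    cutoff = today - timedelta(days=days)
    rows = []
    for phase in workflow.get("phases", []):
        for task in phase.get("tasks", []):
            for exp in task.get("experiments", []):
                if date.fromisoformat(exp["date"]) >= cutoff:
                    rows.append({"phase": phase["name"], "task": task["name"], **exp})
    return rows


def build_report(workflow: Dict, today: date, days: int = 14) -> Tuple[str, Dict]:
    """Return (file name, intermediate document); the layout is hypothetical."""
    rows = collect_recent(workflow, today, days)
    document = {
        "title": f"Progress report: {workflow.get('project', 'untitled')}",
        "period_days": days,
        "sections": [
            {
                "heading": r["title"],
                "body": f"{r['conclusion']} ({r['results']['delta']:+.2f}pp)",
            }
            for r in rows
        ],
    }
    return f"report_{today:%Y%m%d}.docx.json", document
```

Keeping the report as plain JSON means the rendering step stays separate, which matches the "render with any tool later" note above.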
Examples
Scenario 1: Engineering — Classifier Swap
You: Swapped Stage 2 MLP for SVM(linear, C=1). Accuracy: 93.75% → 94.79%. SVM is deterministic.
Claude: Recorded. SVM(linear) → SUCCESS, delta +1.04pp.
→ workflow.json + workflow.md updated
Scenario 2: Paper — Ablation Study
You: Finished attention module ablation. SE 94.2%, CBAM 94.8%, FAA 96.1%.
Claude: Paper project detected (F:/paper/).
→ CHANGELOG.md appended with timeline entry
→ experiment_log.md appended with detailed record
Scenario 3: Report Generation
You: Generate a progress report for the last two weeks.
Claude: Generated report_20260614.docx.json + report_20260614.pptx.json
Run node render.js or python render.py to produce final files.
Output Formats
workflow.json (Engineering)
{
  "project": "Welding Defect Classification",
  "updated": "2026-06-14T14:30",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "status": "in_progress",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "status": "completed",
      "experiments": [{
        "date": "2026-06-14",
        "title": "SVM(linear) replaces MLP",
        "method": "SVC(kernel='linear', C=1, class_weight='balanced')",
        "params": {"kernel": "linear", "C": 1},
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04},
        "conclusion": "SUCCESS",
        "tags": ["classifier", "svm", "breakthrough"]
      }]
    }]
  }]
}
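Because each experiment stores `baseline`, `new`, and `delta` together, a file like this can be sanity-checked after the fact. The helper below is a sketch, not part of the skill; `check_deltas` and the 0.005 tolerance are assumptions.

```python
import json
from typing import Dict, List


def check_deltas(workflow: Dict, tolerance: float = 0.005) -> List[str]:
    """Return titles of experiments whose stored delta != new - baseline."""
    mismatches = []
    for phase in workflow.get("phases", []):
        for task in phase.get("tasks", []):
            for exp in task.get("experiments", []):
                results = exp["results"]
                expected = results["new"] - results["baseline"]
                if abs(expected - results["delta"]) > tolerance:
                    mismatches.append(exp["title"])
    return mismatches


# Trimmed copy of the example above, used as sample input.
SAMPLE = json.loads("""
{
  "project": "Welding Defect Classification",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "experiments": [{
        "title": "SVM(linear) replaces MLP",
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04}
      }]
    }]
  }]
}
""")

print(check_deltas(SAMPLE))  # prints []
```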
CHANGELOG.md (Paper)
## 2026-06-14 — Attention Module Ablation
### Background
Comparing SE / CBAM / FAA attention modules on NEU-DET.
### Results
| Module | Accuracy | Delta vs SE |
|--------|----------|-------------|
| SE | 94.2% | baseline |
| CBAM | 94.8% | +0.6pp |
| FAA | 96.1% | +1.9pp |
### Conclusion
FAA significantly outperforms SE and CBAM. Ablation validates the attention redundancy hypothesis.
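The Results table in an entry like this can be produced from raw accuracies. The function below is a sketch with a hypothetical name; it only reproduces the table layout shown above and says nothing about how the skill formats entries internally.

```python
from typing import Dict


def render_results_table(results: Dict[str, float], baseline: str) -> str:
    """Render accuracies as a Markdown table with deltas against one baseline row."""
    base = results[baseline]
    lines = [
        f"| Module | Accuracy | Delta vs {baseline} |",
        "|--------|----------|-------------|",
    ]
    for name, accuracy in results.items():
        if name == baseline:
            delta = "baseline"
        else:
            delta = f"{accuracy - base:+.1f}pp"  # signed, in percentage points
        lines.append(f"| {name} | {accuracy:.1f}% | {delta} |")
    return "\n".join(lines)


print(render_results_table({"SE": 94.2, "CBAM": 94.8, "FAA": 96.1}, baseline="SE"))
```

With the ablation numbers from Scenario 2, the CBAM and FAA rows come out as +0.6pp and +1.9pp, matching the example entry.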
How It Works
- Project Type Detection: Scans directory structure (tex/ → paper, data/train/ → engineering)
- Signal Detection: Matches experiment keywords + numeric change patterns in conversation
- Batch Writing: Accumulates experiments, writes once per round (avoids excessive IO)
- Delta Auto-Calculation: Computes difference whenever old and new values appear
- Tag Auto-Classification: Assigns tags like architecture, hyperparameter-tuning, classifier, data-augmentation based on method type
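Batch writing and tag classification from the list above could be combined roughly as follows. This is a sketch under stated assumptions: the keyword rules in `TAG_RULES` are invented for illustration, and the writer appends to a flat JSON list rather than the nested workflow.json layout to keep the example short.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical keyword rules; only the tag names come from the README.
TAG_RULES = {
    "architecture": ("attention", "backbone"),
    "hyperparameter-tuning": ("learning rate", "lr=", "batch size"),
    "classifier": ("svm", "svc", "mlp", "classifier"),
    "data-augmentation": ("augment", "flip", "crop"),
}


def classify_tags(method: str) -> List[str]:
    """Assign every tag whose keywords appear in the method description."""
    text = method.lower()
    return [tag for tag, needles in TAG_RULES.items() if any(n in text for n in needles)]


class BatchWriter:
    """Accumulate experiments in memory and write them in one pass."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.pending: List[Dict] = []

    def add(self, experiment: Dict) -> None:
        tagged = dict(experiment, tags=classify_tags(experiment.get("method", "")))
        self.pending.append(tagged)  # no disk access yet

    def flush(self) -> int:
        """Write all pending experiments once; return how many were written."""
        if not self.pending:
            return 0
        existing = []
        if self.path.exists():
            existing = json.loads(self.path.read_text(encoding="utf-8"))
        existing.extend(self.pending)
        self.path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
        written = len(self.pending)
        self.pending.clear()
        return written
```

Calling `add` several times during a conversation round and `flush` once at the end gives a single read and a single write per round.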
Use Cases
- Deep learning model training & tuning
- Academic paper ablation study management
- GAN/VAE/Diffusion model iteration
- Computer vision classification/detection/segmentation
- Any ML workflow that needs "what was tried → what happened → what it means" tracking
Sub-Skills
manuscript-check — Paper Manuscript Integrity Checker
Six-step closed-loop verification for academic paper manuscripts. Activated when you question data provenance, architecture naming, ablation authenticity, or narrative consistency.
| Step | Action |
|---|---|
| 1. Source Verification | Trace back to original paper/code as ground truth |
| 2. Impact Analysis | Grep all occurrences across manuscript, estimate blast radius |
| 3. Batch Edit | Sync tex body, tables, figure scripts in one pass |
| 4. Residue Check | Verify old terms reach zero hits post-edit |
| 5. Consistency Audit | Detect contradictions between sections (numeric, terminology, evidence) |
| 6. Memory Persist | Update project memory files with final state |
Trigger signals: "verify X", "was this experiment actually run?", "X is my work not a citation", "X never existed", "sync figures"
Scoped to: the F:/论文/ paper project only, because the sub-skill hardcodes architecture facts for the RFS/EAAI context.
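Steps 2 and 4 both come down to counting occurrences of a term across manuscript files: before the edit to estimate impact, after the edit to confirm zero hits. A minimal sketch, assuming plain substring matching and a hypothetical set of file suffixes:

```python
from pathlib import Path
from typing import Dict, Iterable


def count_hits(texts: Dict[str, str], term: str) -> Dict[str, int]:
    """Map each file name to its number of occurrences, omitting files with none."""
    return {name: body.count(term) for name, body in texts.items() if term in body}


def scan_manuscript(
    root: str, term: str, suffixes: Iterable[str] = (".tex", ".py", ".md")
) -> Dict[str, int]:
    """Read matching files under root and count occurrences of term."""
    wanted = set(suffixes)  # assumed file types: tex body, figure scripts, notes
    texts = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in wanted:
            texts[str(path)] = path.read_text(encoding="utf-8", errors="ignore")
    return count_hits(texts, term)


def residue_is_clean(hits: Dict[str, int]) -> bool:
    """Step 4 passes only when the old term has zero remaining hits."""
    return not hits
```

Running `scan_manuscript` with the old term before and after a batch edit gives the blast radius first and the residue check second.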
Repo Structure
workflow-tracker/
├── SKILL.md # Main skill file (Claude Code entry point)
├── SKILL_EN.md # English skill definition
├── README.md # This file (English)
├── README_zh.md # Chinese documentation
├── LICENSE # MIT
├── evals.json # 6 test cases, 25 assertions
├── .gitignore
└── manuscript-check/ # Sub-skill: paper manuscript integrity checker
└── SKILL.md # Six-step verification workflow
Development
Running Tests
cd workspace/iteration-2
python grade_all.py
Benchmark (v2)
| Metric | Value |
|---|---|
| Avg Response Time | 131s |
| Avg Tokens | 27k |
| Pass Rate (6 evals) | 100% |
| Paper Mode | ✓ |
| JSON Intermediate Format | ✓ |
License
MIT © 2026
Credits
Built on the Claude Code Skills framework. Inspired by real-world experiment management needs from welding defect classification, ConvNeXt-FAA paper research, and spot_welding_gan GAN training projects.
Changelog
v1.1.0 (2026-06-16)
- New: manuscript-check sub-skill, a six-step paper manuscript integrity verification
  - Source-to-manuscript cross-referencing with grep impact analysis
  - Multi-section batch editing (tex + tables + figure scripts)
  - Post-edit residue checking + narrative consistency audit
  - Automatic memory file persistence
- Improved: Bootstrap CHANGELOG.md + experiment_log.md on first paper project load
v1.0.0 (2026-06-14)
- Initial release
- Auto-detect & silent logging for ML experiments
- Dual-mode output: engineering (workflow.json + workflow.md) / paper (CHANGELOG.md + experiment_log.md)
- Three-level structure: Phase → Task → Experiment
- Report generation: .docx.json + .pptx.json intermediate format