workflow-tracker


Automatic experiment & workflow tracker for ML/AI projects — no manual logging needed. Detects experiments in real time, stores structured records, generates progress reports on demand.

License: MIT

Chinese documentation


Why?

Common pains in ML experimentation:

  • Ran dozens of experiments, forgot the parameters and conclusions two weeks later
  • Notes scattered across chat logs, terminal output, and memory — impossible to aggregate
  • Writing weekly reports means digging through all experiment records from scratch
  • Paper CHANGELOGs and experiment logs have inconsistent formats

workflow-tracker addresses all of these automatically: it detects experiment activity in the conversation and records it silently, with no manual logging step.


Install

npx skills add workflow-tracker -g

Or from a local clone:

git clone https://github.com/MarkD1Zzz/workflow-tracker.git
npx skills add ./workflow-tracker -g

Requires: Node.js ≥ 18, Claude Code or compatible agent.


Features

Auto-Detect & Silent Logging

Triggers automatically on these signals, without interrupting your workflow:

  • Running training/evaluation scripts
  • Metric changes (accuracy, F1, loss, etc.)
  • Parameter changes ("change lr to 0.001")
  • Verbal experiment conclusions ("tried X, result was Y")

Dual-Mode Output

| Project Type | Detection Signal | Output Files |
|--------------|------------------|--------------|
| Engineering | data/train/, train.py, pipeline | workflow.json + workflow.md |
| Paper | tex/, manuscript, figures/ | CHANGELOG.md + experiment_log.md |

Three-Level Structure

Phase → Task → Experiment

Each experiment auto-extracts: Hypothesis / Method / Parameters / Results (with delta) / Conclusion / Tags
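
One way to picture the Phase → Task → Experiment hierarchy is as nested dataclasses. The sketch below is hypothetical: the class names and the `hypothesis` field are assumptions based on the list above, while the remaining field names mirror the workflow.json sample under Output Formats.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Experiment:
    date: str
    title: str
    method: str
    params: dict
    results: dict            # e.g. {"baseline": ..., "new": ..., "delta": ...}
    conclusion: str
    tags: list = field(default_factory=list)
    hypothesis: str = ""     # assumed field; not shown in the JSON sample


@dataclass
class Task:
    name: str
    status: str = "in_progress"
    experiments: list = field(default_factory=list)


@dataclass
class Phase:
    name: str
    status: str = "in_progress"
    tasks: list = field(default_factory=list)


def phase_to_dict(phase):
    """Convert a Phase (with nested tasks and experiments) to plain dicts."""
    return asdict(phase)
```

`asdict` recurses through the nested lists, so the result can be passed straight to `json.dumps`.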

Report Generation

Say "generate report" to produce:

  • Paper project: Update CHANGELOG.md + experiment_log.md
  • Engineering project: Generate .docx.json + .pptx.json intermediate format (render with any tool later)

Examples

Scenario 1: Engineering — Classifier Swap

You: Swapped Stage 2 MLP for SVM(linear, C=1). Accuracy: 93.75% → 94.79%. SVM is deterministic.

Claude: Recorded. SVM(linear) → SUCCESS, delta +1.04pp.
       → workflow.json + workflow.md updated

Scenario 2: Paper — Ablation Study

You: Finished attention module ablation. SE 94.2%, CBAM 94.8%, FAA 96.1%.

Claude: Paper project detected (F:/paper/).
       → CHANGELOG.md appended with timeline entry
       → experiment_log.md appended with detailed record

Scenario 3: Report Generation

You: Generate a progress report for the last two weeks.

Claude: Generated report_20260614.docx.json + report_20260614.pptx.json
        Run node render.js or python render.py to produce final files.

Output Formats

workflow.json (Engineering)

{
  "project": "Welding Defect Classification",
  "updated": "2026-06-14T14:30",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "status": "in_progress",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "status": "completed",
      "experiments": [{
        "date": "2026-06-14",
        "title": "SVM(linear) replaces MLP",
        "method": "SVC(kernel='linear', C=1, class_weight='balanced')",
        "params": {"kernel": "linear", "C": 1},
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04},
        "conclusion": "SUCCESS",
        "tags": ["classifier", "svm", "breakthrough"]
      }]
    }]
  }]
}
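
Because workflow.json is plain JSON, it can be read back with standard tooling. The following sketch walks the three levels and collects successful experiments; the helper names `iter_experiments` and `successful_titles` are hypothetical, and the embedded sample is a trimmed copy of the record above.

```python
import json


def iter_experiments(workflow):
    """Yield (phase name, task name, experiment dict) for every experiment."""
    for phase in workflow.get("phases", []):
        for task in phase.get("tasks", []):
            for exp in task.get("experiments", []):
                yield phase["name"], task["name"], exp


def successful_titles(workflow):
    """Return titles of experiments whose conclusion is "SUCCESS"."""
    return [
        exp["title"]
        for _, _, exp in iter_experiments(workflow)
        if exp.get("conclusion") == "SUCCESS"
    ]


# Trimmed copy of the sample record shown above.
SAMPLE = json.loads("""
{
  "project": "Welding Defect Classification",
  "phases": [{
    "name": "Phase 1: Accuracy Optimization",
    "tasks": [{
      "name": "Task 1.1: Classifier Replacement",
      "experiments": [{
        "title": "SVM(linear) replaces MLP",
        "results": {"baseline": 93.75, "new": 94.79, "delta": 1.04},
        "conclusion": "SUCCESS"
      }]
    }]
  }]
}
""")
```

For a real project, replace `SAMPLE` with `json.load(open("workflow.json", encoding="utf-8"))`.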

CHANGELOG.md (Paper)

## 2026-06-14 — Attention Module Ablation

### Background
Comparing SE / CBAM / FAA attention modules on NEU-DET.

### Results
| Module | Accuracy | Delta vs SE |
|--------|----------|-------------|
| SE     | 94.2%    | baseline    |
| CBAM   | 94.8%    | +0.6pp      |
| FAA    | 96.1%    | +1.9pp      |

### Conclusion
FAA significantly outperforms SE and CBAM. Ablation validates the attention redundancy hypothesis.

How It Works

  1. Project Type Detection: Scans directory structure (tex/→paper, data/train/→engineering)
  2. Signal Detection: Matches experiment keywords + numeric change patterns in conversation
  3. Batch Writing: Accumulates experiments, writes once per round (avoids excessive IO)
  4. Delta Auto-Calculation: Computes difference whenever old and new values appear
  5. Tag Auto-Classification: Assigns tags like architecture, hyperparameter-tuning, classifier, data-augmentation based on method type
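
Steps 2 and 4 can be illustrated with a small pattern matcher. This is a minimal sketch under stated assumptions: the skill runs inside an agent and may not use regular expressions at all, and the pattern `CHANGE` and the function `extract_delta` are hypothetical names. It only handles "old → new" or "old -> new" pairs.

```python
import re

# Matches numeric changes such as "93.75% → 94.79%" or "0.52 -> 0.41".
CHANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*%?\s*(?:→|->)\s*(\d+(?:\.\d+)?)\s*%?"
)


def extract_delta(text):
    """Return baseline, new value, and delta for the first change in text.

    Returns None when no "old → new" pattern is present.
    """
    match = CHANGE.search(text)
    if match is None:
        return None
    baseline = float(match.group(1))
    new = float(match.group(2))
    # Round to two decimals to hide floating-point noise in the difference.
    return {"baseline": baseline, "new": new, "delta": round(new - baseline, 2)}
```

Applied to the message from Scenario 1, the result has the same shape as the `results` object in workflow.json.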

Use Cases

  • Deep learning model training & tuning
  • Academic paper ablation study management
  • GAN/VAE/Diffusion model iteration
  • Computer vision classification/detection/segmentation
  • Any ML workflow that needs "what was tried → what happened → what it means" tracking

Sub-Skills

manuscript-check — Paper Manuscript Integrity Checker

Six-step closed-loop verification for academic paper manuscripts. Activated when you question data provenance, architecture naming, ablation authenticity, or narrative consistency.

| Step | Action |
|------|--------|
| 1. Source Verification | Trace back to original paper/code as ground truth |
| 2. Impact Analysis | Grep all occurrences across manuscript, estimate blast radius |
| 3. Batch Edit | Sync tex body, tables, figure scripts in one pass |
| 4. Residue Check | Verify old terms reach zero hits post-edit |
| 5. Consistency Audit | Detect contradictions between sections (numeric, terminology, evidence) |
| 6. Memory Persist | Update project memory files with final state |

Trigger signals: "verify X", "was this experiment actually run?", "X is my work not a citation", "X never existed", "sync figures"

Scoped to: the F:/论文/ paper project. The sub-skill hardcodes architecture facts for the RFS/EAAI context, so it is not a general-purpose checker as shipped.
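
The Residue Check step amounts to confirming that retired terms no longer appear anywhere in the manuscript tree. A rough Python equivalent of that grep is sketched below; the function names, the default file suffixes, and the plain substring match are assumptions of this sketch, not the sub-skill's actual behavior.

```python
from pathlib import Path


def find_residue(root, old_terms, suffixes=(".tex", ".py", ".md")):
    """Map each old term to the (relative path, line number) pairs where it survives."""
    root = Path(root)
    hits = {term: [] for term in old_terms}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for term in old_terms:
                if term in line:  # plain substring match, case-sensitive
                    hits[term].append((path.relative_to(root).as_posix(), lineno))
    return hits


def is_clean(hits):
    """True when every old term has zero remaining occurrences."""
    return all(not locations for locations in hits.values())
```

After a batch edit, `is_clean(find_residue("path/to/manuscript", ["old name"]))` should return `True`; any remaining entries point at the lines that still need editing.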


Repo Structure

workflow-tracker/
├── SKILL.md               # Main skill file (Claude Code entry point)
├── SKILL_EN.md            # English skill definition
├── README.md              # This file (English)
├── README_zh.md           # Chinese documentation
├── LICENSE                # MIT
├── evals.json             # 6 test cases, 25 assertions
├── .gitignore
└── manuscript-check/      # Sub-skill: paper manuscript integrity checker
    └── SKILL.md           # Six-step verification workflow

Development

Running Tests

cd workspace/iteration-2
python grade_all.py

Benchmark (v2)

| Metric | Value |
|--------|-------|
| Avg Response Time | 131s |
| Avg Tokens | 27k |
| Pass Rate (6 evals) | 100% |
| Paper Mode | |
| JSON Intermediate Format | |

License

MIT © 2026


Credits

Built on the Claude Code Skills framework. Inspired by real-world experiment management needs from welding defect classification, ConvNeXt-FAA paper research, and spot_welding_gan GAN training projects.


Changelog

v1.1.0 (2026-06-16)

  • New: manuscript-check sub-skill — six-step paper manuscript integrity verification
    • Source-to-manuscript cross-referencing with grep impact analysis
    • Multi-section batch editing (tex + tables + figure scripts)
    • Post-edit residue checking + narrative consistency audit
    • Automatic memory file persistence
  • Improved: Bootstrap CHANGELOG.md + experiment_log.md on first paper project load

v1.0.0 (2026-06-14)

  • Initial release
  • Auto-detect & silent logging for ML experiments
  • Dual-mode output: engineering (workflow.json + workflow.md) / paper (CHANGELOG.md + experiment_log.md)
  • Three-level structure: Phase → Task → Experiment
  • Report generation: .docx.json + .pptx.json intermediate format
