unsloth-buddy

/unsloth-buddy I have 500 customer support Q&As and want to fine-tune a summarization model. I only have a MacBook Air.

English | 简体中文 | 繁體中文

What is this?

A fine-tuning agent that talks like a colleague. Describe your goal, and it asks the right questions, finds or formats your data, picks the right technique and model, trains on your hardware, validates the result, and packages it for deployment.

Runs on NVIDIA GPUs via Unsloth, natively on Apple Silicon via mlx-tune, and on free cloud GPUs via colab-mcp. Part of the Gaslamp AI development platform — docs.

One sentence, one fine-tuned model.

You: Fine-tune a small model on my customer support FAQ. I have a CSV file.

[Phase 0] Creating project: customer_faq_sft_2026_03_17/
[Phase 1] Requirements interview...
           Method: SFT   Model: Qwen2.5-0.5B   Deploy: Ollama
[Phase 2] Data strategy...
           Loading 1,200 rows from faq.csv → reformatting as chat messages
           Saved to data/train.jsonl (validated: messages column ✓)
[Phase 3] Environment: Apple M4 24GB, mlx-tune 0.4.3, Python 3.12
           Ready for training
[Phase 4] Training... 200 steps
           Final loss: 1.42 → saved to outputs/adapters/
[Phase 5] Evaluation (base vs fine-tuned, greedy decoding):
           Q: How do I reset my password?
           [Base]    I can help with that. Which password?
           [Fine-tuned]  Go to the login page → click "Forgot password" → check your email.
[Phase 6] Export → outputs/model-q4_k_m.gguf
           Run: ollama create my-faq-bot -f Modelfile && ollama run my-faq-bot

One conversation, seven phases, one deployable model — and a shareable demo page.

Quick Start

This skill includes sub-skills and utility scripts — install the full repository, not a single file.

Claude Code (recommended)

/plugin marketplace add TYH-labs/unsloth-buddy
/plugin install unsloth-buddy@TYH-labs/unsloth-buddy

Then describe what you want to fine-tune. The skill activates automatically.

Gemini CLI

gemini extensions install https://github.com/TYH-labs/unsloth-buddy --consent

Any agent supporting the Agent Skills standard

git clone https://github.com/TYH-labs/unsloth-buddy.git .agents/skills/unsloth-buddy

How is it different?

Most tools assume you already know what to do. This one doesn't.

Your concern	What actually happens
"I don't know where to start"	A 2-question interview locks in task, audience, and data — then recommends the right model, hardware, and method
"I don't have data, or it's in the wrong format"	A dedicated data phase acquires, generates, or reformats data to exactly match the trainer's required schema
"SFT? DPO? GRPO? Which one?"	Maps your goal to the right technique and explains why in plain language
"Which model? Will it fit in my GPU?"	Detects your hardware, maps to available model sizes, estimates cloud cost if needed
"Unsloth won't install on my machine"	Two-stage environment detection catches mismatches and prints the exact install command for your setup
"I trained it, but does it work?"	Runs the fine-tuned adapter alongside the base model so you can see the difference, not just a loss number
"How do I deploy it?"	You name the target (Ollama, vLLM, HF Hub) — it runs the conversion commands
"How do I reproduce this later — or hand it off?"	Every project gets a `gaslamp.md` roadbook: every kept decision with its rationale, plus 📖 learn blocks on the underlying ML concepts — enough for any agent or person to reproduce end-to-end

How it works

Seven phases, each scoped to an isolated dated project directory that never touches your repo root.

Phase	What happens	Output files
0. Init	Creates `{name}_{date}/` with standard directory structure	`gaslamp.md`, `progress_log.md`
1. Interview	2-question interview — task + data; captures domain/audience	`project_brief.md`
2. Data	Acquires, validates, and formats to trainer schema	`data_strategy.md`
3. Environment	Hardware scan → Python env check → blocks until ready	`detect_env_result.json`
4. Training	Generates and runs `train.py`, streams output to log	`outputs/adapters/`
5. Evaluation	Batch tests, interactive REPL, base vs fine-tuned comparison	`logs/eval.log`
5.5. Demo	Generates a shareable static HTML page — base vs fine-tuned side-by-side	`demos/<name>/index.html`
6. Export	GGUF, merged 16-bit, or Hub push	`outputs/`

customer_faq_sft_2026_03_17/
├── train.py              eval.py
├── data/                 outputs/adapters/
├── logs/
├── gaslamp.md            ← reproducibility roadbook
├── project_brief.md      data_strategy.md
├── memory.md             progress_log.md

Hardware Support

Hardware	Backend	What it can run
NVIDIA T4 (16 GB)	`unsloth`	7B QLoRA, small-scale GRPO
NVIDIA A100 (80 GB)	`unsloth`	70B QLoRA, 14B LoRA 16-bit
Apple M1 / M2 / M3 / M4	`mlx-tune` / `trl`	SFT/DPO: 7B on 10 GB, 13B on 24 GB; GRPO: 1–7B via TRL + PyTorch MPS
Google Colab (T4/L4/A100)	`unsloth` via `colab-mcp`	Free cloud GPU, opt-in

Unsloth is ~2× faster than standard HuggingFace training, uses up to 80% less VRAM, and produces exact gradients.

Supported training methods: SFT, DPO, GRPO, ORPO, KTO, SimPO, Vision SFT (Qwen2.5-VL, Llama 3.2 Vision, Gemma 3)

Training Dashboard

Every local training run automatically opens a real-time dashboard at http://localhost:8080/:

Task-aware panels — pass task_type="sft"|"dpo"|"grpo"|"vision" to unlock the right charts automatically
SSE streaming — instant updates via EventSource, no polling lag
EMA smoothed loss — clear trend line over noisy raw loss, plus running average
Dynamic phase badge — idle → training → completed / error, with colour-coded task-type badge
ETA, elapsed time & epoch — estimated time remaining and current epoch progress
GPU memory breakdown — baseline (model load) vs LoRA training overhead vs total, shown as gauge bars; works on both NVIDIA (CUDA) and Apple Silicon (MPS via driver_allocated_memory / recommended_max_memory)
GRPO panels — reward ± std-dev confidence band + KL divergence chart
DPO panels — chosen vs rejected reward + KL divergence chart
Gradient norm & tokens/sec — live stats row, fades in when data arrives
Completed summary banner — final memory and runtime stats on training end
Terminal UI (Plotext) — scripts/terminal_dashboard.py with --once for CLI snapshots; upgrades to 2×2 layout for DPO/GRPO
Demo server — python scripts/demo_server.py --task grpo --hardware mps|nvidia serves rich mock data so you can preview every panel without a GPU

Works on both NVIDIA (via GaslampDashboardCallback(task_type=...)) and Apple Silicon (via MlxGaslampDashboard(task_type=...)).

Demo Builder

After evaluation, the agent can generate a static HTML demo page that showcases base model vs fine-tuned outputs side-by-side — open it in any browser, no server needed. Great for sharing results with teammates, stakeholders, or in a portfolio.

The demo builder is part of the Gaslamp platform's presentation toolkit. We've simplified it for unsloth-buddy with two built-in themes and automatic domain-specific color customization:

Theme	Best for	Look
crisp-light	Business, healthcare, education, general	Clean, minimal, light background
dark-signal	Code, math, security, DevOps	Bold, high-contrast, monospace output

The accent color is auto-selected based on your model's domain (e.g. teal for healthcare, amber for education, electric cyan for code) — or you can pick your own.

Try the live example: demos/qwen2.5-0.5b-chip2-sft/index.html — download and open in any browser.

Google Colab Training

Apple Silicon users who need larger models or CUDA-only features can offload training to a free Colab GPU:

Install colab-mcp in Claude Code:

uv python install 3.13
claude mcp add colab-mcp -- uvx --from git+https://github.com/googlecolab/colab-mcp --python 3.13 colab-mcp

Open a Colab notebook, connect to a T4/L4 GPU runtime
The agent connects, installs Unsloth, starts training in a background thread, and polls metrics every 30s
Download adapters from the Colab file browser when done

Local mlx-tune remains the default — Colab is opt-in for when you need more power.

Gaslamp

unsloth-buddy works standalone or as part of a larger Gaslamp project — an agentic platform that orchestrates the full ML lifecycle from research to training to deployment. When called via Gaslamp, the project directory and state are shared across skills, and results pass automatically to the next phase.

Every project also gets a gaslamp.md roadbook — a reproducibility record that captures every kept decision with its rationale and 📖 learn blocks on the underlying ML concepts. Any agent or person can hand this file to a fresh session and reproduce the project end-to-end, or use it to understand why each choice was made.

gaslamp.dev/unsloth — gaslamp.dev

OpenClaw

unsloth-buddy is an OpenClaw-compatible skill. Share the repo URL with OpenClaw, describe what you want to fine-tune — it reads AGENTS.md, understands the workflow, and runs everything automatically.

1. Share https://github.com/TYH-labs/unsloth-buddy with OpenClaw
2. OpenClaw reads AGENTS.md → understands the 7-phase fine-tuning lifecycle
3. Say: "Fine-tune a model on my customer support data"
4. Done — OpenClaw runs the interview, formats data, trains, evaluates, and exports

For Claude Code, Gemini CLI, Codex, or any ACP-compatible agent: provide AGENTS.md as context and the agent will automatically follow the same workflow.

Changelog

2026-04-04 — Added Demo Builder (Phase 5.5): after evaluation, generates a static HTML demo page showing base vs fine-tuned outputs side-by-side. Two themes (crisp-light, dark-signal) with automatic domain-specific accent colors. No server needed — open the file in any browser. Part of the Gaslamp presentation toolkit, simplified for unsloth-buddy. Interview simplified from 5-point contract to 2-question format (task + data) that also captures user domain/audience for demo theming.
2026-03-22 — Added gaslamp.md reproducibility roadbook: every project now records all kept decisions with rationale and 📖 ML concept explanations (method, model, data, hyperparameters, eval, export), so any agent or person can reproduce the project end-to-end and understand why each choice was made. Template lives at templates/gaslamp_template.md; auto-generated by init_project.py.
2026-03-21 — Enhanced training dashboard: task-aware panels (SFT/DPO/GRPO/Vision), GPU memory breakdown (baseline vs LoRA vs total), GRPO reward ± std and KL divergence charts, DPO chosen/rejected reward and KL charts, epoch tracking, completed-training summary banner, terminal 2×2 layout for DPO/GRPO, and new scripts/demo_server.py mock server for UI development without a GPU.
2026-03-19 — Added terminal training dashboard (scripts/terminal_dashboard.py): live plotext charts of loss and learning rate in the terminal, with --once mode for Claude Code one-shot progress checks.
2026-03-18 — Added Google Colab training support via colab-mcp: free T4/L4/A100 GPU access from Claude Code, background-thread training with live polling, and adapter download workflow.

License

See LICENSE.txt. Unsloth is MIT licensed, mlx-tune is MIT licensed.