ai-media-generator

零門檻做出專業 AI 影片 / 圖片 / 音樂 — 因為 Claude 幫你套上「資深導演級」提示詞。

Zero-skill cinema. Senior-director prompts on autopilot.

說「做個古代將軍騎馬衝鋒的電影感短片」就好。Claude 會：

挑對平台 —— Seedance 寫實武打？Veo 3.1 要原生音效？Sora 2 要 bodycam 風格？這個 skill 知道
寫對提示詞 —— 不是 cinematic, 8k, beautiful 這種沒用的詞，而是 Deakins、Lubezki、Kodak Vision3 500T、teal-orange grade、Constraints tail（不抖動、不變形…）這類「平台真正吃」的 token
直接操作網站 —— 透過瀏覽器 MCP 把提示詞送上 OiiOii / Flow / Kling / Suno，按完所有按鈕，把成品 fetch 回來

涵蓋 14+ 平台：Midjourney v7、Flux 1.1 Pro、Sora 2、Veo 3.1、Kling 3.0、Seedance 2.0 pro、Suno v5、Runway Gen-4、Ideogram 3、Seedream、Nano Banana Pro、Vidu Q3、Stable Diffusion、OiiOii…

A Claude Code Skill for zero-skill, senior-quality AI media generation across 14+ platforms.

You say "make a cinematic shot of an ancient general charging on horseback." Claude handles the rest — picking the right platform, writing the platform-specific signature prompt (no more beautiful, masterpiece, detailed filler — actual director / DP / film-stock / lens / Constraints-tail vocabulary, calibrated to what the target model actually parses), and driving the browser to submit it.

為什麼需要這個 skill？

因為「提示詞會寫」是門高門檻 — 而且每個平台寫法都不同。

一般人寫的 prompt：

cinematic shot of an ancient general on a horse, 8k, beautiful, masterpiece, detailed

→ 平台吃不到方向，出來的成品永遠像「AI 圖庫」。

這個 skill 寫的 prompt（Seedance 中文戰鬥範例）：

古代中國將軍 Aria 主角，身披金色鱗甲、紅披風在風中翻飛，手持青銅寶劍前舉，
騎黑色戰馬在黎明戰場上緩慢衝鋒。戰場遠處千軍列陣、紅色戰旗獵獵、塵霧漂浮。
金色逆光晨霧、熒光輪廓。低角度手持 tracking 鏡頭沿馬側跟拍，輕微晃動、穩定地平線。
墨色留白東方美學、史詩寫實電影感。慢動作、流暢、連貫、不僵硬、720p 高清。
不抖動、不變形、不多肢、穩定地平線、穩定時間一致性。

→ 平台懂的每個細節都到位：8 維公式（Subject+Action+Scene+Light+Camera+Style+Quality+Constraints），符合 Seedance 中文訓練、單動詞、Constraints tail。

差別不在你聰不聰明 — 在你知不知道每個平台吃什麼 token。 這個 skill 就是把那份「該怎麼寫」的內部知識，從跨平台研究（X / Threads / Reddit / 小紅書 / Bilibili + 官方 cookbook）+ 兩輪 head-to-head benchmark 抓出來，塞給 Claude 用。

Why this exists (English)

Generative-AI prompts are not portable. The same idea sent to Flux vs Midjourney vs Seedance produces wildly different quality, because each model was trained on different captioning conventions:

Midjourney v7 loves comma-chunked detail + --style raw --stylize 750
Flux strips out director names but rewards 80-200 word natural paragraphs with technical photography vocabulary
Seedance 2.0 uses bracketed labels ([Style] [Scene] [Character] [Shot 1: 00:00-00:05]), eats lens focal lengths + format anchors (Sony A7S3, IMAX Fantasy Camera, Unreal Engine 5), supports native 4-modal audio (Sound design:) and bilingual lip-synced dialogue. Chinese preferred for 武打 / 古裝 / romance / MV.
Sora 2 wants "format anchors" (bodycam footage, surveillance) and quoted dialogue
Veo 3.1 is the only model where SFX / Ambient tags actually generate audio

This skill captures those signatures — both from official cookbooks (OpenAI, Google Cloud, fal.ai, BFL) and from real-world community posts on X / Threads / Reddit / 小紅書 / Bilibili — into a single reference Claude can load on demand.

What's inside

ai-media-generator/
├── SKILL.md                      # Top-level skill (auto-pilot + hard rules)
├── references/                   # Platform-specific prompt guides (14 platforms)
│   ├── community-prompt-patterns.md   # ⭐ cross-platform meta + per-model signatures
│   ├── cinematic-direction.md         # advanced director / DP / film-stock vocabulary
│   ├── commercial-direction.md
│   ├── vfx-effects.md
│   ├── sound-design.md
│   ├── editing-transitions.md
│   ├── camera-language.md
│   ├── selector.md                    # which platform for which use case
│   ├── kling.md / seedance.md / sora.md / veo.md / vidu.md / runway.md
│   ├── midjourney.md / flux.md / ideogram.md / seedream.md / stable-diffusion.md
│   ├── nano-banana.md / suno.md / oiioii.md
├── templates/
│   ├── auto-pilot.md             # one-line-to-output pipeline
│   ├── preset-packs.md           # 30+ ready-made preset prompts
│   ├── storyboard.md
│   ├── music-video.md
│   ├── negative-bank.md
│   ├── user-flags.md             # natural-language flag translator
│   └── token-efficient-mode.md   # lazy-load / grep / subagent strategy
└── automation/                   # browser automation protocols
    ├── browser-guide.md
    ├── click-protocol.md         # reliable-click SOP + token optimization
    └── site-profiles/            # deep UI maps for verified platforms
        ├── oiioii.md                  # ✅ Phase 1-3E + Seedance §12.9 deep playbook
        ├── flow.md                    # ✅ Veo 3.1 complete
        ├── kling.md                   # ✅ Kling 3.0 complete
        ├── suno.md                    # ✅ Suno v5 complete
        └── (stubs for midjourney/seedream/runway/sora/vidu/ideogram)

Key concepts

🔴 The Meta Rule (priority order)

Writing the prompt correctly once ≫ submitting it fast 10 times.

A wrong prompt costs ~10 minutes of regeneration + token waste. A slow submit costs ~5 minutes. So the speed-optimization priority is:

Look up the platform signature → references/community-prompt-patterns.md
Optimize single submission → automation/site-profiles/<platform>.md
Wait without polling → Bash run_in_background:true + sleep N

🔴 The Hard Rule

Every prompt must embed 5-8 high-signal tokens from the appropriate vocabulary layer (director / DP / lens / film-stock / lighting / grading / composition / VFX). Generic words (cinematic, 8k, beautiful, masterpiece) are banned — they dilute signal.

Platform-aware exception: Director names work on Midjourney / Sora 2 / Veo but get stripped by Flux / Nano Banana Pro. For Seedance / Wan, individual DP names are weak signal, but art-movement / brand-style names (Pixar / Ghibli / 90s anime / Fast and Furious / Bloom & Wild) do work. The skill bakes this nuance into the selection logic.

🤖 Auto-Pilot mode

When a user says "make me a 15-second cinematic ad of X" the skill:

Parses the request into 9 slots (medium / duration / aspect / subject / style / character / scene / audio / language)
Fills defaults
Selects the right platform via selector.md
Auto-generates the storyboard + prompt
Drives the browser via site-profiles/<platform>.md
Reports back

Installation

Drop the folder into your Claude Code skills directory:

# Per-user
git clone https://github.com/<your-org>/ai-media-generator.git ~/.claude/skills/ai-media-generator

# Or project-level
git clone https://github.com/<your-org>/ai-media-generator.git ./.claude/skills/ai-media-generator

Claude Code auto-discovers SKILL.md files inside skills directories.

Requirements

Claude Code (CLI) or any client that supports the Skills format.
For browser automation: the claude-in-chrome MCP server (extension) connected.
For specific platforms you want to drive: a logged-in account on that platform in the same browser session.

Verified platforms (browser automation)

Platform	Site profile status	Notes
OiiOii.ai	✅ Verified	Phase 1-3E + Seedance 2.0 pro deep playbook
Google Flow (Veo 3.1)	✅ Verified	Fast / Quality / Lite modes mapped
Kling 3.0	✅ Verified	+ Fast-Track + Native Audio
Suno v5	✅ Verified	5-song chain SOP
Midjourney v7	📝 Stub	Reference only
Seedream / Runway / Sora 2 / Vidu / Ideogram	📝 Stub	Prompt guides complete; site profiles pending

Prompt-craft reference files exist for all 14+ platforms regardless of automation status.

Provenance

This skill was built iteratively against real-world tasks (storyboarding, music-video production, cinematic shorts) with two rounds of head-to-head benchmarking. Iteration-1 measured "with-skill vs no-skill" baseline (95% vs 47%). Iteration-2 specifically measured "senior-director thinking" (vocabulary depth) — with-skill won 3/3 head-to-head subjective evaluations.

The community-prompt-patterns.md file consolidates research across:

Official cookbooks: OpenAI Sora 2, Google Cloud Veo 3.1, Runway Gen-4 / Aleph, fal.ai (Kling, Wan, Seedream), BFL (Flux Kontext), Ideogram, Midjourney v7 docs, Google Nano Banana
Community posts: X / Threads / Reddit / 小紅書 / Bilibili
Practitioner write-ups: Atlas Cloud Seedance Library, awesome-seedance-2-prompts (GitHub), 知乎 / 腾讯新聞 / CSDN 即夢 Seedance 手册

License

MIT — see LICENSE.

Contributing

This is an opinionated, evidence-driven skill. Contributions welcome — especially:

New site-profiles/<platform>.md for platforms that don't have one yet
Updates when platforms ship new versions (model signatures change quickly)
Better presets in templates/preset-packs.md

See CHANGELOG.md for version history.

Acknowledgments

Inspired by users who got tired of "cinematic, 8k, masterpiece" prompts and wanted Claude to write like a senior director instead.