visual-skills

Security Audit: Warn
Health: Pass
  • License: MIT
  • Description: repository has a description
  • Active repo: last push 0 days ago
  • Community trust: 21 GitHub stars
Code: Warn
  • Code scan incomplete: no supported source files were scanned during the light audit
Permissions: Pass
  • Permissions: no dangerous permissions requested
Purpose
This tool provides structured prompt templates for generating images and videos using AI models like Gemini, GPT Image, and various video generation platforms. It is designed for Claude-compatible agents and IDEs.

Security Assessment
The project appears to contain only markdown files and text-based templates rather than executable code, which is why the automated code scan found no supported source files to analyze. It does not request any dangerous permissions or require elevated system access. There are no indications of hardcoded secrets, network requests, or shell command execution. Because it relies on natural language instructions and static text rather than active scripts, the overall risk is Low.

Quality Assessment
The repository is actively maintained: the most recent push was today. It uses the permissive MIT license, which allows commercial and private use. Community adoption is still modest, at 21 GitHub stars. The documentation is comprehensive and professional, with clear use cases and explicit instructions for each supported model. Keep in mind that while the template pack itself is safe, any external image or video generation service you send the generated prompts to still requires your own due diligence regarding data handling.

Verdict
Safe to use — it is a straightforward, well-documented collection of static prompt templates with no executable code or dangerous permissions.
SUMMARY

Professional Claude Skills for AI image and video prompting. Supports Nano Banana (Gemini 3 Pro/Flash), GPT Image 2, Seedance 2.0, Kling 3.0 (multi-shot + native dialogue), Veo. Works with Claude Code, Cursor, Windsurf, OpenCode, Hermes-agent.

README.md

🎨 Visual Skills for Claude — Image & Video Prompting

Claude Skill · License: MIT · image: Nano Banana + GPT Image 2 · video: Seedance + Kling + Veo

🇷🇺 Read in Russian

Two professional Claude Skills for AI visual content production. They write production-grade prompts for the leading image and video models — picking the right model for the task, applying its specific syntax, and returning a copy-paste-ready prompt.

This is what a creative director, copywriter, or AI-content team uses instead of "be cinematic, 4k, masterpiece" filler.


✨ Supported Models

🖼️ Image generation models

| Model | Family | Best for | Notes |
|---|---|---|---|
| Nano Banana 2 (Flash) | Google Gemini 3 Flash Image | Default workhorse, fast & cheap | ~$0.04/image |
| Nano Banana Pro | Google Gemini 3 Pro Image | Complex multi-layered scenes, up to 14 reference images, image grounding (real places/species) | ~$0.15/image |
| GPT Image 2 | OpenAI | Brand assets, dense text, UI mockups, edits with hard preservation, up to 16 references | quality: low / medium / high |
| GPT Image 1.5 / 1 | OpenAI legacy | Migration path only | |
| GPT Image 1-mini | OpenAI | Cheap exploratory batches | |
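
The two per-image figures in the table make it easy to budget an exploratory batch before committing to a tier. The sketch below is a rough estimator, assuming the approximate prices listed above (~$0.04 and ~$0.15 per image) and linear pricing with no resolution or provider surcharges; `APPROX_PRICE_USD` and `estimate_batch_cost` are hypothetical names, not part of the skill.

```python
# Approximate per-image prices copied from the table above.
# Assumption: real pricing varies by resolution, provider, and date.
APPROX_PRICE_USD = {
    "Nano Banana 2 (Flash)": 0.04,
    "Nano Banana Pro": 0.15,
}


def estimate_batch_cost(model: str, images: int) -> float:
    """Return a rough USD cost for `images` generations, assuming linear pricing."""
    if images < 0:
        raise ValueError("images must be non-negative")
    try:
        price = APPROX_PRICE_USD[model]
    except KeyError:
        # Only the two models with a listed price can be estimated.
        raise KeyError(f"no approximate price listed for {model!r}") from None
    return round(price * images, 2)
```

For example, 50 drafts come to about $2.00 on Flash and about $7.50 on Pro, which is one reason the skill treats Nano Banana 2 as the default workhorse.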

🎬 Video generation models

| Model | Family | Best for | Notes |
|---|---|---|---|
| Seedance 1.0 / 1.5 / 2.0 Pro | ByteDance | Multi-shot in one clip, fast montage drama, 1080p, up to 12s | --resolution / --duration / --camerafixed, @img1 character lock |
| Seedance Lite | ByteDance | Cheaper batch generation, 720p | |
| Kling 1.6 / 2.1 / 2.5 Turbo / 2.6 Pro | Kuaishou | Character consistency via Element Binding, Motion Brush, Motion Transfer, social verticals | Dedicated negative prompt field |
| Kling 3.0 (pro / standard) | Kuaishou | Native multi-shot up to 6 shots in one generation, native dialogue + lip-sync, voice tone control, 15s continuous output | In-prompt [Character A: ...] labeling |
| Veo 3 / Veo (flagship) | Google | Native dialogue + lip-sync, synchronized SFX, JSON prompts, commercial polish | Up to 8s |
| Runway Gen-4, Luma Dream Machine, Pika 2, Sora | misc | Generic guidance via universal rules | No dedicated reference yet |
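
The Seedance row lists inline parameters (--resolution, --duration, --camerafixed) that travel with the prompt text. The sketch below assembles such a suffix, assuming a `--name value` form with lowercase `true`/`false` for --camerafixed and the 12-second ceiling from the table; `seedance_prompt` is a hypothetical helper, so check the official Seedance docs for the exact accepted values before relying on it.

```python
# Ceiling from the Seedance row of the table above ("up to 12s").
SEEDANCE_MAX_SECONDS = 12


def seedance_prompt(text: str, resolution: str = "1080p", duration: int = 5,
                    camera_fixed: bool = False) -> str:
    """Append Seedance-style inline parameters to a prompt (format assumed)."""
    if not 0 < duration <= SEEDANCE_MAX_SECONDS:
        raise ValueError(f"duration must be 1-{SEEDANCE_MAX_SECONDS} seconds")
    params = [
        f"--resolution {resolution}",
        f"--duration {duration}",
        # Assumed boolean spelling: lowercase true/false.
        f"--camerafixed {str(camera_fixed).lower()}",
    ]
    return f"{text.strip()} {' '.join(params)}"
```

Keeping the parameters in one helper means duration limits are checked once, instead of being retyped by hand in every shot prompt.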

🤝 Compatible With

These are plain Claude Skills — markdown files plus a packaged .skill archive. They work in any agent or IDE that supports the Claude Skill format:

| Tool | How |
|---|---|
| Claude Code | Drop image/ or video/ into ~/.claude/skills/ (or run claude install image.skill) |
| Claude.ai Projects | Upload the source folder to your project's knowledge base |
| Claude Agent SDK | Reference the skill folder in your agent definition |
| Cursor / Windsurf | Copy the source folder into your project rules |
| Cline / Roo Code | Same: drop the folder into the agent's context |
| OpenCode / opencode-ai | Add as a skill in the agent config |
| Hermes-agent | Load via the agent's skill loader |
| Any LLM agent with structured prompt support | Works; content is plain markdown, no platform lock-in |

The skills work with Claude Opus, Sonnet, Haiku, and degrade gracefully on GPT / Gemini / open-weights agents (the markdown is model-agnostic).


📦 What's in the Repo

visual-skills/
├── image/              # Source folder for the image-prompting skill
├── image.skill         # Packaged skill — drop-in installer
├── video/              # Source folder for the video-prompting skill
├── video.skill         # Packaged skill — drop-in installer
├── README.md / README.ru.md
└── LICENSE             # MIT

🖼️ image — What It Does

Writes prompts for AI image generation. Picks Nano Banana or GPT Image 2 based on the task, applies the model's specific syntax, returns a copy-paste-ready prompt with a header (model, quality, size).
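
The returned header (model, quality, size) can be pictured as a small record rendered above the prompt body. The sketch below assumes a hypothetical one-line `Model | Quality | Size` layout; the skill's actual header wording is defined in its reference files, and `ImagePrompt` is an illustrative name. Quality is optional because only GPT Image 2 exposes the low / medium / high setting listed in the model table.

```python
from dataclasses import dataclass
from typing import Optional

# Quality levels listed for GPT Image 2 in the model table.
GPT_IMAGE_QUALITIES = {"low", "medium", "high"}


@dataclass
class ImagePrompt:
    model: str
    size: str                       # aspect ratio or pixel size, e.g. "16:9"
    body: str
    quality: Optional[str] = None   # only meaningful for GPT Image 2

    def render(self) -> str:
        """Return the header line, a blank line, then the prompt body."""
        if self.quality is not None and self.quality not in GPT_IMAGE_QUALITIES:
            raise ValueError(f"unknown quality {self.quality!r}")
        header = [f"Model: {self.model}"]
        if self.quality:
            header.append(f"Quality: {self.quality}")
        header.append(f"Size: {self.size}")
        return " | ".join(header) + "\n\n" + self.body.strip()
```

Putting the settings in a header rather than inside the prose keeps the body copy-paste-ready while still telling the user which generator and settings it was written for.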

Tasks covered:

  • 📰 Editorial photography, posters, ad creatives
  • 🛍️ Product shots, packaging, mockups
  • 🖥️ UI mockups and product screenshots
  • 📊 Infographics, diagrams, slides
  • ✏️ Edits — try-on, lighting/weather swap, object removal, restoration, localization
  • 👤 Character continuity across multiple images
  • 🎞️ Storyboards, comics, sequential narrative
  • 📐 Sketch-to-photo, wireframes, 2D-to-3D, floor plans

Model split:

| Decision cue | Use |
|---|---|
| Real place / species (image grounding) | Nano Banana |
| Extreme aspect ratios (1:8, 8:1, 4:1) | Nano Banana |
| Edit with hard preservation (try-on, swap) | GPT Image 2 |
| Small dense text, multi-font, brand assets | GPT Image 2 (quality: high) |
| UI mockup, product screenshot | GPT Image 2 |
| Default fast/cheap | Nano Banana 2 |
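
The decision cues above read naturally as an ordered rule list. The sketch below encodes them, assuming that cues are checked in table order and the first match wins (the table itself does not say how to resolve a task that hits two cues); `ImageTask` and `pick_image_model` are hypothetical names, since the skill expresses this logic in markdown rather than code.

```python
from dataclasses import dataclass


@dataclass
class ImageTask:
    real_place_or_species: bool = False  # needs image grounding
    extreme_aspect_ratio: bool = False   # e.g. 1:8, 8:1, 4:1
    preservation_edit: bool = False      # try-on or swap that must keep the subject exactly
    dense_text: bool = False             # small dense text, multi-font, brand assets
    ui_mockup: bool = False              # UI mockup or product screenshot


def pick_image_model(task: ImageTask) -> str:
    """Check the cues in table order; the first one that matches decides."""
    if task.real_place_or_species or task.extreme_aspect_ratio:
        return "Nano Banana"
    if task.preservation_edit:
        return "GPT Image 2"
    if task.dense_text:
        return "GPT Image 2 (quality: high)"
    if task.ui_mockup:
        return "GPT Image 2"
    return "Nano Banana 2"  # default fast/cheap workhorse
```

Under first-match ordering, a UI mockup full of small text is routed to GPT Image 2 at high quality, because the dense-text cue sits above the UI cue in the table.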

Reference files inside image/: models.md, nano-banana.md, gpt-image.md, golden-rules.md, prompt-framework.md, creative-direction.md, text-rendering.md, editing.md, characters.md, slides.md, storyboards.md, structural.md, dimensional.md.


🎬 video — What It Does

Writes prompts for AI video generation. Operates as a hybrid Director / Screenwriter / Editor — applies cinematic dramaturgy (scene formula, Murch Rule of Six, blocking, staging) and the model-specific syntax (Seedance multi-shot, Kling Element Binding, Veo JSON / dialogue).

Tasks covered:

  • 🎯 Single 5-second clips and stitched multi-clip stories (15s / 30s / 60s+)
  • 🎞️ Director treatments and shot lists (14-field shot card)
  • 📋 Storyboards from script
  • 🔧 Prompt audits ("here's my prompt, fix it")
  • 📝 Translating scripts and storylines into shot-by-shot prompts
  • 🔗 Continuity across clips (character lock, wardrobe, lighting logic)
  • 🎭 Genre patterns: commercial, music video, drama, action, fashion, UGC, product film
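
Stitched stories are built from several generations, so planning starts with simple arithmetic: the number of clips is the total length divided by the clip length, rounded up. The sketch below splits a story into consecutive time windows, assuming fixed-length clips and the per-generation ceilings from the video model table (12s Seedance, 15s Kling 3.0, 8s Veo); `plan_clips` is a hypothetical helper.

```python
import math

# Per-generation ceilings taken from the video model table above.
MAX_CLIP_SECONDS = {"Seedance": 12, "Kling 3.0": 15, "Veo": 8}


def plan_clips(total_seconds: int, clip_seconds: int, model: str) -> list[tuple[int, int]]:
    """Split a story into consecutive (start, end) windows of at most clip_seconds."""
    limit = MAX_CLIP_SECONDS[model]
    if not 0 < clip_seconds <= limit:
        raise ValueError(f"{model} clips must be between 1 and {limit} seconds")
    count = math.ceil(total_seconds / clip_seconds)
    # The last window is trimmed so the plan never runs past the story length.
    return [(i * clip_seconds, min((i + 1) * clip_seconds, total_seconds))
            for i in range(count)]
```

A 30-second story cut into 5-second Seedance clips yields six windows, while the same story on Veo at 8 seconds per clip needs four, the last one only 6 seconds long.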

Model split:

| Decision cue | Use |
|---|---|
| Multi-shot in one clip, fast montage drama, "Cut to" syntax, no audio needed | Seedance |
| Multi-shot with dialogue + lip-sync, up to 15s, multi-character voice control | Kling 3.0 |
| Character consistency across many social clips (no dialogue), Motion Brush, cheaper | Kling 2.6 Pro |
| Dialogue, lip-sync, synchronized SFX, polished voiceover commercial, JSON prompts | Veo |
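
The video cues can be sketched the same way. This version assumes three task flags and a fixed precedence: multi-shot plus audio goes to Kling 3.0, multi-shot without audio to Seedance, remaining audio work (read here as single-shot) to Veo, and silent recurring-character series to Kling 2.6 Pro. That precedence is an interpretation of the table, and `VideoTask` / `pick_video_model` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VideoTask:
    multi_shot: bool = False           # several shots inside one generation
    needs_audio: bool = False          # dialogue, lip-sync, synchronized SFX or voiceover
    recurring_character: bool = False  # same character across many social clips


def pick_video_model(task: VideoTask) -> str:
    """Map the decision cues in the table above to a model name."""
    if task.multi_shot and task.needs_audio:
        return "Kling 3.0"      # multi-shot with dialogue + lip-sync, up to 15s
    if task.multi_shot:
        return "Seedance"       # "Cut to" montage, no audio needed
    if task.needs_audio:
        return "Veo"            # dialogue, SFX, polished voiceover
    if task.recurring_character:
        return "Kling 2.6 Pro"  # consistency across silent social clips, cheaper
    # The table has no catch-all row, so refuse to guess.
    raise ValueError("no decision cue matched; pick a model manually")
```

Raising instead of returning a default is a deliberate choice here: unlike the image table, the video table names no fallback model.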

Reference files inside video/: dramaturgy.md, universal-rules.md, seedance.md, kling.md, veo.md, role-modes.md, patterns-and-genres.md, camera-lighting-vocabulary.md, fixes-and-skeletons.md.


🚀 Installation

Option A — Install the packaged .skill

Download image.skill and/or video.skill from this repo and load them through your Claude client:

# Claude Code
claude install image.skill
claude install video.skill

Option B — Clone the source

git clone https://github.com/smixs/visual-skills.git

Then copy the image/ and/or video/ folders into your skills directory:

# Claude Code
cp -r visual-skills/image  ~/.claude/skills/
cp -r visual-skills/video  ~/.claude/skills/

# Cursor / Windsurf — copy into your project's rules folder
cp -r visual-skills/image  .cursor/rules/
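
Where cp -r is unavailable (for example in a stock Windows shell), the same copy can be done with Python's standard library. The sketch below assumes the ~/.claude/skills/ location used above and treats a folder containing SKILL.md as a valid skill; `install_skill` is a hypothetical helper, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def install_skill(repo_dir: Path, skill: str, skills_dir: Path) -> Path:
    """Copy repo_dir/skill into skills_dir/skill, mirroring `cp -r`."""
    source = repo_dir / skill
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"{source} does not look like a skill folder (no SKILL.md)")
    skills_dir.mkdir(parents=True, exist_ok=True)
    target = skills_dir / skill
    # dirs_exist_ok=True overwrites files from an older copy on re-run;
    # files that were deleted upstream are NOT removed from the target.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example (run from the directory containing the clone):
# install_skill(Path("visual-skills"), "image", Path.home() / ".claude" / "skills")
```

Checking for SKILL.md first avoids silently copying the wrong folder, such as the repository root, into the skills directory.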

💡 Usage Examples

Image — quick prompts:

"Make a prompt for a poster of an office mug with the text BEST DAY EVER, background #f5f5dc, 16:9"

"Edit this product shot — change the background to plain white, keep the bottle exactly as is"

Image — model-aware:

"Use GPT Image 2 to mock up a Spotify-like UI for a meditation app, quality high"

"Use Nano Banana Pro — cinematic photograph of the Charles Bridge in Prague at golden hour, must be architecturally accurate"

Video — single prompt:

"Write a Seedance prompt: a hungry guy finds the last sausage in the fridge at night, 5 seconds, multi-shot"

Video — full breakdown:

"Storyboard a 30-second spot about guilt. Core emotion: guilt. Anchor object: a phone with an unread message."

"Audit this prompt: [...]. What's broken, and how do I fix it?"

"Translate this script into 6 × 5-second Seedance prompts."


How It Works (short)

Each SKILL.md is a thin router. Its body says "before producing any prompt, load these reference files in this exact order". The actual rules (model-specific syntax, dramaturgy, the Details Law, banned phrases that hurt output quality) live only in references/. This forces the agent to read the references and discourages lazy, generic output.
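
In the skills themselves the agent does this loading by reading files; no code is involved. For anyone wiring the references into a custom agent, the router idea can be sketched as an ordered, fail-loud concatenation. The load order below is hypothetical (the real order is whatever each SKILL.md specifies), and `load_references` is an illustrative name.

```python
from pathlib import Path

# Hypothetical order for illustration; the real order lives in SKILL.md.
VIDEO_LOAD_ORDER = ["universal-rules.md", "dramaturgy.md", "seedance.md"]


def load_references(skill_dir: Path, order: list[str]) -> str:
    """Concatenate reference files in the given order, failing on any missing file."""
    ref_dir = skill_dir / "references"
    missing = [name for name in order if not (ref_dir / name).is_file()]
    if missing:
        # Refusing to continue mirrors the "load these first" rule:
        # no prompt should be written from a partial rule set.
        raise FileNotFoundError(f"missing references: {', '.join(missing)}")
    parts = [f"## {name}\n{(ref_dir / name).read_text(encoding='utf-8')}" for name in order]
    return "\n\n".join(parts)
```

Keeping the order explicit matters because later files (model syntax) refine rules stated in earlier ones (universal rules, dramaturgy).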

For video specifically, every shot must own three concrete details: environmental pressure (cold refrigerator light, wet asphalt, flickering fluorescent), physical micro-action (jaw locks, knuckles whiten), and a sound or visual motif. Words like "cinematic", "epic", "stunning", "masterpiece" are banned — they don't render.
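
The two video rules (three concrete details per shot, no empty hype words) can be checked mechanically before a prompt is sent. The sketch below is a hypothetical linter, assuming a shot is stored as separate fields and that only the four words named above are banned; the skill enforces these rules through its reference files, not through code.

```python
import re
from dataclasses import dataclass

BANNED = ("cinematic", "epic", "stunning", "masterpiece")
# Whole-word, case-insensitive match so "Epic" is caught but "epically" is not.
_BANNED_RE = re.compile(r"\b(" + "|".join(BANNED) + r")\b", re.IGNORECASE)


@dataclass
class Shot:
    action: str
    environmental_pressure: str = ""  # e.g. "cold refrigerator light"
    micro_action: str = ""            # e.g. "jaw locks"
    motif: str = ""                   # a recurring sound or visual motif


def lint_shot(shot: Shot) -> list[str]:
    """Return a list of problems; an empty list means the shot passes."""
    problems = []
    for field in ("environmental_pressure", "micro_action", "motif"):
        if not getattr(shot, field).strip():
            problems.append(f"missing {field}")
    text = " ".join([shot.action, shot.environmental_pressure, shot.micro_action, shot.motif])
    for word in sorted({m.group(1).lower() for m in _BANNED_RE.finditer(text)}):
        problems.append(f"banned word: {word}")
    return problems
```

A shot such as "Epic cinematic shot of a man" fails on all three missing details and on both banned words, which is exactly the kind of filler the skill is meant to replace.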


Credits & Sources

  • Nano Banana — Google Gemini 3 Pro Image / Flash Image, prompting via fal.ai and Google AI Studio guides.
  • GPT Image 2 — OpenAI, via the OpenAI Cookbook and fal.ai's GPT Image 2 prompting guide.
  • Seedance — ByteDance Seed, official Seedance 2.0 docs.
  • Kling — Kuaishou, official Kling docs.
  • Veo — Google DeepMind, official Veo docs.
  • Video dramaturgy — Walter Murch (In the Blink of an Eye, Rule of Six), Akira Kurosawa (environment as character), David Fincher (motivated camera), Steven Spielberg (spatial clarity), Jonathan Glazer (one-sentence music video), Bong Joon Ho (storyboarding after locations).

License

MIT — fork it, adapt it, ship better visual content.


Tags: claude · claude-skills · claude-code · claude-agent-sdk · prompt-engineering · ai-image-generation · ai-video-generation · nano-banana · gpt-image · gpt-image-2 · seedance · kling · veo · creative-director · cursor · windsurf · cline · opencode · hermes-agent
