visual-skills
Health Pass
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 21 GitHub stars
Code Warn
- Code scan incomplete — No supported source files were scanned during light audit
Permissions Pass
- Permissions — No dangerous permissions requested
This tool provides structured prompt templates for generating images and videos using AI models like Gemini, GPT Image, and various video generation platforms. It is designed for Claude-compatible agents and IDEs.
Security Assessment
The project appears to contain only markdown files and text-based templates rather than executable code, which is why the automated code scan found no supported source files to analyze. It does not request any dangerous permissions or require elevated system access. There are no indications of hardcoded secrets, network requests, or shell command execution. Because it relies on natural language instructions and static text rather than active scripts, the overall risk is Low.
Quality Assessment
The repository is highly active, with its most recent updates pushed today. It uses the permissive MIT license, making it safe for commercial and private use. Community trust is currently low but growing, indicated by 21 GitHub stars. The documentation is comprehensive and professional, detailing clear use cases and explicit instructions for various AI models. However, users should keep in mind that while this template pack is safe, any external APIs it instructs your agent to interact with will still require your own due diligence regarding data handling.
Verdict
Safe to use — it is a straightforward, well-documented collection of static prompt templates with no executable code or dangerous permissions.
Professional Claude Skills for AI image and video prompting. Supports Nano Banana (Gemini 3 Pro/Flash), GPT Image 2, Seedance 2.0, Kling 3.0 (multi-shot + native dialogue), Veo. Works with Claude Code, Cursor, Windsurf, OpenCode, Hermes-agent.
🎨 Visual Skills for Claude — Image & Video Prompting
Two professional Claude Skills for AI visual content production. They write production-grade prompts for the leading image and video models — picking the right model for the task, applying its specific syntax, and returning a copy-paste-ready prompt.
This is what a creative director, copywriter, or AI-content team uses instead of "be cinematic, 4k, masterpiece" filler.
✨ Supported Models
🖼️ Image generation models
| Model | Family | Best for | Notes |
|---|---|---|---|
| Nano Banana 2 (Flash) | Google Gemini 3 Flash Image | Default workhorse, fast & cheap | ~$0.04/image |
| Nano Banana Pro | Google Gemini 3 Pro Image | Complex multi-layered scenes, up to 14 reference images, image grounding (real places/species) | ~$0.15/image |
| GPT Image 2 | OpenAI | Brand assets, dense text, UI mockups, edits with hard preservation, up to 16 references | quality: low / medium / high |
| GPT Image 1.5 / 1 | OpenAI legacy | Migration path only | — |
| GPT Image 1-mini | OpenAI | Cheap exploratory batches | — |
🎬 Video generation models
| Model | Family | Best for | Notes |
|---|---|---|---|
| Seedance 1.0 / 1.5 / 2.0 Pro | ByteDance | Multi-shot in one clip, fast montage drama, 1080p, up to 12s | --resolution / --duration / --camerafixed, @img1 character lock |
| Seedance Lite | ByteDance | Cheaper batch generation, 720p | — |
| Kling 1.6 / 2.1 / 2.5 Turbo / 2.6 Pro | Kuaishou | Character consistency via Element Binding, Motion Brush, Motion Transfer, social verticals | Dedicated negative prompt field |
| Kling 3.0 (pro / standard) | Kuaishou | Native multi-shot up to 6 shots in one generation, native dialogue + lip-sync, voice tone control, 15s continuous output, in-prompt [Character A: ...] labeling | — |
| Veo 3 / Veo (flagship) | Google DeepMind | Native dialogue + lip-sync, synchronized SFX, JSON prompts, commercial polish | Up to 8s |
| Runway Gen-4, Luma Dream Machine, Pika 2, Sora | misc | Generic guidance via universal rules | No dedicated reference yet |
🤝 Compatible With
These are plain Claude Skills — markdown files plus a packaged .skill archive. They work in any agent or IDE that supports the Claude Skill format:
| Tool | How |
|---|---|
| Claude Code | Drop image/ or video/ into ~/.claude/skills/ (or run claude install image.skill) |
| Claude.ai Projects | Upload the source folder to your project's knowledge base |
| Claude Agent SDK | Reference the skill folder in your agent definition |
| Cursor / Windsurf | Copy the source folder into your project rules |
| Cline / Roo Code | Same — drop the folder into the agent's context |
| OpenCode / opencode-ai | Add as a skill in the agent config |
| Hermes-agent | Load via the agent's skill loader |
| Any LLM agent with structured prompt support | Works — content is plain markdown, no platform lock-in |
The skills work with Claude Opus, Sonnet, and Haiku, and degrade gracefully on GPT / Gemini / open-weights agents (the markdown is model-agnostic).
📦 What's in the Repo
visual-skills/
├── image/ # Source folder for the image-prompting skill
├── image.skill # Packaged skill — drop-in installer
├── video/ # Source folder for the video-prompting skill
├── video.skill # Packaged skill — drop-in installer
├── README.md / README.ru.md
└── LICENSE # MIT
🖼️ image — What It Does
Writes prompts for AI image generation. Picks Nano Banana or GPT Image 2 based on the task, applies the model's specific syntax, returns a copy-paste-ready prompt with a header (model, quality, size).
Tasks covered:
- 📰 Editorial photography, posters, ad creatives
- 🛍️ Product shots, packaging, mockups
- 🖥️ UI mockups and product screenshots
- 📊 Infographics, diagrams, slides
- ✏️ Edits — try-on, lighting/weather swap, object removal, restoration, localization
- 👤 Character continuity across multiple images
- 🎞️ Storyboards, comics, sequential narrative
- 📐 Sketch-to-photo, wireframes, 2D-to-3D, floor plans
Model split:
| Decision cue | Use |
|---|---|
| Real place / species (image grounding) | Nano Banana |
| Extreme aspect ratios (1:8, 8:1, 4:1) | Nano Banana |
| Edit with hard preservation (try-on, swap) | GPT Image 2 |
| Small dense text, multi-font, brand assets | GPT Image 2 (quality: high) |
| UI mockup, product screenshot | GPT Image 2 |
| Default fast/cheap | Nano Banana 2 |
Reference files inside image/: models.md, nano-banana.md, gpt-image.md, golden-rules.md, prompt-framework.md, creative-direction.md, text-rendering.md, editing.md, characters.md, slides.md, storyboards.md, structural.md, dimensional.md.
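The model split above can be read as an ordered lookup. The following is a minimal Python sketch of that reading, not the skill's actual logic (the skill itself is markdown, not code). The cue labels such as `grounding` and `ui_mockup` are hypothetical names invented for this example, and first-match precedence is an assumption: the table does not say which row wins when several cues apply.

```python
from typing import Iterable

# (cue, model) pairs copied from the "Model split" table.
# The cue keys are hypothetical labels invented for this sketch.
IMAGE_RULES = [
    ("grounding", "Nano Banana"),               # real place / species
    ("extreme_aspect", "Nano Banana"),          # 1:8, 8:1, 4:1
    ("hard_preservation_edit", "GPT Image 2"),  # try-on, swap
    ("dense_text", "GPT Image 2 (quality: high)"),
    ("ui_mockup", "GPT Image 2"),
]
DEFAULT_IMAGE_MODEL = "Nano Banana 2"  # the "default fast/cheap" row


def pick_image_model(cues: Iterable[str]) -> str:
    """Return the model for the first table row whose cue is present.

    Row order as precedence is an assumption of this sketch.
    """
    present = set(cues)
    for cue, model in IMAGE_RULES:
        if cue in present:
            return model
    return DEFAULT_IMAGE_MODEL
```

Calling `pick_image_model(["ui_mockup"])` returns `"GPT Image 2"`, and an empty cue list falls through to the default.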
🎬 video — What It Does
Writes prompts for AI video generation. Operates as a hybrid Director / Screenwriter / Editor — applies cinematic dramaturgy (scene formula, Murch Rule of Six, blocking, staging) and the model-specific syntax (Seedance multi-shot, Kling Element Binding, Veo JSON / dialogue).
Tasks covered:
- 🎯 Single 5-second clips and stitched multi-clip stories (15s / 30s / 60s+)
- 🎞️ Director treatments and shot lists (14-field shot card)
- 📋 Storyboards from script
- 🔧 Prompt audits ("here's my prompt, fix it")
- 📝 Translating scripts and storylines into shot-by-shot prompts
- 🔗 Continuity across clips (character lock, wardrobe, lighting logic)
- 🎭 Genre patterns: commercial, music video, drama, action, fashion, UGC, product film
Model split:
| Decision cue | Use |
|---|---|
| Multi-shot in one clip, fast montage drama, "Cut to" syntax, no audio needed | Seedance |
| Multi-shot with dialogue + lip-sync, up to 15s, multi-character voice control | Kling 3.0 |
| Character consistency across many social clips (no dialogue), Motion Brush, cheaper | Kling 2.6 Pro |
| Dialogue, lip-sync, synchronized SFX, polished voiceover commercial, JSON prompts | Veo |
Reference files inside video/: dramaturgy.md, universal-rules.md, seedance.md, kling.md, veo.md, role-modes.md, patterns-and-genres.md, camera-lighting-vocabulary.md, fixes-and-skeletons.md.
🚀 Installation
Option A — Install the packaged .skill
Download image.skill and/or video.skill from this repo and load through your Claude client:
# Claude Code
claude install image.skill
claude install video.skill
Option B — Clone the source
git clone https://github.com/smixs/visual-skills.git
Then copy the image/ and/or video/ folders into your skills directory:
# Claude Code
cp -r visual-skills/image ~/.claude/skills/
cp -r visual-skills/video ~/.claude/skills/
# Cursor / Windsurf — copy into your project's rules folder
cp -r visual-skills/image .cursor/rules/
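After copying, a quick check that the folders landed where expected can save a debugging round. This is a small Python sketch under one assumption: each skill folder has a `SKILL.md` at its root. The README mentions a `SKILL.md` per skill but does not show the exact layout, and `missing_skills` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path


def missing_skills(skills_dir: Path, names=("image", "video")) -> list[str]:
    """Return the skill names that have no SKILL.md under skills_dir.

    Assumes SKILL.md sits at the root of each skill folder.
    """
    return [n for n in names if not (skills_dir / n / "SKILL.md").is_file()]


if __name__ == "__main__":
    # Default Claude Code skills directory used in the commands above.
    skills_dir = Path.home() / ".claude" / "skills"
    for name in missing_skills(skills_dir):
        print(f"missing: {skills_dir / name / 'SKILL.md'}")
```

For Cursor or Windsurf, pass the project's rules folder instead of `~/.claude/skills`.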
💡 Usage Examples
Image — quick prompts:
"Make a prompt for a poster of an office mug with the text BEST DAY EVER, background #f5f5dc, 16:9"
"Edit this product shot — change the background to plain white, keep the bottle exactly as is"
Image — model-aware:
"Use GPT Image 2 to mock up a Spotify-like UI for a meditation app, quality high"
"Use Nano Banana Pro — cinematic photograph of the Charles Bridge in Prague at golden hour, must be architecturally accurate"
Video — single prompt:
"Write a Seedance prompt: a hungry guy finds the last sausage in the fridge at night, 5 seconds, multi-shot"
Video — full breakdown:
"Storyboard a 30-second video about guilt. Core emotion: guilt. Anchor object: a phone with an unread message."
"Audit this prompt: [...]. What's broken, how to fix?"
"Translate this script into 6 × 5-second Seedance prompts."
How It Works (short)
Each SKILL.md is a thin router. The body says "before producing any prompt, load these reference files in this exact order". The actual rules — model-specific syntax, dramaturgy, the Details Law, banned phrases that hurt the model — live only in references/. This forces the agent into the references and prevents lazy generic output.
For video specifically, every shot must own three concrete details: environmental pressure (cold refrigerator light, wet asphalt, flickering fluorescent), physical micro-action (jaw locks, knuckles whiten), and a sound or visual motif. Words like "cinematic", "epic", "stunning", "masterpiece" are banned — they don't render.
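The two rules above (three concrete details per shot, no filler words) are easy to check mechanically. Below is a minimal Python sketch of such a check. It is an illustration, not something the repo ships: the field names `environmental_pressure`, `micro_action`, and `motif` are hypothetical, and `BANNED` holds only the four words quoted above, while the skill's full list lives in its reference files.

```python
import re

# Only the four words quoted in the text; the full banned list is assumed
# to be longer and is not reproduced here.
BANNED = ("cinematic", "epic", "stunning", "masterpiece")

# Hypothetical field names for the three details every shot must own.
REQUIRED_DETAILS = ("environmental_pressure", "micro_action", "motif")


def banned_words(prompt: str) -> list[str]:
    """Return banned words found in the prompt as whole-word matches."""
    lowered = prompt.lower()
    return [w for w in BANNED if re.search(rf"\b{w}\b", lowered)]


def missing_details(shot: dict) -> list[str]:
    """Return the required detail fields that are absent or blank."""
    return [k for k in REQUIRED_DETAILS if not (shot.get(k) or "").strip()]
```

A shot described as `{"environmental_pressure": "cold refrigerator light", "micro_action": "jaw locks"}` would be flagged for a missing `motif`, and a prompt containing "epic" would be flagged while "epicenter" would not.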
Credits & Sources
- Nano Banana — Google Gemini 3 Pro Image / Flash Image, prompting via fal.ai and Google AI Studio guides.
- GPT Image 2 — OpenAI, via OpenAI's developers cookbook and fal.ai's GPT Image 2 prompting guide.
- Seedance — ByteDance Seed, official Seedance 2.0 docs.
- Kling — Kuaishou, official Kling docs.
- Veo — Google DeepMind, official Veo docs.
- Video dramaturgy — Walter Murch (In the Blink of an Eye, Rule of Six), Akira Kurosawa (environment as character), David Fincher (motivated camera), Steven Spielberg (spatial clarity), Jonathan Glazer (one-sentence music video), Bong Joon Ho (storyboarding after locations).
License
MIT — fork it, adapt it, ship better visual content.
Tags: claude · claude-skills · claude-code · claude-agent-sdk · prompt-engineering · ai-image-generation · ai-video-generation · nano-banana · gpt-image · gpt-image-2 · seedance · kling · veo · creative-director · cursor · windsurf · cline · opencode · hermes-agent