youtube-shorts-generator
License: MIT
Automated YouTube Shorts pipeline - discovers strange historical events, generates scripts with Claude AI, creates images, records voiceover, assembles videos, and uploads to YouTube daily.
Unreal History Bot
An autonomous YouTube Shorts channel about strange real history. Run it on a daily schedule and the channel runs itself — topics, scripts, images, voiceover, subtitles, upload.
📺 See it in action: @ThatActuallyHappened11 on YouTube — the live channel this repo runs.
You set it up once. From then on, one command per day publishes one short. Topics are picked from a Claude-generated queue scored for virality, scripts use proven hook formulas, images come from FLUX, the voice is Kokoro TTS, and YouTube performance data feeds back into the next batch of topics so the channel learns what works.
Highlights
- Virality-scored topic queue — Claude rates 25 topic ideas 1–10; only ≥7 ship, sorted best-first.
- Content-safety pre-check — every script is evaluated against YouTube's demotion rules before any image-generation spend; failed scripts halt the pipeline and are marked failed in the queue.
- Scene-aware image prompts — role + preset system, plus per-beat scene_visuals so the 5 images describe distinct moments (wide → close-up → portrait → action → aftermath) instead of 5 near-identical takes on one theme.
- Manual image A/B mode — --manual-images pauses the pipeline after scene planning so you can drop in images from Midjourney / DALL-E / Sora / etc. and resume; lets you A/B test external generators against the FLUX baseline. See manual-usage.md.
- Multi-language metadata — title + description auto-translated to Spanish / Portuguese / Hindi / Indonesian via Haiku 4.5 and attached to the upload as YouTube localizations.
- Real caption tracks — SRT uploaded via captions.insert, so the CC button in the YouTube player exposes a selectable English track (not just burned-in pixels).
- Whisper-timed subtitles — burned-in 3-word cards synced to the audio, with a CTA overlay in the last 3 seconds.
- Analytics feedback loop — --analytics summarises which keywords and hook types perform best; those signals are injected into the next --refresh-topics prompt.
- Per-run cost + timing tracker — output/<slug>/cost.json + a chronological cost_ledger.txt with running totals.
- Fully resumable — every step caches its output. Re-run after any failure and the pipeline picks up where it stopped.
Table of Contents
- Quick Start
- How It Works
- Requirements
- Setup
- CLI
- Topic Virality Scoring
- Hook Formulas
- Daily Automation
- Image / Voice Generation
- Scene Planning & Presets
- Output Format & Structure
- Built With Claude Code
Quick Start
git clone https://github.com/poyrazemun/youtube-shorts-generator && cd youtube-shorts-generator
py -3.12 -m pip install -r requirements.txt
cp .env.example .env # then add ANTHROPIC_API_KEY + REPLICATE_API_TOKEN
py -3.12 orchestrator.py --refresh-topics # generate the topic queue
py -3.12 orchestrator.py --auto # publish one video
Full setup (ffmpeg, espeak-ng, YouTube OAuth) is in Setup below.
The commands use py -3.12 (the Windows Python launcher). On macOS or Linux, replace it with python3.12 everywhere.
How It Works
The pipeline is a six-step CLI: pick a topic from the scored queue, write the script, generate images, synthesise voice, burn captions, upload. Every step caches its output, so re-running picks up where it stopped.
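The caching behaviour described above can be sketched as follows. This is a minimal illustration, not the repo's code: the helper name run_step and the flat layout of state.json are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable


def run_step(run_dir: Path, name: str, fn: Callable[[], dict]) -> dict:
    """Run fn once per run_dir; later calls return the cached result.

    Hypothetical helper: the real state.json schema may differ.
    """
    state_path = run_dir / "state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    if name in state:
        return state[name]      # step already completed, skip the work
    result = fn()               # may raise; nothing is recorded on failure
    state[name] = result
    run_dir.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state, indent=2))
    return result
```

Because a step is only written to the ledger after it returns, a crash leaves earlier steps cached and the failed step is retried on the next run.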
Cost per video — measured from a real run (output/<slug>/cost.json):
| Component | Cost |
|---|---|
| Claude (event + script + content safety, Sonnet 4.6) | ~$0.03 |
| Claude localizations (title + description × 4 languages, Haiku 4.5) | ~$0.005 |
| Image generation — HuggingFace FLUX.1-schnell (HUGGINGFACE_API_TOKEN set) | Free |
| Image generation — Replicate FLUX.1-dev (5 × $0.025) | $0.125 |
| Kokoro TTS, captions, ffmpeg assembly, YouTube upload | Free |
| Total: HuggingFace path | ~$0.03 |
| Total: Replicate path | ~$0.16 |
Switching to HuggingFace saves about 80% per video. The Replicate path is the safer fallback (no rate limits, consistent quality on FLUX.1-dev) but you pay per image.
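The totals in the table can be reproduced from the per-component figures. The snippet below is only a worked check of that arithmetic; the constant names are made up for the example and the Claude figures are the approximate values quoted above.

```python
# Approximate per-video figures taken from the cost table.
CLAUDE_SCRIPT = 0.03         # event + script + content safety
CLAUDE_LOCALIZE = 0.005      # title + description in 4 languages
REPLICATE_PER_IMAGE = 0.025  # FLUX.1-dev
IMAGES_PER_VIDEO = 5


def per_video_cost(image_rate: float) -> float:
    """Claude spend plus five images at the given per-image rate."""
    return CLAUDE_SCRIPT + CLAUDE_LOCALIZE + IMAGES_PER_VIDEO * image_rate


replicate = per_video_cost(REPLICATE_PER_IMAGE)  # about $0.16
huggingface = per_video_cost(0.0)                # about $0.035
saving = 1 - huggingface / replicate             # roughly four fifths

print(f"Replicate ${replicate:.3f}, HuggingFace ${huggingface:.3f}, saving {saving:.0%}")
```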
Detailed pipeline diagram
--refresh-topics Claude generates 25 topics, scores each for viral potential (1-10),
discards anything below 7, filters out already-used keywords,
resets stale runs, sorts by score — best topics queued first
↓
--auto (run daily via Windows Task Scheduler)
↓
Step 1 event_discovery.py Claude → 1 strange historical event
Step 2 script_generator.py Claude + DuckDuckGo research → viral script with hook formula + rehook + loopable ending + per-beat scene_visuals + SEO metadata
Step 2b localizer.py Haiku 4.5 → es/pt/hi/id title + description, cached in scripts.json
Step 2.5 content_safety.py Claude evaluates the script vs. YouTube demotion rules; halts before image spend on a fail
Step 3 image_generator.py FLUX (HuggingFace schnell, or Replicate dev) → 5 cinematic 9:16 images
Step 4 tts_generator.py Kokoro neural TTS → narration audio (fallback: Piper → Coqui → Edge TTS)
Step 5a captions.py Whisper / estimation → word-timed subtitles
Step 5b video_assembler.py ffmpeg → 1080×1920 MP4 with burned captions + CTA overlay + background music
Step 6 youtube_uploader.py YouTube Data API v3 → upload video + SRT caption track + localized metadata
↓
--analytics Fetches view counts, feeds performance data back into topic generation
Requirements
- Python 3.12 (Kokoro TTS requires 3.10–3.12)
- ffmpeg on PATH
- espeak-ng on PATH (required by Kokoro)
- Anthropic API key (Claude)
- YouTube Data API v3 OAuth credentials
- Replicate API token (for image generation, ~$0.025/image with FLUX.1-dev — see Cost)
pip install -r requirements.txt
# Windows
winget install ffmpeg
# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg
Setup
1. Environment
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
REPLICATE_API_TOKEN=r8_... # ~$0.025/image with FLUX.1-dev (5 images = $0.125 per video)
YOUTUBE_PRIVACY=private
2. YouTube API credentials
- Google Cloud Console → Enable YouTube Data API v3
- Create OAuth 2.0 Client ID (Desktop app) → download as client_secrets.json
- Place in project root
3. First run (opens browser for YouTube OAuth)
py -3.12 orchestrator.py --analytics
CLI
# Automated mode — picks highest-scoring topic from queue, runs full pipeline
py -3.12 orchestrator.py --auto
# Regenerate topic queue (Claude scores + filters 25 topics; run weekly to keep queue fresh)
py -3.12 orchestrator.py --refresh-topics
# View current topic queue with virality scores
py -3.12 orchestrator.py --list-topics
# Fetch YouTube analytics + print performance summary
py -3.12 orchestrator.py --analytics
# Manual mode
py -3.12 orchestrator.py --topic "Strange War Stories" --keyword "battle" [--count N] [--no-upload]
# Pick a specific topic from the queue by ID (shown in --list-topics)
py -3.12 orchestrator.py --pick a3f2
# Wipe entire queue and generate a fresh one (asks for confirmation)
py -3.12 orchestrator.py --clear-topics
# Remove a single topic by ID (asks for confirmation)
py -3.12 orchestrator.py --delete-topic a3f2
# Dry run — validate pipeline wiring end-to-end with zero API spend
# (skips Claude, forces PIL images, skips YouTube upload; topic/keyword optional)
py -3.12 orchestrator.py --dry-run
# Manual image mode — pause after scene planning so you can drop in images
# generated externally (Midjourney / DALL-E / Sora / etc.), then re-run the
# same command to resume from TTS through upload. See manual-usage.md.
py -3.12 orchestrator.py --auto --manual-images
# Flags available on all modes
--no-upload skip YouTube upload, save videos locally
--no-edit skip prompt editing pause (automation mode)
--verbose DEBUG-level console logging
--dry-run skip Claude + force PIL images + skip upload (no API spend)
--manual-images pause after scene planning; resume once you have dropped img_0..N.png into the printed folder
Topic Virality Scoring
Every time --refresh-topics runs, Claude rates each generated topic on a 1–10 virality scale:
| Score | Meaning |
|---|---|
| 9–10 | Sounds completely fake but is true. Debunks a widely-held belief. Famous person in shocking context. |
| 7–8 | Genuinely surprising with strong hook potential. |
| < 7 | Discarded — not queued |
Topics are sorted highest score first, so --auto always produces the most viral-potential video available.
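The filter-and-sort rule can be expressed in a few lines. This is a sketch under assumptions: the Topic fields and the build_queue name are illustrative, and only the threshold of 7 and the best-first ordering come from the text above.

```python
from dataclasses import dataclass

MIN_SCORE = 7  # topics scoring below this are discarded


@dataclass
class Topic:
    title: str
    keyword: str
    score: int  # 1-10 virality rating assigned by Claude


def build_queue(candidates: list[Topic]) -> list[Topic]:
    """Keep topics scoring >= MIN_SCORE, highest score first."""
    kept = [t for t in candidates if t.score >= MIN_SCORE]
    # sorted() is stable, so equally scored topics keep their original order
    return sorted(kept, key=lambda t: t.score, reverse=True)
```

With this ordering, taking the first pending entry is enough for --auto to pick the highest-scoring topic.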
Deduplication
Keywords from already-uploaded videos (tracked in video_registry.json) are automatically excluded during topic generation — both via the Claude prompt and a post-generation filter. This prevents the pipeline from regenerating videos on topics you've already covered.
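The post-generation filter might look like the sketch below. It assumes that video_registry.json holds a list of entries with a keyword field and that matching is case-insensitive; neither detail is confirmed by the README.

```python
def drop_used_keywords(topics: list[dict], registry: list[dict]) -> list[dict]:
    """Remove topics whose keyword already appears in the upload registry.

    Assumed shapes: topics and registry entries are dicts with a "keyword" key.
    """
    used = {entry["keyword"].strip().lower() for entry in registry}
    return [t for t in topics if t["keyword"].strip().lower() not in used]
```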
Stale Run Recovery
If a pipeline run crashes or is interrupted, the topic stays in_progress. On the next --refresh-topics, any in_progress entry older than 2 hours is automatically reset to failed and replaced with fresh topics.
Content Safety Check (Step 2.5)
Between script generation and image generation, every script is run through one Claude call that evaluates it against growth/youtube-restriction-rules.md. If the script would be demoted, age-restricted, demonetized, or removed by YouTube — most importantly Rule 4 (conspiracy framing), Rule 5 (forbidden categories: suicide methods, sexual violence, harm to minors, terrorism glorification, false claims about living people), and Rule 6 (graphic gore as focal point) — the pipeline halts before spending on image generation. The topic is marked failed in the queue with the violated rule, so re-running --auto picks the next pending topic.
The full per-script verdict is saved to output/<slug>/safety.json for audit. Safety check failures on infrastructure (Claude API down, malformed JSON response) fail-open with a warning — only a successful "fail" verdict from Claude halts the run, so an offline Claude doesn't block your pipeline. The check is skipped under --dry-run (zero API spend stays zero).
Cost & Timing Tracking
Every successful run records per-step wall-clock + Claude token usage + image-generation counts (per provider) and writes two artifacts:
- output/<slug>/cost.json — full breakdown for that one video (steps, timings, tokens, image counts, USD totals).
- output/cost_ledger.txt — chronological one-line-per-video append-only log with a TOTAL footer recomputed each run. Re-running the same slug replaces the existing row instead of duplicating it.
Both files are gitignored. After each successful run a one-line summary prints to the console:
Pipeline finished in 200s, ~$0.1552 spend (Claude $0.0302, images 5×replicate $0.1250)
Updating pricing when rates change
Pricing rates live in config.py:
- CLAUDE_PRICING — {model_id: {"input": $/MTok, "output": $/MTok}}. If Anthropic changes prices or you switch models, edit the dict directly. Unknown models are recorded as $0 (with a warning logged), so an unset model won't crash a run.
- IMAGE_PRICING — {provider: $/image}. Replicate and HuggingFace rates are also env-overridable via IMAGE_COST_REPLICATE and IMAGE_COST_HUGGINGFACE in .env, so you can tweak rates without touching code.
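A sketch of how such a pricing table could be used. The model id and its rates are placeholders rather than real prices, and claude_cost is a hypothetical helper; the dict shapes, the $0 fallback, and the two environment variable names follow the description above.

```python
import logging
import os

log = logging.getLogger("cost")

# Placeholder entry for illustration only; not a real model id or price.
CLAUDE_PRICING = {"example-model": {"input": 3.0, "output": 15.0}}  # $ per million tokens

IMAGE_PRICING = {
    "replicate": float(os.environ.get("IMAGE_COST_REPLICATE", "0.025")),
    "huggingface": float(os.environ.get("IMAGE_COST_HUGGINGFACE", "0.0")),
}


def claude_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call; unknown models are recorded as $0 with a warning."""
    rates = CLAUDE_PRICING.get(model)
    if rates is None:
        log.warning("no pricing for model %r, recording $0", model)
        return 0.0
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
```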
Switching image providers
If you swap providers (e.g. Replicate → HuggingFace, or add a new one), update both:
- The actual provider call in pipeline/image_generator.py (detect_backend() and the per-backend functions).
- IMAGE_PRICING in config.py — add the new provider's per-image cost so cost tracking stays accurate. The provider key recorded into cost.json is whatever string image_generator passes to tracker.record_image(), so keep the names aligned.
Analytics Feedback Loop
--analytics fetches view/like counts for every uploaded video and saves them to output/analytics.json along with two derived signals (each requires ≥ 2 videos to be considered, so single uploads can't poison the ranking):
- Top / worst keywords — average views grouped by keyword
- Hook type performance — average views grouped by the hook_type Claude tagged on each script
The next time --refresh-topics runs, those signals are flattened into a plain-English hint string and injected into Claude's topic-generation prompt — for example: "Top performing keywords by average views: napoleon (12,400 avg, 3 videos). Hook type performance: FALSE_ASSUMPTION (9,800 avg, 4 videos). Prefer FALSE_ASSUMPTION hooks when it fits the story." Claude is told to bias new topics toward winning patterns and avoid losing ones.
The exact hint string Claude received is persisted into topics_queue.json as performance_hints_used, so you can audit afterwards which signal shaped the queue. Both --analytics and --refresh-topics now print the hint string they are about to send so the loop is visible from the CLI.
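The aggregation behind those hints can be approximated like this. The function names and the per-video dict shape (keyword, hook_type, views) are assumptions; the minimum of 2 videos per group and the wording of the hint follow the example above.

```python
from collections import defaultdict

MIN_VIDEOS = 2  # groups with fewer uploads are ignored


def average_views(videos: list[dict], field: str) -> list[tuple[str, float, int]]:
    """Return (value, average views, video count) rows, best first."""
    groups: dict[str, list[int]] = defaultdict(list)
    for video in videos:
        groups[video[field]].append(video["views"])
    rows = [(key, sum(v) / len(v), len(v)) for key, v in groups.items() if len(v) >= MIN_VIDEOS]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def build_hint(videos: list[dict]) -> str:
    """Flatten both signals into one plain-English string for the prompt."""
    parts = []
    for label, field in (("Top performing keywords by average views", "keyword"),
                         ("Hook type performance", "hook_type")):
        rows = average_views(videos, field)
        if rows:
            listed = ", ".join(f"{k} ({avg:,.0f} avg, {n} videos)" for k, avg, n in rows)
            parts.append(f"{label}: {listed}.")
    return " ".join(parts)
```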
Hook Formulas
Every script uses one of 5 proven hook formulas chosen by Claude for that specific event:
| Formula | Example |
|---|---|
| SHOCKING_FACT | "A man once sold the Eiffel Tower — twice." |
| FALSE_ASSUMPTION | "Everyone thinks Einstein failed math. He didn't — but his teachers still wanted him gone." |
| CONSEQUENCE_FIRST | "This one telegram started World War One." |
| SPECIFIC_NUMBER | "In 1518, 400 people danced non-stop for 2 months — and couldn't stop." |
| DIRECT_ADDRESS | "You've used this invention today — but its creator was executed for making it." |
Hard-banned openers: "Did you know", "In [year]...", any visual reference.
Scripts now follow a 5-beat retention structure: Hook → Context → Rehook → Twist → Ending fact.
The rehook is designed to reset curiosity midway through the short, and the ending fact is prompted to connect back to the opener for better loopability.
Daily Automation (Windows Task Scheduler)
Run --auto daily and --refresh-topics weekly via Windows Task Scheduler. See HOW_TO_USE.md for the full setup guide.
Image Generation (Priority Order)
| Priority | Backend | Requirement |
|---|---|---|
| 1 | HuggingFace (FLUX.1-schnell) | HUGGINGFACE_API_TOKEN in .env |
| 2 | Replicate (FLUX.1-dev) | REPLICATE_API_TOKEN in .env (~$0.025/img) |
| 3 | PIL placeholder | Always available (offline fallback) |
Each image is retried up to 3 times with exponential backoff before falling back to the next backend.
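The retry-then-fall-back behaviour can be sketched as below. The function name, the (name, callable) backend list, and the 1-second base delay are assumptions for illustration; only the three attempts, the exponential backoff, and the priority order come from the text.

```python
import time
from typing import Callable, Sequence


def generate_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], bytes]]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, bytes]:
    """Try each backend in priority order, retrying with exponential backoff."""
    for name, call in backends:
        for attempt in range(retries):
            try:
                return name, call(prompt)
            except Exception:
                if attempt < retries - 1:
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, ... between attempts
    raise RuntimeError("all image backends failed")
```

Returning the backend name alongside the image bytes makes it easy to record which provider produced each image for cost tracking.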
Voice Generation (Priority Order)
| Priority | Backend | Requirement |
|---|---|---|
| 1 | Kokoro (open-weight neural TTS) | pip install "kokoro>=0.9.4" soundfile + espeak-ng + Python 3.10–3.12 |
| 2 | Piper TTS (local) | Binary on PATH |
| 3 | Coqui TTS (local) | pip install TTS |
| 4 | Edge TTS | Included in requirements.txt (always available fallback) |
Default Kokoro voice: bm_george (British Male) at KOKORO_SPEED=1.1.
Customising Voice
Set in .env — no code changes needed:
KOKORO_VOICE=bm_george # af_heart / am_echo / bf_emma / bm_george / am_liam ...
KOKORO_LANG_CODE=b # a=American EN, b=British EN, e=Spanish, f=French, h=Hindi, i=Italian, p=Portuguese
KOKORO_SPEED=1.1 # 0.5=slow · 1.1=default Shorts pacing · 1.5=fast
Voice and lang code must match — e.g. bm_george requires KOKORO_LANG_CODE=b. See HOW_TO_USE.md for the full voice list.
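A quick pre-flight check for a mismatched pair might look like this. It assumes the first letter of a voice name is its language code, which fits the examples listed above (bm_george with b, af_heart with a) but is not spelled out in this README; the function names and the default lang code are also assumptions.

```python
import os
from collections.abc import Mapping


def voice_matches_lang(voice: str, lang_code: str) -> bool:
    """Assumed convention: the voice name's first letter is its language code."""
    return voice[:1] == lang_code


def check_kokoro_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise ValueError when KOKORO_VOICE and KOKORO_LANG_CODE disagree."""
    voice = env.get("KOKORO_VOICE", "bm_george")
    lang = env.get("KOKORO_LANG_CODE", "b")  # assumed default, matching bm_george
    if not voice_matches_lang(voice, lang):
        raise ValueError(f"voice {voice!r} does not match KOKORO_LANG_CODE={lang!r}")
```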
Scene Planning & Presets
Between script generation and rendering, a lightweight scene planning layer
converts each script into an explicit, inspectable ScenePlan. Each narrative
beat becomes its own scene with a role-aware image prompt and its own duration.
script → scene plan → image / render inputs → video assembly
Videos are rendered as clean static slideshows — no zoompan/motion, and no
on-screen text beyond the burned-in subtitles and the subscribe CTA in the
final seconds.
Scene roles
The five semantic parts of every script each become a scene with a distinct
visual treatment:
| Role | Intent |
|---|---|
| hook | Strong, immediate establishing frame |
| context | Explanatory, informative |
| rehook | Mid-story curiosity reset |
| twist | Heightened contrast / drama |
| ending | Clean closing frame with negative space for CTA |
Each scene carries: role, text, duration, image_prompt, and visual_hints. Plans are saved to output/<slug>/scene_plans/<idx>.json
and can be hand-edited between runs (the video step reads them back).
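One way to model a hand-editable plan is a pair of dataclasses that round-trip through JSON. The five field names come from the list above; the class layout, the field types, and the to_json / from_json helpers are assumptions, not the repo's actual ScenePlan.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Scene:
    role: str          # hook / context / rehook / twist / ending
    text: str          # narration for this beat
    duration: float    # seconds on screen
    image_prompt: str
    visual_hints: list[str] = field(default_factory=list)


@dataclass
class ScenePlan:
    scenes: list[Scene]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "ScenePlan":
        data = json.loads(raw)
        return cls(scenes=[Scene(**s) for s in data["scenes"]])
```

Since the file is plain JSON, a duration or prompt changed by hand is picked up the next time the plan is loaded.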
Presets (--preset)
Presets bundle per-role prompt-style tokens and duration weights.
| Preset | Feel |
|---|---|
| documentary_clean | Archival, restrained palette (default) |
| dramatic_history | Chiaroscuro, bold contrast, cinematic shadows |
| viral_fact_card | Saturated, TikTok-style punchy grading |
python orchestrator.py --auto --preset dramatic_history
python orchestrator.py --topic "Strange Moments" --keyword war --preset viral_fact_card
Omitting --preset uses config.DEFAULT_SCENE_PRESET (defaults to documentary_clean). All existing CLI flags continue to work unchanged.
Extending
- New preset — add a Preset to pipeline/presets.py and register it in PRESETS. It's picked up by --preset automatically.
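As a rough picture of what a preset with per-role style tokens and duration weights could contain, consider the sketch below. The Preset fields, the register helper, and the noir_example preset are all hypothetical; the real class in pipeline/presets.py may be shaped differently.

```python
from dataclasses import dataclass

ROLES = ("hook", "context", "rehook", "twist", "ending")


@dataclass(frozen=True)
class Preset:
    name: str
    style_tokens: dict[str, str]        # role -> prompt-style suffix
    duration_weights: dict[str, float]  # role -> relative screen time

    def durations(self, total_seconds: float) -> dict[str, float]:
        """Split the narration length across roles in proportion to the weights."""
        total_weight = sum(self.duration_weights[r] for r in ROLES)
        return {r: total_seconds * self.duration_weights[r] / total_weight for r in ROLES}


PRESETS: dict[str, Preset] = {}


def register(preset: Preset) -> Preset:
    PRESETS[preset.name] = preset
    return preset


noir = register(Preset(
    name="noir_example",  # made-up preset, for illustration only
    style_tokens={r: "black and white, film grain" for r in ROLES},
    duration_weights={"hook": 1.0, "context": 1.5, "rehook": 1.0, "twist": 1.5, "ending": 1.0},
))
```

Weights rather than fixed seconds let the same preset adapt to narrations of different lengths.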
Output Format
- Resolution: 1080×1920 (9:16 vertical)
- Codec: H.264, AAC 128kbps, 24fps
- Duration: 20–30 seconds
- Subtitles: Burned in, white bold text, semi-transparent background box, positioned above YouTube Shorts UI. Whisper captions use shorter 3-word cards for faster pacing, while the estimation fallback keeps larger cards for smoother reading.
- CTA Overlay: "Follow @ThatActuallyHappened11" — white text, top-center, appears in last 3 seconds
- Thumbnail: auto-selected by YouTube from a video frame
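Grouping word timings into 3-word cards and writing them as SRT can be sketched like this. The (text, start, end) tuple shape and the function names are assumptions; the repo's captions.py may structure this differently.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def words_to_cards(words: list[tuple[str, float, float]], size: int = 3):
    """Group (text, start, end) word timings into cards of at most `size` words."""
    cards = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        cards.append((chunk[0][1], chunk[-1][2], " ".join(w[0] for w in chunk)))
    return cards


def to_srt(cards) -> str:
    """Render (start, end, text) cards as numbered SRT blocks."""
    blocks = [
        f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{text}"
        for n, (start, end, text) in enumerate(cards, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Each card starts at its first word and ends at its last, which keeps the text on screen in step with the narration.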
Output Structure
output/
<slug>/
events.json step 1 — discovered event
scripts.json step 2 — script + SEO metadata + hook_type + scene_visuals + localizations (es/pt/hi/id)
scene_plans/ step 2.5 — per-event ScenePlan JSON (role-aware scenes + prompts)
images/ step 3 — one PNG per scene + img_N.txt sidecar with the exact prompt sent to the backend
audio/ step 4 — narration WAV
subtitles/ step 5a — .ass + .srt caption files
video/ step 5b — assembled MP4
uploads.json step 6 — video IDs + URLs
state.json per-step completion ledger
topics_queue.json topic queue with virality scores (local-only, git-ignored)
video_registry.json persistent record of all uploaded videos (local-only, git-ignored)
logs/ daily rotating logs (14-day retention)
assets/music/ drop royalty-free .mp3 files here for background music
growth/ marketing strategy guides (Reddit strategy, etc.)
Built With Claude Code
This project was built entirely using Claude Code:
- Agents — Plan agents for architecture design, Explore agents for codebase analysis, and general-purpose agents for parallelising research across multiple files simultaneously
- Multi-session memory — persistent MEMORY.md tracking architecture decisions, tier completion status, and implementation patterns across all development sessions