Bumblebee

agent — Security Audit: Warn

Health — Pass
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 2 days ago
  • Community trust — 15 GitHub stars
Code — Warn
  • Network request — Outbound network request in src/playphrase_search.py
Permissions — Pass
  • Permissions — No dangerous permissions requested
Purpose
This agent turns any text into a video montage of movie and TV clips. It works by greedily splitting a phrase into fragments, downloading matching video clips, and splicing them together locally.

Security Assessment
Overall risk: Medium. The tool claims to be 100% local with no API keys required, and it does not request dangerous system permissions. However, it does make outbound network requests to third-party services (yarn.co and playphrase.me) to fetch video clips. A notable security consideration is its use of a custom network library (curl_cffi) specifically designed to bypass Cloudflare protections on these sites. While no hardcoded secrets were found and no sensitive personal data is accessed, the automated scraping and anti-bot bypass behavior could potentially violate the terms of service of the targeted platforms.

Quality Assessment
The project is active and reasonably maintained, with its last push occurring just two days ago. It uses the standard, permissive MIT license and has a clear description. Community trust is currently low but present, with 15 GitHub stars. The tool relies on a local installation of faster-whisper for on-device speech recognition, which is a secure approach to audio processing.

Verdict
Use with caution: the tool itself is safe, but developers should be aware that it relies on automated web scraping and anti-bot bypass techniques to source video clips.

README.md

Bumblebee

Available on skills.sh
License: MIT
Python 3.9+

Surgical phrase splicing from real movies and TV shows via yarn.co.

Give it any long phrase — it greedily cuts the line into the longest possible runs of words that were actually spoken on screen, then assembles a final clip from those pieces. Classic fragmovie genre, automated to the millisecond.
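The greedy longest-run cut can be sketched in a few lines. This is an illustrative reimplementation, not Bumblebee's `phrase_splitter.py`; the `coverage` predicate is a hypothetical stand-in for "a search hit exists for this run of words".

```python
def greedy_split(words, coverage, max_len=6):
    """Greedily cut a word list into the longest runs that `coverage`
    reports as actually spoken on screen (hypothetical predicate)."""
    chunks, i = [], 0
    while i < len(words):
        # Try the longest window first, shrink until something matches.
        for size in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + size])
            if coverage(chunk):
                chunks.append(chunk)
                i += size
                break
        else:
            i += 1  # no clip ever said this word: skip it
    return chunks

# Toy corpus: only these exact runs exist as clips.
known = {"i am your", "father"}
print(greedy_split("i am your father".split(), known.__contains__))
# → ['i am your', 'father']
```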

No API keys, no cloud inference. Speech recognition runs on-device with faster-whisper; the only network traffic is fetching source clips from yarn.co and playphrase.me.

Demo

"Sentient is the best company in the world."

Four variants of the same phrase, each spliced word-by-word from real movies
and TV shows. Every clip pulled from yarn.co + playphrase.me and normalised
to a single 854x480 / 48 kHz container so the concat is seamless.
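The 854x480 / 48 kHz normalisation step could be built as an FFmpeg invocation along these lines. This is a sketch using standard FFmpeg flags, not the exact filter chain Bumblebee uses; `normalize_cmd` is a hypothetical helper.

```python
def normalize_cmd(src, dst, ffmpeg="ffmpeg"):
    """Build an FFmpeg command that re-encodes one clip into the common
    container (854x480 video, 48 kHz stereo audio) so the concat demuxer
    can join clips seamlessly."""
    return [
        ffmpeg, "-y", "-i", src,
        "-vf", "scale=854:480:force_original_aspect_ratio=decrease,"
               "pad=854:480:(ow-iw)/2:(oh-ih)/2",   # letterbox to 854x480
        "-r", "25",                                 # constant frame rate
        "-ar", "48000", "-ac", "2",                 # 48 kHz stereo audio
        dst,
    ]

print(" ".join(normalize_cmd("cache/clip.mp4", "output/_parts/clip.mp4")))
```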

https://github.com/user-attachments/assets/aaac7153-b528-4906-828e-984418a5f2c9

https://github.com/user-attachments/assets/cd7296c2-028d-43eb-b81a-1458bb7afb10

https://github.com/user-attachments/assets/d9fcd4b9-a751-4ed7-b60a-60c9367c7cc8

https://github.com/user-attachments/assets/23299a9b-47f7-4f91-8417-abd410d4ee8a

Install as an Agent Skill

npx skills add solyanviktor-star/Bumblebee

This installs Bumblebee into your agent's skills directory (Claude Code, Cursor, GitHub Copilot, and other compatible agents). The agent will automatically activate it on prompts like "splice a fragmovie of <phrase>" or "build a video where actors say <phrase>".

How it works

"I don't get it, why does my Claude keep getting banned. I'm sick of buying new accounts."
        |
        v   split on .!? -> each sentence handled independently
        |
[ greedy splitter — chunks of up to 6 words ]
        |
   "I don't get it why"          --+
   "does"                        --+   for each chunk:
   "Claude"                      --+     getyarn.io -> 8 candidates  (curl_cffi, bypasses CF)
   "keep getting"                --+     download mp4
   "I'm sick of"                 --+     faster-whisper word-timestamps (local, no API)
   "new accounts"                --+     word_matcher: exact match
   ...                              |    FFmpeg cut to the millisecond
                                    v
                          concat into output/final.mp4
                          (with short audio fades at every splice +
                          a ~180ms breathing pause between sentences)

Words that nobody ever said in any clip are skipped.

Mix mode: multiple takes on one phrase

python bumblebee.py "Sentient is the best company" --variants 4 -o sentient.mp4

Generates 4 files (sentient_v1.mp4, _v2, _v3, _v4) where every variant avoids clips already used by previous ones. You get different cuts with different actors, different movies, sometimes even different segmentation of the same phrase.
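The no-reuse behaviour boils down to threading an exclusion set through successive searches. A minimal sketch, where `search(phrase, exclude)` is a hypothetical stand-in for the yarn/playphrase lookup:

```python
def pick_variants(phrase, search, n=4):
    """Mix mode sketch: each variant re-runs the search while excluding
    clip ids already used by earlier variants."""
    used, variants = set(), []
    for _ in range(n):
        clips = [c for c in search(phrase, used) if c not in used]
        if not clips:
            break              # the corpus ran out of fresh takes
        used.update(clips)
        variants.append(clips)
    return variants

# Toy search: the corpus holds three distinct takes, served in order.
takes = iter([["a", "b"], ["c", "d"], ["e", "f"]])
print(pick_variants("hi", lambda p, excl: next(takes, [])))
# → [['a', 'b'], ['c', 'd'], ['e', 'f']]
```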

Install

git clone https://github.com/solyanviktor-star/Bumblebee.git
cd Bumblebee
pip install -r requirements.txt

You also need FFmpeg on PATH (or set FFMPEG_BIN to its path).

Strongly recommended: playwright + Chromium

Bumblebee uses playphrase.me as an automatic
secondary source whenever yarn.co fails to cover a chunk. playphrase has
10x-1000x more clips per phrase, so installing it dramatically improves
coverage on rare words and longer phrases. Without it, those words get
skipped and Bumblebee asks the orchestrator to substitute a synonym.

pip install playwright
playwright install chromium    # one-time, ~120 MB

The Chromium bootstrap (~10-15s) runs lazily — only on the first yarn miss
of a given run, never if yarn covers the whole phrase. If you skip this
step, Bumblebee still works in yarn-only mode; you can also force-disable
playphrase per run with --no-playphrase.
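The lazy two-tier lookup described above can be modelled like this. All names here are illustrative, not Bumblebee's actual API: `yarn` and the callable returned by `playphrase_bootstrap` stand in for the real searchers.

```python
class ClipSource:
    """Sketch of the fallback: try yarn first, start the expensive
    playphrase/Chromium bootstrap only on the first yarn miss of a run,
    and reuse it for every later miss."""

    def __init__(self, yarn, playphrase_bootstrap):
        self.yarn = yarn
        self._bootstrap = playphrase_bootstrap
        self._playphrase = None            # browser not started yet

    def find(self, chunk):
        hits = self.yarn(chunk)
        if hits:
            return hits                    # yarn covered it: no browser
        if self._playphrase is None:
            self._playphrase = self._bootstrap()   # ~10-15 s, once per run
        return self._playphrase(chunk)

boots = []
src = ClipSource(
    yarn=lambda c: ["y1"] if c == "father" else [],
    playphrase_bootstrap=lambda: boots.append("boot") or (lambda c: [f"pp:{c}"]),
)
print(src.find("father"), src.find("rare"), src.find("rarer"), boots)
# → ['y1'] ['pp:rare'] ['pp:rarer'] ['boot']
```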

That's it. No API keys, no .env, nothing else to configure. The first run
downloads the Whisper model (~244 MB for small.en) into the HuggingFace
cache; every run after that is fully offline.

Requirements

  • Python 3.9+
  • FFmpeg
  • ~250 MB free disk space for the speech model
  • Recommended: playwright + Chromium (~120 MB) for the playphrase fallback

No GPU required. If you have a CUDA GPU, set WHISPER_DEVICE=cuda for a roughly 5x speedup on transcription.

Usage

# One video from one phrase
python bumblebee.py "I am your father"

# Several phrases — each is processed and they're concatenated in order
python bumblebee.py "I am your father" "Houston we have a problem"

# 5 different cuts of the same phrase, no clip reuse
python bumblebee.py "Sentient is the best" -o sentient.mp4 --variants 5

The final file lands in output/<name>.mp4.

Second source: playphrase.me (automatic)

yarn.co's public HTML is hard-capped at 20 unique clips per phrase. When yarn
fails to cover a chunk, Bumblebee automatically falls back to
playphrase.me, which often has 10x-1000x more
matches (73,000 clips for "open" vs yarn's 20). Its API delivers
word-timestamps natively, so playphrase clips skip the faster-whisper step
entirely.
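Whichever source the timestamps come from, turning them into a cut boundary is the same lookup. A hypothetical helper mirroring what word_matcher and cutter do together (not the project's actual function):

```python
def cut_span(words, target):
    """Given word-timestamps [(word, start_s, end_s), ...] and a target
    run of words, return the (start, end) seconds to hand to FFmpeg,
    or None when the run never occurs in the clip."""
    toks = [w for w, _, _ in words]
    n = len(target)
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == target:
            return words[i][1], words[i + n - 1][2]
    return None

ts = [("i", 0.5, 0.7), ("am", 0.7, 0.9), ("your", 0.9, 1.1), ("father", 1.1, 1.6)]
print(cut_span(ts, ["am", "your"]))  # → (0.7, 1.1)
```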

The fallback is lazy: the headless Chromium bootstrap (~10-15s, one-time
per run) only runs if yarn actually misses. Phrases that yarn covers fully
never touch playwright. See the Install section for the one-time
playwright + Chromium setup; once installed, every run uses playphrase as
needed. Pass --no-playphrase to force-disable it for a single run.

Optional environment variables

Variable              Default                       Purpose
WHISPER_MODEL         small.en                      Model name. Use base.en for speed, medium.en for accuracy.
WHISPER_DEVICE        cpu                           Set to cuda if you have an NVIDIA GPU.
WHISPER_COMPUTE_TYPE  int8 (cpu) / float16 (cuda)   Inference quantization.
FFMPEG_BIN            ffmpeg                        Path to the ffmpeg binary if not on PATH.

Project layout

Bumblebee/
|- bumblebee.py             <- CLI entry point
|- SKILL.md                 <- Claude Code skill manifest
|- src/
|  |- phrase_splitter.py    <- greedy longest-match with optional shuffling/exclusion
|  |- yarn_search.py        <- phrase -> clip_ids (curl_cffi, bypasses Cloudflare)
|  |- downloader.py         <- clip_id -> local mp4 (curl_cffi, bypasses CF on y.yarn.co)
|  |- transcriber.py        <- faster-whisper word-timestamps + cache
|  |- word_matcher.py       <- exact start/end of target words with apostrophe-fuzz
|  |- cutter.py             <- FFmpeg cut + audio fade at splice points
|  |- concat.py             <- concat demuxer
|- cache/                   <- downloaded clips and transcripts (reused across runs)
|- output/                  <- final reels and intermediate parts in _parts/

Known limitations

  • yarn.co indexes English-language media only.
  • Whisper sometimes transcribes short tokens like "I", "a", "my" as part of a longer word, so single short words tend to get skipped.
  • Word order is strict: "can we" and "we can" are different matches (a swap-fuzzy is on the TODO list).
  • yarn.co sits behind Cloudflare. Solved with curl_cffi and impersonate='chrome' (which replays a real Chrome TLS fingerprint).
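Related to strict matching: the project layout mentions that word_matcher applies an "apostrophe-fuzz". One plausible normalisation, shown here purely as a sketch (the real rules in src/word_matcher.py may differ):

```python
import re

def fuzz(word):
    """Normalise a token the way an apostrophe-fuzzy matcher might:
    lowercase, unify curly and straight apostrophes, drop punctuation."""
    word = word.lower().replace("\u2019", "'")   # curly -> straight
    return re.sub(r"[^a-z0-9']", "", word)

# Whisper's curly "don’t" should match the target phrase's "don't".
assert fuzz("Don\u2019t") == fuzz("don't")
print(fuzz("Don\u2019t"))  # → don't
```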

License

MIT — see LICENSE.

Built end-to-end with Claude Code.
