# firered-director

AI agent for natural language video editing and directing

## motivation

Video editing is tedious. You know what you want the final video to look like, but getting there means clicking through timelines, adjusting clips, and fighting with keyframes. This project lets you describe edits in plain English and have an AI agent figure out the implementation details. Think "cut to the wide shot when she starts laughing" instead of manually scrubbing through footage.

## architecture

```mermaid
graph TD
    A[Natural Language Input] --> B[Intent Parser]
    B --> C[Scene Understanding]
    C --> D[Action Planner]
    D --> E[Video Editor Backend]
    E --> F[Output Video]
    
    G[Video Assets] --> C
    H[Project State] --> D
    E --> H
    
    style B fill:#f9f,stroke:#333
    style D fill:#bbf,stroke:#333
```

## getting started

### install

```bash
pip install firered-director
```

### quickstart

```python
from firered_director import Director, Project

# Initialize project with video files
project = Project.from_files([
    "interview_wide.mp4",
    "interview_closeup.mp4",
    "broll_office.mp4"
])

# Create director agent
director = Director(project)

# Natural language editing
director.edit("Start with the wide shot for 3 seconds")
director.edit("Cut to closeup when the speaker mentions 'innovation'")
director.edit("Add b-roll of the office during the transition")
director.edit("Fade to black at the end")

# Export final video
project.export("output.mp4")
```

## how it works

The system parses natural language commands into structured editing intents using an LLM. It analyzes video content (speech, visual features, scene changes) to understand what's available. The action planner converts high-level intents into specific editing operations: cuts, transitions, effects. Finally, it executes those operations through FFmpeg and other video processing tools.
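To make the intent-to-operation step more concrete, here is a minimal sketch of how a planner might turn parsed cut intents into FFmpeg commands. `CutIntent` and `plan_cuts` are hypothetical names for illustration, not firered-director's API. The FFmpeg options used (`-ss`, `-t`, and the concat demuxer's `-f concat -safe 0`) are standard, and the sketch assumes all sources share codec, resolution, and frame rate so the final stream copy can join the segments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CutIntent:
    """Hypothetical structured intent: show `source` from `start` for `duration` seconds."""
    source: str
    start: float
    duration: float


def plan_cuts(intents: list[CutIntent], output: str) -> tuple[list[list[str]], str, list[str]]:
    """Turn an ordered list of cut intents into FFmpeg commands.

    Returns (trim commands, contents of a concat list file, final concat command).
    """
    trim_cmds: list[list[str]] = []
    segments: list[str] = []
    for i, intent in enumerate(intents):
        segment = f"seg{i}.mp4"
        segments.append(segment)
        # -ss placed after -i seeks by decoding: slower, but frame-accurate
        # (the segment is re-encoded with FFmpeg's default settings).
        trim_cmds.append([
            "ffmpeg", "-i", intent.source,
            "-ss", f"{intent.start:g}",
            "-t", f"{intent.duration:g}",
            segment,
        ])
    # The concat demuxer reads a text file with one "file '<path>'" line per segment.
    concat_list = "".join(f"file '{s}'\n" for s in segments)
    # Stream copy assumes every segment shares codec, resolution, and frame rate.
    concat_cmd = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "segments.txt",
                  "-c", "copy", output]
    return trim_cmds, concat_list, concat_cmd
```

Writing `concat_list` to `segments.txt` and running the commands in order (for example with `subprocess.run`) would produce the joined file. Keeping planning separate from execution like this makes the plan easy to inspect before anything touches the footage.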

The agent maintains state across multiple commands, so you can iteratively refine edits. It handles temporal references ("after that", "when she laughs") by building a semantic timeline of the project.
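A semantic timeline can be as simple as a time-sorted list of labelled events from the analysis stage, plus a pointer to the last moment a command referred to. The sketch below assumes exactly that. `Event`, `SemanticTimeline`, and the two phrase patterns it understands are hypothetical; a real resolver would likely lean on the LLM to match free-form phrasing rather than comparing labels exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    label: str   # e.g. "she laughs", produced by speech/visual analysis (assumed)
    time: float  # seconds on the project timeline


@dataclass
class SemanticTimeline:
    events: list[Event] = field(default_factory=list)
    last_ref: float | None = None  # moment the previous command referred to

    def add(self, label: str, time: float) -> None:
        self.events.append(Event(label, time))
        self.events.sort(key=lambda e: e.time)  # keep events in time order

    def resolve(self, phrase: str) -> float:
        """Map a temporal reference to a timestamp and remember it for "after that"."""
        if phrase == "after that":
            if self.last_ref is None:
                raise LookupError("nothing has been referenced yet")
            # Events strictly later than the previous reference.
            candidates = [e for e in self.events if e.time > self.last_ref]
        else:
            # "when <label>": events whose label matches exactly.
            label = phrase.removeprefix("when ")
            candidates = [e for e in self.events if e.label == label]
        if not candidates:
            raise LookupError(f"cannot resolve {phrase!r}")
        # Events are sorted, so the first candidate is the earliest match.
        self.last_ref = candidates[0].time
        return self.last_ref
```

Storing `last_ref` on the timeline is what lets a follow-up command like "add b-roll after that" resolve against the previous one.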

## configuration

Set your preferred LLM backend and video processing options:

```python
director = Director(
    project,
    llm_provider="openai",  # or "anthropic", "local"
    model="gpt-4",
    video_backend="ffmpeg",  # or "moviepy"
    analysis_cache=".cache"  # cache video analysis results
)
```

Environment variables:

- `OPENAI_API_KEY`: OpenAI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `FIRERED_CACHE_DIR`: directory for the analysis cache
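One plausible way the constructor options and environment variables could fit together is sketched below. `resolve_settings` and `API_KEY_VARS` are hypothetical, and the precedence shown (an explicit `analysis_cache` argument over `FIRERED_CACHE_DIR`, falling back to `.cache`) is an assumption rather than documented behavior.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed mapping from the documented llm_provider values to their key variables.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_settings(
    llm_provider: str,
    analysis_cache: str | None = None,
    env: Mapping[str, str] = os.environ,
) -> dict[str, str | None]:
    """Hypothetical helper: pick the API key and cache directory for a provider."""
    key_var = API_KEY_VARS.get(llm_provider)  # None for "local": no key needed
    api_key = env.get(key_var) if key_var else None
    if key_var and not api_key:
        raise RuntimeError(f"{key_var} must be set for llm_provider={llm_provider!r}")
    # Assumed precedence: explicit argument, then FIRERED_CACHE_DIR, then ".cache".
    cache_dir = analysis_cache or env.get("FIRERED_CACHE_DIR") or ".cache"
    return {"provider": llm_provider, "api_key": api_key, "cache_dir": cache_dir}
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.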

## faq

### What video formats are supported?

Anything FFmpeg can read: MP4, MOV, AVI, MKV, and so on.

### How accurate is the natural language understanding?

It works well for common editing operations. Complex creative directions may require multiple iterations or more specific instructions.

### Can it automatically find good moments to cut?

Yes. Use commands like "find an interesting moment in the first 30 seconds" or "cut on action". Results vary based on video content.
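A common heuristic for locating candidate cut points is frame differencing: compare consecutive frames and flag the spots where the average pixel change jumps above a threshold. The sketch below shows that generic technique, not necessarily what firered-director does. `detect_cuts`, its default threshold, and the minimum gap are illustrative assumptions, and frames are taken to be already-decoded, equal-length flat lists of grayscale values.

```python
from __future__ import annotations


def detect_cuts(frames: list[list[int]], fps: float,
                threshold: float = 30.0, min_gap: float = 0.5) -> list[float]:
    """Return timestamps (seconds) where consecutive frames differ sharply.

    frames: decoded frames as equal-length flat lists of grayscale values (0-255).
    threshold: mean absolute pixel difference that counts as a cut.
    min_gap: ignore detections closer than this (seconds) to the previous one.
    """
    cuts: list[float] = []
    last_cut = float("-inf")
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute difference between the two frames, pixel by pixel.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        t = i / fps  # timestamp of the first frame after the change
        if diff > threshold and t - last_cut >= min_gap:
            cuts.append(t)
            last_cut = t
    return cuts
```

The `min_gap` guard keeps a burst of flicker or fast motion from producing several cut points a few frames apart.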

### Does it work offline?

Video processing is local, but intent parsing needs an LLM, either through an API or a local model.

### What about audio editing?

Basic audio operations (volume, fade, sync) are supported. Music selection and complex audio mixing are not yet implemented.
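Volume and fades map naturally onto FFmpeg's `volume` and `afade` audio filters. As an assumption about how a backend could express them, this hypothetical `audio_filter` helper builds a comma-separated filter chain for `ffmpeg -af`; it is not firered-director's implementation.

```python
from __future__ import annotations


def audio_filter(volume: float = 1.0, fade_in: float = 0.0, fade_out: float = 0.0,
                 total: float | None = None) -> str:
    """Build an FFmpeg audio filter chain (comma-separated) for the -af option.

    total: clip duration in seconds, needed to place the fade-out.
    """
    parts: list[str] = []
    if volume != 1.0:
        parts.append(f"volume={volume:g}")  # linear gain factor
    if fade_in > 0:
        parts.append(f"afade=t=in:st=0:d={fade_in:g}")
    if fade_out > 0:
        if total is None:
            raise ValueError("fade_out needs the clip duration in `total`")
        # Start the fade-out so it finishes exactly at the end of the clip.
        parts.append(f"afade=t=out:st={total - fade_out:g}:d={fade_out:g}")
    return ",".join(parts)
```

For example, `ffmpeg -i in.mp4 -af "volume=0.8,afade=t=in:st=0:d=1" out.mp4` lowers the volume and fades the audio in over the first second.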

## license

MIT