# 🎬 Pilipili-AutoVideo

Fully Automated AI Video Agent · Local Deployment · One Sentence to Final Cut

简体中文 · English · 繁體中文 · 日本語 · 한국어
## 📹 Demo

> Replace this line with a GIF or video recording of the full workflow: topic input → scene review → final video output.

`docs/demo.gif` (to be recorded; see Contributing)
## 📖 Overview

Pilipili-AutoVideo is a fully local, end-to-end AI video agent. Describe your video in one sentence, and the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning, delivering a complete MP4 with burned-in subtitles plus a CapCut/JianYing draft project for final human touch-ups.
Key differentiators from similar tools (LibTV, Huobao Drama):

- Absolute Audio-Video Sync: the TTS voiceover is generated first and its exact millisecond duration is measured, then used to control the video clip's duration, so audio and video are always perfectly aligned
- Keyframe Lock Strategy: Nano Banana generates a 4K keyframe image first, then Image-to-Video (I2V) produces the clip, ensuring consistently high visual quality with no subject drift
- Digital Twin Memory: a Mem0-powered memory system learns your style preferences over time, injecting your creative habits into every new generation
- Skill Integration: the entire workflow is packaged as a standard Skill, callable by any AI Agent
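The sync strategy above boils down to measuring the audio first, then requesting exactly that much video. As a minimal stdlib sketch of the measurement half (illustrative only; the project's `tts.py` presumably does this around the MiniMax API):

```python
import wave

def wav_duration_ms(path: str) -> int:
    """Return a WAV file's exact duration in milliseconds."""
    with wave.open(path, "rb") as wf:
        return round(wf.getnframes() * 1000 / wf.getframerate())

# Demo: write one second of 16 kHz mono silence, then measure it.
with wave.open("demo_voiceover.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)              # 16-bit PCM
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

duration_ms = wav_duration_ms("demo_voiceover.wav")   # 1000
```

The measured value would then be passed as the target clip length to the I2V request, so a clip can never run shorter or longer than its narration.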
## 🎯 Core Features

- 🤖 Natural Language Driven: one sentence → full video, no manual node operations required
- 🎨 Premium Visual Quality: Nano Banana keyframe lock + Kling 3.0 / Seedance 1.5 dual engines, exceptional subject consistency
- 🔊 Perfect Audio-Video Sync: measure the voiceover duration first, then control the video duration accordingly, so the tracks never drift
- ✂️ CapCut/JianYing Draft Export: AI handles 90%, you fine-tune the last 10% in CapCut
- 🧠 Gets Smarter Over Time: the Mem0 memory system learns your aesthetic preferences with every project
- 🔌 Agent-Callable: packaged as a standard Skill, seamlessly integrates into larger automation workflows
## 🛠️ Architecture

```
┌─────────────────────────────────────────────────────────────┐
│               Pilipili-AutoVideo Architecture               │
├─────────────────────────────────────────────────────────────┤
│ Frontend   React 19 + TailwindCSS · 3-panel Studio · WS     │
├─────────────────────────────────────────────────────────────┤
│ API Layer  FastAPI · WebSocket · REST · LangGraph Workflow  │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Brain Layer  │ Vision Layer │ Motion Layer │ Voice Layer    │
│ DeepSeek     │ Nano Banana  │ Kling 3.0    │ MiniMax TTS    │
│ Kimi         │ (Gemini 3    │ Seedance     │ Speech 2.8 HD  │
│ MiniMax LLM  │  Pro Image)  │ 1.5 Pro      │                │
│ Gemini       │              │              │                │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ Assembly   Python + FFmpeg · xfade transitions · WhisperX   │
├─────────────────────────────────────────────────────────────┤
│ Draft Layer  pyJianYingDraft · Auto CapCut/JianYing Draft   │
├─────────────────────────────────────────────────────────────┤
│ Memory     Mem0 · Local SQLite · Style Preference Twin      │
└─────────────────────────────────────────────────────────────┘
```
| Layer | Technology | Description |
|---|---|---|
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
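As an illustration of what the Motion layer's "dual-engine smart routing" could look like in practice, here is a keyword heuristic following the split described later in this README (Kling for action/product shots, Seedance for narrative/multi-character scenes). The function name and keyword lists are assumptions, not the project's actual routing code:

```python
def pick_engine(topic: str, default_engine: str = "auto") -> str:
    """Route a topic to a video engine (illustrative heuristic only)."""
    if default_engine in ("kling", "seedance"):
        return default_engine                 # explicit config wins
    topic_l = topic.lower()
    # Narrative or multi-character cues favour Seedance.
    if any(k in topic_l for k in ("story", "romance", "dialogue", "characters")):
        return "seedance"
    # Action or product-shot cues favour Kling.
    if any(k in topic_l for k in ("action", "product", "sport", "stunt")):
        return "kling"
    return "kling"                            # conservative fallback

engine = pick_engine("Ancient palace romance story")   # "seedance"
```

An explicit `--engine` flag (or the `default_engine` config key) would bypass the heuristic entirely.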
## 🚀 Quick Start

### 📋 Requirements
| Software | Version | Notes |
|---|---|---|
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |
### Install FFmpeg

macOS:

```bash
brew install ffmpeg
```

Ubuntu / Debian:

```bash
sudo apt update && sudo apt install ffmpeg
```

Windows: download from [ffmpeg.org](https://ffmpeg.org) and add it to your PATH. Verify with:

```bash
ffmpeg -version
```
### Clone & Install

```bash
# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml
```
### Configure API Keys

Edit `configs/config.yaml`:

```yaml
llm:
  provider: deepseek        # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"

image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"    # Google AI Studio Key

video_gen:
  default_engine: kling     # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"

tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"

memory:
  provider: local           # local | mem0_cloud
  # mem0_api_key: "m0-xxxx" # Fill in for cloud sync
```

> 💡 You can also configure API keys visually at http://localhost:3000/settings; no YAML editing required.
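The project structure lists `core/config.py` as a Pydantic-Settings module. As a rough stdlib-only stand-in for how the YAML above maps onto typed config objects (field names mirror the YAML keys; this is not the project's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class LLMConfig:
    provider: str = "deepseek"      # deepseek | kimi | minimax | gemini
    api_key: str = ""

@dataclass
class VideoGenConfig:
    default_engine: str = "kling"   # kling | seedance | auto
    kling: dict = field(default_factory=dict)
    seedance: dict = field(default_factory=dict)

# In the real project these values would come from parsing
# configs/config.yaml; here we fill them from an equivalent dict.
raw = {
    "llm": {"provider": "kimi", "api_key": "sk-xxxx"},
    "video_gen": {"default_engine": "auto"},
}
llm_cfg = LLMConfig(**raw["llm"])
video_cfg = VideoGenConfig(**raw["video_gen"])
```

Typed config objects like these let the rest of the pipeline fail fast on a missing key instead of deep inside a generation call.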
### Option 1: CLI (Recommended for debugging)

```bash
# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"

# Specify engine
python cli/main.py run \
  --topic "Ancient palace romance story" \
  --engine seedance \
  --duration 90 \
  --add-subtitles

# List past projects
python cli/main.py list

# Help
python cli/main.py --help
```
### Option 2: Web UI (Recommended for daily use)

```bash
# Start backend
python cli/main.py server

# In another terminal, start frontend
cd frontend
pnpm install && pnpm dev

# Visit http://localhost:3000
```
### Option 3: Docker Compose (Recommended for production)

```bash
# Copy environment variables
cp .env.example .env
# Edit .env with your API keys

# Start all services
docker-compose up -d

# Visit http://localhost:3000
```
## 📦 Project Structure

```
Pilipili-AutoVideo/
├── api/
│   └── server.py            # FastAPI backend + WebSocket
├── cli/
│   └── main.py              # Click CLI entrypoint
├── core/
│   └── config.py            # Global config (Pydantic Settings)
├── modules/
│   ├── llm.py               # LLM script generation (multi-provider)
│   ├── image_gen.py         # Nano Banana keyframe generation
│   ├── tts.py               # MiniMax TTS + duration measurement
│   ├── video_gen.py         # Kling 3.0 / Seedance 1.5 I2V
│   ├── assembler.py         # FFmpeg assembly + subtitle burning
│   ├── jianying_draft.py    # CapCut/JianYing draft generation
│   └── memory.py            # Mem0 memory system
├── frontend/                # React 19 frontend (3-panel studio)
├── skills/
│   └── SKILL.md             # Skill packaging spec
├── configs/
│   ├── config.example.yaml  # Config template
│   └── config.yaml          # Local config (gitignored)
├── tests/
│   └── test_pipeline.py     # Unit tests (18 test cases)
├── data/
│   ├── outputs/             # Generated videos and drafts
│   └── memory/              # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml
```
## 🎬 Workflow Deep Dive
The core workflow is orchestrated by LangGraph in the following stages:
```
User Input
    │
    ▼
① Script Generation (LLM)
   DeepSeek/Kimi expands one sentence into a structured storyboard
   Each scene: voiceover text, visual description, motion description,
   duration, transition, camera motion
    │
    ▼
② Scene Review (optional human step)
   Web UI shows scene list; user can edit each scene before confirming
   CLI mode: auto-approved
    │
    ▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
   Nano Banana generates 4K keyframe images for each scene in parallel
   MiniMax TTS generates voiceover for each scene, measuring exact ms duration
    │
    ▼
④ Video Generation (Image-to-Video)
   Uses keyframe as first frame, voiceover duration as video duration
   Kling 3.0 (action/product) or Seedance 1.5 (narrative/multi-character)
    │
    ▼
⑤ Assembly (FFmpeg)
   xfade transitions + background music mixing + WhisperX subtitle burning
    │
    ▼
⑥ Draft Export (CapCut/JianYing)
   Auto-generates draft project preserving all scene assets and timeline
    │
    ▼
⑦ Memory Update (Mem0)
   After user rating, system learns style preferences for future generations
```
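To make the assembly step concrete: chaining N clips with FFmpeg's `xfade` filter means each transition's `offset` must be computed from the cumulative timeline so far. A minimal sketch of that bookkeeping (the function is illustrative, not the project's `assembler.py`):

```python
def build_xfade_filtergraph(durations, transition="fade", td=0.5):
    """Build an FFmpeg filter_complex chain that crossfades N clips.

    durations: per-clip lengths in seconds (inputs [0:v], [1:v], ...).
    Each xfade offset is where the transition starts on the running
    timeline: cumulative length so far minus the transition duration.
    """
    parts, prev = [], "[0:v]"
    offset = 0.0
    for i, dur in enumerate(durations[:-1]):
        # Each crossfade overlaps adjacent clips by `td` seconds.
        offset += dur - td
        out = f"[v{i + 1}]"
        parts.append(
            f"{prev}[{i + 1}:v]xfade=transition={transition}"
            f":duration={td}:offset={offset:.3f}{out}"
        )
        prev = out
    return ";".join(parts)

graph = build_xfade_filtergraph([4.0, 5.0, 3.0])
# Pass as: ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -filter_complex "<graph>" ...
```

Note that every crossfade shortens the total runtime by `td` seconds, which an assembler must account for when keeping the cut aligned with the voiceover track.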
## 📊 Comparison
| Dimension | LibTV | Huobao Drama | Pilipili |
|---|---|---|---|
| Interaction | Node canvas, manual trigger | Form-based, step-by-step | Natural language, one sentence |
| Audio-Video Sync | Manual editing | Not explicitly supported | Measure TTS duration → control video duration |
| Subject Consistency | Prompt guidance | Reference image upload | Nano Banana keyframe lock + Kling Reference API |
| Final Delivery | Manual import to CapCut | MP4 export | Auto CapCut draft + MP4 dual output |
| Memory System | None | None | Mem0 digital twin, learns your style |
| Agent Integration | None | None | Standard Skill, callable by any Agent |
| Deployment | Cloud SaaS | Cloud SaaS | Local deployment, full data ownership |
## 🧪 Testing

```bash
# Run all unit tests (no API keys required)
python -m pytest tests/test_pipeline.py -v -m "not api and not e2e"

# Run API integration tests (real API keys required)
python -m pytest tests/test_pipeline.py -v -m "api"

# Run full E2E tests
python -m pytest tests/test_pipeline.py -v -m "e2e"
```
Current test coverage: 18 unit tests, all passing.
## 🔌 Skill Integration

Pilipili-AutoVideo is packaged as a standard Skill, callable by any AI Agent:

```
# In an Agent session
Please generate a 60-second science explainer video about "The History of AI Chips",
blue-purple tech aesthetic.
```

The Agent automatically reads `skills/SKILL.md` and invokes Pilipili to complete the entire workflow.
## ❓ FAQ

Q: FFmpeg not found?
A: Ensure FFmpeg is installed and on your PATH. Run `ffmpeg -version` to verify.

Q: Video generation is slow; is that normal?
A: Video generation relies on cloud APIs (Kling/Seedance), typically 2-5 minutes per scene. This is an API-side constraint, not a local performance issue.

Q: How do I switch LLM providers?
A: Edit `llm.provider` in `configs/config.yaml`, or use the Settings page in the Web UI.

Q: Where is the CapCut/JianYing draft?
A: After generation, the draft project is at `data/outputs/{project_id}/draft/`. Copy the entire folder into CapCut's draft directory to open it.

Q: What aspect ratios are supported?
A: 9:16 (portrait, TikTok/Reels), 16:9 (landscape, YouTube), 1:1 (square, Instagram).
## 🤝 Contributing

Issues and Pull Requests are welcome!

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m 'feat: add amazing feature'`
4. Push the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License.
Pilipili-AutoVideo · Local Deployment · Fully Automated AI Video Agent

If this project helps you, please give it a ⭐ Star!