MadMax
Real-time voice AI agent with persistent memory, built on Gemini Live API and Python
😎 MadMax Live Agent
A voice agent for devices, powered by Google Gemini Live API with a local wake-word detector and long-term memory.
Capabilities:
- 🗣️ Realtime Speech-to-Speech dialogue via Gemini Live API
- 🔍 Fresh information from the internet via Google Search (grounding)
- 💰 Budget-friendly: offline mode with local wake-word (Vosk) and auto-shutdown timer
- 🧠 Managed long-term memory based on JSON files: people, places, facts, goals, experience, episodes, reflections and persona
- 🛠️ Tool calling in live mode
Technical highlights:
- 🔒 Transactional memory isolation: backup + rollback on errors, single-use guard
- 🧪 LLM Surgeon: automatic memory conflict resolution via LLM
- 📝 Auto-save all sessions to daily markdown
- 🔄 Automatic recovery of missed sessions
- ⏱️ Graceful shutdown with configurable timeouts
- 📊 Latency diagnostics for all LLM calls in logs
In development:
- 📷 Photo and video stream processing
- 🧹 Smart long-term memory cleanup
- 🤖 Integration with ROS2 modules for robot control
- 🔧 Other integrations and improvements
🚀 Key Features
💬 Realtime Voice Loop
- Local wake-word detection via Vosk (no LLM costs)
- Instant transition to live mode on the wake word
- Speech-to-Speech dialogue with minimal latency
🧠 Post-Session Memory Pipeline
- Automatic extraction: facts, goals, experience, episodes, reflections and persona after every session
- LLM Surgeon: memory conflict resolution (UPDATE / MERGE / APPEND / IGNORE) via a separate LLM call
- Rebuild of active_context.json for upcoming dialogues
- Automatic recovery of unprocessed sessions
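The four resolution actions can be pictured as a small dispatcher. The sketch below is illustrative only: it assumes memory records are plain dicts, and `Resolution` and `apply_resolution` are hypothetical names, not the project's API. In the project the decision comes from a separate LLM call; here it is simply passed in.

```python
from enum import Enum


class Resolution(Enum):
    UPDATE = "UPDATE"   # replace the stored record with the incoming one
    MERGE = "MERGE"     # combine fields of both records
    APPEND = "APPEND"   # keep the stored record and add the incoming one
    IGNORE = "IGNORE"   # drop the incoming record


def apply_resolution(records: list[dict], index: int, incoming: dict,
                     decision: Resolution) -> list[dict]:
    """Return a new record list with `decision` applied to records[index]."""
    result = [dict(r) for r in records]  # copy so the caller's list is untouched
    if decision is Resolution.UPDATE:
        result[index] = dict(incoming)
    elif decision is Resolution.MERGE:
        # Incoming fields win; stored fields missing from `incoming` survive.
        result[index] = {**result[index], **incoming}
    elif decision is Resolution.APPEND:
        result.append(dict(incoming))
    # IGNORE: the incoming record is discarded, nothing changes.
    return result
```

Returning a new list instead of mutating in place keeps the stored data intact until the caller decides to persist the result.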
🎭 Agent Persona (Max)
- Name, gender and communication style are set in agent_instructions.md
- SOUL.md: a philosophical persona manifesto (attitude toward the world, inclinations, shadow, meta-reflection), written by the agent itself after hours of test conversations
- Automatic extraction of reflections and persona traits from dialogues into reflections.json
🏗️ Architecture
1️⃣ Sleep Mode
Agent is offline; microphone is monitored locally via Vosk. No LLM costs.
2️⃣ Active Session
After the wake word, audio switches to the Gemini Live API. The dialogue runs in realtime.
3️⃣ Post-Session Processing
After a session ends:
- Save transcript to memory_engine/daily/YYYY-MM-DD.md
- Call process_missing_sessions(day_date) to process all unprocessed sessions of the day
- Call build_memory_context() to rebuild the active context
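The three phases above form a simple cycle. As a rough sketch of that lifecycle (not the code in core/orchestrator.py), it can be modelled as a transition table. `Phase`, `TRANSITIONS`, `next_phase` and the event names are assumptions made for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    SLEEP = auto()         # local wake-word listening, no LLM calls
    LIVE = auto()          # realtime session with the Live API
    POST_SESSION = auto()  # transcript saved, memory pipeline runs


# Allowed transitions: (current phase, event) -> next phase.
TRANSITIONS = {
    (Phase.SLEEP, "wake_word"): Phase.LIVE,
    (Phase.LIVE, "session_end"): Phase.POST_SESSION,
    (Phase.POST_SESSION, "memory_done"): Phase.SLEEP,
}


def next_phase(current: Phase, event: str) -> Phase:
    """Return the phase that follows `event`; unknown events keep the phase."""
    return TRANSITIONS.get((current, event), current)
```

Keeping the transitions in one table makes it easy to see that a wake word heard during post-session processing does not start a new live session in this model.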
💾 Memory Structure
📅 Daily Markdown Logs
Every session is saved to memory_engine/daily/YYYY-MM-DD.md with metadata:
- session_id
- started_at and ended_at (ISO 8601 with timezone offset)
- Dialogue transcript
Source of truth for post-session processing.
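To make the metadata concrete, here is a minimal sketch of how one session entry could be rendered. The markdown layout (heading, bullet metadata, bold speaker names) is an assumption; only the field names and the file naming pattern come from the description above. `format_session_entry` and `daily_log_name` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def format_session_entry(session_id: str, started_at: datetime,
                         ended_at: datetime, turns: list[tuple[str, str]]) -> str:
    """Render one session as a markdown block with its metadata header."""
    if started_at.tzinfo is None or ended_at.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    lines = [
        f"## Session {session_id}",
        f"- session_id: {session_id}",
        f"- started_at: {started_at.isoformat()}",  # ISO 8601 with offset
        f"- ended_at: {ended_at.isoformat()}",
        "",
    ]
    lines += [f"**{speaker}:** {text}" for speaker, text in turns]
    return "\n".join(lines) + "\n"


def daily_log_name(started_at: datetime) -> str:
    """File name for the day the session started, in YYYY-MM-DD.md form."""
    return f"{started_at:%Y-%m-%d}.md"


# Example: a two-minute session in a UTC+2 timezone.
tz = timezone(timedelta(hours=2))
start = datetime(2025, 1, 31, 9, 0, 0, tzinfo=tz)
entry = format_session_entry("s1", start, start + timedelta(minutes=2),
                             [("user", "Hi Max"), ("agent", "Hello!")])
```

Rejecting naive timestamps up front keeps every entry comparable across timezone changes.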
🧩 Active Context
File memory_engine/active_context.json:
{
"last_context": "...",
"summary_yesterday": "...",
"summary_today": "...",
"reply_count_today": 0,
"summary_reply_count": 0,
"long_term_injections": []
}
Used as the working context for upcoming live sessions.
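A minimal sketch of reading this file, assuming missing keys should fall back to the defaults shown above and unknown keys should be dropped. `DEFAULT_CONTEXT` and `load_active_context` are illustrative names, not the project's actual loader.

```python
import copy
import json
from pathlib import Path

# Default values mirror the keys shown above.
DEFAULT_CONTEXT = {
    "last_context": "",
    "summary_yesterday": "",
    "summary_today": "",
    "reply_count_today": 0,
    "summary_reply_count": 0,
    "long_term_injections": [],
}


def load_active_context(path: Path) -> dict:
    """Read the context file, filling in any missing keys with defaults."""
    context = copy.deepcopy(DEFAULT_CONTEXT)
    if path.exists():
        # Corrupted JSON raises json.JSONDecodeError instead of being hidden.
        stored = json.loads(path.read_text(encoding="utf-8"))
        context.update({k: v for k, v in stored.items() if k in DEFAULT_CONTEXT})
    return context
```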
🗄️ Long-Memory Storage
Directory memory_engine/memory/:
- people.json: information about people
- places.json: places and locations
- facts.json: facts and knowledge
- goals.json: goals and tasks
- experience.json: experience and skills
- reflections.json: reflections and insights
- episodes.log.jsonl: episode chronology
- processed_sessions.json: registry of processed sessions (prevents reprocessing the same session)
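The registry is what makes post-session processing safe to repeat. The sketch below assumes the registry is stored as a JSON array of session IDs; the real file layout may differ, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_processed(registry_path: Path) -> set[str]:
    """Return the set of session IDs already processed (empty if no file)."""
    if not registry_path.exists():
        return set()
    return set(json.loads(registry_path.read_text(encoding="utf-8")))


def pending_sessions(session_ids: list[str], registry_path: Path) -> list[str]:
    """Keep only sessions that are not yet in the registry, in input order."""
    done = load_processed(registry_path)
    return [sid for sid in session_ids if sid not in done]


def mark_processed(session_id: str, registry_path: Path) -> None:
    """Record a session as processed; marking it twice has no extra effect."""
    done = load_processed(registry_path)
    done.add(session_id)
    registry_path.write_text(json.dumps(sorted(done), indent=2), encoding="utf-8")
```

With this shape, recovery of missed sessions reduces to running `pending_sessions` over the IDs found in a daily log and processing whatever is returned.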
⚙️ Configuration
- Google AI Studio API Key in .env
- Key parameters in config.py
- Memory settings in memory_config.py
📂 Project Structure
MadMax/
├── main.py # Entry point
├── config.py # Configuration
├── agent_instructions.md # Agent system prompt (Max)
├── SOUL.md # Agent persona and philosophy
├── core/
│ ├── orchestrator.py # Agent lifecycle (sleep / live / post-session)
│ ├── audio_io.py # Audio I/O and Vosk wake-word
│ ├── gemini_client.py # Gemini Live API client
│ ├── agent_tools.py # Memory tools for live mode
│ ├── session_transcript_logger.py # Session transcript persistence
│ ├── errors.py # Exceptions
│ └── state.py # Session state
├── memory_engine/
│ ├── active_context_builder.py # active_context.json builder
│ ├── long_memory_extractor_agent.py # Memory extraction from transcripts
│ ├── long_memory_apply.py # Memory operations + LLM Surgeon
│ ├── long_memory_normalize.py # Normalization and fuzzy matching
│ ├── long_memory_ops.py # Operation schemas and validation
│ ├── long_memory_query_service.py # Long-term memory search
│ ├── summarize_context_agent.py # Day summarization
│ ├── llm_client_utils.py # LLM timeout and diagnostics
│ ├── memory_config.py # Memory paths and constants
│ ├── entity_policies.py # Entity link policies
│ ├── time_policy.py # Timestamp policy
│ ├── daily/ # Daily markdown logs
│ └── memory/ # Long-memory JSON files + backups
└── live_api_docs/ # Gemini Live API documentation
🚀 Quick Start
# 1. Clone and enter directory
cd MadMax
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Download wake-word model (~40 MB)
./setup.sh
# 5. Configure environment
# Create .env file (or export variables):
# GOOGLE_API_KEY=your_key_here
# 6. Run the agent
python main.py
Requirements:
- Linux
- Python 3.11+
- Microphone and speakers
- Google AI Studio API Key
🛠️ Roadmap
✅ Already implemented
- Function Calling for memory: live agent calls memory_lookup_person, memory_lookup_goal, memory_lookup_experience, memory_recent_episodes during dialogue
- Google Search (grounding): agent receives fresh information from the internet in realtime
- Transactional memory isolation: backup before write, rollback on errors, single-use guard for apply_payload
- LLM Surgeon: automatic memory conflict resolution via a separate LLM call with batching
- Fail-fast error handling: explicit logs on corrupted JSON, graceful CancelledError handling, latency diagnostics
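The transactional pattern in the list above (backup before write, rollback on errors, single-use guard) can be sketched as follows. This is not the project's apply_payload implementation; `MemoryTransaction` and the `.bak` naming are assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path


class MemoryTransaction:
    """Back up a JSON file, apply one change, restore the backup on failure."""

    def __init__(self, path: Path):
        self.path = path
        self.backup = path.with_name(path.name + ".bak")
        self._used = False

    def apply(self, mutate) -> None:
        """Run `mutate(data)` on the parsed file content exactly once."""
        if self._used:
            raise RuntimeError("transaction already used")  # single-use guard
        self._used = True
        shutil.copy2(self.path, self.backup)  # backup before write
        try:
            data = json.loads(self.path.read_text(encoding="utf-8"))
            mutate(data)  # may raise
            self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        except Exception:
            shutil.copy2(self.backup, self.path)  # rollback to the backup
            raise
```

The single-use flag is set before any work starts, so a transaction that failed cannot be retried by accident with a stale backup.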
🎯 Planned
🧹 Smart long-term memory cleanup
Goal: Automatic removal of stale or irrelevant data from memory.
Planned logic:
- Fact prioritization — relevance score based on access frequency and freshness
- Old episode archival — move rarely used episodes to cold storage
- Automatic duplicate merging — find and merge similar facts/goals
- Temporary goal expiration — auto-complete or archive goals with expired deadlines
- Configurable retention rules — set data lifetime for different categories
Result: Memory stays relevant, does not grow uncontrollably, and is not cluttered with duplicates and outdated information.
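Since this feature is still planned, the following is only one possible shape for the relevance score mentioned in the first item. The formula (logarithmic access frequency multiplied by exponential freshness decay) and the 30-day half-life are assumptions, not a design the project has committed to.

```python
import math
from datetime import datetime, timedelta


def relevance_score(access_count: int, last_access: datetime, now: datetime,
                    half_life_days: float = 30.0) -> float:
    """Combine access frequency with exponential freshness decay."""
    # Age in days; clamp to zero in case of clock skew.
    age_days = max((now - last_access) / timedelta(days=1), 0.0)
    freshness = 0.5 ** (age_days / half_life_days)  # 1.0 now, 0.5 after one half-life
    frequency = math.log1p(access_count)            # diminishing returns on access count
    return frequency * freshness
```

A record that has never been accessed scores zero regardless of age, so it becomes a natural first candidate for archival under this model.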
🔧 Refactoring & Type Safety
- Pydantic for structured payloads instead of dict[str, Any]
🏗️ Technical Debt
Consciously accepted trade-offs that are known and documented:
| Issue | Impact | Why we kept it |
|---|---|---|
| Any instead of Pydantic for operation payloads | No type safety, IDE does not suggest fields | It works, changing it requires rewriting 5+ modules |
| Tight coupling: GeminiLiveClient imports AudioIO | Hard to test, risk of circular dependency | No DI container, Protocols require refactoring |
| No CI/CD | No automatic type checking and tests | Project is developed locally, pytest is run manually |
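To illustrate what the Protocol-based decoupling in the second row could look like, here is a minimal sketch. `AudioSource`, `LiveClient` and `FakeAudio` are hypothetical stand-ins, not the project's GeminiLiveClient or AudioIO classes.

```python
from typing import Iterator, Protocol


class AudioSource(Protocol):
    """Anything that can yield raw audio chunks."""

    def chunks(self) -> Iterator[bytes]: ...


class LiveClient:
    """Stand-in for a live client that depends only on the protocol."""

    def __init__(self, source: AudioSource):
        self.source = source

    def bytes_streamed(self) -> int:
        # A real client would send each chunk to the API; here we just count.
        return sum(len(chunk) for chunk in self.source.chunks())


class FakeAudio:
    """Test double: no microphone needed."""

    def __init__(self, data: list[bytes]):
        self.data = data

    def chunks(self) -> Iterator[bytes]:
        return iter(self.data)
```

Because the client imports only the protocol, a test can pass `FakeAudio([b"ab", b"cde"])` instead of a real audio device, and the circular-import risk disappears.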
🤖 Agentic Engineering
Important: A significant part of this project was written using Agentic Engineering in pair-programming mode.
📊 Current Status
The project consists of three stable loops:
- Live conversation loop — realtime Speech-to-Speech dialogue with the user (Google Search, tool calling, Vosk wake-word)
- Post-session memory loop — automatic extraction, deduplication and knowledge persistence (people, places, facts, goals, experience, episodes, reflections, persona)
- Reliability loop — transactional isolation (backup + rollback), graceful shutdown, LLM timeouts, recovery of missed sessions
The voice agent is ready for daily use as-is. The main constraints are architectural debt (see section above), not functional issues.
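As an illustration of the LLM timeouts and latency diagnostics named in the reliability loop, a wrapper along these lines would cover both. It is a sketch under stated assumptions, not the code in llm_client_utils.py; `call_with_timeout` and the logger name are hypothetical.

```python
import asyncio
import logging
import time

logger = logging.getLogger("llm")


async def call_with_timeout(coro_factory, timeout_s: float, label: str):
    """Await an LLM call with a deadline and log how long it took."""
    started = time.perf_counter()
    try:
        # wait_for cancels the inner call and raises TimeoutError on expiry.
        return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info("%s finished in %.0f ms", label, elapsed_ms)
```

The `finally` block logs latency for successful, failed and timed-out calls alike, which is what makes the numbers useful for diagnostics.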