mixpanel_data
⚠️ Pre-release Software: This package is under active development and not yet published to PyPI. APIs may change between versions.
A complete programmable interface to Mixpanel analytics—Python library and CLI for discovery, querying, and data extraction.
Why mixpanel_data?
Mixpanel's web UI is powerful for interactive exploration, but programmatic access requires navigating multiple REST endpoints with different conventions. mixpanel_data provides a unified interface: discover your schema, run analytics queries, and extract data—all through consistent Python methods or CLI commands.
It covers core analytics—segmentation, funnels, retention, saved reports—plus entity management (dashboards, reports, cohorts), raw JQL execution, and local SQL analysis via DuckDB.
Installation
Install directly from GitHub (package not yet published to PyPI):
pip install git+https://github.com/jaredmcfarland/mixpanel_data.git
Requires Python 3.10+. Verify installation:
mp --version
Quick Start
1. Authenticate
Option A: OAuth Login (interactive, recommended)
mp auth login --region us --project-id 12345 # Opens browser
mp auth status # Verify connection
Option B: Service Account (scripts, CI/CD)
# Interactive prompt (secure)
mp auth add production --username sa_xxx --project 12345 --region us
# You'll be prompted for the service account secret with hidden input
mp auth test # Verify connection
Alternative methods for CI/CD:
# Via inline environment variable (secret is only exposed to this command)
MP_SECRET=xxx mp auth add production --username sa_xxx --project 12345
# Via stdin (useful when secret is already in a variable)
echo "$SECRET" | mp auth add production --username sa_xxx --project 12345 --secret-stdin
Or set all credentials as environment variables: MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION
2. Explore Your Data
mp inspect events # List all events
mp inspect properties --event Purchase # Properties for an event
mp inspect funnels # Saved funnels
3. Fetch Events to Local Storage
# Sequential fetch for small date ranges
mp fetch events jan --from 2025-01-01 --to 2025-01-31
# Parallel fetch for large date ranges (up to 10x faster)
mp fetch events q1 --from 2025-01-01 --to 2025-03-31 --parallel
4. Query with SQL
mp query sql "SELECT event_name, COUNT(*) FROM jan GROUP BY 1 ORDER BY 2 DESC" --format table
5. Run Live Analytics
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --on country
6. Or Stream Directly (No Storage)
# Stream events as JSONL for piping to other tools
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout | jq '.event_name'
Python API
import mixpanel_data as mp
ws = mp.Workspace()
# Discover what's in your project
events = ws.list_events()
props = ws.list_properties("Purchase")
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
# Run live analytics queries
result = ws.segmentation(
event=events[0].name,
from_date="2025-01-01",
to_date="2025-01-31",
on="country"
)
print(result.df) # pandas DataFrame
# Query a saved funnel
funnel = ws.funnel(
funnel_id=funnels[0].id,
from_date="2025-01-01",
to_date="2025-01-31"
)
# Manage entities
dashboards = ws.list_dashboards()
cohort = ws.create_cohort(mp.CreateCohortParams(name="Power Users"))
# Fetch events into local DuckDB for SQL analysis
ws.fetch_events("jan", from_date="2025-01-01", to_date="2025-01-31")
# Use parallel=True for large date ranges (up to 10x faster)
ws.fetch_events("q1", from_date="2025-01-01", to_date="2025-03-31", parallel=True)
# Fetch profiles (use parallel=True for large datasets, up to 5x faster)
ws.fetch_profiles("users", parallel=True)
df = ws.sql("""
SELECT
DATE_TRUNC('day', event_time) as day,
event_name,
COUNT(*) as count
FROM jan
GROUP BY 1, 2
ORDER BY 1, 3 DESC
""")
Temporary Workspaces
For one-off analysis without persisting data:
# Ephemeral: temp file with compression (best for large datasets)
with mp.Workspace.ephemeral() as ws:
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted
# In-memory: zero disk footprint (best for small datasets, testing)
with mp.Workspace.memory() as ws:
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# No files ever created
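The cleanup guarantee behind these contexts can be illustrated with a stdlib-only sketch. This is not mixpanel_data's actual implementation—sqlite3 stands in for DuckDB so the example runs without dependencies—but the pattern is the same: a temp-file-backed database that is deleted however the block exits.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

@contextmanager
def ephemeral_db():
    # Create a temp-file-backed database; remove it no matter how the block exits.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    try:
        yield con, path
    finally:
        con.close()
        os.remove(path)

with ephemeral_db() as (con, path):
    con.execute("CREATE TABLE t (x INTEGER)")
    con.execute("INSERT INTO t VALUES (42)")
    assert os.path.exists(path)   # file exists inside the context

assert not os.path.exists(path)   # database automatically deleted on exit
```

An in-memory variant would simply connect to `:memory:` and skip the file handling entirely, which is why it suits small datasets and testing.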
Streaming
For ETL pipelines or one-time processing without storage:
# Stream events directly to external system
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
send_to_warehouse(event)
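A common sink for such a stream is JSONL, one JSON object per line. The sketch below assumes (as the loop above implies) that stream_events() yields one event dict at a time; fake_stream is a hypothetical stand-in generator so the example runs without credentials.

```python
import io
import json

# Stand-in for ws.stream_events(...), which yields one event dict at a time.
def fake_stream():
    yield {"event_name": "Purchase", "time": 1735689600}
    yield {"event_name": "Signup", "time": 1735689660}

# Write each event as one JSON line. Nothing is held beyond a single event,
# so the same pattern scales to arbitrarily large exports.
out = io.StringIO()
for event in fake_stream():
    out.write(json.dumps(event) + "\n")

lines = out.getvalue().splitlines()
print(len(lines))  # 2
```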
CLI Reference
mp auth — Authentication: login, logout, status, token (OAuth); list, add, remove, switch, show, test (service accounts)
mp fetch — Extract data: events, profiles (add --parallel for up to 10x faster event exports or 5x faster profile exports, --stdout to stream as JSONL)
mp query — Run analytics: sql, segmentation, funnel, retention, jql, saved-report, flows, and 7 more
mp dashboards — Dashboard management: list, create, get, update, delete, favorite, pin, blueprints, and more
mp reports — Report management: list, create, get, update, delete, bulk operations, history
mp cohorts — Cohort management: list, create, get, update, delete, bulk operations
mp inspect — Discover schema: events, properties, funnels, cohorts, bookmarks; local DB: tables, schema, drop, and 5 more
All commands support --format (json, jsonl, table, csv, plain) and --help.
Filtering with --jq
Commands that output JSON support --jq for client-side filtering:
# Get first 5 events
mp inspect events --format json --jq '.[:5]'
# Extract total from segmentation
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
--format json --jq '.total'
# Filter SQL results
mp query sql "SELECT * FROM events LIMIT 100" --format json \
--jq '.[] | select(.event_name == "Purchase")'
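Because --jq runs client-side on the command's JSON output, the last filter above is equivalent to a plain list comprehension over the parsed payload. A dependency-free sketch (the field names mirror the examples above but are illustrative):

```python
import json

# Sample payload shaped like `mp query sql ... --format json` output.
payload = json.loads("""[
  {"event_name": "Purchase", "amount": 19.99},
  {"event_name": "Page View", "amount": null},
  {"event_name": "Purchase", "amount": 4.99}
]""")

# Equivalent of: --jq '.[] | select(.event_name == "Purchase")'
purchases = [row for row in payload if row["event_name"] == "Purchase"]
print(len(purchases))  # 2
```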
See CLI Reference for complete documentation.
DuckDB JSON Queries
Mixpanel properties are stored as JSON columns:
-- Extract property
SELECT properties->>'$.country' as country FROM events
-- Filter on property
SELECT * FROM events WHERE properties->>'$.plan' = 'premium'
-- Cast numeric
SELECT SUM(CAST(properties->>'$.amount' AS DECIMAL)) FROM events
Documentation
Full documentation: jaredmcfarland.github.io/mixpanel_data
For Humans and Agents
The entire surface area is self-documenting. Every CLI command supports --help with complete argument descriptions. The Python API uses typed dataclasses for all return values—IDEs show you what fields are available. Exceptions include error codes and context for programmatic handling. This means both human developers and AI coding agents can explore capabilities without external documentation.
Key design features:
- Entity CRUD: Full lifecycle management of dashboards, reports, and cohorts via Mixpanel App API
- Discoverable schema: list_events(), list_properties(), list_funnels(), list_cohorts(), and list_bookmarks() reveal what's in your project before you query
- Consistent interfaces: Same operations available as Python methods and CLI commands
- Structured output: All CLI commands support --format json for machine-readable responses, plus --jq for inline filtering
- Parallel fetching: Up to 10x faster event exports for large date ranges and 5x faster profile exports via --parallel or parallel=True
- Local SQL iteration: Fetch once, query repeatedly—no re-fetching needed
- Dual authentication: Service accounts (Basic Auth) for automation, OAuth 2.0 PKCE for interactive use
- Typed exceptions: Error codes and context for programmatic handling
Claude Code Plugin
This project also includes a Claude Code plugin that brings analytics workflows directly into conversational AI interactions.
Ask questions about your Mixpanel data in natural language and get guided, interactive analytics workflows—all within Claude Code.
Installation:
/plugin marketplace add jaredmcfarland/mixpanel_data
/plugin install mixpanel-data
Then restart Claude Code.
What you get:
- Auto-discovery skill: The mixpanel-data skill activates when you mention Mixpanel, analytics, funnels, or retention—it loads comprehensive reference docs and guides your workflow
- 7 interactive commands:
  - /mp-auth — Secure credential management with account switching
  - /mp-inspect — 12-operation schema explorer (events, properties, funnels, cohorts, tables)
  - /mp-fetch — Guided data ingestion with validation
  - /mp-query — Universal query builder (SQL, JQL, live analytics)
  - /mp-funnel — Conversion analysis with visualizations
  - /mp-retention — Retention curves and cohort analysis
  - /mp-report — Comprehensive reporting with automated insights
- 4 specialist agents, auto-invoked based on your questions:
  - mixpanel-analyst — General analytics, SQL/JQL query building
  - funnel-optimizer — Conversion analysis and drop-off diagnostics
  - retention-specialist — Cohort behavior and retention curves
  - jql-expert — Advanced JavaScript queries and transformations
- Multiple query paths: SQL (DuckDB local analysis), JQL (complex transforms), or Mixpanel API (live analytics)
- Secure by design: Credentials managed outside conversation context
Learn more: Plugin Documentation
Contributing
See CONTRIBUTING.md for development setup and guidelines.
License
MIT