awesome-autoresearch

🔬 Awesome Autoresearch

A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by karpathy/autoresearch.

License: CC0-1.0

by Boring Dystopia Development

boringdystopia.ai · X @alvinunreal · Telegram Join channel

🛠️ General-purpose descendants

  • uditgoenka/autoresearch - Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
  • leo-lilinxiao/codex-autoresearch - Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.
  • supratikpm/gemini-autoresearch - Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. Also works in Antigravity IDE via .agents/skills/.
  • davebcn87/pi-autoresearch - pi extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
  • drivelineresearch/autoresearch-claude-code - Claude Code plugin/skill port of pi-autoresearch, with a clean experiment-loop workflow and a concrete biomechanics case study.
  • greyhaven-ai/autocontext - Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
  • jmilinovich/goal-md - Generalizes autoresearch into a GOAL.md pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
  • mutable-state-inc/autoresearch-at-home - Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
  • zkarimi22/autoresearch-anything - Generalizes autoresearch to any measurable metric: system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
  • Entrpi/autoresearch-everywhere - Cross-platform expansion that auto-detects hardware config and starts the loop. The "glue and generalization" half of autoresearch.
  • ShengranHu/ADAS - Automated Design of Agentic Systems (ICLR 2025). Meta-agents that invent novel agent architectures by programming them in code.
  • MaximeRobeyns/self_improving_coding_agent - SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
  • peterskoett/self-improving-agent - Alternative self-improving agent architecture with reflection and meta-learning cycles.
  • metauto-ai/HGM - Huxley-Gödel Machine for coding agents; applies self-improvement to SWE-bench performance via meta-level optimization.
  • gepa-ai/gepa - GEPA (Genetic-Pareto), ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural language reflection.
  • MrTsepa/autoevolve - GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo/Bradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.
  • HKUDS/ClawTeam - Agent swarm intelligence for autoresearch: spawns parallel GPU research directions, distributes work across agents, aggregates results.
  • Orchestra-Research/AI-Research-SKILLs - Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).
  • WecoAI/aideml - AIDE: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
  • weco.ai - Weco: Cloud platform for AIDE with observability, experiment tracking, and managed runs; brings the autoresearch loop into production.
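
The autoevolve entry above mentions rating strategies with Elo and branching from the Pareto front. As a rough illustration only, not taken from that repository, the sketch below shows a standard Elo update after one head-to-head match and a simple Pareto-front filter. The function names, the K-factor of 32, and the two example objectives (rating and a made-up speed score) are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The K-factor default is an illustrative assumption.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of candidates that no other candidate dominates (higher is better)."""
    return sorted(
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, 1.0))
    # Hypothetical objectives: (Elo rating, speed score), both higher-is-better.
    strategies = {"a": (1600.0, 0.90), "b": (1550.0, 0.95), "c": (1500.0, 0.80)}
    print(pareto_front(strategies))
```

In a loop of this kind, new mutations would be branched from the strategies returned by `pareto_front` rather than from a single best candidate, which keeps more than one trade-off alive.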

🔬 Research-agent systems

  • aiming-lab/AutoResearchClaw - End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
  • OpenRaiser/NanoResearch - End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.
  • wanshuiyin/Auto-claude-code-research-in-sleep - Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
  • Sibyl-Research-Team/AutoResearch-SibylSystem - Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
  • eimenhmdt/autoresearcher - Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
  • hyperspaceai/agi - Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
  • SakanaAI/AI-Scientist - The AI Scientist: First comprehensive system for fully automatic scientific discovery. From idea generation to paper writing with minimal human supervision.
  • SakanaAI/AI-Scientist-v2 - Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.
  • HKUDS/AI-Researcher - NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at novix.science.
  • openags/Auto-Research - OpenAGS: Orchestrates a team of AI agents across the full research lifecycle: lit review, hypothesis generation, experiments, manuscript writing, and peer review.
  • SamuelSchmidgall/AgentLaboratory - End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.
  • AgentRxiv - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
  • JinheonBaek/ResearchAgent - Iterative research idea generation over scientific literature with LLMs. Multi-agent review and feedback loops.
  • du-nlp-lab/MLR-Copilot - Autonomous ML research framework that generates ideas, implements experiments, and analyzes results.
  • MASWorks/ML-Agent - Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.
  • PouriaRouzrokh/LatteReview - Low-code Python package for automated systematic literature reviews via AI-powered agents.
  • LitLLM/LitLLM - AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
  • Agent Laboratory - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.
  • WecoAI/aideml - AIDE (AI-Driven Exploration): tree-search-based ML engineering agent that automates experiment design, code generation, and evaluation. Treats ML engineering as code optimization against any metric.
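
The hyperspaceai/agi entry above mentions CRDT leaderboards. One common way to get the CRDT property, sketched here under the assumption that only each agent's best score needs to survive, is a map whose merge takes the per-key maximum. Everything in the block (the `Leaderboard` alias, `merge`, `top`, and the agent names) is hypothetical and is not that project's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical state: agent id -> best score seen so far (higher is better).
Leaderboard = Dict[str, float]


def merge(a: Leaderboard, b: Leaderboard) -> Leaderboard:
    """Merge two replicas by keeping the maximum score per key.

    Per-key max is commutative, associative, and idempotent, so replicas
    converge no matter how often or in what order they exchange state.
    """
    merged = dict(a)
    for key, score in b.items():
        merged[key] = max(score, merged.get(key, float("-inf")))
    return merged


def record(board: Leaderboard, agent: str, score: float) -> Leaderboard:
    """Record a local result; implemented as a merge with a one-entry board."""
    return merge(board, {agent: score})


def top(board: Leaderboard, n: int = 3) -> List[Tuple[str, float]]:
    """Best n entries, ties broken by agent id for a deterministic order."""
    return sorted(board.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    replica_1 = record(record({}, "agent-a", 0.71), "agent-b", 0.64)
    replica_2 = record(record({}, "agent-b", 0.69), "agent-c", 0.80)
    print(top(merge(replica_1, replica_2)))
```

Because merging is order-independent, peers in a gossip network can exchange whole boards without coordination and still end up with the same ranking.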

💻 Platform ports and hardware forks

  • miolini/autoresearch-macos - Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
  • trevin-creator/autoresearch-mlx - MLX-native Apple Silicon port that keeps the upstream fixed-budget val_bpb loop while removing the PyTorch/CUDA dependency entirely.
  • jsegov/autoresearch-win-rtx - Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
  • iii-hq/n-autoresearch - Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic train.py loop.
  • lucasgelfond/autoresearch-webgpu - Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
  • tonitangpotato/autoresearch-engram - Fork with persistent cognitive memory: frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.
  • Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: Flash Attention 3 → PyTorch SDPA, removes H100-only kernel dependency. (upstream issue #208)
  • ArmanJR-Lab/autoautoresearch - Jetson AGX Orin port with a director: a Go binary that acts as a "creative director" injecting novelty (arxiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.
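
Several ports above keep the upstream val_bpb metric. Assuming val_bpb stands for validation bits per byte (an interpretation, not something this list states), the sketch below shows how summed token-level cross-entropy in nats could be converted into that unit. The function names are hypothetical, and the upstream code may compute the metric differently.

```python
import math
from typing import Sequence


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Dividing by ln(2) turns nats into bits; dividing by the byte count of the
    evaluated text normalizes by raw bytes rather than by tokens.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def bits_per_byte_from_losses(token_losses_nats: Sequence[float], text: str) -> float:
    """Same conversion, starting from per-token losses and the raw text."""
    return bits_per_byte(sum(token_losses_nats), len(text.encode("utf-8")))


if __name__ == "__main__":
    # Four tokens, each costing ln(2) nats (one bit), over a four-byte string.
    print(bits_per_byte_from_losses([math.log(2)] * 4, "abcd"))
```

Normalizing by bytes rather than tokens is one reason such a metric can stay comparable when a port changes the tokenizer or the hardware backend.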

🎯 Domain-specific adaptations

  • mattprusak/autoresearch-genealogy - Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
  • ArchishmanSengupta/autovoiceevals - Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
  • chrisworsey55/atlas-gic - Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.
  • RightNow-AI/autokernel - Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
  • Rkcr7/autoresearch-sudoku - Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
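
Several entries above describe the same keep-or-revert shape: propose one change, measure it, keep it only if the metric improves. A minimal sketch of that shape follows, using a toy numeric objective in place of a real benchmark. The names `keep_or_revert`, `propose_step`, and `score` are hypothetical; in the listed projects an agent edits code or prompts where this example perturbs a number.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def keep_or_revert(
    initial: T,
    propose: Callable[[T, random.Random], T],
    evaluate: Callable[[T], float],
    steps: int,
    seed: int = 0,
) -> Tuple[T, float, List[float]]:
    """Greedy loop: accept a candidate only if it strictly beats the best score.

    Returns the best state, its score, and the best-score history per step.
    """
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    history = [best_score]
    for _ in range(steps):
        candidate = propose(best, rng)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score  # keep
        # Otherwise the candidate is dropped, which is the "revert" branch.
        history.append(best_score)
    return best, best_score, history


def propose_step(x: float, rng: random.Random) -> float:
    """Toy mutation: nudge the current value by a small random amount."""
    return x + rng.uniform(-1.0, 1.0)


def score(x: float) -> float:
    """Toy metric with its maximum at x = 3 (higher is better)."""
    return -((x - 3.0) ** 2)


if __name__ == "__main__":
    best, best_score, history = keep_or_revert(0.0, propose_step, score, steps=200)
    print(f"best={best:.3f} score={best_score:.5f}")
```

The loop is deliberately greedy, so it can stall in local optima; that is the failure mode that swarm, tree-search, and "director" variants elsewhere in this list try to address.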

📊 Evaluation & benchmarks

  • snap-stanford/MLAgentBench - Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
  • openai/mle-bench - OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
  • chchenhui/mlrbench - MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
  • gersteinlab/ML-Bench - Evaluates LLMs and agents for ML tasks on repository-level code.
  • THUDM/AgentBench - Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.

📈 Notable use cases and writeups

  • Shopify Liquid optimization - Tobi Lütke shared an autoresearch-style optimization run on Shopify's Liquid engine, with public traces showing major parse/render speedups and allocation reductions. (tweet, PR with traces)
  • Driveline baseball biomechanics - Public autoresearch-style experiment loop for pitch-velocity prediction from biomechanics data, with large reported gains in model quality. (tweet)
  • Tennis XGBoost prediction + reward hacking writeup - Nick Oak documents an autoresearch-inspired loop for tennis match prediction, including where the optimization setup went wrong. (blog · repo · gamed branch)
  • Vesuvius Challenge ink detection swarm - Multi-agent experimental loop applied to ancient-scroll ink detection, with a strong writeup on cross-scroll generalization improvements. (blog)
  • Earth system model optimization - Hybrid workflow where an LLM proposes equation structures and a search process tunes parameters, showing how the autoresearch pattern extends into scientific modeling. (tweet, blog)
  • The Agentic Researcher - Paper: "A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning." Cites autoresearch as the canonical example of automated ML experiment pipelines. (arxiv 2603.15914)
  • Scaling Autoresearch to GPU Clusters - SkyPilot blog on running autoresearch on H100/H200 clusters with cloud orchestration. (SkyPilot Blog)
  • Self-Improving Coding Agents - Addy Osmani's practical guide to setting up self-improving agent loops with Claude Code. (article)
  • autoresearch@home: Distributed AI Research - SETI@home model applied to autoresearch: contribute GPU time to collective model optimization. (Ensue Blog)
  • Claude Code + AutoResearch for Self-Improving Skills - MindStudio guide to building self-improving AI skills using Claude Code with autoresearch patterns. (article)
  • 100 ML Experiments Overnight - Particula technical breakdown with domain-agnostic fork applications. (article)
  • PM's Guide to Autoresearch - Product manager's guide covering setup, community forks, and real-world applications. (article)
  • Autoresearch 101 Builder's Playbook - Substack deep-dive on applying autoresearch patterns to prompts, agents, and workflows with concrete examples. (article)
  • Kingy AI Technical Breakdown - Detailed technical walkthrough of the autoresearch loop architecture, mutation operators, and fitness function design. (article)
  • Fortune Feature - Business and industry context on why autoresearch matters for the future of autonomous AI agents. (article)

📚 Related resources

Curated lists and paper collections for AI agents, autonomous systems, and automated research:

📄 License

This list is released under CC0-1.0.
