🔬 Awesome Autoresearch
A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by karpathy/autoresearch.
by Boring Dystopia Development
Contents
- 🛠️ General-purpose descendants
- 🔬 Research-agent systems
- 💻 Platform ports and hardware forks
- 🎯 Domain-specific adaptations
- Evaluation & benchmarks
- Notable use cases and writeups
- Related resources
- License
🛠️ General-purpose descendants
- uditgoenka/autoresearch
- Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
- leo-lilinxiao/codex-autoresearch
- Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.
- supratikpm/gemini-autoresearch
- Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. Also works in Antigravity IDE via .agents/skills/.
- davebcn87/pi-autoresearch
- `pi` extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
- drivelineresearch/autoresearch-claude-code
- Claude Code plugin/skill port of `pi-autoresearch`, with a clean experiment-loop workflow and a concrete biomechanics case study.
- greyhaven-ai/autocontext
- Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
- jmilinovich/goal-md
- Generalizes autoresearch into a `GOAL.md` pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
- mutable-state-inc/autoresearch-at-home
- Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
- zkarimi22/autoresearch-anything
- Generalizes autoresearch to any measurable metric: system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
- Entrpi/autoresearch-everywhere
- Cross-platform expansion that auto-detects hardware config and starts the loop. The "glue and generalization" half of autoresearch.
- ShengranHu/ADAS
- Automated Design of Agentic Systems (ICLR 2025). Meta-agents that invent novel agent architectures by programming them in code.
- MaximeRobeyns/self_improving_coding_agent
- SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
- peterskoett/self-improving-agent
- Alternative self-improving agent architecture with reflection and meta-learning cycles.
- metauto-ai/HGM
- Huxley-Gödel Machine for coding agents: applies self-improvement to SWE-bench performance via meta-level optimization.
- gepa-ai/gepa
- GEPA (Genetic-Pareto), ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural language reflection.
- MrTsepa/autoevolve
- GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo/Bradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.
- HKUDS/ClawTeam
- Agent swarm intelligence for autoresearch: spawns parallel GPU research directions, distributes work across agents, aggregates results.
- Orchestra-Research/AI-Research-SKILLs
- Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).
- WecoAI/aideml
- AIDE: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
- weco.ai - Weco: Cloud platform for AIDE with observability, experiment tracking, and managed runs; brings the autoresearch loop into production.
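The "evaluate head-to-head, rate with Elo" step mentioned for MrTsepa/autoevolve can be illustrated with the textbook Elo update. This is a generic sketch, not code from that repository: the function names, the K-factor of 32, and the 400-point logistic scale are conventional assumptions chosen for illustration.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float,
    rating_b: float,
    result_a: float,
    k: float = 32.0,  # assumed K-factor; real systems tune this
) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one match.

    result_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (result_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated strategies; A wins the match.
    print(elo_update(1500.0, 1500.0, 1.0))
```

A strategy that keeps winning against higher-rated opponents climbs quickly, while wins over much weaker opponents move its rating very little, which is why a rating like this is a usable selection signal when there is no absolute metric.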
🔬 Research-agent systems
- aiming-lab/AutoResearchClaw
- End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
- OpenRaiser/NanoResearch
- End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.
- wanshuiyin/Auto-claude-code-research-in-sleep
- Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
- Sibyl-Research-Team/AutoResearch-SibylSystem
- Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
- eimenhmdt/autoresearcher
- Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
- hyperspaceai/agi
- Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
- SakanaAI/AI-Scientist
- The AI Scientist: presented by its authors as the first comprehensive system for fully automatic scientific discovery, from idea generation to paper writing, with minimal human supervision.
- SakanaAI/AI-Scientist-v2
- Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.
- HKUDS/AI-Researcher
- NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at novix.science.
- openags/Auto-Research
- OpenAGS: Orchestrates a team of AI agents across the full research lifecycle: lit review, hypothesis generation, experiments, manuscript writing, and peer review.
- SamuelSchmidgall/AgentLaboratory
- End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.
- AgentRxiv - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
- JinheonBaek/ResearchAgent
- Iterative research idea generation over scientific literature with LLMs. Multi-agent review and feedback loops.
- du-nlp-lab/MLR-Copilot
- Autonomous ML research framework that generates ideas, implements experiments, and analyzes results.
- MASWorks/ML-Agent
- Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.
- PouriaRouzrokh/LatteReview
- Low-code Python package for automated systematic literature reviews via AI-powered agents.
- LitLLM/LitLLM
- AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
- Agent Laboratory - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.
- WecoAI/aideml
- AIDE: AI-Driven Exploration, a tree-search-based ML engineering agent that automates experiment design, code generation, and evaluation. Treats ML engineering as code optimization against any metric.
💻 Platform ports and hardware forks
- miolini/autoresearch-macos
- Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
- trevin-creator/autoresearch-mlx
- MLX-native Apple Silicon port that keeps the upstream fixed-budget `val_bpb` loop while removing the PyTorch/CUDA dependency entirely.
- jsegov/autoresearch-win-rtx
- Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
- iii-hq/n-autoresearch
- Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic `train.py` loop.
- lucasgelfond/autoresearch-webgpu
- Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
- tonitangpotato/autoresearch-engram
- Fork with persistent cognitive memory: frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.
- Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: replaces Flash Attention 3 with PyTorch SDPA and removes the H100-only kernel dependency. (upstream issue #208)
- ArmanJR-Lab/autoautoresearch - Jetson AGX Orin port with a director: a Go binary that acts as a "creative director", injecting novelty (arxiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.
🎯 Domain-specific adaptations
- mattprusak/autoresearch-genealogy
- Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
- ArchishmanSengupta/autovoiceevals
- Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
- chrisworsey55/atlas-gic
- Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.
- RightNow-AI/autokernel
- Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
- Rkcr7/autoresearch-sudoku
- Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
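Several entries above describe the same keep-or-revert shape: make one edit, benchmark it, keep it only if the metric improves, otherwise revert. The sketch below shows that loop on a toy problem. It is an illustration under stated assumptions, not code from any listed project: `keep_or_revert`, `toy_score`, and `toy_propose` are hypothetical names, and a real setup would replace the scorer with a benchmark run and the proposer with an agent-generated edit.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]


def keep_or_revert(
    start: Params,
    score: Callable[[Params], float],
    propose: Callable[[Params, random.Random], Params],
    steps: int = 200,
    seed: int = 0,
) -> Tuple[Params, float, int]:
    """Greedy loop where a higher score is better.

    Returns (best candidate, its score, number of edits kept).
    """
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)
    kept = 0
    for _ in range(steps):
        trial = propose(best, rng)    # edit one thing
        trial_score = score(trial)    # benchmark the edit
        if trial_score > best_score:  # keep only strict improvements
            best, best_score = trial, trial_score
            kept += 1
        # otherwise the trial is discarded, which is the "revert"
    return best, best_score, kept


def toy_score(p: Params) -> float:
    """Hypothetical metric: peaks at (3, -1)."""
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)


def toy_propose(p: Params, rng: random.Random) -> Params:
    """Hypothetical edit: nudge one randomly chosen coordinate."""
    trial = list(p)
    i = rng.randrange(len(trial))
    trial[i] += rng.uniform(-0.5, 0.5)
    return trial


if __name__ == "__main__":
    best, best_score, kept = keep_or_revert([0.0, 0.0], toy_score, toy_propose)
    print(best, best_score, kept)
```

Because only strict improvements are kept, the best score never decreases. The same property makes the loop prone to stalling in local optima and to gaming a weak metric, which is what some of the forks and writeups in this list address.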
Evaluation & benchmarks
- snap-stanford/MLAgentBench
- Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
- openai/mle-bench
- OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
- chchenhui/mlrbench
- MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
- gersteinlab/ML-Bench
- Evaluates LLMs and agents for ML tasks on repository-level code.
- THUDM/AgentBench
- Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.
Notable use cases and writeups
- Shopify Liquid optimization - Tobi Lütke shared an autoresearch-style optimization run on Shopify's Liquid engine, with public traces showing major parse/render speedups and allocation reductions. (tweet, PR with traces)
- Driveline baseball biomechanics - Public autoresearch-style experiment loop for pitch-velocity prediction from biomechanics data, with large reported gains in model quality. (tweet)
- Tennis XGBoost prediction + reward hacking writeup - Nick Oak documents an autoresearch-inspired loop for tennis match prediction, including where the optimization setup went wrong. (blog · repo · gamed branch)
- Vesuvius Challenge ink detection swarm - Multi-agent experimental loop applied to ancient-scroll ink detection, with a strong writeup on cross-scroll generalization improvements. (blog)
- Earth system model optimization - Hybrid workflow where an LLM proposes equation structures and a search process tunes parameters, showing how the autoresearch pattern extends into scientific modeling. (tweet, blog)
- The Agentic Researcher - Paper: "A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning." Cites autoresearch as the canonical example of automated ML experiment pipelines. (arxiv 2603.15914)
- Scaling Autoresearch to GPU Clusters - SkyPilot blog on running autoresearch on H100/H200 clusters with cloud orchestration. (SkyPilot Blog)
- Self-Improving Coding Agents - Addy Osmani's practical guide to setting up self-improving agent loops with Claude Code. (article)
- autoresearch@home: Distributed AI Research - SETI@home model applied to autoresearch: contribute GPU time to collective model optimization. (Ensue Blog)
- Claude Code + AutoResearch for Self-Improving Skills - MindStudio guide to building self-improving AI skills using Claude Code with autoresearch patterns. (article)
- 100 ML Experiments Overnight - Particula technical breakdown with domain-agnostic fork applications. (article)
- PM's Guide to Autoresearch - Product manager's guide covering setup, community forks, and real-world applications. (article)
- Autoresearch 101 Builder's Playbook - Substack deep-dive on applying autoresearch patterns to prompts, agents, and workflows with concrete examples. (article)
- Kingy AI Technical Breakdown - Detailed technical walkthrough of the autoresearch loop architecture, mutation operators, and fitness function design. (article)
- Fortune Feature - Business and industry context on why autoresearch matters for the future of autonomous AI agents. (article)
Related resources
Curated lists and paper collections for AI agents, autonomous systems, and automated research:
- ai-agents-2030/awesome-deep-research-agent
- Curated list of deep research agent papers and systems.
- YoungDubbyDu/LLM-Agent-Optimization
- Papers on LLM agent optimization methods.
- VoltAgent/awesome-ai-agent-papers
- Curated AI agent papers from 2026: agent engineering, memory, evaluation, workflows, and autonomous systems.
- masamasa59/ai-agent-papers
- AI agent research papers updated biweekly via automated arxiv search with curated selection.
- tmgthb/Autonomous-Agents
- Autonomous agents research papers, updated daily.
- HKUST-KnowComp/Awesome-LLM-Scientific-Discovery
- EMNLP 2025 survey on LLMs in scientific discovery.
- openags/Awesome-AI-Scientist-Papers
- Collection of AI Scientist / Robot Scientist papers.
- agenticscience.github.io - Survey: "From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery."
- dspy.ai/GEPA - DSPy integration of GEPA reflective prompt optimizer for compound AI systems.
- OpenAI Cookbook: Self-Evolving Agents - Cookbook for autonomous agent retraining using GEPA-style reflective evolution.
- WecoAI/awesome-autoresearch
- Curated list of AutoResearch use cases with verifiable traces and progress charts, organized by domain (LLM training, GPU kernels, voice agents, trading, etc.).