awesome-autoresearch

Security Audit: Warn
Health: Pass
  • License — License: NOASSERTION
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 2219 GitHub stars
Code: Warn
  • Code scan incomplete — No supported source files were scanned during light audit
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This repository is a curated directory (an "awesome list") of autonomous research agents, improvement loops, and resources inspired by Karpathy's autoresearch concept. It serves as an index rather than a standalone executable application or MCP server.

Security Assessment
Because this tool is a documentation resource containing links to other projects, it does not directly execute shell commands, make network requests, or access sensitive data. The light code scan was incomplete because there are no standard source files to analyze. No hardcoded secrets or dangerous permissions were detected. Overall risk: Low.

Quality Assessment
The project is actively maintained, with its most recent push occurring today. It has strong community trust, evidenced by over 2,000 GitHub stars (2,219 at audit time). While the automated check returned "NOASSERTION" for the license, the README clearly states that the content is released under CC0-1.0.

Verdict
Safe to use.
SUMMARY

A curated list of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch.

README.md

🔬 Awesome Autoresearch

A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by karpathy/autoresearch.

Awesome
PRs Welcome
License: CC0-1.0

🛠️ General-purpose descendants

  • kayba-ai/recursive-improve - Recursive self-improvement framework where agents capture execution traces, analyze failure patterns, and apply targeted fixes with keep-or-revert evaluation.
  • vukrosic/auto-research - Docs-only control plane for an open autonomous AI research lab: a file-based operating model for human direction and agent execution.
  • uditgoenka/autoresearch - Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
  • leo-lilinxiao/codex-autoresearch - Codex-native autoresearch skill with resume support, lessons carried across runs, optional parallel experiments, and mode-specific workflows.
  • SeeleAI/Thoth - Dashboard-first Claude Code and Codex runtime for autoresearch, with durable runs, locked work items, visible ledgers, and reviewable verdicts.
  • supratikpm/gemini-autoresearch - Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M-token context. Also works in Antigravity IDE via .agents/skills/.
  • davebcn87/pi-autoresearch - pi extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
  • drivelineresearch/autoresearch-claude-code - Claude Code plugin/skill port of pi-autoresearch, with a clean experiment-loop workflow and a concrete biomechanics case study.
  • greyhaven-ai/autocontext - Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
  • jmilinovich/goal-md - Generalizes autoresearch into a GOAL.md pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
  • james-s-tayler/lazy-developer - Claude Code skill that orchestrates autoresearch across a prioritized sequence of optimization goals (coverage, test speed, build speed, complexity, LOC, performance) using GOAL.md as the engine. Supports standalone and Ralph Mode multi-instance execution.
  • mutable-state-inc/autoresearch-at-home - Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
  • zkarimi22/autoresearch-anything - Generalizes autoresearch to any measurable metric: system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
  • Entrpi/autoresearch-everywhere - Cross-platform expansion that auto-detects the hardware configuration and starts the loop; the "glue and generalization" half of autoresearch.
  • ShengranHu/ADAS - Automated Design of Agentic Systems (ICLR 2025). Meta-agents invent novel agent architectures by programming them in code.
  • MaximeRobeyns/self_improving_coding_agent - SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
  • peterskoett/self-improving-agent - Alternative self-improving agent architecture with reflection and meta-learning cycles.
  • metauto-ai/HGM - Huxley-Gödel Machine for coding agents: applies self-improvement to SWE-bench performance via meta-level optimization.
  • gepa-ai/gepa - GEPA (Genetic-Pareto), ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural-language reflection.
  • sentient-agi/EvoSkill - Automated skill discovery for coding agents: evolves reusable skills and prompts from failed trajectories against benchmarks, with support for Claude Code, Codex CLI, OpenCode, OpenHands, and Goose.
  • MrTsepa/autoevolve - GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate them head-to-head, rate with Elo/Bradley-Terry, and branch from the Pareto front. The agent reads match traces to target mutations. Works as a Claude Code skill.
  • HKUDS/ClawTeam - Agent swarm intelligence for autoresearch: spawns parallel GPU research directions, distributes work across agents, and aggregates results.
  • Orchestra-Research/AI-Research-SKILLs - Comprehensive skill library including autoresearch orchestration with a two-loop architecture (inner optimization + outer synthesis).
  • WecoAI/aideml - AIDE: tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
  • weco.ai - Weco: cloud platform for AIDE with observability, experiment tracking, and managed runs, bringing the autoresearch loop into production.
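Several entries above (recursive-improve's keep-or-revert evaluation, for example) share one control flow: propose a single change, measure it, and keep it only if the metric improves. The following is a minimal Python sketch of that loop under stated assumptions, not the implementation of any listed project: `keep_or_revert`, `propose`, and `evaluate` are hypothetical names, the metric is assumed to be lower-is-better (in the spirit of upstream's val_bpb), and a toy quadratic stands in for a real training run.

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class LoopResult(Generic[T]):
    best: T            # best candidate found so far
    best_score: float  # its metric value (lower is better)
    kept: int          # experiments that improved the metric
    reverted: int      # experiments that were discarded


def keep_or_revert(
    initial: T,
    propose: Callable[[T, random.Random], T],
    evaluate: Callable[[T], float],
    iterations: int,
    seed: int = 0,
) -> LoopResult[T]:
    """Greedy keep-or-revert loop (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    kept = reverted = 0
    for _ in range(iterations):
        candidate = propose(best, rng)  # one change per experiment
        score = evaluate(candidate)     # e.g. a validation loss
        if score < best_score:
            # Keep: the candidate becomes the new baseline.
            best, best_score = candidate, score
            kept += 1
        else:
            # Revert: drop the candidate and continue from the current best.
            reverted += 1
    return LoopResult(best, best_score, kept, reverted)


if __name__ == "__main__":
    # Toy stand-in for an experiment: minimize (x - 3)^2 by random perturbations.
    result = keep_or_revert(
        initial=0.0,
        propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
        evaluate=lambda x: (x - 3.0) ** 2,
        iterations=200,
    )
    print(f"best={result.best:.3f} kept={result.kept} reverted={result.reverted}")
```

Real loops would replace `evaluate` with a full training or benchmark run and the in-memory revert with something heavier (for instance, resetting a git working tree). The strict `<` comparison means ties are reverted, which keeps the baseline from drifting on equal scores.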

🔬 Research-agent systems

  • aiming-lab/AutoResearchClaw - End-to-end research pipeline that turns a topic into a literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
  • OpenRaiser/NanoResearch - End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on Slurm, analyzes real results, and writes papers grounded in those outputs.
  • kaust-ark/ARK - ARK (Automatic Research Kit): an idea + venue → paper pipeline orchestrating 6 agents (proposal analysis, literature search, Slurm experiments, LaTeX drafting, iterative peer review). Controlled via CLI, web dashboard, or Telegram.
  • wanshuiyin/Auto-claude-code-research-in-sleep - Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
  • skyllwt/AutoSci - Wiki-centric full-lifecycle research platform built on Claude Code, realizing Karpathy's LLM-Wiki vision. 20+ skills cover the full loop: ingest → ideate → novelty check → experiment design / run / eval → paper writing. Research state lives in a structured knowledge wiki with an interactive graph.
  • Sibyl-Research-Team/AutoResearch-SibylSystem - Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
  • eimenhmdt/autoresearcher - Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
  • hyperspaceai/agi - Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
  • Human-Agent-Society/CORAL - CORAL: autonomous multi-agent evolution for open-ended discovery (arXiv:2604.01658). Long-running agents with shared persistent memory, asynchronous execution, and heartbeat-based interventions; reports SOTA on 10 math/algorithmic/systems tasks.
  • SakanaAI/AI-Scientist - The AI Scientist: presented as the first comprehensive system for fully automatic scientific discovery, from idea generation to paper writing with minimal human supervision.
  • SakanaAI/AI-Scientist-v2 - Workshop-level automated scientific discovery via agentic tree search. Removes v1's template dependency and generalizes across research domains.
  • AweAI-Team/AiScientist - AiScientist: long-horizon ML research lab with hierarchical orchestration and File-as-Bus coordination, in which workspace files act as the durable system of record. Drives autonomous paper-reproduction (PaperBench) and competition-style MLE-Bench iteration loops under fixed compute/time budgets (arXiv:2604.13018).
  • HKUDS/AI-Researcher - NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at novix.science.
  • openags/Auto-Research - OpenAGS orchestrates a team of AI agents across the full research lifecycle (literature review, hypothesis generation, experiments, manuscript writing, and peer review).
  • SamuelSchmidgall/AgentLaboratory - End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.
  • AgentRxiv - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
  • JinheonBaek/ResearchAgent - Iterative research-idea generation over scientific literature with LLMs, using multi-agent review and feedback loops.
  • du-nlp-lab/MLR-Copilot - Autonomous ML research framework that generates ideas, implements experiments, and analyzes results.
  • MASWorks/ML-Agent - Reinforcing LLM agents for autonomous ML engineering: the agent learns from trial and error to improve model performance.
  • PouriaRouzrokh/LatteReview - Low-code Python package for automated systematic literature reviews via AI-powered agents.
  • LitLLM/LitLLM - AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
  • Agent Laboratory - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.
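hyperspaceai/agi above mentions CRDT leaderboards for peers that gossip their findings. As a hedged illustration of why a CRDT suits that job, and not a description of that project's actual data structure, the sketch below uses a state-based grow-only map whose merge keeps the per-key maximum of `(score, node)` tuples. Because taking a maximum is commutative, associative, and idempotent, replicas converge no matter how often or in what order gossip arrives. `LeaderboardCRDT`, `record`, `merge`, and `top` are hypothetical names, and higher scores are assumed to be better.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderboardCRDT:
    """Grow-only map: experiment id -> best (score, node) seen by this replica."""

    entries: dict[str, tuple[float, str]] = field(default_factory=dict)

    def _absorb(self, exp_id: str, value: tuple[float, str]) -> None:
        # Tuples compare by score first, then node name, so ties resolve
        # deterministically and every replica picks the same winner.
        current = self.entries.get(exp_id)
        if current is None or value > current:
            self.entries[exp_id] = value

    def record(self, exp_id: str, score: float, node: str) -> None:
        """Record a local result (higher score is better in this sketch)."""
        self._absorb(exp_id, (score, node))

    def merge(self, other: "LeaderboardCRDT") -> None:
        """Join another replica's state; safe to repeat and to apply in any order."""
        for exp_id, value in other.entries.items():
            self._absorb(exp_id, value)

    def top(self, k: int) -> list[tuple[str, tuple[float, str]]]:
        """Return the k best experiments by (score, node), best first."""
        return sorted(self.entries.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    a, b = LeaderboardCRDT(), LeaderboardCRDT()
    a.record("cfg-1", 0.71, "node-a")
    b.record("cfg-1", 0.74, "node-b")
    b.record("cfg-2", 0.69, "node-b")
    a.merge(b)
    a.merge(b)  # re-delivering the same gossip changes nothing
    print(a.top(2))
```

The design choice is that no coordinator is needed: any two peers can exchange full state at any time, and duplicated or reordered messages cannot corrupt the leaderboard.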

💻 Platform ports and hardware forks

  • gianfrancopiana/openclaw-autoresearch - OpenClaw port of pi-autoresearch; autonomous experiment loop for any optimization target with statistical confidence scoring.
  • miolini/autoresearch-macos - Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
  • trevin-creator/autoresearch-mlx - MLX-native Apple Silicon port that keeps the upstream fixed-budget val_bpb loop while removing the PyTorch/CUDA dependency entirely.
  • jsegov/autoresearch-win-rtx - Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
  • iii-hq/n-autoresearch - Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic train.py loop.
  • lucasgelfond/autoresearch-webgpu - Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
  • tonitangpotato/autoresearch-engram - Fork with persistent cognitive memory: frequency-weighted retrieval of cross-session knowledge for better experiment continuity.
  • Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: replaces Flash Attention 3 with PyTorch SDPA, removing the H100-only kernel dependency.
  • ArmanJR-Lab/autoautoresearch - Jetson AGX Orin port with a director: a Go binary that acts as a "creative director," injecting novelty (arXiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes a multi-experiment comparison (baseline vs. director-guided) with detailed stall analysis.
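Ports such as openclaw-autoresearch mention statistical confidence scoring, which matters most on consumer or shared hardware where run-to-run noise can swamp a small gain. One common way to score this, offered only as an assumption about what such scoring can look like and not as the method any listed fork uses, is Welch's t-statistic over repeated runs of the baseline and the candidate. The function names and the default threshold of 2.0 are illustrative choices, and the metric is assumed to be lower-is-better.

```python
import math
import statistics
from typing import Sequence


def welch_t(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Welch's t-statistic for mean(baseline) - mean(candidate).

    With a lower-is-better metric, a large positive value means the candidate
    is better by more than the observed run-to-run noise would suggest.
    """
    if len(baseline) < 2 or len(candidate) < 2:
        raise ValueError("need at least two runs of each variant")
    mean_b, mean_c = statistics.mean(baseline), statistics.mean(candidate)
    # Sample variances (n - 1 denominator), not assumed equal between variants.
    var_b, var_c = statistics.variance(baseline), statistics.variance(candidate)
    std_err = math.sqrt(var_b / len(baseline) + var_c / len(candidate))
    if std_err == 0.0:
        # No observed noise at all: only the sign of the difference is informative.
        diff = mean_b - mean_c
        return math.inf if diff > 0 else (-math.inf if diff < 0 else 0.0)
    return (mean_b - mean_c) / std_err


def confident_improvement(
    baseline: Sequence[float], candidate: Sequence[float], threshold: float = 2.0
) -> bool:
    """Keep the candidate only if its improvement clears the t threshold."""
    return welch_t(baseline, candidate) >= threshold
```

For example, baseline runs of 1.00, 1.02, 0.98 against candidate runs of 0.90, 0.91, 0.89 give t ≈ 7.7 and pass, while candidate runs of 0.99, 1.01, 0.97 give t ≈ 0.6 and would be reverted. A fixed threshold ignores degrees of freedom, so this is a heuristic rather than a calibrated p-value.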

🎯 Domain-specific adaptations

  • mattprusak/autoresearch-genealogy - Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
  • ArchishmanSengupta/autovoiceevals - Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
  • chrisworsey55/atlas-gic - Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against a rolling Sharpe ratio instead of model loss.
  • RightNow-AI/autokernel - Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
  • Agent-Analytics/autoresearch-growth - Applies autoresearch to landing-page positioning and A/B test candidates, using analytics snapshots and measured experiment results to seed subsequent rounds.
  • Rkcr7/autoresearch-sudoku - Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
  • jeongph/autospec - Reads natural-language business rules and autonomously builds a Spring Boot service with tests via the keep-or-revert loop, evaluating each cycle with a Gradle build and JUnit XML results. Grew a 119-line skeleton to 950 lines in 5 cycles.
  • vlasenkoalexey/tpu_performance_autoresearch_wiki - Applies the autoresearch keep-or-revert loop to TPU model performance (MFU / tokens-per-sec) on v6e hardware: profiles each run through an XProf MCP server, makes one model-code change per experiment, and keeps or reverts against measured MFU. Pairs the loop with a Karpathy-style LLM wiki for domain knowledge and per-experiment optimization traces; includes Llama3-8B and Qwen3-8B case studies across JAX and torchax lanes.
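tpu_performance_autoresearch_wiki keeps or reverts against measured MFU (model FLOPs utilization), which that project obtains by profiling runs through XProf. As a rough, independent cross-check, MFU for dense transformer training is often approximated as 6 × parameters × tokens per second divided by the aggregate peak FLOP/s of the hardware, where the 6 counts forward (2N) plus backward (4N) matmul FLOPs per token and attention FLOPs are ignored. The sketch below encodes only that approximation; `approx_mfu` is a hypothetical helper, and the peak FLOP/s value must come from the accelerator's own spec sheet (none is assumed here).

```python
def approx_mfu(
    n_params: float,
    tokens_per_sec: float,
    peak_flops_per_chip: float,
    n_chips: int = 1,
) -> float:
    """Approximate model FLOPs utilization for dense transformer training.

    Uses the common 6 * N FLOPs-per-token estimate (2N forward + 4N backward),
    which ignores attention and other non-matmul work. `tokens_per_sec` is the
    global training throughput summed across all `n_chips`.
    """
    if n_params <= 0 or tokens_per_sec < 0 or peak_flops_per_chip <= 0 or n_chips < 1:
        raise ValueError("invalid inputs")
    achieved_flops = 6.0 * n_params * tokens_per_sec
    return achieved_flops / (peak_flops_per_chip * n_chips)
```

With purely illustrative numbers (an 8e9-parameter model at 10,000 tokens/s on one chip with a hypothetical peak of 1e15 FLOP/s), `approx_mfu(8e9, 1e4, 1e15)` returns about 0.48. In a keep-or-revert loop, a change would be kept only when this figure, or the profiler's own measurement, rises.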

📊 Evaluation & benchmarks

  • snap-stanford/MLAgentBench - Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
  • OpenAI/mle-bench - OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
  • chchenhui/mlrbench - MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
  • gersteinlab/ML-Bench - Evaluates LLMs and agents for ML tasks on repository-level code.
  • THUDM/AgentBench - Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.

📈 Notable use cases and writeups

📚 Related resources

Curated lists and paper collections for AI agents, autonomous systems, and automated research:

📄 License

This list is released under CC0-1.0.
