Mimosa-AI 🌼🔬

Name: Mimosa-AI
Author: HolobiomicsLab

An Open-source Evolving AI-Framework for Autonomous Scientific Research

Mimosa-AI 🌼 — like the mimosa plant that senses, learns, and adapts — Mimosa is an open-source framework for autonomous scientific research that automatically synthesizes task-specific multi-agent workflows and refines them through execution feedback. Built around MCP-based tool discovery, code-generating agents, and LLM-based evaluation, it offers academics a modular and auditable alternative to closed black-box systems.

Objectives:

Reproduce scientific studies with traceability and rigor
Automate computational scientific pipelines across domains such as bioinformatics, docking, metabolomics, and more

Citation: Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research
M. Legrand, T. Jiang, M. Feraud, B. Navet, Y. Taghzouti, F. Gandon, E. Dumont, L.-F. Nothias — arXiv:2603.28986, 2026 — DOI — [BibTeX]

How does it work?
Example: Reproducing a Bioactive Molecular Networking Paper
Prerequisites
Installation
Configuration
Running Mimosa
Workspace and Audit Trail
Learning through Evolution of Multi-Agent Workflows
Transparency
Command Line Arguments
Evaluation
Phone Notifications
Telemetry Setup
License
Citation

How does it work ?

The user gives Mimosa-AI either a high-level scientific goal or a single task.

In goal mode, a planner decomposes the objective into smaller tasks. In task mode, the prompt is sent directly to the workflow synthesis layer.
Mimosa discovers available MCP-based tools on the local network through Toolomics, from data-analysis utilities and web services to laboratory instruments.
A meta-orchestrator synthesizes a task-specific multi-agent workflow and assigns relevant tools to specialized agents.
Code-generating agents execute subtasks using discovered tools and scientific software libraries.
An LLM-based judge evaluates execution traces and outputs; in learning mode, this feedback can drive iterative workflow refinement.
Mimosa archives workflows, traces, logs, reports, and generated artifacts in a final capsule for inspection and reuse.

schema

Example: Reproducing a Bioactive Molecular Networking Paper

Objective — Reproduce Nothias et al. (2018) end-to-end: from feature detection through network visualization, starting from .mzML files (raw data conversion assumed complete).

Mimosa-AI was given the paper, the raw data, and a set of domain-specific skills covering configuration details not fully specified in the publication.

Execution Timelapse

https://github.com/user-attachments/assets/dcd04ade-9c43-44a8-b3e3-a999d3dc895d

Output

molecular_network

The resulting molecular network matches the topology reported in the original paper, including cluster separation and edge weights reproduced autonomously.

Prerequisites

Python 3.10+
uv (recommended) or pip
A running Toolomics MCP server

Installation

1. Clone and create virtual environment

# Using uv (recommended)
pip install uv
uv venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Or with pip
python3 -m venv .venv
source .venv/bin/activate

2. Install dependencies

cd mimosa
uv pip install -r requirements.txt

3. Set API keys

Create a .env file at the project root. Include only the keys for the LLM providers you plan to use:

ANTHROPIC_API_KEY=...       # Claude — recommended for workflow orchestration
OPENAI_API_KEY=...          # OpenAI models - Optional
MISTRAL_API_KEY=...         # Mistral models - Optional
DEEPSEEK_API_KEY=...        # Deepseek - Optional
HF_TOKEN=...                # HuggingFace provider, Optional
OPENROUTER_API_KEY=...      # Any model via OpenRouter

# Optional — observability via Langfuse
LANGFUSE_PUBLIC_KEY=...
LANGFUSE_PRIVATE_KEY=...

4. Start the MCP server

Follow the setup instructions at HolobiomicsLab/toolomics. Configure it to run on a port range (e.g., 5000–5100).

Toolomics is Mimosa's companion platform for MCP server management. It exposes tools as discoverable MCP services, lets you register custom tools without modifying Mimosa's core orchestration logic, and provides the workspace where Mimosa reads and writes task artifacts. Both Mimosa and Toolomics are released under the Apache License 2.0.

Custom MCP tools can be added via the Toolomics docs.

Configuration

cp config_default.json my_config.json

Edit my_config.json. Key parameters:

Parameter	Description
`workspace_dir`	Path to the Toolomics workspace — all generated files appear here
`discovery_addresses`	IP + port ranges for MCP server discovery
`planner_llm_model`	LLM for task decomposition and planning
`prompts_llm_model`	LLM for workflow prompt generation
`workflow_llm_model`	LLM for multi-agent orchestration (recommended: anthropic/claude-opus-4-5 or z-ai/glm-5)
`smolagent_model_id`	Model for SmolAgents execution subtasks
`judge_model`	LLM for output self-evaluation and scoring
`learned_score_threshold`	Minimum score to accept a result and stop iterating
`max_learning_evolve_iterations`	Maximum self-improvement iterations before accepting the result

Running Mimosa

Mimosa supports two execution modes: Goal and Task.

Goal mode — multi-step scientific objective

Use this when your objective requires planning across multiple distinct operations (e.g., reproducing a paper, building an ML pipeline).

uv run main.py --goal "Your scientific objective" --config my_config.json

Examples:

uv run main.py \
  --goal "Reproduce experiments from 'Dual Aggregation Transformer for Image Super-Resolution' (https://arxiv.org/pdf/2306.00306) and compare results." \
  --config my_config.json

uv run main.py \
  --goal "Develop a machine learning model to predict protein-ligand binding affinity." \
  --config my_config.json

Task mode — single granular operation

Use this for a focused, self-contained operation without long-term planning.

uv run main.py --task "Your task description" --config my_config.json

Examples:

uv run main.py \
  --task "Train a multitask model on the Clintox dataset to predict drug toxicity and FDA approval status." \
  --config my_config.json

uv run main.py --task "Conduct a literature review on graph neural networks for drug discovery." --config my_config.json

Benchmark note: The results reported in the manuscript are measured in task mode, with the planning layer disabled, to isolate workflow synthesis and iterative refinement.

Note: Toolomics must be installed and the MCP server must be running before executing any mode.

Workspace and Audit Trail

During execution, Mimosa reads and writes files inside the Toolomics workspace configured by workspace_dir. When a run finishes, the workspace contents are copied into a timestamped folder under runs_capsule/ so the final state is preserved as an archive rather than remaining only in a transient working directory.

Toolomics workspace/: the live working directory for the current run, including intermediate files, scripts, downloads, and generated outputs.
sources/workflows/<uuid>/: the generated workflow and its execution metadata, including state_result.json, evaluation.txt, reward_progress.png, and memory/ traces.
runs_capsule/<capsule_name>/: the archived snapshot of the run, copied from the Toolomics workspace for later inspection, comparison, or sharing.
memory_explorer.py <uuid>: replay a workflow execution step by step to inspect agent traces, tool calls, and outputs.

Together, these locations form Mimosa's audit trail: they preserve what the system planned, executed, evaluated, and produced, supporting debugging, inspection, and potential replication.

Learning through Evolution of Multi-Agent Workflows

Mimosa-AI is a self-evolving multi-agent system that dynamically synthesizes specialized workflows for scientific tasks. Rather than forcing tasks through fixed pipelines, the system composes custom multi-agent architectures on-demand and learns from execution patterns to optimize future performance.

The full architecture is organized into five layers: (0) optional planning, (1) MCP-based tool discovery, (2) meta-orchestration, (3) agent execution, and (4) judge/evaluation. In benchmark task mode, the planning layer is bypassed so workflow synthesis and refinement can be evaluated directly.

Mimosa is also capable of learning from failures through Darwinian-inspired evolution of multi-agent workflows. For each task, it composes a custom multi-agent graph and refines it through single-incumbent local search: at each iteration, only the best-performing workflow generates a successor, and only improvements are kept. Over time, the system builds a library of proven workflows, so similar future tasks start from a strong baseline rather than from scratch.

For any new task, start with learn mode to let it build competence before full autonomy.

Start in Learning mode

uv run main.py --task "Train a multitask model on the Clintox dataset to predict drug toxicity and FDA approval status" --learn --config my_config.json

dgm

Progress visualization:

Once Mimosa-AI completes its learning phase on a task, you can visualize exactly how it improved over time. The reward progress plot showing performance gains across attempts is automatically saved.

The reward progress plot is saved under the sources/workflows/<uuid> folder under the filename reward_progress.png.

Example:

dgm

Transparency

We ship an interactive debugger, memory_explorer.py, that lets you step through any agent execution in granular detail.

Start it with a workflow (eg: memory_explorer.py 20260115_113303_9bb63437)

This replays the full execution trace—thoughts, tool calls and outputs so you can inspect exactly how decisions unfolded.

Command Line Arguments

Execution Modes

Argument	Description
`--goal GOAL`	Specify a high-level research objective, paper reproduction, or scientific question (planner mode)
`--task TASK`	Execute a single task: literature review, datasets download, implement a machine learning model...
`--manual`	Interactive CLI mode to debug MCPs and test Mimosa tools directly
`--papers <CSV path>`	Evaluation on a CSV dataset containing research papers and prompts
`--science_agent_bench`	Evaluation on ScienceAgentBench

Other parameters

Argument	Description
`--learn`	Enable iterative-learning to optimize task performance
`--max_evolve_iterations N`	Maximum learning iterations
`--csv_runs_limit N`	Limit number of CSV entries to evaluate
`--scenario <scenario file name>`	Use specific scenario based assertions instead of LLM-as-a-judge for scoring execution
`--single_agent`	Single agent mode. fast, but can't improve throught learning
`--debug`	Enable debug mode for more verbose logging

Evaluation

Mimosa-AI can be evaluated either on ScienceAgentBench or PaperBench.

⚠️ For unbiased evaluation it is advised to run ./cleanup.sh first, this will prevent Mimosa from using existing or cached workflows.

ScienceAgentBench

Manuscript-aligned results snapshot (task mode, 102 tasks):

DeepSeek-V3.2 single-agent: SR 38.2%, CBS 0.898, $0.05/task
DeepSeek-V3.2 one-shot multi-agent: SR 32.4%, CBS 0.794, $0.38/task
DeepSeek-V3.2 iterative-learning: SR 43.1%, CBS 0.921, $1.7/task
We also report model-dependent behavior in the manuscript: iterative learning improves GPT-4o, but yields marginal degradation for Claude Haiku 4.5.

To evaluate on ScienceAgentBench you must:

download the ScienceAgentBench full dataset:
dataset link
unzip it with password: scienceagentbench
copy content of benchmark/benchmark/datasets folder to Mimosa-AI/datasets/scienceagentbench/datasets

Evaluation on ScienceAgentBench with learning

uv run main.py --science_agent_bench --learn

Evaluation on ScienceAgentBench limited to 10 tasks with learning limited to 4 learning iterations

uv run main.py --science_agent_bench --csv_runs_limit 10 --max_evolve_iterations 4

PaperBench

Evaluation on OpenAI PaperBench with learning mode

OpenAI PaperBench is a benchmark for evaluating the ability of AI agents to replicate AI research, from the paper PaperBench: Evaluating AI’s Ability to Replicate AI Research.

uv run main.py --papers datasets/paper_bench.csv --csv_runs_limit 20  --learn

⚠️ This will save in runs_capsule/ folder the result of all paper's reproduction attempt, refer to Paper Bench documentation for complete evaluation.

Evaluation on custom benchmark of research paper

Place your benchmark CSV with the same format as paper_bench.csv in datasets/<your_benchmark_name>.csv.
Run on your benchmark:

uv run main.py --papers datasets/<your_benchmark_name>.csv --csv_runs_limit 20  --learn

Phone Notifications

Receive real-time updates about Mimosa's status via Pushover notifications.

Setup Instructions

Create Pushover Account
- Visit pushover.net
- Register and note your User Key
Create Application
- In Pushover dashboard, click "Create an Application/API Token"
- Name it "Mimosa" and copy the generated API Token

Configure Environment

export PUSHOVER_USER="your_user_key"
export PUSHOVER_TOKEN="your_api_token"

Install Mobile App
- Download Pushover from your device's app store
- Log in with your Pushover account

Telemetry Setup

Monitor and debug AI agents with real-time observability dashboards using Langfuse.

Quick Start

Deploy Langfuse Locally

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

Configure Environment Variables

Add to your .env file:

LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_PRIVATE_KEY=your_private_key

Access Dashboard

While Mimosa-AI is running, visit http://localhost:3000

Available Metrics

The telemetry dashboard provides:

Agent Execution Traces: Step-by-step workflow visualization
Performance Metrics: Response times and success rates
Error Debugging: Detailed failure analysis
Resource Usage: Token consumption and API calls

Example Dashboard:
Langfuse Dashboard

Note: Telemetry is optional but recommended for debugging and performance optimization.

License

This repository is publicly distributed under the Apache License 2.0. Apache 2.0 is a permissive open-source license that allows commercial use, modification, and redistribution, provided that redistributions preserve the applicable license, copyright, patent, trademark, attribution, and NOTICE notices and that modified files carry prominent notices stating that changes were made.

For contribution and licensing details, see:

NOTICE
docs/licensing-notes.md
CLA/INDIVIDUAL_CLA.md
CLA/EMPLOYER_AUTHORIZATION.md

Citation

If you find this work useful, please cite:

@article{legrand2026mimosa,
  title={Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research},
  author={Legrand, Martin and Jiang, Tao and Feraud, Matthieu and Navet, Benjamin and Taghzouti, Yousouf and Gandon, Fabien and Dumont, Elise and Nothias, Louis-F{\'e}lix},
  journal={arXiv preprint arXiv:2603.28986},
  year={2026}
}

Mimosa-AI