migratowl

agent
Security Audit
Warn
Health Warn
  • License — License: Apache-2.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested

No AI report is available for this listing yet.

SUMMARY

AI dependency migration analyzer — reads every changelog so you don't have to.

README.md

Migratowl

Migratowl

AI-powered dependency migration analyzer.
Discovers breaking upgrades, explains exactly what failed, and tells you how to fix it.

Python 3.13+ License


Migratowl PR comment example


What It Does

Migratowl answers one question: "If I upgrade this dependency, will anything break — and how do I fix it?"

It receives a webhook, clones the target repository, scans all dependency manifests, queries package registries for newer versions, and runs the project inside an isolated Kubernetes sandbox with every dependency bumped. An AI agent executes the test suite, reads the error output, fetches the relevant changelog, and produces a structured report per dependency.

The result tells developers:

  • Whether the upgrade is breaking
  • What specifically went wrong
  • A verbatim citation from the changelog
  • A plain-English fix suggestion
  • A confidence score (0.0–1.0)

Table of Contents


Supported Ecosystems

Language Manifest files Registry
Python pyproject.toml, requirements.txt PyPI
Node.js package.json npm
Go go.mod proxy.golang.org
Rust Cargo.toml crates.io
Java pom.xml (Maven), build.gradle (Gradle) Maven Central

How It Works

Migratowl runs a four-phase agent workflow inside an ephemeral Kubernetes sandbox.

POST /webhook
     │
     ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 1 — Setup                                        │
│                                                         │
│  clone_repo ──► detect_languages ──► scan_dependencies  │
│                                           │             │
│                                    check_outdated_deps  │
└─────────────────────────┬───────────────────────────────┘
                          │ outdated dep list
                          ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 2 — Main Analysis                                │
│                                                         │
│  copy_source("main") ──► update_dependencies (all)      │
│                               │                         │
│                        execute_project (install + test) │
└─────────────────────────┬───────────────────────────────┘
                          │ pass / fail + error output
                          ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 3 — Confidence Scoring                           │
│                                                         │
│  All pass ──► every package: is_breaking=false, conf=1  │
│                                                         │
│  Some fail ──► assign confidence per package            │
│    conf ≥ threshold ──► fetch_changelog + write report  │
│    conf < threshold ──► delegate to package-analyzer    │
│                          subagent (isolated run)        │
└─────────────────────────┬───────────────────────────────┘
                          │ AnalysisReport[]
                          ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 4 — Compile Results                              │
│                                                         │
│  Merge reports from main agent + subagents              │
│  ──► ScanAnalysisReport (POST to callback_url)          │
└─────────────────────────────────────────────────────────┘

Confidence scoring rules (applied in Phase 3 when tests fail):

  • Error message directly names the package → high confidence (≥ 0.8)
  • Import or attribute error for a known package API → high confidence
  • Major version jump (e.g. 2.x → 3.x) → moderate confidence boost
  • Generic failure with no clear link → low confidence (< 0.5)

The default confidence threshold is 0.7 (configurable via MIGRATOWL_CONFIDENCE_THRESHOLD).

Sandbox workspace layout:

/home/user/workspace/
├── source/          # Immutable clone — never executed
├── main/            # All deps bumped, executed in Phase 2
└── <package-name>/  # Per-package isolation (created on demand by subagent)

Quick Start

Prerequisites: Python 3.13+, uv, Docker, minikube, kubectl.

# 1. Install dependencies
uv sync

# 2. Configure environment
cp .env.example .env
# Edit .env — set at minimum: ANTHROPIC_API_KEY

# 3. Start local Kubernetes cluster
minikube start --driver=docker --memory=8192 --cpus=4

# 4. Install agent-sandbox controller and CRDs
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.0/manifest.yaml
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.0/extensions.yaml

# 5. Build sandbox runner image inside minikube
eval $(minikube docker-env)
docker build -t sandbox-runtime:latest k8s/runtime/

# 6. Apply RBAC and sandbox template
kubectl apply -f k8s/rbac.yaml
kubectl apply -f k8s/sandbox-template.yaml

# 7. Start the server
uv run uvicorn migratowl.api.main:app --reload

Trigger a scan:

curl -X POST http://localhost:8000/webhook \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_url": "https://github.com/org/repo",
    "callback_url": "https://yourservice.example.com/results"
  }'
# → {"job_id": "...", "status_url": "/jobs/..."}

API Reference

POST /webhook

Accepts a scan request. Returns 202 Accepted immediately; analysis runs in the background and POSTs the result to callback_url when done.

Request body (ScanWebhookPayload):

Field Type Default Description
repo_url string required Git repository URL to scan
branch_name string "main" Branch to clone and analyze
git_provider "github" | "gitlab" "github" Git provider — determines which API is used for PR/MR comments and commit statuses
pr_number integer | null null PR (GitHub) or MR IID (GitLab) — when set, Migratowl posts a comment with the analysis result
commit_sha string | null null Full commit SHA — when set, Migratowl posts a pending status at scan start and a success/failure status on completion
callback_url string | null null URL to POST ScanAnalysisReport on completion
exclude_deps string[] [] Dependency names to skip entirely
check_deps string[] [] When non-empty, only these dependencies are checked (all others are ignored)
max_deps integer 50 Maximum outdated deps to analyze (must be > 0)
ecosystems string[] | null null Limit to specific ecosystems: "python", "nodejs", "go", "rust", "java". null = auto-detect all
mode string "normal" Version resolution mode — see below
include_prerelease boolean false When true, pre-release versions (alpha, beta, RC) are considered when finding the latest version

Version resolution modes (mode):

Mode Behaviour
"safe" Respects the declared semver constraint. ^4.21.2 only reports a newer version if one exists within the >=4.21.2,<5.0.0 range. A package already at the top of its pinned range is reported as up-to-date even when a new major exists.
"normal" Ignores the constraint operator. ^4.21.2 compares the bare version 4.21.2 against the globally highest published version — including major bumps like 5.x.

Example:

{
  "repo_url": "https://github.com/org/repo",
  "branch_name": "main",
  "git_provider": "github",
  "pr_number": 42,
  "commit_sha": "abc123...",
  "callback_url": "https://yourservice.example.com/results",
  "exclude_deps": ["boto3"],
  "check_deps": [],
  "max_deps": 20,
  "ecosystems": ["python"],
  "mode": "normal",
  "include_prerelease": false
}

202 response (WebhookAcceptedResponse):

{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status_url": "/jobs/3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

GET /jobs/{job_id}

Poll the status of a scan job.

Response (JobStatus):

Field Type Description
job_id string UUID assigned at webhook acceptance
state string Job lifecycle state (see below)
created_at datetime ISO 8601, UTC
updated_at datetime ISO 8601, UTC
payload ScanWebhookPayload Original request payload
result ScanAnalysisReport | null Set when state = "completed"
error string | null Set when state = "failed"

Job lifecycle:

PENDING ──► RUNNING ──► COMPLETED
                   └──► FAILED
State Meaning
pending Queued, not yet started (v1 runs one scan at a time)
running Agent is actively analyzing the repository
completed Analysis finished; result is populated
failed Unrecoverable error; error describes what went wrong

404 when job_id is not found.


GET /healthz

Liveness check. Returns 200 {"status": "ok"} when the server is running.


Response Schema

The ScanAnalysisReport delivered to callback_url (and returned in GET /jobs/{job_id} when completed):

ScanAnalysisReport
├── repo_url                  string    — repository that was analyzed
├── branch_name               string    — branch that was cloned
├── scan_result               ScanResult
│   ├── all_deps              Dependency[]   — every declared dependency found
│   │   ├── name              string
│   │   ├── current_version   string
│   │   ├── ecosystem         string
│   │   └── manifest_path     string
│   ├── outdated              OutdatedDependency[]  — deps with newer versions
│   │   ├── name              string
│   │   ├── current_version   string
│   │   ├── latest_version    string
│   │   ├── ecosystem         string
│   │   ├── manifest_path     string
│   │   ├── homepage_url      string | null
│   │   ├── repository_url    string | null
│   │   └── changelog_url     string | null
│   ├── manifests_found       string[]  — manifest file paths discovered
│   └── scan_duration_seconds float
├── reports                   AnalysisReport[]  — one per analyzed package
│   ├── dependency_name       string
│   ├── is_breaking           bool
│   ├── error_summary         string    — what failed (empty if not breaking)
│   ├── changelog_citation    string    — verbatim excerpt from changelog
│   ├── suggested_human_fix   string    — plain-English remediation step
│   └── confidence            float     — 0.0–1.0
├── skipped                   string[]  — package names not analyzed
└── total_duration_seconds    float

Example report entry:

{
  "dependency_name": "requests",
  "is_breaking": true,
  "error_summary": "ImportError: cannot import name 'PreparedRequest'",
  "changelog_citation": "## 3.0.0 — Removed PreparedRequest from the public API.",
  "suggested_human_fix": "Replace `from requests import PreparedRequest` with `requests.models.PreparedRequest`.",
  "confidence": 0.95
}

Configuration

All MIGRATOWL_* variables are optional (defaults shown). Third-party SDK keys use their standard names without the MIGRATOWL_ prefix.

LLM

Variable Default Description
ANTHROPIC_API_KEY Required when MIGRATOWL_MODEL_PROVIDER=anthropic (default)
OPENAI_API_KEY Required when MIGRATOWL_MODEL_PROVIDER=openai
MIGRATOWL_MODEL_PROVIDER anthropic LLM provider: anthropic or openai
MIGRATOWL_MODEL_NAME claude-sonnet-4-6 Model name (must match provider)
MIGRATOWL_MODEL_RATE_LIMIT_RPS 0.1 Max LLM requests/second (0.1 = 6 req/min)
ANTHROPIC_BASE_URL Custom base URL for Anthropic API
OPENAI_BASE_URL Custom base URL for OpenAI API

Kubernetes Sandbox

Variable Default Description
MIGRATOWL_SANDBOX_TEMPLATE migratowl-sandbox-template agent-sandbox AgentSandboxTemplate name
MIGRATOWL_SANDBOX_NAMESPACE default Kubernetes namespace for sandbox pods
MIGRATOWL_SANDBOX_CONNECTION_MODE tunnel Connection mode: tunnel or direct
MIGRATOWL_WORKSPACE_PATH /home/user/workspace Workspace root inside the sandbox

Analysis

Variable Default Description
MIGRATOWL_CONFIDENCE_THRESHOLD 0.7 Packages above this are analyzed directly; below → subagent
MIGRATOWL_SCAN_REGISTRY_CONCURRENCY 10 Concurrent registry queries when checking outdated deps
MIGRATOWL_MAX_OUTPUT_CHARS 30000 Truncation limit for sandbox command output
MIGRATOWL_MAX_CHANGELOG_CHARS 15000 Truncation limit for fetched changelogs
MIGRATOWL_MAX_OUTDATED_DEPS 100 Hard cap on registry scan results

HTTP Client

Variable Default Description
MIGRATOWL_HTTP_TIMEOUT 30.0 Outbound request timeout (seconds)
MIGRATOWL_HTTP_RETRY_COUNT 3 Retries on 429 / 5xx responses
MIGRATOWL_HTTP_RETRY_BACKOFF_BASE 0.5 Base delay (seconds) for exponential backoff

API Server

Variable Default Description
MIGRATOWL_API_HOST 0.0.0.0 Bind address
MIGRATOWL_API_PORT 8000 Bind port

Git Providers

Variable Default Description
GITHUB_TOKEN GitHub personal access token; needs repo:status and public_repo (or repo for private repos) scopes to post PR comments and commit statuses
GITHUB_API_URL https://api.github.com Override for GitHub Enterprise Server (e.g. https://github.corp.com/api/v3)
GITLAB_TOKEN GitLab personal access token with api scope; needed to post MR comments and commit statuses
GITLAB_API_URL https://gitlab.com/api/v4 Override for self-hosted GitLab

Observability

Variable Default Description
LANGFUSE_PUBLIC_KEY Enables LangFuse tracing when both keys are set
LANGFUSE_SECRET_KEY See above
LANGFUSE_HOST https://cloud.langfuse.com LangFuse instance URL

Kubernetes Setup

Migratowl uses langchain-kubernetes in agent-sandbox mode by default, which requires the kubernetes-sigs/agent-sandbox controller and CRDs installed in your cluster. This provides warm pod pools and gVisor/Kata isolation.

# Install controller + CRDs (one-time)
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.0/manifest.yaml
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.0/extensions.yaml

# Build runtime image (must be visible to the cluster — use minikube docker-env locally)
eval $(minikube docker-env)
docker build -t sandbox-runtime:latest k8s/runtime/

# Apply manifests
kubectl apply -f k8s/rbac.yaml
kubectl apply -f k8s/sandbox-template.yaml

Optional warm pool (reduces cold-start latency):

kubectl apply -f k8s/warm-pool.yaml

Raw mode fallback — if you can't install the agent-sandbox controller, switch to raw mode (works on any cluster, no CRDs required):

MIGRATOWL_SANDBOX_CONNECTION_MODE=direct  # set in .env

Then install langchain-kubernetes[raw] instead of langchain-kubernetes[agent-sandbox]. Raw mode manages ephemeral pods directly and attaches a deny-all NetworkPolicy for isolation.

Security defaults applied to every pod:

  • runAsNonRoot: true, runAsUser: 1000
  • allowPrivilegeEscalation: false, capabilities.drop: [ALL]
  • automountServiceAccountToken: false
  • Deny-all NetworkPolicy (ingress + egress)

Observability

Migratowl integrates with LangFuse for trace-level observability. Tracing is off by default and activates when both keys are present.

# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com   # or your self-hosted instance

When enabled, every scan produces a LangFuse session (keyed by job_id) containing:

  • Main agent trace — all LLM calls and tool invocations
  • Tool call spansclone_repo, scan_dependencies, execute_project, etc.
  • Subagent spanspackage-analyzer subagent runs nested under the parent trace

No additional code changes are needed — the observability.py module initializes the handler at startup and patches the LangGraph graph to inject session IDs automatically.


GitHub Actions / Dependabot

Migratowl can be triggered automatically from a GitHub Actions workflow whenever Dependabot opens or updates a PR. A ready-to-use workflow is provided at docs/examples/dependabot-scan.yml — copy it into .github/workflows/ in any repo you want to scan.

What the workflow does:

  1. Fires on pull_request events from dependabot[bot]
  2. POSTs to your Migratowl instance with the repo URL, branch, PR number, and commit SHA
  3. Migratowl sets a pending commit status immediately, then posts a PR comment with the full analysis table and sets a success or failure status when done

Setup:

# 1. Set MIGRATOWL_URL as a repository Actions variable
#    (Settings → Secrets and variables → Actions → Variables)
#    e.g. https://migratowl.internal.yourcompany.com

# 2. Ensure Migratowl is configured with a GitHub token:
GITHUB_TOKEN=ghp_...   # needs repo:status + public_repo (or repo for private)

For GitLab, change "git_provider": "github" to "gitlab" in the curl payload and configure:

GITLAB_TOKEN=glpat-...
GITLAB_API_URL=https://gitlab.com/api/v4   # or your self-hosted URL

GitHub Enterprise Server — set GITHUB_API_URL to your GHES API endpoint:

GITHUB_API_URL=https://github.corp.com/api/v3

Architecture

                          ┌─────────────────────────────┐
  HTTP client             │          FastAPI            │
  ─────────────────────►  │  POST /webhook              │
                          │  GET  /jobs/{id}            │
                          │  GET  /healthz              │
                          └──────────────┬──────────────┘
                                         │ asyncio.create_task
                                         ▼
                          ┌─────────────────────────────┐
                          │     Migratowl Agent         │
                          │  (deepagents / LangGraph)   │
                          │                             │
                          │  Tools:                     │
                          │  • clone_repo               │
                          │  • detect_languages         │
                          │  • scan_dependencies        │
                          │  • check_outdated_deps      │
                          │  • copy_source              │
                          │  • update_dependencies      │
                          │  • execute_project          │
                          │  • fetch_changelog          │
                          │  • read_manifest            │
                          │  • patch_manifest           │
                          │                             │
                          │  Subagent:                  │
                          │  • package-analyzer         │
                          └──────────────┬──────────────┘
                                         │ executes via
                                         ▼
                          ┌─────────────────────────────┐
                          │   Kubernetes Sandbox        │
                          │  (langchain-kubernetes)     │
                          │                             │
                          │  Ephemeral Pod              │
                          │  • Non-root, no caps        │
                          │  • Deny-all NetworkPolicy   │
                          │  • gVisor / Kata isolation  │
                          └─────────────────────────────┘

Project Layout

migratowl/
├── api/
│   ├── main.py          # FastAPI app, /webhook + /jobs endpoints, lifespan
│   ├── jobs.py          # In-memory JobStore (PENDING→RUNNING→COMPLETED|FAILED)
│   └── helpers.py       # build_user_message, extract_report
├── agent/
│   ├── graph.py         # graph singleton + sandbox lifecycle (langgraph.json entrypoint)
│   ├── factory.py       # create_migratowl_agent() — builds the LangGraph
│   ├── sandbox.py       # KubernetesProvider init/teardown helpers
│   ├── subagents.py     # package-analyzer subagent definition
│   ├── session_graph.py # Patches ainvoke/astream to inject LangFuse session IDs
│   └── tools/
│       ├── clone.py     # clone_repo, copy_source
│       ├── detect.py    # detect_languages
│       ├── scan.py      # scan_dependencies
│       ├── registry.py  # check_outdated_deps
│       ├── update.py    # update_dependencies
│       ├── execute.py   # execute_project (runs install + test in sandbox)
│       ├── changelog.py # fetch_changelog (PyPI / npm / GitHub / raw HTTP)
│       └── manifest.py  # read_manifest, patch_manifest (sandbox file I/O)
├── models/
│   └── schemas.py       # All Pydantic models (ScanWebhookPayload, ScanAnalysisReport, …)
├── config.py            # pydantic-settings Settings class (MIGRATOWL_ prefix)
├── observability.py     # LangFuse CallbackHandler setup + session ID injection
├── registry.py          # Registry query logic (PyPI, npm, crates.io, Go proxy)
├── parsers.py           # Manifest parsers per ecosystem
├── changelog.py         # Changelog fetch strategies (multi-strategy fallback)
├── patches.py           # Dependency version patching helpers
└── http.py              # Shared HTTPX async client with retry logic

k8s/
├── rbac.yaml            # ServiceAccount + ClusterRole for sandbox management
├── sandbox-template.yaml# AgentSandboxTemplate CRD for the runner pod
├── warm-pool.yaml       # Optional warm pool for faster pod startup
├── sandbox-router.yaml  # Optional sandbox router service
└── runtime/             # Dockerfile + entrypoint for the sandbox runner image

tests/                   # Mirrors migratowl/ package structure

Development

Task Command
Install uv sync
Run uv run uvicorn migratowl.api.main:app --reload
Test uv run pytest tests/ -v
Lint uv run ruff check migratowl/

TDD is mandatory for all production code in migratowl/. The Red-Green-Refactor cycle is enforced: write a failing test first, confirm RED, write minimal code to pass, confirm GREEN, then refactor. No production code without a corresponding test in tests/. See CLAUDE.md for details.


Contributing

See CONTRIBUTING.md. All contributors must sign the CLA.

  1. Open an issue first
  2. Branch: issue/<NUMBER>-short-description
  3. Write a failing test before any production code (TDD — no exceptions)
  4. Open a PR with Closes #<NUMBER>

License

BSD 3-Clause — see LICENSE.

Reviews (0)

No results found