variant-triage
Health Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 15 GitHub stars
Code Fail
- eval() — Dynamic code execution via eval() in .venv/lib/python3.12/site-packages/_pytest/_code/code.py
- exec() — Shell command execution in .venv/lib/python3.12/site-packages/_pytest/_code/code.py
Permissions Pass
- Permissions — No dangerous permissions requested
This agent is a backend service and API pipeline for clinical genomics. It ingests genomic data, applies deterministic classification rules, and uses a constrained Large Language Model (LLM) to generate reproducible, evidence-grounded variant interpretations.
Security Assessment
The tool handles sensitive clinical and genomic data and needs network access, since it calls external APIs for genomic annotation (such as ClinVar and gnomAD) and for LLM processing. The automated scan flagged `eval()` and `exec()`, but both hits are in pytest's own code inside the local virtual environment (`.venv/lib/python3.12/site-packages/_pytest/_code/code.py`), not in the tool's source, so they are false positives. No hardcoded secrets or overly broad permissions were found. The repository includes a clinical disclaimer stating that it is a portfolio project using only synthetic data. Overall risk is assessed as Low.
Quality Assessment
The project appears actively maintained, with the last push made today. It has a comprehensive README, dedicated security documentation, and a mature tech stack (FastAPI, PostgreSQL, Docker). However, it has no open-source license, so default copyright applies: all rights are reserved by the author, which could block enterprise adoption. Community trust is minimal so far, with 15 GitHub stars indicating a small user base.
Verdict
Safe to use, provided you are aware it is an unlicensed portfolio project intended for synthetic data rather than clinical diagnosis.
API-first variant triage pipeline combining genomic filtering, annotation, and LLM-driven interpretation for clinical genomics workflows
variant-triage

A backend service for deterministic, evidence-grounded variant classification and interpretation.
Unlike typical LLM-driven tools that generate plausible outputs, this system constrains reasoning through structured rules (ACMG/AMP), curated evidence (ClinVar, gnomAD), and explicit decision paths to produce reproducible, auditable results.
Designed to model how clinical genomics workflows can be implemented as testable, production-style software systems rather than ad hoc analysis pipelines.
Most LLM approaches generate plausible interpretations. This system is designed to produce reproducible ones.
Stack: Python 3.12 · FastAPI · PostgreSQL 16 · SQLAlchemy 2 · Nextflow DSL2 · Docker · Fly.io
This is a portfolio project using only synthetic data. See CLINICAL_DISCLAIMER.md.
Live Demo
- API: https://variant-triage.fly.dev
- Swagger UI: https://variant-triage.fly.dev/docs
- Health check: https://variant-triage.fly.dev/health
The app may take ~30 seconds to wake from cold start on the free tier.
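Because of the cold start, a first request can fail or time out. The following is a minimal sketch of a polling helper that waits for the health check to respond. It assumes the endpoint returns HTTP 200 when the service is up; the function names, retry count, and delay are illustrative and not part of the project.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "https://variant-triage.fly.dev/health"  # health check URL listed above


def http_probe(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed healthy status)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, connection errors
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it succeeds or `attempts` calls have been made."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no need to wait after the final attempt
            sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` with the defaults polls for up to roughly 45 seconds of waiting plus request time, which should cover the ~30 second wake-up mentioned above.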
Documentation
- OVERVIEW.md - plain-English explanation of what this project does and why
- TUTORIAL.md - end-to-end walkthrough with curl examples
- SECURITY_CONSIDERATIONS.md - compliance and security notes
- CLINICAL_DISCLAIMER.md - research-only status
Why this matters
Variant interpretation is not just about generating an answer — it is about producing results that can be trusted, reproduced, and audited.
Most current approaches fall into two categories:
- Rule-based pipelines — deterministic but rigid and hard to extend
- LLM-driven tools — flexible but opaque and difficult to validate
This project explores a third approach:
Controlled reasoning — combining deterministic classification logic with constrained LLM assistance to produce outputs that are both flexible and reliable.
Overview
Variant interpretation is often performed through a combination of pipelines, scripts, and manual review. This project explores how that process can be expressed as a structured application with deterministic classification logic, explicit data models, traceable decision-making, and a consistent API surface.
The goal is to bridge the gap between bioinformatics workflows and production-facing services used in clinical or translational settings.
What this demonstrates
- Controlled LLM reasoning - model outputs are constrained, validated, and grounded in curated evidence rather than free-form generation
- End-to-end system design - VCF ingestion through classification, LLM-assisted interpretation (with guardrails and constrained outputs), and REST API exposure
- Separation of concerns - clear boundaries between domain logic, persistence, and API layer
- Reproducibility and testability - deterministic classification logic with 170+ tests and full CI
- Operational awareness - JWT authentication, audit logging, containerised deployment with CI/CD
- Clinical domain knowledge - ACMG/AMP 2015 germline rules, AMP/ASCO/CAP somatic tiering, ClinVar and gnomAD evidence integration
- Extensibility - plugin architecture for classification rules, protocol-based evidence sources
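To make the plugin idea concrete, here is a minimal sketch of what a rule interface could look like. All names (`Variant`, `Rule`, `PM2Absent`, `PVS1LossOfFunction`, `triggered_codes`) and the frequency threshold are hypothetical, and the rule logic is deliberately simplified; it is not the project's code and does not implement the full ACMG/AMP criteria.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record; a real domain model would be richer."""
    gene: str
    consequence: str
    gnomad_af: Optional[float]  # population allele frequency, None if not observed


class Rule(Protocol):
    """Assumed common interface: a rule code plus a pure predicate."""
    code: str

    def applies(self, variant: Variant) -> bool: ...


class PM2Absent:
    """Toy version of PM2: variant is absent from, or very rare in, population data."""
    code = "PM2"

    def __init__(self, max_af: float = 1e-4) -> None:  # illustrative threshold
        self.max_af = max_af

    def applies(self, variant: Variant) -> bool:
        return variant.gnomad_af is None or variant.gnomad_af < self.max_af


class PVS1LossOfFunction:
    """Toy version of PVS1: checks the consequence term only."""
    code = "PVS1"
    LOF = frozenset({"stop_gained", "frameshift_variant"})

    def applies(self, variant: Variant) -> bool:
        return variant.consequence in self.LOF


def triggered_codes(variant: Variant, rules: list[Rule]) -> list[str]:
    """Evaluate every registered rule and return the codes that fired, sorted."""
    return sorted(rule.code for rule in rules if rule.applies(variant))
```

Adding a rule in this sketch means writing one more class and appending it to the list passed to `triggered_codes`; no existing rule has to change.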
Architecture
```mermaid
flowchart TD
A[VCF File\nshort-read / long-read] --> B[vcf_parser\ncyvcf2]
B --> C[VCFRecord\nDomain Model]
C --> D{Origin?}
D -->|GERMLINE| E[ACMG Engine\n10 rules\nPVS1 · PS1 · PM1-5 · PP2/3]
D -->|SOMATIC| F[AMP/ASCO/CAP Engine\n4 tiers\nCIViC · OncoKB · hotspots]
E --> G[Evidence Clients\ngnomAD · ClinVar · CADD]
F --> H[Evidence Clients\nCIViC · OncoKB · gnomAD]
G --> I[ClassificationResult\nPathogenic → Benign]
H --> J[SomaticResult\nTier I → IV]
I --> K[FastAPI\nJWT auth · audit log]
J --> K
K --> L[(PostgreSQL 16\nSample · Variant\nClassification · AuditLog)]
K --> M[LLM Assistant\nClaude · guardrails]
N[Nextflow DSL2\nbcftools norm → VEP] -.->|pre-process| A
```
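The evidence-client stage in the diagram fans out to several sources per variant. The sketch below shows one way such lookups could run concurrently with an in-memory cache, using `asyncio.gather`. The `CachedEvidence` class, the `Lookup` type, and the variant key format are assumptions made for illustration; the real clients talk to external APIs and are not reproduced here.

```python
import asyncio
from typing import Awaitable, Callable

# A lookup takes a variant key and returns that source's evidence as a dict.
Lookup = Callable[[str], Awaitable[dict]]


class CachedEvidence:
    """Query all sources concurrently and memoise the combined result per key."""

    def __init__(self, sources: dict[str, Lookup]) -> None:
        self._sources = sources
        self._cache: dict[str, dict[str, dict]] = {}

    async def fetch(self, variant_key: str) -> dict[str, dict]:
        # Cache hit: no source is called again for this key.
        if variant_key in self._cache:
            return self._cache[variant_key]
        names = list(self._sources)
        # gather() runs the lookups concurrently and preserves argument order.
        results = await asyncio.gather(
            *(self._sources[name](variant_key) for name in names)
        )
        evidence = dict(zip(names, results))
        self._cache[variant_key] = evidence
        return evidence
```

This simple cache only helps once a lookup has completed; two overlapping `fetch` calls for the same key would both reach the sources. Caching the in-flight task instead of the result is one way to close that gap.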
Design decisions
- Classification logic as pure functions - deterministic behaviour, straightforward to test in isolation
- Plugin architecture for ACMG rules - each rule is an independent class implementing a common interface, making additions and overrides explicit
- Async evidence clients with in-memory caching - gnomAD GraphQL and ClinVar E-utilities run concurrently per variant, results cached to avoid duplicate lookups within a batch
- Audit logging with SHA-256 payload hashing - tamper-evident record of all requests without storing raw patient data
- LLM guardrails - regex-based checks on model output prevent diagnosis statements and treatment recommendations from reaching callers
- Graceful degradation - OncoKB and the LLM assistant both degrade to no-op if API tokens are absent, keeping the core classifier functional
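The last two decisions can be sketched together: a wrapper that returns nothing when no LLM is configured and drops any draft that trips a regex guardrail. The patterns, the function names, and the choice to return `None` are assumptions for illustration; the project's actual patterns and fallback behaviour are not shown in this README.

```python
import re
from typing import Callable, Optional

# Hypothetical patterns; a real list would need clinical review and its own tests.
BLOCKED_PATTERNS = [
    # Diagnosis statements, e.g. "the patient has ..."
    re.compile(r"\b(you|the patient) (have|has|is diagnosed with)\b", re.IGNORECASE),
    # Treatment recommendations, e.g. "should start ..."
    re.compile(r"\b(should|must) (start|stop|take|receive|be treated with)\b", re.IGNORECASE),
]


def violates_guardrails(text: str) -> bool:
    """Return True if any blocked pattern occurs in the model output."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def draft_interpretation(
    prompt: str,
    generate: Optional[Callable[[str], str]],
) -> Optional[str]:
    """Return a vetted draft, or None if the LLM is unavailable or the draft is blocked."""
    if generate is None:  # no API token configured: degrade to a no-op
        return None
    draft = generate(prompt)
    return None if violates_guardrails(draft) else draft
```

Passing the model call in as `generate` keeps the guardrail logic testable without network access, and callers treat `None` as "no draft available" so the core classification result is still returned.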
Quickstart
Prerequisites
- Docker ≥ 24 and Docker Compose v2
- Python 3.12 (for local development)
Run with Docker Compose
```bash
git clone https://github.com/plobb/variant-triage
cd variant-triage
cp .env.example .env
# Set SECRET_KEY in .env
docker compose up --build
```
API available at http://localhost:8000. Swagger UI at http://localhost:8000/docs.
Local development
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your database URL and secret key
alembic upgrade head
uvicorn app.api.main:app --reload
```
Testing
```bash
# Run all 170+ tests
pytest tests/
# With coverage report
pytest tests/ --cov=app --cov-report=term-missing
# Type checking
mypy --strict app/
# Lint
ruff check app/ tests/
```
Project roadmap
| Phase | Scope | Status |
|---|---|---|
| 1 - Foundation | Domain models, VCF parser (short + long-read), DB schema, CI | ✅ Complete |
| 2 - API layer | FastAPI routes, JWT auth, audit logging middleware | ✅ Complete |
| 3 - ACMG engine | 10-rule germline classifier, gnomAD + ClinVar evidence clients | ✅ Complete |
| 4 - Somatic | AMP/ASCO/CAP tiering, CIViC + OncoKB evidence clients | ✅ Complete |
| 5 - Nextflow | DSL2 pipeline: bcftools normalise → VEP annotation | ✅ Complete |
| 6 - LLM assistant | Claude-powered interpretation drafts with clinical guardrails | ✅ Complete |
| 7 - Deployment | Fly.io deploy, GitHub Actions CI/CD, security documentation | ✅ Complete |
Related work
- celltype-agent - agentic cell type annotation for single-cell and spatial genomics data (10x Chromium, Visium, Xenium) using Claude and curated marker databases