variant-triage

agent
Security Audit: Fail
Health: Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 15 GitHub stars
Code: Fail
  • eval() — Dynamic code execution via eval() in .venv/lib/python3.12/site-packages/_pytest/_code/code.py
  • exec() — Shell command execution in .venv/lib/python3.12/site-packages/_pytest/_code/code.py
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This agent is a backend service and API pipeline for clinical genomics. It ingests genomic data, applies deterministic classification rules, and uses a constrained Large Language Model (LLM) to generate reproducible, evidence-grounded variant interpretations.

Security Assessment
The tool handles sensitive clinical and genomic data. It needs network access to function, because it relies on external APIs for genomic annotations (such as ClinVar and gnomAD) and for LLM processing. The automated scan flagged dynamic code execution (`eval()`) and what it labels shell command execution (`exec()`, which in Python runs Python code rather than shell commands). Both findings are false positives: they come from pytest's own code inside the local `.venv` directory, not from the tool's source code. No hardcoded secrets or dangerously broad permissions were found. The codebase explicitly includes a clinical disclaimer noting that it is a portfolio project using only synthetic data. Overall risk is assessed as Low.

Quality Assessment
The project appears to be actively maintained, with a push on the day of the scan. It has a comprehensive README, dedicated documentation for security considerations, and a mature tech stack (FastAPI, PostgreSQL, Docker). However, it has no open-source license, so default copyright applies (all rights reserved), which could restrict enterprise adoption. Community trust is limited: 15 GitHub stars suggest a small but present user base.

Verdict
Safe to use, provided you are aware it is an unlicensed portfolio project intended for synthetic data rather than clinical diagnosis.
SUMMARY

API-first variant triage pipeline combining genomic filtering, annotation, and LLM-driven interpretation for clinical genomics workflows

README.md

variant-triage

A backend service for deterministic, evidence-grounded variant classification and interpretation.

Unlike typical LLM-driven tools that generate plausible outputs, this system constrains reasoning through structured rules (ACMG/AMP), curated evidence (ClinVar, gnomAD), and explicit decision paths to produce reproducible, auditable results.

The project is designed to model how clinical genomics workflows can be implemented as testable, production-style software systems rather than ad hoc analysis pipelines.

Most LLM approaches generate plausible interpretations. This system is designed to produce reproducible ones.

Stack: Python 3.12 · FastAPI · PostgreSQL 16 · SQLAlchemy 2 · Nextflow DSL2 · Docker · Fly.io

This is a portfolio project using only synthetic data. See CLINICAL_DISCLAIMER.md.


Live Demo

The app may take ~30 seconds to wake from cold start on the free tier.


Documentation


Why this matters

Variant interpretation is not just about generating an answer — it is about producing results that can be trusted, reproduced, and audited.

Most current approaches fall into two categories:

  • Rule-based pipelines — deterministic but rigid and hard to extend
  • LLM-driven tools — flexible but opaque and difficult to validate

This project explores a third approach:

Controlled reasoning — combining deterministic classification logic with constrained LLM assistance to produce outputs that are both flexible and reliable.

Overview

Variant interpretation is often performed through a combination of pipelines, scripts, and manual review. This project explores how that process can be expressed as a structured application with deterministic classification logic, explicit data models, traceable decision-making, and a consistent API surface.

The goal is to bridge the gap between bioinformatics workflows and production-facing services used in clinical or translational settings.


What this demonstrates

  • Controlled LLM reasoning - model outputs are constrained, validated, and grounded in curated evidence rather than free-form generation

  • End-to-end system design - VCF ingestion through classification, LLM-assisted interpretation (with guardrails and constrained outputs), and REST API exposure

  • Separation of concerns - clear boundaries between domain logic, persistence, and API layer

  • Reproducibility and testability - deterministic classification logic with 170+ tests and full CI

  • Operational awareness - JWT authentication, audit logging, containerised deployment with CI/CD

  • Clinical domain knowledge - ACMG/AMP 2015 germline rules, AMP/ASCO/CAP somatic tiering, ClinVar and gnomAD evidence integration

  • Extensibility - plugin architecture for classification rules, protocol-based evidence sources


Architecture

flowchart TD
    A[VCF File\nshort-read / long-read] --> B[vcf_parser\ncyvcf2]
    B --> C[VCFRecord\nDomain Model]
    C --> D{Origin?}
    D -->|GERMLINE| E[ACMG Engine\n10 rules\nPVS1 · PS1 · PM1-5 · PP2/3]
    D -->|SOMATIC| F[AMP/ASCO/CAP Engine\n4 tiers\nCIViC · OncoKB · hotspots]
    E --> G[Evidence Clients\ngnomAD · ClinVar · CADD]
    F --> H[Evidence Clients\nCIViC · OncoKB · gnomAD]
    G --> I[ClassificationResult\nPathogenic → Benign]
    H --> J[SomaticResult\nTier I → IV]
    I --> K[FastAPI\nJWT auth · audit log]
    J --> K
    K --> L[(PostgreSQL 16\nSample · Variant\nClassification · AuditLog)]
    K --> M[LLM Assistant\nClaude · guardrails]
    N[Nextflow DSL2\nbcftools norm → VEP] -.->|pre-process| A
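
The Evidence Clients stage in the diagram queries more than one source per variant. As a rough sketch of how such lookups could run concurrently and be reused within a batch, the example below uses `asyncio.gather` with a plain dictionary cache. The stub clients, the `EvidenceCache` class, the variant id format, and the returned values are all placeholders invented for illustration; the real clients would make network calls.

```python
import asyncio

# Records which stub lookups actually ran, to make the caching visible.
CALLS: list[str] = []


async def fetch_gnomad(variant_id: str) -> dict:
    """Hypothetical stub standing in for a gnomAD lookup."""
    CALLS.append(f"gnomad:{variant_id}")
    await asyncio.sleep(0)  # stands in for network latency
    return {"af": None}  # placeholder value


async def fetch_clinvar(variant_id: str) -> dict:
    """Hypothetical stub standing in for a ClinVar lookup."""
    CALLS.append(f"clinvar:{variant_id}")
    await asyncio.sleep(0)
    return {"significance": None}  # placeholder value


class EvidenceCache:
    """In-memory cache keyed by variant id, scoped to a single batch."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    async def get(self, variant_id: str) -> dict:
        if variant_id not in self._store:
            # Both sources are queried concurrently for this variant.
            gnomad, clinvar = await asyncio.gather(
                fetch_gnomad(variant_id), fetch_clinvar(variant_id)
            )
            self._store[variant_id] = {"gnomad": gnomad, "clinvar": clinvar}
        return self._store[variant_id]


async def annotate_batch(variant_ids: list[str]) -> list[dict]:
    """Annotate variants one after another, reusing cached evidence."""
    cache = EvidenceCache()
    return [await cache.get(v) for v in variant_ids]
```

Running `asyncio.run(annotate_batch([...]))` with a repeated variant id triggers only one pair of lookups for that id; the second occurrence is served from the cache.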

Design decisions

  • Classification logic as pure functions - deterministic behaviour, straightforward to test in isolation
  • Plugin architecture for ACMG rules - each rule is an independent class implementing a common interface, making additions and overrides explicit
  • Async evidence clients with in-memory caching - gnomAD GraphQL and ClinVar E-utilities run concurrently per variant, results cached to avoid duplicate lookups within a batch
  • Audit logging with SHA-256 payload hashing - tamper-evident record of all requests without storing raw patient data
  • LLM guardrails - regex-based checks on model output prevent diagnosis statements and treatment recommendations from reaching callers
  • Graceful degradation - OncoKB and the LLM assistant both degrade to no-op if API tokens are absent, keeping the core classifier functional

Quickstart

Prerequisites

  • Docker ≥ 24 and Docker Compose v2
  • Python 3.12 (for local development)

Run with Docker Compose

git clone https://github.com/plobb/variant-triage
cd variant-triage
cp .env.example .env
# Set SECRET_KEY in .env

docker compose up --build

API available at http://localhost:8000. Swagger UI at http://localhost:8000/docs.

Local development

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your database URL and secret key

alembic upgrade head
uvicorn app.api.main:app --reload

Testing

# Run all 170+ tests
pytest tests/

# With coverage report
pytest tests/ --cov=app --cov-report=term-missing

# Type checking
mypy --strict app/

# Lint
ruff check app/ tests/

Project roadmap

| Phase | Scope | Status |
|---|---|---|
| 1 - Foundation | Domain models, VCF parser (short + long-read), DB schema, CI | ✅ Complete |
| 2 - API layer | FastAPI routes, JWT auth, audit logging middleware | ✅ Complete |
| 3 - ACMG engine | 10-rule germline classifier, gnomAD + ClinVar evidence clients | ✅ Complete |
| 4 - Somatic | AMP/ASCO/CAP tiering, CIViC + OncoKB evidence clients | ✅ Complete |
| 5 - Nextflow | DSL2 pipeline: bcftools normalise → VEP annotation | ✅ Complete |
| 6 - LLM assistant | Claude-powered interpretation drafts with clinical guardrails | ✅ Complete |
| 7 - Deployment | Fly.io deploy, GitHub Actions CI/CD, security documentation | ✅ Complete |

Related work

  • celltype-agent - agentic cell type annotation for single-cell and spatial genomics data (10x Chromium, Visium, Xenium) using Claude and curated marker databases
