CCD

mcp
Security Audit
Failed
Health — Passed
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 10 GitHub stars
Code — Failed
  • eval() — Dynamic code execution via eval() in ccd/ccd_utils.py
Permissions — Passed
  • Permissions — No dangerous permissions requested
Purpose
This tool is a plug-and-play toolkit designed to mitigate hallucinations in radiology multimodal large language models (MLLMs) using Clinical Contrastive Decoding. It provides a training-free method to improve the diagnostic accuracy of medical AI models.

Security Assessment
The overall security risk is Medium. The package does not request dangerous system permissions, but an automated scan flagged the use of `eval()` for dynamic code execution inside `ccd/ccd_utils.py`. While commonly used in machine learning projects for dynamic configuration, `eval()` can execute arbitrary code and poses a supply-chain risk if inputs are ever untrusted. The tool processes medical data (radiology images and text), which is inherently sensitive. Developers handling patient data must ensure strict local deployment and compliance with healthcare privacy regulations like HIPAA. No hardcoded secrets were detected.
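When dynamic evaluation is only needed for parsing configuration values, the flagged `eval()` pattern can usually be replaced with `ast.literal_eval`, which parses plain Python literals and refuses anything executable. A minimal sketch (illustrative only; the actual call site in `ccd/ccd_utils.py` may have different requirements):

```python
import ast

def parse_config_value(raw: str):
    """Safely parse a literal (number, string, list, dict) from text.

    Unlike eval(), ast.literal_eval raises ValueError on anything that
    is not a plain Python literal, so no attacker code can execute.
    """
    return ast.literal_eval(raw)

print(parse_config_value("[0.5, 0.5, 10]"))  # harmless literal -> [0.5, 0.5, 10]
try:
    parse_config_value("__import__('os').system('id')")  # code, not a literal
except ValueError:
    print("rejected non-literal input")
```

If the code path genuinely needs to evaluate expressions, restricting it this way (or to an explicit whitelist) removes most of the supply-chain exposure the scan flagged.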

Quality Assessment
The project appears to be a legitimate, high-quality academic endeavor associated with an ACL 2026 paper. It is actively maintained, with repository activity as recent as today. It uses the highly permissive MIT license, making it excellent for open-source collaboration. Community trust is currently low but growing, represented by 10 GitHub stars, which is typical for a specialized, newly released research tool.

Verdict
Use with caution: The underlying research is solid and properly licensed, but developers must review the `eval()` function for safety and ensure strict local deployment to protect sensitive medical data.
SUMMARY

[ACL 2026] 📷 CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding — A flexible, plug-and-play toolkit for various radiology MLLM backbones, further boosting overall performance

README.md

CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

🔥 News

  • [6 Apr 2026] 🎉 The paper has been accepted to ACL 2026!
  • [02 Dec 2025] 🧲 Added zero-shot device detection (13 types) powered by MedSigLIP.
  • [17 Oct 2025] 🔩 CCD has been upgraded to support view classification for chest X-rays — see the Supported Expert Models section for details.
  • [06 Oct 2025] 🎮 The online demo is available at Hugging Face Spaces. Feel free to try it out!
  • [30 Sep 2025] 🗂️ The processed test data for quick start are now available — enjoy exploring with the provided guidelines!
  • [27 Sep 2025] ⛳ Our preprint is now live on arXiv — check it out for details.

🎯 Call for Contribution

We welcome contributions from the community to CCD! If you have ideas for new features or improvements, feel free to open an issue or contact us directly. We are especially interested in contributions that extend CCD to more label modalities, such as morphology (size/shape), anatomical location, and devices/lines/tubes.

Overview

Multimodal large language models (MLLMs) are advancing radiology by combining image and text understanding, but often generate inaccurate or unsupported clinical details—so-called medical hallucinations. We propose Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD reduces hallucinations and improves clinical accuracy without changing the base model. Experiments show CCD boosts performance on multiple datasets and models, offering a practical way to make radiology MLLMs more reliable.
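As a rough sketch of the contrastive-decoding idea (the general recipe, not the paper's exact equations), the base model's next-token logits can be shifted toward an expert-informed distribution before sampling:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_logits(base, expert, alpha=0.5):
    """Shift base logits toward the expert-informed ones.

    alpha controls how strongly the expert signal steers decoding;
    alpha=0 recovers the base model unchanged. Illustrative only --
    CCD's actual formulation lives in the paper and the ccd package.
    """
    return [b + alpha * (e - b) for b, e in zip(base, expert)]

# Toy vocabulary: ["normal", "effusion", "the"]
base_logits   = [2.0, 0.5, 1.0]   # base model slightly prefers "normal"
expert_logits = [0.5, 2.5, 1.0]   # expert evidence supports "effusion"

probs = softmax(contrastive_logits(base_logits, expert_logits, alpha=0.7))
print(probs)  # probability mass shifts toward "effusion"
```

The point of the toy example is only that the expert signal re-ranks tokens at inference time, which is why no retraining of the base model is needed.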

CCD's Framework


📖 Contents

โ›๏ธ Installation

[!TIP]
Use uv for installation — it's faster and more reliable than pip.

Option 1:

Install the latest version directly from GitHub for quick setup:

uv pip install git+https://github.com/X-iZhang/CCD.git

[!NOTE]
Requirements: Python 3.9 or later, and a CUDA-compatible GPU (recommended)

Option 2:

If you plan to modify the code or contribute to the project, you can clone the repository and install it in editable mode:

  1. Clone the repository and navigate to the project folder

git clone https://github.com/X-iZhang/CCD.git
cd CCD

  2. Set up the environment and install in editable mode

conda create -n CCD python=3.10 -y
conda activate CCD
pip install uv  # enable uv support
uv pip install -e .

🔄 Upgrade to the latest code base

git pull
uv pip install -e .

⚡ Quick Start

CLI Inference

You can perform inference directly from the command line using our CLI tool:

python -m ccd.run_ccd \
  --model-path "X-iZhang/libra-maira-2" \
  --image "./path/to/Chest_Xray.jpg" \
  --question "Is there evidence of any abnormalities?" \
  --max-new-tokens 128

Optional arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--alpha` | Clinical guidance weight (range: 0.0–1.0) | 0.5 |
| `--beta` | Expert token weight (range: 0.0–1.0) | 0.5 |
| `--gamma` | Token bias magnitude (choices: 2, 5, 10) | 10 |
| `--expert-model` | Choice of expert model: "DenseNet", "MedSiglip", "View", or "Device" | DenseNet |

Script Inference

You can run inference programmatically using the ccd_eval function from ccd/run_ccd.py.
After installing this repository, you can easily launch a model (either your own trained model or ours) locally or in Google Colab.

from ccd import ccd_eval

# Run CCD inference on a chest X-ray
output = ccd_eval(
    model_path="X-iZhang/libra-maira-2",  # or your custom radiology MLLM
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    alpha=0.5,        # Clinical guidance weight
    beta=0.5,         # Expert token weight
    gamma=10,         # Token bias magnitude
    temperature=0.9,  # Sampling temperature
    top_p=0.9,        # Nucleus sampling probability
    top_k=50,         # Top-k sampling
    expert_model="DenseNet",    # or "MedSiglip" or "View" or "Device"
    max_new_tokens=256
)
print(output)
💡 You can also use run_eval to test the original model output (without CCD).
from ccd import run_eval

# Run standard inference without CCD
output = run_eval(
    model_path="X-iZhang/libra-maira-2",
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    max_new_tokens=128,
    num_beams=1
)
print(output)

👉 run_eval also supports batch inference using a list of images and questions.

Gradio Web Interface

You can launch the Gradio demo locally with:

python -m ccd.app

Once the Gradio web interface is launched, you can open it using the URL printed on your screen. You will notice that both the default MAIRA-2 model and the expert models are ready for setup, with more models available in the list. Simply upload a chest X-ray image, enter your question, and click 🚀 Generate to view the results!


🛠️ Advanced Usage

Supported MLLM Models

CCD is compatible with any radiology MLLM that follows the Libra/LLaVA architecture:

[!NOTE]
To switch MLLM models, simply set the --model-path argument (CLI) or model_path parameter (Python) to one of the following checkpoints.

| Model | Checkpoint |
| --- | --- |
| Libra-v1.0-7B | X-iZhang/libra-v1.0-7b |
| Libra-v1.0-3B | X-iZhang/libra-v1.0-3b |
| MAIRA-2 | X-iZhang/libra-maira-2 |
| LLaVA-Med-v1.5 | X-iZhang/libra-llava-med-v1.5-mistral-7b |
| LLaVA-Rad | X-iZhang/libra-llava-rad |
| Med-CXRGen-F | X-iZhang/Med-CXRGen-F |
| Med-CXRGen-I | X-iZhang/Med-CXRGen-I |

[!WARNING]
The model adapted from the Libra repository is intended for demonstration purposes only. For accurate evaluation, please refer to the original model weights and configuration settings, particularly the chat template.

Supported Expert Models

CCD integrates four expert models for clinical signal extraction:

[!NOTE]
To switch expert models, simply set the --expert-model argument (CLI) or expert_model parameter (Python) to one of the following names.

| Model | Checkpoint | Note |
| --- | --- | --- |
| DenseNet | torchxrayvision/densenet121-res224-chex | CheXpert (Stanford) |
| MedSiglip | google/medsiglip-448 | Variant of SigLIP |
| View Model | ChestViewSplit | 'Frontal' or 'Lateral' |
| Device Model | google/medsiglip-448 | Zero-shot detection of 13 device types or 'No Device'. |

[!TIP]
DenseNet has been upgraded to work together with the view-classification expert model, which helps the system better understand the view position of chest X-rays and thereby improves report-generation accuracy. MedSigLIP has been configured accordingly. The design is inspired by the MAIRA-2 chat template.

Parameter Settings

  • alpha (0.0-1.0): Weight for clinical guidance text

    • Higher = more influence from expert-generated guidance
    • Recommended: 0.3-0.7
  • beta (0.0-1.0): Weight for direct token biasing

    • Higher = stronger push toward clinical terminology
    • Recommended: 0.3-0.7
  • gamma (2, 5, 10): Maximum token bias magnitude

    • 2: Subtle influence
    • 5: Moderate influence
    • 10: Strong influence (default)

[!TIP]
These parameters can be set beyond the recommended range for adversarial testing to observe CCD's behaviour under extreme conditions.
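Under the assumption (not verified against the source) that beta scales a per-token bias on expert-flagged clinical terms and gamma caps that bias's magnitude, the biasing step might look like the following sketch. The function name and signature are hypothetical, chosen for illustration only:

```python
def bias_tokens(logits, clinical_token_ids, beta=0.5, gamma=10):
    """Push logits of expert-flagged clinical tokens upward.

    beta scales the raw bias; gamma caps its magnitude so that extreme
    settings cannot dominate decoding. Hypothetical helper -- CCD's
    internals may differ.
    """
    biased = list(logits)
    for tid in clinical_token_ids:
        bias = min(beta * gamma, gamma)  # magnitude never exceeds gamma
        biased[tid] += bias
    return biased

# Toy logits for three tokens; token 1 is flagged by the expert model.
out = bias_tokens([1.0, 0.2, -0.5], clinical_token_ids=[1], beta=0.5, gamma=10)
print(out)  # token 1 receives a +5.0 bias under these settings
```

This also makes the adversarial-testing remark concrete: pushing beta or gamma far above the recommended range inflates the bias until the flagged tokens dominate the distribution.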

🗂️ Dataset

CCD supports multiple medical imaging datasets commonly used in radiology research:

  • MIMIC-CXR — Chest X-ray images with corresponding radiology reports.
  • IU-Xray — Chest X-ray dataset with structured annotations.
  • CheXpert Plus — Large-scale dataset for chest X-ray interpretation.
  • Medical-CXR-VQA — A dataset for visual question answering in chest X-rays.

[!NOTE]
To facilitate hands-on testing, we provide pre-processed test splits for MIMIC-CXR, IU-Xray, CheXpert Plus and Medical-CXR-VQA, available on Hugging Face Collections.

[!WARNING]
Carefully read the READMEs. Note that the images in these datasets have been compressed for efficient storage and sharing; use the original datasets for evaluation.

📊 Evaluation

For evaluating generated reports, we recommend using RadEval — a unified framework for radiology text evaluation that integrates multiple standard metrics. Details can be found in the GitHub repository.

You can install RadEval via pip:

pip install RadEval

[!TIP]
RadEval supports metrics such as BLEU, ROUGE, BERTScore, CheXbert F1, and RadGraph F1, making it ideal for comprehensive evaluation of radiology report generation models.
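RadEval's own API is not reproduced here; as a generic illustration of the n-gram overlap that lexical metrics like BLEU build on, the following toy (not RadEval code) computes clipped unigram precision, the building block of BLEU-1:

```python
from collections import Counter

def unigram_precision(hypothesis: str, reference: str) -> float:
    """Clipped unigram precision: fraction of hypothesis words that
    also appear in the reference, each reference word creditable at
    most as many times as it occurs there."""
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(count, ref_counts.get(word, 0))
                  for word, count in Counter(hyp).items())
    return matched / len(hyp)

hyp = "no acute cardiopulmonary abnormality"
ref = "no acute cardiopulmonary process"
print(unigram_precision(hyp, ref))  # 3 of 4 words match -> 0.75
```

Model-based metrics such as CheXbert F1 and RadGraph F1 instead compare extracted clinical entities, which is why they are preferred for judging factual correctness of radiology reports.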

📝 Citation

If you find our paper and code useful in your research and applications, please cite using this BibTeX:

@article{zhang2025ccd,
  title={CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding},
  author={Zhang, Xi and Meng, Zaiqiao and Lever, Jake and Ho, Edmond SL},
  journal={arXiv preprint arXiv:2509.23379},
  year={2025}
}

📚 Acknowledgments

This project builds upon the following outstanding open-source works:

  • Libra — A flexible toolkit supporting multiple radiology LLM backbones, covering the full pipeline from training to inference.
  • TorchXRayVision — A library for chest X-ray datasets and models.
  • MedSigLIP — Medical Signal–Language Image Pretraining.
  • RadEval — A unified framework for radiology text evaluation.

We thank the authors for their valuable contributions to the medical AI community.

📨 Contact

For any enquiries or collaboration opportunities, please contact: [email protected]

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🧰 Intended Use

CCD is designed to assist clinical practitioners, researchers, and medical trainees in generating and analysing chest X-ray reports, with a focus on temporal reasoning and context-aware description of radiological findings.

Key Applications

  • 🩺 Clinical Decision Support — Produces preliminary findings or comparative analyses that can aid radiologists in drafting and reviewing reports.
  • 🎓 Educational Tool — Demonstrates example interpretations and temporal progressions for teaching radiology residents and students.
  • 🔬 Research Utility — Enables investigation of automated report generation, visual-language alignment, and temporal feature learning in medical imaging.

[!IMPORTANT]
All outputs must be reviewed and validated by qualified radiologists or medical professionals before informing any clinical decision.


Limitations and Recommendations
  1. Data Bias — Performance may degrade on underrepresented populations or rare disease categories.
  2. Clinical Oversight — CCD is a supportive system, not a replacement for professional medical judgment.
  3. Temporal Sensitivity — Although TAC enhances temporal alignment, subtle or atypical longitudinal changes may remain unrecognised.
  4. Generalisation — Performance may vary on image types or clinical contexts not present in the training distribution.
Ethical Considerations
  • Patient Privacy — All input data must be fully de-identified and compliant with HIPAA, GDPR, or equivalent local regulations.
  • Responsible Deployment — CCD's outputs may contain inaccuracies; users should interpret them with appropriate caution.
  • Accountability — The responsibility for clinical verification and safe deployment lies with the end-user organisation or researcher.
Disclaimer

This model and accompanying tools are intended solely for research and educational purposes.
CCD is not approved by the FDA, CE, or other regulatory authorities for clinical use.
For medical diagnosis or treatment decisions, please consult a licensed healthcare professional.
