SWE-Agent-Arena
Health Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push within the past day
- Community trust — 10 GitHub stars
Code Warn
- process.env — Environment variable access in app.js
- network request — Outbound network request in app.js
Permissions Pass
- Permissions — No dangerous permissions requested
Purpose
This tool is an open-source platform for evaluating coding agents by running them in parallel on real software engineering tasks. Users watch the agents work, compare their git diffs, and vote on the best results.
Security Assessment
The platform is designed to execute agents and run code in isolated Docker environments. It makes outbound network requests and accesses environment variables, both standard behaviors for this type of application. No dangerous permissions or hardcoded secrets were detected during the scan. However, the tool is a research preview with limited safety measures, and its terms state that collected dialogue data may be distributed under a CC-BY or similar license, so never submit private data, proprietary code, or personal information to the platform. Overall risk is rated Medium.
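As an illustration of what constrained execution can look like, here is a minimal Python sketch that launches a task inside a resource-limited container via the Docker CLI. The image name, resource limits, and timeout are assumptions chosen for illustration, not the platform's actual configuration.

```python
# Sketch only: illustrative sandboxing flags, not the platform's real setup.
import subprocess

def run_agent_sandboxed(image: str, task_cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command in a constrained Docker container (assumes Docker CLI)."""
    return subprocess.run(
        ["docker", "run", "--rm",
         "--memory", "2g",        # cap memory usage
         "--cpus", "1.0",         # cap CPU usage
         "--pids-limit", "256",   # cap process count
         "--read-only",           # read-only root filesystem
         image, *task_cmd],
        capture_output=True, text=True, timeout=600,
    )
```

Note that the network is deliberately left enabled in this sketch, since the platform's agents do make outbound requests; tighter sandboxes would also restrict that.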
Quality Assessment
The project is highly active, with its most recent code push occurring today, and it is a legitimate research effort backed by a peer-reviewed paper at the 2025 IEEE/ACM FORGE conference. On the downside, the repository lacks a license file: while it explicitly welcomes issues and PRs, the legal terms for modifying or distributing the software are undefined. Community trust is still minimal, at just 10 GitHub stars.
Verdict
Use with caution — the platform is active and academically backed, but you must avoid uploading proprietary code due to its open data-sharing policy and missing software license.
Compare agents pairwise via multi‑round evaluations for SE tasks.
README.md
title: SWE-Agent-Arena
emoji: ⚔️
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
hf_oauth: true
short_description: Agent arena for software engineering tasks
SWE-Agent-Arena
An open-source platform for evaluating CLI coding agents on real software engineering tasks. Two anonymous agents tackle the same task in isolated environments — you compare their output and git diffs, then vote.
Key Capabilities
- Blind pairwise comparison with live-streaming output and side-by-side git diffs
- Multi-round conversations — send follow-ups to each agent independently, mirroring real iterative workflows
- RepoChat — auto-inject repo context (issues, PRs, files) from GitHub / GitLab / HuggingFace URLs
- Rich leaderboard — Elo, Bradley-Terry MLE, PageRank, CEI (conversation efficiency), MCS (consistency), and Newman modularity; see the rating-update sketch after this list
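For readers unfamiliar with these rating methods, here is a minimal sketch of the simplest of them, an Elo update over a single pairwise vote. The K-factor and tie handling are common defaults chosen for illustration, not the platform's actual implementation.

```python
# Minimal Elo sketch: standard formula, illustrative K-factor.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, outcome: float, k: float = 32.0):
    """outcome: 1.0 = A wins, 0.0 = B wins, 0.5 = tie."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (outcome - e_a)
    new_b = rating_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# Example: Agent A (1500) beats Agent B (1520).
print(elo_update(1500.0, 1520.0, 1.0))  # A gains roughly what B loses
```

Unlike sequential Elo updates, Bradley-Terry MLE fits win probabilities to all votes at once, which makes the resulting ranking independent of vote order.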
How It Works
- Submit a task — sign in, describe an SE task (optionally paste a repo URL for context)
- Watch agents work — two anonymous agents run in parallel with live stdout
- Compare diffs — side-by-side view of what each agent changed
- Vote — Agent A, Agent B, Tie, or Tie (Both Bad); a sketch of how such votes map to pairwise outcomes follows this list
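To make the voting step concrete, here is a hypothetical shape for a single vote record and its mapping onto the 0/0.5/1 outcomes that rating updates consume. All field names are assumptions for illustration, not the platform's schema.

```python
# Hypothetical vote record; field names are assumptions, not the real schema.
from dataclasses import dataclass
from typing import Literal

@dataclass
class ArenaVote:
    task_id: str
    agent_a: str   # anonymized agent identifier
    agent_b: str
    winner: Literal["A", "B", "tie", "tie_both_bad"]

def to_outcome(vote: ArenaVote) -> float:
    """Collapse a vote into the 0/0.5/1 score an Elo-style update expects.

    Treating "tie_both_bad" as an ordinary tie for rating purposes is an
    assumption; a real leaderboard might weight or exclude such votes.
    """
    return {"A": 1.0, "B": 0.0, "tie": 0.5, "tie_both_bad": 0.5}[vote.winner]

print(to_outcome(ArenaVote("task-1", "agent-x", "agent-y", "A")))  # 1.0
```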
Terms of Service
- Research preview — limited safety measures; may generate offensive content.
- Must not be used for illegal, harmful, violent, racist, or sexual purposes.
- Do not upload private information.
- Collected dialogue data may be distributed under a CC-BY or similar license.
Contributing
Issues, tasks, and PRs welcome — open an issue to get started.
Citation
@inproceedings{zhao2025se,
  title={SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
  author={Zhao, Zhimin},
  booktitle={2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)},
  pages={78--81},
  year={2025},
  organization={IEEE}
}