arxie

agent
Security Audit
Warn
Health Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 101 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool is a self-hostable AI research workspace that provides a persistent paper database, structured evidence extraction, and literature review workflows.

Security Assessment
Overall risk: Low. The project requires an OpenAI API key to function, so it makes external network requests to OpenAI's servers by default. The light code audit scanned 12 files and found no dangerous patterns, no hardcoded secrets, and no dangerous permission requests. Users should note that the full infrastructure stack also requires managing credentials for PostgreSQL, Elasticsearch, Redis, and MinIO through environment variables.

Quality Assessment
The project is highly active, with its last push occurring today. It has earned 101 GitHub stars, which suggests a reasonable baseline of community interest among developers. The main drawback is the absence of a license file. The README names MIT as the license, but the repository contains no license file with the actual license text. Without explicit license terms shipped alongside the code, all rights are reserved by the creator by default, which leaves it unclear how the software can legally be used, modified, or distributed by others.

Verdict
Use with caution: the code appears safe to run, but the lack of a formal license creates legal ambiguity for professional or collaborative use.
SUMMARY

AI research workspace with a persistent paper database, structured evidence extraction, hybrid search, and proposal workflows.

README.md

Arxie

An AI research workspace backed by a persistent paper database.

Arxie is a self-hostable research system for serious literature work. It combines:

  • a canonical paper database named Paperbase
  • structured extraction over full papers
  • hybrid search and comparison surfaces
  • a workspace-aware research assistant that runs on top of that database

This repository now ships the v0.2.0 product surface described in the April 14 PRD: persistent corpora, saved workspaces, structured evidence, comparison workflows, provider-backed ingest, and a browser workspace at /app.

What You Can Do

  • build a curated paper collection from local PDFs, DOI, arXiv, and OpenAlex identifiers
  • parse papers into sections, chunks, figures, and tables
  • extract datasets, methods, metrics, result rows, findings, limitations, glossary terms, and engineering tricks
  • search papers, chunks, and artifacts with SQL fallback or Elasticsearch-backed retrieval
  • compare results, methods, tricks, figures, and tables across a corpus slice
  • save workspaces with a collection, query, focus note, filters, and pinned papers
  • run Arxie answer, chat, literature review, and proposal evidence flows against that saved context
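The saved-workspace bullet above lists five pieces of context: a collection, a query, a focus note, filters, and pinned papers. As a rough illustration only, that context could be modeled as a small serializable record. The class name, field names, and JSON layout below are hypothetical and are not taken from the Paperbase schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Workspace:
    """Hypothetical shape of a saved workspace (not the real Paperbase model)."""

    collection: str                      # name of the paper collection
    query: str                           # the saved search or research question
    focus_note: str = ""                 # free-text note on what to focus on
    filters: dict[str, str] = field(default_factory=dict)
    pinned_papers: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "Workspace":
        return cls(**json.loads(payload))


example = Workspace(
    collection="long-context-llms",
    query="What are recent approaches to long-context LLMs?",
    focus_note="Prioritize methods with reported benchmark results.",
    filters={"year_from": "2022"},
    pinned_papers=["arXiv:1706.03762"],
)
restored = Workspace.from_json(example.to_json())
```

Keeping the whole context in one record is what would let the answer, chat, and literature review flows run against the same saved state.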

Current Scope

v0.2.0 is production-ready for a single-user, self-hosted deployment.

It is not a multi-tenant SaaS product. The code keeps ownership boundaries so the system can grow later, but the supported deployment model today is one operator running one server stack.

Quick Start

1. Install

git clone https://github.com/mmTheBest/arxie.git
cd arxie

python -m venv .venv
source .venv/bin/activate
pip install -e .

2. Configure

cp .env.example .env

Set at least:

  • OPENAI_API_KEY

For the full self-hosted stack, .env.example also includes the Paperbase runtime variables for PostgreSQL, Elasticsearch, Redis, and object storage.
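Before starting anything, it can help to confirm that .env actually contains the required key. The following is a minimal sketch, assuming .env uses plain KEY=VALUE lines with optional # comments. The helper names parse_env and missing_keys are illustrative and are not part of Arxie.

```python
from pathlib import Path

# The only variable the README names as required.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    print("missing:", ", ".join(missing) if missing else "none")
```

For the full stack, REQUIRED_KEYS could be extended with whichever PostgreSQL, Elasticsearch, Redis, and object storage variables appear in .env.example.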

3. Start The Self-Hosted Stack

Start infrastructure:

docker compose -f infra/docker-compose.paperbase.yml up -d postgres elasticsearch minio redis

Apply schema migrations:

docker compose -f infra/docker-compose.paperbase.yml run --rm paperbase-migrate

Start the API and worker:

docker compose -f infra/docker-compose.paperbase.yml up -d paperbase-api paperbase-worker
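The three Compose commands above must run in order: infrastructure, then migrations, then the API and worker. A small wrapper script, sketched here, runs them in sequence and stops at the first failure. The compose file path and service names are copied from the commands above. The function names are illustrative, and the injectable run parameter exists only so the sequence can be exercised without Docker.

```python
import subprocess

COMPOSE_FILE = "infra/docker-compose.paperbase.yml"


def compose(*args: str) -> list[str]:
    """Build a docker compose command line for the Paperbase stack."""
    return ["docker", "compose", "-f", COMPOSE_FILE, *args]


# Same order as the README: infrastructure, migrations, API and worker.
STARTUP_STEPS = [
    compose("up", "-d", "postgres", "elasticsearch", "minio", "redis"),
    compose("run", "--rm", "paperbase-migrate"),
    compose("up", "-d", "paperbase-api", "paperbase-worker"),
]


def start_stack(run=subprocess.run) -> None:
    """Run each step in order; check=True raises on the first failing command."""
    for cmd in STARTUP_STEPS:
        run(cmd, check=True)


if __name__ == "__main__":
    start_stack()
```

Note that up -d returns once the containers are started, which is not necessarily when the services accept connections. If the migration step fails on a cold start, waiting a few seconds and rerunning it may be enough.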

4. Open Arxie

  • Homepage: http://localhost:8080/
  • Workspace app: http://localhost:8080/app
  • Liveness: http://localhost:8080/livez
  • Readiness: http://localhost:8080/readyz
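The liveness and readiness URLs above can be polled from a script before opening the app. This sketch uses only the Python standard library and assumes that both endpoints answer with HTTP 200 when healthy. The README does not document the response format, so treat that status check as an assumption.

```python
import time
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def wait_until_ready(base_url: str = BASE_URL, attempts: int = 30,
                     delay: float = 1.0, fetch=http_status,
                     sleep=time.sleep) -> bool:
    """Poll /livez and /readyz until both return 200 (assumed healthy status)."""
    for _ in range(attempts):
        if fetch(f"{base_url}/livez") == 200 and fetch(f"{base_url}/readyz") == 200:
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "not ready")
```

The fetch and sleep parameters are there so the polling loop can be tested without a running server.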

Local Process Mode

If you prefer running the services without Compose:

paperbase-db upgrade
paperbase-api
paperbase-worker

Useful make targets:

make paperbase-db-upgrade
make paperbase-api
make paperbase-worker
make paperbase-compose-config

Legacy CLI And RA API

The original src/ra assistant still ships with the repo.

Examples:

ra query "What are recent approaches to long-context LLMs?"
ra lit-review "attention mechanisms in computer vision"
ra trace "Attention Is All You Need"
ra chat

The legacy FastAPI surface is still available too:

uvicorn ra.api.app:app --host 0.0.0.0 --port 8000

Product Architecture

src/ra/                 Assistant workflows, CLI, and legacy REST API
src/paperbase/          Canonical schema, ingest, parse, extract, search
services/paperbase_api/ Browser-facing corpus API and UI
services/paperbase_worker/ Background job execution
infra/                  Self-hosting stack and environment files

Contributor-facing system docs live in docs/architecture.

Known Limits

  • Deployment is single-user and self-hosted, not collaborative or multi-tenant.
  • Figure and table extraction is phase-1 caption-driven extraction, not full OCR or chart digitization.
  • The legacy RA API and the Paperbase product API coexist; the browser product surface is the Paperbase API at port 8080.

License

MIT
