paper-fetch

agent
Security Audit
Warning
Health: Warning
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 6 GitHub stars
Code: Warning
  • network request — Outbound network request in scripts/fetch.py
Permissions: Passed
  • Permissions — No dangerous permissions requested

SUMMARY

Legal open-access PDF downloader by DOI — Unpaywall, arXiv, PMC, bioRxiv. Multi-platform Agent Skill.

README.md

paper-fetch — Legal Open-Access PDF Downloader

Chinese documentation (中文文档)

What it does

  • Downloads paper PDFs from a DOI (or a batch file of DOIs) via legal open-access sources
  • 5-source fallback chain: Unpaywall → Semantic Scholar openAccessPdf → arXiv → PubMed Central OA → bioRxiv/medRxiv
  • Zero dependencies: pure Python standard library, no pip install needed
  • Auto-named output: {first_author}_{year}_{short_title}.pdf
  • Batch mode: pass a file of DOIs with --batch
  • Never touches Sci-Hub or any paywall-bypass service: if no OA copy exists, it reports failure with metadata so you can request the paper through interlibrary loan (ILL)
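The auto-naming rule above can be sketched as a small stdlib function. This is a minimal illustration, not code taken from scripts/fetch.py: the helper name make_filename, the four-word title cutoff, lowercasing, ASCII transliteration, and the "unknown"/"nd"/"untitled" fallbacks are all assumptions.

```python
import re
import unicodedata
from typing import List, Optional, Union


def _ascii_words(text: str) -> List[str]:
    """Transliterate to ASCII (e.g. 'Müller' -> 'Muller') and split into alphanumeric words."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[A-Za-z0-9]+", ascii_text)


def make_filename(first_author: str, year: Optional[Union[int, str]], title: str,
                  max_words: int = 4) -> str:
    """Hypothetical helper: build '{first_author}_{year}_{short_title}.pdf'."""
    author = "".join(_ascii_words(first_author)).lower() or "unknown"
    # assumption: keep only the first `max_words` title words, lowercased
    short_title = "_".join(w.lower() for w in _ascii_words(title)[:max_words]) or "untitled"
    # assumption: 'nd' (no date) when the year is missing
    return f"{author}_{year or 'nd'}_{short_title}.pdf"


if __name__ == "__main__":
    print(make_filename("Jumper", 2021, "Highly accurate protein structure prediction with AlphaFold"))
```

Restricting filenames to ASCII letters and digits keeps them safe on every filesystem, at the cost of dropping punctuation from titles.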

Discipline Coverage

The skill is discipline-agnostic — it works for any field, not just life sciences or computer science. Coverage depends on whether the paper has a legal OA version, not on its subject area.

| Source | Discipline scope |
|---|---|
| Unpaywall | ✅ All disciplines (covers every Crossref DOI: humanities, social sciences, physics, chemistry, economics, etc.) |
| Semantic Scholar | ✅ All disciplines (cross-domain academic graph) |
| arXiv | Physics, math, CS, statistics, quantitative finance, economics, EE |
| PubMed Central | Biomedical only |
| bioRxiv / medRxiv | Biology / medicine preprints only |

In practice, Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, humanities, and every other field via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains. If no legal OA copy exists anywhere, the skill reports failure honestly — it will never bypass paywalls regardless of discipline.

Multi-Platform Support

Works with all major AI coding agents that support the Agent Skills format:

| Platform | Status | Details |
|---|---|---|
| Claude Code | ✅ Full support | Native SKILL.md format |
| OpenClaw / ClawHub | ✅ Full support | metadata.openclaw namespace |
| Hermes Agent | ✅ Full support | Installable under the research category |
| pi-mono | ✅ Full support | metadata.pimo namespace |
| OpenAI Codex | ✅ Full support | agents/openai.yaml sidecar |
| SkillsMP | ✅ Indexed | GitHub topics configured |

Comparison

vs No Skill (native agent)

| Feature | Native agent | This skill |
|---|---|---|
| Resolve DOI to PDF | Ad-hoc web search | Deterministic 5-source chain |
| Unpaywall integration | No | Yes (highest OA coverage) |
| arXiv / PMC / bioRxiv fallback | Manual | Automatic |
| Batch download | No | Yes (--batch dois.txt) |
| Consistent filenames | No | Yes (author_year_title.pdf) |
| Legal-only guarantee | None | Hard-refuses paywall bypass |
| Dependencies | Varies | Python stdlib only |

Prerequisites

  • Python 3.8+ (standard library only, no extra packages)
  • Unpaywall contact email (optional but recommended) — set once:
export [email protected]

Add it to ~/.zshrc / ~/.bashrc to persist. Without it, Unpaywall is skipped and the remaining 4 sources (Semantic Scholar, arXiv, PMC, bioRxiv/medRxiv) are still tried.
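For reference, Unpaywall's public REST API is queried as https://api.unpaywall.org/v2/{doi}?email=..., and its JSON record carries a best_oa_location object whose url_for_pdf field holds a direct PDF link when one is known. A minimal stdlib sketch of that lookup, including the skip-when-no-email behavior described above, might look like this. The environment variable name UNPAYWALL_EMAIL and both function names are hypothetical placeholders; substitute whatever variable your export line sets.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_lookup_url(doi: str, email: Optional[str]) -> Optional[str]:
    """Build the Unpaywall request URL, or None when no contact email is configured."""
    if not email:
        return None  # without an email, Unpaywall is skipped and the next source is tried
    return UNPAYWALL_API + urllib.parse.quote(doi, safe="/") + "?" + urllib.parse.urlencode({"email": email})


def unpaywall_pdf_url(doi: str) -> Optional[str]:
    """Return best_oa_location.url_for_pdf for a DOI, or None if unavailable."""
    # ASSUMPTION: hypothetical variable name; the real script may read a different one
    url = unpaywall_lookup_url(doi, os.environ.get("UNPAYWALL_EMAIL"))
    if url is None:
        return None
    # urlopen raises urllib.error.HTTPError (an OSError subclass) on non-2xx responses
    with urllib.request.urlopen(url, timeout=30) as resp:
        record = json.load(resp)
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")
```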

Skill Installation

Claude Code

# Global install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch

# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch

OpenClaw / ClawHub

clawhub install paper-fetch

# Or manual
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch

Hermes Agent

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch

Or add to ~/.hermes/config.yaml:

skills:
  external_dirs:
    - ~/myskills/paper-fetch

pi-mono

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch

OpenAI Codex

# User-level
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch

# Project-level
git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch

SkillsMP

skills install paper-fetch

Installation paths summary

| Platform | Global path | Project path |
|---|---|---|
| Claude Code | ~/.claude/skills/paper-fetch/ | .claude/skills/paper-fetch/ |
| OpenClaw | ~/.openclaw/skills/paper-fetch/ | skills/paper-fetch/ |
| Hermes Agent | ~/.hermes/skills/research/paper-fetch/ | Via external_dirs |
| pi-mono | ~/.pimo/skills/paper-fetch/ | |
| OpenAI Codex | ~/.agents/skills/paper-fetch/ | .agents/skills/paper-fetch/ |
| SkillsMP | N/A (installed via CLI) | N/A |
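To confirm which agents will pick up the skill, a short stdlib check can look for SKILL.md (which the Files section lists as the only required file) in each global path from the table. This is a convenience sketch, not part of the repository; the function name installed_platforms is hypothetical, and project-level and SkillsMP installs are not covered.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Global install paths from the table above, relative to the home directory.
GLOBAL_PATHS: Dict[str, str] = {
    "Claude Code": ".claude/skills/paper-fetch",
    "OpenClaw": ".openclaw/skills/paper-fetch",
    "Hermes Agent": ".hermes/skills/research/paper-fetch",
    "pi-mono": ".pimo/skills/paper-fetch",
    "OpenAI Codex": ".agents/skills/paper-fetch",
}


def installed_platforms(home: Optional[Path] = None) -> List[str]:
    """List platforms whose global skill directory contains SKILL.md."""
    base = home if home is not None else Path.home()
    return [name for name, rel in GLOBAL_PATHS.items() if (base / rel / "SKILL.md").is_file()]


if __name__ == "__main__":
    found = installed_platforms()
    print("paper-fetch found for: " + (", ".join(found) if found else "no platform"))
```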

Usage

Single DOI:

python scripts/fetch.py 10.1038/s41586-021-03819-2

Custom output directory:

python scripts/fetch.py 10.1038/s41586-021-03819-2 --out ~/papers

Batch mode:

cat > dois.txt <<EOF
10.1038/s41586-021-03819-2
10.1126/science.abj8754
10.1101/2023.01.01.522400
EOF

python scripts/fetch.py --batch dois.txt --out ~/papers
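The README only says the batch file holds one DOI per line. A tolerant reader could look like the sketch below; skipping blank lines and # comments, stripping https://doi.org/ or doi: prefixes, and case-insensitive de-duplication (DOIs are case-insensitive) are assumptions about reasonable behavior, not documented features of --batch. The function name read_doi_batch is hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Assumed prefixes that users often paste in front of a bare DOI.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def read_doi_batch(path: Union[str, Path]) -> List[str]:
    """Read one DOI per line, skipping blanks and '#' comments, dropping duplicates."""
    seen = set()
    dois: List[str] = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        lowered = line.lower()
        for prefix in DOI_PREFIXES:
            if lowered.startswith(prefix):
                line = line[len(prefix):].strip()
                break
        key = line.lower()  # DOIs compare case-insensitively
        if key not in seen:
            seen.add(key)
            dois.append(line)  # keep the first spelling seen
    return dois
```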

Dry-run (preview without downloading):

python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run

Human-readable text output:

python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text
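The flags shown above (a positional DOI, --batch, --out, --dry-run, --format text) could be parsed with a stdlib argparse setup like the following. This is a hypothetical sketch, not the actual scripts/fetch.py: the "." default for --out, "json" as the non-text --format value, and the either/or rule between a DOI and --batch are assumptions.

```python
import argparse
from typing import List, Optional


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented command-line flags."""
    p = argparse.ArgumentParser(prog="fetch.py", description="Download legal OA PDFs by DOI")
    p.add_argument("doi", nargs="?", help="a single DOI")
    p.add_argument("--batch", metavar="FILE", help="file with one DOI per line")
    p.add_argument("--out", default=".", help="output directory (assumed default: cwd)")
    p.add_argument("--dry-run", action="store_true", help="resolve sources without downloading")
    # assumption: machine-readable JSON unless 'text' is requested
    p.add_argument("--format", choices=("json", "text"), default="json")
    args = p.parse_args(argv)
    if (args.doi is None) == (args.batch is None):
        p.error("give exactly one of DOI or --batch FILE")  # exits with status 2
    return args
```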

Or just ask your agent naturally:

Download the AlphaFold2 paper PDF to my ~/papers folder

Fetch the PDF for DOI 10.1038/s41586-020-2649-2

Download these three papers: 10.1038/s41586-021-03819-2, 10.1126/science.abj8754, 10.1101/2023.01.01.522400

Check if this paper has an open-access PDF available: 10.1038/s41586-020-2649-2

Batch download all DOIs from my dois.txt file into ~/papers

Resolution Order

  1. Unpaywall: best OA location across all publishers (highest hit rate)
  2. Semantic Scholar: openAccessPdf field + externalIds lookup
  3. arXiv: if the paper has an arXiv ID
  4. PubMed Central OA subset: if the paper has a PMCID
  5. bioRxiv / medRxiv: DOI prefix 10.1101/
  6. Otherwise → report failure with metadata (title/authors) for ILL
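The ordering above is a first-hit-wins fallback chain. A minimal sketch of that control flow follows, assuming each source is wrapped as a function that maps a DOI to a PDF URL or None; the names resolve_pdf and is_biorxiv_doi are hypothetical, and the demo resolvers are stubs that return placeholder example.org URLs rather than real source lookups.

```python
from typing import Callable, Iterable, Optional, Tuple

# A resolver maps a DOI to a candidate PDF URL, or None when that source has no OA copy.
Resolver = Callable[[str], Optional[str]]


def resolve_pdf(doi: str, chain: Iterable[Tuple[str, Resolver]]) -> Optional[Tuple[str, str]]:
    """Try sources in priority order; return (source_name, pdf_url) for the first hit."""
    for name, resolver in chain:
        try:
            url = resolver(doi)
        except (OSError, ValueError):
            # network errors (urllib raises OSError subclasses) or bad JSON (ValueError)
            # count as a miss, so the chain falls through to the next source
            url = None
        if url:
            return name, url
    return None  # no legal OA copy: the caller reports failure with metadata


def is_biorxiv_doi(doi: str) -> bool:
    """bioRxiv / medRxiv DOIs share the 10.1101/ prefix (step 5)."""
    return doi.lower().startswith("10.1101/")


if __name__ == "__main__":
    # Stub chain for illustration only; real resolvers would query each source.
    demo_chain = [
        ("unpaywall", lambda doi: None),
        ("semantic_scholar", lambda doi: None),
        ("biorxiv", lambda doi: f"https://example.org/{doi}.pdf" if is_biorxiv_doi(doi) else None),
    ]
    print(resolve_pdf("10.1101/2023.01.01.522400", demo_chain))
```

Treating a failing source as a miss, rather than aborting, is what lets a single outage (say, Unpaywall being down) degrade gracefully to the remaining sources.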

Files

  • SKILL.md: the only required file. Loaded by all platforms.
  • scripts/fetch.py: the downloader (pure stdlib Python)
  • agents/openai.yaml: OpenAI Codex sidecar configuration
  • README.md: this file
  • README_CN.md: Chinese documentation

Known Limitations

  • Coverage depends on OA availability — if a paper has no legal OA copy, this skill cannot get it. That is a feature, not a bug.
  • Some publisher redirects return an HTML landing page instead of a PDF; the script validates the %PDF header and fails cleanly in that case
  • No authentication — institutional proxies (EZproxy / OpenAthens) are not supported in this version
  • Host allowlist — downloads are restricted to known OA provider domains; PDFs from unlisted hosts are blocked
  • 50 MB size limit — per-PDF download cap to prevent runaway downloads
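The three safeguards listed above (the %PDF header check, the host allowlist, and the 50 MB cap) could be combined roughly as follows. This is an illustrative sketch: the function names and the example allowlist domains are hypothetical and are not the script's actual list.

```python
import io
from typing import BinaryIO, Iterable
from urllib.parse import urlparse

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB per-PDF cap, as stated above

# Hypothetical example allowlist; the real script's list may differ.
EXAMPLE_ALLOWLIST = ("arxiv.org", "ncbi.nlm.nih.gov", "biorxiv.org", "medrxiv.org")


def host_allowed(url: str, allowlist: Iterable[str] = EXAMPLE_ALLOWLIST) -> bool:
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # exact match or '.domain' suffix, so 'arxiv.org.evil.example' is rejected
    return any(host == d or host.endswith("." + d) for d in allowlist)


def read_pdf_capped(stream: BinaryIO, limit: int = MAX_PDF_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a response body in chunks, aborting past `limit` bytes or if it is not a PDF."""
    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data += chunk
        if len(data) > limit:
            raise ValueError(f"download exceeds {limit} bytes")
    # publisher landing pages come back as HTML; a real PDF starts with '%PDF'
    if not data.startswith(b"%PDF"):
        raise ValueError("response is not a PDF (missing %PDF header)")
    return bytes(data)


if __name__ == "__main__":
    print(host_allowed("https://export.arxiv.org/pdf/example"))
    print(len(read_pdf_capped(io.BytesIO(b"%PDF-1.7\n..."))))
```

Reading in chunks and checking the running total means an oversized response is abandoned as soon as it crosses the cap, instead of being buffered in full first.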

License

MIT

Support

If this skill helps your work, consider supporting the author:

WeChat Pay
Alipay
Buy Me a Coffee

Author

Agents365-ai
