Semantic Scholar MCP Server

License: MIT
Python 3.10+

A 14-tool MCP server for Semantic Scholar academic research workflows. It gives any Model Context Protocol client (e.g., Claude Desktop, Claude Code, Cursor, Cline, Continue, and others) direct access to 200M+ papers from Semantic Scholar: paper search, citation graph traversal, author profiles, and recommendations.


Installation

Option 1: One-Line Install (Recommended)

# No cloning needed — runs directly from PyPI
uvx s2-mcp-server

Option 2: Claude Code

claude mcp add semantic-scholar -- uvx s2-mcp-server

Option 3: Claude Desktop (Windows)

Add to %APPDATA%\Claude\claude_desktop_config.json:

{
  "mcpServers": {
    "semantic-scholar": {
      "command": "uvx",
      "args": ["s2-mcp-server"],
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": "your-key-here"
      }
    }
  }
}

Option 4: Claude Desktop (macOS)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "semantic-scholar": {
      "command": "uvx",
      "args": ["s2-mcp-server"],
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": "your-key-here"
      }
    }
  }
}

Option 5: pip / From Source

pip install s2-mcp-server
# or
git clone https://github.com/smaniches/semantic-scholar-mcp.git
cd semantic-scholar-mcp && pip install -e .

Option 6: Docker

docker pull ghcr.io/smaniches/semantic-scholar-mcp:latest
# -i keeps stdin open, which the stdio transport needs
docker run -i --rm -e SEMANTIC_SCHOLAR_API_KEY=your-key ghcr.io/smaniches/semantic-scholar-mcp

Note: Get a free API key at semanticscholar.org/product/api. Without a key, you get rate-limited public access (1 req/sec).


Architecture

flowchart LR
  Client["MCP client<br/>(Claude Desktop, Claude Code,<br/>Cursor, Cline, Continue, …)"]
  subgraph Server ["s2-mcp-server (this package)"]
    direction TB
    FastMCP["FastMCP runtime<br/>(stdio transport, lifespan)"]
    Tools["14 @mcp.tool functions<br/>(server.py)"]
    Models["Pydantic input models<br/>+ field sets (models.py)"]
    Validators["Paper-ID validator<br/>(validators.py)"]
    Cache["TTL cache<br/>(cache.py)"]
    Fmt["Markdown formatters<br/>(formatters.py)"]
    HTTP["httpx client<br/>+ rate limit + retry/backoff<br/>(client.py)"]
    Errors["Typed exceptions<br/>(errors.py)"]
    Log["Structured JSON logger<br/>(logging_config.py)"]
  end
  S2Graph["Semantic Scholar<br/>Graph API"]
  S2Recs["Semantic Scholar<br/>Recommendations API"]

  Client <-- "stdio (JSON-RPC)" --> FastMCP
  FastMCP --> Tools
  Tools --> Models
  Tools --> Validators
  Tools --> Cache
  Tools --> HTTP
  Tools --> Fmt
  HTTP --> Errors
  HTTP --> Log
  HTTP -- "GET / POST<br/>x-api-key" --> S2Graph
  HTTP -- "GET / POST<br/>x-api-key" --> S2Recs

Module responsibilities (src/semantic_scholar_mcp/):

| Module | Responsibility |
| --- | --- |
| server.py | FastMCP instance, 14 @mcp.tool registrations, lifespan, main() entry. Re-exports the helper surface for back-compat. |
| client.py | Shared httpx.AsyncClient singleton, per-tier rate limiter (1 req/s public, 10 req/s keyed), retry loop with exponential backoff + jitter on 429/503/timeout, HTTP→typed-exception mapping. |
| models.py | Pydantic input models per tool, ResponseFormat enum, the four tiered field-set constants (PAPER_SEARCH_FIELDS, …_LITE, PAPER_BULK_SEARCH_FIELDS, PAPER_DETAIL_FIELDS, AUTHOR_FIELDS). |
| validators.py | Pre-flight paper-ID validation. Rejects NUL bytes, ?, #, path traversal; accepts the seven canonical ID formats. |
| cache.py | In-memory TTL cache (5 min, 200 entries, oldest-first eviction) for paper/author lookups within a session. |
| formatters.py | Markdown renderers for paper and author dicts, tuned for chat-surface readability. |
| errors.py | SemanticScholarError hierarchy: AuthenticationError, RateLimitError, NotFoundError, ValidationError, ServerError. |
| logging_config.py | One-JSON-per-line StructuredFormatter on stderr; safe to ship through any log aggregator. |
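
The cache.py row describes a TTL cache with a 5-minute lifetime, a 200-entry cap, and oldest-first eviction. As a rough illustration of that behavior (not the package's actual code; the class name `TTLCache` and its methods are hypothetical), such a cache could look like this:

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical sketch: in-memory cache with expiry and oldest-first eviction."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 200,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: OrderedDict = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items.pop(key, None)  # re-inserting makes the key the newest entry
        self._items[key] = (self._clock(), value)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the oldest insertion
```

Eviction here is by insertion order rather than by last access, which is one reading of "oldest-first"; an LRU policy would instead move entries to the end on every `get`.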

Design choices worth knowing

  • Single httpx.AsyncClient per process. Created lazily, closed in the FastMCP lifespan teardown. Amortizes connection setup; respects keep-alive limits.
  • Rate limit is enforced client-side, before any request reaches the API. A semaphore plus a last-request timestamp ensures the server never exceeds the per-tier interval, even when the MCP host issues tool calls in parallel.
  • Retry is bounded and jittered. Up to MAX_RETRIES = 3 retries with exponential backoff: base delay 1 s, capped at 30 s. Honors Retry-After when present.
  • Errors are typed. Status codes map onto a small exception hierarchy so callers can branch on AuthenticationError vs RateLimitError vs NotFoundError instead of parsing strings.
  • Input validation is pre-flight. Paper IDs are checked before any outbound request; bad IDs never hit the wire.
  • Version is single-source. __version__ is derived from importlib.metadata.version("s2-mcp-server"), so bumping pyproject.toml is sufficient; release-please bumps the manifest, server.json (×2 paths), CITATION.cff, and .zenodo.json in lockstep on every release.
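
To make the "errors are typed" point concrete, here is a minimal sketch of a status-code mapping. The exception names come from the errors.py row above; the specific status-to-exception table and the helper `error_for_status` are assumptions for illustration and may differ from what client.py actually does.

```python
from typing import Optional


class SemanticScholarError(Exception):
    """Base class; subclass names follow the errors.py description."""


class AuthenticationError(SemanticScholarError): ...
class RateLimitError(SemanticScholarError): ...
class NotFoundError(SemanticScholarError): ...
class ValidationError(SemanticScholarError): ...
class ServerError(SemanticScholarError): ...


# Assumed mapping, not taken from the package source.
_STATUS_TO_ERROR = {
    400: ValidationError,
    401: AuthenticationError,
    403: AuthenticationError,
    404: NotFoundError,
    429: RateLimitError,
}


def error_for_status(status: int, message: str = "") -> Optional[SemanticScholarError]:
    """Return a typed exception for an HTTP status, or None for non-error statuses."""
    if status < 400:
        return None
    exc_type = _STATUS_TO_ERROR.get(status)
    if exc_type is None:
        # Unlisted 5xx codes become ServerError; unlisted 4xx codes use the base class.
        exc_type = ServerError if status >= 500 else SemanticScholarError
    return exc_type(message or f"HTTP {status}")
```

A caller can then write `except RateLimitError:` or `except NotFoundError:` instead of inspecting message strings.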

Configuration

API Key Options

You can provide your API key in two ways:

  1. Environment Variable (recommended for persistent use):

    export SEMANTIC_SCHOLAR_API_KEY="your-api-key-here"
    
  2. Per-Request Parameter (overrides env var):

    {
      "api_key": "your-api-key-here"
    }
    

    Caution: per-request api_key values are part of the tool-call
    arguments and may be visible in MCP transcripts, client logs, and the
    LLM's tool-call history depending on the client. For production use,
    prefer the SEMANTIC_SCHOLAR_API_KEY environment variable. Removal of
    the per-request parameter is planned for a follow-up release; see
    .github/SECURITY.md for the tracked list.

Get a free API key at: https://www.semanticscholar.org/product/api
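
The precedence described above (a per-request api_key overrides the environment variable, and no key means public access) can be sketched as follows. The helper names `resolve_api_key` and `auth_headers` are hypothetical; only the environment variable name and the x-api-key header come from this README.

```python
import os
from typing import Dict, Mapping, Optional

ENV_VAR = "SEMANTIC_SCHOLAR_API_KEY"


def resolve_api_key(per_request: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the per-request key if given, else the environment key, else None."""
    env = os.environ if env is None else env
    for candidate in (per_request, env.get(ENV_VAR)):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # unauthenticated, rate-limited public access


def auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Build request headers; the key is only ever placed in x-api-key."""
    return {"x-api-key": api_key} if api_key else {}
```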

Claude Desktop Setup

Add to your Claude Desktop config file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "semantic-scholar": {
      "command": "python",
      "args": ["-m", "semantic_scholar_mcp"],
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": "your-api-key-here"
      }
    }
  }
}

Then restart Claude Desktop.
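
If you would rather script the edit than change the file by hand, the sketch below shows one way to locate the config file and merge in the server entry without touching other settings. The function names are hypothetical, and the code only computes the path and the merged dictionary; reading and writing the file is left to you.

```python
import os
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath

CONFIG_NAME = "claude_desktop_config.json"


def config_path(system: str, home: str, appdata: str = "") -> str:
    """Return the config location listed above for the given platform name."""
    if system == "Windows":
        return str(PureWindowsPath(appdata) / "Claude" / CONFIG_NAME)
    if system == "Darwin":  # macOS
        return str(PurePosixPath(home) / "Library" / "Application Support" / "Claude" / CONFIG_NAME)
    return str(PurePosixPath(home) / ".config" / "Claude" / CONFIG_NAME)


def with_server(config: dict, api_key: str) -> dict:
    """Return a copy of config with the semantic-scholar entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["semantic-scholar"] = {
        "command": "python",
        "args": ["-m", "semantic_scholar_mcp"],
        "env": {"SEMANTIC_SCHOLAR_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    print(config_path(platform.system(), home=str(Path.home()),
                      appdata=os.environ.get("APPDATA", "")))
```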


Supported ID Formats

The server accepts the following paper identifier formats:

| Format | Pattern | Example |
| --- | --- | --- |
| Semantic Scholar ID | 40-character hex | 649def34f8be52c8b66281af98ae884c09aef38b |
| DOI | DOI:xxx | DOI:10.1038/s41586-021-03819-2 |
| ArXiv | ARXIV:xxx | ARXIV:2106.15928 or ARXIV:2106.15928v2 |
| PubMed | PMID:xxx | PMID:32908142 |
| Corpus ID | CorpusId:xxx | CorpusId:215416146 |
| ACL | ACL:xxx | ACL:P19-1285 |
| URL | URL:xxx | URL:https://arxiv.org/abs/2106.15928 |
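
For a sense of what pre-flight validation of these formats might involve, here is a simplified sketch. The regular expressions are assumptions written for this example (they do not cover old-style arXiv IDs such as hep-th/9901001, for instance) and are not copied from validators.py; `is_valid_paper_id` is a hypothetical name.

```python
import re

# One assumed pattern per format in the table above.
_PATTERNS = (
    re.compile(r"[0-9a-f]{40}", re.IGNORECASE),                  # Semantic Scholar ID
    re.compile(r"DOI:10\.\d{4,9}/\S+", re.IGNORECASE),           # DOI
    re.compile(r"ARXIV:\d{4}\.\d{4,5}(v\d+)?", re.IGNORECASE),   # arXiv, optional version
    re.compile(r"PMID:\d+", re.IGNORECASE),                      # PubMed
    re.compile(r"CorpusId:\d+", re.IGNORECASE),                  # Corpus ID
    re.compile(r"ACL:[A-Za-z0-9.\-]+", re.IGNORECASE),           # ACL Anthology
    re.compile(r"URL:https?://\S+", re.IGNORECASE),              # URL
)

# NUL bytes, query/fragment markers, and path traversal are rejected outright.
_FORBIDDEN = ("\x00", "?", "#", "..")


def is_valid_paper_id(paper_id: str) -> bool:
    """Return True if the ID matches a known format and has no forbidden tokens."""
    candidate = paper_id.strip()
    if not candidate or any(token in candidate for token in _FORBIDDEN):
        return False
    return any(pattern.fullmatch(candidate) for pattern in _PATTERNS)
```

Checking the forbidden tokens before the format match means a string like `PMID:1?fields=x` is rejected even though its prefix looks valid.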

Tools Reference

1. semantic_scholar_search_papers

Search for academic papers with advanced filters.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (supports AND, OR, NOT operators and "phrase search") |
| year | string | No | Year filter: "2024", "2020-2024", or "2020-" |
| fields_of_study | string[] | No | Filter by fields: ["Computer Science", "Biology"] |
| publication_types | string[] | No | Filter by type: ["Review", "JournalArticle"] |
| open_access_only | boolean | No | Only return open access papers (default: false) |
| min_citation_count | integer | No | Minimum citation count |
| limit | integer | No | Max results 1-100 (default: 10) |
| offset | integer | No | Pagination offset (default: 0) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

Example:

Search for "transformer attention mechanism" papers from 2023 with at least 100 citations

JSON Example:

{
  "query": "transformer attention mechanism",
  "year": "2023",
  "min_citation_count": 100,
  "fields_of_study": ["Computer Science"],
  "limit": 20
}
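
The year parameter accepts three shapes: a single year, a closed range, and an open-ended range. If you build requests programmatically, a small check like the one below can catch malformed values early. `parse_year_filter` is a hypothetical helper written for this example, not part of the server.

```python
import re
from typing import Optional, Tuple

_YEAR_FILTER = re.compile(r"(\d{4})(?:(-)(\d{4})?)?")


def parse_year_filter(value: str) -> Tuple[int, Optional[int]]:
    """Parse "2024", "2020-2024", or "2020-" into (start, end); end is None if open."""
    match = _YEAR_FILTER.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported year filter: {value!r}")
    start = int(match.group(1))
    if match.group(2) is None:       # no dash: a single year
        return start, start
    end = int(match.group(3)) if match.group(3) else None
    if end is not None and end < start:
        raise ValueError(f"year range ends before it starts: {value!r}")
    return start, end
```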

2. semantic_scholar_get_paper

Get detailed information about a specific paper.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paper_id | string | Yes | Paper ID in any supported format |
| include_citations | boolean | No | Include citing papers (default: false) |
| include_references | boolean | No | Include referenced papers (default: false) |
| citations_limit | integer | No | Max citations to return 1-100 (default: 10) |
| references_limit | integer | No | Max references to return 1-100 (default: 10) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

Example:

Get details for DOI:10.1038/s41586-021-03819-2 including its top 20 citations

JSON Example:

{
  "paper_id": "DOI:10.1038/s41586-021-03819-2",
  "include_citations": true,
  "citations_limit": 20
}

3. semantic_scholar_search_authors

Search for academic authors by name.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Author name to search |
| limit | integer | No | Max results 1-100 (default: 10) |
| offset | integer | No | Pagination offset (default: 0) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

Example:

Find author "Yoshua Bengio"

JSON Example:

{
  "query": "Yoshua Bengio",
  "limit": 5
}

4. semantic_scholar_get_author

Get author profile with publications.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| author_id | string | Yes | Semantic Scholar author ID |
| include_papers | boolean | No | Include publications (default: true) |
| papers_limit | integer | No | Max papers to return 1-100 (default: 20) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

Example:

Get author profile for author ID 1741101 with their top 50 publications

JSON Example:

{
  "author_id": "1741101",
  "include_papers": true,
  "papers_limit": 50
}

5. semantic_scholar_recommendations

Get AI-powered paper recommendations based on a seed paper.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paper_id | string | Yes | Seed paper ID in any supported format |
| limit | integer | No | Max recommendations 1-100 (default: 10) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

Example:

Get 15 recommendations based on paper ARXIV:1706.03762

JSON Example:

{
  "paper_id": "ARXIV:1706.03762",
  "limit": 15
}

6. semantic_scholar_bulk_papers

Retrieve multiple papers in a single request (max 500).

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paper_ids | string[] | Yes | List of paper IDs (max 500) |
| response_format | string | No | "markdown" or "json" (default: json) |
| api_key | string | No | Override environment API key |

Example:

Retrieve these papers: DOI:10.1038/nature12373, ARXIV:2106.15928, PMID:32908142

JSON Example:

{
  "paper_ids": [
    "DOI:10.1038/nature12373",
    "ARXIV:2106.15928",
    "PMID:32908142"
  ]
}
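
Because each call accepts at most 500 IDs, longer lists need to be split on the caller's side. A minimal sketch of that splitting (the helper name `chunked` is hypothetical):

```python
from typing import Iterator, List, Sequence


def chunked(paper_ids: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of paper_ids, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paper_ids), size):
        yield list(paper_ids[start:start + size])
```

Each yielded list can then be passed as the paper_ids argument of one semantic_scholar_bulk_papers call.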

7. semantic_scholar_bulk_search

Search papers with sorting and cursor-based pagination for large result sets.
Unlike search_papers, this tool supports a sort order and returns a token for
paging through all results.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query |
| sort | string | No | Sort order, e.g. "citationCount:desc", "publicationDate:asc" |
| token | string | No | Continuation token from a previous bulk_search response |
| year | string | No | Year filter: "2024", "2020-2024", "2020-" |
| fields_of_study | string[] | No | Filter by fields: ["Computer Science"] |
| publication_types | string[] | No | Filter by type: ["Review", "JournalArticle"] |
| min_citation_count | integer | No | Minimum citation count |
| limit | integer | No | Max results per page 1-1000 (default: 100) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "query": "graph neural networks",
  "sort": "citationCount:desc",
  "year": "2020-2024",
  "limit": 100
}

Returns: total result count, the page of papers, and a token for the
next page (when more results exist).
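
Token-based paging follows a simple loop: call once without a token, then keep passing back the token you receive until none is returned. The sketch below assumes the JSON response exposes the page under a `data` key and the continuation token under `token`; those key names, and the `fetch_page` callable standing in for the actual tool call, are assumptions for illustration.

```python
from typing import Callable, Iterator, Optional


def iter_bulk_results(fetch_page: Callable[[Optional[str]], dict],
                      max_pages: int = 50) -> Iterator[dict]:
    """Yield papers page by page until the response carries no token."""
    token: Optional[str] = None
    for _ in range(max_pages):  # hard stop so a bad token cannot loop forever
        page = fetch_page(token)
        yield from page.get("data", [])
        token = page.get("token")
        if not token:
            break
```

In practice `fetch_page` would wrap a semantic_scholar_bulk_search call with response_format set to "json" and the same query and sort on every page.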


8. semantic_scholar_export_citation

Export a citation for a paper in BibTeX format.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paper_id | string | Yes | Paper ID in any supported format |
| format | string | No | Citation format (currently only "bibtex") |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "paper_id": "DOI:10.1038/s41586-021-03819-2",
  "format": "bibtex"
}

Returns: the BibTeX string for the requested paper.


9. semantic_scholar_match_paper

Find the single best paper matching a title string. Returns a numeric
matchScore alongside the matched paper.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Paper title to match (1-500 chars) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "query": "Attention Is All You Need"
}

Returns: the best-matching paper plus its matchScore, or "No matching
paper found." if no match.


10. semantic_scholar_paper_authors

Get full author profiles for a paper's authors (richer than the abbreviated
author list returned by get_paper).

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| paper_id | string | Yes | Paper ID in any supported format |
| limit | integer | No | Max authors to return 1-1000 (default: 100) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "paper_id": "ARXIV:1706.03762",
  "limit": 25
}

Returns: the list of full author records for the paper.


11. semantic_scholar_author_batch

Retrieve multiple authors in a single request (max 1000).

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| author_ids | string[] | Yes | List of author IDs (1-1000) |
| response_format | string | No | "markdown" or "json" (default: json) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "author_ids": ["1741101", "40348417", "144749327"]
}

Returns: counts of requested / retrieved, the retrieved author
records, and a not_found list of IDs the API did not return.
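
The not_found list is essentially the set difference between the IDs you asked for and the IDs that came back. The sketch below shows that bookkeeping under two assumptions: missing authors appear as `None` entries in the raw batch result, and each record carries an `authorId` field. The function name and output keys are hypothetical and may not match the tool's exact response shape.

```python
from typing import Optional, Sequence


def summarize_author_batch(requested_ids: Sequence[str],
                           records: Sequence[Optional[dict]]) -> dict:
    """Count requested vs. retrieved authors and list IDs with no record."""
    authors = [record for record in records if record is not None]
    found = {str(record.get("authorId")) for record in authors}
    not_found = [author_id for author_id in requested_ids if author_id not in found]
    return {
        "requested": len(requested_ids),
        "retrieved": len(authors),
        "authors": authors,
        "not_found": not_found,
    }
```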


12. semantic_scholar_multi_recommend

Get recommendations using multiple positive (and optional negative) example
papers.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| positive_paper_ids | string[] | Yes | Papers the recommendations should be similar to (1-100) |
| negative_paper_ids | string[] | No | Papers the recommendations should be dissimilar to (0-100) |
| limit | integer | No | Max recommendations 1-500 (default: 10) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "positive_paper_ids": ["ARXIV:1706.03762", "ARXIV:1810.04805"],
  "negative_paper_ids": ["DOI:10.1038/nature14539"],
  "limit": 20
}

Returns: the recommended papers plus an echo of the positive/negative
seeds used.


13. semantic_scholar_snippet_search

Search within paper full text and return text snippets with surrounding
context. Heavily rate-limited without an API key.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query for paper text (1-500 chars) |
| paper_ids | string[] | No | Limit search to specific papers (max 100) |
| year | string | No | Year filter: "2024", "2020-2024", "2020-" |
| fields_of_study | string[] | No | Filter by fields: ["Computer Science"] |
| min_citation_count | integer | No | Minimum citation count |
| limit | integer | No | Max results 1-100 (default: 10) |
| response_format | string | No | "markdown" or "json" (default: markdown) |
| api_key | string | No | Override environment API key |

JSON Example:

{
  "query": "scaling laws for language models",
  "year": "2022-2024",
  "limit": 20
}

Returns: matching snippets, each with the source paper title, section,
and a short text excerpt.


14. semantic_scholar_status

Check server health and API connectivity status.

Parameters: None

Example:

Check Semantic Scholar API status

Response:

{
  "server": "semantic-scholar-mcp",
  "version": "1.3.1",
  "api_key_configured": true,
  "timestamp": "2026-04-06T12:00:00.000000+00:00",
  "api_reachable": true
}

Rate Limits

| Tier | Requests/Second | How to Get |
| --- | --- | --- |
| No API Key | 1 req/sec | Default |
| API Key | 10 req/sec | Sign up (free) |
| Academic Partner | 10-100 req/sec | Apply via S2 |

Note: The client-side rate limiter enforces the intervals above. The upstream Semantic Scholar API may impose stricter limits during high-traffic periods.
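
As an illustration of how a client-side limiter can hold concurrent callers to a minimum interval, here is a small asyncio sketch. It uses an `asyncio.Lock` (equivalent to a semaphore of size one) plus a last-request timestamp; the class name `IntervalLimiter` is hypothetical and the real limiter in client.py may be structured differently.

```python
import asyncio
import time
from typing import Optional


class IntervalLimiter:
    """Hypothetical sketch: serialize callers and space them by a fixed interval."""

    def __init__(self, requests_per_second: float) -> None:
        self._interval = 1.0 / requests_per_second
        self._lock = asyncio.Lock()
        self._last: Optional[float] = None  # monotonic time of the last release

    async def wait(self) -> None:
        """Block until at least one interval has passed since the previous caller."""
        async with self._lock:  # only one caller measures and sleeps at a time
            if self._last is not None:
                delay = self._last + self._interval - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
            self._last = time.monotonic()


async def demo() -> float:
    limiter = IntervalLimiter(requests_per_second=20.0)  # 50 ms between requests
    start = time.monotonic()
    await asyncio.gather(*(limiter.wait() for _ in range(3)))
    return time.monotonic() - start
```

Running `asyncio.run(demo())` issues three parallel waits; the first passes immediately and the other two are spaced one interval apart.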

The server automatically handles rate limiting with:

  • Request serialization to enforce minimum intervals
  • Exponential backoff retry for 429 (rate limit) and 503 (service unavailable) responses, and for timeouts
  • Maximum 3 retries with jitter
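
The retry timing can be pictured with a short delay calculation. The constants below mirror the values stated in this README (3 retries, 1 s base, 30 s cap); the jitter formula, the choice to cap Retry-After at the same maximum, and the function name `backoff_delay` are assumptions for this sketch.

```python
import random
from typing import Callable, Optional

MAX_RETRIES = 3
BASE_DELAY = 1.0   # seconds
MAX_DELAY = 30.0   # seconds


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  rng: Callable[[], float] = random.random) -> float:
    """Delay before retry number `attempt` (0-based), in seconds."""
    if retry_after is not None:
        # A server-provided Retry-After wins over the computed backoff.
        return min(max(retry_after, 0.0), MAX_DELAY)
    ceiling = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    # Jitter: pick a value between 50% and 100% of the ceiling.
    return ceiling * (0.5 + 0.5 * rng())
```

With these settings the three retries wait at most 1 s, 2 s, and 4 s respectively, so the 30 s cap only comes into play if MAX_RETRIES is raised or Retry-After is large.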

Architecture

+-----------------+     +-----------------------+     +-----------------+
|  Claude Desktop |---->|  semantic-scholar-mcp |---->| Semantic Scholar|
|   (MCP Client)  |<----|     (This Server)     |<----|       API       |
+-----------------+     +-----------------------+     +-----------------+
        |                         |                          |
        | stdio (JSON-RPC)        | Your API Key             | HTTPS
        | Local process           | Local machine            | 200M+ papers

Where your API key goes. The MCP server runs locally on your machine and
does not store your API key on disk. When the server makes authenticated
requests, the key is sent only to api.semanticscholar.org over HTTPS as
the x-api-key header that the Semantic Scholar API requires. No telemetry
is sent to any third party. See the per-request api_key caution above for
how transcript exposure can occur when the parameter is used per-request
instead of via the environment variable.


Development

# Clone
git clone https://github.com/smaniches/semantic-scholar-mcp.git
cd semantic-scholar-mcp

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=src/semantic_scholar_mcp --cov-report=term-missing

# Type checking
mypy src/

Security

API keys are never persisted to disk by the server. Prefer the
SEMANTIC_SCHOLAR_API_KEY environment variable over the per-request api_key
tool parameter (see SECURITY.md for details on the
transcript-exposure risk). All API communication uses HTTPS to
api.semanticscholar.org. See SECURITY.md for
vulnerability reporting and the v1.2.x known-limitations list.


Related MCP servers by the same author

  • alphafold-sovereign-mcp — Model Context Protocol server for AlphaFold DB and 13 other biomedical data sources, with a local SQLite knowledge graph (pip install --pre alphafold-sovereign-mcp).
  • uniprot-mcp — Model Context Protocol server for UniProt Swiss-Prot and TrEMBL (pip install uniprot-mcp-server).

License

MIT License - see LICENSE file.


Author

Santiago Maniches


Contributing

Contributions welcome! Please read our Contributing Guidelines.


Support


Built by TOPOLOGICA LLC
Advancing computational research through topological intelligence
