octoparse-mcp
mcp
Failed
Health: Warning
- License — MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code: Failed
- rm -rf — Recursive force deletion command in package.json
- network request — Outbound network request in package.json
- network request — Outbound network request in src/api/clients/http-client-factory.ts
- process.env — Environment variable access in src/api/protected-resource.ts
- process.env — Environment variable access in src/config/app-config.ts
Permissions: Passed
- Permissions — No dangerous permissions requested
Purpose
This server acts as a bridge for AI models, enabling them to interact with the Octoparse platform to search for, execute, and export web scraping workflows via a standardized API.
Security Assessment
The overall risk is rated as Medium. The tool accesses environment variables to manage API keys and server configurations, which is standard practice, though these credentials must be protected locally. It makes several outbound network requests to communicate with the official Octoparse API servers to fetch and export scraped data. A notable failure in the scan was the detection of a recursive force deletion command (`rm -rf`) within the `package.json` file. While this is frequently used in build scripts to clean directories, it is a potential risk that requires manual verification to ensure it does not target unexpected paths. No hardcoded secrets or explicitly dangerous system permissions were found.
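For context, `rm -rf` in a `package.json` is most often part of a clean script that deletes build output before recompiling. The fragment below is purely illustrative and has not been verified against this repository; inspect the actual `scripts` section to confirm which paths are targeted:

```json
{
  "scripts": {
    "clean": "rm -rf dist",
    "build": "npm run clean && tsc"
  }
}
```

A clean script like this is benign as long as the deleted path is a generated directory such as `dist`; anything pointing outside the project tree warrants closer review.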
Quality Assessment
The codebase is licensed under the permissive MIT license and appears to be actively maintained, with its most recent push occurring today. However, community trust and visibility are currently very low. The repository has only 5 stars on GitHub, indicating that the project has not yet been widely tested or vetted by the broader open-source community.
Verdict
Use with caution — verify the build scripts due to the force deletion commands, and be aware of its low community vetting.
Official Octoparse MCP server for AI-powered web scraping workflows.
README.md
Octoparse MCP Server
🇺🇸 English | 🇨🇳 中文
This repository is the AI-native, workflow-focused version of the Octoparse MCP Server.
The server now exposes only 3 tools:
- `search_templates`
- `execute_task`
- `export_data`
The goal is to shorten the tool chain, reduce token usage, and make the scraping flow much easier for LLMs to execute reliably.
Workflow
Canonical flow:
search_templates → execute_task → export_data
What each tool does:
- `search_templates` finds runnable templates and returns `recommendedTemplate`
- `execute_task` is a dual-mode tool: `validateOnly=true` runs synchronous preflight validation, while normal execution creates and starts an Octoparse cloud task
- `export_data` is the follow-up entrypoint for non-task clients and the unified export tool after task execution completes
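The three-tool flow can be sketched as plain tool-call payloads. This is a client-side sketch only: the `tool`/`arguments` wrapper shape follows the JSON examples later in this README, and the helper names here are illustrative, not part of the server.

```typescript
// Illustrative builders for the canonical search → execute → export flow.
// The payload shape mirrors the README's tool examples; a real MCP client
// would send these through its own call API.

interface ToolCall {
  tool: string;
  arguments: Record<string, unknown>;
}

function searchTemplates(keyword: string): ToolCall {
  return { tool: "search_templates", arguments: { keyword } };
}

function executeTask(
  templateName: string,
  parameters: Record<string, unknown>,
  opts: { validateOnly?: boolean; targetMaxRows?: number } = {}
): ToolCall {
  return { tool: "execute_task", arguments: { templateName, parameters, ...opts } };
}

function exportData(
  taskId: string,
  mode: "preview" | "inline" | "summary" = "preview"
): ToolCall {
  return { tool: "export_data", arguments: { taskId, mode } };
}

// Canonical flow, in order:
const calls: ToolCall[] = [
  searchTemplates("amazon"),
  executeTask("amazon-product-scraper", { SearchKeyword: ["iphone"] }),
  exportData("your-task-id", "summary"),
];
```

Keeping the chain to three calls is exactly what the design aims for: each step's output (a `recommendedTemplate`, then a `taskId`) feeds directly into the next call.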
Current Capabilities
search_templates
- Searches templates by keyword
- Separates API relevance from likes-based browsing order
- Returns `recommendedTemplate`
- Provides explicit local-only guidance when the best match cannot run in the cloud
execute_task
- Accepts only `templateName` + `parameters`
- Builds the low-level Octoparse parameter structure server-side
- Supports `validateOnly=true`
- Supports optional MCP task execution
- In task mode, follow runtime state through `tasks/get` and `tasks/result`
- In non-task mode, returns `accepted` + `taskId` immediately after create/start succeeds; then follow up with `export_data(taskId)`
- Supports `targetMaxRows`:
  - `targetMaxRows > 0` only takes effect in task mode and enables threshold-stop behavior
  - `targetMaxRows = 0` or omitting the field means run to natural completion
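The `targetMaxRows` semantics reduce to a small predicate. This is a sketch of the documented behavior, not the server's internal polling code:

```typescript
// Decide whether a background poller should request a stop.
// Per the README: targetMaxRows > 0 enables threshold-stop, and only in
// task mode; 0 or an omitted field means run to natural completion.
function shouldStop(
  collectedRows: number,
  targetMaxRows: number | undefined,
  taskMode: boolean
): boolean {
  if (!taskMode) return false; // threshold-stop applies only in task mode
  if (!targetMaxRows || targetMaxRows <= 0) return false; // natural completion
  return collectedRows >= targetMaxRows; // best-effort stop near the threshold
}
```

Note the stop is best-effort: because the server polls periodically, a task may collect somewhat more than `targetMaxRows` rows before the stop takes effect.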
export_data
- Preview mode by default
- `mode=inline` for larger inline payloads
- `mode=summary` for columns + sample rows only
- Marks rows exported only when preview/inline returns the full pending set
Requirements
- Node.js 18+
- npm 9+
- Access to the Octoparse Client API
Install
npm install
Common Environment Variables
NODE_ENV=development
PORT=8080
HOST=0.0.0.0
SERVER_NAME=octoparse-mcp-server
SERVER_VERSION=1.0.0
CLIENTAPI_BASE_URL=https://pre-v2-clientapi.octoparse.com
OFFICIAL_SITE_URL=https://pre.octoparse.com
HTTP_TIMEOUT=30000
HTTP_RETRIES=3
HTTP_RETRY_DELAY=1000
SEARCH_TEMPLATE_PAGE_SIZE=8
EXECUTE_TASK_POLL_MAX_MINUTES=10
TRANSPORT_IDLE_TTL_SECONDS=1800
TRANSPORT_CLEANUP_INTERVAL_SECONDS=300
LOG_LEVEL=debug
LOG_ENABLE_CONSOLE=true
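These variables are read in `src/config/app-config.ts`. A minimal sketch of how such a loader typically works, with the defaults shown above (the helper name and exact parsing are illustrative, not the repository's actual code):

```typescript
// Illustrative config loader: read an integer env var, fall back on a
// default when the variable is unset or not a valid number.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

const config = {
  port: intFromEnv("PORT", 8080),
  host: process.env.HOST ?? "0.0.0.0",
  httpTimeout: intFromEnv("HTTP_TIMEOUT", 30000),
  httpRetries: intFromEnv("HTTP_RETRIES", 3),
  clientApiBaseUrl:
    process.env.CLIENTAPI_BASE_URL ?? "https://pre-v2-clientapi.octoparse.com",
};
```

Reading configuration this way is what the scanner flags as `process.env` access; it is standard practice, but it means API credentials live in the local environment and must be protected accordingly.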
Start
npm run build
npm run start
For local development:
npm run dev
MCP Endpoints
The server listens on:
- POST /
- GET /
- DELETE /
Health checks:
- GET /hc
- GET /liveness
Authentication:
- `Authorization: Bearer <token>`, or
- `X-API-Key: <api-key>`
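A request can carry either scheme. The header-building sketch below uses the header names from this README; the function itself is illustrative, not part of the server:

```typescript
// Build request headers for one of the two documented auth schemes.
// Bearer token takes precedence when both are supplied (an assumption
// made here for illustration; check the server's actual behavior).
function authHeaders(auth: { bearer?: string; apiKey?: string }): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (auth.bearer) headers["Authorization"] = `Bearer ${auth.bearer}`;
  else if (auth.apiKey) headers["X-API-Key"] = auth.apiKey;
  return headers;
}
```

For example, a client could POST a tool call with `fetch("http://localhost:8080/", { method: "POST", headers: authHeaders({ bearer: "…" }), body: JSON.stringify(payload) })`.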
Tool Examples
1. Search a template
{
"tool": "search_templates",
"arguments": {
"keyword": "amazon"
}
}
2. Validate parameters without creating a task
{
"tool": "execute_task",
"arguments": {
"templateName": "amazon-product-scraper",
"validateOnly": true,
"parameters": {
"SearchKeyword": ["iphone"]
}
}
}
3. Start a cloud task and return accepted immediately
{
"tool": "execute_task",
"arguments": {
"templateName": "amazon-product-scraper",
"parameters": {
"SearchKeyword": ["iphone"]
}
}
}
4. Use threshold-stop in task mode
{
"tool": "execute_task",
"arguments": {
"templateName": "amazon-product-scraper",
"parameters": {
"SearchKeyword": ["iphone"]
},
"targetMaxRows": 100
}
}
Notes:
- This call should use MCP task augmentation
- When `targetMaxRows > 0`, the server polls in the background and best-effort requests `stopTask` near the threshold
- `targetMaxRows = 0` means no threshold stop
5. Export a summary
{
"tool": "export_data",
"arguments": {
"taskId": "your-task-id",
"mode": "summary"
}
}
Validation
npm run build
npm test
The current regression coverage focuses on:
- recommended template selection and local-only guidance
- `execute_task.validateOnly`
- non-task `execute_task` returning `accepted`
- `targetMaxRows=0` meaning natural completion
- missing / unmapped parameter handling
- `export_data.summary` not calling `markExported`
Design Priorities
This version optimizes for two things:
- Better agent usability: fewer tools, fewer low-level parameters, clearer next actions
- Better runtime stability: shorter call chains, tighter transport cleanup, lighter logging, and more compact error payloads