company-docs-mcp
Health: Passed
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 43 GitHub stars
Code: Failed
- process.env — Environment variable access in build-cli.mjs
- process.env — Environment variable access in scripts/generate-manifest.ts
- fs.rmSync — Destructive file system operation in scripts/ingest/crawl-website.ts
- child_process — Shell command execution capability in scripts/ingest/ingest-csv.ts
- process.env — Environment variable access in scripts/ingest/ingest-markdown.ts
- process.env — Environment variable access in scripts/ingest/ingest-supabase.ts
Permissions: Passed
- Permissions — No dangerous permissions requested
AI-powered company knowledge MCP. Unified place for internal policies, values, documentation, and governance. Agents can search, cite, and answer questions using real company docs.
Company Docs MCP
Turn any documentation into an AI-searchable knowledge base. Feed it markdown, HTML, PDFs, web pages, or plain text — publish to a database and let anyone on your team query it through AI tools like Claude, Cursor, or Slack. Powered by the Model Context Protocol.
What This Does
- Ingest — Point the CLI at your content. It accepts markdown files, HTML pages, PDFs, live URLs, or even an entire website via crawl.
- Publish — Run a command that converts your content into searchable vectors and stores it in a database.
- Query — Connect any MCP-compatible AI tool to your server. Ask questions in plain English and get answers sourced directly from your documentation.
Two Ways to Use This
There are two distinct roles when working with Company Docs MCP. Most people on your team only need the first one.
If someone already set up the server for your team
You just need a URL. No accounts, no installation, no terminal commands.
- Get the server URL from whoever set it up (it looks like `https://company-docs-mcp.example.workers.dev/mcp`)
- Add it to your AI tool:
  - Claude: Settings > Connectors > Add custom connector > paste the URL
  - Cursor / Windsurf: Add the URL as a remote MCP server in settings
- Start asking questions about your documentation
That's it. Cloudflare, Supabase, and the CLI are only needed by the person who sets up and maintains the server.
If you're setting up the server (admin/maintainer)
The rest of this README is for you. Follow the setup guide below to get everything running.
How the Pieces Fit Together
The system has three parts: two hosted services (Cloudflare and Supabase) and an npm package that you run locally. Both hosted services offer free tiers that are sufficient for most teams.
flowchart TD
A["Your Content<br/>Markdown, HTML, PDFs, URLs"]
B["Cloudflare Workers AI"]
C[("Supabase")]
D["Your Team<br/>Claude, Cursor, Slack, Chat UI"]
E["Cloudflare Worker"]
A -- "ingest + publish" --> B
B -- "store vectors" --> C
D -- "ask a question" --> E
E -- "vector search" --> C
C -. "matching docs" .-> E
E -. "answers" .-> D
style A fill:#f9f9f9,stroke:#333,color:#333
style B fill:#dbeafe,stroke:#1d4ed8,color:#333
style C fill:#d4edda,stroke:#155724,color:#333
style D fill:#f0fdf4,stroke:#15803d,color:#333
style E fill:#dbeafe,stroke:#1d4ed8,color:#333
| Service | What it does | Why it's needed |
|---|---|---|
| Cloudflare | Hosts your server and converts text into searchable vectors using its built-in AI | This is where your server runs 24/7 so your team can query docs at any time. It also handles the AI processing that makes semantic search possible — no separate AI subscription needed. |
| Supabase | Stores your documentation in a PostgreSQL database with vector search | Powers "smart" search — asking "how do I deploy?" will find documents about releases, CI/CD, and shipping, not just pages containing the word "deploy." |
| npm package | A command-line tool that ingests your content (markdown, HTML, PDFs, URLs) and publishes it to the database | You run this on your computer whenever you add or update documentation. |
No third-party AI API keys are required. Cloudflare provides the AI capabilities through its Workers AI service, which is included with every Cloudflare account at no extra cost.
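To make the idea of "searchable vectors" concrete, here is a minimal sketch of how semantic ranking works in principle. It assumes cosine similarity as the metric, which is a common choice for embedding search; the SQL functions shipped in `database/schema.sql` may use a different metric. The three-dimensional vectors and the `Doc` type are hypothetical stand-ins for real 1024-dimensional embeddings.

```typescript
// Illustrative only: toy vectors stand in for real embeddings.
type Doc = { title: string; vector: number[] };

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns a new array sorted from most to least similar to the query.
function rankBySimilarity(query: number[], docs: Doc[]): Doc[] {
  return [...docs].sort(
    (x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector)
  );
}

// Hypothetical vectors: the query is "close" to the release document
// even though the two share no keywords.
const docs: Doc[] = [
  { title: "PTO policy", vector: [0.1, 0.9, 0.0] },
  { title: "Release checklist", vector: [0.9, 0.1, 0.2] },
];
const queryVector = [0.8, 0.2, 0.1]; // pretend embedding of "how do I deploy?"
const ranked = rankBySimilarity(queryVector, docs);
```

In the real system the database performs this comparison, so the Worker never loads every vector into memory.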
What You'll Need
Before starting, make sure you have Node.js installed and create free accounts on the two services below:
- Node.js 18 or later — the runtime that powers the CLI tool (download here)
- A Cloudflare account — for hosting and AI (sign up here, free tier works)
- A Supabase account — for the database (sign up here, free tier works)
That's it. No OpenAI, Anthropic, or Google API keys needed.
Setup Guide
Follow these steps in order. Each one builds on the previous.
Step 1: Install the Package
Open your terminal in the project where your documentation lives and run:
npm install company-docs-mcp
This downloads the CLI tool to your project. No external services are contacted yet.
Step 2: Create Your Database (Supabase)
Your documentation needs a database to store content and make it searchable.
- Go to supabase.com and create a new project
- Go to Settings > API and copy three values (you'll need these in Step 4):
  - Project URL (looks like `https://abc123.supabase.co`)
  - anon key (a long string starting with `eyJ`)
  - service_role key (another long string starting with `eyJ`; keep this private)
- Open the SQL Editor in the left sidebar, paste the contents of `database/schema.sql`, and click Run
This creates the database tables and search functions the system uses.
The schema file is included in the npm package at `node_modules/company-docs-mcp/database/schema.sql`.
Step 3: Log In to Cloudflare
The CLI needs access to Cloudflare's AI service to convert your documentation into searchable vectors. The simplest way to connect is through the Wrangler CLI (Cloudflare's command-line tool, included with this package).
Run this command:
npx wrangler login
A browser window will open asking you to log in to your Cloudflare account and grant permission. Click Allow and return to your terminal.
You also need your Cloudflare Account ID:
- Go to dash.cloudflare.com
- Your Account ID is shown on the right side of the overview page — copy it
That's the only Cloudflare setup needed for publishing. The CLI automatically detects the login credentials that wrangler login saved to your computer.
Token expiration: The login session expires periodically. If you see an authentication error when publishing, run `npx wrangler login` again.
Step 4: Configure Your Environment
Create a file called .env in your project root with these values:
# Supabase — where your documentation is stored
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_KEY=eyJ...
# Cloudflare — your Account ID (from Step 3)
CLOUDFLARE_ACCOUNT_ID=your-account-id
Replace the placeholder values with the ones you copied from Supabase (Step 2) and Cloudflare (Step 3).
Keep this file private. Never commit `.env` to version control, because it contains credentials. Add `.env` to your `.gitignore` file.
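A missing or empty variable is a common cause of confusing errors later on. The following is a minimal sketch of a pre-flight check for the four variables listed above. The function name `findMissingVars` is hypothetical and is not part of the package; it takes the environment as a plain object so it can be tried without touching real credentials.

```typescript
// Variable names come from the .env example above.
const REQUIRED_VARS = [
  "SUPABASE_URL",
  "SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_KEY",
  "CLOUDFLARE_ACCOUNT_ID",
] as const;

// Hypothetical helper: returns the names that are unset or blank.
function findMissingVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with two of the four values filled in.
const missing = findMissingVars({
  SUPABASE_URL: "https://your-project.supabase.co",
  SUPABASE_ANON_KEY: "eyJ...",
});
// missing now lists SUPABASE_SERVICE_KEY and CLOUDFLARE_ACCOUNT_ID
```

In a Node.js script you would pass `process.env` after loading the `.env` file.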
Step 5: Prepare Your Content
The system accepts multiple content formats. Use whichever fits your workflow:
| Format | How to ingest | Best for |
|---|---|---|
| Markdown (.md) | `npx company-docs ingest markdown --dir=./docs` | Documentation you write and maintain as files |
| HTML (.html) | `npm run ingest -- html ./file.html` | Exported web pages, saved articles |
| PDF (.pdf) | `npm run ingest -- pdf ./document.pdf` | Policies, reports, specs in PDF format |
| URL (any web page) | `npm run ingest -- url https://example.com/page` | Live web content you want to index |
| Website crawl | `npm run ingest:web -- --url=https://docs.example.com` | Entire documentation sites |
| CSV of URLs | `npm run ingest:csv -- urls.csv` | Batch-importing many web pages at once |
Note: The `npx company-docs` CLI (from the npm package) directly supports markdown ingestion and publishing. The other formats (HTML, PDF, URL, website crawl, CSV) are available when running from the cloned repository.
Markdown is the most common starting point. Create files in a directory — any folder structure works:
docs/
├── onboarding/
│   ├── new-hire-checklist.md
│   └── tools-and-access.md
├── engineering/
│   ├── deployment-guide.md
│   └── code-review-process.md
└── policies/
    ├── pto-policy.md
    └── expense-guidelines.md
You can optionally add YAML frontmatter to control how each document is categorized:
---
title: Deployment Guide
category: engineering
tags: [deploy, ci-cd, release]
description: How to deploy to production
---
# Deployment Guide
Your content here...
If you don't include frontmatter, the system will auto-detect a category and extract tags from the content.
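For readers curious how a frontmatter block separates from the body, here is a minimal sketch. It is not the package's parser: it handles only flat `key: value` pairs and inline `[a, b]` lists like the example above, whereas a real YAML parser supports far more. The names `parseFrontmatter` and `ParsedDoc` are hypothetical.

```typescript
type FrontmatterValue = string | string[];

interface ParsedDoc {
  frontmatter: Record<string, FrontmatterValue>;
  body: string;
}

// Simplified parser: flat keys and inline lists only (assumption, not full YAML).
function parseFrontmatter(source: string): ParsedDoc {
  // The block must start on the first line and be closed by a line of "---".
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) return { frontmatter: {}, body: source };

  const frontmatter: Record<string, FrontmatterValue> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":"); // first colon, so URLs in values survive
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    const raw = line.slice(colon + 1).trim();
    if (raw.startsWith("[") && raw.endsWith("]")) {
      frontmatter[key] = raw
        .slice(1, -1)
        .split(",")
        .map((item) => item.trim())
        .filter((item) => item.length > 0);
    } else {
      frontmatter[key] = raw;
    }
  }
  return { frontmatter, body: source.slice(match[0].length) };
}

const sample = [
  "---",
  "title: Deployment Guide",
  "category: engineering",
  "tags: [deploy, ci-cd, release]",
  "---",
  "# Deployment Guide",
  "",
].join("\n");
const parsed = parseFrontmatter(sample);
```

A file without a leading `---` line comes back with an empty `frontmatter` object and its full text as the body, which is the case where auto-detection would take over.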
Step 6: Ingest and Publish
Two steps turn your content into a searchable knowledge base:
Step 1 — Ingest (converts your content into structured entries):
# Markdown (most common)
npx company-docs ingest markdown --dir=./docs
# Or from the cloned repo, for other formats:
npm run ingest -- html ./exported-page.html
npm run ingest -- pdf ./policy-document.pdf
npm run ingest -- url https://wiki.example.com/important-page
Step 2 — Publish (pushes entries to the database with search vectors):
npx company-docs publish
What happens:
- The ingest step reads your content, extracts titles and sections, and saves structured entries as JSON files in a `content/entries/` folder in your project.
- `publish` sends each entry to Cloudflare's AI to generate search vectors, then stores everything in your Supabase database. A content hash automatically skips entries that haven't changed, so re-running is fast.
To preview what would be published without actually writing to the database:
npx company-docs publish --dry-run
Updating content: Whenever you change your source files, run both steps again. Only changed entries are re-processed.
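The "extracts titles and sections" part of the ingest step can be pictured with a short sketch. This is an illustration under simple assumptions, not the package's implementation: the first `#` heading becomes the title, every later heading starts a new section, and headings inside code blocks are not handled. The `Entry` and `Section` shapes are hypothetical and may not match the JSON files in `content/entries/`.

```typescript
interface Section {
  heading: string; // empty string for text that precedes the first sub-heading
  content: string;
}

interface Entry {
  title: string;
  sections: Section[];
}

// Hypothetical helper: split one markdown document into a title and sections.
function extractEntry(markdown: string, fallbackTitle: string): Entry {
  let title = fallbackTitle;
  let titleSeen = false;
  const sections: Section[] = [{ heading: "", content: "" }];

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line);
    if (match && match[1].length === 1 && !titleSeen) {
      title = match[2].trim(); // first top-level heading is the title
      titleSeen = true;
    } else if (match) {
      sections.push({ heading: match[2].trim(), content: "" });
    } else {
      sections[sections.length - 1].content += line + "\n";
    }
  }

  return {
    title,
    sections: sections
      .map((s) => ({ heading: s.heading, content: s.content.trim() }))
      .filter((s) => s.heading !== "" || s.content !== ""),
  };
}

const entry = extractEntry(
  "# Deployment Guide\nIntro text.\n\n## Staging\nDeploy to staging first.\n\n## Production\nThen promote.\n",
  "deployment-guide"
);
```

Splitting by heading is what lets a tool return the relevant section of a long document instead of the whole file.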
Step 7: Deploy the Server (Cloudflare Worker)
The server is what runs 24/7 and handles search queries from your team's AI tools. It's deployed as a Cloudflare Worker.
Clone the repository
git clone https://github.com/southleft/company-docs-mcp.git
cd company-docs-mcp
npm install
Configure the Worker
Edit wrangler.toml with your organization name:
name = "company-docs-mcp"
main = "src/index.ts"
compatibility_date = "2024-01-01"
compatibility_flags = ["nodejs_compat"]
[ai]
binding = "AI"
[vars]
ORGANIZATION_NAME = "Your Organization"
VECTOR_SEARCH_ENABLED = "true"
VECTOR_SEARCH_MODE = "vector"
Create a search cache
The Worker caches recent search results to keep things fast. Run this command to create the cache:
npx wrangler kv namespace create CONTENT_CACHE
It will print an ID. Add it to wrangler.toml:
[[kv_namespaces]]
binding = "CONTENT_CACHE"
id = "the-id-that-was-printed"
Add your database credentials to the Worker
These are stored securely as encrypted secrets — they never appear in plain text in the dashboard or config files.
echo "your-supabase-url" | npx wrangler secret put SUPABASE_URL
echo "your-anon-key" | npx wrangler secret put SUPABASE_ANON_KEY
echo "your-service-key" | npx wrangler secret put SUPABASE_SERVICE_KEY
Deploy
Make sure you're logged in (you should be from Step 3 — if not, run npx wrangler login again), then:
npm run deploy
Your server is now live at https://company-docs-mcp.<your-subdomain>.workers.dev.
Step 8: Connect and Test
Share this URL with your team:
https://company-docs-mcp.<your-subdomain>.workers.dev/mcp
Claude: Settings > Connectors > Add custom connector > paste the URL.
Cursor / Windsurf / Other MCP clients: Add the URL as a remote MCP server in your client's settings.
Once connected, your AI tool will have access to these search tools:
| Tool | What it does |
|---|---|
| `search_documentation` | Finds documentation that matches your question using semantic search |
| `search_chunks` | Searches specific sections within documents |
| `browse_by_category` | Lists all documentation in a category (categories come from frontmatter, the `--category` flag, or auto-detection) |
| `get_all_tags` | Lists every tag used across your documentation |
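MCP clients normally make these calls for you, but it can help to see what one looks like on the wire. MCP is built on JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds the request body. The `query` argument name is an assumption; the actual input schema of each tool is what the server reports via `tools/list`. A real client also performs an initialization handshake before calling tools.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Builds the JSON-RPC envelope for an MCP tool call (no network access).
function buildToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// "query" is a hypothetical argument name used for illustration.
const request = buildToolCall(1, "search_documentation", {
  query: "How do I deploy to production?",
});
const payload = JSON.stringify(request);
```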
Cloudflare's Role — A Quick Summary
Since Cloudflare appears in several steps, here's a plain-language summary of what it does and when:
| When | What Cloudflare does | How it's accessed |
|---|---|---|
| Publishing docs (Step 6) | Converts your text into numerical vectors that enable semantic search | CLI calls the Cloudflare REST API using your wrangler login credentials |
| Running the server (Step 7+) | Hosts the always-on server that your team queries; generates vectors for incoming questions | Built-in — no API keys needed at runtime |
Is Cloudflare optional? No — it's required for both publishing and hosting. However, the free tier is more than sufficient and no separate AI subscription is needed. The only setup required is creating an account and running npx wrangler login.
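As a rough picture of the publishing-time call, the sketch below builds a request for the Workers AI REST endpoint with the default embedding model. It does not send anything. How the CLI obtains its credential from the `wrangler login` session is not shown here, so the `apiToken` parameter is a placeholder, and the function name `buildEmbeddingRequest` is hypothetical.

```typescript
interface EmbeddingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a Workers AI embedding request.
function buildEmbeddingRequest(
  accountId: string,
  apiToken: string, // placeholder: the CLI's real credential handling may differ
  texts: string[],
  model: string = "@cf/baai/bge-large-en-v1.5"
): EmbeddingRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    // Text-embedding models accept a list of strings under "text".
    body: JSON.stringify({ text: texts }),
  };
}

const embeddingRequest = buildEmbeddingRequest("your-account-id", "your-token", [
  "How to deploy to production",
  "PTO policy overview",
]);
```

Inside the deployed Worker none of this is needed, because the `AI` binding declared in `wrangler.toml` gives the code direct access to the model.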
CLI Reference
npm Package Commands
These work anywhere via npx company-docs:
company-docs <command> [options]
| Command | Description |
|---|---|
| `ingest markdown` | Parse markdown files into `content/entries/` |
| `publish` | Push entries to the database with AI-generated vectors |
| `ingest supabase` | Same as `publish` |
| `manifest` | Generate `content/manifest.json` (used during Worker deployment) |
Repository Commands
These are available when running from the cloned repository:
| Command | Description |
|---|---|
| `npm run ingest -- html <file>` | Ingest an HTML file |
| `npm run ingest -- pdf <file>` | Ingest a PDF document |
| `npm run ingest -- url <url>` | Ingest a single web page |
| `npm run ingest:csv -- <file>` | Ingest URLs listed in a CSV file |
| `npm run ingest:web -- --url=<url>` | Crawl and ingest an entire website |
Ingest Markdown Options
| Option | Description | Default |
|---|---|---|
| `--dir`, `-d` | Folder containing your markdown files | `./docs` |
| `--category`, `-c` | Category label for the content (overrides frontmatter) | `documentation` |
| `--recursive` | Include files in subfolders | `true` |
| `--verbose`, `-v` | Show detailed output | `false` |
Publish Options
| Option | Description |
|---|---|
| `--clear` | Delete all existing data before publishing (start fresh) |
| `--dry-run` | Preview what would change without writing to the database |
| `--verbose` | Show detailed per-entry progress |
Examples
# Ingest markdown from different folders with different categories
npx company-docs ingest markdown --dir=./docs/engineering --category=engineering
npx company-docs ingest markdown --dir=./docs/policies --category=hr
npx company-docs publish
# Ingest a PDF and a web page (from cloned repo)
npm run ingest -- pdf ./policies/employee-handbook.pdf
npm run ingest -- url https://wiki.example.com/onboarding
npx company-docs publish
# Crawl an entire documentation site (from cloned repo)
npm run ingest:web -- --url=https://docs.example.com --max-pages=50
# Full re-publish from scratch
npx company-docs publish --clear
# Preview changes
npx company-docs publish --dry-run --verbose
YAML Frontmatter Reference
Each markdown file can optionally include a YAML frontmatter block at the very top. The system reads these fields:
---
title: Page Title
category: engineering
tags: [deploy, ci-cd, release]
description: A short summary of this page
status: stable
version: 1.0.0
source: src/path/to/source.ts
figma: https://figma.com/...
author: Jane Smith
department: Engineering
---
| Field | Effect |
|---|---|
| `title` | Used as the document title (overrides the first `# Heading`) |
| `category` | Sets the browseable category for this document |
| `tags` | Adds tags for filtering and discovery |
| `description` | Stored as metadata, returned in search results |
| `status` | Stored as metadata (e.g., draft, stable, deprecated) |
| `version` | Stored as metadata |
| `source`, `figma`, `author`, `department` | Stored as metadata, available in search results |
All fields are optional. If no frontmatter is present, the system auto-detects a category and extracts tags from the content.
Priority order: Frontmatter values take highest priority, followed by CLI flags (like --category), followed by auto-detection.
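The priority order stated above can be written as a simple fallback chain. This is a sketch of the stated rule only, not the package's code; `resolveCategory` and `firstNonEmpty` are hypothetical names, and auto-detection is represented by a callback so it runs only when neither explicit value is present.

```typescript
// Returns the first candidate that is defined and not blank.
function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => c !== undefined && c.trim() !== "");
}

// Frontmatter first, then the CLI flag, then auto-detection (as stated above).
function resolveCategory(
  frontmatterCategory: string | undefined,
  cliCategory: string | undefined,
  autoDetect: () => string
): string {
  return firstNonEmpty(frontmatterCategory, cliCategory) ?? autoDetect();
}

const fromFrontmatter = resolveCategory("engineering", "hr", () => "general");
const fromFlag = resolveCategory(undefined, "hr", () => "general");
const detected = resolveCategory(undefined, undefined, () => "general");
```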
Incremental Updates
The system is designed for repeated runs — you don't need to start from scratch each time:
- Content hashing — Only entries whose content has actually changed are re-processed
- Deterministic IDs — The same source file or URL always produces the same database ID, preventing duplicates
- Stale cleanup — Entries removed from your source files are automatically cleaned up from the database
# Edit your content, then re-ingest and re-publish — only changes are processed
npx company-docs ingest markdown --dir=./docs
npx company-docs publish
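The three behaviors listed above (content hashing, deterministic IDs, stale cleanup) combine into one decision per entry: write it, skip it, or remove it. The sketch below shows that decision logic under stated assumptions. It uses a small FNV-1a-style hash over UTF-16 code units purely as a stand-in; the actual tool most likely uses a different, probably cryptographic, hash. All names here (`planSync`, `LocalEntry`, `SyncPlan`) are hypothetical.

```typescript
interface LocalEntry {
  id: string; // deterministic: derived from the source path or URL
  content: string;
}

interface SyncPlan {
  upsert: string[]; // new or changed entries
  skip: string[]; // unchanged entries
  remove: string[]; // entries in the database with no local source
}

// Stand-in hash for illustration only (not what the package uses).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Compares local entries against the hashes already stored in the database.
function planSync(local: LocalEntry[], storedHashes: Map<string, string>): SyncPlan {
  const plan: SyncPlan = { upsert: [], skip: [], remove: [] };
  const localIds = new Set<string>();

  for (const entry of local) {
    localIds.add(entry.id);
    if (storedHashes.get(entry.id) === fnv1a(entry.content)) {
      plan.skip.push(entry.id);
    } else {
      plan.upsert.push(entry.id);
    }
  }
  storedHashes.forEach((_hash, id) => {
    if (!localIds.has(id)) plan.remove.push(id);
  });
  return plan;
}

const stored = new Map<string, string>([
  ["docs/a.md", fnv1a("same text")],
  ["docs/b.md", fnv1a("old text")],
  ["docs/c.md", fnv1a("deleted file")],
]);
const plan = planSync(
  [
    { id: "docs/a.md", content: "same text" },
    { id: "docs/b.md", content: "new text" },
    { id: "docs/d.md", content: "brand new" },
  ],
  stored
);
```

Because the ID depends only on the source location, editing a file updates its existing row instead of creating a second one.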
Optional: Slack Integration
The server includes a Slack slash command so team members can search documentation directly from Slack:
/docs deployment process
/docs PTO policy
/docs how to set up staging
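When someone types a slash command, Slack sends the server an HTTP POST with a form-encoded body that includes `command` and `text` fields. The sketch below shows how the search query could be pulled out of such a body. It is an illustration, not the repository's handler: `extractDocsQuery` and `parseFormBody` are hypothetical names, and a production handler must also verify Slack's request signature before trusting the payload.

```typescript
// Decodes an application/x-www-form-urlencoded body into key/value pairs.
function parseFormBody(body: string): Record<string, string> {
  const decode = (s: string): string => decodeURIComponent(s.replace(/\+/g, " "));
  const fields: Record<string, string> = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    fields[decode(rawKey)] = decode(rawValue);
  }
  return fields;
}

// Returns the search text for a /docs command, or null if there is none.
function extractDocsQuery(body: string): string | null {
  const fields = parseFormBody(body);
  if (fields["command"] !== "/docs") return null;
  const text = (fields["text"] ?? "").trim();
  return text === "" ? null : text;
}

const slackQuery = extractDocsQuery("command=%2Fdocs&text=PTO+policy&user_id=U123");
```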
See docs/SLACK_SETUP.md for setup instructions.
Optional: Chat Interface
The server includes a web-based chat UI at its root URL (visit the Worker URL in a browser). It has two modes:
- Search mode — Finds relevant documentation using the same vector search as MCP. No additional setup needed.
- AI chat mode: Sends your question to OpenAI GPT-4o, which searches your docs and synthesizes a conversational answer. This is the only feature that requires an OpenAI API key (set as a Worker secret: `OPENAI_API_KEY`).
Customize the chat UI with environment variables in wrangler.toml:
[vars]
ORGANIZATION_NAME = "Your Organization"
ORGANIZATION_LOGO_URL = "https://example.com/logo.svg"
ORGANIZATION_TAGLINE = "Ask anything about our documentation"
See docs/BRANDING.md for full branding options.
Optional: OpenAI Embeddings
By default, the system uses Cloudflare's Workers AI for embeddings (free, no extra keys). If your organization prefers OpenAI, you can switch:
OPENAI_API_KEY=sk-...
EMBEDDING_PROVIDER=openai
| Provider | Model | Dimensions | When to use |
|---|---|---|---|
| Workers AI (default) | `@cf/baai/bge-large-en-v1.5` | 1024 | Default. No extra keys. Free on Cloudflare. |
| OpenAI | `text-embedding-3-small` | 1536 | If your organization already standardizes on OpenAI. |
Important: The embedding provider must match the database schema. The default schema.sql uses 1024 dimensions (Workers AI). If switching to OpenAI, change all vector(1024) to vector(1536) in the schema before running it.
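A mismatch between provider and schema usually surfaces as a database error at publish time. The sketch below shows the kind of check that catches it earlier, using the dimensions from the table above. The provider key `workers-ai` and the function `dimensionMismatch` are hypothetical; only `openai` appears as a documented value of `EMBEDDING_PROVIDER`.

```typescript
// Dimensions taken from the provider table above.
const EMBEDDING_DIMENSIONS = {
  "workers-ai": 1024, // hypothetical key for the default provider
  openai: 1536,
} as const;

type Provider = keyof typeof EMBEDDING_DIMENSIONS;

// Returns an explanatory message on mismatch, or null when everything lines up.
function dimensionMismatch(provider: Provider, schemaDimensions: number): string | null {
  const expected = EMBEDDING_DIMENSIONS[provider];
  if (expected === schemaDimensions) return null;
  return (
    `Provider "${provider}" produces ${expected}-dimensional vectors, ` +
    `but the schema declares vector(${schemaDimensions}).`
  );
}

const okCheck = dimensionMismatch("workers-ai", 1024);
const badCheck = dimensionMismatch("openai", 1024);
```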
Troubleshooting
No results from search
- Verify `npx company-docs publish` completed without errors
- Check that your `.env` has the correct Supabase credentials
- Run `npx company-docs publish --dry-run` to see what entries exist
Authentication error when publishing
- Your `wrangler login` session may have expired; run `npx wrangler login` again
- Verify `CLOUDFLARE_ACCOUNT_ID` is set in your `.env`
Duplicate entries
- Re-run `npx company-docs ingest markdown` followed by `npx company-docs publish`; duplicates are cleaned up automatically
MCP client not connecting
- Make sure the Worker is deployed and accessible
- Use the `/mcp` path in the URL (not just the root URL)
- Restart your MCP client after adding the connector
Wrangler login not working
- If you have a `CLOUDFLARE_API_TOKEN` set in your environment or `.env` file, it can interfere with the login flow. Remove or comment it out, then try `npx wrangler login` again.
Security
- Never commit `.env` files, because they contain credentials
- The `SUPABASE_SERVICE_KEY` has full database access; keep it private
- The `SUPABASE_ANON_KEY` is restricted by Row Level Security policies (read-only)
- Review docs/SECURITY_KEY_ROTATION.md if you need to rotate credentials
License
MIT — see LICENSE for details.
Contributing
Issues and pull requests are welcome at github.com/southleft/company-docs-mcp.