ks-cookbook

mcp
Security Audit
Warning
Health: Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code: Passed
  • Code scan — Scanned 12 files during a light audit; no dangerous patterns found
Permissions: Passed
  • Permissions — No dangerous permissions requested
Purpose
This project provides a developer acceleration layer for enterprise Retrieval-Augmented Generation (RAG) and agent pipelines. It offers 32 production-style agents and over 100 recipes to handle document intelligence, ingestion, and permission-aware retrieval via the Model Context Protocol (MCP).

Security Assessment
The automated code scan checked 12 files and found no dangerous patterns, hardcoded secrets, or requests for risky permissions. However, given its nature as a document intelligence platform, it inherently processes the data you feed it. Depending on your configuration, it will likely make outbound network requests to connect with various LLM providers (such as OpenAI) and vector databases. Overall risk is rated as Low, provided you securely manage any required API keys and remain aware of where your documents are being sent.

Quality Assessment
The project is licensed under the standard MIT license, which is excellent for open-source adoption and enterprise use. The repository is highly active, with the most recent code pushed today. The primary concern is its extremely low community visibility. With only 5 GitHub stars, the tool has not yet been widely vetted by a broad user base. Consequently, community trust should be considered early-stage and unproven.

Verdict
Use with caution — the code itself appears safe based on automated scans, but the tool's extremely low community adoption means it lacks the extensive peer review typical of more mature dependencies.
SUMMARY

Developer acceleration layer for enterprise RAG + agent pipelines. 32 production-style flagships on LangChain, LangGraph, CrewAI, Temporal, OpenAI Agents SDK, pydantic-ai. Permission-aware retrieval, chunk-level citations, schema-enforced output. Built on Model Context Protocol (MCP).

README.md

Knowledge Stack Cookbook

Focus on agents. We handle document intelligence.

32 production-style flagship agents + 100+ recipes for enterprise RAG — built on MCP, works with every major agent framework.

⭐ If this repo helps you ship, please star it on GitHub — it's the single biggest signal for what we build next.

Quickstart · Flagships · Discord · Docs · Star History


Knowledge Stack is the document intelligence layer behind your agents — ingestion, chunking, permissions, versioning, and citation tracking — exposed through a stable MCP surface that plugs into LangChain, LangGraph, CrewAI, Temporal, OpenAI Agents SDK, pydantic-ai, Claude Desktop, Cursor, and anything else that speaks Model Context Protocol.

This repo shows how to build enterprise RAG pipelines in minutes instead of weeks — across banking, finance, legal, accounting, tax, healthcare, insurance, real estate, sales, HR, engineering, government, pharma, and energy.

⭐ Why star this repo?

  • Stars decide our roadmap. They tell us which verticals and frameworks to deepen next.
  • Get notified when we ship new flagships, recipes, and framework integrations.
  • Help other engineers discover production-grade RAG patterns instead of toy demos.

→ Star ks-cookbook (it takes 2 seconds and means a lot.)

👋 Welcome

This repo is for developers building enterprise RAG pipelines, agent workflows, and document intelligence systems on top of Knowledge Stack. The Cookbook shows how to move from raw documents → structured knowledge → production-ready agent workflows without writing custom ingestion infrastructure.

What you can build

Example workflows you can implement quickly on top of this cookbook:

  • 🔎 Enterprise document search with sentence-level citations
  • 🤖 Internal copilots grounded in company knowledge
  • 📚 Multi-document agent pipelines over PDFs, Excel, contracts, reports, technical manuals
  • 🕑 Version-aware knowledge retrieval systems
  • 🔐 Permission-aware agents with RBAC, per-tenant isolation, and audit-ready citations
  • 🏢 Secure private or on-prem deployments

You focus on agent logic. Knowledge Stack manages the knowledge layer.

Join the developer community

  • 💬 Discord — fastest place to get implementation and architecture help. Many questions get answered there first.
  • 🗣️ GitHub Discussions — share what you're building, propose flagships, ask long-form questions.
  • 🐛 Issues — bugs, feature requests, docs fixes.

When asking a question (Discord, Discussions, or issues)

Sharing these upfront makes it much faster to help:

  • what you're trying to build
  • which flagship or recipe you're following
  • which framework you're using (LangChain / LangGraph / CrewAI / Temporal / n8n / custom)
  • where you're stuck (ingestion, retrieval, citations, permissions, scaling, deployment, …)

Why this repo exists

If you're already using LangChain, LangGraph, CrewAI, or Temporal, you've noticed the same thing: the orchestration tooling is mature, but enterprise document infrastructure is still something every team rebuilds from scratch.

Most AI demos stop at "here is a chat response." Enterprise teams have stricter requirements:

  • outputs reviewable by legal, finance, compliance, operations, or engineering
  • citations that point back to source material (chunk-level, verifiable)
  • permission-aware retrieval — the same agent behaves differently for different users
  • version-aware reads so audits reference the document as of a specific date
  • patterns that are easy to copy into real internal tooling

Knowledge Stack provides the enterprise document intelligence layer. This cookbook shows how to plug that layer directly into your agent workflows.

What Knowledge Stack manages for your agent

Instead of building this yourself:

  • document ingestion pipelines (PDF, DOCX, HTML, Markdown, …)
  • chunk storage and structured navigation
  • permission filtering and ACLs
  • version-aware retrieval
  • citation grounding (chunk-level UUIDs)
  • folder-level access control per user
  • structured document read surface (folders → documents → sections → chunks)

Knowledge Stack exposes these as APIs and MCP tools. So your team focuses on:

  • agent workflows
  • orchestration logic (LangGraph nodes, CrewAI crews, Temporal activities)
  • output schemas
  • automation pipelines
  • business logic

Pipeline mental model

┌──────────────────────────────────────────────────────────────┐
│  Agent logic      (LangChain / LangGraph / CrewAI / Temporal │
│                    / OpenAI Agents SDK / pydantic-ai)        │
└────────────────────────────┬─────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────────┐
│  Knowledge Stack MCP tools  (read, search, list_contents, …) │
└────────────────────────────┬─────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────────┐
│  Permission-aware retrieval   +   version-aware reads        │
└────────────────────────────┬─────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────────┐
│  Chunk citations  →  schema-enforced output  →  .md/.docx    │
└──────────────────────────────────────────────────────────────┘

Knowledge Stack sits between your agent runtime and your document corpus. Your orchestration layer doesn't change.
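The layering above can be sketched in code. Everything here is a stand-in: the function names mirror the tool names in the diagram, but the bodies are fakes, not the real Knowledge Stack API.

```python
# Illustrative sketch of the pipeline layers. The tool names mirror the
# diagram; the bodies are fakes, not the real Knowledge Stack API.

def search_knowledge(query: str) -> list[dict]:
    """Stand-in for the MCP search tool: returns hits with chunk IDs."""
    return [{"path_part_id": "chunk-001", "text": "Term: 36 months."}]

def read(path_part_id: str) -> str:
    """Stand-in for the MCP read tool: chunk text plus its citation marker."""
    return f"Term: 36 months. [chunk:{path_part_id}]"

def agent_answer(question: str) -> dict:
    """Agent layer: retrieve, then emit a schema-shaped answer with citations."""
    hits = search_knowledge(question)
    chunks = [read(h["path_part_id"]) for h in hits]
    return {
        "answer": "The term is 36 months.",  # in a real agent, the LLM writes this
        "citations": [h["path_part_id"] for h in hits],
        "evidence": chunks,
    }

result = agent_answer("How long is the agreement term?")
print(result["citations"])  # → ['chunk-001']
```

The point of the layering is that only the middle layer changes when you swap orchestration frameworks; the retrieval and citation plumbing stays put.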

Build enterprise RAG faster

Typical enterprise RAG requires building:

You would normally build → with Knowledge Stack you skip to:

  • ingestion pipelines + chunking + metadata → ✅ done — upload and go
  • ACL filtering per user / group / folder → ✅ enforced on every read
  • version pinning + historical retrieval → ✅ version-aware by default
  • citation-grounded output tracking → ✅ every chunk has a UUID
  • schema-enforced agent outputs → ✅ patterns shown in this cookbook

You start directly at the agent layer.

Keep your existing agent framework

Knowledge Stack does not replace your agent runtime. Use it with whatever you already run: LangChain, LangGraph, CrewAI, Temporal, OpenAI Agents SDK, or pydantic-ai.

It replaces the hardest part of enterprise RAG: document infrastructure.

What this repo teaches

Each flagship shows how to:

  1. connect an agent to Knowledge Stack via MCP
  2. retrieve permission-filtered documents
  3. enforce schema-constrained output
  4. attach chunk-level citations
  5. generate a real artifact (.md / .docx / .xlsx / .csv)

These are production agent patterns — not chat toys. Recipes (under recipes/) are ≤100 LOC single-file versions of the same ideas across LangGraph, raw OpenAI, raw Anthropic, and MCP-only.

Who this is for

Teams building internal AI agents on top of large document collections where permissions, citations, and structured outputs matter. If you're shipping agents into regulated verticals — banking, insurance, healthcare, legal, pharma, energy, government — this repo is aimed directly at you.

Quickstart

Junior-engineer path: from git clone to a working recipe in ~5 minutes.

There are two ways to run the cookbook. Pick one:

  • Path A — ingestion: true. Use this when you just want to see the recipes work against pre-ingested data: sign up at https://app.knowledgestack.ai, request a read-only "Cookbook demo" key, and run any recipe.
  • Path B — ingestion: false. Use this when you want to ingest real PDFs/XLSX/PPTX into your own tenant and run the recipes against your data: clone this repo, run scripts/seed_unified_corpus.py against your tenant, then run any recipe.

Architecture (one diagram)

                                     ┌────────────────────────┐
   recipes/<name>/recipe.py ───stdio─►   knowledgestack-mcp   │  ── HTTPS ──► api.knowledgestack.ai
   (≤100 LOC, no FOLDER_IDs)         │   (search/read/find)   │                (your tenant)
                                     └────────────────────────┘
              │                                   ▲
              │                                   │
              ▼                                   │
   pydantic-ai Agent ─── tools: search_knowledge ─┘
        │                       └─► read(path_part_id=<hit>)  ─► [chunk:<uuid>] marker
        ▼
   Structured output (pydantic schema) with citations[chunk_id, document_name, snippet]

Every recipe asks Knowledge Stack questions in natural language
(search_knowledge(query="When does the {company} agreement expire?")) and
follows each hit with read(path_part_id=<hit>) to retrieve the chunk text
and the [chunk:<uuid>] citation marker. There are no folder UUIDs in any
recipe — Knowledge Stack finds the right document by content.
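The [chunk:<uuid>] markers are plain text, so pulling the cited chunk IDs out of retrieved text is a one-regex job. A minimal sketch (the marker format comes from this README; the helper name is ours):

```python
import re

# Matches the [chunk:<uuid>] citation markers described above.
CHUNK_MARKER = re.compile(r"\[chunk:([0-9a-fA-F-]+)\]")

def extract_chunk_ids(text: str) -> list[str]:
    """Return every chunk UUID cited in a block of retrieved text."""
    return CHUNK_MARKER.findall(text)

sample = (
    "The agreement expires on 2026-03-31. "
    "[chunk:0b7e2c1a-9f4d-4e8b-a2c3-1d5f6e7a8b9c] "
    "Renewal requires 90 days notice. "
    "[chunk:4f1a2b3c-5d6e-4f70-8a9b-0c1d2e3f4a5b]"
)
print(extract_chunk_ids(sample))
```

This is the shape of the citation discipline throughout the cookbook: every claim in a generated artifact can be traced back to one of these IDs.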

1. Prerequisites

  • Python 3.11+
  • uv (install: curl -LsSf https://astral.sh/uv/install.sh | sh)
  • A Knowledge Stack API key — sign in at https://app.knowledgestack.ai
  • An OpenAI key (gpt-4o) — gpt-4o-mini skips grounding and produces empty citations

2. Clone and configure

git clone https://github.com/knowledgestack/ks-cookbook.git
cd ks-cookbook
cp .env.example .env

Fill in .env:

KS_API_KEY=sk-user-...
KS_BASE_URL=https://api.knowledgestack.ai
OPENAI_API_KEY=sk-proj-...
MODEL=gpt-4o

3. Install everything

make setup

Installs every workspace package into .venv and validates env vars.
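A minimal sketch of the kind of env validation make setup performs (the variable names come from the configuration notes in this README; the script itself is illustrative, not the real Makefile target):

```python
import os

REQUIRED = ["KS_API_KEY"]
# At least one model provider key must be present (per the configuration notes).
ANY_OF = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
DEFAULTS = {"KS_BASE_URL": "https://api.knowledgestack.ai"}

def validate_env(env: dict) -> list[str]:
    """Return a list of problems; an empty list means the env looks usable."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in ANY_OF):
        problems.append("set at least one of " + " / ".join(ANY_OF))
    return problems

env = {**DEFAULTS, **os.environ}
issues = validate_env(env)
if issues:
    print("env problems:", ", ".join(issues))
```

Failing fast here is cheaper than debugging an empty-citations run caused by a missing key.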

4a. Path A — run against the shared cookbook tenant

The maintainers run a public-read cookbook tenant where the corpus is
pre-ingested. Use the cookbook key from https://app.knowledgestack.ai
and skip straight to step 5.

4b. Path B — ingest the unified corpus into your own tenant

# 1. Create a parent folder in your tenant via the UI; copy its path_part_id
# 2. Run the unified ingest:
uv run python scripts/seed_unified_corpus.py \
    --parent-folder-id <YOUR_PARENT_FOLDER_PATH_PART_ID>

The script uploads every file under seed/<vertical>/ (29 real public-domain
documents — CMS, NIST, IRS, FDA, FAR, NERC, NAIC, OCC, FinCEN, AWS,
SEC EDGAR, BLS XLSX, CDC PPTX, …) and waits for KS ingestion (~4 min/doc).

Format coverage in the bundled corpus:

  • 25 PDF (multi-page, with tables/images)
  • 2 PPTX (CDC PowerPoint decks)
  • 2 XLSX (BLS occupational data, FRED GDP)

5. Run your first recipe

uv run python recipes/icd10_coder/recipe.py \
    --note-file recipes/icd10_coder/sample_inputs/deid_visit_001.txt

You'll see the agent make ~10–20 MCP tool calls (search_knowledge,
read), then emit a JSON CodingResult with real chunk_ids pointing into
cms_fy2026_icd10cm_coding_guidelines.pdf in your tenant.

Other quick wins:

uv run python recipes/clause_extractor/recipe.py --contract "Apple 2024 proxy"
uv run python recipes/contract_renewal_checker/recipe.py --contract "Donna Huang software development"
uv run python recipes/benefits_enrollment_qa/recipe.py \
    --question "What ERISA disclosures must an employer provide to participants in the company SPD?"
uv run python recipes/aml_sar_narrative/recipe.py --case-id "structuring-cash-deposits"

Each recipe folder has its own README.md with a live captured output
example, sign-in steps, and troubleshooting.

To see every demo target: make help

Output examples

These are not toy console logs. The flagships write artifacts a team could actually inspect.

Each flagship writes its output into its own package directory as sample_output.<ext>:

  • flagships/credit_memo_drafter/sample_output.md — cited borrower risk memo
  • flagships/contract_obligation_extractor/sample_output.md — obligations extracted from an MSA
  • flagships/rev_rec_memo/sample_output.md — ASC 606 position memo
  • flagships/prior_auth_letter/sample_output.docx — clinical prior-auth submission
  • flagships/compliance_questionnaire/sample_output.xlsx — auto-completed CAIQ questionnaire
  • flagships/research_brief/sample_output.docx — research brief built from KB evidence
  • flagships/csv_enrichment/sample_output.csv — CSV enriched from KB content

Every output lives beside the flagship that produced it.

Repo map

flagships/<name>/
  README.md              # flagship-specific walkthrough
  pyproject.toml         # package metadata + entrypoint
  src/<module>/
    __main__.py          # CLI entry
    agent.py             # prompt + MCP interaction
    schema.py            # structured output contract
  sample_inputs/         # default demo inputs

recipes/
  INDEX.md               # lightweight patterns and starter recipes

The MCP server (knowledgestack-mcp) and the Python SDK (ksapi) now live in their own repos.

There are currently 32 flagship packages in the workspace and each one is independently runnable.

How a flagship is structured

A typical flagship follows this flow:

  1. Accept a business input such as a borrower name, endpoint, alert, contract, or patient scenario.
  2. Connect to knowledgestack-mcp.
  3. Search, list, and read the relevant folder contents from Knowledge Stack.
  4. Ask the model to produce a schema-constrained answer grounded in that source material.
  5. Write the output artifact to disk.

The important part is that the retrieval layer and citation discipline are reusable. Once you understand one flagship, the rest are easy to adapt.
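Step 4's schema constraint is the part worth copying: the model's answer must parse into a declared structure before any artifact is written. The flagships use pydantic for this; the following is a dependency-free dataclass sketch of the same idea, with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    chunk_id: str
    document_name: str
    snippet: str

@dataclass
class MemoResult:
    title: str
    findings: list[str]
    citations: list[Citation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Citation discipline: refuse outputs that cite nothing.
        if not self.citations:
            raise ValueError("output must carry at least one citation")

memo = MemoResult(
    title="Borrower risk memo",
    findings=["Leverage covenant headroom is thin."],
    citations=[Citation("0b7e2c1a", "credit_policy.pdf", "max leverage 3.5x")],
)
```

An uncited or malformed answer fails validation instead of silently landing in a .docx, which is what makes the outputs reviewable by legal or compliance.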

Flagships by vertical

32 flagship demos. Each links to its own README with the expected corpus, a sample input, and a sample output — open one to see exactly what it does before running anything.

Banking & financial services

  • Credit memo drafter — Draft a cited credit memo from your bank's credit policy plus a borrower's financials.
    Tags: banking credit-risk underwriting commercial-lending
  • Loan covenant monitor — Flag covenant breaches or near-breaches from a borrower's quarterly financials.
    Tags: banking covenant-monitoring credit-risk
  • KYC onboarding review — CDD checklist and risk tier for a new customer against the bank's KYC policy.
    Tags: banking kyc aml compliance
  • Earnings risk analyzer — Hebbia-style 10-K risk-flag memo with chunk-level citations.
    Tags: finance sec-filings 10-k investment-research

Legal

  • Contract obligation extractor — Every shall / must / will obligation extracted from a contract, categorized and cited.
    Tags: legal contracts msa obligations
  • MSA redline vs. playbook — Compare an inbound MSA clause-by-clause against your company's standard playbook.
    Tags: legal contracts redline negotiations
  • Privacy impact assessment — PIA memo from a feature description, citing GDPR Article 35 and company template.
    Tags: legal privacy gdpr security

Accounting & tax

  • Rev-rec memo (ASC 606) — Five-step revenue-recognition memo grounded in your company's rev-rec policy.
    Tags: accounting asc-606 revenue-recognition memos
  • Audit workpaper drafter — Tie a GL balance to source documents with citations to PCAOB AS 1215.
    Tags: accounting audit pcaob workpapers
  • Tax position memo — Tax research memo citing IRC sections and Treasury Regs.
    Tags: tax irc research memos

Healthcare

  • Prior-authorization letter — Cited prior-auth or appeal letter grounded in the payer's medical policy.
    Tags: healthcare prior-auth payer clinical
  • Clinical trial eligibility — Match a patient against inclusion/exclusion criteria from a real trial protocol.
    Tags: healthcare clinical-trials eligibility ctms

Insurance

  • Claim adjudication memo — Coverage-analysis memo for a P&C claim, grounded in the applicable policy wording.
    Tags: insurance claims coverage-analysis p-and-c
  • Subrogation opportunity review — Assess recovery potential on a claim, citing NAIC Model 902 and internal SOP.
    Tags: insurance subrogation claims
  • Insurance policy comparison — Side-by-side analysis with explicit coverage gaps.
    Tags: insurance policy-comparison coverage

Real estate

  • Lease abstract — One-page cited abstract (tenant, term, rent, renewals, CAM, exclusives).
    Tags: real-estate leases commercial
  • Zoning compliance check — Check a proposed use against local Land Development Code.
    Tags: real-estate zoning compliance municipal

Sales & revenue

  • CSV enrichment — Enrich every row of a CSV with a short summary from your knowledge base.
    Tags: sales data-enrichment batch operations
  • Research brief — Generate a cited .docx research brief from your tenant.
    Tags: research reports analyst
  • RFP first draft — Draft RFP responses grounded in past proposals and capability docs.
    Tags: sales rfp proposals go-to-market
  • Sales battlecard — Battlecard with differentiators, objection handlers, and win themes.
    Tags: sales competitive enablement
  • Compliance questionnaire filler — Auto-complete a CAIQ / SIG questionnaire from your policy docs.
    Tags: security compliance caiq sig questionnaires

HR

Engineering, product & SRE

  • Incident runbook lookup — Match a PagerDuty alert to a runbook with cited remediation steps.
    Tags: engineering sre runbooks incident-response
  • API doc generator — Endpoint → developer docs grounded in OpenAPI spec + style guide.
    Tags: engineering api documentation devex
  • Release notes generator — Customer-facing notes from specs and migration guide.
    Tags: product engineering release-notes
  • SOW scope validator — Completeness check of a proposed SOW against template + methodology.
    Tags: proserv sow scope-management

Government, pharma & energy

  • Grant compliance checker — Sub-awardee activity checked against NOFO and 2 CFR 200.
    Tags: government grants compliance cfr
  • FOIA response drafter — FOIA response letter with exemption analysis.
    Tags: government foia public-records
  • Adverse event narrative — CIOMS-style AE narrative from drug label + PV SOP.
    Tags: pharma pharmacovigilance cioms safety
  • NERC CIP evidence pack — Compliance evidence memo for a NERC CIP requirement.
    Tags: energy nerc-cip compliance utilities

Browse by tag

accounting · aml · api · asc-606 · audit · banking · batch · caiq · cfr · cioms · claims · clinical · clinical-trials · commercial · commercial-lending · compliance · contracts · coverage · coverage-analysis · credit-risk · ctms · data-enrichment · devex · documentation · eligibility · enablement · energy · engineering · finance · foia · gdpr · government · go-to-market · grants · handbook · healthcare · hr · incident-response · insurance · investment-research · irc · job-descriptions · kyc · leases · legal · memos · msa · municipal · negotiations · nerc-cip · obligations · operations · payer · pcaob · pharma · pharmacovigilance · policy-comparison · prior-auth · privacy · product · proposals · proserv · public-records · q-and-a · questionnaires · real-estate · recruiting · redline · release-notes · reports · research · revenue-recognition · rfp · runbooks · safety · sales · scope-management · sec-filings · security · sig · sow · sre · subrogation · tax · underwriting · utilities · workpapers · zoning · 10-k

See INDUSTRIES.md for the broader roadmap and proposed next flagships.

Core commands

make setup               # install workspace packages and validate env
make help                # list runnable demos
make lint                # ruff across the workspace
make test                # MCP package tests
make demo-credit-memo    # run one flagship
make demo-csv            # run a lightweight batch enrichment demo
make demo-research       # run the research brief demo

Configuration notes

The cookbook auto-loads .env from the repo root.

Relevant variables:

  • KS_API_KEY: required
  • KS_BASE_URL: defaults to https://api.knowledgestack.ai
  • OPENAI_API_KEY or ANTHROPIC_API_KEY: at least one is required
  • CORPUS_FOLDER_ID: override the default sample corpus for many demos
  • demo-specific variables such as TOPIC, QUESTION, BORROWER, IN, and OUT

Most flagships ship with seeded defaults, so you can run them without hunting down IDs first. When you want to point a demo at your own data, override the folder ID:

CORPUS_FOLDER_ID=your-folder-id make demo-credit-memo

Bring your own data

To adapt a flagship to your own tenant:

  1. Upload your documents to Knowledge Stack.
  2. Identify the target folder.
  3. Pass that folder ID into a flagship command.
  4. Inspect the generated artifact and verify the citations.

The agent code should stay mostly unchanged. The data source changes; the retrieval and schema pattern does not.

MCP tools used by the flagships

The demos rely on the knowledgestack-mcp read-side tool surface, including:

  • list_contents
  • find
  • read
  • read_around
  • search_knowledge
  • search_keyword
  • get_info
  • view_chunk_image
  • get_organization_info
  • get_current_datetime

That is the contract most builders should care about when adapting these examples.
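When porting these examples to another client, or testing agent logic without a live server, it can help to pin that contract down as an interface. A sketch using typing.Protocol (the method names come from the tool list above; the argument and return shapes are simplified assumptions, not the real contract):

```python
from typing import Protocol

class ReadSideTools(Protocol):
    """A subset of the knowledgestack-mcp read-side surface, as an interface.
    Method names come from the tool list above; signatures here are
    simplified assumptions."""

    def read(self, path_part_id: str) -> str: ...
    def search_knowledge(self, query: str) -> list[dict]: ...

class InMemoryTools:
    """Tiny fake used for testing agent logic without a live server."""

    def __init__(self, chunks: dict):
        self.chunks = chunks

    def read(self, path_part_id):
        return self.chunks[path_part_id]

    def search_knowledge(self, query):
        # Naive keyword match, good enough for unit tests.
        return [{"path_part_id": pid} for pid, text in self.chunks.items()
                if any(w.lower() in text.lower() for w in query.split())]

tools: ReadSideTools = InMemoryTools({"c1": "The term is 36 months."})
print([h["path_part_id"] for h in tools.search_knowledge("term")])  # → ['c1']
```

Because Protocol uses structural typing, the fake and the real MCP-backed client are interchangeable in agent code without inheritance.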

For contributors

This repo is set up to be easy to extend:

  • copy a flagship and change the prompt and schema
  • keep citations mandatory
  • make the output a file artifact, not just stdout
  • prefer realistic sample corpora and sample inputs

Developer docs

Full developer wiki lives under docs/wiki/.

To scaffold a new flagship:

cp -r flagships/_template flagships/<your-name>

Using the cookbook from Claude Desktop or Cursor

If you want your assistant to talk directly to Knowledge Stack, add the MCP server to your config:

{
  "mcpServers": {
    "knowledgestack": {
      "command": "uvx",
      "args": ["knowledgestack-mcp"],
      "env": {
        "KS_API_KEY": "sk-user-..."
      }
    }
  }
}

Contributing

We're actively looking for contributions. Good first PRs:

  • New flagship for a vertical we haven't covered (proposals in INDUSTRIES.md)
  • New recipe (≤100 LOC single file) — patterns across LangChain, LangGraph, CrewAI, Temporal, raw OpenAI / Anthropic are all welcome
  • Expand an existing flagship to a second framework (e.g. port a pydantic-ai flagship to LangGraph)
  • Improve a sample corpus or assemble a cleaner public-domain dataset
  • Docs fixes and clearer developer docs in docs/wiki/

Start here: CONTRIBUTING.md. Scaffold a new flagship or recipe:

cp -r flagships/_template flagships/<your-name>
# or
cp -r recipes/_template  recipes/<your-name>

Building something with Knowledge Stack? Reach out.

If you're building an internal agent, ingestion pipeline, or enterprise RAG system on top of Knowledge Stack, we'd love to hear from you — whether you want to collaborate on a flagship, need help with a production deployment, or have feedback on the MCP surface.

⭐ Star History

If this repo is useful to you, give it a star — it's the single biggest signal we use to decide which flagships, frameworks, and verticals to prioritize next.

Community ask

If this repo helped you ship or prototype something, star the repository. Stars improve discoverability, help us prioritize which examples to deepen, and validate that open-source, enterprise-grade agent patterns are worth maintaining in the open.

You can also:

  • 🐦 Share it — tweet/post about a flagship that solved a real problem for you
  • 💬 Tell us what's missing — open a flagship request
  • 🛠️ Contribute — see CONTRIBUTING.md, every PR is reviewed quickly

Keywords

enterprise RAG, AI agents, agent framework, MCP, Model Context Protocol, LangChain, LangGraph, CrewAI, Temporal workflows, OpenAI Agents SDK, pydantic-ai, Claude Desktop, Cursor, permission-aware retrieval, document intelligence, citation grounding, structured output, tool use, knowledge base, vector search, semantic search, BM25, chunk retrieval, version-aware retrieval, tenant isolation, banking AI, legal AI, healthcare AI, insurance AI, accounting AI, compliance automation, KYC, AML, ASC 606, FOIA, NERC CIP, PCAOB, GDPR, prior authorization, CIOMS, clinical trial eligibility, credit memo, covenant monitoring, MSA redline, rev-rec, audit workpaper, tax research, RFP, sales battlecard, SRE runbooks, API documentation, release notes, PIA, SOW, grant compliance.

Filing issues & PRs

We've made both as low-friction as possible:

Pull requests use a template that walks you through summary, test plan, and checklist — nothing fancy, just so reviewers can move fast.

License

MIT. See LICENSE.
