memgraph-ingester

agent
Security audit
Warning
Health: Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code: Passed
  • Code scan — Scanned 6 files during light audit, no dangerous patterns found
Permissions: Passed
  • Permissions — No dangerous permissions requested


SUMMARY

Ingests the structure of Java code into Memgraph. Speed up your AI agent!

README.md

Memgraph Ingester. Speed up your AI agent!


Ingests the structural model of a Java codebase into Memgraph as a
queryable code + memory knowledge graph, combining source structure with persistent engineering
context (decisions, rules, findings, etc.).

When optionally paired with the
Memgraph MCP server,
this lets your AI agent reason over both code and accumulated project
knowledge through graph queries instead of raw text search, improving accuracy, reducing cost, and
speeding up analysis.

Configuring MCP is not required: the mgconsole utility can query the graph directly, which also reduces token usage.

You can use the code in this repo as-is, or fork it and customize it to your needs.
Memgraph is free to use as well.
Issues and pull requests are welcome.

What it does

Memgraph Ingester creates two project-scoped graphs for a Java codebase:

  • A Code graph under (:Project)-[:CONTAINS]->(:Code)
  • A Memory graph under (:Project)-[:HAS_MEMORY]->(:Memory)

Every code and memory node is scoped by a project property, so multiple Java codebases can share
the same Memgraph instance without collisions.
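A minimal in-memory sketch of why project scoping avoids collisions, assuming each node is identified by a (key, project) pair as described under What gets captured. NodeId and ScopedStore are hypothetical names for illustration only; the ingester itself writes to Memgraph with Cypher MERGE statements.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical composite identity: the same key in two projects gives two distinct ids. */
record NodeId(String key, String project) {}

/** Hypothetical in-memory stand-in for a project-scoped node store (not the ingester's API). */
class ScopedStore {
    private final Map<NodeId, String> labels = new HashMap<>();

    /** MERGE-like upsert: creates the node if its (key, project) id is new, otherwise keeps it. */
    void merge(String key, String project, String label) {
        labels.putIfAbsent(new NodeId(key, project), label);
    }

    int size() {
        return labels.size();
    }

    boolean contains(String key, String project) {
        return labels.containsKey(new NodeId(key, project));
    }
}
```

Merging com.acme.Util for repo-a and for repo-b yields two distinct nodes, and repeating the repo-a merge leaves the count at two.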

The Code graph stores Java source structure in a queryable, persistent form. The ingester walks
the source tree with JavaParser and symbol resolution,
then writes packages, files, classes, interfaces, annotations, methods, fields, inheritance,
and within-project call relationships.

The parser is configured for Java 25 syntax. It should handle most sources written for earlier Java
versions too, but JavaParser is not a javac replacement and may still miss unsupported or
edge-case constructs.

The Memory graph stores durable engineering context: decisions, ADRs, rules, findings, tasks,
risks, questions, ideas, and domain notes. Memory items can refer to stable :CodeRef nodes, which
are resolved back to the current code graph after ingestion. This lets agents query both
structure (code) and knowledge (memory) without relying only on raw text search.

See doc/MEMORY.md for the Memory usage guide with prompt examples and Cypher
recipes. See schema/SCHEMA.md for the full graph model.

Requirements

  • Required: Java 25 JRE to run
  • Required: a Memgraph instance (or Docker to run one)
  • Optional: Java 25 JDK and Maven 3.9+ to build
  • Optional: mgconsole

Quick start

  • Download the latest JAR (v6.0.10 at the time of writing):
wget https://github.com/ousatov-ua/memgraph-ingester/releases/download/v6.0.10/memgraph-ingester.jar
  • Run Memgraph
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0
  • Ingest the project:

Without classpath libraries (weaker type resolution):

cd /path/to/your/java/project
java -jar path/to/memgraph-ingester.jar \
  --source path/to/src \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories \
  --apply-schema

With classpath libraries (better type resolution), for example in a Maven project:

cd /path/to/your/java/project
CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
java -jar path/to/memgraph-ingester.jar \
  --source path/to/src \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories \
  --apply-schema \
  --classpath "$CP"
  • Append Memgraph usage instructions to your agent's instruction file (run the line for your agent):
# GitHub Copilot
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
  | bash -s -- my-project
# Claude
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
  | bash -s -- my-project
# Codex
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
  | bash -s -- my-project
# Gemini
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
  | bash -s -- my-project
  • Enable the Memgraph MCP server for your AI agent (examples under MCP server setup below), or
    put mgconsole on your PATH.

Going further

Maven dependency (optional)


<dependency>
  <groupId>io.github.ousatov-ua</groupId>
  <artifactId>memgraph-ingester</artifactId>
  <version><!-- see latest on Maven Central --></version>
</dependency>

Start Memgraph

  • With Docker Compose:
cd memgraph-platform
docker-compose up -d
  • With plain Docker:
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0

Bolt listens on localhost:7687.

Build the ingester

git clone https://github.com/ousatov-ua/memgraph-ingester.git
cd memgraph-ingester
mvn clean package -Pshade -DskipTests

Produces a shaded fat JAR at target/memgraph-ingester.jar.

Alternatively, download the published shaded fat JAR
from the Releases page.

Apply the schema (one-time per Memgraph instance)

cat src/main/resources/io/github/ousatov/tools/memgraph/cypher/create-schema.cypher | mgconsole --host localhost --port 7687

Creates uniqueness constraints and lookup indexes for both the code graph and the memory graph. Safe
to re-run — existing constraints are reported and skipped.

You can also apply the schema through the CLI. This command applies the schema to the Memgraph
database first, then ingests the project:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --apply-schema

The next command drops the schema and wipes all data in the Memgraph database (every project's code
and memory graphs), then re-applies the schema and ingests the project:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-all \
  --apply-schema

Ingest a project

This will wipe the Code graph for this project first:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code

This wipes both the Code and Memory graphs for this project first:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories

Verify


MATCH (p:Project)-[:CONTAINS]->(c:Code)
RETURN p.name, c.sourceRoots, c.lastIngested;

You should see your project with a fresh lastIngested timestamp.

CLI options

  • --source, -s (required): Root directory to scan (e.g. src/main/java)
  • --bolt, -b (required): Bolt URL, e.g. bolt://localhost:7687
  • --project, -P (required): Logical project name. Namespaces all nodes.
  • --user, -u (optional, empty by default): Memgraph username
  • --pass, -p (optional, empty by default): Memgraph password
  • --threads, -t (optional, default 1): Number of parser threads. Each thread gets its own Bolt session.
  • --wipe-project-code (optional, default false): Delete this project's code graph before ingesting
  • --wipe-project-memories (optional, default false): Delete this project's memory graph before ingesting
  • --apply-schema (optional, default false): Apply the schema before ingesting
  • --wipe-all (optional, default false): Wipe all data; the schema is dropped first
  • --incremental (optional, default false): Skip files whose last-modified timestamp matches the stored value
  • --watch, -w (optional, default false): Watch the source directory for changes and re-ingest automatically
  • --classpath (optional): Additional classpath entries (JARs) for symbol resolution, separated by the
    platform path separator. Improves CALLS edge and type resolution coverage.

--wipe-project-code only affects code nodes that match the given --project; other codebases in the
same Memgraph instance are untouched, and the :Project anchor remains.
--wipe-project-memories only affects memory nodes that match the given --project; the code graph
and the :Project anchor remain.
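The scoping of both wipe flags can be pictured with a small in-memory model. It assumes every node carries a project property and is either part of the code graph, part of the memory graph, or the :Project anchor; Kind, Node, and ScopedWipe are hypothetical names, and the ingester performs the real deletes with its Cypher wipe templates.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical node categories used only for this illustration. */
enum Kind { PROJECT_ANCHOR, CODE, MEMORY }

/** Hypothetical node: its category, the project it is scoped to, and an identifying key. */
record Node(Kind kind, String project, String key) {}

class ScopedWipe {
    /**
     * Returns the nodes that survive wiping {@code target} nodes of {@code project}.
     * Other projects, the other graph kind, and the :Project anchor are always kept.
     */
    static List<Node> wipe(List<Node> nodes, String project, Kind target) {
        if (target == Kind.PROJECT_ANCHOR) {
            throw new IllegalArgumentException("neither wipe flag removes the :Project anchor");
        }
        return nodes.stream()
                .filter(n -> !(n.kind() == target && n.project().equals(project)))
                .collect(Collectors.toList());
    }
}
```

Calling wipe(nodes, "a", Kind.CODE) models --wipe-project-code for project a; passing Kind.MEMORY instead models --wipe-project-memories.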

Parallel ingestion

Large codebases ingest faster with multiple parser threads:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --threads 8

Each thread holds its own JavaParser and its own Bolt session. The Driver itself is shared.

Realistic speedup — don't expect linear scaling. JavaParser work is CPU-bound and parallelizes
well, but Memgraph Community serializes writes internally, so the write path bottlenecks quickly:

  • 1 thread: 1× (baseline); bottleneck: sequential parse + write
  • 4 threads: ~2.5–3×; bottleneck: write serialization starts
  • 8 threads: ~3–4×; bottleneck: diminishing returns
  • 16+ threads: ~3–4×; bottleneck: writes fully saturated

4–8 threads is the sweet spot on most machines. Values higher than your CPU core count rarely help.

Determinism note: with --threads > 1, file processing order is non-deterministic. MERGE is
idempotent, so results are identical, but log order will vary between runs.
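The threading model above (one parser and session per worker, one shared driver, idempotent MERGE) can be sketched with the JDK's ExecutorService. This is a simplified stand-in, not IngestionOrchestrator: FakeSession and the shared map that plays the role of the database are assumptions for illustration, and parsing is reduced to building a key.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a Bolt session; each worker thread gets its own instance. */
class FakeSession {
    private final Map<String, String> graph; // shared by all sessions, like the one database behind the Driver

    FakeSession(Map<String, String> graph) {
        this.graph = graph;
    }

    /** MERGE-like write: idempotent, so duplicate or reordered writes converge to the same state. */
    void merge(String key, String value) {
        graph.putIfAbsent(key, value);
    }
}

class ParallelIngestSketch {
    static Map<String, String> ingest(List<String> files, int threads) {
        Map<String, String> graph = new ConcurrentHashMap<>();
        // One session per worker thread, created lazily the first time that thread needs it.
        ThreadLocal<FakeSession> session = ThreadLocal.withInitial(() -> new FakeSession(graph));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String file : files) {
            // "Parsing" is reduced to building a key; in the real tool this is the CPU-bound part.
            pool.submit(() -> session.get().merge("File:" + file, file));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("ingestion did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return graph;
    }
}
```

Running the same file list with 1 and 8 threads produces equal maps even though the write order differs, which is the property the determinism note relies on.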

Watch mode

For active development, use --watch (or -w) to monitor the source directory for changes. The
ingester will automatically re-ingest modified .java files, update call edges, and refresh code
references whenever a change is detected:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --watch

Watch mode uses Java's WatchService for efficient OS-level notifications and includes a small
debounce delay to handle multiple rapid writes (e.g., from IDE saves). It recursively watches all
subdirectories under the --source root.
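A debounce of this kind can be modeled as grouping change events that arrive within a quiet window, then re-ingesting each changed path once per batch. The sketch works on plain timestamps so it is easy to test; the Change record, the Debounce class, and the 200 ms window used below are assumptions, not the ingester's code or settings. In a live watcher the events would come from WatchService.poll(timeout, unit).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical file-change event: which file changed and when (epoch millis). */
record Change(String path, long atMillis) {}

class Debounce {
    /**
     * Groups time-ordered events into batches: an event joins the current batch if it arrives
     * within {@code quietMillis} of the previous event. Each batch lists a path only once,
     * so several rapid saves of one file lead to a single re-ingest.
     */
    static List<Set<String>> batches(List<Change> events, long quietMillis) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> current = new LinkedHashSet<>();
        long last = 0;
        for (Change c : events) {
            if (!current.isEmpty() && c.atMillis() - last > quietMillis) {
                result.add(current); // quiet period elapsed: flush the batch
                current = new LinkedHashSet<>();
            }
            current.add(c.path());
            last = c.atMillis();
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With a 200 ms window, saves at 0, 50, and 120 ms collapse into one batch containing A.java and B.java, and a save at 1000 ms starts a new batch.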

Using with AI agents

This repo ships scripts designed to be run in any project that has been ingested. Each script
appends a Memgraph section to the agent's instruction file; that section tells AI agents how to
scope queries to the right project, how the schema is shaped, when to reach for the graph vs.
filesystem search, and how to use Memories for durable decisions and follow-up context.

Per-repo setup

CLAUDE

Use the bundled init-memgraph-claude.sh script, which fetches
the template, substitutes the
project name, and appends the result to the local CLAUDE.md.

Run it from inside the repo you just ingested:

# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-claude.sh my-project

Or fetch-and-run straight from GitHub:

curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
  | bash -s -- my-project

Commit the updated CLAUDE.md. Claude Code reads it on every session start.

CODEX

Use the bundled init-memgraph-codex.sh script, which fetches
the template, substitutes the
project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-codex.sh my-project

Or fetch-and-run straight from GitHub:

curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
  | bash -s -- my-project

Commit the updated AGENTS.md. Codex reads it on every session start.

GEMINI

Use the bundled init-memgraph-gemini.sh script, which fetches
the template, substitutes the
project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-gemini.sh my-project

Or fetch-and-run straight from GitHub:

curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
  | bash -s -- my-project

Commit the updated AGENTS.md. Gemini reads it on every session start.

GITHUB COPILOT

Use the bundled init-memgraph-github.sh script, which fetches
the template, substitutes the
project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-github.sh my-project

Or fetch-and-run straight from GitHub:

curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
  | bash -s -- my-project

Commit the updated AGENTS.md. GitHub Copilot reads it on every session start.

MCP server setup

CLAUDE

Claude Code needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in
.claude.json for the target project:

{
  "mcpServers": {
    "memgraph": {
      "command": "uvx",
      "args": [
        "mcp-memgraph"
      ],
      "env": {
        "MEMGRAPH_URL": "bolt://localhost:7687",
        "MCP_READ_ONLY": "false"
      }
    }
  }
}

Set MCP_READ_ONLY to "false" if you want the agent to capture Memories; writing Memory nodes requires a writable connection.

Verify it's registered:

claude mcp list

CODEX

Codex needs the Memgraph MCP server to actually run queries. Minimal config in
~/.codex/config.toml:

[mcp_servers.memgraph]

command = "uv"
args = [
    "run",
    "--with",
    "mcp-memgraph",
    "--python",
    "3.13",
    "mcp-memgraph"
]
[mcp_servers.memgraph.env]
MCP_TRANSPORT = "stdio"
MEMGRAPH_URL = "bolt://localhost:7687"
MEMGRAPH_USER = "memgraph"
MEMGRAPH_PASSWORD = ""
MEMGRAPH_DATABASE = "memgraph"
MCP_READ_ONLY = "false"

[mcp_servers.memgraph.tools.run_query]
approval_mode = "approve"

The Codex example above is writable (MCP_READ_ONLY = "false"), which lets an agent create or update
Memory nodes. With a writable connection, keep run_query approval enabled. For a read-only
connection, set MCP_READ_ONLY = "true".

Verify it's registered:

codex mcp list

GEMINI

Gemini CLI needs the Memgraph MCP server to actually run queries. Minimal config in
~/.gemini/settings.json:

{
  "mcpServers": {
    "mcp-memgraph": {
      "command": "uvx",
      "args": [
        "mcp-memgraph"
      ],
      "env": {
        "MEMGRAPH_URL": "bolt://localhost:7687",
        "MCP_READ_ONLY": "false"
      },
      "timeout": 5000,
      "trust": true
    }
  }
}

This example is writable (MCP_READ_ONLY set to "false"), so an agent can create or update Memory
nodes. Note that "trust": true skips tool-call confirmations for this server; remove it if you want
to approve each query. For a read-only connection, set MCP_READ_ONLY to "true".

Verify it's registered:

gemini mcp list

GITHUB COPILOT

GitHub Copilot needs the Memgraph MCP server to actually run queries. Minimal config in
~/.copilot/mcp-config.json:

{
  "mcpServers": {
    "mcp-memgraph": {
      "type": "local",
      "command": "uvx",
      "args": [
        "mcp-memgraph"
      ],
      "env": {
        "MEMGRAPH_URL": "bolt://localhost:7687",
        "MCP_READ_ONLY": "false"
      },
      "tools": [
        "*"
      ]
    }
  }
}

Improved type resolution with --classpath

By default, the ingester resolves types against the JDK and the project source tree. To improve
CALLS edge coverage, EXTENDS/IMPLEMENTS FQN resolution, and field/return type resolution for
external library types, pass dependency JARs via --classpath:

# Use Maven to collect the classpath (compile + test scopes)
CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --classpath "$CP"

Tip: Use -DincludeScope=test to include test-scoped dependencies (JUnit, Testcontainers,
etc.) — this ensures parameter types from those libraries resolve to FQNs in method signatures
and field types. Without the matching JARs, external types fall back to simple names
(e.g., Session instead of org.neo4j.driver.Session).

This lets JavaParser resolve method calls whose parameters use types from libraries such as Neo4j
Driver, Spring, JUnit 5, or picocli, which can substantially increase the number of CALLS edges in the graph.
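The --classpath value is one string whose entries are joined with the platform path separator (: on Linux and macOS, ; on Windows), the same separator mvn dependency:build-classpath uses by default. The sketch below splits such a string with the JDK and flags entries that do not exist on disk; it illustrates the format only and is not the ingester's own parsing code (ClasspathEntries is a hypothetical name).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ClasspathEntries {
    /** Splits a classpath string on the platform separator, trimming and dropping blank entries. */
    static List<Path> split(String classpath) {
        return Arrays.stream(classpath.split(Pattern.quote(File.pathSeparator)))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Path::of)
                .toList();
    }

    /** Entries missing on disk cannot help symbol resolution; report them before a long run. */
    static List<Path> missing(List<Path> entries) {
        return entries.stream().filter(p -> !Files.exists(p)).toList();
    }
}
```

Checking for missing entries before a long ingestion run can save a confusing round of unresolved CALLS edges.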

Re-ingesting after code changes

The graph goes stale as code changes. Re-run the ingester with --wipe-project-code to refresh:

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code

For faster re-runs, use --incremental to skip files that haven't changed since the last ingestion
(compared by filesystem lastModified timestamp):

java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --incremental
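The skip rule behind --incremental amounts to comparing each file's current last-modified time with the value recorded on the previous run. The sketch below separates reading timestamps from the comparison so the decision logic is easy to test; IncrementalFilter is a hypothetical name, and the stored map stands in for wherever the ingester keeps the previous values.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper illustrating the --incremental skip rule (not the ingester's code). */
class IncrementalFilter {

    /** Reads the current last-modified time (epoch millis) of each file. */
    static Map<Path, Long> currentTimestamps(List<Path> files) {
        Map<Path, Long> out = new LinkedHashMap<>();
        for (Path f : files) {
            try {
                out.put(f, Files.getLastModifiedTime(f).toMillis());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return out;
    }

    /**
     * Returns the files that need re-parsing, sorted: files with no stored timestamp (new files)
     * and files whose timestamp differs from the stored one. Matching files are skipped.
     */
    static List<Path> changed(Map<Path, Long> current, Map<Path, Long> stored) {
        return current.keySet().stream()
                .filter(p -> !Objects.equals(stored.get(p), current.get(p)))
                .sorted()
                .toList();
    }
}
```

Files deleted since the last run are not handled by this sketch; it only decides which existing files to re-parse.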

Check freshness anytime:


MATCH (:Project {name: 'my-project'})-[:CONTAINS]->(c:Code)
RETURN c.sourceRoots, c.lastIngested;

Multiple projects in one Memgraph instance

Run the ingester once per codebase with different --project values. Each gets its own
:Project -> :Code anchor chain; code nodes are composite-keyed by (key, project), so nothing
collides.

java -jar target/memgraph-ingester.jar -s ~/code/repo-a/src/main/java -b bolt://localhost:7687 -P repo-a --wipe-project-code
java -jar target/memgraph-ingester.jar -s ~/code/repo-b/src/main/java -b bolt://localhost:7687 -P repo-b --wipe-project-code

List everything that's indexed:


MATCH (p:Project)-[:CONTAINS]->(c:Code)
RETURN p.name, c.sourceRoots, c.lastIngested
  ORDER BY c.lastIngested DESC;

What gets captured

Code node labels and their identity:

  • :Project, identified by name
  • :Code, identified by project
  • :Package, identified by (name, project)
  • :File, identified by (path, project)
  • :Class, identified by (fqn, project)
  • :Interface, identified by (fqn, project)
  • :Annotation, identified by (fqn, project)
  • :Method, identified by (signature, project)
  • :Field, identified by (fqn, project)

Code relationships:

  • (:Project)-[:CONTAINS]->(:Code): Code graph anchor
  • (:Code)-[:CONTAINS]->(:Package | :File): top-level code membership
  • (:Package)-[:CONTAINS]->(:Class | :Interface | :Annotation): package contents
  • (:File)-[:DEFINES]->(:Class | :Interface | :Annotation): source location
  • (:Class)-[:EXTENDS]->(:Class): class inheritance
  • (:Class)-[:IMPLEMENTS]->(:Interface): interface implementation
  • (:Interface)-[:EXTENDS]->(:Interface): interface inheritance
  • (:Class | :Interface)-[:DECLARES]->(:Method | :Field): type members
  • (:Method)-[:CALLS]->(:Method): call graph (best-effort)
  • (:*)-[:ANNOTATED_WITH]->(:Annotation): annotation usage

Memory nodes are authored manually by agents or clients and share the same project namespace.
Only the properties listed below are allowed; no extra properties may be added to any Memory node.

Memory labels, their key properties, and all allowed properties:

  • :Memory, keyed by project; allowed: project
  • :Decision, keyed by (id, project); allowed: id, project, title, topic, status, rationale, consequences, createdAt, updatedAt
  • :ADR, keyed by (id, project); allowed: id, project, number, title, status, context, decision, consequences, createdAt, updatedAt
  • :Rule, keyed by (id, project); allowed: id, project, title, topic, severity, description, createdAt, updatedAt
  • :Context, keyed by (id, project); allowed: id, project, title, topic, content, source, createdAt, updatedAt
  • :Finding, keyed by (id, project); allowed: id, project, title, topic, type, summary, evidence, createdAt, updatedAt
  • :Task, keyed by (id, project); allowed: id, project, title, status, priority, description, createdAt, updatedAt
  • :Risk, keyed by (id, project); allowed: id, project, title, topic, severity, status, mitigation, createdAt, updatedAt
  • :Question, keyed by (id, project); allowed: id, project, title, status, answer, createdAt, updatedAt
  • :Idea, keyed by (id, project); allowed: id, project, title, topic, status, notes, createdAt, updatedAt
  • :CodeRef, keyed by (project, targetType, key); allowed: project, targetType, key

Controlled values:

  • Decision status: proposed | accepted | rejected | superseded
  • ADR status: draft | accepted | rejected | superseded
  • Rule severity: hard | soft | recommendation
  • Finding type: bug | perf | constraint | security
  • Task status: todo | doing | done | blocked | cancelled
  • Risk severity: low | medium | high | critical
  • Risk status: open | mitigated | accepted | obsolete
  • Question status: open | answered | obsolete
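If you write Memory nodes from your own tooling, a client-side check against the allow-list and controlled values above can catch mistakes before they reach the graph. This hypothetical MemoryCheck sketch covers only :Decision and :Task to stay short; extending it means copying the remaining label rows and controlled values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side validator for Memory node properties (not part of the ingester). */
class MemoryCheck {
    // Allowed properties per label, copied from the list above (subset for brevity).
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "Decision", Set.of("id", "project", "title", "topic", "status", "rationale",
                    "consequences", "createdAt", "updatedAt"),
            "Task", Set.of("id", "project", "title", "status", "priority", "description",
                    "createdAt", "updatedAt"));

    // Controlled values for the status property, from the list above.
    private static final Map<String, Set<String>> STATUS = Map.of(
            "Decision", Set.of("proposed", "accepted", "rejected", "superseded"),
            "Task", Set.of("todo", "doing", "done", "blocked", "cancelled"));

    /** Returns human-readable problems; an empty list means the node looks valid. */
    static List<String> problems(String label, Map<String, Object> props) {
        Set<String> allowed = ALLOWED.get(label);
        if (allowed == null) {
            return List.of("label not covered by this sketch: " + label);
        }
        List<String> out = new ArrayList<>();
        for (String key : props.keySet()) {
            if (!allowed.contains(key)) {
                out.add("unexpected property: " + key);
            }
        }
        for (String required : List.of("id", "project")) {
            if (!props.containsKey(required)) {
                out.add("missing key property: " + required);
            }
        }
        Object status = props.get("status");
        if (status != null && !STATUS.get(label).contains(status)) {
            out.add("invalid status: " + status);
        }
        return out;
    }
}
```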

Memory relationships:

  • (:Project)-[:HAS_MEMORY]->(:Memory): Memory graph anchor
  • (:Memory)-[:HAS_DECISION | :HAS_ADR | :HAS_RULE | :HAS_CONTEXT]->(:*): memory item ownership
  • (:Memory)-[:HAS_FINDING | :HAS_TASK | :HAS_RISK | :HAS_QUESTION]->(:*): memory item ownership
  • (:Memory)-[:HAS_IDEA]->(:Idea): memory item ownership
  • (:Decision | :ADR | :Rule | :Context | :Finding | :Task | :Risk | :Idea)-[:REFERS_TO]->(:CodeRef): memory-to-code reference
  • (:CodeRef)-[:RESOLVES_TO]->(:Code | :Package | :File | :Class | :Interface | :Annotation | :Method | :Field): current code node, resolved after ingestion

CodeRef.targetType is one of Code, Package, File, Class, Interface, Annotation,
Method, or Field. CodeRef.key uses the matching code identity: project name for Code,
package name for Package, path for File, FQN for types/annotations/fields, and signature for
Method. The ingester refreshes RESOLVES_TO edges after each run, so memory can survive code
graph wipes and re-ingestion.

Caveats

  • CALLS is best-effort. JavaParser can't always resolve callees (complex generics, lambdas).
    Use --classpath with your project's dependency JARs to significantly improve resolution of
    external library types. Transitive call queries may still miss edges — a missing edge does not
    prove the call doesn't happen.
  • EXTENDS/IMPLEMENTS resolution. When the symbol solver cannot resolve a parent class or
    interface, the ingester infers the FQN from import statements and falls back to the source-level
    name. This means all inheritance edges are captured, but unresolvable external types may appear
    with a simple name (e.g. BeforeAllCallback) rather than a full FQN. Use --classpath to
    provide dependency JARs for full FQN resolution.
  • External types get tagged with your project. When a class extends or implements something from
    outside your source tree (e.g. RuntimeException, Spring interfaces), the ingester creates a
    :Class or :Interface node for it and scopes it to your project. These phantom nodes are marked
    with isExternal = true and have their name and packageName inferred from the FQN. Use
    WHERE NOT n.isExternal to filter them out. Project-internal nodes always have
    isExternal = false.
  • Same-class CALLS fallback. When full signature resolution fails for an unscoped (same-class)
    method call, the ingester falls back to name-based matching: it creates a CALLS edge if exactly
    one method with that name exists in the owning type. This avoids false positives from overloading
    while recovering many intra-class call edges that would otherwise be lost. The same fallback
    applies to same-class method references (Type::method).
  • Generated code is only indexed if you ingest it. Annotation processors, Lombok-generated
    members, and similar won't appear in the graph unless their generated source directory is passed
    to the ingester too:
java -jar memgraph-ingester.jar \
  --source target/generated-sources/annotations \
  --bolt bolt://localhost:7687 \
  --project my-project
  # Omit --wipe-project-code here: it would delete the code graph already ingested from src/main/java.
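The three fallbacks in the caveats above can be approximated with small pure functions: import-based FQN inference for unresolved parents, last-dot splitting for external type names, and exactly-one-match linking for same-class calls. These are simplified sketches, not the ingester's implementation; they ignore wildcard imports, same-package types, and nested types (java.util.Map.Entry would be split into package java.util.Map), and the MethodDecl and ExternalType records are hypothetical.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical method declaration: simple name plus the signature used as its key. */
record MethodDecl(String name, String signature) {}

/** Hypothetical properties inferred for an external (phantom) type node. */
record ExternalType(String fqn, String name, String packageName, boolean isExternal) {}

/** Simplified versions of the fallbacks described in the caveats (not the ingester's code). */
class ResolutionFallbacks {

    /** EXTENDS/IMPLEMENTS: use a matching single-type import, else keep the source-level name. */
    static String inferParentFqn(String sourceName, List<String> imports) {
        if (sourceName.contains(".")) {
            return sourceName; // already qualified in the source
        }
        for (String imp : imports) {
            if (!imp.endsWith(".*") && imp.endsWith("." + sourceName)) {
                return imp;
            }
        }
        return sourceName; // unresolvable: stays a simple name
    }

    /** External types: split the FQN at its last dot; a bare name gets an empty package here. */
    static ExternalType external(String fqn) {
        int dot = fqn.lastIndexOf('.');
        String name = dot < 0 ? fqn : fqn.substring(dot + 1);
        String pkg = dot < 0 ? "" : fqn.substring(0, dot);
        return new ExternalType(fqn, name, pkg, true);
    }

    /** Same-class CALLS: link only if the owning type declares exactly one method with that name. */
    static Optional<String> sameClassCallee(String calledName, List<MethodDecl> ownerMethods) {
        List<MethodDecl> matches = ownerMethods.stream()
                .filter(m -> m.name().equals(calledName))
                .toList();
        return matches.size() == 1 ? Optional.of(matches.get(0).signature()) : Optional.empty();
    }
}
```

Overloaded names such as save(Item) and save(List) return Optional.empty(), which is how the rule avoids false-positive CALLS edges.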

Project layout

.
├── src/main/java/io/github/ousatov/tools/memgraph/
│   ├── IngesterCli.java                    # CLI entry point (picocli Callable)
│   ├── IngestionOrchestrator.java          # Orchestrates sequential / parallel ingestion
│   ├── ParseService.java                   # JavaParser + symbol resolution
│   ├── GraphWriter.java                    # Cypher MERGE writes to Memgraph
│   ├── def/Const.java                      # Shared constants
│   ├── exception/ProcessingException.java  # Checked processing error
│   └── schema/Memgraph.java                # Schema loader (create / drop / wipe)
├── src/main/resources/io/github/ousatov/tools/memgraph/cypher/
│   ├── action/                             # Per-operation Cypher templates (upsert, wipe, resolve)
│   ├── create-schema.cypher                # Uniqueness constraints + indexes
│   ├── drop-schema.cypher                  # Schema teardown
│   └── wipe-all-data.cypher                # Full data wipe
├── src/test/java/                          # Unit + integration tests (JUnit 5, Testcontainers)
├── schema/
│   └── SCHEMA.md                           # Graph model reference (human-readable)
├── script/
│   ├── init-memgraph-claude.sh             # Appends Memgraph section to a repo's CLAUDE.md
│   ├── init-memgraph-codex.sh              # Appends Memgraph section to a repo's AGENTS.md (Codex)
│   ├── init-memgraph-gemini.sh             # Appends Memgraph section to a repo's AGENTS.md (Gemini)
│   └── init-memgraph-github.sh             # Appends Memgraph section to a repo's AGENTS.md (Copilot)
├── template/
│   └── AI-memgraph-template.md             # Template for *.md injection
├── memgraph-platform/
│   └── docker-compose.yml                  # Memgraph + Lab (with UI)
├── pom.xml                                 # Maven build (shaded fat JAR, spotless-enforced)
└── README.md

License

MIT — see LICENSE.

Acknowledgements
