memgraph-ingester
Ingester of Java structure in Memgraph. Speed up your AI agent!
Memgraph Ingester. Speed up your AI agent!
Ingests the structural model of a Java codebase into Memgraph as a
queryable code + memory knowledge graph, combining source structure with persistent engineering
context (decisions, rules, findings, etc.).
Optionally paired with the Memgraph MCP server, this enables your AI agent to reason over both code
and accumulated project knowledge via graph queries instead of raw text search, improving accuracy,
reducing cost, and speeding up analysis.
MCP is not required: the mgconsole utility can query the graph directly, which also reduces token
usage.
You can use the code in this repo as-is, or fork it and customize it to your needs.
Memgraph is free too.
Please submit any issues or pull requests.
What it does
Memgraph Ingester creates two project-scoped graphs for a Java codebase:
- A Code graph under (:Project)-[:CONTAINS]->(:Code)
- A Memory graph under (:Project)-[:HAS_MEMORY]->(:Memory)
Every code and memory node is scoped by a project property, so multiple Java codebases can share
the same Memgraph instance without collisions.
The Code graph stores Java source structure in a queryable, persistent form. The ingester walks
the source tree with JavaParser and symbol resolution,
then writes packages, files, classes, interfaces, annotations, methods, fields, inheritance,
and within-project call relationships.
The parser is configured for Java 25 syntax. It should handle most sources written for earlier Java
versions too, but JavaParser is not a javac replacement and may still miss unsupported or
edge-case constructs.
The Memory graph stores durable engineering context: decisions, ADRs, rules, findings, tasks,
risks, questions, ideas, and domain notes. Memory items can refer to stable :CodeRef nodes, which
are resolved back to the current code graph after ingestion. This lets agents query both
structure (code) and knowledge (memory) without relying only on raw text search.
See doc/MEMORY.md for the Memory usage guide with prompt examples and Cypher
recipes. See SCHEMA.md for the full graph model.
Requirements
- Required: Java 25 JRE to run
- Required: Memgraph instance (or Docker)
- Optional: Java 25 JDK and Maven 3.9+ to build
- Optional: mgconsole
Quick start
- Download the latest jar (v6.0.10 at the time of writing):
wget https://github.com/ousatov-ua/memgraph-ingester/releases/download/v6.0.10/memgraph-ingester.jar
- Run Memgraph
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0
- Ingest the project:
Without classpath libs (weaker symbol resolution):
cd /path/to/your/java/project
java -jar path/to/memgraph-ingester.jar \
--source path/to/src \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code \
--wipe-project-memories \
--apply-schema
With classpath libs (better symbol resolution). Example for Maven projects:
cd /path/to/your/java/project
CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
java -jar path/to/memgraph-ingester.jar \
--source path/to/src \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code \
--wipe-project-memories \
--apply-schema \
--classpath "$CP"
- Append knowledge for your agent
# GitHub Copilot
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
| bash -s -- my-project
# Claude
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
| bash -s -- my-project
# Codex
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
| bash -s -- my-project
# Gemini
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
| bash -s -- my-project
- Enable the Memgraph MCP server for your AI agent (see the examples below), or put mgconsole on
  your PATH.
Going further
Maven dependency (optional)
<dependency>
<groupId>io.github.ousatov-ua</groupId>
<artifactId>memgraph-ingester</artifactId>
<version><!-- see latest on Maven Central --></version>
</dependency>
Start Memgraph
- With Docker Compose:
cd memgraph-platform
docker-compose up -d
- With plain Docker:
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0
Bolt listens on localhost:7687.
Build the ingester
git clone https://github.com/ousatov-ua/memgraph-ingester.git
cd memgraph-ingester
mvn clean package -Pshade -DskipTests
Produces a shaded fat JAR at target/memgraph-ingester.jar.
Or use the published shaded fat JAR from the releases page.
Apply the schema (one-time per Memgraph instance)
cat src/main/resources/io/github/ousatov/tools/memgraph/cypher/create-schema.cypher | mgconsole --host localhost --port 7687
Creates uniqueness constraints and lookup indexes for both the code graph and the memory graph. Safe
to re-run — existing constraints are reported and skipped.
You can also use the CLI. This command first applies the schema to the Memgraph database, then
ingests the project:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--apply-schema
The next command first wipes all data in the Memgraph database, then applies the schema and
ingests the project:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-all \
--apply-schema
Ingest a project
This will wipe the Code graph for this project first:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code
This will wipe the Code and Memory graphs for this project first:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code \
--wipe-project-memories
Verify
MATCH (p:Project)-[:CONTAINS]->(c:Code)
RETURN p.name, c.sourceRoots, c.lastIngested;
You should see your project with a fresh lastIngested timestamp.
CLI options
| Option | Short | Required | Default | Description |
|---|---|---|---|---|
| --source | -s | yes | | Root directory to scan (e.g. src/main/java) |
| --bolt | -b | yes | | Bolt URL, e.g. bolt://localhost:7687 |
| --project | -P | yes | | Logical project name. Namespaces all nodes. |
| --user | -u | no | | Memgraph username (empty by default) |
| --pass | -p | no | | Memgraph password (empty by default) |
| --threads | -t | no | 1 | Parser threads. Each thread gets its own Bolt session. |
| --wipe-project-code | none | no | false | Delete this project's code graph before ingesting |
| --wipe-project-memories | none | no | false | Delete this project's memory graph before ingesting |
| --apply-schema | none | no | false | Apply schema before ingesting |
| --wipe-all | none | no | false | Wipe all data (schema will be dropped first) |
| --incremental | none | no | false | Skip files whose last-modified timestamp matches the stored value |
| --watch | -w | no | false | Watch for changes in the source directory and automatically re-ingest |
| --classpath | none | no | | Additional classpath entries (JARs) for symbol resolution, separated by the platform path separator. Improves CALLS edge and type resolution coverage. |
--wipe-project-code only affects code nodes matching the given --project; other codebases in the
same Memgraph instance are untouched, and the :Project anchor remains.
--wipe-project-memories only affects memory nodes matching the given --project; the code graph and
the :Project anchor remain.
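The --classpath value is a single string whose entries are joined by the platform path separator (":" on Linux and macOS, ";" on Windows). As a rough illustration of that format only, the sketch below builds and splits such a string in Python. The helper names and JAR paths are hypothetical and are not part of the ingester.

```python
import os


def build_classpath(jar_paths):
    """Join JAR paths with the platform path separator, skipping blanks and duplicates."""
    seen = []
    for path in jar_paths:
        if path and path not in seen:
            seen.append(path)
    return os.pathsep.join(seen)


def split_classpath(classpath):
    """Split a classpath string back into its non-empty entries."""
    return [entry for entry in classpath.split(os.pathsep) if entry]


# Hypothetical JAR locations, used only to show the resulting string.
jars = ["libs/picocli.jar", "libs/junit-jupiter-api.jar", "libs/picocli.jar"]
print(build_classpath(jars))
```

The resulting string is what would be passed as the --classpath argument; the Maven command shown elsewhere in this README produces the same format for you.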
Parallel ingestion
Large codebases ingest faster with multiple parser threads:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code \
--threads 8
Each thread holds its own JavaParser and its own Bolt session. The Driver itself is shared.
Realistic speedup — don't expect linear scaling. JavaParser work is CPU-bound and parallelizes
well, but Memgraph Community serializes writes internally, so the write path bottlenecks quickly:
| Threads | Typical speedup | Bottleneck |
|---|---|---|
| 1 | 1× (baseline) | Sequential parse + write |
| 4 | ~2.5–3× | Write serialization starts |
| 8 | ~3–4× | Diminishing returns |
| 16+ | ~3–4× | Writes fully saturated |
4–8 threads is the sweet spot on most machines. Values higher than your CPU core count rarely help.
Determinism note: with --threads > 1, file processing order is non-deterministic. MERGE is
idempotent, so results are identical, but log order will vary between runs.
Watch mode
For active development, use --watch (or -w) to monitor the source directory for changes. The
ingester will automatically re-ingest modified .java files, update call edges, and refresh code
references whenever a change is detected:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--watch
Watch mode uses Java's WatchService for efficient OS-level notifications and includes a small
debounce delay to handle multiple rapid writes (e.g., from IDE saves). It recursively watches all
subdirectories under the --source root.
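The debounce idea can be sketched as follows. This is an illustrative Python model, not the ingester's implementation: the Debouncer class, its method names, and the 500 ms delay are assumptions chosen for the example (the actual delay is not documented here). A file becomes ready for re-ingestion only once no new event has arrived for it within the delay.

```python
class Debouncer:
    """Collects change events and releases a path only after it has been quiet for `delay_ms`."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.pending = {}  # path -> timestamp (ms) of the most recent event

    def record(self, path, now_ms):
        # A repeated event for the same path restarts its quiet period.
        self.pending[path] = now_ms

    def drain_ready(self, now_ms):
        # Return (and forget) every path whose last event is at least delay_ms old.
        ready = sorted(p for p, t in self.pending.items() if now_ms - t >= self.delay_ms)
        for path in ready:
            del self.pending[path]
        return ready


debouncer = Debouncer(delay_ms=500)
debouncer.record("A.java", 0)
debouncer.record("A.java", 200)   # second save of the same file, e.g. from an IDE
debouncer.record("B.java", 400)
print(debouncer.drain_ready(600))  # nothing has been quiet for 500 ms yet
print(debouncer.drain_ready(700))  # A.java is ready, B.java is still pending
```

Passing the clock value in explicitly keeps the sketch deterministic; a real watcher would read the current time itself.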
Using with AI agents
This repo ships scripts designed to be dropped into any project that's been ingested. They add
instructions that tell AI agents how to scope queries to the right project, how the schema is
shaped, when to reach for the graph vs. filesystem search, and how to use Memories for durable
decisions and follow-up context.
Per-repo setup
CLAUDE
Use the bundled init-memgraph-claude.sh script, which fetches the template, substitutes the
project name, and appends the result to the local CLAUDE.md.
Run it from inside the repo you just ingested:
# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-claude.sh my-project
Or fetch-and-run straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
| bash -s -- my-project
Commit the updated CLAUDE.md. Claude Code reads it on every session start.
CODEX
Use the bundled init-memgraph-codex.sh script, which fetches the template, substitutes the
project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested:
# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-codex.sh my-project
Or fetch-and-run straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
| bash -s -- my-project
Commit the updated AGENTS.md. Codex reads it on every session start.
GEMINI
Use the bundled init-memgraph-gemini.sh script, which fetches the template, substitutes the
project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested:
# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-gemini.sh my-project
Or fetch-and-run straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
| bash -s -- my-project
Commit the updated AGENTS.md. Gemini reads it on every session start.
GITHUB COPILOT
Use the bundled init-memgraph-github.sh script, which fetches the template, substitutes the
project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested:
# Point at the script in your local checkout
/path/to/memgraph-ingester/script/init-memgraph-github.sh my-project
Or fetch-and-run straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
| bash -s -- my-project
Commit the updated AGENTS.md. GitHub Copilot reads it on every session start.
MCP server setup
CLAUDE
Claude Code needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in .claude.json for the target project:
{
"mcpServers": {
"memgraph": {
"command": "uvx",
"args": [
"mcp-memgraph"
],
"env": {
"MEMGRAPH_URL": "bolt://localhost:7687",
"MCP_READ_ONLY": "false"
}
}
}
}
Set MCP_READ_ONLY to "false" if you want Memories to be captured.
Verify it's registered:
claude mcp list
CODEX
Codex needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in ~/.codex/config.toml:
[mcp_servers.memgraph]
command = "uv"
args = [
"run",
"--with",
"mcp-memgraph",
"--python",
"3.13",
"mcp-memgraph"
]
[mcp_servers.memgraph.env]
MCP_TRANSPORT = "stdio"
MEMGRAPH_URL = "bolt://localhost:7687"
MEMGRAPH_USER = "memgraph"
MEMGRAPH_PASSWORD = ""
MEMGRAPH_DATABASE = "memgraph"
MCP_READ_ONLY = "false"
[mcp_servers.memgraph.tools.run_query]
approval_mode = "approve"
The Codex example above sets MCP_READ_ONLY = "false", so the connection is writable and an agent
can create or update Memory nodes. Keep run_query approval enabled when using a writable
connection.
Verify it's registered:
codex mcp list
GEMINI
Gemini needs the Memgraph MCP server to actually run queries. Minimal project-scoped config in ~/.gemini/settings.json:
{
"mcpServers": {
"mcp-memgraph": {
"command": "uvx",
"args": [
"mcp-memgraph"
],
"env": {
"MEMGRAPH_URL": "bolt://localhost:7687",
"MCP_READ_ONLY": "false"
},
"timeout": 5000,
"trust": true
}
}
}
The example above sets MCP_READ_ONLY to "false", so the connection is writable and an agent can
create or update Memory nodes. Where your client supports it, keep approval for run_query
enabled.
Verify it's registered:
gemini mcp list
GITHUB COPILOT
GitHub Copilot needs the Memgraph MCP server to actually run queries. Minimal project-scoped config
in ~/.copilot/mcp-config.json:
{
"mcpServers": {
"mcp-memgraph": {
"type": "local",
"command": "uvx",
"args": [
"mcp-memgraph"
],
"env": {
"MEMGRAPH_URL": "bolt://localhost:7687",
"MCP_READ_ONLY": "false"
},
"tools": [
"*"
]
}
}
}
Improved type resolution with --classpath
By default, the ingester resolves types against the JDK and the project source tree. To improve
CALLS edge coverage, EXTENDS/IMPLEMENTS FQN resolution, and field/return type resolution for
external library types, pass dependency JARs via --classpath:
# Use Maven to collect the classpath (compile + test scopes)
CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code \
--classpath "$CP"
Tip: Use -DincludeScope=test to include test-scoped dependencies (JUnit, Testcontainers, etc.).
This ensures parameter types from those libraries resolve to FQNs in method signatures and field
types. Without the matching JARs, external types fall back to simple names (e.g., Session instead
of org.neo4j.driver.Session).
This lets JavaParser resolve method calls whose parameters use types from Neo4j Driver, Spring,
JUnit 5, picocli, etc. — dramatically increasing the number of CALLS edges in the graph.
Re-ingesting after code changes
The graph goes stale as code changes. Re-run the ingester with --wipe-project-code to refresh:
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--wipe-project-code
For faster re-runs, use --incremental to skip files that haven't changed since the last ingestion
(compared by filesystem lastModified timestamp):
java -jar target/memgraph-ingester.jar \
--source /path/to/your/java/project/src/main/java \
--bolt bolt://localhost:7687 \
--project my-project \
--incremental
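The selection rule behind --incremental can be pictured with a short sketch. It is a simplified Python model under stated assumptions, not the ingester's code: select_changed is a hypothetical helper, and timestamps are plain integers standing in for filesystem lastModified values. A file is re-ingested when it is new or when its timestamp differs from the stored one.

```python
def select_changed(stored_mtimes, current_mtimes):
    """Return the paths to re-ingest: new files, or files whose timestamp no longer matches."""
    return sorted(
        path
        for path, mtime in current_mtimes.items()
        if stored_mtimes.get(path) != mtime
    )


# Timestamps recorded in the graph during the previous run (illustrative values).
stored = {"src/A.java": 100, "src/B.java": 200}
# Timestamps read from the filesystem now.
current = {"src/A.java": 100, "src/B.java": 250, "src/C.java": 300}

print(select_changed(stored, current))  # B.java changed, C.java is new
```

How files deleted from disk are handled is outside the scope of this sketch.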
Check freshness anytime:
MATCH (:Project {name: 'my-project'})-[:CONTAINS]->(c:Code)
RETURN c.sourceRoots, c.lastIngested;
Multiple projects in one Memgraph instance
Run the ingester once per codebase with different --project values. Each gets its own
:Project -> :Code anchor chain; code nodes are composite-keyed by (key, project), so nothing
collides.
java -jar target/memgraph-ingester.jar -s ~/code/repo-a/src/main/java -b bolt://localhost:7687 -P repo-a --wipe-project-code
java -jar target/memgraph-ingester.jar -s ~/code/repo-b/src/main/java -b bolt://localhost:7687 -P repo-b --wipe-project-code
List everything that's indexed:
MATCH (p:Project)-[:CONTAINS]->(c:Code)
RETURN p.name, c.sourceRoots, c.lastIngested
ORDER BY c.lastIngested DESC;
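To see why composite keys prevent collisions between projects, consider a small in-memory model. This is an illustrative Python sketch only: GraphModel and merge_node are hypothetical names, and including the label in the identity tuple is an assumption made for the example. Merging the same (key, project) pair twice updates one node, while the same key under another project creates a separate node.

```python
class GraphModel:
    """Toy stand-in for a graph whose nodes are identified by (label, key, project)."""

    def __init__(self):
        self.nodes = {}

    def merge_node(self, label, key, project, **props):
        # Like a MERGE: create the node if its identity is new, otherwise update it.
        identity = (label, key, project)
        self.nodes.setdefault(identity, {}).update(props)
        return identity


graph = GraphModel()
graph.merge_node("Class", "com.example.Util", "repo-a", name="Util")
graph.merge_node("Class", "com.example.Util", "repo-b", name="Util")
graph.merge_node("Class", "com.example.Util", "repo-a", name="Util")  # repeat run, no new node

print(len(graph.nodes))  # two nodes: one per project
```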
What gets captured
| Node label | Identity |
|---|---|
| :Project | name |
| :Code | project |
| :Package | (name, project) |
| :File | (path, project) |
| :Class | (fqn, project) |
| :Interface | (fqn, project) |
| :Annotation | (fqn, project) |
| :Method | (signature, project) |
| :Field | (fqn, project) |
| Relationship | Meaning |
|---|---|
| (:Project)-[:CONTAINS]->(:Code) | Code graph anchor |
| (:Code)-[:CONTAINS]->(:Package \| :File) | Top-level code membership |
| (:Package)-[:CONTAINS]->(:Class \| :Interface \| :Annotation) | Package contents |
| (:File)-[:DEFINES]->(:Class \| :Interface \| :Annotation) | Source location |
| (:Class)-[:EXTENDS]->(:Class) | Class inheritance |
| (:Class)-[:IMPLEMENTS]->(:Interface) | Interface implementation |
| (:Interface)-[:EXTENDS]->(:Interface) | Interface inheritance |
| (:Class \| :Interface)-[:DECLARES]->(:Method \| :Field) | Type members |
| (:Method)-[:CALLS]->(:Method) | Call graph (best-effort) |
| (:*)-[:ANNOTATED_WITH]->(:Annotation) | Annotation usage |
Memory nodes are manually authored by agents or clients and share the same project namespace.
Only the properties listed below are allowed — no extra properties may be added to any Memory node.
| Label | Key props | All allowed properties |
|---|---|---|
| :Memory | project | project |
| :Decision | id, project | id, project, title, topic, status, rationale, consequences, createdAt, updatedAt |
| :ADR | id, project | id, project, number, title, status, context, decision, consequences, createdAt, updatedAt |
| :Rule | id, project | id, project, title, topic, severity, description, createdAt, updatedAt |
| :Context | id, project | id, project, title, topic, content, source, createdAt, updatedAt |
| :Finding | id, project | id, project, title, topic, type, summary, evidence, createdAt, updatedAt |
| :Task | id, project | id, project, title, status, priority, description, createdAt, updatedAt |
| :Risk | id, project | id, project, title, topic, severity, status, mitigation, createdAt, updatedAt |
| :Question | id, project | id, project, title, status, answer, createdAt, updatedAt |
| :Idea | id, project | id, project, title, topic, status, notes, createdAt, updatedAt |
| :CodeRef | project, targetType, key | project, targetType, key |
Controlled values:
- Decision status: proposed|accepted|rejected|superseded
- ADR status: draft|accepted|rejected|superseded
- Rule severity: hard|soft|recommendation
- Finding type: bug|perf|constraint|security
- Task status: todo|doing|done|blocked|cancelled
- Risk severity: low|medium|high|critical
- Risk status: open|mitigated|accepted|obsolete
- Question status: open|answered|obsolete
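A client that writes Memory nodes may want to check these controlled values before sending a query. The sketch below is one possible way to do that in Python. The invalid_fields helper is hypothetical, and whether the ingester or the schema enforces these values itself is not stated here, so treat this as a client-side convenience.

```python
# Allowed values per (label, property), copied from the list above.
CONTROLLED_VALUES = {
    ("Decision", "status"): {"proposed", "accepted", "rejected", "superseded"},
    ("ADR", "status"): {"draft", "accepted", "rejected", "superseded"},
    ("Rule", "severity"): {"hard", "soft", "recommendation"},
    ("Finding", "type"): {"bug", "perf", "constraint", "security"},
    ("Task", "status"): {"todo", "doing", "done", "blocked", "cancelled"},
    ("Risk", "severity"): {"low", "medium", "high", "critical"},
    ("Risk", "status"): {"open", "mitigated", "accepted", "obsolete"},
    ("Question", "status"): {"open", "answered", "obsolete"},
}


def invalid_fields(label, props):
    """Return the names of properties whose value is outside the controlled set for this label."""
    return sorted(
        name
        for name, value in props.items()
        if (label, name) in CONTROLLED_VALUES
        and value not in CONTROLLED_VALUES[(label, name)]
    )


print(invalid_fields("Task", {"title": "Add retry", "status": "doing"}))
print(invalid_fields("Risk", {"severity": "urgent", "status": "open"}))
```

Properties without a controlled list, such as Idea status, pass through unchecked in this sketch.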
| Relationship | Meaning |
|---|---|
| (:Project)-[:HAS_MEMORY]->(:Memory) | Memory graph anchor |
| (:Memory)-[:HAS_DECISION \| :HAS_ADR \| :HAS_RULE \| :HAS_CONTEXT]->(:*) | Memory item ownership |
| (:Memory)-[:HAS_FINDING \| :HAS_TASK \| :HAS_RISK \| :HAS_QUESTION]->(:*) | Memory item ownership |
| (:Memory)-[:HAS_IDEA]->(:Idea) | Memory item ownership |
| (:Decision \| :ADR \| :Rule \| :Context \| :Finding \| :Task \| :Risk \| :Idea)-[:REFERS_TO]->(:CodeRef) | Memory-to-code reference |
| (:CodeRef)-[:RESOLVES_TO]->(:Code \| :Package \| :File \| :Class \| :Interface \| :Annotation \| :Method \| :Field) | Current code node resolved after ingestion |
CodeRef.targetType is one of Code, Package, File, Class, Interface, Annotation, Method, or Field.
CodeRef.key uses the matching code identity: project name for Code, package name for Package, path
for File, FQN for types/annotations/fields, and signature for Method. The ingester refreshes
RESOLVES_TO edges after each run, so memory can survive code graph wipes and re-ingestion.
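The mapping from targetType to the property used as CodeRef.key can be written down as a small lookup. This Python sketch mirrors the identities listed in this README; the code_ref helper and the example node values are hypothetical and only illustrate how a client might build the three CodeRef properties.

```python
# Which property of the target code node supplies CodeRef.key, per targetType.
KEY_FIELD_BY_TARGET_TYPE = {
    "Code": "project",
    "Package": "name",
    "File": "path",
    "Class": "fqn",
    "Interface": "fqn",
    "Annotation": "fqn",
    "Method": "signature",
    "Field": "fqn",
}


def code_ref(project, target_type, node_props):
    """Build the properties of a CodeRef pointing at the given code node."""
    if target_type not in KEY_FIELD_BY_TARGET_TYPE:
        raise ValueError(f"unsupported targetType: {target_type}")
    key = node_props[KEY_FIELD_BY_TARGET_TYPE[target_type]]
    return {"project": project, "targetType": target_type, "key": key}


# Illustrative node properties; the signature format shown is an assumption.
method_node = {"signature": "com.example.OrderService.place(Order)", "project": "my-project"}
print(code_ref("my-project", "Method", method_node))
```

Because the key is a stable identity rather than an internal node id, the reference can be re-resolved after the code graph is wiped and rebuilt.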
Caveats
- CALLS is best-effort. JavaParser can't always resolve callees (complex generics, lambdas).
  Use --classpath with your project's dependency JARs to significantly improve resolution of
  external library types. Transitive call queries may still miss edges: a missing edge does not
  prove the call doesn't happen.
- EXTENDS/IMPLEMENTS resolution. When the symbol solver cannot resolve a parent class or
  interface, the ingester infers the FQN from import statements and falls back to the source-level
  name. This means all inheritance edges are captured, but unresolvable external types may appear
  with a simple name (e.g. BeforeAllCallback) rather than a full FQN. Use --classpath to
  provide dependency JARs for full FQN resolution.
- External types get tagged with your project. When a class extends or implements something from
  outside your source tree (e.g. RuntimeException, Spring interfaces), the ingester creates a
  :Class or :Interface node for it and scopes it to your project. These phantom nodes are marked
  with isExternal = true and have their name and packageName inferred from the FQN. Use
  WHERE NOT n.isExternal to filter them out. Project-internal nodes always have
  isExternal = false.
- Same-class CALLS fallback. When full signature resolution fails for an unscoped (same-class)
  method call, the ingester falls back to name-based matching: it creates a CALLS edge if exactly
  one method with that name exists in the owning type. This avoids false positives from overloading
  while recovering many intra-class call edges that would otherwise be lost. The same fallback
  applies to same-class method references (Type::method).
- Generated code is only indexed if you ingest it. Annotation processors, Lombok-generated
  members, and similar won't appear in the graph unless their generated source directory is passed
  to the ingester too:
java -jar memgraph-ingester.jar \
--source target/generated-sources/annotations \
--bolt bolt://localhost:7687 \
--project work
# no --wipe-project-code here!!!!
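The same-class CALLS fallback from the caveats above comes down to a simple rule: link by name only when the name is unambiguous. The Python sketch below illustrates that rule under stated assumptions. fallback_callee is a hypothetical helper, and methods are modeled as (name, signature) pairs because the exact signature format is not specified here.

```python
def fallback_callee(callee_name, declared_methods):
    """Return the signature to link to, or None when the name is missing or overloaded.

    declared_methods: (name, signature) pairs for the methods of the owning type.
    """
    candidates = [sig for name, sig in declared_methods if name == callee_name]
    # Exactly one match is safe; zero or several matches would be a guess.
    return candidates[0] if len(candidates) == 1 else None


methods = [
    ("save", "save(String)"),
    ("load", "load(int)"),
    ("load", "load(String)"),
]

print(fallback_callee("save", methods))  # unique name, so an edge can be created
print(fallback_callee("load", methods))  # overloaded, so no edge is created
```

Refusing to pick among overloads trades some missing edges for the absence of wrong ones, which matches the best-effort nature of CALLS.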
Project layout
.
├── src/main/java/io/github/ousatov/tools/memgraph/
│ ├── IngesterCli.java # CLI entry point (picocli Callable)
│ ├── IngestionOrchestrator.java # Orchestrates sequential / parallel ingestion
│ ├── ParseService.java # JavaParser + symbol resolution
│ ├── GraphWriter.java # Cypher MERGE writes to Memgraph
│ ├── def/Const.java # Shared constants
│ ├── exception/ProcessingException.java # Checked processing error
│ └── schema/Memgraph.java # Schema loader (create / drop / wipe)
├── src/main/resources/io/github/ousatov/tools/memgraph/cypher/
│ ├── action/ # Per-operation Cypher templates (upsert, wipe, resolve)
│ ├── create-schema.cypher # Uniqueness constraints + indexes
│ ├── drop-schema.cypher # Schema teardown
│ └── wipe-all-data.cypher # Full data wipe
├── src/test/java/ # Unit + integration tests (JUnit 5, Testcontainers)
├── schema/
│ └── SCHEMA.md # Graph model reference (human-readable)
├── script/
│ ├── init-memgraph-claude.sh # Appends Memgraph section to a repo's CLAUDE.md
│ ├── init-memgraph-codex.sh # Appends Memgraph section to a repo's AGENTS.md (Codex)
│ ├── init-memgraph-gemini.sh # Appends Memgraph section to a repo's AGENTS.md (Gemini)
│ └── init-memgraph-github.sh # Appends Memgraph section to a repo's AGENTS.md (Copilot)
├── template/
│ └── AI-memgraph-template.md # Template for *.md injection
├── memgraph-platform/
│ └── docker-compose.yml # Memgraph + Lab (with UI)
├── pom.xml # Maven build (shaded fat JAR, spotless-enforced)
└── README.md
License
MIT — see LICENSE.
Acknowledgements
- Evgeniy Voronyuk — testing support