
Technical Blog

Hello GraphRAG: Building a Graph-Aware RAG Pipeline with Neo4j

graphrag, neo4j, python, rag

A step-by-step walkthrough of wiring a minimal GraphRAG prototype in Python using Neo4j and an LLM.

Retrieval-Augmented Generation (RAG) works well when your knowledge lives in documents. But once relationships start to matter (who depends on what, which component talks to which), a plain vector store usually stops being enough.

That is where GraphRAG comes in.

Why GraphRAG?

A graph lets you model:

  • Entities: machines, components, people, documents
  • Relations: DEPENDS_ON, BUILT_BY, LOCATED_IN
  • Context: paths across the graph that explain why something matters
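You can sketch these three ingredients in plain Python before involving a database. The example below is a minimal in-memory model that assumes a toy plant-maintenance domain. The node ids (`pump-42`, `motor-7`, `acme`, `plant-a`) and the `Edge` / `Graph` classes are hypothetical illustrations, not part of Neo4j's API. In Neo4j, the same facts would be stored as labeled nodes and typed relationships.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    """A typed, directed relation, e.g. (pump-42)-[:DEPENDS_ON]->(motor-7)."""
    source: str
    rel_type: str
    target: str


@dataclass
class Graph:
    # Entities: node id -> {"label": ..., plus arbitrary properties}
    nodes: Dict[str, dict] = field(default_factory=dict)
    # Relations: typed edges between node ids
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, label: str, **props) -> None:
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source: str, rel_type: str, target: str) -> None:
        self.edges.append(Edge(source, rel_type, target))


# Hypothetical toy data: a pump, its motor, the motor's maker, and a site.
graph = Graph()
graph.add_node("pump-42", "Component", name="Coolant pump")
graph.add_node("motor-7", "Component", name="Drive motor")
graph.add_node("acme", "Organization", name="Acme Motors")
graph.add_node("plant-a", "Site", name="Plant A")
graph.add_edge("pump-42", "DEPENDS_ON", "motor-7")
graph.add_edge("motor-7", "BUILT_BY", "acme")
graph.add_edge("pump-42", "LOCATED_IN", "plant-a")

# Context: the path pump-42 -> motor-7 -> acme explains why a problem
# at the motor's manufacturer could matter for the pump.
```

A flat list of chunks cannot express the last comment. The path through the graph encodes it directly.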

Instead of retrieving a single chunk, you can walk the graph to gather a small, high-signal subgraph and feed that into your prompt.
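You can sketch this graph walk as a breadth-first search that stops after a fixed number of hops. It mirrors what an undirected `-[*1..depth]-` pattern collects in Cypher. The edge list and the `neighborhood` helper are hypothetical. The optional `rel_types` filter shows one simple form of domain-specific traversal: follow only the relationship types that matter for a given question.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy edges: (source, relationship type, target).
EDGES: List[Tuple[str, str, str]] = [
    ("pump-42", "DEPENDS_ON", "motor-7"),
    ("motor-7", "BUILT_BY", "acme"),
    ("pump-42", "LOCATED_IN", "plant-a"),
    ("valve-3", "DEPENDS_ON", "pump-42"),
    ("acme", "LOCATED_IN", "springfield"),
]


def neighborhood(
    edges: Iterable[Tuple[str, str, str]],
    start: str,
    depth: int,
    rel_types: Optional[Set[str]] = None,
) -> Dict[str, int]:
    """Return {node_id: hops} for every node within `depth` hops of `start`.

    Edges are treated as undirected. If `rel_types` is given, only those
    relationship types are followed. The start node is included with 0 hops.
    """
    # Build an undirected adjacency list, optionally filtered by type.
    adjacency: Dict[str, List[str]] = {}
    for src, rel, dst in edges:
        if rel_types is not None and rel not in rel_types:
            continue
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:  # BFS: first visit is the shortest distance
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops
```

For example, `neighborhood(EDGES, "pump-42", 2)` reaches `acme` in two hops but not `springfield`. Restricting the walk to `{"DEPENDS_ON"}` keeps only the dependency chain around the pump. The depth limit keeps the subgraph small enough to fit in a prompt.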

Minimal Python setup

Below is a tiny (but realistic) starting point for working with Neo4j in Python.

from dataclasses import dataclass
from typing import List

from neo4j import GraphDatabase


@dataclass
class GraphNode:
    id: str
    label: str
    properties: dict


class GraphRAGClient:
    def __init__(self, uri: str, user: str, password: str):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self) -> None:
        self._driver.close()

    def query_subgraph(self, component_id: str, depth: int = 2) -> List[GraphNode]:
        # Cypher does not accept query parameters as variable-length bounds
        # ([*1..$depth] fails), so validate the integer and put it in the text.
        if not isinstance(depth, int) or depth < 1:
            raise ValueError("depth must be a positive integer")
        cypher = f"""
        MATCH p = (c:Component {{id: $component_id}})-[*1..{depth}]-(n)
        WITH nodes(p) AS ns
        UNWIND ns AS n
        RETURN DISTINCT n
        """
        with self._driver.session() as session:
            records = session.run(cypher, component_id=component_id)
            # Consume the result inside the session; it is unreadable after close.
            return [
                GraphNode(
                    id=record["n"].get("id"),
                    # Labels form an unordered set: pick one deterministically
                    # and tolerate nodes that have no label at all.
                    label=min(record["n"].labels, default=""),
                    properties=dict(record["n"]),
                )
                for record in records
            ]


if __name__ == "__main__":
    client = GraphRAGClient(
        uri="neo4j+s://demo-instance.databases.neo4j.io",
        user="neo4j",
        password="demo-password",
    )
    try:
        nodes = client.query_subgraph(component_id="pump-42", depth=2)
        for node in nodes:
            print(node.label, node.id, node.properties.get("name"))
    finally:
        # Release the driver's connection pool even if the query fails.
        client.close()

This is not production-ready, but it is intentionally compact:

  • It gives you a GraphRAGClient abstraction you can grow over time.
  • It demonstrates a common Cypher pattern, variable-length traversal, for pulling a node's local neighborhood.
  • It is small enough to paste into a notebook and start experimenting today.

From here you can:

  1. Add embedding-backed search to find the starting nodes.
  2. Expand the subgraph with domain-specific traversals.
  3. Format that subgraph into a prompt for your LLM of choice.
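You can sketch step 1 in the list above, finding starting nodes by embedding similarity, with plain cosine similarity. This assumes an embedding model has already computed the node embeddings. The 3-dimensional vectors, the node ids, and the `top_k_seeds` helper are hypothetical. At realistic scale you would delegate this ranking to a vector index (recent Neo4j 5 releases include one) rather than scanning every node in Python.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_seeds(
    query_vec: Sequence[float],
    node_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank node ids by similarity to the query vector and keep the best k."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; a real model produces hundreds of dimensions.
NODE_VECS = {
    "pump-42": [0.9, 0.1, 0.0],
    "motor-7": [0.7, 0.3, 0.1],
    "acme": [0.0, 0.2, 0.9],
}
QUERY_VEC = [1.0, 0.0, 0.0]  # stands in for an embedded user question

seeds = top_k_seeds(QUERY_VEC, NODE_VECS, k=2)
```

You would then pass each seed id as `component_id` to `query_subgraph`. The vector search picks where to start, and the graph traversal decides what surrounding context comes along.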
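Step 3, turning the subgraph into prompt text, can be as simple as writing each relationship as one line of text. The format below puts one fact per line in the form `name (Label id) REL name (Label id)`. That format, the toy data, and the `describe` / `build_prompt` helpers are illustrative assumptions, not a standard. Any compact, consistent serialization the model can read will do.

```python
from typing import Dict, List, Tuple

# Hypothetical subgraph, e.g. the result of a neighborhood query:
# node id -> properties, plus (source, relationship type, target) triples.
NODES: Dict[str, Dict[str, str]] = {
    "pump-42": {"label": "Component", "name": "Coolant pump"},
    "motor-7": {"label": "Component", "name": "Drive motor"},
    "acme": {"label": "Organization", "name": "Acme Motors"},
}
TRIPLES: List[Tuple[str, str, str]] = [
    ("pump-42", "DEPENDS_ON", "motor-7"),
    ("motor-7", "BUILT_BY", "acme"),
]


def describe(node_id: str, nodes: Dict[str, Dict[str, str]]) -> str:
    """Render a node as 'name (Label id)', falling back to the bare id."""
    props = nodes.get(node_id, {})
    name = props.get("name", node_id)
    label = props.get("label", "Node")
    return f"{name} ({label} {node_id})"


def build_prompt(
    question: str,
    nodes: Dict[str, Dict[str, str]],
    triples: List[Tuple[str, str, str]],
) -> str:
    """Write the subgraph as one fact per line and wrap it in a prompt."""
    facts = [
        f"- {describe(src, nodes)} {rel} {describe(dst, nodes)}"
        for src, rel, dst in triples
    ]
    return (
        "Answer the question using only the facts below.\n\n"
        "Facts:\n" + "\n".join(facts) + "\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt("Why might pump-42 fail?", NODES, TRIPLES)
print(prompt)
```

One fact per line also makes it easy to fit the context into a token budget: drop lines, starting with the facts farthest from the seed node.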