GraphRAG
What It Is
Section titled “What It Is”GraphRAG combines traditional vector-based retrieval with a knowledge graph to provide structured, relationship-aware context to LLMs. Instead of retrieving isolated text chunks, GraphRAG traverses entity relationships in a graph to assemble context that spans multiple documents and captures how concepts connect.
The Problem It Solves
Section titled “The Problem It Solves”Standard RAG retrieves chunks of text that are semantically similar to the query. This works for factual lookups but breaks down when the answer requires connecting information across multiple documents or understanding relationships between entities.
- “Which team members have worked on projects that use the same technology stack?” requires joining person-project-technology relationships.
- “What are the downstream effects of deprecating Service X?” requires traversing a dependency graph.
- “Compare the approaches used by Team A and Team B for authentication” requires pulling and connecting information from separate sources.
Vector similarity alone cannot express graph traversals, joins, or multi-hop reasoning.
How It Works
Section titled “How It Works” Query │ ┌─────┴──────┐ │ │ ▼ ▼ Entity Vector Extraction Search │ │ ▼ │ Knowledge │ Graph │ Traversal │ │ │ └──────┬──────┘ │ Context Assembly │ ▼ LLM Generation- Entity extraction — Extract entities and intent from the query using NER or an LLM.
- Graph traversal — Look up extracted entities in the knowledge graph and traverse relationships to gather structured context (neighbors, paths, subgraphs).
- Vector retrieval — Run standard embedding search in parallel for unstructured context.
- Context assembly — Merge graph-derived facts (entity properties, relationships, paths) with vector-retrieved passages into a structured prompt.
- Generation — Send the enriched context to the LLM for answer generation.
The knowledge graph can be prebuilt from documents using an entity extraction pipeline, or maintained as a curated knowledge base.
When to Use It
Section titled “When to Use It”- Queries frequently require connecting information across multiple documents.
- Your domain has clear entity relationships (org charts, dependency maps, product catalogs, medical knowledge bases).
- Users ask comparative, relational, or multi-hop questions.
- You need explainable retrieval — the graph traversal path shows why certain context was included.
- Standard RAG recall is poor for relationship queries despite good chunk quality.
When NOT to Use It
Section titled “When NOT to Use It”- Queries are simple factual lookups that vector search handles well. GraphRAG adds significant complexity for marginal improvement on single-hop questions.
- You do not have clean, structured data to build a knowledge graph from. Garbage-in-garbage-out applies even more to graphs than to vector stores.
- The knowledge base changes so frequently that maintaining a graph is impractical. Graph construction and updates are expensive.
- Your team cannot invest in graph infrastructure (Neo4j, Neptune, or similar) and the associated operational overhead.
Trade-offs
Section titled “Trade-offs”- Graph construction cost — Building and maintaining a knowledge graph from unstructured documents requires an entity extraction pipeline, which introduces its own error rate and latency.
- Query routing complexity — Not every query benefits from graph traversal. You need a router to decide when to activate the graph path vs. pure vector retrieval.
- Staleness risk — The knowledge graph can drift from source documents if the extraction pipeline is not run on every update.
- Infrastructure overhead — Requires a graph database in addition to the vector store. Two systems to maintain, monitor, and back up.
Implementation Example
Section titled “Implementation Example”from dataclasses import dataclass, field
@dataclassclass Entity: name: str entity_type: str properties: dict = field(default_factory=dict)
@dataclassclass Relationship: source: str target: str relation_type: str properties: dict = field(default_factory=dict)
class KnowledgeGraph: def __init__(self): self._entities: dict[str, Entity] = {} self._adjacency: dict[str, list[Relationship]] = {}
def add_entity(self, entity: Entity) -> None: self._entities[entity.name] = entity if entity.name not in self._adjacency: self._adjacency[entity.name] = []
def add_relationship(self, rel: Relationship) -> None: if rel.source not in self._adjacency: self._adjacency[rel.source] = [] self._adjacency[rel.source].append(rel)
def get_neighbors(self, entity_name: str, max_depth: int = 2) -> list[dict]: visited: set[str] = set() results: list[dict] = [] queue: list[tuple[str, int]] = [(entity_name, 0)]
while queue: current, depth = queue.pop(0) if current in visited or depth > max_depth: continue visited.add(current)
for rel in self._adjacency.get(current, []): results.append({ "source": current, "relation": rel.relation_type, "target": rel.target, "depth": depth, }) if rel.target not in visited: queue.append((rel.target, depth + 1))
return results
def get_entity(self, name: str) -> Entity | None: return self._entities.get(name)
def extract_entities(query: str) -> list[str]: return [token for token in query.split() if token[0].isupper()]
def build_graph_context( graph: KnowledgeGraph, entities: list[str], max_depth: int = 2,) -> str: context_lines = [] for entity_name in entities: entity = graph.get_entity(entity_name) if not entity: continue context_lines.append(f"Entity: {entity.name} ({entity.entity_type})") neighbors = graph.get_neighbors(entity_name, max_depth) for rel in neighbors: context_lines.append( f" {rel['source']} --[{rel['relation']}]--> {rel['target']}" ) return "\n".join(context_lines)
def graph_rag_query( query: str, graph: KnowledgeGraph, vector_results: list[str],) -> str: entities = extract_entities(query) graph_context = build_graph_context(graph, entities) vector_context = "\n".join(vector_results) prompt = ( f"Answer the following question using the provided context.\n\n" f"Graph Context (entity relationships):\n{graph_context}\n\n" f"Document Context (retrieved passages):\n{vector_context}\n\n" f"Question: {query}" ) return promptTool Landscape
Section titled “Tool Landscape”| Tool | Type | Notes |
|---|---|---|
| Neo4j + LangChain | Graph DB + framework | Native Cypher queries combined with LLM chains |
| Microsoft GraphRAG | Open-source library | Builds community summaries from documents, designed for global queries |
| Amazon Neptune | Managed graph DB | Enterprise-grade, integrates with AWS Bedrock for LLM |
| FalkorDB | Open-source graph DB | Low-latency graph queries optimized for RAG workloads |
| LlamaIndex Knowledge Graph | Framework | Property graph index with automatic entity extraction |
Related Patterns
Section titled “Related Patterns”- Hybrid Search — Combine with GraphRAG for both semantic and structural retrieval.
- Entity Resolution Graph — Clean entity data is a prerequisite for a useful knowledge graph.
- Graph of Thoughts — Graph reasoning at inference time complements graph-structured retrieval.