# Graph of Thoughts
## What It Is

Graph of Thoughts (GoT) structures LLM reasoning as a directed graph rather than a linear chain. Each node is a partial solution or reasoning step. Edges represent transformations: refinement, aggregation, or branching. This lets the model explore multiple approaches in parallel, merge the best parts, and refine iteratively, producing higher-quality outputs for complex multi-step problems.
## The Problem It Solves

Chain-of-thought (CoT) prompting produces a single linear chain of reasoning. If the model takes a wrong step early, the entire chain follows the wrong path: there is no backtracking, no parallel exploration, and no way to combine partial solutions from different approaches.
- Sorting a list: CoT must process linearly. GoT can split the list, sort sub-lists in parallel, and merge.
- Writing code: CoT generates one solution. GoT can explore multiple approaches, evaluate each, and combine the best elements.
- Planning: CoT commits to one plan. GoT can maintain multiple plan branches and converge on the strongest.
Linear reasoning is a bottleneck for problems that benefit from divide-and-conquer, parallel exploration, or iterative refinement.
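As a toy illustration of the divide-and-conquer point, the sorting example can be sketched in plain Python. Here `sort_chunk` stands in for what would be an LLM call per branch in a real GoT system; all names are illustrative, not part of any GoT library:

```python
import heapq


def sort_chunk(chunk: list[int]) -> list[int]:
    # Stand-in for an LLM call that solves one branch (sorting a sub-list).
    return sorted(chunk)


def got_sort(values: list[int]) -> list[int]:
    # Decompose: split the problem into two independent sub-problems.
    mid = len(values) // 2
    left, right = values[:mid], values[mid:]
    # Generate: solve each branch independently (these could run in parallel).
    sorted_left = sort_chunk(left)
    sorted_right = sort_chunk(right)
    # Aggregate: merge the partial solutions into one result.
    return list(heapq.merge(sorted_left, sorted_right))
```

A linear CoT pass would have to sort the whole list in one shot; the graph version turns the merge step into an explicit aggregation node.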
## How It Works

```text
         Initial Thought
               │
         ┌─────┼─────┐
         │     │     │
         ▼     ▼     ▼
      Branch  Branch  Branch
         A      B      C
         │      │      │
         ▼      ▼      ▼
       Score  Score  Score
         │      │      │
         └──┬───┘      │
            │          │
            ▼          │
        Merge(A,B)     │
            │          │
            └────┬─────┘
                 ▼
          Refine(merged)
                 │
                 ▼
           Final Answer
```

- Decompose — Break the problem into sub-problems or generate alternative approaches.
- Generate — For each branch, produce a partial solution using the LLM.
- Score — Evaluate each branch using the LLM as a judge or an external evaluator.
- Aggregate — Merge the best branches. Aggregation can be simple selection, weighted combination, or LLM-driven synthesis.
- Refine — Pass the aggregated result through one or more refinement steps.
- Repeat — Loop until a quality threshold is met or a fixed number of iterations complete.
The graph is constructed programmatically by an orchestrator, not by the LLM itself. The LLM is called at each node for generation, scoring, and aggregation.
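The six steps above can be sketched as a generic orchestrator loop. `generate`, `score`, `aggregate`, and `refine` are caller-supplied stand-ins for the actual LLM calls; all names and defaults here are illustrative, not a reference implementation:

```python
def got_orchestrator(problem, generate, score, aggregate, refine,
                     num_branches=3, keep=2, max_iters=3, threshold=0.9):
    """Minimal GoT control loop: the graph is built by this code, not by the LLM."""
    candidate = problem
    for _ in range(max_iters):
        # Decompose / Generate: produce alternative partial solutions.
        branches = [generate(candidate, i) for i in range(num_branches)]
        # Score: evaluate each branch (LLM-as-judge or external evaluator).
        ranked = sorted(branches, key=score, reverse=True)
        # Aggregate: merge the best-scoring branches.
        candidate = aggregate(ranked[:keep])
        # Refine: one refinement pass over the merged result.
        candidate = refine(candidate)
        # Repeat until the quality threshold or the iteration budget is hit.
        if score(candidate) >= threshold:
            break
    return candidate
```

The key design point the section makes is visible here: the LLM only fills in the node-level callables, while branching, selection, and termination stay in ordinary code.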
## When to Use It

- Problems that are naturally decomposable into independent sub-problems (sorting, code generation, document analysis).
- Tasks where exploring multiple approaches and picking the best yields significantly better results than a single linear attempt.
- Scenarios where partial solutions from different approaches can be meaningfully combined.
- Quality-critical tasks where the cost of multiple LLM calls is justified by output quality improvement.
## When NOT to Use It

- Simple questions with single-step answers. The overhead of graph construction and multiple LLM calls is not justified.
- Latency-sensitive requests. GoT requires multiple sequential and parallel LLM calls, which adds significant latency.
- Tasks where the LLM already achieves near-perfect accuracy with a single call or simple CoT.
- When budget constraints prevent the 3-10x token cost increase over a single call.
## Trade-offs

- Token cost — GoT uses 3-10x more tokens than a single LLM call due to branching, scoring, and aggregation. Each node is a separate inference call.
- Latency — Even with parallel branch execution, the sequential depth of the graph (decompose, generate, score, merge, refine) adds significant latency.
- Orchestration complexity — Building the graph, managing state, handling failures at individual nodes, and implementing the scoring/aggregation logic requires substantial engineering.
- Diminishing returns — For many tasks, Tree-of-Thought or even simple best-of-N sampling achieves most of GoT’s quality improvement at lower complexity.
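The token-cost multiplier follows directly from the call structure. Assuming one inference call per node, a back-of-the-envelope estimate (illustrative call counts, not measurements) looks like:

```python
def got_call_count(num_branches: int, iterations: int) -> int:
    """Rough per-run LLM call count for the graph in this article:
    one generate + one score per branch, plus one aggregate and one refine
    per iteration. Assumes every node is a separate inference call."""
    per_iteration = num_branches * 2 + 2
    return per_iteration * iterations

# A single iteration with 3 branches already needs 8 calls versus 1 for
# plain CoT, which is consistent with the 3-10x cost range cited above.
```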
## Implementation Example

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    GENERATE = "generate"
    SCORE = "score"
    AGGREGATE = "aggregate"
    REFINE = "refine"


@dataclass
class ThoughtNode:
    node_id: str
    node_type: NodeType
    content: str = ""
    score: float = 0.0
    children: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)


class GraphOfThoughts:
    def __init__(self):
        self._nodes: dict[str, ThoughtNode] = {}
        self._counter = 0

    def _next_id(self) -> str:
        self._counter += 1
        return f"node_{self._counter}"

    def add_thought(
        self,
        node_type: NodeType,
        content: str,
        parents: list[str] | None = None,
    ) -> str:
        node_id = self._next_id()
        node = ThoughtNode(
            node_id=node_id,
            node_type=node_type,
            content=content,
            parents=parents or [],
        )
        self._nodes[node_id] = node
        for parent_id in node.parents:
            if parent_id in self._nodes:
                self._nodes[parent_id].children.append(node_id)
        return node_id

    def score_node(self, node_id: str, score: float) -> None:
        self._nodes[node_id].score = score

    def get_top_branches(self, node_ids: list[str], k: int = 2) -> list[str]:
        scored = [(nid, self._nodes[nid].score) for nid in node_ids]
        scored.sort(key=lambda x: x[1], reverse=True)
        return [nid for nid, _ in scored[:k]]

    def get_node(self, node_id: str) -> ThoughtNode:
        return self._nodes[node_id]


def solve_with_got(problem: str, num_branches: int = 3) -> str:
    graph = GraphOfThoughts()
    root = graph.add_thought(NodeType.GENERATE, problem)

    # Generate and score branches; each would be an LLM call in practice.
    branch_ids = []
    for i in range(num_branches):
        branch_content = f"Approach {i + 1} for: {problem}"
        bid = graph.add_thought(NodeType.GENERATE, branch_content, [root])
        graph.score_node(bid, evaluate_branch(branch_content))
        branch_ids.append(bid)

    # Aggregate the top-scoring branches, then refine the merged result.
    top_ids = graph.get_top_branches(branch_ids, k=2)
    top_contents = [graph.get_node(nid).content for nid in top_ids]
    merged = f"Combined solution from: {'; '.join(top_contents)}"
    merge_id = graph.add_thought(NodeType.AGGREGATE, merged, top_ids)

    refined = f"Refined: {merged}"
    graph.add_thought(NodeType.REFINE, refined, [merge_id])
    return refined


def evaluate_branch(content: str) -> float:
    # Placeholder scorer; a real system would use an LLM judge or evaluator.
    return len(content) / 100.0
```

## Tool Landscape
| Tool | Type | Notes |
|---|---|---|
| LangGraph | Framework | Arbitrary graph-based LLM workflows with state management |
| Microsoft AutoGen | Multi-agent framework | Supports branching conversation patterns between agents |
| DSPy | Optimization framework | Composable modules that can implement GoT-style pipelines |
| Custom orchestrator | DIY | Most GoT implementations are bespoke due to problem-specific graph structures |
## Related Patterns

- GraphRAG — Graph structure for retrieval context. GoT is graph structure for reasoning.
- Entity Resolution Graph — Complementary graph pattern focused on data quality rather than reasoning.