1. Introduction: The RAG Granularity Problem
Retrieval-Augmented Generation (RAG) systems face a fundamental tradeoff: retrieval granularity vs. contextual coherence. Small chunks (sentences, phrases) enable precise retrieval but lack surrounding context for generation. Large chunks (paragraphs, sections) provide coherent context but introduce noise and dilute relevance signals.
Existing approaches fall into three categories, each with significant drawbacks:
- LLM-guided chunking (Meta-Chunking): Uses LLMs to determine chunk boundaries prescriptively — expensive (O(n) LLM calls per document) and limited to the chunk boundaries the LLM decides.
- Single-level context expansion (Dense X Retrieval, MoC): Retrieves small units then expands with surrounding context. Caps context aggregation to one additional level, missing multi-scale structure.
- Hierarchical summarization (RAPTOR, GraphRAG): Builds multi-level representations via LLM summarization or graph community detection. Introduces information loss through lossy compression and requires costly LLM calls at indexing time.
SproutRAG (Abaskohi, Laradji, West, Carenini — UBC & ServiceNow Research) takes a fundamentally different approach: instead of relying on LLMs to impose structure, it learns document structure directly from inter-sentence attention patterns using a small 1.3B encoder. This learned structure builds a binary chunking tree where parent nodes are progressive embeddings formed by merging semantically aligned child chunks. At retrieval time, a hierarchical beam search navigates this tree to collect candidates at multiple granularities — all without a single LLM call during inference.
The result: 6.1% average improvement in IE (Recall × Precision) over the strongest baseline across four diverse benchmarks spanning scientific, legal, and open-domain documents, with superior generation quality and drastically lower token consumption.
2. Core Architecture: Attention-Guided Tree Construction

The SproutRAG indexing pipeline transforms a flat document into a hierarchical binary tree where each node represents a semantically coherent text unit. The process has three stages:
2.1 Sentence-Level Encoding with SLLM
The document is first split into sentence-level chunks (at most 2 sentences per chunk by default). Each chunk is encoded by a small language model (SLLM) — a 1.3B SentenceVAE model trained end-to-end as part of the SproutRAG framework. For each chunk, the SLLM produces:
- A dense sentence embedding $e_i$ for chunk $i$
- Inter-sentence attention matrices $A^{(l,h)}$ across all $L$ layers and $H$ heads, where $A^{(l,h)}_{ij}$ represents attention from chunk $i$ to chunk $j$ at layer $l$, head $h$
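The chunking step that precedes encoding can be sketched in a few lines. This is only an illustration: the regex sentence splitter and the name `chunk_sentences` are assumptions made here, not the splitter SproutRAG actually ships.

```python
import re

def chunk_sentences(text, max_sentences=2):
    """Split text into chunks holding at most `max_sentences` sentences each.

    Assumption: a naive regex splitter on ., ! and ? followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

chunks = chunk_sentences("RAG retrieves text. Chunks matter! Why is that? Context.")
print(chunks)
```

Each resulting chunk would then be passed to the encoder to obtain its embedding and attention rows.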
2.2 Learned Attention Aggregation
A critical insight of SproutRAG is that not all attention heads and layers capture semantic document structure equally. Uniformly averaging attention across all heads/layers introduces proximity bias — nearby chunks naturally have higher attention regardless of semantic relatedness. SproutRAG addresses this by learning a weighted aggregation:
$$w_{l,h} = \mathrm{softmax}(\alpha_{l,h})$$
where $\alpha_{l,h}$ are learnable scalar parameters, one per (layer, head) combination. The aggregated attention matrix is:
$$M = \sum_{l,h} w_{l,h} \cdot A^{(l,h)}$$
The matrix is then symmetrized to ensure bidirectional coherence:
$$M_{ij} \leftarrow \frac{M_{ij} + M_{ji}}{2}$$
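The aggregation and symmetrization steps can be sketched in plain Python. This is a minimal illustration, assuming the softmax normalizes jointly over all (layer, head) pairs; the function names and the nested-list layout `attn[l][h][i][j]` are hypothetical choices made here.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of scalars."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_attention(attn, alpha):
    """Weighted sum of per-head attention matrices, then symmetrization.

    attn[l][h] is an n x n matrix; alpha[l][h] is the matching learnable scalar.
    Assumption: one softmax over every (layer, head) pair.
    """
    flat_alpha = [a for layer in alpha for a in layer]
    flat_attn = [m for layer in attn for m in layer]
    weights = softmax(flat_alpha)  # one weight per (layer, head), summing to 1
    n = len(flat_attn[0])
    M = [[sum(w * m[i][j] for w, m in zip(weights, flat_attn))
          for j in range(n)] for i in range(n)]
    # Symmetrize so the score for (i, j) equals the score for (j, i).
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]

# Toy input: 1 layer, 2 heads, 2 chunks. Each head attends in one direction only.
attn = [[
    [[0.0, 1.0], [0.0, 0.0]],  # head 0: chunk 0 attends to chunk 1
    [[0.0, 0.0], [1.0, 0.0]],  # head 1: chunk 1 attends to chunk 0
]]
alpha = [[0.0, 0.0]]  # equal logits give uniform weights
M = aggregate_attention(attn, alpha)
print(M)
```

In training, `alpha` would be updated by gradient descent so that heads carrying structural signal receive larger weights; here it is fixed for illustration.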
2.3 Progressive Embedding and Tree Construction
The tree is built bottom-up using a greedy merging procedure:
- Initialize: Each sentence chunk is a leaf node.
- Find merge pair: Identify the pair of nodes $(u, v)$ with the highest mutual attention $M_{uv}$.
- Create parent: Merge $u$ and $v$ into a new parent $p$ whose embedding is the average of its children: $e(p) = (e(u) + e(v)) / 2$.
- Update attention: Apply single-linkage to every remaining node $r$: $M_{pr} = \max(M_{ur}, M_{vr})$.
- Repeat: Until one root node remains.
Each parent node is a progressive embedding — a simple but effective semantic summary formed by averaging child embeddings.
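The greedy merge loop described in this section can be sketched as follows. This is an illustrative reading of the steps, not the reference implementation; the `Node` class, `build_tree`, and the dictionary-based score store are names and choices assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    embedding: List[float]
    chunk_id: Optional[int] = None   # set for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(embeddings, M):
    """Greedy bottom-up merge driven by a symmetric attention matrix M."""
    nodes = {i: Node(list(e), chunk_id=i) for i, e in enumerate(embeddings)}
    # Scores keyed by (smaller_id, larger_id) so each pair is stored once.
    score = {(i, j): M[i][j] for i in nodes for j in nodes if i < j}
    next_id = len(nodes)
    while len(nodes) > 1:
        u, v = max(score, key=score.get)  # pair with the highest mutual attention
        parent = Node(
            embedding=[(a + b) / 2 for a, b in zip(nodes[u].embedding, nodes[v].embedding)],
            left=nodes[u],
            right=nodes[v],
        )
        # Single-linkage update: M[p][r] = max(M[u][r], M[v][r]).
        for r in nodes:
            if r in (u, v):
                continue
            s_u = score[(min(u, r), max(u, r))]
            s_v = score[(min(v, r), max(v, r))]
            score[(r, next_id)] = max(s_u, s_v)
        # Remove every entry that still mentions a merged child.
        score = {pair: s for pair, s in score.items() if u not in pair and v not in pair}
        del nodes[u], nodes[v]
        nodes[next_id] = parent
        next_id += 1
    return next(iter(nodes.values()))

embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
root = build_tree(embeddings, M)
print(root.embedding)
```

In the toy input, chunks 0 and 1 merge first because they share the highest score, and chunk 2 joins at the root. The pair scan here is quadratic per merge, which is fine for a sketch but would likely need a priority queue for long documents.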

3. Hierarchical Beam Search Retrieval
Given a query q and the constructed tree, SproutRAG retrieves relevant chunks using hierarchical beam search from the root downward:
function SproutRAG_Retrieve(root, q, b, delta, k):
    beam = [root]
    V_visit = empty_set
    while beam is not empty:
        children = union(Child(v) for v in beam)
        scores = cosine_sim(q, children)
        beam = Top_b(children, scores, b)
        V_visit = V_visit ∪ beam
    C = {v in V_visit | cosine_sim(q, v) >= delta}
    return Top_k(C, k)
Key properties:
- Multi-granularity: Collects both broad (high-level) and specific (low-level) candidates
- No LLM calls: All decisions use cosine similarity between query and node embeddings
- Beam width: b = 5 performed best in the ablation studies
After beam search, candidates are reranked by similarity (using Qwen3-Reranker-4B) and the top-k are passed to the generator.
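The beam search pseudocode translates fairly directly into Python. The sketch below is a hedged illustration: the `Node` class, the `retrieve` name, and the toy embeddings are assumptions, and the reranking stage is left out.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    embedding: List[float]
    children: List["Node"] = field(default_factory=list)

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(root, q, b, delta, k):
    """Hierarchical beam search: keep the b best children at every level."""
    beam, visited = [root], []
    while beam:
        children = [c for v in beam for c in v.children]
        children.sort(key=lambda c: cosine_sim(q, c.embedding), reverse=True)
        beam = children[:b]    # empty once the beam holds only leaves
        visited.extend(beam)   # candidates from every granularity
    candidates = [v for v in visited if cosine_sim(q, v.embedding) >= delta]
    candidates.sort(key=lambda v: cosine_sim(q, v.embedding), reverse=True)
    return candidates[:k]

# Toy tree: leaves a and b merge into ab; ab and c merge into the root.
a = Node("a", [1.0, 0.0])
b_leaf = Node("b", [0.8, 0.6])
c = Node("c", [0.0, 1.0])
ab = Node("ab", [0.9, 0.3], [a, b_leaf])
root = Node("root", [0.45, 0.65], [ab, c])

hits = retrieve(root, q=[1.0, 0.0], b=1, delta=0.5, k=2)
print([n.name for n in hits])
```

With a beam width of 1 the search follows the `ab` branch only, and the returned candidates mix a leaf (`a`) with its coarser parent (`ab`), which is the multi-granularity behavior described above. As in the pseudocode, the root itself is never added to the visited set.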
4. Joint Training Objective
SproutRAG is trained end-to-end with a joint objective that simultaneously optimizes retrieval quality and tree structure:
$$\mathcal{L} = \mathcal{L}_{\text{ret}} + \lambda \cdot \mathcal{L}_{\text{attn}}$$
4.1 Retrieval Contrastive Loss
Standard contrastive learning over query-document pairs. For a batch of N query-positive pairs with in-batch negatives:
$$\mathcal{L}_{\text{ret}} = -\log \frac{\exp(\mathrm{sim}(q, p^{+}) / \tau)}{\sum_{i} \exp(\mathrm{sim}(q, p_i) / \tau)}$$
Training data: 30K examples from MS MARCO v2.1 train split.
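A small numeric sketch may make the contrastive loss concrete. It assumes a precomputed similarity matrix whose entry `[i][j]` is sim(q_i, p_j), with each query's positive on the diagonal, and it averages the per-query loss over the batch. The temperature 0.05 is a placeholder, since the text does not state a value for τ, and `info_nce` is a hypothetical name.

```python
import math

def info_nce(sim_matrix, tau):
    """Mean over the batch of -log softmax(sim[i] / tau)[i].

    Row i holds the similarities of query i to every passage in the batch;
    the diagonal entry is the positive, the rest act as in-batch negatives.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [s / tau for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

tau = 0.05  # placeholder temperature, not taken from the source
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], tau)     # positives score highest
misaligned = info_nce([[0.0, 1.0], [1.0, 0.0]], tau)  # negatives score highest
print(aligned, misaligned)
```

The loss is near zero when each query is most similar to its own positive and grows when an in-batch negative scores higher.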
4.2 Attention Structure Regularizer
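Given a set $G$ of ground-truth merge pairs and the aggregated attention matrix $M$ from Section 2.2:
$$\mathcal{L}_{\text{attn}} = -\frac{1}{|G|} \sum_{(i,j) \in G} \log\left(\frac{M_{ij} + M_{ji}}{2}\right)$$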
This pulls the learned weights toward heads/layers that capture document structure and away from the uniform baseline.
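The regularizer and the joint objective can be sketched numerically as below. The sketch assumes the attention entering the loss is the aggregated matrix from Section 2.2 and that the entries for ground-truth pairs are strictly positive; the function names are hypothetical.

```python
import math

def attention_regularizer(M, merge_pairs):
    """Negative mean log of the symmetrized attention on ground-truth merge pairs.

    Assumption: M[i][j] + M[j][i] > 0 for every pair, so the log is defined.
    """
    total = sum(math.log((M[i][j] + M[j][i]) / 2) for i, j in merge_pairs)
    return -total / len(merge_pairs)

def joint_loss(l_ret, l_attn, lam=0.1):
    """Combined objective: retrieval loss plus lambda times the regularizer."""
    return l_ret + lam * l_attn

M = [[0.0, 0.6, 0.1],
     [0.4, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
l_attn = attention_regularizer(M, merge_pairs=[(0, 1)])
print(l_attn, joint_loss(1.0, l_attn))
```

The penalty shrinks as the aggregated attention between a ground-truth pair approaches 1, so gradient steps favor the heads and layers that assign such pairs high mutual attention.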
4.3 Training Details
- SLLM: 1.3B SentenceVAE, AdamW, lr=2e-5
- Aggregation weights: AdamW, lr=1e-3
- Batch: 32, Epochs: 3, λ: 0.1
- Hardware: 8 × A100 80GB
- Base encoder: sentence-transformers/all-MiniLM-L6-v2
5. Benchmark Performance
5.1 Retrieval Results
Table 1 reports retrieval performance in IE = Recall × Precision (averaged @1, 3, 5) across four datasets.
| Dataset | Dense X | Meta-Chunk | RAPTOR | LightRAG | GraphRAG | PropRAG | SproutRAG |
|---|---|---|---|---|---|---|---|
| SCI-DOCS | 31.2 | 32.8 | 34.0 | 32.5 | 30.8 | 33.6 | 38.7 |
| LegalBench-RAG | 33.5 | 35.2 | 35.8 | 34.0 | 32.5 | 36.0 | 40.9 |
| Dragonball | 37.5 | 38.2 | 40.0 | 36.8 | 35.0 | 39.5 | 48.1 |
| MS MARCO | 29.0 | 28.5 | 27.2 | 26.8 | 25.0 | 27.5 | 35.8 |
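SproutRAG leads on every dataset. Its margin over the strongest baseline in each row is +4.65 on SCI-DOCS (vs. RAPTOR), +4.90 on LegalBench-RAG (vs. PropRAG), +8.06 on Dragonball (vs. RAPTOR), and +6.83 on MS MARCO (vs. Dense X), all in IE points.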
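The per-dataset margins can be checked against the table with a few lines of arithmetic. The scores below are copied from the table as printed (one decimal place), so the recomputed gains match the quoted two-decimal figures only up to rounding; the variable and function names are illustrative.

```python
# IE scores from the retrieval table: (baseline scores, SproutRAG score).
table = {
    "SCI-DOCS": ([31.2, 32.8, 34.0, 32.5, 30.8, 33.6], 38.7),
    "LegalBench-RAG": ([33.5, 35.2, 35.8, 34.0, 32.5, 36.0], 40.9),
    "Dragonball": ([37.5, 38.2, 40.0, 36.8, 35.0, 39.5], 48.1),
    "MS MARCO": ([29.0, 28.5, 27.2, 26.8, 25.0, 27.5], 35.8),
}

def gain_over_best_baseline(baselines, ours):
    """Margin in IE points over the strongest baseline for one dataset."""
    return ours - max(baselines)

gains = {
    name: round(gain_over_best_baseline(baselines, ours), 1)
    for name, (baselines, ours) in table.items()
}
average_gain = sum(gains.values()) / len(gains)
print(gains)
print(round(average_gain, 1))
```

Averaging the four margins gives roughly 6.1 IE points, in line with the headline improvement quoted in the introduction.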
5.2 End-to-End Generation Results
| Method | F1 | ROUGE-L | METEOR | BERTScore | Tok/Q (K) | Latency (ms) |
|---|---|---|---|---|---|---|
| GraphRAG | 32.8 | 28.5 | 24.6 | 84.3 | 12.80 | 520 |
| ReflectiveRAG | 34.5 | 30.1 | 26.0 | 85.2 | 9.40 | 380 |
| REFRAG | 35.3 | 30.8 | 26.5 | 85.7 | 7.20 | 310 |
| SproutRAG | 37.8 | 33.2 | 29.4 | 87.1 | 4.38 | 193 |
SproutRAG outperforms all baselines on every generation metric while using 2.7× fewer tokens/query (4.38K vs. 12.8K for GraphRAG) with the lowest latency at 193ms.
6. Getting Started
Installation
pip install -e .
pip install -e ".[yaml]"   # YAML config support
pip install -e ".[eval]"   # Evaluation scripts
pip install -e ".[spacy]"  # spaCy sentence splitting
CLI Workflow
# 1. Index documents
sproutrag index --config configs/index.yaml
# 2. Retrieve passages
sproutrag retrieve --config configs/retrieve.yaml
# 3. Generate answers
sproutrag answer --config configs/answer.yaml
# 4. (Optional) Train custom encoder
sproutrag train --config configs/train.yaml
# 5. (Optional) Evaluate
sproutrag evaluate --config configs/evaluate_retrieval.yaml
7. Conclusion & Future Work
SproutRAG demonstrates that learned attention-guided hierarchical indexing can match or exceed the retrieval and generation quality of LLM-based RAG systems while using a fraction of the compute. The combination of a 1.3B SLLM encoder, learned attention aggregation eliminating proximity bias, and hierarchical beam search produces a system that is simultaneously more accurate (+6.1% IE) and more efficient (2.7× fewer tokens) than top alternatives.
Limitations
- Binary tree constraint: Multi-branch trees could capture more complex structures.
- Upfront training: Requires end-to-end training on retrieval data (30K examples).
- Fixed offline tree: Query-dependent traversal could dynamically adapt the structure.
Paper: "SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG" — Abaskohi, Laradji, West, Carenini (UBC & ServiceNow Research). Code: github.com/AmirAbaskohi/SproutRAG