SproutRAG: Attention-Guided Tree Search With Progressive Embeddings

1. Introduction: The RAG Granularity Problem

Retrieval-Augmented Generation (RAG) systems face a fundamental tradeoff: retrieval granularity vs. contextual coherence. Small chunks (sentences, phrases) enable precise retrieval but lack surrounding context for generation. Large chunks (paragraphs, sections) provide coherent context but introduce noise and dilute relevance signals.

Existing approaches fall into three categories, each with significant drawbacks:

LLM-guided chunking (Meta-Chunking): Uses LLMs to determine chunk boundaries prescriptively — expensive (O(n) LLM calls per document) and limited to the chunk boundaries the LLM decides.
Single-level context expansion (Dense X Retrieval, MoC): Retrieves small units then expands with surrounding context. Caps context aggregation to one additional level, missing multi-scale structure.
Hierarchical summarization (RAPTOR, GraphRAG): Builds multi-level representations via LLM summarization or graph community detection. Introduces information loss through lossy compression and requires costly LLM calls at indexing time.

SproutRAG (Abaskohi, Laradji, West, Carenini — UBC & ServiceNow Research) takes a fundamentally different approach: instead of relying on LLMs to impose structure, it learns document structure directly from inter-sentence attention patterns using a small 1.3B encoder. This learned structure builds a binary chunking tree where parent nodes are progressive embeddings formed by merging semantically aligned child chunks. At retrieval time, a hierarchical beam search navigates this tree to collect candidates at multiple granularities — all without a single LLM call during inference.

The result: a 6.1-point average improvement in IE (Recall × Precision) over the strongest baseline across four diverse benchmarks spanning scientific, legal, and open-domain documents, with higher generation quality and substantially lower token consumption (4.38K tokens per query versus 7.20K to 12.80K for the generation baselines in Section 5.2).

2. Core Architecture: Attention-Guided Tree Construction

The SproutRAG indexing pipeline transforms a flat document into a hierarchical binary tree where each node represents a semantically coherent text unit. The process has three stages:

2.1 Sentence-Level Encoding with SLLM

The document is first split into sentence-level chunks (at most 2 sentences per chunk by default). Each chunk is encoded by a small language model (SLLM) — a 1.3B SentenceVAE model trained end-to-end as part of the SproutRAG framework. For each chunk, the SLLM produces:

A dense sentence embedding e_i for chunk i
Inter-sentence attention matrices A^(l,h) across all L layers and H heads, where A^(l,h)_ij represents attention from chunk i to chunk j at layer l, head h
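
The chunking step that precedes encoding can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `chunk_sentences` is hypothetical, and the regex splitter is a naive stand-in for whatever sentence segmenter the real pipeline uses.

```python
import re

def chunk_sentences(text: str, max_sentences: int = 2) -> list[str]:
    """Group sentences into chunks of at most `max_sentences` sentences.

    The regex below is a naive splitter (an assumption for this sketch);
    it breaks after '.', '!' or '?' when followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

chunks = chunk_sentences(
    "RAG retrieves text. Chunks matter. Trees help. Beams search. Done."
)
for chunk in chunks:
    print(chunk)
```

Each resulting chunk would then become one leaf of the tree described in Section 2.3.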

2.2 Learned Attention Aggregation

A critical insight of SproutRAG is that not all attention heads and layers capture semantic document structure equally. Uniformly averaging attention across all heads/layers introduces proximity bias — nearby chunks naturally have higher attention regardless of semantic relatedness. SproutRAG addresses this by learning a weighted aggregation:

w_l,h = softmax(α_l,h)

where α_l,h are learnable scalar parameters, one per (layer, head) combination. The aggregated attention matrix is:

M = Σ_l,h w_l,h · A^(l,h)

The aggregated matrix M is then symmetrized to ensure bidirectional coherence:

M_ij ← (M_ij + M_ji) / 2
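
The aggregation and symmetrization steps can be sketched in plain Python. The sketch assumes the softmax is taken jointly over all (layer, head) pairs, which the text does not state explicitly; the function names and the toy attention values are illustrative only.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of scalars."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_attention(attn, alpha):
    """attn[l][h] is an n x n attention matrix; alpha[l][h] is a learnable scalar."""
    num_layers, num_heads, n = len(attn), len(attn[0]), len(attn[0][0])
    # Assumption: one softmax across every (layer, head) pair.
    weights = softmax([alpha[l][h] for l in range(num_layers) for h in range(num_heads)])
    M = [[0.0] * n for _ in range(n)]
    for l in range(num_layers):
        for h in range(num_heads):
            w = weights[l * num_heads + h]
            for i in range(n):
                for j in range(n):
                    M[i][j] += w * attn[l][h][i][j]
    # Symmetrize so that M[i][j] == M[j][i].
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]

# Toy input: one layer, two heads, two chunks, equal (untrained) alphas.
attn = [[
    [[0.0, 0.8], [0.2, 0.0]],
    [[0.0, 0.4], [0.6, 0.0]],
]]
alpha = [[0.0, 0.0]]
M_sym = aggregate_attention(attn, alpha)
print(M_sym)
```

With equal alphas the weights reduce to a uniform average, which is the baseline the learned weights are meant to improve on.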

2.3 Progressive Embedding and Tree Construction

The tree is built bottom-up using a greedy merging procedure:

Initialize: Each sentence chunk is a leaf node.
Find merge pair: Identify the pair of nodes (u, v) with the highest mutual attention M_uv.
Create parent: Merge into a new parent p. The parent embedding is the average of its children: e(p) = (e(u) + e(v)) / 2.
Update attention: Use single linkage, so the parent's attention to any remaining node r is M_pr = max(M_ur, M_vr).
Repeat: Until one root node remains.

Each parent node is a progressive embedding — a simple but effective semantic summary formed by averaging child embeddings.
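
The merging procedure above can be sketched as follows. The `Node` class, its fields, and `build_tree` are hypothetical names introduced for this illustration, and the embeddings and attention scores are toy values, not outputs of the actual SLLM.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical container for this sketch, not the project's own class.
    embedding: list
    children: list = field(default_factory=list)
    label: str = ""

def build_tree(embeddings, M):
    """Greedy bottom-up merge; M holds symmetric mutual-attention scores."""
    nodes = {i: Node(list(e), label=str(i)) for i, e in enumerate(embeddings)}
    # sim[(a, b)] with a < b is the mutual attention between two active nodes.
    sim = {(i, j): M[i][j] for i in nodes for j in nodes if i < j}
    next_id = len(nodes)
    while len(nodes) > 1:
        u, v = max(sim, key=sim.get)  # pair with the highest mutual attention
        parent = Node(
            [(a + b) / 2 for a, b in zip(nodes[u].embedding, nodes[v].embedding)],
            children=[nodes[u], nodes[v]],
            label=f"({nodes[u].label}+{nodes[v].label})",
        )
        # Single-linkage update: M_pr = max(M_ur, M_vr) for every other node r.
        for r in nodes:
            if r not in (u, v):
                m_ur = sim[tuple(sorted((u, r)))]
                m_vr = sim[tuple(sorted((v, r)))]
                sim[(r, next_id)] = max(m_ur, m_vr)  # r < next_id always holds
        sim = {p: s for p, s in sim.items() if u not in p and v not in p}
        del nodes[u], nodes[v]
        nodes[next_id] = parent
        next_id += 1
    return next(iter(nodes.values()))

embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
M = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
root = build_tree(embeddings, M)
print(root.label)
```

In this toy case chunks 0 and 1 merge first, then chunks 2 and 3, and the two parents merge last to form the root.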

3. Hierarchical Beam Search Retrieval

Given a query q and the constructed tree, SproutRAG retrieves relevant chunks using hierarchical beam search from the root downward:

function SproutRAG_Retrieve(root, q, b, delta, k):
    beam = [root]
    V_visit = empty_set
    while beam is not empty:
        children = union(Child(v) for v in beam)
        scores = cosine_sim(q, children)
        beam = Top_b(children, scores, b)
        V_visit = V_visit ∪ beam
    C = {v in V_visit | cosine_sim(q, v) >= delta}
    return Top_k(C, k)
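
A runnable Python rendering of the pseudocode above is sketched here. The `Node` class, the hand-built toy tree, and the parameter values are assumptions made for illustration; only the control flow follows the pseudocode.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node for this sketch.
    text: str
    embedding: list
    children: list = field(default_factory=list)

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sproutrag_retrieve(root, q, b, delta, k):
    """Beam search from the root; keeps visited nodes scoring at least delta."""
    beam, visited = [root], []
    while beam:
        children = [c for v in beam for c in v.children]
        children.sort(key=lambda c: cosine_sim(q, c.embedding), reverse=True)
        beam = children[:b]          # keep the b best children
        visited.extend(beam)         # collect candidates at every level
    candidates = [v for v in visited if cosine_sim(q, v.embedding) >= delta]
    candidates.sort(key=lambda v: cosine_sim(q, v.embedding), reverse=True)
    return candidates[:k]

# Toy tree: two leaves under each of two internal nodes.
a, b_leaf = Node("a", [1.0, 0.0]), Node("b", [0.8, 0.2])
c, d = Node("c", [0.0, 1.0]), Node("d", [0.2, 0.8])
left = Node("left", [0.9, 0.1], [a, b_leaf])
right = Node("right", [0.1, 0.9], [c, d])
root = Node("root", [0.5, 0.5], [left, right])

results = sproutrag_retrieve(root, q=[1.0, 0.0], b=1, delta=0.5, k=2)
print([n.text for n in results])
```

With a beam width of 1 the search descends through `left` to leaf `a`, and both the internal node and the leaf are returned, which illustrates how candidates at different granularities end up in the same result set.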

Key properties:

Multi-granularity: Collects both broad (high-level) and specific (low-level) candidates
No LLM calls: All decisions use cosine similarity between query and node embeddings
Beam width: b = 5 performed best in the ablation studies

After beam search, candidates are reranked by similarity (using Qwen3-Reranker-4B) and the top-k are passed to the generator.

4. Joint Training Objective

SproutRAG is trained end-to-end with a joint objective that simultaneously optimizes retrieval quality and tree structure:

L = L_ret + λ · L_attn

4.1 Retrieval Contrastive Loss

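This is a standard contrastive (InfoNCE-style) loss over query-passage pairs. For a batch of N query-positive pairs, the other passages in the batch act as negatives, sim is the similarity between embeddings, and τ is a temperature: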

L_ret = -log( exp(sim(q, p⁺) / τ) / Σ_i exp(sim(q, p_i) / τ) )

Training data: 30K examples from MS MARCO v2.1 train split.
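
The contrastive loss can be sketched directly from the formula. The temperature value `tau=0.05` is an assumption (the text does not give τ), and the function names are illustrative.

```python
import math

def info_nce(sim_row, positive_index, tau=0.05):
    """Loss for one query, where sim_row[i] = sim(q, p_i) over in-batch passages."""
    logits = [s / tau for s in sim_row]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[positive_index]

def batch_loss(sim_matrix, tau=0.05):
    """Mean loss over N queries; the positive for query i sits in column i."""
    return sum(info_nce(row, i, tau) for i, row in enumerate(sim_matrix)) / len(sim_matrix)

good = batch_loss([[0.9, 0.1], [0.2, 0.8]])   # positives score highest
bad = batch_loss([[0.1, 0.9], [0.8, 0.2]])    # negatives score highest
print(good, bad)
```

The loss is small when each query is most similar to its own positive and grows when an in-batch negative outranks it.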

4.2 Attention Structure Regularizer

Given a set G of ground-truth merge pairs, the regularizer rewards high symmetrized aggregated attention on those pairs:

L_attn = -(1/|G|) · Σ_{(i,j) ∈ G} log( (M_ij + M_ji) / 2 )

This pulls the learned weights toward heads/layers that capture document structure and away from the uniform baseline.
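
A minimal sketch of the regularizer follows. It assumes the formula is applied to the aggregated attention matrix from Section 2.2, and it adds a small `eps` inside the logarithm for numerical safety, which is not part of the stated formula.

```python
import math

def attention_regularizer(M, merge_pairs, eps=1e-12):
    """Negative mean log of the symmetrized attention on ground-truth pairs."""
    total = 0.0
    for i, j in merge_pairs:
        sym = (M[i][j] + M[j][i]) / 2
        total += math.log(sym + eps)  # eps guards against log(0); an assumption
    return -total / len(merge_pairs)

# Toy aggregated attention over three chunks.
M = [
    [0.0, 0.8, 0.1],
    [0.6, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
loss_aligned = attention_regularizer(M, [(0, 1)])     # high attention on the pair
loss_misaligned = attention_regularizer(M, [(0, 2)])  # low attention on the pair
print(loss_aligned, loss_misaligned)
```

The penalty is lower when the pairs that should merge already receive high mutual attention, so minimizing it shifts weight toward the heads and layers that produce that pattern.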

4.3 Training Details

SLLM: 1.3B SentenceVAE, AdamW, lr=2e-5
Aggregation weights: AdamW, lr=1e-3
Batch: 32, Epochs: 3, λ: 0.1
Hardware: 8 × A100 80GB
Base encoder: sentence-transformers/all-MiniLM-L6-v2

5. Benchmark Performance

5.1 Retrieval Results

Table 1 reports retrieval performance in IE = Recall × Precision (averaged @1, 3, 5) across four datasets.

Dataset	Dense X	Meta-Chunk	RAPTOR	LightRAG	GraphRAG	PropRAG	SproutRAG
SCI-DOCS	31.2	32.8	34.0	32.5	30.8	33.6	38.7
LegalBench-RAG	33.5	35.2	35.8	34.0	32.5	36.0	40.9
Dragonball	37.5	38.2	40.0	36.8	35.0	39.5	48.1
MS MARCO	29.0	28.5	27.2	26.8	25.0	27.5	35.8

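SproutRAG leads the strongest baseline on every dataset: +4.65 on SCI-DOCS (vs. RAPTOR), +4.90 on LegalBench-RAG (vs. PropRAG), +8.06 on Dragonball (vs. RAPTOR), and +6.83 on MS MARCO (vs. Dense X), for an average gain of about 6.1 points.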
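
For readers unfamiliar with the IE metric, the sketch below shows one plausible way to compute it for a single query. It assumes IE at cutoff k is precision@k times recall@k and that the reported figure is the mean over k = 1, 3, 5; the exact evaluation protocol may differ, and the document IDs are made up.

```python
def ie_at_k(retrieved, relevant, k):
    """Precision@k multiplied by recall@k for one query."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant)
    return precision * recall

def mean_ie(retrieved, relevant, ks=(1, 3, 5)):
    """Average of IE over the cutoffs in `ks`."""
    return sum(ie_at_k(retrieved, relevant, k) for k in ks) / len(ks)

retrieved = ["d1", "d7", "d2", "d9", "d4"]  # ranked retrieval output (toy data)
relevant = {"d1", "d2"}                     # ground-truth relevant chunks (toy data)
score = mean_ie(retrieved, relevant)
print(round(score, 4))
```

Because the two factors are multiplied, a system only scores well when it is both precise and complete at the same cutoff.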

5.2 End-to-End Generation Results

Method	F1	ROUGE-L	METEOR	BERTScore	Tok/Q (K)	Latency (ms)
GraphRAG	32.8	28.5	24.6	84.3	12.80	520
ReflectiveRAG	34.5	30.1	26.0	85.2	9.40	380
REFRAG	35.3	30.8	26.5	85.7	7.20	310
SproutRAG	37.8	33.2	29.4	87.1	4.38	193

SproutRAG outperforms all baselines on every generation metric while using about 2.9× fewer tokens per query than GraphRAG (4.38K vs. 12.80K) and about 1.6× fewer than REFRAG (4.38K vs. 7.20K), with the lowest latency at 193 ms.

6. Getting Started

Installation

pip install -e .
pip install -e ".[yaml]"    # YAML config support
pip install -e ".[eval]"    # Evaluation scripts
pip install -e ".[spacy]"   # spaCy sentence splitting

CLI Workflow

# 1. Index documents
sproutrag index --config configs/index.yaml

# 2. Retrieve passages
sproutrag retrieve --config configs/retrieve.yaml

# 3. Generate answers
sproutrag answer --config configs/answer.yaml

# 4. (Optional) Train custom encoder
sproutrag train --config configs/train.yaml

# 5. (Optional) Evaluate
sproutrag evaluate --config configs/evaluate_retrieval.yaml

7. Conclusion & Future Work

SproutRAG demonstrates that learned attention-guided hierarchical indexing can match or exceed the retrieval and generation quality of LLM-based RAG systems while using a fraction of the compute. The combination of a 1.3B SLLM encoder, learned attention aggregation that counteracts proximity bias, and hierarchical beam search produces a system that is both more accurate (+6.1 IE points on average over the strongest baseline) and more efficient (about 2.9× fewer tokens per query than GraphRAG) than the alternatives it was compared against.

Limitations

Binary tree constraint: Multi-branch trees could capture more complex structures.
Upfront training: Requires end-to-end training on retrieval data (30K examples).
Fixed offline tree: Query-dependent traversal could dynamically adapt the structure.

Paper: "SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG" — Abaskohi, Laradji, West, Carenini (UBC & ServiceNow Research). Code: github.com/AmirAbaskohi/SproutRAG
