RAG Chunking: Where Most Quality Issues Come From
Problem
I built a RAG pipeline to query technical documentation. The documents contained tables, figures, and cross-references like “See Table 3 above.” I used standard fixed-size chunking with 512 tokens and 50-token overlap.
When I asked “What does Table 3 show?”, the pipeline returned chunks with partial table data but no header row. When I asked about “the configuration mentioned in Section 2.1”, I got the reference but not the actual configuration.
I spent weeks tuning chunk sizes. 256 tokens gave more precise matches but lost context. 1024 tokens preserved context but diluted semantic meaning. Neither worked consistently.
Then I found a Reddit discussion that confirmed my experience: “If you’ve debugged RAG pipelines you know chunking is where most quality issues come from.”
Why Chunking Destroys Quality
The fundamental problem is this: chunking flattens structured documents into text blobs, destroying the relationships that carry semantic meaning.
BEFORE CHUNKING (Structure):
    Section 1
        Paragraph A (context for Table 1)
        Table 1 (data rows with header)
        Figure 1 (visualizes Table 1)
    Section 2
        "As shown in Table 1 above..."

AFTER CHUNKING (Destroyed):
    Chunk 1: [Paragraph A (partial), Table 1 header row]
    Chunk 2: [Table 1 data rows (partial), Figure 1 caption]
    Chunk 3: [Figure 1 description, Section 2 start]
    Chunk 4: ["As shown in Table 1..." - but Table 1 is in Chunk 1]

    Query: "What does Figure 1 show?"
    Retrieved: Chunk 2 (caption only, no Table 1 context)
    Result: Cannot answer because Table 1 is in a different chunk

This isn’t a tuning problem. It’s a structural problem.
Six Limitations That Cause Quality Issues
1. Context Boundary Loss
When documents split at arbitrary boundaries, related content ends up in different chunks. Paragraphs split mid-sentence. Code blocks break across chunks. Key context disappears when retrieval only returns one chunk.
I tried adding chunk overlap to recover context. It helped slightly, but it inflated storage costs (a 50% overlap roughly doubles the stored tokens) and still didn’t fix structural breaks.
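The storage cost of overlap can be estimated with a small sketch. It assumes the document is already a list of tokens; `fixed_size_chunks` and `storage_factor` are illustrative names, not part of any library.

```python
def fixed_size_chunks(tokens, size=512, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def storage_factor(tokens, size, overlap):
    """Stored tokens divided by original tokens (1.0 means no duplication)."""
    chunks = fixed_size_chunks(tokens, size, overlap)
    return sum(len(c) for c in chunks) / len(tokens)


tokens = list(range(10_000))  # stand-in for a tokenized document
small = storage_factor(tokens, size=512, overlap=50)   # about 1.1x
half = storage_factor(tokens, size=512, overlap=256)   # about 2x
```

The factor is approximately size / (size - overlap), so a 50-token overlap on 512-token chunks adds about 10%, while a half-chunk overlap roughly doubles what gets stored. Neither setting changes where the boundaries fall relative to tables or sections.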
2. Semantic Drift
Fixed-size chunking ignores semantic boundaries. A 512-token chunk might contain the end of one topic, the middle of another, and the beginning of a third.
Document:
    ## Authentication Setup
    [2 paragraphs about auth configuration]

    ## API Endpoints
    [3 paragraphs about API design]

Chunk at 512 tokens:
    [Last paragraph of Authentication]
    [All 3 paragraphs of API Endpoints]
    [First paragraph of next section]

    Query: "How do I configure authentication?"
    Vector similarity matches this chunk because it contains "authentication"
    But the chunk is mostly about API endpoints
    Result: Wrong answer about authentication

Vector embeddings capture mixed semantics, making retrieval unpredictable.
3. Table and Figure Fragmentation
Tables are particularly vulnerable. A table might split so the header row ends up in one chunk and data rows in another. Without the header, the data rows are meaningless.
Original Table:
    | Parameter | Value | Description |
    |-----------|-------|-------------|
    | timeout   | 30    | Connection timeout in seconds |
    | retries   | 3     | Number of retry attempts |

After Chunking:
    Chunk A: "| Parameter | Value | Description |"
    Chunk B: "| timeout | 30 | Connection timeout..."

    Query: "What is the timeout setting?"
    Retrieved: Chunk B (data row without header)
    Result: Cannot determine which column contains the setting

Figure captions get separated from figures. Code blocks truncate and lose indentation.
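One common mitigation for the table case is to treat the header as part of every table chunk. The sketch below is an illustration under assumptions, not something my pipeline did: it assumes the table has already been isolated as Markdown lines, and `split_table_keep_header` is a hypothetical helper name.

```python
def split_table_keep_header(table_lines, max_rows=1):
    """Split a Markdown table into chunks that each repeat the header.

    Assumes the first two lines are the header row and the separator row.
    """
    header, rows = table_lines[:2], table_lines[2:]
    chunks = []
    for start in range(0, len(rows), max_rows):
        chunks.append("\n".join(header + rows[start:start + max_rows]))
    return chunks


table = [
    "| Parameter | Value | Description |",
    "|-----------|-------|-------------|",
    "| timeout | 30 | Connection timeout in seconds |",
    "| retries | 3 | Number of retry attempts |",
]
chunks = split_table_keep_header(table, max_rows=1)
```

Each resulting chunk can be read on its own because the column names travel with the data rows. The cost is duplicated header text, and it only works if the splitter knows where the table starts and ends, which fixed-size chunking does not.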
4. Broken Cross-Section References
Documents contain internal references: “As shown in Table 3,” “Refer to Section 2.1,” “See Figure 5 on page 12.”
Chunking breaks these relationships. The reference and the target end up in different chunks. Retrieval returns the reference without the target. The LLM cannot resolve the reference.
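To see how often this happens in a corpus, references can be detected with a regular expression and compared against what each chunk actually contains. This is a rough sketch: the pattern only covers the "Table N", "Figure N", and "Section N.M" forms quoted above, and `chunk_defines` is hypothetical metadata listing the labels whose targets live in the chunk.

```python
import re

# Matches labels such as "Table 3", "Section 2.1", or "Figure 5".
REF_PATTERN = re.compile(r"\b(Table|Figure|Section)\s+(\d+(?:\.\d+)*)")


def find_references(text):
    """Return the set of (kind, number) labels mentioned in `text`."""
    return {(m.group(1), m.group(2)) for m in REF_PATTERN.finditer(text)}


def unresolved_references(chunk_text, chunk_defines):
    """Labels the chunk mentions but whose targets it does not contain."""
    return find_references(chunk_text) - set(chunk_defines)


chunk = "As shown in Table 3, the defaults differ. Refer to Section 2.1."
missing = unresolved_references(chunk, chunk_defines={("Section", "2.1")})
```

Here `missing` contains only the Table 3 label, which is exactly the target the LLM would not see if this chunk were retrieved alone.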
5. Chunk Size Tuning Trap
No optimal chunk size exists. Different documents, queries, and use cases require different sizes.
I experimented with three sizes:
Size 256:
    + More precise matches
    - More context loss
    - More chunks to manage

Size 512:
    + Moderate context
    - Still breaks structure
    - Common default but not optimal

Size 1024:
    + More context preserved
    - Semantic meaning diluted
    - Larger chunks contain multiple topics

Overlap helps but doubles storage and doesn’t fix structural breaks. I spent weeks tuning. There’s no right answer.
6. Metadata Loss
Traditional chunking discards section headers, page numbers, formatting, and document hierarchy. Some specialized chunkers attempt to preserve metadata, but this requires extra configuration and isn’t standard practice.
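As a minimal sketch of what preserving metadata could look like, the splitter below tags each paragraph with the headings above it. It assumes Markdown input where headings and paragraphs are separated by blank lines; `Chunk` and `chunk_with_headings` are illustrative names rather than any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    section_path: list = field(default_factory=list)  # headings above the text


def chunk_with_headings(markdown_text):
    """Emit one chunk per paragraph, tagged with its heading hierarchy."""
    path, chunks = [], []
    for block in markdown_text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            level = len(block) - len(block.lstrip("#"))
            title = block.lstrip("#").strip()
            path = path[:level - 1] + [title]  # replace deeper levels
            continue
        chunks.append(Chunk(text=block, section_path=list(path)))
    return chunks


doc = (
    "# Guide\n\n## Authentication Setup\n\nSet the API key.\n\n"
    "## API Endpoints\n\nList of routes."
)
chunks = chunk_with_headings(doc)
```

With the section path stored next to the text, a retriever can filter or re-rank by section, and the prompt can tell the LLM where a passage came from.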
The Workarounds (And Their Costs)
Developers have created patches for chunking problems. Each workaround adds complexity.
Context Enrichment
Retrieve a chunk, then fetch neighboring chunks to recover lost context:
    # get_chunk_by_index and concatenate_with_overlap are helpers
    # assumed to be defined elsewhere.
    def retrieve_with_context_overlap(vectorstore, retriever, query, num_neighbors=1):
        """
        Retrieve chunks, then fetch the neighbors before and after each one.
        This PATCHES the context loss problem.
        """
        relevant_chunks = retriever.get_relevant_documents(query)
        result_sequences = []

        for chunk in relevant_chunks:
            current_index = chunk.metadata.get('index')
            if current_index is None:
                continue  # chunk was stored without a position; skip it
            start_index = max(0, current_index - num_neighbors)
            end_index = current_index + num_neighbors + 1

            # Fetch neighbors to recover what chunking broke
            neighbor_chunks = []
            for i in range(start_index, end_index):
                neighbor_chunk = get_chunk_by_index(vectorstore, i)
                if neighbor_chunk:
                    neighbor_chunks.append(neighbor_chunk)

            # Concatenate to simulate unchunked context
            concatenated_text = concatenate_with_overlap(neighbor_chunks)
            result_sequences.append(concatenated_text)

        return result_sequences

This works for context loss but doesn’t fix table fragmentation or broken references.
Hierarchical Chunking
Instead of flat chunks, maintain parent-child relationships:
FLAT CHUNKING:
    Chunk 1, Chunk 2, Chunk 3, Chunk 4... (no relationships)

HIERARCHICAL CHUNKING:
    Document
        Section 1 (parent)
            Paragraph A (child)
            Table 1 (child)
            Figure 1 (child)
        Section 2 (parent)
            Paragraph B (child)

This preserves structure better but still requires size tuning and doesn’t fully solve reference problems.
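The parent-child idea can be sketched with a small tree type. This is an assumption-laden illustration: `Node` and `section_context` are hypothetical names, and the example texts are placeholders that mirror the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parents
class Node:
    title: str
    text: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def section_context(node):
    """Return the text of every sibling in the matched node's section."""
    section = node.parent if node.parent is not None else node
    parts = [child.text for child in section.children] or [section.text]
    return "\n".join(parts)


section1 = Node("Section 1")
section1.add(Node("Paragraph A", "Context for Table 1."))
table = section1.add(Node("Table 1", "| Parameter | Value |"))
section1.add(Node("Figure 1", "Figure 1 visualizes Table 1."))
context = section_context(table)
```

A vector search can still match the small child node, but what gets handed to the LLM is the whole parent section, so the table arrives together with its explanatory paragraph and figure description.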
Semantic Chunking
Split at semantic boundaries instead of fixed sizes. This helps with semantic drift but still fragments tables and breaks references.
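A cheap approximation is to use paragraph breaks as the boundaries and never cut inside one. The sketch below swaps in that simpler technique: it is not embedding-based semantic chunking, word counts stand in for token counts, and `pack_paragraphs` is an illustrative name.

```python
def pack_paragraphs(text, max_words=120):
    """Greedily pack whole paragraphs into chunks of at most `max_words`.

    A single paragraph longer than `max_words` becomes its own chunk
    instead of being cut in the middle.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))  # close the current chunk
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine"
packed = pack_paragraphs(sample, max_words=5)
```

Embedding-based variants instead compare adjacent sentences and split where similarity drops. Either way the splitter still sees a table as ordinary text, which is why table fragmentation remains.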
Approach           | Context | Tables | References | Tuning
───────────────────|─────────|────────|────────────|────────
Fixed-size         | Broken  | Poor   | Broken     | Yes
Semantic           | Partial | Poor   | Partial    | Less
Hierarchical       | OK      | Better | Partial    | Still
Chunkless RAG      | OK      | OK     | OK         | None

When Chunking Still Works
Chunking works fine for simple cases: flat text documents without tables, figures, or cross-references. Typical examples are Q&A over FAQ documents and simple article summarization.
For technical documentation with tables, figures, and internal references, chunking creates problems that workarounds only partially solve.
The Alternative Perspective
The Reddit discussion pointed toward a different approach: “stop flattening documents into chunks, use the structure for retrieval instead.”
Chunkless RAG proposes preserving document structure and using it directly for retrieval. Instead of destroying structure with chunking, maintain the tree and navigate it semantically.
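To make "navigate the tree" concrete, here is a toy sketch. It is not how any particular chunkless RAG tool works: the document tree is a hypothetical nested dict, and `score` uses plain word overlap as a stand-in for an embedding or LLM-based relevance judgment.

```python
def score(query, text):
    """Toy relevance: fraction of query words that appear in `text`."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def subtree_text(node):
    """Concatenate the titles and text of a node and all its descendants."""
    parts = [node["title"], node.get("text", "")]
    parts += [subtree_text(child) for child in node.get("children", [])]
    return " ".join(parts)


def navigate(node, query):
    """Descend from the root, following the best-scoring child each level."""
    path = [node["title"]]
    while node.get("children"):
        node = max(node["children"], key=lambda c: score(query, subtree_text(c)))
        path.append(node["title"])
    return node, path


doc = {"title": "Guide", "children": [
    {"title": "Authentication Setup", "children": [
        {"title": "Token configuration",
         "text": "Set the token timeout for authentication."}]},
    {"title": "API Endpoints", "children": [
        {"title": "Routes", "text": "Each endpoint returns JSON."}]},
]}
leaf, path = navigate(doc, "configure authentication timeout")
```

The returned `path` records which sections were followed, so the answer can cite its location in the document, and the enclosing section is always available as context because nothing was flattened.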
This doesn’t mean chunking is always wrong. It means chunking is the wrong default for structured documents. The quality issues I experienced came from applying a flattening approach to content that relied on structure for meaning.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Reddit: Docling Agent + Chunkless RAG Discussion
- 👨‍💻 RAG Techniques: Context Enrichment
- 👨‍💻 Haiku RAG Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!