
RAG Chunking: Where Most Quality Issues Come From

Problem

I built a RAG pipeline to query technical documentation. The documents contained tables, figures, and cross-references like “See Table 3 above.” I used standard fixed-size chunking with 512 tokens and 50-token overlap.
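For reference, fixed-size chunking with overlap can be sketched roughly as follows. This is a minimal illustration, not the code from my pipeline: the `chunk_fixed` name is made up, and the token list is a placeholder for real tokenizer output.

```python
def chunk_fixed(tokens, size=512, overlap=50):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    stride = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Placeholder tokens stand in for real tokenizer output (an assumption).
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_fixed(tokens)
print([len(c) for c in chunks])
```

Note that nothing in this loop looks at headings, tables, or sentences. The boundaries fall wherever the token count says they fall.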

When I asked “What does Table 3 show?”, the pipeline returned chunks with partial table data but no header row. When I asked about “the configuration mentioned in Section 2.1”, I got the chunk containing the reference but not the actual configuration.

I spent weeks tuning chunk sizes. 256 tokens gave more precise matches but lost context. 1024 tokens preserved context but diluted semantic meaning. Neither worked consistently.

Then I found a Reddit discussion that confirmed my experience: “If you’ve debugged RAG pipelines you know chunking is where most quality issues come from.”

Why Chunking Destroys Quality

The fundamental problem is this: chunking flattens structured documents into text blobs, destroying the relationships that carry semantic meaning.

Document Structure vs Chunked Blob
BEFORE CHUNKING (Structure):
  Section 1
    Paragraph A (context for Table 1)
    Table 1 (data rows with header)
    Figure 1 (visualizes Table 1)
  Section 2
    "As shown in Table 1 above..."

AFTER CHUNKING (Destroyed):
  Chunk 1: [Paragraph A (partial), Table 1 header row]
  Chunk 2: [Table 1 data rows (partial), Figure 1 caption]
  Chunk 3: [Figure 1 description, Section 2 start]
  Chunk 4: ["As shown in Table 1..." - but Table 1 is split across Chunks 1 and 2]

Query: "What does Figure 1 show?"
Retrieved: Chunk 2 (caption and partial data rows, no table header)
Result: Cannot answer because the rest of Table 1 and the Figure 1 description are in other chunks

This isn’t a tuning problem. It’s a structural problem.

Six Limitations That Cause Quality Issues

1. Context Boundary Loss

When documents split at arbitrary boundaries, related content ends up in different chunks. Paragraphs split mid-sentence. Code blocks break across chunks. Key context disappears when retrieval only returns one chunk.

I tried adding chunk overlap to recover context. It helped slightly, but it inflates storage (a 50% overlap roughly doubles it) and still didn’t fix structural breaks.
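How much overlap costs depends on the ratio of overlap to chunk size. As a rough estimate, assuming long documents and ignoring the shorter final chunk, each window advances by `size - overlap` tokens but stores `size` tokens. The `storage_multiplier` helper below is a name I made up for illustration.

```python
def storage_multiplier(size, overlap):
    """Approximate ratio of stored tokens to original tokens for long documents."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    return size / (size - overlap)


print(round(storage_multiplier(512, 50), 2))   # 1.11
print(round(storage_multiplier(512, 256), 2))  # 2.0
```

A 50-token overlap on 512-token chunks adds about 11% to storage, while a 50% overlap doubles it. Neither setting changes where the boundaries fall relative to tables or headings.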

2. Semantic Drift

Fixed-size chunking ignores semantic boundaries. A 512-token chunk might contain the end of one topic, the middle of another, and the beginning of a third.

Semantic Drift Example
Document:
## Authentication Setup
[2 paragraphs about auth configuration]
## API Endpoints
[3 paragraphs about API design]
Chunk at 512 tokens:
[Last paragraph of Authentication]
[All 3 paragraphs of API Endpoints]
[First paragraph of next section]
Query: "How do I configure authentication?"
Vector similarity matches this chunk because it contains "authentication"
But the chunk is mostly about API endpoints
Result: Wrong answer about authentication

The embedding of a chunk that mixes several topics represents a blend of all of them, which makes retrieval unpredictable.

3. Table and Figure Fragmentation

Tables are particularly vulnerable. A table might split so the header row ends up in one chunk and data rows in another. Without the header, the data rows are meaningless.

Table Fragmentation Problem
Original Table:
| Parameter | Value | Description |
|-----------|-------|-------------|
| timeout | 30 | Connection timeout in seconds |
| retries | 3 | Number of retry attempts |
After Chunking:
Chunk A: "| Parameter | Value | Description |"
Chunk B: "| timeout | 30 | Connection timeout..."
Query: "What is the timeout setting?"
Retrieved: Chunk B (data row without header)
Result: Cannot determine which column contains the setting

Figure captions get separated from figures. Code blocks truncate and lose indentation.
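One partial mitigation is to treat tables separately and repeat the header rows in every table chunk. The sketch below assumes pipe-style tables whose first two lines are the header and the separator. `chunk_markdown_table` is a hypothetical helper, not part of any library.

```python
def chunk_markdown_table(table_text, rows_per_chunk=1):
    """Split a pipe-style table into chunks that each repeat the header rows."""
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    header, rows = lines[:2], lines[2:]  # header row + separator row
    chunks = []
    for i in range(0, len(rows), rows_per_chunk):
        chunks.append("\n".join(header + rows[i:i + rows_per_chunk]))
    return chunks


table = """
| Parameter | Value | Description |
|-----------|-------|-------------|
| timeout | 30 | Connection timeout in seconds |
| retries | 3 | Number of retry attempts |
"""
table_chunks = chunk_markdown_table(table)
print(table_chunks[0])
```

Every chunk now carries the column names, so a retrieved data row can be interpreted on its own. This only works if the table is detected before the text is split, which fixed-size chunking does not do.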

4. Broken Cross-Section References

Documents contain internal references: “As shown in Table 3,” “Refer to Section 2.1,” “See Figure 5 on page 12.”

Chunking breaks these relationships. The reference and the target end up in different chunks. Retrieval returns the reference without the target. The LLM cannot resolve the reference.
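To make the problem concrete, here is a minimal sketch of what resolving references would require: detect labels such as "Table 3" in a retrieved chunk and append the referenced element. It assumes you already have an index from label to content (the `targets` dict, with made-up sample text), and that is exactly the information flat chunking throws away.

```python
import re

# Matches labels like "Table 3", "Figure 5", or "Section 2.1".
REF_PATTERN = re.compile(r"\b(Table|Figure|Section)\s+(\d+(?:\.\d+)*)")


def find_references(text):
    """Return labels such as 'Table 3' or 'Section 2.1' mentioned in text."""
    return [f"{kind} {num}" for kind, num in REF_PATTERN.findall(text)]


def expand_with_targets(chunk_text, targets):
    """Append the text of every referenced element that exists in `targets`."""
    parts = [chunk_text]
    for label in find_references(chunk_text):
        if label in targets and targets[label] not in parts:
            parts.append(targets[label])
    return "\n\n".join(parts)


# Hypothetical label index with illustrative content.
targets = {
    "Table 3": "Table 3: | Parameter | Value |\n| timeout | 30 |",
    "Section 2.1": "Section 2.1: Set retries to 3 in the client config.",
}
chunk = "Use the configuration mentioned in Section 2.1. See Table 3 above."
expanded = expand_with_targets(chunk, targets)
print(expanded)
```

The regex only covers the simplest reference forms. Phrases like "the table above" or "the previous section" would need positional information as well.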

5. Chunk Size Tuning Trap

No optimal chunk size exists. Different documents, queries, and use cases require different sizes.

I experimented with three sizes:

Chunk Size Trade-offs
Size 256:
+ More precise matches
- More context loss
- More chunks to manage
Size 512:
+ Moderate context
- Still breaks structure
- Common default but not optimal
Size 1024:
+ More context preserved
- Semantic meaning diluted
- Larger chunks contain multiple topics

Overlap helps, but it inflates storage (a 50% overlap roughly doubles it) and doesn’t fix structural breaks. I spent weeks tuning. There’s no right answer.

6. Metadata Loss

Traditional chunking discards section headers, page numbers, formatting, and document hierarchy. Some specialized chunkers attempt to preserve metadata, but this requires extra configuration and isn’t standard practice.
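As an illustration of what preserving metadata could look like, the sketch below tags each paragraph with the headings that enclose it. It assumes Markdown-style `#` headings, and the `Chunk` class and `chunk_with_headings` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    heading_path: tuple  # e.g. ("Guide", "Authentication Setup")


def chunk_with_headings(markdown_text):
    """Emit one chunk per paragraph, tagged with its enclosing headings."""
    path, chunks, buffer = [], [], []

    def flush():
        # Close the current paragraph, if any, under the current heading path.
        if buffer:
            chunks.append(Chunk(" ".join(buffer), tuple(path)))
            buffer.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            # Keep ancestors above this level, replace everything below.
            path = path[:level - 1] + [stripped.lstrip("#").strip()]
        elif stripped:
            buffer.append(stripped)
        else:
            flush()
    flush()
    return chunks


doc = """# Guide

## Authentication Setup

Set the API key in the config file.

## API Endpoints

Each endpoint accepts JSON.
"""
for c in chunk_with_headings(doc):
    print(c.heading_path, "->", c.text)
```

With the heading path stored next to the text, a retriever can filter or boost by section, and the LLM can be told where a passage came from.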

The Workarounds (And Their Costs)

Developers have created patches for chunking problems. Each workaround adds complexity.

Context Enrichment

Retrieve a chunk, then fetch neighboring chunks to recover lost context:

Context Enrichment Workaround
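def retrieve_with_context_overlap(vectorstore, retriever, query, num_neighbors=1):
    """
    Retrieve chunks, then fetch the neighbors before and after each one.
    This PATCHES the context loss problem.

    Assumes each chunk stores its position in metadata['index'], and that
    get_chunk_by_index() and concatenate_with_overlap() are helpers you
    write for your own vector store (they are not library functions).
    """
    relevant_chunks = retriever.get_relevant_documents(query)
    results = []
    for chunk in relevant_chunks:
        current_index = chunk.metadata.get('index')
        if current_index is None:
            # No position recorded: fall back to the chunk on its own
            results.append(chunk.page_content)
            continue
        start_index = max(0, current_index - num_neighbors)
        end_index = current_index + num_neighbors + 1
        # Fetch neighbors to recover what chunking broke
        neighbor_chunks = []
        for i in range(start_index, end_index):
            neighbor_chunk = get_chunk_by_index(vectorstore, i)
            if neighbor_chunk:
                neighbor_chunks.append(neighbor_chunk)
        # Concatenate to simulate unchunked context
        results.append(concatenate_with_overlap(neighbor_chunks))
    return results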

This works for context loss but doesn’t fix table fragmentation or broken references.

Hierarchical Chunking

Instead of flat chunks, maintain parent-child relationships:

Hierarchical vs Flat Chunking
FLAT CHUNKING:
  Chunk 1, Chunk 2, Chunk 3, Chunk 4... (no relationships)

HIERARCHICAL CHUNKING:
  Document
    Section 1 (parent)
      Paragraph A (child)
      Table 1 (child)
      Figure 1 (child)
    Section 2 (parent)
      Paragraph B (child)

This preserves structure better but still requires size tuning and doesn’t fully solve reference problems.
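A minimal sketch of the parent-child idea: match on a small child node, then hand the LLM everything under the same parent section. The `Node` class and `section_context` function are illustrative names, and a real system would also cap how much sibling text it returns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    title: str
    text: str = ""
    children: list = field(default_factory=list)
    # Excluded from repr/compare so parent<->child cycles cannot recurse.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def section_context(node):
    """Return the text of every sibling under the node's parent section."""
    if node.parent is None:
        return node.text
    return "\n".join(f"{c.title}: {c.text}" for c in node.parent.children)


document = Node("Document")
section1 = document.add(Node("Section 1"))
section1.add(Node("Paragraph A", "Context for Table 1"))
section1.add(Node("Table 1", "| Parameter | Value |"))
figure1 = section1.add(Node("Figure 1", "Visualizes Table 1"))

# A match on Figure 1 now brings Paragraph A and Table 1 along with it.
context = section_context(figure1)
print(context)
```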

Semantic Chunking

Split at semantic boundaries instead of fixed sizes. This helps with semantic drift but still fragments tables and breaks references.
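The idea can be sketched as follows: start a new chunk whenever two adjacent sentences stop being similar. Real implementations typically compare sentence embeddings. This toy version substitutes word overlap (Jaccard similarity) so it runs without a model, and the threshold value is an arbitrary assumption.

```python
import re


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a crude stand-in for embedding cosine."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def semantic_chunks(sentences, threshold=0.1):
    """Start a new chunk whenever adjacent sentences fall below `threshold`."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(prev, cur) < threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)
    return [" ".join(c) for c in chunks]


sentences = [
    "Authentication uses an API key.",
    "The API key is sent in a header.",
    "Endpoints return paginated results.",
    "Paginated results include a cursor.",
]
for chunk_text in semantic_chunks(sentences):
    print(chunk_text)
```

The split lands between the authentication sentences and the endpoint sentences. A table row, however, rarely resembles its header in wording, which is one reason this approach still fragments tables.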

Chunking Approaches Comparison
Approach      | Context | Tables | References | Size tuning needed
──────────────|─────────|────────|────────────|───────────────────
Fixed-size    | Broken  | Poor   | Broken     | Yes
Semantic      | Partial | Poor   | Partial    | Less
Hierarchical  | OK      | Better | Partial    | Still required
Chunkless RAG | OK      | OK     | OK         | None

When Chunking Still Works

Chunking works fine for simple cases: flat text documents without tables, figures, or cross-references, such as Q&A over FAQ documents or simple article summarization.

For technical documentation with tables, figures, and internal references, chunking creates problems that workarounds only partially solve.

The Alternative Perspective

The Reddit discussion pointed toward a different approach: “stop flattening documents into chunks, use the structure for retrieval instead.”

Chunkless RAG proposes preserving document structure and using it directly for retrieval. Instead of destroying structure with chunking, maintain the tree and navigate it semantically.
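The discussion did not come with a reference implementation, so the following is only my reading of "navigate the tree": start at the root and, at each level, follow the child whose subtree best matches the query. The nested-dict tree, the sample text, and the word-overlap scoring are all assumptions made for this sketch. A real system would presumably score with embeddings or an LLM.

```python
import re


def words(text):
    """Lowercased set of word tokens in text."""
    return set(re.findall(r"\w+", text.lower()))


def subtree_words(node):
    """All words in a node's title, text, and descendants."""
    found = words(node["title"] + " " + node.get("text", ""))
    for child in node.get("children", []):
        found |= subtree_words(child)
    return found


def navigate(node, query):
    """Descend from the root, following the child that best overlaps the query."""
    q = words(query)
    path = [node["title"]]
    while node.get("children"):
        node = max(node["children"], key=lambda c: len(q & subtree_words(c)))
        path.append(node["title"])
    return path, node


tree = {
    "title": "Document",
    "children": [
        {"title": "Section 1", "children": [
            {"title": "Paragraph A", "text": "Context for the timeout table."},
            {"title": "Table 1", "text": "| timeout | 30 | Connection timeout in seconds |"},
        ]},
        {"title": "Section 2", "children": [
            {"title": "Paragraph B", "text": "Retry behaviour is described here."},
        ]},
    ],
}
path, leaf = navigate(tree, "What is the connection timeout in seconds?")
print(" > ".join(path))
```

Because the result is a path through the document rather than an isolated blob, the parent section and sibling nodes stay available as context for the answer.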

This doesn’t mean chunking is always wrong. It means chunking is the wrong default for structured documents. The quality issues I experienced came from applying a flattening approach to content that relied on structure for meaning.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to contact me, you can reach me by email.
