What is Embedding Dimension? How I Learned to Stop Over-Provisioning My Vector Database

Mar 26, 2026

1. Purpose

I recently hit a storage wall with my vector database. After loading 2 million documents into Pinecone, my monthly bill looked like a phone number. When I investigated, I realized the culprit wasn’t my data volume—it was my embedding dimension choice.

This post explains what embedding dimension is, why the choice matters, and how I learned to match dimensions to actual use cases instead of blindly picking the largest available.

2. The Problem: My Vector Database Was Too Fat

I was building a semantic search system for a client’s documentation portal. Here’s what my initial setup looked like:

from openai import OpenAI
import chromadb

client = OpenAI()
db = chromadb.Client()

# I picked OpenAI's ada-002 because "bigger is better", right?
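def embed_texts(texts, batch_size=100):
    """Embed texts in batches; one request cannot carry 100k inputs."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        response = client.embeddings.create(
            model="text-embedding-ada-002",  # 1536 dimensions
            input=texts[start:start + batch_size]
        )
        embeddings.extend(item.embedding for item in response.data)
    return embeddings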

# Embedding 100k documents
documents = load_documents()  # 100,000 docs
embeddings = embed_texts(documents)

# Storage calculation I didn't do upfront:
# 100,000 docs * 1536 floats * 4 bytes/float = ~614 MB raw
# Plus indexing overhead = ~1.8 GB
print(f"Embedding dimension: {len(embeddings[0])}")  # 1536

I chose ada-002 (1536 dimensions) because it was the “standard” choice. But for simple semantic search over technical documentation, I was over-provisioning by a factor of 4 compared with a 384-dimensional model.

The client’s docs were mostly API references, configuration guides, and troubleshooting steps. They didn’t need subtle semantic nuance—they needed fast, accurate keyword-ish matching with some synonym awareness.

3. What Is Embedding Dimension, Really?

After my storage shock, I dove into understanding what dimension actually means.

Embedding dimension is simply the length of the vector that represents your data. If you have a 384-dimensional embedding, you have a list of 384 floating-point numbers. Each number captures some learned feature of your input.

Here’s how I visualized it:

Small dimension (384):
[0.12, -0.45, 0.78, ..., 0.33]  # 384 numbers
-> Less semantic detail captured
-> Lower storage (384 * 4 = 1,536 bytes per embedding)
-> Faster similarity search

Large dimension (1536):
[0.12, -0.45, 0.78, ..., -0.21]  # 1536 numbers
-> More semantic detail captured
-> Higher storage (1536 * 4 = 6,144 bytes per embedding)
-> Slower similarity search
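
The per-vector arithmetic above extends to a whole collection. Here is a minimal sketch, assuming float32 storage (4 bytes per value) and ignoring index overhead and metadata. The helper name `raw_storage_bytes` is made up for illustration:

```python
def raw_storage_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone (float32 by default), excluding index overhead."""
    return num_vectors * dimension * bytes_per_value


for dim in (384, 768, 1536):
    size_mb = raw_storage_bytes(100_000, dim) / 1_000_000  # decimal megabytes
    print(f"{dim} dims: {size_mb:.1f} MB for 100k vectors")
```

For 100,000 vectors this works out to 153.6 MB, 307.2 MB, and 614.4 MB of raw vector data. Whatever the database adds on top for indexing comes in addition to that.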

3.1 Common Models and Their Dimensions

I created a quick reference table:

Model	Dimension	When to Use
all-MiniLM-L6-v2	384	Simple search, edge devices
all-mpnet-base-v2	768	Balanced workloads
text-embedding-ada-002	1536	Complex RAG, nuanced content
Cohere embed-v3	1024	Enterprise search
Gemini Embedding	Varies	Google AI applications

4. Testing Different Dimensions

I ran a practical comparison to see the real impact:

from sentence_transformers import SentenceTransformer
import numpy as np
import time

# Test documents
docs = [
    "How to reset the admin password",
    "Changing administrator credentials",
    "Password recovery for admin users",
    "Configuring database connections",
    "Setting up MySQL connection pool"
]

# Model 1: Small dimension (384)
model_small = SentenceTransformer('all-MiniLM-L6-v2')
embeddings_small = model_small.encode(docs)

# Model 2: Large dimension (768)
model_large = SentenceTransformer('all-mpnet-base-v2')
embeddings_large = model_large.encode(docs)

print(f"Small model dimension: {embeddings_small.shape[1]}")  # 384
print(f"Large model dimension: {embeddings_large.shape[1]}")  # 768
print(f"Storage ratio: {embeddings_large.nbytes / embeddings_small.nbytes:.1f}x")

# Compare similarity results
from sklearn.metrics.pairwise import cosine_similarity

def find_similar(query, embeddings, model):
    query_emb = model.encode([query])
    similarities = cosine_similarity(query_emb, embeddings)[0]
    return sorted(zip(docs, similarities), key=lambda x: x[1], reverse=True)

query = "reset admin password"
print("\nSmall model results:")
for doc, score in find_similar(query, embeddings_small, model_small)[:3]:
    print(f"  {score:.3f}: {doc}")

print("\nLarge model results:")
for doc, score in find_similar(query, embeddings_large, model_large)[:3]:
    print(f"  {score:.3f}: {doc}")

Output:

Small model dimension: 384
Large model dimension: 768
Storage ratio: 2.0x

Small model results:
  0.892: How to reset the admin password
  0.745: Password recovery for admin users
  0.612: Changing administrator credentials

Large model results:
  0.912: How to reset the admin password
  0.768: Password recovery for admin users
  0.634: Changing administrator credentials

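Both models found the right answer, and the ranking was identical. The larger model’s similarity scores were about 0.02 higher, but cosine scores from different models are not directly comparable, so that gap is not an accuracy gain. For my use case, an identical ranking wasn’t worth 2x the storage.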
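
Since the ranking is what matters, one way to put a number on agreement between two models is top-k overlap. This is a minimal sketch using the top-3 results printed above. `top_k_overlap` is a hypothetical helper, not part of any library:

```python
def top_k_overlap(ranking_a: list[str], ranking_b: list[str], k: int) -> float:
    """Fraction of the top-k items that two rankings have in common."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / k


# Top-3 rankings copied from the output above (identical for both models)
small_ranking = [
    "How to reset the admin password",
    "Password recovery for admin users",
    "Changing administrator credentials",
]
large_ranking = list(small_ranking)

print(top_k_overlap(small_ranking, large_ranking, k=3))  # 1.0
```

Running the same check over a larger set of real queries gives a better signal than a single example. An overlap that stays close to 1.0 suggests the smaller model is not losing results that matter.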

5. Why Dimension Choice Matters

The trade-offs became clear when I mapped them out:

5.1 Larger Dimensions (1024-3072+)

Pros:

Capture subtle semantic distinctions
Better for long-form, complex content
Superior for cross-lingual retrieval

Cons:

Linear storage increase (1536 dims = 4x storage of 384)
Slower vector similarity calculations
More GPU memory during inference
Higher costs (vector storage scales with dimension, and hosted APIs bill per input token, often at a higher rate for their larger models)

5.2 Smaller Dimensions (384-512)

Pros:

Fast indexing and queries
Lower storage footprint
Better for edge/mobile deployment
Often “good enough” for simple tasks

Cons:

May miss subtle semantic nuances
Poorer performance on cross-lingual tasks
Less effective for very long documents
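
The speed side of this trade-off is easy to measure. This is a rough sketch, assuming brute-force cosine search over random float32 vectors with NumPy. Real vector databases use approximate indexes, so absolute numbers will differ. `brute_force_top_k` is an illustrative helper, not a library function:

```python
import time

import numpy as np


def brute_force_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k vectors most similar to the query (cosine similarity)."""
    # Normalize so that a dot product equals cosine similarity
    vectors_n = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = vectors_n @ query_n
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
for dim in (384, 1536):
    # Random stand-ins for real embeddings
    vectors = rng.standard_normal((10_000, dim), dtype=np.float32)
    query = rng.standard_normal(dim, dtype=np.float32)

    start = time.perf_counter()
    top = brute_force_top_k(query, vectors)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{dim} dims: top-{len(top)} found in {elapsed_ms:.1f} ms")
```

The 1536-dimensional run does four times as many multiply-adds per comparison, so it should generally take longer. The exact ratio depends on memory bandwidth and the BLAS build behind NumPy.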

6. Common Mistakes I Made

Mistake 1: Over-Dimensioning

I was using 1536 dimensions for simple semantic search. Here’s what I should have done:

def recommend_dimension(use_case: str) -> dict:
    """Match dimension to actual needs."""

    recommendations = {
        "semantic_search_simple": {
            "dimension": 384,
            "model": "all-MiniLM-L6-v2",
            "reason": "Fast queries, sufficient for simple search"
        },
        "rag_general": {
            "dimension": 1536,
            "model": "text-embedding-ada-002",
            "reason": "Captures nuanced context for Q&A systems"
        },
        "semantic_deduplication": {
            "dimension": 768,
            "model": "all-mpnet-base-v2",
            "reason": "Balances precision and performance"
        },
        "edge_deployment": {
            "dimension": 256,
            "model": "all-MiniLM-L6-v2 (with PCA)",
            "reason": "Optimized for constrained devices"
        }
    }

    return recommendations.get(use_case, {
        "dimension": 768,
        "model": "all-mpnet-base-v2",
        "reason": "Safe default for most applications"
    })

# For my docs portal:
print(recommend_dimension("semantic_search_simple"))
# {'dimension': 384, 'model': 'all-MiniLM-L6-v2', 'reason': 'Fast queries, sufficient for simple search'}

Mistake 2: Ignoring Dimensionality Reduction

I didn’t know I could reduce dimensions after the fact:

from sklearn.decomposition import PCA
import numpy as np

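# Start with 768-dim embeddings of the full corpus. PCA cannot produce more
# components than it has samples, so the five test docs are not enough here.
embeddings_768 = model_large.encode(documents)

# Reduce to 256 dimensions with PCA
pca = PCA(n_components=256)
embeddings_256 = pca.fit_transform(embeddings_768)

# Queries must be projected with the same fitted PCA before searching
query_256 = pca.transform(model_large.encode(["reset admin password"]))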

print(f"Original: {embeddings_768.shape}")
print(f"Reduced: {embeddings_256.shape}")
print(f"Explained variance: {sum(pca.explained_variance_ratio_):.2%}")
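
One detail that is easy to miss: PCA has to be fit on at least as many samples as components, and queries must go through the same fitted projection as the documents. Here is a minimal sketch with random stand-in vectors instead of real embeddings. The sizes and the `reduce_corpus` name are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_corpus(embeddings: np.ndarray, n_components: int) -> tuple[PCA, np.ndarray]:
    """Fit PCA on the corpus embeddings; return the fitted model and reduced vectors."""
    n_samples, n_features = embeddings.shape
    limit = min(n_samples, n_features)
    if n_components > limit:
        raise ValueError(
            f"n_components={n_components} exceeds min(n_samples, n_features)={limit}"
        )
    pca = PCA(n_components=n_components)
    return pca, pca.fit_transform(embeddings)


rng = np.random.default_rng(42)
corpus_768 = rng.standard_normal((1_000, 768))  # stand-in for real 768-dim embeddings
query_768 = rng.standard_normal((1, 768))       # stand-in for an encoded query

pca, corpus_256 = reduce_corpus(corpus_768, n_components=256)
query_256 = pca.transform(query_768)  # reuse the fitted PCA; never refit on the query

print(corpus_256.shape, query_256.shape)  # (1000, 256) (1, 256)
```

The fitted `pca` object needs to be saved alongside the index. If it is lost or refit, new queries no longer share a vector space with the stored documents.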

Mistake 3: Mixing Dimensions in the Same Collection

Vector databases typically require uniform dimensions. I tried mixing models and got errors:

import chromadb

client = chromadb.Client()
collection = client.create_collection(name="mixed_docs")

# This works fine
collection.add(
    documents=["doc1", "doc2"],
    embeddings=[[0.1] * 384, [0.2] * 384],  # 384 dimensions
    ids=["1", "2"]
)

# This throws an error - wrong dimension!
# collection.add(
#     documents=["doc3"],
#     embeddings=[[0.1] * 1536],  # 1536 dimensions - MISMATCH
#     ids=["3"]
# )
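
A cheap guard is to check vector lengths before they reach the database. This is a minimal sketch in plain Python. `check_dimensions` is a hypothetical helper, not a chromadb API:

```python
from typing import Sequence


def check_dimensions(embeddings: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any embedding does not have the expected length."""
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {index} has {len(vector)} dimensions, expected {expected_dim}"
            )


check_dimensions([[0.1] * 384, [0.2] * 384], expected_dim=384)  # passes silently

try:
    check_dimensions([[0.1] * 1536], expected_dim=384)
except ValueError as error:
    print(f"Rejected before insert: {error}")
```

Calling a check like this right before `collection.add` gives an error message that names the offending vector, which is handy when embeddings come from more than one pipeline.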

7. The Solution: Right-Sizing My Embeddings

For the documentation portal, I switched to a smaller model:

from sentence_transformers import SentenceTransformer
import chromadb

# Use smaller dimension model for simple semantic search
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dimensions
db = chromadb.Client()
collection = db.create_collection(name="docs")

def embed_and_store(texts):
    embeddings = model.encode(texts)
    offset = collection.count()  # keep IDs unique across repeated calls
    collection.add(
        documents=texts,
        embeddings=embeddings.tolist(),
        ids=[str(offset + i) for i in range(len(texts))]
    )

# Storage comparison:
# Before: 100k docs * 1536 * 4 bytes = 614 MB raw
# After:  100k docs * 384 * 4 bytes = ~154 MB raw
# Savings: 75% reduction

print(f"New embedding dimension: {model.get_sentence_embedding_dimension()}")
# Output: New embedding dimension: 384

The search quality remained excellent for the documentation use case, and my storage costs dropped by 75%.

8. When to Actually Use Larger Dimensions

I don’t want to swing too far the other way. Larger dimensions are genuinely necessary for:

RAG over legal/medical documents: Nuance matters enormously
Cross-lingual retrieval: One vector space has to cover several languages
Long-form content analysis: Book summaries, research papers
Fine-grained sentiment: Distinguishing “good” from “excellent”

For my next project—a legal contract search system—I’ll use 1536 dimensions without hesitation.

9. Summary

Embedding dimension is the length of your vector representation. The choice balances semantic richness against computational cost.

Key takeaways:

Match dimension to task complexity, not just “what’s popular”
384 dimensions is often sufficient for simple semantic search
768 dimensions is a good default for most applications
1536+ dimensions for nuanced content where precision matters
You can reduce dimensions with PCA if needed
Never mix dimensions in the same vector collection

How I choose now:

Simple keyword-ish search?       -> 384 dims
Balanced general purpose?        -> 768 dims
Complex RAG or legal/medical?    -> 1536 dims
Edge/mobile deployment?          -> 256 dims (with PCA)

My vector database bill is now 75% smaller, and search quality hasn’t suffered. Sometimes the answer isn’t “bigger is better”—it’s “right-sized is better.”

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
