How to Use Milvus Vector Database with Python for Semantic Search

Mar 3, 2026

I tried setting up semantic search for my document search application and hit a wall with vector database configuration. Every tutorial seemed to require Docker, Kubernetes, or some complex deployment setup. I just wanted to store some vectors and search them with cosine similarity.

After installing PyMilvus, I got this error when trying to create a connection:

from pymilvus import MilvusClient

client = MilvusClient("http://localhost:19530")

ConnectionError: Failed to connect to Milvus server at http://localhost:19530

The server wasn’t running because I hadn’t set it up. The documentation assumed I was deploying a full Milvus cluster, but I just needed something for local development.

Then I found Milvus Lite - it’s embedded directly in the Python client and stores data in a local file. No Docker, no server setup, just import and go.

Milvus Lite: Zero-Config Vector Database

The solution was simpler than I expected:

from pymilvus import MilvusClient

# This creates a local file database - no server needed
client = MilvusClient("./documents.db")

# Drop existing collection if it exists
if client.has_collection("documents"):
    client.drop_collection("documents")

# Create collection with 384 dimensions (sentence-transformers)
client.create_collection(
    collection_name="documents",
    dimension=384,
    metric_type="COSINE"
)

That’s it. The database file is created locally and persists between runs. No connection strings, no authentication, no infrastructure setup.

The key insight is that Milvus Lite uses the same MilvusClient API as a full Milvus server, apart from a few server-only features. You develop locally with a file, then change one line to connect to a production server:

# Development
client = MilvusClient("./documents.db")

# Production - just change the connection string
client = MilvusClient("http://localhost:19530")
# Or Zilliz Cloud (managed)
client = MilvusClient(uri="https://xxx.api.gcp-us-west1.zillizcloud.com", token="...")
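
One way to avoid editing code when switching environments is to read the connection settings from the environment. This is a minimal sketch, not something PyMilvus provides: the helper name milvus_client_kwargs and the environment variable names MILVUS_URI and MILVUS_TOKEN are assumptions made up for this example.

```python
import os

def milvus_client_kwargs(default_uri="./documents.db"):
    """Build keyword arguments for MilvusClient from environment variables.

    MILVUS_URI and MILVUS_TOKEN are hypothetical names chosen for this sketch.
    Without them, the local Milvus Lite file is used.
    """
    kwargs = {"uri": os.environ.get("MILVUS_URI", default_uri)}
    token = os.environ.get("MILVUS_TOKEN")
    if token:
        # Only managed or secured deployments need a token
        kwargs["token"] = token
    return kwargs

# Intended usage (requires pymilvus):
#   client = MilvusClient(**milvus_client_kwargs())
print(milvus_client_kwargs()["uri"])
```

With neither variable set, the helper falls back to the local file, so the development workflow stays unchanged.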

The Complete Semantic Search Flow

Here’s how semantic search works with Milvus:

User Query
    ↓
[Embed Query Vector] - Turn text into a 384-dimensional vector
    ↓
[Milvus Vector DB] - Compare with stored vectors
    ↓
[Cosine Similarity Search] - Find nearest neighbors
    ↓
Top-K Results + Scores - Most similar documents

The vector embeddings capture semantic meaning. “configure Redis” and “Redis setup” end up close to each other in vector space even though they use different words.
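
To make the comparison step concrete, here is a minimal pure-Python sketch of what a cosine-similarity top-k search does conceptually. The three-dimensional vectors are invented for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and cosine_similarity and top_k are hypothetical helper names, not PyMilvus functions. Milvus produces the same kind of ranking, but uses indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, stored, k):
    """Brute-force search: score every stored vector, return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (invented values, for illustration only)
stored = {
    "configure Redis": [0.9, 0.1, 0.0],
    "Redis setup": [0.8, 0.2, 0.1],
    "bake sourdough bread": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

for doc_id, score in top_k(query, stored, k=2):
    print(f"{score:.4f}  {doc_id}")
```

The two Redis entries point in nearly the same direction as the query, so they outrank the unrelated entry regardless of wording.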

Full Working Example

Here’s a complete example showing the full pipeline:

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Initialize Milvus Lite
client = MilvusClient("./semantic_search.db")

# Setup collection
if client.has_collection("documents"):
    client.drop_collection("documents")

client.create_collection(
    collection_name="documents",
    dimension=384,  # Matches sentence-transformers/all-MiniLM-L6-v2
    metric_type="COSINE"
)

# Sample documents (quick-setup collections use an integer primary key named "id")
documents = [
    {"id": 1, "content": "How to configure Redis caching"},
    {"id": 2, "content": "Redis setup and installation guide"},
    {"id": 3, "content": "Python caching strategies"},
]

# Create embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
for doc in documents:
    doc["vector"] = model.encode(doc["content"]).tolist()

# Insert into Milvus
client.insert("documents", documents)

# Load collection (quick-setup collections are already loaded, so this is only a safeguard)
client.load_collection("documents")

# Search
query = "Redis configuration"
query_vector = model.encode(query).tolist()

results = client.search(
    collection_name="documents",
    data=[query_vector],
    limit=3,
    output_fields=["content"]
)

# Display results
for hits in results:
    for hit in hits:
        print(f"Score: {hit['distance']:.4f}")
        print(f"Content: {hit['entity']['content']}")
        print()

Output:

Score: 0.8921
Content: How to configure Redis caching

Score: 0.8145
Content: Redis setup and installation guide

Score: 0.4523
Content: Python caching strategies

Note that Milvus calls this value distance, but for the COSINE metric it is the cosine similarity itself: higher = more similar, and no conversion is needed. Only for L2 does a lower value mean a closer match.
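
In the output above the best match carries the highest score, so a simple way to drop weak matches is to keep only hits at or above a cutoff. The sketch below works on mocked data shaped like the results the loop above iterates over (a list of hit lists, each hit carrying distance and entity). The helper name keep_confident_hits and the 0.5 cutoff are assumptions for illustration. A sensible cutoff depends on your model and data.

```python
def keep_confident_hits(results, min_score):
    """Return (content, score) pairs whose score is at least min_score."""
    kept = []
    for hits in results:          # one list of hits per query vector
        for hit in hits:
            if hit["distance"] >= min_score:
                kept.append((hit["entity"]["content"], hit["distance"]))
    return kept

# Mocked results mirroring the example output (not produced by a live search)
mock_results = [[
    {"id": 1, "distance": 0.8921, "entity": {"content": "How to configure Redis caching"}},
    {"id": 2, "distance": 0.8145, "entity": {"content": "Redis setup and installation guide"}},
    {"id": 3, "distance": 0.4523, "entity": {"content": "Python caching strategies"}},
]]

for content, score in keep_confident_hits(mock_results, min_score=0.5):
    print(f"{score:.4f}  {content}")
```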

Understanding Metric Types

This was a common pitfall I encountered - choosing the wrong metric type:

Metric Type	Best For	Range	Interpretation
COSINE	Text embeddings, normalized or not	-1 to 1	Higher = more similar
L2	Cases where vector magnitude matters	0 to infinity	Lower = more similar
IP	Normalized vectors, want dot product	-1 to 1 (when normalized)	Higher = more similar

COSINE ignores vector length, so it works whether or not your embeddings are normalized (unit length). Use L2 when the magnitude of the vectors carries meaning. IP with normalized vectors equals cosine similarity.

Many popular embedding models (OpenAI’s text-embedding-3-small, all-MiniLM-L6-v2 from sentence-transformers) produce normalized vectors, so COSINE is usually the right choice.
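
The relationship between the three metrics is easy to check by hand. The sketch below uses plain Python and two made-up vectors. The helpers normalize, inner_product, l2_distance, and cosine_similarity are written for this example and are not PyMilvus functions. For unit-length vectors, the inner product equals the cosine similarity and the squared L2 distance equals 2 - 2 * cosine, so all three metrics rank neighbors in the same order.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))

# Two arbitrary, unnormalized vectors (invented for illustration)
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_raw = cosine_similarity(a, b)        # unaffected by vector length
ua, ub = normalize(a), normalize(b)
ip_unit = inner_product(ua, ub)          # equals cos_raw after normalization
l2_unit = l2_distance(ua, ub)            # l2_unit ** 2 equals 2 - 2 * cos_raw

print(f"cosine={cos_raw:.4f}  ip(unit)={ip_unit:.4f}  l2(unit)^2={l2_unit ** 2:.4f}")
print(f"raw inner product={inner_product(a, b):.1f}  (differs from cosine without normalization)")
```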

Common Mistakes I Made

Mistake 1: Wrong Embedding Dimension

I created a collection with 384 dimensions but was using OpenAI’s embeddings which are 1536 dimensions:

# WRONG
import openai

client.create_collection("docs", dimension=384)  # For sentence-transformers
embedding = openai.embeddings.create(model="text-embedding-3-small", input="text")  # Returns 1536 dims
client.insert("docs", [{"id": 1, "vector": embedding.data[0].embedding}])  # Dimension mismatch error

# CORRECT
client.create_collection("docs", dimension=1536)  # Match your embedding model
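
A cheap guard against this mistake is to check vector lengths before calling insert. This is a minimal sketch under the assumption that rows are dictionaries with the vector stored under a "vector" key, as in the examples here. The helper check_dimensions is hypothetical and not part of PyMilvus.

```python
def check_dimensions(rows, expected_dim, vector_field="vector"):
    """Raise ValueError if any row's vector length differs from expected_dim."""
    for index, row in enumerate(rows):
        actual = len(row[vector_field])
        if actual != expected_dim:
            raise ValueError(
                f"row {index}: vector has {actual} dimensions, "
                f"collection expects {expected_dim}"
            )

# A 1536-dimensional vector headed for a 384-dimensional collection
rows = [{"id": 1, "vector": [0.0] * 1536}]
try:
    check_dimensions(rows, expected_dim=384)
except ValueError as error:
    print(f"Caught before insert: {error}")
```

Failing early like this gives a message that names the offending row, which is easier to act on than an error surfaced in the middle of a batch insert.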

Mistake 2: Forgetting to Load Collection

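Search does not work on a collection that has not been loaded. Quick-setup collections created with create_collection(name, dimension=...) are indexed and loaded automatically, but a collection built from a custom schema without index_params, or one you have released, must be loaded before searching:

# WRONG - collection not loaded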
client.insert("docs", documents)
results = client.search("docs", data=[query_vector])

# CORRECT
client.insert("docs", documents)
client.load_collection("docs")  # Required before search
results = client.search("docs", data=[query_vector])

Mistake 3: Manual IDs with auto_id=True

# WRONG - Error
client.create_collection("docs", dimension=384, auto_id=True)
client.insert("docs", [{"id": 1, "vector": [...]}])  # Can't specify ID with auto_id

# CORRECT (the default primary key is an integer, so use integer IDs)
client.create_collection("docs", dimension=384, auto_id=False)
client.insert("docs", [{"id": 1, "vector": [...]}])

Custom Schema for Real-World Data

For production, you’ll want a custom schema with multiple fields:

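from pymilvus import MilvusClient, DataType
from sentence_transformers import SentenceTransformer

client = MilvusClient("./products.db")
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings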

# Define schema
schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
    description="Product catalog"
)

# Add fields
schema.add_field("product_id", DataType.VARCHAR, is_primary=True, max_length=100)
schema.add_field("title", DataType.VARCHAR, max_length=500)
schema.add_field("price", DataType.DOUBLE)
schema.add_field("category", DataType.VARCHAR, max_length=100)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=384)

# Create index
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE"
)

# Create collection
client.create_collection(
    collection_name="products",
    schema=schema,
    index_params=index_params
)

# Insert data
products = [
    {
        "product_id": "p1",
        "title": "Wireless Headphones",
        "price": 149.99,
        "category": "Electronics",
        "embedding": model.encode("Wireless Headphones").tolist()
    }
]
client.insert("products", products)

# Search with filter
client.load_collection("products")
results = client.search(
    collection_name="products",
    data=[model.encode("headphones").tolist()],
    filter="price < 200 and category == 'Electronics'",
    limit=5,
    output_fields=["title", "price"]
)

The filter parameter lets you combine semantic search with structured queries - find semantically similar products that also match your price and category criteria.

Performance Considerations

Milvus Lite works great for development and small datasets (comfortable under roughly 100K vectors, with about 1M as an upper bound). For larger scale, consider:

Mode	Best For	Storage	Limits
Milvus Lite	Development, personal projects	Local file	< 1M vectors
Milvus Server	Team environments, production	On-premises	Scales to billions
Zilliz Cloud	Managed production	Cloud	Auto-scaling

The client code shown here stays the same across all three modes - you only change the connection string.

Quick Reference: Essential Methods

# Connection
client = MilvusClient("./local.db")           # File-based
client = MilvusClient("http://localhost:19530")  # Server
client = MilvusClient(uri="...", token="...")   # Cloud

# Collection Management
client.create_collection(name, dimension, metric_type="COSINE")
client.has_collection(name)
client.drop_collection(name)
client.list_collections()

# Data Operations
client.insert(collection, data)
client.delete(collection, ids)
client.upsert(collection, data)  # Insert, or overwrite rows with matching primary keys
client.get(collection, ids, output_fields)

# Search
client.load_collection(collection)  # Needed if the collection is not already loaded
client.search(collection, data, limit, filter, output_fields)

Why 384 dimensions? SentenceTransformer’s all-MiniLM-L6-v2 uses 384 dimensions as a balance between accuracy and speed. More dimensions capture more nuance but increase storage and computation. OpenAI’s text-embedding-3-small uses 1536 dimensions for higher quality.

Hybrid search: Milvus supports combining multiple vector fields (e.g., title embedding + description embedding) using rankers like RRF (Reciprocal Rank Fusion) to merge results from different searches.

Index types: AUTOINDEX lets Milvus choose the optimal index automatically. FLAT gives exact search (100% recall) but is slower for large datasets. HNSW is faster for approximate search on large datasets.
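
The RRF ranker mentioned in the hybrid search note can be sketched in a few lines of plain Python. This illustrates the general Reciprocal Rank Fusion formula (each document scores the sum of 1 / (k + rank) over the result lists it appears in), not Milvus's internal implementation. The function name, the product IDs, and k=60 (a commonly used constant) are assumptions for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists into one list of (id, score), best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical result lists from two separate vector searches
title_hits = ["p1", "p2", "p3"]
description_hits = ["p2", "p4", "p1"]

for doc_id, score in reciprocal_rank_fusion([title_hits, description_hits]):
    print(f"{score:.5f}  {doc_id}")
```

Documents that rank well in both lists rise to the top, and because only ranks are used, the two searches do not need comparable score scales.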

Final Thoughts

Milvus Lite removes the friction from getting started with vector search. Install PyMilvus, connect to a local file, and you have a fully functional vector database. The same code scales to production - just change the connection string.

The key takeaways: use COSINE for normalized embeddings, always load_collection() before search, match your embedding dimensions, and leverage filters for hybrid semantic + structured queries.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.
