
How to Use Milvus Vector Database with Python for Semantic Search

I tried setting up semantic search for my document search application and hit a wall with vector database configuration. Every tutorial seemed to require Docker, Kubernetes, or some complex deployment setup. I just wanted to store some vectors and search them with cosine similarity.

After installing PyMilvus, I got this error when trying to create a connection:

milvus_connection_error.py
from pymilvus import MilvusClient
client = MilvusClient("http://localhost:19530")
error.txt
ConnectionError: Failed to connect to Milvus server at http://localhost:19530

The server wasn’t running because I hadn’t set it up. The documentation assumed I was deploying a full Milvus cluster, but I just needed something for local development.

Then I found Milvus Lite - it’s embedded directly in the Python client and stores data in a local file. No Docker, no server setup, just import and go.

Milvus Lite: Zero-Config Vector Database

The solution was simpler than I expected:

milvus_setup.py
from pymilvus import MilvusClient

# This creates a local file database - no server needed
client = MilvusClient("./documents.db")

# Drop existing collection if it exists
if client.has_collection("documents"):
    client.drop_collection("documents")

# Create collection with 384 dimensions (sentence-transformers)
client.create_collection(
    collection_name="documents",
    dimension=384,
    metric_type="COSINE",
)

That’s it. The database file is created locally and persists between runs. No connection strings, no authentication, no infrastructure setup.

The key insight is that Milvus Lite uses the same MilvusClient API as a full Milvus server, although Lite supports only a subset of the server's features. You develop locally with a file, then change one line to connect to a production server:

connection_strings.py
# Development
client = MilvusClient("./documents.db")
# Production - just change the connection string
client = MilvusClient("http://localhost:19530")
# Or Zilliz Cloud (managed)
client = MilvusClient(uri="https://xxx.api.gcp-us-west1.zillizcloud.com", token="...")
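
One way to avoid editing code when moving between environments is to read the connection target from environment variables. The sketch below is only an illustration: `MILVUS_URI` and `MILVUS_TOKEN` are names I made up for this example (PyMilvus does not read them itself), and `milvus_client_kwargs` and `make_client` are hypothetical helpers.

```python
import os


def milvus_client_kwargs(env=None):
    """Build keyword arguments for MilvusClient from environment variables.

    MILVUS_URI and MILVUS_TOKEN are names chosen for this sketch;
    PyMilvus does not read them on its own.
    """
    env = os.environ if env is None else env
    # Fall back to a local Milvus Lite file when no URI is configured.
    kwargs = {"uri": env.get("MILVUS_URI", "./documents.db")}
    token = env.get("MILVUS_TOKEN")
    if token:
        # Only pass a token when one is set (e.g. for a managed service).
        kwargs["token"] = token
    return kwargs


def make_client():
    # Imported lazily so the helper above can be used without PyMilvus installed.
    from pymilvus import MilvusClient

    return MilvusClient(**milvus_client_kwargs())
```

With nothing set, `make_client()` opens the local file; setting `MILVUS_URI` (and optionally `MILVUS_TOKEN`) points the same code at a server.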

The Complete Semantic Search Flow

Here’s how semantic search works with Milvus:

semantic_search_flow.md
User Query
    ↓
[Embed Query Vector] - Turn text into 384-dimensional numbers
    ↓
[Milvus Vector DB] - Compare with stored vectors
    ↓
[Cosine Similarity Search] - Find nearest neighbors
    ↓
Top-K Results + Scores - Most similar documents

The vector embeddings capture semantic meaning. “configure Redis” and “Redis setup” end up close to each other in vector space even though they use different words.
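
To make "close in vector space" concrete, here is a minimal brute-force sketch in plain Python. The three-dimensional vectors are made up for illustration (real embeddings from the model used in this article have 384 dimensions), and this shows what an exact nearest-neighbor search computes conceptually, not how Milvus is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real ones would have 384 dimensions.
toy_vectors = {
    "configure Redis": [0.9, 0.1, 0.0],
    "Redis setup": [0.8, 0.2, 0.1],
    "bake sourdough": [0.0, 0.1, 0.9],
}

for name, score in top_k([1.0, 0.0, 0.0], toy_vectors, k=2):
    print(f"{score:.3f}  {name}")
```

The two Redis phrases point in nearly the same direction, so they rank above the unrelated one regardless of the exact words used.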

Full Working Example

Here’s a complete example showing the full pipeline:

search_example.py
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Initialize Milvus Lite
client = MilvusClient("./semantic_search.db")

# Setup collection
if client.has_collection("documents"):
    client.drop_collection("documents")
client.create_collection(
    collection_name="documents",
    dimension=384,  # Matches sentence-transformers/all-MiniLM-L6-v2
    metric_type="COSINE",
)

# Sample documents (integer IDs: the quick-setup collection defaults to an integer primary key)
documents = [
    {"id": 1, "content": "How to configure Redis caching"},
    {"id": 2, "content": "Redis setup and installation guide"},
    {"id": 3, "content": "Python caching strategies"},
]

# Create embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
for doc in documents:
    doc["vector"] = model.encode(doc["content"]).tolist()

# Insert into Milvus ("content" is stored as a dynamic field)
client.insert("documents", documents)

# Make sure the collection is loaded before searching
client.load_collection("documents")

# Search
query = "Redis configuration"
query_vector = model.encode(query).tolist()
results = client.search(
    collection_name="documents",
    data=[query_vector],
    limit=3,
    output_fields=["content"],
)

# Display results (one list of hits per query vector)
for hits in results:
    for hit in hits:
        print(f"Score: {hit['distance']:.4f}")
        print(f"Content: {hit['entity']['content']}")
        print()

Output:

output.txt
Score: 0.8921
Content: How to configure Redis caching
Score: 0.8145
Content: Redis setup and installation guide
Score: 0.4523
Content: Python caching strategies

Note that the field is called distance, but with metric_type="COSINE" Milvus reports the cosine similarity there, so higher means more similar (which is why the best match comes first in the output above). With L2, by contrast, lower values are more similar.
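
The search call returns one list of hits per query vector, which is why the example loops twice. A small helper can flatten that into rows for logging or display. This is only a sketch: `flatten_hits` is a hypothetical helper, it assumes dict-like hits, and the sample data is hand-written to mimic the `distance` and `entity` keys used above rather than real Milvus output.

```python
def flatten_hits(results, field="content"):
    """Turn nested search results into (query_index, score, value) rows."""
    rows = []
    for query_index, hits in enumerate(results):
        for hit in hits:
            rows.append((query_index, hit["distance"], hit["entity"].get(field)))
    return rows


# Hand-written stand-in for a search result with one query vector.
sample_results = [
    [
        {"id": 1, "distance": 0.9, "entity": {"content": "first document"}},
        {"id": 2, "distance": 0.5, "entity": {"content": "second document"}},
    ]
]

for query_index, score, content in flatten_hits(sample_results):
    print(f"query {query_index}: {score:.4f}  {content}")
```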

Understanding Metric Types

This was a common pitfall I encountered - choosing the wrong metric type:

| Metric Type | Best For | Range | Interpretation |
| --- | --- | --- | --- |
| COSINE | Normalized vectors | -1 to 1 | Higher = more similar |
| L2 | Unnormalized vectors | 0 to infinity | Lower = more similar |
| IP | Normalized vectors, want dot product | -1 to 1 (for normalized vectors) | Higher = more similar |

If your embeddings are normalized (unit length), use COSINE; IP gives the same scores in that case, because the dot product of unit vectors is their cosine similarity. If they’re not normalized and the magnitude of the vectors matters, use L2.

Most modern embedding models (OpenAI’s text-embedding-3-small, sentence-transformers) produce normalized vectors, so COSINE is usually the right choice.
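
To see how the three metrics relate, here is a small pure-Python check with no Milvus involved. The two vectors are arbitrary values picked for the example, and the helper names are my own. For unit-length vectors, inner product equals cosine similarity and squared L2 distance equals 2 - 2 * cosine, so all three produce the same ranking.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [4.0, 3.0]  # arbitrary vectors, not unit length
ua, ub = normalize(a), normalize(b)

cos = cosine_similarity(a, b)
# For unit vectors, inner product equals cosine similarity...
assert math.isclose(inner_product(ua, ub), cos)
# ...and squared L2 distance is 2 - 2 * cosine, so the rankings agree.
assert math.isclose(l2_distance(ua, ub) ** 2, 2 - 2 * cos)
# On the raw vectors, inner product also depends on vector length.
print(inner_product(a, b), cos)
```

On unnormalized vectors the inner product grows with vector length, which is why IP is usually paired with normalized embeddings.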

Common Mistakes I Made

Mistake 1: Wrong Embedding Dimension

I created a collection with 384 dimensions but was using OpenAI’s embeddings which are 1536 dimensions:

dimension_mismatch.py
# WRONG
client.create_collection("docs", dimension=384) # For sentence-transformers
embedding = openai.embeddings.create(model="text-embedding-3-small", input="text") # Returns 1536 dims
client.insert("docs", [{"vector": embedding.data[0].embedding}]) # Dimension mismatch error
# CORRECT
client.create_collection("docs", dimension=1536) # Match your embedding model
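
A cheap guard against this mistake is to check the vector length before inserting. The helper below is a hypothetical sketch, not part of PyMilvus, and `COLLECTION_DIM` is a name I chose for the example.

```python
def check_dimension(vector, expected_dim):
    """Raise a clear error if a vector does not match the collection dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, collection expects {expected_dim}"
        )
    return vector


COLLECTION_DIM = 384  # must match the dimension passed to create_collection

check_dimension([0.0] * 384, COLLECTION_DIM)  # passes
try:
    check_dimension([0.0] * 1536, COLLECTION_DIM)  # e.g. a 1536-dimensional embedding
except ValueError as err:
    print(err)
```

Running every embedding through a check like this before `client.insert(...)` surfaces the mismatch with a readable message at the point where the vector is produced.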

Mistake 2: Forgetting to Load Collection

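Search only works on a loaded collection. On a Milvus server, searching an unloaded collection is rejected with a "collection not loaded" error rather than returning matches. Collections created with the quick-setup call (or with index_params, as in the custom schema example) are loaded on creation, so this mostly bites when you create a collection from a schema without an index, or after releasing it:

collection_not_loaded.py
# WRONG if the collection is not loaded - the search fails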
client.insert("docs", documents)
results = client.search("docs", data=[query_vector])
# CORRECT
client.insert("docs", documents)
client.load_collection("docs") # Required before search
results = client.search("docs", data=[query_vector])

Mistake 3: Manual IDs with auto_id=True

auto_id_error.py
# WRONG - Error
client.create_collection("docs", dimension=384, auto_id=True)
client.insert("docs", [{"id": 1, "vector": [...]}])  # Can't specify ID with auto_id
# CORRECT
client.create_collection("docs", dimension=384, auto_id=False)
client.insert("docs", [{"id": 1, "vector": [...]}])  # Integer ID: the default primary key is an integer

Custom Schema for Real-World Data

For production, you’ll want a custom schema with multiple fields:

custom_schema.py
from pymilvus import MilvusClient, DataType
from sentence_transformers import SentenceTransformer

client = MilvusClient("./products.db")
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

# Define schema
schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
    description="Product catalog",
)

# Add fields
schema.add_field("product_id", DataType.VARCHAR, is_primary=True, max_length=100)
schema.add_field("title", DataType.VARCHAR, max_length=500)
schema.add_field("price", DataType.DOUBLE)
schema.add_field("category", DataType.VARCHAR, max_length=100)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=384)

# Create index
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE",
)

# Create collection
client.create_collection(
    collection_name="products",
    schema=schema,
    index_params=index_params,
)

# Insert data
products = [
    {
        "product_id": "p1",
        "title": "Wireless Headphones",
        "price": 149.99,
        "category": "Electronics",
        "embedding": model.encode("Wireless Headphones").tolist(),
    }
]
client.insert("products", products)

# Search with filter
client.load_collection("products")
results = client.search(
    collection_name="products",
    data=[model.encode("headphones").tolist()],
    filter="price < 200 and category == 'Electronics'",
    limit=5,
    output_fields=["title", "price"],
)

The filter parameter lets you combine semantic search with structured queries - find semantically similar products that also match your price and category criteria.
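
If the price limit and category come from user input, it helps to build the filter string in one place. The helper below is a hypothetical sketch for the two fields in this schema. It refuses values containing quotes or backslashes instead of trying to escape them, since escaping rules are not covered here.

```python
def build_product_filter(max_price, category):
    """Build a filter string like: price < 200 and category == 'Electronics'."""
    if not isinstance(max_price, (int, float)) or isinstance(max_price, bool):
        raise TypeError("max_price must be a number")
    if "'" in category or "\\" in category:
        # Escaping rules are out of scope for this sketch, so refuse instead.
        raise ValueError("category must not contain quotes or backslashes")
    return f"price < {max_price} and category == '{category}'"


print(build_product_filter(200, "Electronics"))
```

The returned string can be passed as the `filter` argument of `client.search(...)` in the example above.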

Performance Considerations

Milvus Lite works great for development and small datasets (< 100K vectors). For larger scale, consider:

| Mode | Best For | Storage | Limits |
| --- | --- | --- | --- |
| Milvus Lite | Development, personal projects | Local file | < 1M vectors |
| Milvus Server | Team environments, production | On-premises | Scales to billions |
| Zilliz Cloud | Managed production | Cloud | Auto-scaling |

The client code is essentially the same across all three modes. In most cases you only change the connection string.

Quick Reference: Essential Methods

essential_methods.py
# Connection
client = MilvusClient("./local.db")  # File-based
client = MilvusClient("http://localhost:19530")  # Server
client = MilvusClient(uri="...", token="...")  # Cloud

# Collection Management
client.create_collection(collection_name=name, dimension=384, metric_type="COSINE")
client.has_collection(name)
client.drop_collection(name)
client.list_collections()

# Data Operations
client.insert(collection, data)
client.delete(collection, ids)
client.upsert(collection, data)  # Insert or overwrite by primary key (MilvusClient has no update())
client.get(collection, ids, output_fields=fields)

# Search
client.load_collection(collection)  # Required first!
client.search(collection, data=vectors, limit=10, filter=expr, output_fields=fields)

Why 384 dimensions? SentenceTransformer’s all-MiniLM-L6-v2 uses 384 dimensions as a balance between accuracy and speed. More dimensions capture more nuance but increase storage and computation. OpenAI’s text-embedding-3-small uses 1536 dimensions for higher quality.

Hybrid search: Milvus supports combining multiple vector fields (e.g., title embedding + description embedding) using rankers like RRF (Reciprocal Rank Fusion) to merge results from different searches.
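
To show what Reciprocal Rank Fusion does, here is a plain-Python sketch of the textbook formula: every list a document appears in contributes 1 / (k + rank) to its score. The constant k=60 is a commonly used value, the result lists are made up, and this is an illustration of the idea rather than Milvus's own ranker.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each appearance adds 1 / (k + rank), rank from 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


title_hits = ["p1", "p2", "p3"]  # made-up result order for one vector field
description_hits = ["p3", "p1", "p4"]  # made-up result order for another

for doc_id, score in reciprocal_rank_fusion([title_hits, description_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists rise to the top, and the raw scores of the two searches never need to be on the same scale.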

Index types: AUTOINDEX lets Milvus choose the optimal index automatically. FLAT gives exact search (100% recall) but is slower for large datasets. HNSW is faster for approximate search on large datasets.

Final Thoughts

Milvus Lite removes the friction from getting started with vector search. Install PyMilvus, connect to a local file, and you have a fully functional vector database. The same code scales to production - just change the connection string.

The key takeaways: use COSINE for normalized embeddings, always load_collection() before search, match your embedding dimensions, and leverage filters for hybrid semantic + structured queries.
