How to Use PostgreSQL Extensions (pgvector, PostGIS) with MikroORM
I was building a RAG (Retrieval Augmented Generation) application and wanted to use pgvector for semantic search with MikroORM. I checked the documentation, searched GitHub issues, and even asked on Reddit. The answer was always the same: MikroORM doesn’t have built-in support for PostgreSQL extensions.
That’s when I realized I needed a different approach. Here’s what I learned about integrating pgvector, PostGIS, and other PostgreSQL extensions with MikroORM.
The Problem
PostgreSQL extensions like pgvector, PostGIS, and pg_search provide powerful capabilities that have no equivalent in MikroORM’s ORM abstraction:
- pgvector: Vector similarity search for AI/ML applications
- PostGIS: Geospatial queries for location-based features
- pg_search: Advanced full-text search with ranking
When I tried to use these extensions, I hit three walls:
- No native types: MikroORM doesn’t provide TypeScript types for
vector,geometry, ortsvector - Missing operators: Extension-specific operators like
<=>(cosine distance) aren’t available in QueryBuilder - Sparse documentation: Community knowledge is thin for integration patterns
First Attempt: Using Unknown Type
I started by trying to define a vector column with MikroORM’s unknown type:
import { Entity, PrimaryKey, Property } from '@mikro-orm/core'
@Entity()export class Document { @PrimaryKey() id!: number
@Property() content!: string
@Property({ type: 'unknown', columnType: 'vector(1536)' }) embedding!: number[]}This worked for schema generation, but I couldn’t query with it. The <=> operator for cosine distance doesn’t exist in MikroORM’s QueryBuilder.
// This doesn't work - MikroORM doesn't know about vector operatorsconst similar = await em.find(Document, { embedding: { $distance: { $lt: 0.2 } } // Nope!})I was stuck. The entity could store vectors, but I couldn’t use them.
The Solution: Kysely Integration
MikroORM 7 ships with Kysely as the query runner. This was the key I was missing. I could use Kysely for raw queries while keeping MikroORM for standard CRUD operations.
Setting Up Raw Queries
import { EntityManager } from '@mikro-orm/postgresql'import { Document } from '../entities/Document'
export class VectorRepository { constructor(private readonly em: EntityManager) {}
async findSimilar(embedding: number[], limit = 5): Promise<Array<Document & { distance: number }>> { const knex = this.em.getConnection().getKnex()
const results = await knex.raw(` SELECT id, content, embedding, embedding <=> ?::vector as distance FROM documents ORDER BY distance LIMIT ? `, [`[${embedding.join(',')}]`, limit])
return results.rows }}This worked. I could now perform vector similarity searches with the <=> operator while still using MikroORM for entity management.
Why This Approach Works
The hybrid pattern gives me the best of both worlds:
- MikroORM handles: Entity relationships, migrations, change tracking, standard CRUD
- Kysely handles: Extension-specific queries, custom operators, raw SQL when needed
No need to abandon the ORM entirely.
Defining Custom Types
To maintain type safety, I created custom TypeScript types for extension columns:
// pgvector type - adjust dimensions based on your embedding modelexport type Vector = number[]// PostGIS geometry typeexport interface Geometry { type: 'Point' | 'Polygon' | 'LineString' coordinates: number[] | number[][] srid?: number}Then in my entities:
import { Entity, PrimaryKey, Property } from '@mikro-orm/core'import { Vector } from '../types/vector'
@Entity()export class Document { @PrimaryKey() id!: number
@Property() content!: string
@Property({ type: 'unknown', columnType: 'vector(1536)' }) embedding!: Vector
@Property({ type: 'unknown', columnType: 'tsvector' }) searchVector!: unknown}import { Entity, PrimaryKey, Property } from '@mikro-orm/core'import { Geometry } from '../types/geometry'
@Entity()export class Location { @PrimaryKey() id!: number
@Property() name!: string
@Property({ type: 'unknown', columnType: 'geometry(Point, 4326)' }) coordinates!: Geometry}MikroORM won’t validate these types at runtime, but TypeScript will catch type mismatches during development.
Real-World Examples
pgvector: Semantic Search
Here’s a complete repository for vector similarity search:
import { EntityManager } from '@mikro-orm/postgresql'import { Document } from '../entities/Document'
export class VectorRepository { constructor(private readonly em: EntityManager) {}
async insertWithEmbedding(content: string, embedding: number[]): Promise<Document> { const doc = this.em.create(Document, { content, embedding }) await this.em.persistAndFlush(doc) return doc }
async findSimilar(embedding: number[], limit = 5): Promise<Array<Document & { distance: number }>> { const knex = this.em.getConnection().getKnex()
const results = await knex.raw(` SELECT id, content, embedding, embedding <=> ?::vector as distance FROM documents ORDER BY distance LIMIT ? `, [`[${embedding.join(',')}]`, limit])
return results.rows }
async searchHybrid(query: string, embedding: number[], limit = 10) { const knex = this.em.getConnection().getKnex()
// Combine semantic (vector) and keyword (full-text) search const results = await knex.raw(` SELECT id, content, embedding <=> ?::vector as semantic_distance, ts_rank(search_vector, plainto_tsquery(?)) as keyword_rank FROM documents WHERE search_vector @@ plainto_tsquery(?) ORDER BY (semantic_distance * 0.7) + ((1 - keyword_rank) * 0.3) LIMIT ? `, [`[${embedding.join(',')}]`, query, query, limit])
return results.rows }}The searchHybrid method combines vector similarity with PostgreSQL’s built-in full-text search. This gives better results than either approach alone.
PostGIS: Location Queries
For geospatial data:
import { EntityManager } from '@mikro-orm/postgresql'import { Location } from '../entities/Location'
export class LocationRepository { constructor(private readonly em: EntityManager) {}
async findNearby(lat: number, lng: number, radiusKm: number): Promise<Location[]> { const knex = this.em.getConnection().getKnex()
const results = await knex.raw(` SELECT id, name, ST_Distance(location, ST_MakePoint(?, ?)) as distance FROM locations WHERE ST_DWithin(location, ST_MakePoint(?, ?)::geography, ?) ORDER BY distance `, [lng, lat, lng, lat, radiusKm * 1000])
return results.rows }}pg_search: Full-Text Search
For advanced text search with ranking:
import { EntityManager } from '@mikro-orm/postgresql'
export class SearchRepository { constructor(private readonly em: EntityManager) {}
async searchDocuments(searchTerm: string) { const knex = this.em.getConnection().getKnex()
const results = await knex.raw(` SELECT id, title, ts_rank(search_vector, plainto_tsquery(?)) as rank FROM documents WHERE search_vector @@ plainto_tsquery(?) ORDER BY rank DESC `, [searchTerm, searchTerm])
return results.rows }}Hybrid ORM Pattern
The best practice I found is using MikroORM for CRUD operations and raw queries for extension-specific operations:
async function updateDocumentWithSearch( em: EntityManager, id: number, content: string) { // 1. Use MikroORM for the update const doc = await em.findOneOrFail(Document, id) doc.content = content await em.flush()
// 2. Update search vector via raw query (alternative to triggers) await em.getConnection().getKnex().raw(` UPDATE documents SET search_vector = to_tsvector('english', ?) WHERE id = ? `, [content, id])}This keeps entity management clean while handling extension-specific operations separately.
Using View Entities
For complex extension queries, I found database views helpful:
-- Create a view for vector similarityCREATE VIEW document_similarities ASSELECT d1.id as source_id, d2.id as target_id, d1.embedding <=> d2.embedding as distanceFROM documents d1CROSS JOIN documents d2WHERE d1.id != d2.id;Then define a MikroORM entity for the view:
import { Entity, PrimaryKey, Property } from '@mikro-orm/core'
@Entity({ expression: 'document_similarities' })export class DocumentSimilarity { @PrimaryKey() sourceId!: number
@PrimaryKey() targetId!: number
@Property() distance!: number}
// Query the view through MikroORMconst similarDocs = await em.find(DocumentSimilarity, { sourceId: documentId, distance: { $lt: 0.2 }})This gives type-safe access to extension-generated data.
Migrations for Extensions
Don’t forget to enable extensions in your migrations:
import { Migration } from '@mikro-orm/migrations'
export class ExtensionSetup extends Migration { async up(): Promise<void> { // Enable extensions this.addSql('CREATE EXTENSION IF NOT EXISTS vector;') this.addSql('CREATE EXTENSION IF NOT EXISTS postgis;')
// Create extension-specific indexes this.addSql(` CREATE INDEX documents_embedding_idx ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100); `)
this.addSql(` CREATE INDEX locations_coords_idx ON locations USING GIST (coordinates); `)
this.addSql(` CREATE INDEX documents_search_idx ON documents USING GIN (search_vector); `) }
async down(): Promise<void> { this.addSql('DROP INDEX IF EXISTS documents_embedding_idx;') this.addSql('DROP INDEX IF EXISTS locations_coords_idx;') this.addSql('DROP INDEX IF EXISTS documents_search_idx;') }}I learned this the hard way. I deployed to production without the indexes and queries were 100x slower. Extension-specific indexes (HNSW, IVFFlat, GIST, GIN) are critical for performance.
Common Mistakes to Avoid
1. Ignoring extension indexes
Without indexes, vector similarity searches do full table scans. For a table with 100,000 documents, my query went from 2 seconds to 5ms after adding an IVFFlat index.
2. Type mismatches
Using wrong vector dimensions causes runtime errors. If your embedding model produces 1536 dimensions, your column must be vector(1536).
3. N+1 queries with extensions
Fetching extension data in loops is slow. Use batch queries:
// BAD: N+1 queriesfor (const doc of documents) { const similar = await vectorRepo.findSimilar(doc.embedding)}
// GOOD: Batch queryconst embeddings = documents.map(d => d.embedding)const allSimilar = await knex.raw(` SELECT * FROM documents WHERE embedding <=> ANY(?)`, [embeddings])4. Forgetting migrations in production
I’ve seen production deployments fail because extensions weren’t enabled. Always include extension setup in migrations, not manual database commands.
5. Abandoning type safety
Using any everywhere defeats the purpose of TypeScript. Define proper types for extension columns.
Why This Matters
PostgreSQL extensions unlock capabilities that pure ORMs can’t touch:
- AI Applications: pgvector enables RAG pipelines and semantic search
- Location Services: PostGIS powers mapping and logistics applications
- Search Quality: pg_search provides sophisticated ranking beyond LIKE queries
The hybrid approach I’ve shown keeps MikroORM’s benefits (migrations, relationships, change tracking) while accessing PostgreSQL’s full power.
When to Use This Approach
This pattern makes sense when:
- You need vector similarity search for AI/ML features
- Your application has geospatial requirements
- Full-text search with ranking is important
- You’re invested in MikroORM but need PostgreSQL-specific features
It’s overkill for simple applications. But as your requirements grow, knowing how to integrate extensions while keeping your ORM is a valuable tool.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 MikroORM Query Builder Documentation
- 👨💻 pgvector GitHub Repository
- 👨💻 PostGIS Documentation
- 👨💻 PostgreSQL Full-Text Search
- 👨💻 MikroORM Migrations Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments