Introduction: The “Vector DB” Tax #
In the rush to adopt Generative AI, engineering teams often fall into the trap of “Infrastructure Sprawl.” You need to build a RAG (Retrieval Augmented Generation) system. You read the tutorials. You sign up for a specialized Vector Database SaaS (like Pinecone, Weaviate, or Milvus).
Suddenly, you have a new invoice, a new distinct system to monitor, new security perimeters to define, and—critically—data consistency challenges between your transactional data and your vector data.
There is a better way.
For 95% of enterprise use cases, you do not need a specialized vector database. You need a database that supports vector operations. Enter PostgreSQL with pgvector.
By combining the power of the Spring ecosystem with the extensibility of Postgres, Spring AI provides a seamless abstraction to build RAG applications that are both low-cost and high-performance.
In this guide, we will build a production-ready RAG solution using Spring AI and pgvector. We will cover everything from the architectural trade-offs and Docker setup to advanced HNSW indexing and metadata filtering.
Why PostgreSQL + pgvector? #
Before writing code, we must justify the architectural decision. Why choose pgvector over a purpose-built engine?
1. Cost Consolidation (The FinOps Angle) #
Most enterprises already run managed PostgreSQL (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL). Enabling pgvector is often just a plugin installation or a flag toggle. You utilize existing compute and storage reserves, eliminating the minimum monthly commit of a new SaaS vendor.
2. ACID Compliance & Data Locality #
This is the “killer feature.” In a standalone vector DB, if you update a user’s profile in Postgres, you must asynchronously update their embeddings in the vector DB. This eventual consistency leads to stale search results.
With pgvector, your vectors live alongside your relational data. You can update a row and its vector embedding in a single atomic transaction.
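For illustration, here is what that looks like at the SQL level. This is a minimal sketch against a hypothetical user_profiles table that stores the text and its embedding side by side:

-- Both the relational field and the vector change in one atomic transaction
BEGIN;
UPDATE user_profiles
SET    bio       = 'Kubernetes platform engineer',
       embedding = '[0.12, -0.07, 0.33]'::vector  -- toy 3-dim vector; real embeddings have e.g. 1536 dims
WHERE  id = 42;
COMMIT;

If the transaction rolls back, neither the profile text nor the embedding changes, so search can never return results that contradict the row they came from.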
3. Operational Simplicity #
One backup strategy. One connection pool. One set of firewall rules. Your DevOps team will thank you.
Architecture Overview #
We will build a standard RAG pipeline:
- ETL Phase: Read documents -> Split into Chunks -> Generate Embeddings (via OpenAI or Ollama) -> Store in Postgres.
- Retrieval Phase: User Query -> Generate Query Embedding -> Vector Similarity Search (Postgres) -> Return Context.
- Generation Phase: Context + Prompt -> LLM -> Final Response.
We will rely heavily on the Spring AI abstraction, specifically the PgVectorStore implementation of the VectorStore interface.
Step 1: Infrastructure Setup #
First, we need a PostgreSQL instance with the pgvector extension installed. The easiest way to achieve this locally is via Docker Compose.
Create a compose.yaml file:
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-pgvector
    environment:
      - POSTGRES_USER=springai
      - POSTGRES_PASSWORD=secret
      - POSTGRES_DB=vectordb
    ports:
      - "5432:5432"
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    restart: always

  pgadmin:
    image: dpage/pgadmin4
    container_name: pgadmin
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@example.com
      - PGADMIN_DEFAULT_PASSWORD=admin
    ports:
      - "5050:80"
    depends_on:
      - postgres
Run it:
docker compose up -d
Verifying the Extension #
Log into the database and ensure the extension works (Spring AI usually handles this, but it’s good to know):
CREATE EXTENSION IF NOT EXISTS vector;
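You can also confirm that the extension is registered by querying the Postgres catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';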
Step 2: Spring Boot Project Configuration #
We will use Spring Boot 3.2+ and the latest Spring AI milestone (or stable version depending on when you read this).
Maven Dependencies (pom.xml) #
You need the spring-ai-pgvector-store-spring-boot-starter and an embedding client (like OpenAI).
<dependencies>
    <!-- Spring Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Spring JDBC (the PgVectorStore uses JdbcTemplate) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-jdbc</artifactId>
    </dependency>

    <!-- PostgreSQL Driver -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>

    <!-- Spring AI OpenAI (for Embeddings & Chat) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>

    <!-- Spring AI PGVector Store -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version> <!-- Check for latest version -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
Application Configuration (application.yml) #
Spring AI can create the schema automatically (via initialize-schema below). We need to configure the database connection, the OpenAI API key, and the vector store specifics.
spring:
  application:
    name: spring-ai-pgvector-demo
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: springai
    password: secret
    driver-class-name: org.postgresql.Driver
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      embedding:
        options:
          model: text-embedding-3-small # Low cost, high performance
    vectorstore:
      pgvector:
        # The dimension must match the embedding model.
        # text-embedding-3-small default is 1536.
        dimensions: 1536
        index-type: HNSW # Crucial for performance!
        distance-type: COSINE_DISTANCE
        initialize-schema: true
Key Configuration Notes:
- Dimensions: This MUST match your embedding model. OpenAI’s text-embedding-3-small outputs 1536 dimensions. If you use text-embedding-3-large (3072) or an Ollama model (e.g., Llama 3 is 4096), you must adjust this.
- Index Type: We selected HNSW. We will discuss why this is vital later in the article.
Step 3: Implementing the ETL Pipeline #
Before we can search, we need data. Let’s create a service to load documents into Postgres.
Spring AI provides the Document class and the VectorStore interface.
package com.springdevpro.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;
import java.util.logging.Logger;

@Service
public class IngestionService {

    private static final Logger LOGGER = Logger.getLogger(IngestionService.class.getName());

    private final VectorStore vectorStore;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    /**
     * Loads a JSON resource, splits it into chunks, and saves to pgvector.
     */
    @Transactional
    public void ingestData(Resource jsonResource) {
        LOGGER.info("Starting data ingestion...");

        // 1. Read Data
        JsonReader jsonReader = new JsonReader(jsonResource, "content", "meta_author", "meta_date");
        List<Document> rawDocuments = jsonReader.get();

        // 2. Split Data (Chunking)
        // TokenTextSplitter creates chunks based on token count, ideal for LLM context windows.
        TokenTextSplitter textSplitter = new TokenTextSplitter();
        List<Document> splitDocuments = textSplitter.apply(rawDocuments);

        // 3. Store (Generate Embeddings + Save to DB)
        // This single line calls the EmbeddingModel API and performs the INSERT into Postgres
        vectorStore.add(splitDocuments);

        LOGGER.info("Ingested " + splitDocuments.size() + " document chunks into pgvector.");
    }
}
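To actually trigger the pipeline, one option is a startup runner. A minimal sketch, assuming a hypothetical articles.json file on the classpath:

package com.springdevpro.rag;

import com.springdevpro.rag.service.IngestionService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

@Configuration
public class IngestionConfig {

    // Hypothetical data file; swap in your own resource
    @Value("classpath:data/articles.json")
    private Resource articlesJson;

    @Bean
    CommandLineRunner ingestOnStartup(IngestionService ingestionService) {
        // Runs once after the application context starts
        return args -> ingestionService.ingestData(articlesJson);
    }
}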
Understanding Chunking #
Why did we use TokenTextSplitter?
If you insert a 50-page PDF as a single vector, the specific details get “diluted” in the mathematical representation. Furthermore, when you retrieve that document, it won’t fit in the LLM’s context window. Splitting documents into smaller, semantic chunks (e.g., 400-800 tokens) is the single most important factor for RAG accuracy.
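The default TokenTextSplitter settings are sensible, but they are tunable. A sketch using the overloaded constructor (parameter order reflects current Spring AI milestones; verify against your version):

// Aim for ~500-token chunks instead of the default
TokenTextSplitter splitter = new TokenTextSplitter(
        500,    // defaultChunkSize: target tokens per chunk
        350,    // minChunkSizeChars: don't emit tiny trailing fragments
        5,      // minChunkLengthToEmbed: drop chunks too short to embed
        10000,  // maxNumChunks: safety cap per document
        true    // keepSeparator: preserve line breaks within chunks
);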
Step 4: Intelligent Retrieval (The “R” in RAG) #
Now, let’s build the search functionality. We aren’t just doing a SELECT *. We are doing a cosine similarity search.
package com.springdevpro.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class RetrievalService {

    private final VectorStore vectorStore;

    public RetrievalService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> similaritySearch(String query) {
        // Basic search using the store's default top-K
        return vectorStore.similaritySearch(query);
    }

    public List<Document> advancedSearch(String query) {
        // Advanced search configuration
        SearchRequest request = SearchRequest.query(query)
                .withTopK(3)                     // Only top 3 most relevant
                .withSimilarityThreshold(0.75);  // Filter out low-relevance noise

        return vectorStore.similaritySearch(request);
    }
}
The Importance of SimilarityThreshold #
In a production system, you don’t always want to return an answer. If the user asks “How do I bake a cake?” to your “IT Support Bot,” the database might find the “closest” document is about “baking” server images. By setting a threshold (e.g., 0.75), you ensure that if no relevant documents exist, you return an empty list, allowing the LLM to say “I don’t know” rather than hallucinating.
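One way to make that behavior explicit in code is to surface the empty-result case. A sketch that could slot into the RetrievalService above (the method name and Optional-based contract are ours):

// Additional imports: java.util.Optional, java.util.stream.Collectors
public Optional<String> retrieveContext(String query) {
    List<Document> docs = vectorStore.similaritySearch(
            SearchRequest.query(query)
                    .withTopK(3)
                    .withSimilarityThreshold(0.75));
    if (docs.isEmpty()) {
        // Nothing cleared the threshold: report "no answer" instead of guessing
        return Optional.empty();
    }
    return Optional.of(docs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n---\n")));
}

The caller can then short-circuit with a canned “I don’t know” response whenever the Optional is empty.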
Step 5: Advanced - Metadata Filtering #
This is where pgvector shines compared to simple array storage. You can mix vector search with structured SQL filtering.
Imagine you want to search for “deployment errors” but only in documents written by “Alice” in “2024”.
With Spring AI’s portable filter expression language:
public List<Document> searchWithFilters(String query, String author, String year) {
    // Spring AI converts this portable expression into SQL WHERE clauses
    // specifically adapted for the JSONB metadata column in Postgres
    String filterExpression = "meta_author == '" + author + "' && meta_date >= '" + year + "-01-01'";

    SearchRequest request = SearchRequest.query(query)
            .withTopK(5)
            .withFilterExpression(filterExpression);

    return vectorStore.similaritySearch(request);
}
Under the hood, spring-ai-pgvector translates this into a PostgreSQL query utilizing JSONB operators (->>) to filter the metadata column before (or during) the vector scan.
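One caveat with the snippet above: concatenating raw user input into a filter string invites injection-style bugs. Spring AI also provides a type-safe FilterExpressionBuilder that constructs the same expression programmatically; a sketch (the method name is ours):

// Additional imports: org.springframework.ai.vectorstore.filter.Filter,
//                     org.springframework.ai.vectorstore.filter.FilterExpressionBuilder
public List<Document> searchWithFiltersSafely(String query, String author, String year) {
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    // Same semantics as the string expression, but built programmatically,
    // so user input cannot break out of the filter
    Filter.Expression expression = b.and(
            b.eq("meta_author", author),
            b.gte("meta_date", year + "-01-01")).build();

    return vectorStore.similaritySearch(SearchRequest.query(query)
            .withTopK(5)
            .withFilterExpression(expression));
}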
Step 6: Performance Tuning (IVFFlat vs. HNSW) #
If you ignore this section, your application will work fine with 1,000 documents but will grind your database to a halt at 10 million.
pgvector supports two main index types (see the SQL sketch after this list):
- IVFFlat (Inverted File Flat):
  - Pros: Builds faster, uses less memory.
  - Cons: Slower search, lower recall (accuracy). Requires the index to be rebuilt if the data distribution changes significantly.
  - Use case: Small datasets (< 100k vectors).
- HNSW (Hierarchical Navigable Small World):
  - Pros: Extremely fast (approximate) search, handles updates better, high recall.
  - Cons: Uses more memory, takes longer to build initially.
  - Use case: Production RAG systems.
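For reference, the DDL behind these index types looks roughly like this (a sketch against Spring AI’s default vector_store table; with initialize-schema: true, Spring AI issues its own equivalent statements):

-- HNSW index for cosine distance (what the configuration below maps to)
CREATE INDEX ON vector_store USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- IVFFlat alternative; a common starting point is lists = rows / 1000
-- CREATE INDEX ON vector_store USING ivfflat (embedding vector_cosine_ops)
--     WITH (lists = 1000);

-- Query-time recall knob for HNSW (per session)
SET hnsw.ef_search = 100;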
Spring AI allows you to configure this in application.yml:
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        m: 16                # Max connections per layer (memory vs. speed trade-off)
        ef-construction: 64  # Size of the dynamic candidate list during index build
Recommendation: Always start with HNSW for production RAG. The latency difference is milliseconds vs. seconds at scale.
Step 7: The Full RAG Controller #
Let’s tie it all together with the ChatClient (formerly AiClient) to generate an answer.
package com.springdevpro.rag.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RagController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        // Build the client with default system prompt instructions if needed
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        // Use the Fluent API with the QuestionAnswerAdvisor.
        // This Advisor automatically:
        // 1. Takes the user question
        // 2. Queries the VectorStore
        // 3. Stuffs the results into the System Prompt
        // 4. Sends it to the LLM
        return chatClient.prompt()
                .user(question)
                .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
                .call()
                .content();
    }
}
This code snippet demonstrates the power of Spring AI. The QuestionAnswerAdvisor encapsulates the entire RAG pattern (Retrieval -> Augmentation -> Generation) into a single method chain.
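With the application running on the default port, you can exercise the endpoint directly:

curl "http://localhost:8080/api/rag/ask?question=How%20do%20I%20enable%20HNSW%20indexing%3F"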
Pricing Analysis: Is it really “Low Cost”? #
Let’s look at the numbers for a hypothetical dataset of 1 million vectors. At 1536 dimensions and 4 bytes per dimension, that is roughly 6GB of raw vector data before index overhead.
Option A: Pinecone (Standard)
- ~ $70 - $100 / month (depending on pod type and availability).
- Data transfer costs.
Option B: PostgreSQL (Managed RDS/Cloud SQL)
- Storage: ~6GB on Postgres is negligible (well under $1/month at typical managed-storage rates).
- Compute: You likely already have an RDS instance running at 20-30% CPU utilization.
- Marginal Cost: Near $0.
Even if you provision a new, small dedicated Postgres instance for vectors (e.g., AWS db.t4g.micro or small), your cost is roughly $15-$30/month, significantly lower than specialized vector SaaS starting tiers.
Conclusion #
The Spring AI + pgvector combination is more than just a convenience; it is a strategic architectural advantage. It allows Java developers to build sophisticated AI applications using tools they have trusted for decades.
By leveraging PostgreSQL for storage and Spring AI for the abstraction layer, you gain:
- Simplicity: No new infrastructure to manage.
- Cost Efficiency: Eliminate the “Vector Tax.”
- Transactional Integrity: Keep your data and vectors in sync.
- Performance: HNSW indexes provide production-grade latency.
As Spring AI continues to evolve (moving from 0.8.x to 1.0.0), the integration with pgvector will only get tighter. For Spring developers looking to enter the AI space, this is the path of least resistance and highest value.