Introduction: The Enterprise AI Dilemma #
The modern enterprise is drowning in data but starving for knowledge. We have terabytes of PDFs, internal wikis, and markdown documentation, yet finding specific answers requires navigating clunky search bars or asking colleagues.
Large Language Models (LLMs) like GPT-4 changed the landscape, but they suffer from two critical flaws in an enterprise context:
- Hallucinations: They make things up when they don’t know the answer.
- Cut-off Dates & Privacy: They do not know your private company data, and training a custom model is prohibitively expensive and risky.
The solution is RAG (Retrieval-Augmented Generation). By combining the reasoning capabilities of an LLM with facts retrieved from your internal data at query time, you create a “Spring AI Knowledge Base” whose answers are grounded in your own documents and can be traced back to them.
In this deep-dive tutorial, we will build a full-stack RAG application using Spring AI, the Spring project for integrating AI models into Java applications. We will move beyond the basics and tackle enterprise concerns like modularity, vector store selection, and scalable ingestion pipelines.
Architecture: The RAG Pipeline #
Before writing code, we must understand the flow. A Spring AI knowledge base relies on a pipeline that transforms unstructured text into mathematical vectors.
The Two Flows of RAG #
- The Ingestion Flow (Offline/Async):
  - Load: Read documents (PDF, JSON, Text).
  - Split (Chunk): Break documents into manageable pieces (Token-based splitting).
  - Embed: Send text chunks to an Embedding Model (e.g., text-embedding-3-small) to get a vector array (e.g., 1536 floats).
  - Store: Save the text + vector + metadata into a Vector Database.
- The Retrieval Flow (Runtime):
  - Query Embedding: The user asks a question. This question is embedded into a vector.
  - Semantic Search: The database calculates the “Cosine Similarity” to find chunks closest to the user’s intent, not just keyword matches.
  - Augmentation: The retrieved chunks are inserted into the prompt as context.
  - Generation: The LLM generates an answer based only on the provided context.
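To make the Semantic Search step concrete, here is a minimal plain-Java sketch of cosine similarity and a top-k ranking over an in-memory map. It is an illustration only: the class name, the three-dimensional vectors, and the document ids are made up for the example, and a real vector store such as pgvector performs this ranking inside the database, usually with an approximate index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring AI.
final class CosineSearchSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction; treat it as "not similar"
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored vectors most similar to the query, best first. */
    static List<String> topK(double[] query, Map<String, double[]> store, int k) {
        List<Map.Entry<String, double[]>> entries = new ArrayList<>(store.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosine(query, e.getValue())).reversed());
        return entries.stream().limit(k).map(Map.Entry::getKey).toList();
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
        Map<String, double[]> store = Map.of(
                "vacation-policy", new double[] {0.9, 0.1, 0.0},
                "deploy-guide", new double[] {0.0, 0.2, 0.9},
                "holiday-calendar", new double[] {0.8, 0.3, 0.1});
        double[] query = {1.0, 0.2, 0.0};
        System.out.println(topK(query, store, 2));
    }
}
```

The two chunks whose vectors point in nearly the same direction as the query are returned first, regardless of whether they share any keywords with it.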
The Tech Stack #
- Java 21: For the latest language features.
- Spring Boot 3.3.x: The foundation.
- Spring AI 1.0.0-SNAPSHOT: The AI abstraction layer.
- PostgreSQL with pgvector: The industry-standard open-source vector store. We choose this over niche vector DBs because most enterprises already run Postgres.
- Docker Compose: For local infrastructure orchestration.
Phase 1: Infrastructure Setup (Docker) #
To build a Spring AI knowledge base, we need a database capable of performing vector math. PostgreSQL gains this capability through the pgvector extension, which the image used below ships with.
Create a compose.yaml file in your project root:
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-pgvector
    environment:
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=secret
      - POSTGRES_DB=knowledge_base
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Run it:
docker compose up -d
Architectural Note: In a real production environment (AWS RDS or Azure Postgres), ensure the pgvector extension is enabled via your cloud provider’s console.
Phase 2: Project Initialization and Dependencies #
We will use Maven. If you are using Spring Initializr, select “Spring AI”, “Spring Web”, and “PostgreSQL Driver”.
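Add the Spring repositories (since Spring AI is evolving fast). The milestone repository serves milestone releases; a -SNAPSHOT version, as used in this tutorial, resolves only from the snapshot repository:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>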
Add the dependencies to pom.xml:
<dependencies>
    <!-- Spring AI OpenAI (Chat + Embeddings) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <!-- Vector Store: Postgres -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
    <!-- Document Readers (PDF, Tika, etc) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-jdbc</artifactId>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version> <!-- Check for latest version -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
Configuration (application.yml) #
Here we configure OpenAI API keys and the Vector Store connection.
spring:
  application:
    name: spring-ai-knowledge-base
  datasource:
    url: jdbc:postgresql://localhost:5432/knowledge_base
    username: myuser
    password: secret
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o # Or gpt-3.5-turbo
      embedding:
        options:
          model: text-embedding-3-small # Produces 1536-dimensional vectors
    vectorstore:
      pgvector:
        index-type: HNSW # Hierarchical Navigable Small World - faster for large datasets
        dimension: 1536 # Must match the embedding model's output size
        initialize-schema: true # Creates tables automatically
Security Tip: Never hardcode API keys. Use environment variables.
Phase 3: The Ingestion Service (ETL) #
This is the most critical part of a Spring AI knowledge base. If you feed garbage into the vector store, you will get garbage out (“Garbage In, Garbage Out”).
We need a service that can take a directory of documents, split them, and store them.
3.1 The Document Loading Strategy #
We’ll use TikaDocumentReader because it handles almost anything (PDF, Docx, PPT).
package com.springdevpro.kb.ingestion;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.ResourcePatternResolver;
import org.springframework.stereotype.Service;
import org.springframework.ai.document.Document;
import java.io.IOException;
import java.util.List;

@Service
public class IngestionService {

    private static final Logger log = LoggerFactory.getLogger(IngestionService.class);

    private final VectorStore vectorStore;
    private final ResourcePatternResolver resourcePatternResolver;

    public IngestionService(VectorStore vectorStore, ResourcePatternResolver resourcePatternResolver) {
        this.vectorStore = vectorStore;
        this.resourcePatternResolver = resourcePatternResolver;
    }

    public void ingestDocs() throws IOException {
        // 1. Load Resources
        Resource[] resources = resourcePatternResolver.getResources("classpath:docs/*.pdf");
        if (resources.length == 0) {
            log.warn("No documents found in classpath:docs/");
            return;
        }
        for (Resource resource : resources) {
            log.info("Processing: {}", resource.getFilename());
            // 2. Read
            TikaDocumentReader reader = new TikaDocumentReader(resource);
            List<Document> documents = reader.get();
            // 3. Split (Chunking)
            // Why TokenTextSplitter? Because LLMs have context windows measured in tokens.
            // Arguments: target chunk size (tokens), minimum chunk size (chars),
            // minimum chunk length to embed (chars), maximum number of chunks, keep separators.
            // Note: TokenTextSplitter does not overlap adjacent chunks.
            TokenTextSplitter splitter = new TokenTextSplitter(1000, 400, 10, 5000, true);
            List<Document> splitDocuments = splitter.apply(documents);
            // 4. Enrich Metadata (Optional but recommended)
            for (Document doc : splitDocuments) {
                doc.getMetadata().put("filename", resource.getFilename());
                doc.getMetadata().put("ingestion_date", System.currentTimeMillis());
            }
            // 5. Embed & Store
            // Spring AI handles the embedding call internally when we call add()
            vectorStore.add(splitDocuments);
            log.info("Ingested {} chunks for {}", splitDocuments.size(), resource.getFilename());
        }
    }
}
Deep Dive: Why Chunking Matters #
Many tutorials skip this. If you upload a 50-page PDF as one vector:
- The vector represents the average meaning of the whole PDF, diluting specific details.
- When retrieved, the entire text won’t fit in the LLM’s context window.
TokenTextSplitter is preferable to naive fixed-length string splitting because it measures chunks in tokens and tries to end each chunk at sentence-ending punctuation. The chunk size and minChunkSizeChars must be tuned based on your data density.
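The idea of "cut at a size limit, but prefer a sentence boundary" can be sketched in plain Java. This is not how TokenTextSplitter is implemented: the sketch counts whitespace-separated words as a stand-in for tokens, and the class name and the maxWords and minWords parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring AI.
final class ChunkingSketch {

    /**
     * Splits text into chunks of at most maxWords words. A chunk is closed early
     * when a word ends a sentence and the chunk already holds at least minWords words.
     */
    static List<String> chunk(String text, int maxWords, int minWords) {
        if (maxWords <= 0 || minWords < 0 || minWords > maxWords) {
            throw new IllegalArgumentException("Require 0 <= minWords <= maxWords and maxWords > 0");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        List<String> current = new ArrayList<>();
        for (String word : trimmed.split("\\s+")) {
            current.add(word);
            boolean endsSentence = word.endsWith(".") || word.endsWith("?") || word.endsWith("!");
            boolean full = current.size() >= maxWords;
            boolean goodBreak = endsSentence && current.size() >= minWords;
            if (full || goodBreak) {
                chunks.add(String.join(" ", current));
                current.clear();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(String.join(" ", current)); // trailing partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Alpha beta gamma. Delta epsilon zeta eta theta. Iota.";
        chunk(text, 4, 2).forEach(System.out::println);
    }
}
```

Raising minWords produces fewer, larger chunks; lowering maxWords forces cuts in the middle of long sentences, which is the trade-off the tuning advice above refers to.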
Phase 4: The Retrieval Service (Semantic Search) #
Now that our Spring AI knowledge base is populated, we need to search it. We are not using SQL LIKE '%query%'; we are using vectorStore.similaritySearch.
package com.springdevpro.kb.retrieval;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;

@Service
public class KnowledgeRetrievalService {

    private final VectorStore vectorStore;

    public KnowledgeRetrievalService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<String> search(String query) {
        // We request the top 3 most similar chunks.
        // Similarity threshold: 0.0 to 1.0.
        // 0.7 is a good baseline for "relevant enough".
        List<Document> similarDocuments = vectorStore.similaritySearch(
                SearchRequest.query(query)
                        .withTopK(3)
                        .withSimilarityThreshold(0.7)
        );
        return similarDocuments.stream()
                .map(Document::getContent)
                .collect(Collectors.toList());
    }
}
Phase 5: The Generation Service (The “AI” in Spring AI) #
This is where RAG comes together. We use the ChatClient to combine the User’s Query + The Retrieved Data.
The System Prompt #
The magic lies in the Prompt Engineering. We must instruct the AI strictly:
“You are a helpful assistant. Use only the provided context to answer. If you don’t know, say you don’t know.”
package com.springdevpro.kb.chat;

import com.springdevpro.kb.retrieval.KnowledgeRetrievalService;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;

@Service
public class AiChatService {

    private final ChatClient chatClient;
    private final KnowledgeRetrievalService retrievalService;

    // Define the prompt template
    private final String ragPromptTemplate = """
            You are an intelligent assistant for the Spring DevPro company.
            You are assisting a user with technical questions.
            Use the following pieces of context to answer the question at the end.
            If the context does not contain the answer, say "I am sorry, but our internal knowledge base does not contain information regarding this topic."
            Do not make up answers.
            CONTEXT:
            {context}
            QUESTION:
            {question}
            """;

    public AiChatService(ChatClient.Builder chatClientBuilder, KnowledgeRetrievalService retrievalService) {
        this.chatClient = chatClientBuilder.build();
        this.retrievalService = retrievalService;
    }

    public String generateAnswer(String userQuery) {
        // 1. Retrieve Context
        List<String> contentList = retrievalService.search(userQuery);
        String context = String.join("\n\n", contentList);
        // 2. Construct Prompt
        PromptTemplate promptTemplate = new PromptTemplate(ragPromptTemplate);
        Map<String, Object> promptParameters = Map.of(
                "context", context,
                "question", userQuery
        );
        // 3. Call LLM
        return chatClient.prompt(promptTemplate.create(promptParameters))
                .call()
                .content();
    }
}
Architectural Highlight: ChatClient #
In recent Spring AI versions, ChatClient is the preferred fluent API over the raw ChatModel. It handles observation, advisors (history), and error handling more gracefully.
Phase 6: The API Layer #
We expose this via a simple REST Controller.
package com.springdevpro.kb.api;

import com.springdevpro.kb.chat.AiChatService;
import com.springdevpro.kb.ingestion.IngestionService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.io.IOException;
import java.util.Map;

@RestController
@RequestMapping("/api/v1/kb")
public class KnowledgeBaseController {

    private final AiChatService aiChatService;
    private final IngestionService ingestionService;

    public KnowledgeBaseController(AiChatService aiChatService, IngestionService ingestionService) {
        this.aiChatService = aiChatService;
        this.ingestionService = ingestionService;
    }

    @PostMapping("/chat")
    public ResponseEntity<Map<String, String>> chat(@RequestBody Map<String, String> request) {
        String question = request.get("question");
        // Reject missing or blank questions early; a null value would fail later in Map.of(...)
        if (question == null || question.isBlank()) {
            return ResponseEntity.badRequest().body(Map.of("error", "The 'question' field is required."));
        }
        String answer = aiChatService.generateAnswer(question);
        return ResponseEntity.ok(Map.of("answer", answer));
    }

    @PostMapping("/trigger-ingestion")
    public ResponseEntity<String> triggerIngestion() {
        try {
            // ingestDocs() runs synchronously, so it has finished when this call returns
            ingestionService.ingestDocs();
            return ResponseEntity.ok("Ingestion completed successfully.");
        } catch (IOException e) {
            return ResponseEntity.internalServerError().body("Ingestion failed: " + e.getMessage());
        }
    }
}
Phase 7: Advanced Enterprise Features #
To make this Spring AI knowledge base production-ready, we need to address metadata filtering, performance, and cost.
7.1 Metadata Filtering (RBAC for AI) #
In a real company, Engineering docs shouldn’t be visible to HR, and Salary docs shouldn’t be visible to Engineering.
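Spring AI allows filtering at the Vector Store level. The filter expression is translated into the store's native query language (for pgvector, a condition on the metadata column), so the database discards non-matching chunks instead of the application filtering results afterwards.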
Update KnowledgeRetrievalService:
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

// ... inside search method ...
FilterExpressionBuilder b = new FilterExpressionBuilder();
List<Document> similarDocuments = vectorStore.similaritySearch(
        SearchRequest.query(query)
                .withTopK(3)
                // Only search documents where metadata 'department' == 'engineering'
                .withFilterExpression(b.eq("department", "engineering").build())
);
When ingesting documents, you would parse the folder structure (e.g., /docs/engineering/guide.pdf) to populate this metadata field.
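One way to derive that field is to take the folder directly below the docs root. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the root folder is assumed to be called docs, and the resulting map would be merged into each chunk's metadata alongside the filename entry set during ingestion.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Spring AI.
final class PathMetadataSketch {

    /**
     * Returns the folder directly below rootFolder, lower-cased, if the file
     * sits inside such a folder (e.g. /docs/engineering/guide.pdf -> engineering).
     */
    static Optional<String> departmentOf(String location, String rootFolder) {
        // Normalize separators so Windows-style paths behave the same way
        String[] segments = location.replace('\\', '/').split("/");
        // Stop two short of the end: the segment after the root must be a folder, not the file
        for (int i = 0; i < segments.length - 2; i++) {
            if (segments[i].equals(rootFolder)) {
                return Optional.of(segments[i + 1].toLowerCase(Locale.ROOT));
            }
        }
        return Optional.empty();
    }

    /** Builds the metadata entries to attach to every chunk of the given file. */
    static Map<String, Object> metadataFor(String location) {
        Map<String, Object> metadata = new HashMap<>();
        String normalized = location.replace('\\', '/');
        metadata.put("filename", normalized.substring(normalized.lastIndexOf('/') + 1));
        departmentOf(location, "docs").ifPresent(dept -> metadata.put("department", dept));
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(metadataFor("/docs/engineering/guide.pdf"));
    }
}
```

Files placed directly in the root get no department entry. Whether such documents should be visible to everyone or to no one is a policy decision that the filter expression has to encode explicitly.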
7.2 Conversational Memory #
The implementation above is stateless (single-turn). To support follow-up questions, you need to store chat history.
Spring AI provides MessageChatMemoryAdvisor.
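// Inside the AiChatService constructor
this.chatClient = chatClientBuilder
        .defaultSystem("You are a helpful assistant.")
        // Builder-level advisors are registered with defaultAdvisors(...);
        // InMemoryChatMemory can be swapped for a JDBC-backed chat memory
        .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
        .build();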
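Conceptually, a chat memory keeps a bounded window of recent messages per conversation and replays them with each new prompt. The plain-Java sketch below illustrates that idea only; it is not Spring AI's implementation, and the class name, the Message record, and the maxMessages limit are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring AI.
final class WindowedMemorySketch {

    record Message(String role, String text) {}

    private final int maxMessages;
    private final Map<String, Deque<Message>> conversations = new HashMap<>();

    WindowedMemorySketch(int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones beyond the window size. */
    synchronized void add(String conversationId, Message message) {
        Deque<Message> window = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    /** Returns an immutable snapshot of the window, oldest message first. */
    synchronized List<Message> history(String conversationId) {
        Deque<Message> window = conversations.get(conversationId);
        return window == null ? List.of() : List.copyOf(window);
    }

    public static void main(String[] args) {
        WindowedMemorySketch memory = new WindowedMemorySketch(2);
        memory.add("c1", new Message("user", "What is pgvector?"));
        memory.add("c1", new Message("assistant", "A Postgres extension."));
        memory.add("c1", new Message("user", "How do I enable it?"));
        System.out.println(memory.history("c1"));
    }
}
```

Bounding the window matters for cost as well as correctness, since every replayed message is billed as input tokens on each turn.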
7.3 Cost Management & Token Usage #
OpenAI charges by the token. To optimize costs:
- Reduce Context: Only send the top 2-3 chunks, not top 10.
- Summarization: If retrieved chunks are long, use a cheaper model (GPT-3.5-turbo) to summarize them before sending to the expensive model (GPT-4) for the final answer.
- Local Embeddings: Use Ollama running nomic-embed-text locally for embeddings to save costs on the massive ingestion process, while using OpenAI only for the “smart” generation part.
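The "Reduce Context" advice can also be enforced with a token budget instead of a fixed chunk count. The sketch below assumes a rough rule of thumb of about four characters per token for English text; that ratio is an approximation, not a real tokenizer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring AI.
final class ContextBudgetSketch {

    /** Rough estimate (assumption): about 4 characters per token, rounded up. */
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    /** Keeps chunks in ranked order until the next one would exceed the budget. */
    static List<String> fitToBudget(List<String> rankedChunks, int maxTokens) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            int cost = estimateTokens(chunk);
            if (used + cost > maxTokens) {
                break; // lower-ranked chunks are dropped to stay within budget
            }
            kept.add(chunk);
            used += cost;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("a".repeat(40), "b".repeat(40), "c".repeat(40));
        System.out.println(fitToBudget(ranked, 25).size());
    }
}
```

Because the chunks arrive ranked by similarity, cutting from the tail drops the least relevant context first.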
Phase 8: Testing and Validation #
How do you test an AI? You can’t use simple JUnit assertions because the output is non-deterministic. However, you can use Spring AI Evaluation (experimental).
For now, manual validation involves:
- Place a unique fact in a PDF (e.g., “The internal code for Project X is ‘BlueMonkey’”).
- Ingest the PDF.
- Ask the API: “What is the internal code for Project X?”
- Verify the answer.
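These manual steps can be scripted as a small smoke test. The sketch below only builds the HTTP request and checks an answer string; sending it is left to java.net.http.HttpClient. The base URL http://localhost:8080 is an assumption (Spring Boot's default port), the class and method names are hypothetical, and the JSON escaping covers only quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

// Hypothetical helper for a scripted smoke test.
final class SmokeTestSketch {

    /** Builds the JSON body expected by the /chat endpoint (minimal escaping only). */
    static String toJsonBody(String question) {
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"question\": \"" + escaped + "\"}";
    }

    /** Builds a POST request against the chat endpoint defined in Phase 6. */
    static HttpRequest chatRequest(String baseUrl, String question) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/kb/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(question)))
                .build();
    }

    /** Case-insensitive check that the answer contains the planted fact. */
    static boolean mentionsFact(String answer, String expectedFact) {
        return answer != null
                && answer.toLowerCase(Locale.ROOT).contains(expectedFact.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        HttpRequest request = chatRequest("http://localhost:8080",
                "What is the internal code for Project X?");
        System.out.println(request.method() + " " + request.uri());
        // To execute: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

A substring check like mentionsFact tolerates rephrased answers, which is why it suits non-deterministic output better than an exact-match assertion.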
Also, monitor the logs. You can enable debug logging for Spring AI:
logging:
  level:
    org.springframework.ai: DEBUG
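This allows you to see what prompt is being sent to OpenAI, including the stuffed context. Depending on the Spring AI version, you may also need to register a logging advisor (such as SimpleLoggerAdvisor) on the ChatClient before prompts appear in the log. This is vital for debugging “hallucinations”.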
Conclusion: The Future of Spring AI Knowledge Bases #
We have successfully built an Enterprise Knowledge Base. We moved from raw documents to a semantic search engine capable of answering complex queries with high accuracy.
The power of using Spring AI lies in its abstraction. Today we used OpenAI and Postgres. Tomorrow, if privacy regulations change, we can switch to Ollama (Llama 3) and Milvus or Qdrant by swapping the starter dependencies and changing a few lines of configuration in application.yml, without rewriting our business logic.
Next Steps for Your Project #
- Frontend: Build a React/Angular UI to visualize the chat.
- Streaming: Enable Flux<String> responses to give the “typing” effect users expect.
- Citations: Return the source filename along with the answer so users can verify the truth.
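As a starting point for the citations item, the retrieval step could carry each chunk's filename (the metadata key set during ingestion) through to the prompt and the response. This is a sketch under assumptions: the Chunk and CitedContext records and the [source: ...] label format are hypothetical, not a Spring AI API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Spring AI.
final class CitationSketch {

    record Chunk(String text, String filename) {}

    record CitedContext(String context, List<String> sources) {}

    /** Labels each chunk with its source and collects distinct filenames in rank order. */
    static CitedContext build(List<Chunk> chunks) {
        StringBuilder context = new StringBuilder();
        Set<String> sources = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (Chunk chunk : chunks) {
            if (context.length() > 0) {
                context.append("\n\n");
            }
            context.append("[source: ").append(chunk.filename()).append("]\n").append(chunk.text());
            sources.add(chunk.filename());
        }
        return new CitedContext(context.toString(), List.copyOf(sources));
    }

    public static void main(String[] args) {
        CitedContext cited = build(List.of(
                new Chunk("Deployments run nightly.", "ops.pdf"),
                new Chunk("Rollbacks need approval.", "ops.pdf"),
                new Chunk("Vacation requests go through HR.", "hr.pdf")));
        System.out.println(cited.sources());
    }
}
```

The context string would replace the plain joined chunks in the prompt, and the sources list could be returned next to the answer in the API response.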
Building a Spring AI knowledge base is no longer a science project; it is becoming a core requirement for modern enterprise architecture. By following this guide, you have laid the foundation for a system that transforms static data into actionable intelligence.
If you found this guide helpful, check out our upcoming series on “Spring AI with Local LLMs (Ollama)” and “Event-Driven AI with Spring Cloud Stream”.