Introduction #
The landscape of enterprise software development has shifted seismically with the adoption of Generative AI. However, for Java developers in the enterprise space, the challenge has not been accessing these models, but integrating them securely, maintainably, and efficiently into existing Spring Boot ecosystems.
While OpenAI’s public API ignited the revolution, Azure OpenAI has become the standard for large-scale corporate deployments. It offers the same powerful models (GPT-4o, GPT-3.5 Turbo) but wraps them in the security, compliance, and regional availability guarantees of the Microsoft Azure cloud.
Meanwhile, Spring AI has emerged as the de facto AI framework for Spring developers. It provides a portable, modular abstraction layer that decouples your business logic from specific AI providers.
In this guide, we will move beyond simple “Hello World” chat demos. We will engineer a production-ready solution using Spring AI and Azure OpenAI, addressing real-world concerns like Managed Identities, Structured Outputs, Retrieval Augmented Generation (RAG), and Observability.
Why Azure OpenAI with Spring AI? #
Before writing code, it is crucial to understand the architectural fit.
The Enterprise Gap #
Directly consuming public AI APIs often violates corporate data policies. Issues include:
- Data Residency: Public APIs may process data in regions incompatible with GDPR or CCPA.
- Private Networking: Enterprises need APIs accessible only via Private Links/VNETs, not the open internet.
- Identity Management: Rotating static API keys is a security nightmare.
The Solution #
- Azure OpenAI resolves the infrastructure concerns (Private endpoints, RBAC, regional deployment).
- Spring AI resolves the application concerns (standard ChatClient interface, prompt templating, output parsing).
Together, they allow you to write standard Java code that adheres to strict enterprise compliance requirements.
Prerequisites and Infrastructure Setup #
To follow this guide, you will need:
- JDK 17 or 21 (Spring AI requires 17+).
- Spring Boot 3.2.x or 3.3.x.
- An Azure Subscription with OpenAI access enabled.
Step 1: Provisioning Azure OpenAI #
- Navigate to the Azure Portal.
- Create an Azure OpenAI resource. Note: Choose a region that supports the models you need (e.g., East US 2, Sweden Central).
- Critical Step: Once the resource is created, go to Model Deployments. You must deploy a model before you can call it.
  - Model: gpt-4o or gpt-35-turbo.
  - Deployment Name: Write this down (e.g., my-gpt4-deployment). In Azure, the API targets the deployment name, not the model name.
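The following JDK-only sketch shows why the deployment name matters: it builds the REST URL that chat requests target. The path pattern /openai/deployments/{deployment}/chat/completions?api-version=... follows the Azure OpenAI REST API. The class name AzureChatUrl and the api-version value 2024-02-01 in the usage are illustrative assumptions. In practice, the Azure SDK builds this URL for you.

```java
import java.net.URI;

/** Illustrative only: the Azure OpenAI SDK assembles this URL internally. */
final class AzureChatUrl {

    private AzureChatUrl() {}

    /**
     * Builds the chat-completions URL for an Azure OpenAI deployment.
     * The path segment is the deployment name chosen in the portal,
     * not the underlying model name.
     */
    static URI chatCompletions(String endpoint, String deploymentName, String apiVersion) {
        // Drop a trailing slash so the result never contains "//".
        String base = endpoint.endsWith("/") ? endpoint.substring(0, endpoint.length() - 1) : endpoint;
        return URI.create(base + "/openai/deployments/" + deploymentName
                + "/chat/completions?api-version=" + apiVersion);
    }
}
```

A wrong deployment name still produces a well-formed URL. That is why the mistake surfaces as a 404 at runtime rather than as a configuration error.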
Step 2: Key vs. Tokenless Security #
For development, you can copy the key and endpoint from the “Keys and Endpoint” blade. For production (covered later), we will replace keys with Microsoft Entra ID (formerly Azure Active Directory).
Project Initialization #
Let’s bootstrap a Spring Boot application. We will use Maven for dependency management.
Dependency Management #
Spring AI uses a BOM (Bill of Materials) to manage versions. If the General Availability (GA) release does not yet include the features you need, add the Spring Milestone or Snapshot repository to your build. A SNAPSHOT version, like the one below, only resolves from the Snapshot repository.
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version> <!-- Check for latest version -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- The Core Azure OpenAI Starter -->
    <!-- Spring AI 1.0 GA renamed this starter to spring-ai-starter-model-azure-openai -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
Configuration #
In your application.yml, configure the connection details. Note specifically that we use spring.ai.azure.openai.*, not the standard OpenAI namespace.
spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT} # e.g., https://my-resource.openai.azure.com/
        chat:
          options:
            deployment-name: my-gpt4-deployment # The name you chose in the Azure Portal
            temperature: 0.7
            max-tokens: 2000
Core Implementation: The Chat Client #
Spring AI 1.0 introduced a fluent ChatClient API that simplifies interaction significantly compared to the lower-level ChatModel.
Basic Configuration Bean #
First, configure the builder in a configuration class.
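import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
class AiConfig {
    // Spring AI auto-configures a ChatClient.Builder backed by the Azure OpenAI chat model
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You are a senior enterprise architect assistant proficient in Spring Boot and Cloud patterns.")
                .build();
    }
}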
The Controller Layer #
Here is a REST controller demonstrating a basic interaction.
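import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/api/v1/architect")
public class ArchitectureController {
    private final ChatClient chatClient;
    public ArchitectureController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }
    @GetMapping("/advice")
    public String getAdvice(@RequestParam String topic) {
        // The {topic} placeholder in the template is filled from param() below
        return chatClient.prompt()
                .user(u -> u.text("Give me three best practices for {topic}")
                        .param("topic", topic))
                .call()
                .content();
    }
}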
When you hit /api/v1/architect/advice?topic=Microservices, Spring AI renders the prompt template and sends the request to your Azure deployment through the Azure OpenAI Java SDK, which selects the API version. It then returns the generated text.
Enterprise Feature 1: Structured Outputs (JSON) #
In enterprise systems, we rarely want unstructured text blocks. We want JSON that maps to Java Objects (POJOs/Records) to drive business logic.
Spring AI provides the BeanOutputConverter.
Define the Data Contract #
Let’s say we want to analyze a legacy code snippet and extract technical debt items.
import java.util.List;
public record CodeAnalysis(
        String complexityLevel,
        List<String> securityRisks,
        List<String> refactoringSuggestions,
        double estimatedRefactoringHours
) {}
Implementing the Converter #
We simply tell the ChatClient to return this specific entity.
@PostMapping("/analyze")
public CodeAnalysis analyzeLegacyCode(@RequestBody String codeSnippet) {
    return chatClient.prompt()
            .user(u -> u.text("Analyze the following Java code for technical debt: \n\n {code}")
                    .param("code", codeSnippet))
            .call()
            .entity(CodeAnalysis.class); // Converts the model's JSON reply into a CodeAnalysis record
}
How it works:
- Spring AI automatically appends format instructions to the prompt, including a JSON Schema derived from CodeAnalysis, telling the LLM to output valid JSON matching it.
- Where the model supports it, you can additionally enable a JSON response_format through the chat options. entity() works from the prompt instructions alone.
- Upon receiving the response, Spring AI deserializes the JSON string into your Java record.
This turns the LLM into a probabilistic transformation engine, capable of integrating directly into data pipelines.
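Because the output is probabilistic, JSON that deserializes is not necessarily data you should trust. One option, which is not something Spring AI does for you, is to validate in the record's compact constructor. Jackson deserializes records through the canonical constructor, so a rejection should surface as a conversion error. The allowed complexity levels below are a hypothetical convention that you would also state in the prompt.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * CodeAnalysis with a compact constructor that rejects implausible model output.
 * LOW/MEDIUM/HIGH is a hypothetical convention, not part of the original contract.
 */
record CodeAnalysis(
        String complexityLevel,
        List<String> securityRisks,
        List<String> refactoringSuggestions,
        double estimatedRefactoringHours) {

    static final Set<String> LEVELS = Set.of("LOW", "MEDIUM", "HIGH");

    CodeAnalysis {
        if (complexityLevel == null || !LEVELS.contains(complexityLevel.toUpperCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Unexpected complexityLevel: " + complexityLevel);
        }
        // Normalise casing so downstream code can compare against the constants.
        complexityLevel = complexityLevel.toUpperCase(Locale.ROOT);
        if (Double.isNaN(estimatedRefactoringHours) || estimatedRefactoringHours < 0) {
            throw new IllegalArgumentException("Hours must be non-negative");
        }
        // Treat missing arrays as empty; List.copyOf also rejects null elements.
        securityRisks = securityRisks == null ? List.of() : List.copyOf(securityRisks);
        refactoringSuggestions = refactoringSuggestions == null ? List.of() : List.copyOf(refactoringSuggestions);
    }
}
```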
Enterprise Feature 2: Retrieval Augmented Generation (RAG) #
Deploying a chat model is not enough; enterprises need AI that knows their data. RAG is the pattern of retrieving relevant private data and injecting it into the prompt context.
With Azure, the natural choice for the vector store is Azure AI Search.
Dependencies #
Add the vector store dependency:
<!-- Spring AI 1.0 GA renamed this starter to spring-ai-starter-vector-store-azure -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-vector-store-spring-boot-starter</artifactId>
</dependency>
Configuration #
You need an Azure AI Search resource.
spring:
  ai:
    vectorstore:
      azure:
        url: ${AZURE_SEARCH_ENDPOINT}
        api-key: ${AZURE_SEARCH_KEY}
        index-name: spring-ai-docs
        initialize-schema: true # Creates index if missing
Ingesting Data (ETL) #
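Before querying, you must load your documents. Reading PDFs with TikaDocumentReader requires the spring-ai-tika-document-reader dependency. Embedding the chunks requires an embedding model deployment in Azure OpenAI, configured under spring.ai.azure.openai.embedding.options.deployment-name.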
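import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
@Service
public class IngestionService {
    private final VectorStore vectorStore;
    @Value("classpath:company-policies.pdf")
    private Resource policyPdf;
    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }
    public void ingest() {
        // 1. Read: Apache Tika extracts text from the PDF
        TikaDocumentReader reader = new TikaDocumentReader(policyPdf);
        List<Document> documents = reader.get();
        // 2. Split: token-based chunks keep each piece within the model's context limits
        TokenTextSplitter splitter = new TokenTextSplitter();
        List<Document> splitDocs = splitter.apply(documents);
        // 3. Store: embeds each chunk with Azure OpenAI Embeddings and writes it to Azure AI Search
        vectorStore.add(splitDocs);
    }
}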
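To make the splitting step concrete, here is a simplified chunker in the spirit of TokenTextSplitter. It is a sketch under a loud assumption: whitespace-separated words stand in for tokens, whereas TokenTextSplitter counts real tokens with a tokenizer. The overlap parameter and the WordChunker name are illustrative and not Spring AI options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sliding-window chunker. Assumption: words approximate tokens,
 * so chunk sizes will differ from a real tokenizer's counts.
 */
final class WordChunker {

    private WordChunker() {}

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > overlap >= 0");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far each window advances
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

The overlap repeats a few words at each boundary, so a sentence cut in half still appears intact in at least one chunk when it is short enough.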
Retrieval (The “R” in RAG) #
Now, modify your ChatClient to look up data before answering.
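// Assumes the controller also has a VectorStore field injected through its constructor
@GetMapping("/policy")
public String askPolicy(@RequestParam String question) {
    return chatClient.prompt()
            .user(question)
            // Newer Spring AI releases replace SearchRequest.defaults() with SearchRequest.builder().build()
            .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
            .call()
            .content();
}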
The QuestionAnswerAdvisor intercepts the request and embeds the user’s question. It then queries Azure AI Search for similar document chunks, appends them to the user message as context (“Use the following context…”), and calls Azure OpenAI.
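The retrieve-then-augment flow can be sketched in memory. This is a conceptual model only: Azure AI Search performs the real similarity search, and Spring AI's advisor uses its own prompt wording. The Chunk type, the topK value, and the prompt text below are hypothetical. Similarity is plain cosine similarity over embedding vectors.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory illustration of RAG retrieval and prompt augmentation (hypothetical names). */
final class MiniRag {

    record Chunk(String text, float[] embedding) {}

    private MiniRag() {}

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("Dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the topK chunks most similar to the query embedding, best first. */
    static List<Chunk> retrieve(float[] query, List<Chunk> store, int topK) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(query, c.embedding())).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    /** Appends the retrieved context to the user's question. */
    static String augment(String question, List<Chunk> context) {
        String joined = context.stream().map(Chunk::text).collect(Collectors.joining("\n---\n"));
        return question + "\n\nUse the following context to answer.\n---\n" + joined + "\n---";
    }
}
```

Limiting topK is what keeps the prompt small: only the best-matching chunks are sent, not the whole index.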
Security: Moving to Managed Identities #
Using api-key in production is a security smell. The “Spring Way” on Azure is using DefaultAzureCredential.
Why Managed Identity? #
- No Secrets: No API keys in application.yml or Git.
- RBAC: Grant specific permissions (Cognitive Services OpenAI User) to the App Service identity.
- Rotation: Azure handles credential rotation automatically.
Implementation #
Remove the api-key from your YAML. Add the Azure Identity dependency:
<!-- If no imported BOM manages this version, specify one or import com.azure:azure-sdk-bom -->
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
</dependency>
Spring AI’s Azure auto-configuration detects the absence of an API key and the presence of azure-identity. It will attempt to obtain a token using the DefaultAzureCredential chain (Environment Vars -> Workload Identity -> Managed Identity -> Azure CLI).
Ensure your Azure App Service (or Container App) has the “Cognitive Services OpenAI User” role assigned on the Azure OpenAI Resource.
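The following is a conceptual model of how a credential chain like DefaultAzureCredential behaves. It tries each source in order, uses the first one that yields a token, and fails only after every source has been tried. This is not the Azure SDK API. The CredentialChain, Source, and Result names are hypothetical. The real chain requests tokens for the scope https://cognitiveservices.azure.com/.default.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical model of an ordered credential chain; not the Azure SDK API. */
final class CredentialChain {

    record Source(String name, Supplier<Optional<String>> tokenSupplier) {}

    record Result(String sourceName, String token) {}

    private final List<Source> sources;

    CredentialChain(List<Source> sources) {
        this.sources = List.copyOf(sources);
    }

    /** Returns the token from the first source that can supply one. */
    Result getToken() {
        StringBuilder tried = new StringBuilder();
        for (Source source : sources) {
            Optional<String> token = source.tokenSupplier().get();
            if (token.isPresent()) {
                return new Result(source.name(), token.get());
            }
            tried.append(source.name()).append(' ');
        }
        // Fail only after every source has been tried.
        throw new IllegalStateException("No credential available; tried: " + tried.toString().strip());
    }
}
```

This ordering explains why the same code works on a laptop (Azure CLI login) and in Azure (Managed Identity) without configuration changes.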
Resilience and Observability #
GenAI APIs are slower and more prone to transient errors (Rate Limits, Overloaded Model) than standard REST APIs.
Handling Rate Limits (HTTP 429) #
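Azure OpenAI enforces Tokens-Per-Minute (TPM) quotas per deployment and returns HTTP 429 when you exceed them. Spring AI's Azure module calls Azure through the Azure OpenAI Java SDK (OpenAIClient), not Spring's RestClient, so the SDK's HTTP pipeline governs retries.
The SDK retries transient failures by default, but for Azure you often want to tune the exponential backoff to your quota.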
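import com.azure.core.http.policy.ExponentialBackoffOptions;
import com.azure.core.http.policy.RetryOptions;
import java.time.Duration;
// Assumes your Spring AI version exposes AzureOpenAIClientBuilderCustomizer,
// a hook for tuning the Azure SDK's OpenAIClientBuilder before the client is built.
@Bean
public AzureOpenAIClientBuilderCustomizer retryCustomizer() {
    return clientBuilder -> clientBuilder.retryOptions(
            new RetryOptions(new ExponentialBackoffOptions()
                    .setMaxRetries(5)                         // retries after the first attempt
                    .setBaseDelay(Duration.ofSeconds(1))      // initial backoff delay
                    .setMaxDelay(Duration.ofSeconds(30))));   // upper bound on any single delay
}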
Note: Spring AI's spring.ai.retry.* properties (e.g., spring.ai.retry.max-attempts) configure its own RetryTemplate for RestClient-based modules. Check the documentation for your version before assuming they apply to the Azure OpenAI client.
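For code paths you control yourself, here is a minimal sketch of exponential backoff for HTTP 429 and 5xx failures. Each delay doubles from a base value, is capped, and gets up to 10% random jitter. HttpStatusException is a hypothetical stand-in for whatever exception your client throws. A production loop should also honor the Retry-After header that Azure typically returns with 429 responses.

```java
import java.time.Duration;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

/** Minimal exponential-backoff retry loop; HttpStatusException is hypothetical. */
final class Backoff {

    static final class HttpStatusException extends RuntimeException {
        final int status;
        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    private static final Set<Integer> RETRYABLE = Set.of(429, 500, 502, 503, 504);

    private Backoff() {}

    /** Delay before retry number {@code attempt} (1-based): base * 2^(attempt-1), capped, plus jitter. */
    static Duration delay(int attempt, Duration base, Duration max, Random random) {
        long exp = base.toMillis() * (1L << Math.min(attempt - 1, 20));
        long capped = Math.min(exp, max.toMillis());
        long jitter = random == null ? 0 : (long) (random.nextDouble() * capped * 0.1);
        return Duration.ofMillis(capped + jitter);
    }

    /** Runs the action, retrying retryable statuses until maxAttempts total attempts. */
    static <T> T call(Supplier<T> action, int maxAttempts, Duration base, Duration max, Random random)
            throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (HttpStatusException e) {
                if (!RETRYABLE.contains(e.status) || attempt >= maxAttempts) {
                    throw e; // not retryable, or out of attempts
                }
                Thread.sleep(delay(attempt, base, max, random).toMillis());
            }
        }
    }
}
```

The jitter spreads retries from many instances over time, so they do not all hit the deployment again in the same instant after a 429.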
Monitoring Token Usage #
Cost control is vital. Azure bills per input (prompt) and output (completion) token, and Spring AI exposes these counts as usage metadata.
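// Assumes an SLF4J Logger field named logger
ChatResponse response = chatClient.prompt().user("...").call().chatResponse();
Usage usage = response.getMetadata().getUsage();
// Later Spring AI versions replace getGenerationTokens() with getCompletionTokens()
logger.info("Prompt Tokens: {}, Generation Tokens: {}",
        usage.getPromptTokens(),
        usage.getGenerationTokens());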
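The logged counts become more useful when they are turned into an estimated cost. The sketch below uses placeholder per-1,000-token prices passed in by the caller. Look up the current Azure OpenAI pricing for your model, region, and deployment type; the TokenPricing name is hypothetical. It uses BigDecimal to avoid floating-point rounding in monetary sums.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Estimates the cost of one call from token counts; prices are caller-supplied placeholders. */
record TokenPricing(BigDecimal inputPer1K, BigDecimal outputPer1K) {

    BigDecimal estimate(long promptTokens, long generationTokens) {
        BigDecimal input = inputPer1K.multiply(BigDecimal.valueOf(promptTokens));
        BigDecimal output = outputPer1K.multiply(BigDecimal.valueOf(generationTokens));
        // Prices are per 1,000 tokens, so divide the combined total once at the end.
        return input.add(output).divide(BigDecimal.valueOf(1000), 6, RoundingMode.HALF_UP);
    }
}
```

You could call pricing.estimate(usage.getPromptTokens(), usage.getGenerationTokens()) next to the log statement and record the result as a Micrometer counter. The boxed token counts unbox to long automatically.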
For a comprehensive view, integrate Micrometer. Spring AI instruments the chat client automatically. If you use Azure Monitor (Application Insights), you will see traces for your AI calls, including latency and token counts, appearing alongside your SQL and HTTP metrics.
Best Practices for Production #
1. Context Window Management #
The GPT-4o model has a large context window (128k tokens), but filling it on every request is expensive and slow.
- Don’t dump the whole database. Use RAG to fetch only the top 3-5 relevant chunks.
- Summarize History. If implementing a chat bot, do not send the entire conversation history forever. Summarize older turns or use a rolling window of the last 10 messages.
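The rolling-window approach is small enough to sketch directly. The sketch keeps the system prompt plus only the last N messages and evicts the oldest first. The Message record is a hypothetical stand-in for Spring AI's message types. Recent Spring AI versions also ship chat-memory advisors (for example MessageChatMemoryAdvisor), so check your version before rolling your own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** System prompt plus a bounded window of recent messages; Message is hypothetical. */
final class RollingWindow {

    record Message(String role, String content) {}

    private final Message system;
    private final int maxMessages;
    private final Deque<Message> history = new ArrayDeque<>();

    RollingWindow(Message system, int maxMessages) {
        if (maxMessages <= 0) throw new IllegalArgumentException("maxMessages must be positive");
        this.system = system;
        this.maxMessages = maxMessages;
    }

    void add(Message message) {
        history.addLast(message);
        while (history.size() > maxMessages) {
            history.removeFirst(); // evict the oldest message
        }
    }

    /** Messages for the next call: the system prompt first, then the window in order. */
    List<Message> promptMessages() {
        List<Message> out = new ArrayList<>(history.size() + 1);
        out.add(system);
        out.addAll(history);
        return out;
    }
}
```

The system prompt lives outside the window on purpose, so it is never evicted however long the conversation runs.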
2. Prompt Engineering as Code #
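Do not hardcode prompts in Java strings. Keep them in template files such as src/main/resources/prompts/system-prompt.st, inject them with @Value("classpath:/prompts/system-prompt.st") Resource systemPrompt, and pass the resource to chatClient.prompt().system(systemPrompt).
This keeps prompts reviewable and versioned separately from Java logic, and it facilitates A/B testing of prompts. Classpath resources are packaged into the JAR, so changing prompts without a rebuild requires loading them from an external location (for example, a file: URL) instead.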
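Externalized templates use {name} placeholders, as in the {topic} example earlier. The following is a minimal renderer for that syntax. It is not Spring AI's PromptTemplate, which uses StringTemplate, and the PromptRenderer name is hypothetical. Unlike a silent substitution, it fails fast when a template references a parameter the caller did not supply.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Renders {name} placeholders and rejects missing parameters (hypothetical helper). */
final class PromptRenderer {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)}");

    private PromptRenderer() {}

    static String render(String template, Map<String, ?> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            if (!params.containsKey(name)) {
                throw new IllegalArgumentException("Missing prompt parameter: " + name);
            }
            // quoteReplacement stops '$' or '\' in user input from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(params.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Loading the template text itself is a single call such as Files.readString(path) for a file on disk. Failing on missing parameters turns a silently broken prompt into an immediate, testable error.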
3. One Deployment per Environment #
An Azure OpenAI resource can host multiple model deployments, each with its own name. You can use a gpt-35-turbo deployment for your “Dev” environment and gpt-4o for “Prod”, or test new model versions (e.g., 0613 vs 1106) side by side. Spring AI supports this via the deployment-name property, which can be overridden per environment profile (e.g., in application-dev.yml and application-prod.yml).
4. Content Safety #
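Azure applies content filters (Hate, Sexual, Violence, Self-harm) by default. A prompt that trips a filter is rejected with HTTP 400 and the error code content_filter, while a filtered completion ends with the finish reason content_filter. Ensure your global exception handler catches the resulting client exception (with the Azure SDK, typically an HttpResponseException) so the application fails gracefully when user input violates policy.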
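A global handler needs a rule for turning those signals into a user-facing response. The sketch below assumes you have already extracted the HTTP status, error code, and finish reason from the exception or response. The SafetyClassifier and Outcome names and the messages are hypothetical.

```java
/** Maps Azure OpenAI outcomes to user-facing results; names and messages are hypothetical. */
final class SafetyClassifier {

    enum Outcome { OK, PROMPT_BLOCKED, COMPLETION_FILTERED, OTHER_ERROR }

    private SafetyClassifier() {}

    static Outcome classify(int httpStatus, String errorCode, String finishReason) {
        if (httpStatus == 400 && "content_filter".equals(errorCode)) {
            return Outcome.PROMPT_BLOCKED; // input rejected before any generation
        }
        if (httpStatus >= 400) {
            return Outcome.OTHER_ERROR;
        }
        if ("content_filter".equals(finishReason)) {
            return Outcome.COMPLETION_FILTERED; // output was withheld or cut off by the filter
        }
        return Outcome.OK;
    }

    static String userMessage(Outcome outcome) {
        return switch (outcome) {
            case OK -> "";
            case PROMPT_BLOCKED -> "Your request could not be processed because it violates the content policy.";
            case COMPLETION_FILTERED -> "The response was withheld by the content policy. Please rephrase your request.";
            case OTHER_ERROR -> "The AI service returned an error. Please try again later.";
        };
    }
}
```

Keeping prompt blocks separate from filtered completions matters: the first is the user's input, the second is the model's output, and they usually warrant different logging and messaging.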
Troubleshooting Guide #
Error: 404 Not Found
- Cause: Usually, the endpoint is correct, but the deployment-name is wrong.
- Fix: In Azure OpenAI, you create a resource, then a model deployment. The Spring property spring.ai.azure.openai.chat.options.deployment-name must match the custom name you gave the deployment, not the generic model name (like “gpt-35-turbo”).
Error: 401 Unauthorized
- Cause: A wrong or rotated API key. If using Managed Identity, a new role assignment can take 5-10 minutes to propagate.
- Fix: Verify the role “Cognitive Services OpenAI User” is assigned to the correct principal.
Error: BeanOutputConverter fails
- Cause: The LLM was “lazy” and didn’t return valid JSON.
- Fix: Iterate on your prompt. Add “You must respond strictly with valid JSON.” Reducing temperature to 0.1 usually makes the formatting more consistent.
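When entity() fails, it can help to fetch the raw content() and parse it yourself, for example to log the failing reply before retrying. Models sometimes wrap otherwise valid JSON in a Markdown code fence or add a sentence before it. The following heuristic sketch uses a hypothetical JsonExtractor helper. It keeps the text between the first '{' and the last '}', which assumes the reply contains a single top-level JSON object.

```java
/** Heuristic: extracts a single top-level JSON object from a model reply (hypothetical helper). */
final class JsonExtractor {

    private JsonExtractor() {}

    static String extractObject(String reply) {
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("No JSON object found in model reply");
        }
        // Everything outside the outermost braces (fences, prose) is discarded.
        return reply.substring(start, end + 1);
    }
}
```

You could then pass the result to Jackson's ObjectMapper.readValue(json, CodeAnalysis.class). If that still fails, retry the call with the stricter prompt and lower temperature described above.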
Conclusion #
Integrating Azure OpenAI with Spring AI allows Java enterprises to leapfrog into the Generative AI era without discarding their established patterns of security, observability, and architecture.
By combining the portability of Spring AI with the governance of Azure, you build systems that are not just impressive demos, but sustainable, secure, and valuable business assets.
The future of Spring is intelligent. Start building your ChatClient today.