Azure OpenAI + Spring AI: Enterprise Deployment Guide

Introduction

The landscape of enterprise software development has shifted seismically with the adoption of Generative AI. However, for Java developers in the enterprise space, the challenge has not been accessing these models, but integrating them securely, maintainably, and efficiently into existing Spring Boot ecosystems.

While OpenAI's public API ignited the revolution, Azure OpenAI has become a common choice for large-scale corporate deployments. It offers many of the same models (such as GPT-4o and GPT-3.5 Turbo) wrapped in the security, compliance, and regional-availability guarantees of the Microsoft Azure cloud.

Meanwhile, Spring AI has emerged as the Spring ecosystem's framework for AI integration. It provides a portable, modular abstraction layer that decouples your business logic from specific AI providers.

In this guide, we will move beyond simple "Hello World" chat demos. We will engineer a production-ready solution using Spring AI and Azure OpenAI, addressing real-world concerns like Managed Identities, Structured Outputs, Retrieval Augmented Generation (RAG), and Observability.

Why Azure OpenAI with Spring AI?

Before writing code, it is crucial to understand the architectural fit.

The Enterprise Gap

Directly consuming public AI APIs often violates corporate data policies. Issues include:

Data Residency: Public APIs may process data in regions incompatible with GDPR or CCPA.
Private Networking: Enterprises need APIs accessible only via Private Links/VNETs, not the open internet.
Identity Management: Rotating static API keys is a security nightmare.

The Solution

Azure OpenAI resolves the infrastructure concerns (Private endpoints, RBAC, regional deployment).
Spring AI resolves the application concerns (Standard ChatClient interface, prompt templating, output parsing).

Together, they allow you to write standard Java code that adheres to strict enterprise compliance requirements.

Prerequisites and Infrastructure Setup

To follow this guide, you will need:

JDK 17 or 21 (Spring AI requires 17+).
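Spring Boot 3.x. Check the Spring AI release notes for the supported Boot line; the 1.0 GA release targets Spring Boot 3.4.x and 3.5.x.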
An Azure Subscription with OpenAI access enabled.

Step 1: Provisioning Azure OpenAI

Navigate to the Azure Portal.
Create an Azure OpenAI resource. Note: Choose a region that supports the models you need (e.g., East US 2, Sweden Central).
Critical Step: Once the resource is created, go to Model Deployments. You must deploy a model to use it.
- Model: gpt-4o or gpt-35-turbo.
- Deployment Name: Write this down (e.g., my-gpt4-deployment). In Azure, the API targets the Deployment Name, not the Model Name.
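To see why the deployment name matters, consider how the REST path for a chat-completions call is formed: the deployment name is a path segment, and the model name does not appear at all. Spring AI and the Azure SDK build this URL for you. The sketch below only makes the structure visible; DeploymentUrlSketch is a hypothetical name, and 2024-06-01 is just an example api-version (use the one your client targets).

```java
import java.net.URI;

public class DeploymentUrlSketch {

    // Builds the chat-completions URL for an Azure OpenAI deployment. The deployment
    // name (not the model name) is part of the path, so a wrong name yields 404.
    static URI chatCompletionsUri(String endpoint, String deploymentName, String apiVersion) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        return URI.create(base + "openai/deployments/" + deploymentName
                + "/chat/completions?api-version=" + apiVersion);
    }

    public static void main(String[] args) {
        System.out.println(chatCompletionsUri(
                "https://my-resource.openai.azure.com/", "my-gpt4-deployment", "2024-06-01"));
        // https://my-resource.openai.azure.com/openai/deployments/my-gpt4-deployment/chat/completions?api-version=2024-06-01
    }
}
```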

Step 2: Key-Based vs. Keyless Authentication

For development, you can grab the Key and Endpoint from the "Keys and Endpoint" blade. For production (covered later), we will use keyless authentication with Microsoft Entra ID (formerly Azure Active Directory).

Project Initialization

Let's bootstrap a Spring Boot application. We will use Maven for dependency management.

Dependency Management

Spring AI uses a BOM (Bill of Materials) to manage versions. The examples below assume the 1.0 GA line, which is published to Maven Central. If you need a milestone or snapshot build instead, also add the Spring Milestones (https://repo.spring.io/milestone) or Snapshots (https://repo.spring.io/snapshot) repository to your POM.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version> <!-- Check for the latest release -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- The core Azure OpenAI starter (named spring-ai-azure-openai-spring-boot-starter before 1.0.0-M7) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-azure-openai</artifactId>
    </dependency>
</dependencies>

Configuration

In your application.yml, configure the connection details. Note that the properties live under spring.ai.azure.openai.*, not the spring.ai.openai.* namespace used by the plain OpenAI starter.

spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT} # e.g., https://my-resource.openai.azure.com/
        chat:
          options:
            deployment-name: my-gpt4-deployment # The name you chose in Azure Portal
            temperature: 0.7
            max-tokens: 2000

Core Implementation: The Chat Client

Spring AI provides a fluent ChatClient API that is considerably simpler to use than the lower-level ChatModel interface underneath it.

Basic Configuration Bean

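First, build a ChatClient bean from the ChatClient.Builder that the starter auto-configures.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AiConfig {

    // ChatClient.Builder is auto-configured by the Azure OpenAI starter;
    // defaultSystem() sets the system prompt sent with every request from this client.
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You are a senior enterprise architect assistant proficient in Spring Boot and Cloud patterns.")
                .build();
    }
}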

The Controller Layer

Here is a REST controller demonstrating a basic interaction.

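import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/v1/architect")
public class ArchitectureController {

    private final ChatClient chatClient;

    public ArchitectureController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/advice")
    public String getAdvice(@RequestParam String topic) {
        return chatClient.prompt()
                // {topic} is a template placeholder filled in from .param()
                .user(u -> u.text("Give me three best practices for {topic}")
                        .param("topic", topic))
                .call()      // blocking call to the Azure OpenAI deployment
                .content();  // the generated text of the first result
    }
}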

When you call /api/v1/architect/advice?topic=Microservices, Spring AI renders the prompt template, sends the request to your deployment (through the Azure OpenAI Java SDK it wraps), and returns the generated text.
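To make the {topic} placeholder concrete: .param() supplies a value that Spring AI substitutes into the template before the request is sent. The stdlib-only sketch below mimics that rendering step with a regular expression. Spring AI's real renderer (built on the StringTemplate library in current versions) supports more syntax, so TemplateSketch and render() are hypothetical illustrations, not Spring AI's API.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {

    // Matches placeholders such as {topic}; the group captures the parameter name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces each {name} with its value and fails fast when a parameter is missing.
    static String render(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        return m.replaceAll(match -> {
            String value = params.get(match.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing template parameter: " + match.group(1));
            }
            return Matcher.quoteReplacement(value); // treat $ and \ in user input literally
        });
    }

    public static void main(String[] args) {
        System.out.println(render("Give me three best practices for {topic}",
                Map.of("topic", "Microservices")));
        // Give me three best practices for Microservices
    }
}
```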

Enterprise Feature 1: Structured Outputs (JSON)

In enterprise systems, we rarely want unstructured text blocks. We want JSON that maps to Java Objects (POJOs/Records) to drive business logic.

Spring AI provides BeanOutputConverter for this, and ChatClient's .entity() method applies it for you.

Define the Data Contract

Let's say we want to analyze a legacy code snippet and extract technical debt items.

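import java.util.List;

// Target type for structured output; Spring AI derives a JSON schema from these components.
public record CodeAnalysis(
    String complexityLevel,
    List<String> securityRisks,
    List<String> refactoringSuggestions,
    double estimatedRefactoringHours
) {}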

Implementing the Converter

We simply tell the ChatClient to return this specific entity.

// Add to ArchitectureController (also import PostMapping and RequestBody).
@PostMapping("/analyze")
public CodeAnalysis analyzeLegacyCode(@RequestBody String codeSnippet) {
    return chatClient.prompt()
            .user(u -> u.text("Analyze the following Java code for technical debt:\n\n{code}")
                    .param("code", codeSnippet))
            .call()
            .entity(CodeAnalysis.class); // adds JSON format instructions, then maps the reply onto CodeAnalysis
}

How it works:

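Spring AI automatically appends format instructions to the prompt, telling the LLM to output JSON that matches a JSON Schema derived from CodeAnalysis.
By default it relies on these prompt instructions alone; it does not automatically switch on the model's JSON mode (response_format), although you can enable that through the Azure OpenAI chat options if your deployment supports it.
Upon receiving the response, it deserializes the JSON string into your Java record.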

This turns the LLM into a probabilistic transformation engine, capable of integrating directly into data pipelines.
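The format-instruction step can be illustrated without any AI library. The sketch below reflects over a record's components and builds a prompt suffix describing the expected JSON fields. It is a simplified stand-in: BeanOutputConverter generates a full JSON Schema rather than this loose field list, and FormatInstructionsSketch is a hypothetical name.

```java
import java.lang.reflect.RecordComponent;
import java.util.List;
import java.util.StringJoiner;

public class FormatInstructionsSketch {

    // Same shape as the CodeAnalysis record above, nested here to stay self-contained.
    public record CodeAnalysis(
            String complexityLevel,
            List<String> securityRisks,
            List<String> refactoringSuggestions,
            double estimatedRefactoringHours) {}

    // Maps a Java type to a rough JSON type name (simplified; not full JSON Schema).
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == double.class || type == Double.class || type == float.class
                || type == int.class || type == Integer.class
                || type == long.class || type == Long.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        if (List.class.isAssignableFrom(type)) return "array";
        return "object";
    }

    // Builds a prompt suffix asking for a JSON object with the record's fields, in declaration order.
    static String formatInstructions(Class<? extends Record> recordType) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (RecordComponent c : recordType.getRecordComponents()) {
            fields.add("\"" + c.getName() + "\": " + jsonType(c.getType()));
        }
        return "Respond only with a JSON object of the form " + fields
                + ". Do not include any explanation or markdown.";
    }

    public static void main(String[] args) {
        String userText = "Analyze the following Java code for technical debt: ...";
        System.out.println(userText + "\n\n" + formatInstructions(CodeAnalysis.class));
    }
}
```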

Enterprise Feature 2: Retrieval Augmented Generation (RAG)

Deploying a chat model is not enough; enterprises need AI that knows their data. RAG is an architecture that retrieves relevant private data and injects it into the prompt context.

With Azure, the natural choice for the vector store is Azure AI Search.

Dependencies

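Add the vector store starter and the Tika document reader used for PDF ingestion below (Spring AI 1.0 artifact names; before 1.0.0-M7 the starter was called spring-ai-azure-vector-store-spring-boot-starter):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-azure</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>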

Configuration

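You need an Azure AI Search resource and, because the documents are embedded with Azure OpenAI, an embedding model deployment (for example, text-embedding-ada-002) in your Azure OpenAI resource.

spring:
  ai:
    azure:
      openai:
        embedding:
          options:
            deployment-name: my-embedding-deployment # hypothetical name: the embedding deployment you created
    vectorstore:
      azure:
        url: ${AZURE_SEARCH_ENDPOINT}
        api-key: ${AZURE_SEARCH_KEY}
        index-name: spring-ai-docs
        initialize-schema: true # Creates index if missing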

Ingesting Data (ETL)

Before querying, you must load your documents.

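import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class IngestionService {

    private final VectorStore vectorStore;

    @Value("classpath:company-policies.pdf")
    private Resource policyPdf;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Call once (e.g., at startup or from an admin endpoint) to populate the index.
    public void ingest() {
        // 1. Read: Tika extracts the PDF's text into Documents
        TikaDocumentReader reader = new TikaDocumentReader(policyPdf);
        List<Document> documents = reader.get();

        // 2. Split: token-sized chunks keep each piece within embedding and context limits
        TokenTextSplitter splitter = new TokenTextSplitter();
        List<Document> splitDocs = splitter.apply(documents);

        // 3. Store: embeds each chunk with Azure OpenAI and writes it to Azure AI Search
        vectorStore.add(splitDocs);
    }
}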
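TokenTextSplitter counts real tokens with a tokenizer and prefers to end a chunk at a sentence boundary. The sketch below approximates that behaviour with whitespace-separated words standing in for tokens, so chunk sizes are only indicative. ChunkingSketch and its parameters are hypothetical and do not mirror TokenTextSplitter's defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkingSketch {

    // Splits text into chunks of at most maxWords words, preferring to end a chunk
    // at a sentence boundary once it already holds at least minWords words.
    static List<String> split(String text, int maxWords, int minWords) {
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        List<String> chunks = new ArrayList<>();
        int min = Math.max(1, minWords); // guarantees progress on every iteration
        int start = 0;
        while (start < words.size()) {
            int end = Math.min(start + maxWords, words.size());
            // Only look for a boundary if more text follows; the last chunk takes the rest.
            if (end < words.size()) {
                for (int i = end - 1; i >= start + min - 1; i--) {
                    String w = words.get(i);
                    if (w.endsWith(".") || w.endsWith("?") || w.endsWith("!")) {
                        end = i + 1;
                        break;
                    }
                }
            }
            chunks.add(String.join(" ", words.subList(start, end)));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        split("One two three. Four five six seven. Eight nine.", 5, 2)
                .forEach(System.out::println);
    }
}
```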

Retrieval (The "R" in RAG)

Now attach a QuestionAnswerAdvisor to the request so that the ChatClient looks up relevant data before answering.

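// Add to ArchitectureController; assumes a VectorStore is injected alongside the ChatClient.
// Spring AI 1.0 builder API (QuestionAnswerAdvisor ships in spring-ai-advisors-vector-store);
// earlier milestones used new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()).
@GetMapping("/policy")
public String askPolicy(@RequestParam String question) {
    return chatClient.prompt()
            .user(question)
            .advisors(QuestionAnswerAdvisor.builder(vectorStore)
                    .searchRequest(SearchRequest.builder().topK(4).build()) // the 4 most similar chunks
                    .build())
            .call()
            .content();
}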

The QuestionAnswerAdvisor intercepts the request, embeds the user's question, queries Azure AI Search for the most similar document chunks, adds them to the prompt as context (with an instruction along the lines of "use the following context..."), and then calls Azure OpenAI.
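The retrieval step boils down to a nearest-neighbour search over embedding vectors. The brute-force sketch below ranks chunks by cosine similarity and builds an augmented prompt. The three-dimensional vectors are made-up toy values (real embeddings have hundreds or thousands of dimensions), RetrievalSketch is a hypothetical name, and Azure AI Search typically uses an approximate index (HNSW) rather than scanning every vector.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class RetrievalSketch {

    // A stored chunk of text together with its embedding vector.
    record Chunk(String text, double[] embedding) {}

    // Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the k chunks most similar to the query embedding, best first.
    static List<Chunk> topK(List<Chunk> store, double[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(c.embedding(), query)).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Appends the retrieved chunks to the question as context.
    static String augment(String question, List<Chunk> context) {
        String joined = context.stream().map(Chunk::text).collect(Collectors.joining("\n"));
        return question + "\n\nUse the following context to answer:\n---\n" + joined + "\n---";
    }

    public static void main(String[] args) {
        List<Chunk> store = List.of(
                new Chunk("Remote work requires manager approval.", new double[]{0.9, 0.1, 0.0}),
                new Chunk("Expenses over 500 EUR need a receipt.", new double[]{0.0, 0.2, 0.9}),
                new Chunk("Laptops are refreshed every three years.", new double[]{0.1, 0.9, 0.1}));
        double[] query = {0.8, 0.2, 0.1}; // pretend embedding of "Can I work from home?"
        System.out.println(augment("Can I work from home?", topK(store, query, 1)));
    }
}
```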

Security: Moving to Managed Identities

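Using an api-key in production is a security smell. On Azure, the idiomatic alternative is Microsoft Entra ID authentication through DefaultAzureCredential, which picks up the application's managed identity.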

Why Managed Identity?

No Secrets: No API keys in application.yml or Git.
RBAC: Grant specific permissions (Cognitive Services OpenAI User) to the App Service Identity.
Rotation: Azure handles credential rotation automatically.

Implementation

Remove the api-key from your YAML. Add the Azure Identity dependency:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <!-- Specify a version unless a BOM such as com.azure:azure-sdk-bom manages it -->
</dependency>

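When no API key is configured, Spring AI's Azure OpenAI auto-configuration authenticates with Microsoft Entra ID instead. Depending on your Spring AI version, it either builds a DefaultAzureCredential itself or expects a TokenCredential bean in the context; if yours needs one, a @Bean returning new DefaultAzureCredentialBuilder().build() is enough. Check the reference documentation for your version. DefaultAzureCredential tries a chain of sources in order, including environment variables, workload identity, managed identity, and the Azure CLI, so the same code works on a developer laptop and in Azure.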

Ensure your Azure App Service (or Container App) has the "Cognitive Services OpenAI User" role assigned on the Azure OpenAI Resource.

Resilience and Observability

GenAI APIs are slower and more prone to transient errors (Rate Limits, Overloaded Model) than standard REST APIs.

Handling Rate Limits (HTTP 429)

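Azure OpenAI enforces Tokens-Per-Minute (TPM) quotas, with derived Requests-Per-Minute limits, per deployment and responds with HTTP 429 when you exceed them. Unlike Spring AI's RestClient-based providers, the Azure OpenAI integration sends requests through the Azure SDK's OpenAIClient, so retries are governed by the Azure SDK's HTTP pipeline, which retries 429 and transient 5xx responses with exponential backoff by default.

If you need a different backoff, you can configure the SDK's RetryOptions and ExponentialBackoffOptions on the OpenAIClientBuilder. The bean below is a sketch that assumes Spring AI's auto-configured OpenAIClientBuilder backs off when you define your own; verify this for your version.

// Imports: com.azure.ai.openai.OpenAIClientBuilder, com.azure.identity.DefaultAzureCredentialBuilder,
// com.azure.core.http.policy.ExponentialBackoffOptions, com.azure.core.http.policy.RetryOptions, java.time.Duration
@Bean
OpenAIClientBuilder openAIClientBuilder(@Value("${spring.ai.azure.openai.endpoint}") String endpoint) {
    return new OpenAIClientBuilder()
            .endpoint(endpoint)
            .credential(new DefaultAzureCredentialBuilder().build()) // keyless; use new AzureKeyCredential(key) for keys
            .retryOptions(new RetryOptions(new ExponentialBackoffOptions()
                    .setMaxRetries(5)                       // retries for 429 and transient 5xx responses
                    .setBaseDelay(Duration.ofSeconds(2))    // first delay; grows exponentially
                    .setMaxDelay(Duration.ofSeconds(60)))); // cap on any single delay
}

Note: The spring.ai.retry.* properties (such as spring.ai.retry.max-attempts) configure the RetryTemplate used by Spring AI's RestClient-based providers, such as the OpenAI starter. Check whether they affect the Azure OpenAI integration in your version before relying on them; the Azure SDK's retry policy applies regardless.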
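The exponential-backoff policy described above can be sketched in plain Java. The hypothetical RateLimitedException stands in for an HTTP 429, callWithRetry() prefers the server's Retry-After hint when one is present, and otherwise waits base * 2^(attempt - 1), capped at a maximum. Production policies, including the Azure SDK's, usually add random jitter, which is omitted here to keep the schedule deterministic.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class BackoffSketch {

    // Hypothetical stand-in for an HTTP 429 response from the model endpoint.
    static class RateLimitedException extends RuntimeException {
        final Duration retryAfter; // value of a Retry-After header, or null if absent

        RateLimitedException(Duration retryAfter) {
            this.retryAfter = retryAfter;
        }
    }

    // Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped at max.
    static Duration backoff(int attempt, Duration base, Duration max) {
        long millis = base.toMillis() << Math.min(attempt - 1, 20); // bound the shift to avoid overflow
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    // Calls `call`, retrying on RateLimitedException until maxAttempts calls have been made.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, Duration base, Duration max) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) throw e; // give up and surface the 429
                Duration wait = (e.retryAfter != null) ? e.retryAfter : backoff(attempt, base, max);
                Thread.sleep(wait.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RateLimitedException(null); // first two calls are throttled
            return "ok after " + calls[0] + " calls";
        }, 5, Duration.ofMillis(10), Duration.ofSeconds(1));
        System.out.println(result); // ok after 3 calls
        for (int a = 1; a <= 6; a++) {
            System.out.println(backoff(a, Duration.ofSeconds(2), Duration.ofSeconds(60)));
        }
    }
}
```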

Monitoring Token Usage

Cost control is vital. Azure charges by token input/output. Spring AI exposes usage metadata.

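// Assumes an SLF4J logger field, e.g. LoggerFactory.getLogger(YourService.class) (YourService is a placeholder).
ChatResponse response = chatClient.prompt().user("...").call().chatResponse();
Usage usage = response.getMetadata().getUsage();

// getCompletionTokens() was named getGenerationTokens() in earlier Spring AI milestones.
logger.info("Prompt Tokens: {}, Completion Tokens: {}, Total: {}",
    usage.getPromptTokens(),
    usage.getCompletionTokens(),
    usage.getTotalTokens());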
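Because input and output tokens are billed at different rates, a per-call cost estimate needs both counts that Usage reports. The sketch below uses made-up per-1,000-token prices (placeholders, not Azure's actual rates, which vary by model, region, and deployment type) and BigDecimal to avoid floating-point rounding in money arithmetic. TokenCostSketch is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TokenCostSketch {

    // Hypothetical prices per 1,000 tokens; substitute the real rates for your deployment.
    static final BigDecimal INPUT_PER_1K = new BigDecimal("0.0025");
    static final BigDecimal OUTPUT_PER_1K = new BigDecimal("0.0100");

    // Estimated cost of one call from its prompt and completion token counts.
    static BigDecimal estimateCost(long promptTokens, long completionTokens) {
        BigDecimal input = INPUT_PER_1K.multiply(BigDecimal.valueOf(promptTokens));
        BigDecimal output = OUTPUT_PER_1K.multiply(BigDecimal.valueOf(completionTokens));
        return input.add(output).divide(BigDecimal.valueOf(1000), 6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // e.g. 1,200 prompt tokens (system prompt + RAG context + question) and 300 completion tokens
        System.out.println(estimateCost(1200, 300)); // 0.006000
    }
}
```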

For a comprehensive view, integrate Micrometer. Spring AI instruments ChatClient and the underlying model calls with Micrometer Observations, so with Spring Boot Actuator on the classpath and an exporter for Azure Monitor (Application Insights), your AI calls can appear as traces and metrics, including latency and token counts, alongside your SQL and HTTP telemetry.

Best Practices for Production

1. Context Window Management

The GPT-4o model has a massive context window (128k tokens), but relying on it is expensive and slow.

Don't dump the whole database. Use RAG to fetch only the top 3-5 relevant chunks.
Summarize History. If you are implementing a chatbot, do not send the entire conversation history forever. Summarize older turns or use a rolling window of the last 10 messages.
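A rolling window is simple to implement: keep the system prompt pinned and evict the oldest turns once a limit is reached. Spring AI 1.0 ships chat-memory support for this (for example, MessageWindowChatMemory used with a memory advisor); the stdlib sketch below only shows the windowing logic, and RollingWindowSketch and Message are hypothetical types. Counting tokens instead of messages gives tighter control over context size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RollingWindowSketch {

    record Message(String role, String text) {}

    private final Message system;                       // always sent, never evicted
    private final Deque<Message> window = new ArrayDeque<>();
    private final int maxMessages;

    RollingWindowSketch(String systemPrompt, int maxMessages) {
        this.system = new Message("system", systemPrompt);
        this.maxMessages = maxMessages;
    }

    // Records a turn and evicts the oldest messages once the window is full.
    void add(String role, String text) {
        window.addLast(new Message(role, text));
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Messages to send with the next request: system prompt first, then the recent turns.
    List<Message> promptMessages() {
        List<Message> out = new ArrayList<>();
        out.add(system);
        out.addAll(window);
        return out;
    }

    public static void main(String[] args) {
        RollingWindowSketch history = new RollingWindowSketch("You are a helpful assistant.", 10);
        for (int i = 1; i <= 8; i++) {
            history.add("user", "question " + i);
            history.add("assistant", "answer " + i);
        }
        // 16 turns were added, but only the last 10 plus the system prompt are sent.
        System.out.println(history.promptMessages().size()); // 11
    }
}
```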

2. Prompt Engineering as Code

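Do not hardcode prompts in Java strings. Keep them as template files, such as src/main/resources/prompts/system-prompt.st, inject them with @Value("classpath:/prompts/system-prompt.st") Resource, and pass the Resource to ChatClient (for example, .system(systemPromptResource) on a request or .defaultSystem(...) on the builder). Prompts then live in reviewable files that you can version and A/B test independently of the Java code; if they must change without a redeploy, load them from an external location instead of the classpath.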

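3. Deployments per Environment

Azure OpenAI has no deployment slots in the App Service sense, but a resource can host several model deployments, and you can also use separate resources per environment. For example, you can point "Dev" at a gpt-35-turbo deployment and "Prod" at gpt-4o, or create two deployments pinned to different model versions (e.g., 0613 vs 1106) to test an upgrade safely. Spring AI supports this via the deployment-name property, which you can override per Spring profile (for example, in application-dev.yml and application-prod.yml).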

4. Content Safety

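Azure applies content filters (hate, sexual, violence, self-harm) by default. A prompt that trips a filter is rejected with HTTP 400 and the error code content_filter, while a filtered completion comes back with the finish reason content_filter. Make sure your global exception handler (e.g., a @ControllerAdvice) catches the exception raised in the first case (with the Azure SDK-based integration, typically com.azure.core.exception.HttpResponseException, unless your Spring AI version wraps it), and check the finish reason in the second, so that policy violations fail gracefully.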

Troubleshooting Guide

Error: 404 Not Found

Cause: Usually, the endpoint is correct, but the deployment-name is wrong.
Fix: In Azure OpenAI, you create a Resource, then a Model Deployment. The Spring configuration spring.ai.azure.openai.chat.options.deployment-name must match the custom name you gave the deployment, not the generic model name (such as gpt-35-turbo or gpt-4o).

Error: 401 Unauthorized

Cause: A wrong API key, or a key that has since been rotated. If you use a managed identity, a new role assignment can take several minutes to propagate.
Fix: Verify the role "Cognitive Services OpenAI User" is assigned to the correct principal.

Error: BeanOutputConverter fails

Cause: The LLM was "lazy" and didn't return valid JSON.
Fix: Iterate on your prompt, for example by adding "You must respond strictly with valid JSON." Lowering the temperature (e.g., to 0.1) makes the output more consistent and usually helps with formatting.

Conclusion

Integrating Azure OpenAI with Spring AI allows Java enterprises to leapfrog into the Generative AI era without discarding their established patterns of security, observability, and architecture.

By combining the portability of Spring AI with the governance of Azure, you build systems that are not just impressive demos, but sustainable, secure, and valuable business assets.

The future of Spring is intelligent. Start building your ChatClient today.
