Spring AI ChatResponse Source Code Analysis

An architect’s dissection of how Spring AI transforms disparate LLM outputs into a stable, portable domain model — and why that matters for every enterprise that touches AI.

Introduction

The conversation with a Large Language Model is a request-response cycle. The Prompt object, as explored in a previous analysis, defines the structured request. But equally critical is the response. Without a carefully designed response model, an AI framework cannot deliver portability, observability, or maintainability — the very qualities that distinguish a production-grade system from a prototype.

Consider the outputs from different LLM providers. OpenAI returns a ChatCompletion object with a list of choices, each carrying a message and a finish_reason. Anthropic responds with a ContentBlock-based structure, possibly interleaving text and tool use. Google’s Gemini uses a GenerateContentResponse with candidates and a different safety-rating schema. Amazon Bedrock’s Converse API wraps the reply in a ConverseResponse with its own content-block and usage shapes. Alibaba’s DashScope uses yet another shape. Each provider exposes its own SDK DTOs, all structurally different and each shaped by its own API history.

If an enterprise application directly consumes these native responses, it becomes tightly coupled to one provider. Testing requires mocking provider-specific DTOs. Observability must be re-implemented for each integration. Cost tracking becomes a brittle exercise. Switching providers — or even upgrading a provider’s SDK — can ripple through the entire application.

Spring AI solves this with ChatResponse, a portable, immutable domain object that normalizes every provider’s output into a consistent model. Together, Prompt and ChatResponse form the symmetrical request-response contract at the heart of the Spring AI architecture. This article examines the design of ChatResponse from an internal framework perspective: the structural decisions, the adapter conversions, the metadata standardization, and the patterns that make it a robust foundation for enterprise AI.

The Problem with Provider-Native Responses

Before diving into the solution, let’s appreciate the scope of the problem. The table below compares response schemas across five major providers for a simple multi-turn chat completion.

Aspect	OpenAI	Anthropic	Gemini (Vertex)	Bedrock (Anthropic)	DashScope
Top-level wrapper	`ChatCompletion`	`Message` (content array)	`GenerateContentResponse`	`ConverseResponse`	`GenerationResult`
Choice/candidate	`choices[]` (list of `Choice`)	Single response with `content[]`	`candidates[]` (list of `Candidate`)	`output.message`	`output.choices[]`
Role	`message.role` (assistant)	`role` field on message	`content.role`	`role` string	`message.role`
Content type	String (null when only tool calls are returned)	Array of `ContentBlock` (text, tool_use)	`parts[]` with `text` or `functionCall`	`content` list of `ContentBlock`	`message.content` string
Token usage	`usage.prompt_tokens`, `completion_tokens`	`usage.input_tokens`, `output_tokens`	`usageMetadata` with `promptTokenCount`, `candidatesTokenCount`	`usage.inputTokens`, `outputTokens`, `totalTokens`	`usage.input_tokens`, `output_tokens`
Finish reason	`choice.finish_reason` (string)	`stop_reason` (string)	`finishReason` (enum)	`stopReason`	`finish_reason`
Tool calls	`message.tool_calls[]`	`content` block of type `tool_use`	`functionCall` inside `parts[]`	`toolUse` blocks	`message.tool_calls[]`
Metadata	`id`, `model`, `created`, `system_fingerprint`	`id`, `model`, `type`	Safety ratings, citation metadata	`metrics` latency	`request_id`

This heterogeneity makes a mockery of “just use the SDK.” A business service that calls openaiService.chatCompletion() is now architecturally chained to OpenAI. Even if you hide the call behind an interface, the return types of different providers are incompatible. Any attempt to abstract this with a generic Object sacrifices type safety and tooling.

Moreover, token accounting, which is essential for cost governance, uses different field names and conventions across providers. Model information, if available at all, appears in different locations. The finish reason, crucial for controlling conversation flow, is spelled differently too: OpenAI signals a tool request with finish_reason: "tool_calls", while Anthropic uses stop_reason: "tool_use".

Without a normalized response model, the framework cannot hope to offer a unified advisor pipeline, streaming abstraction, or tool-calling loop. Spring AI’s ChatResponse is the architectural cornerstone that solves this fragmentation.

Where ChatResponse Fits in Spring AI Architecture

To see ChatResponse in context, let’s position it within the overall flow.

flowchart LR
    subgraph Application
        Service["Business Service"]
    end
    subgraph Spring AI Core
        CC["ChatClient"]
        CM["ChatModel Interface"]
        Advisors["Advisor Chain"]
        PR["Prompt"]
        CR["ChatResponse"]
    end
    subgraph Provider Adapters
        OAI["OpenAI Adapter"]
        ANT["Anthropic Adapter"]
        GEM["Gemini Adapter"]
        BED["Bedrock Adapter"]
    end
    subgraph LLM Providers
        OAISDK[OpenAI API]
        ANTSDK[Anthropic API]
        GEMSDK[Gemini API]
        BEDSDK[Bedrock API]
    end
    Service -->|calls| CC
    CC --> Advisors --> CM
    CM -->|accepts| PR
    CM -->|returns| CR
    CM --> OAI
    CM --> ANT
    CM --> GEM
    CM --> BED
    OAI --> OAISDK
    ANT --> ANTSDK
    GEM --> GEMSDK
    BED --> BEDSDK
    OAISDK --> OAI
    ANTSDK --> ANT
    GEMSDK --> GEM
    BEDSDK --> BED
    OAI -->|converts to| CR
    ANT -->|converts to| CR
    GEM -->|converts to| CR
    BED -->|converts to| CR

ChatModel is the portable interface: ChatResponse call(Prompt prompt).
Provider adapters are concrete implementations (e.g., OpenAiChatModel) that translate the Prompt into a provider request and the provider response back into a ChatResponse.
ChatResponse acts as the normalized output envelope that flows back through the advisor chain and ultimately to the application.

This design decouples the entire framework — including advisors, observability, and testing — from provider specifics.
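Here is a minimal sketch of that decoupling in plain Java 17. `PortableChatModel`, `Reply`, `GreetingService`, and both adapters are hypothetical stand-ins for Spring AI's `ChatModel`, `ChatResponse`, and provider adapters, not the real classes. The native record shapes only loosely imitate OpenAI's `choices[]` and Anthropic's `content[]`. The point is that `GreetingService` compiles against the portable interface alone.

```java
import java.util.List;

// Hypothetical stand-in for the normalized response type (not Spring AI's ChatResponse)
record Reply(String text, String finishReason) {}

// Hypothetical portable contract, analogous to ChatResponse call(Prompt prompt)
interface PortableChatModel {
    Reply call(String prompt);
}

// Loose imitation of an OpenAI-style native payload: a list of choices
record OpenAiLikeChoice(String content, String finishReason) {}

// Loose imitation of an Anthropic-style native payload: text blocks plus a stop reason
record AnthropicLikeMessage(List<String> textBlocks, String stopReason) {}

class OpenAiLikeAdapter implements PortableChatModel {
    @Override
    public Reply call(String prompt) {
        // A real adapter would send an HTTP request; the native response is faked here
        List<OpenAiLikeChoice> choices = List.of(new OpenAiLikeChoice("Hi from choices[]", "stop"));
        OpenAiLikeChoice first = choices.get(0);
        return new Reply(first.content(), first.finishReason());
    }
}

class AnthropicLikeAdapter implements PortableChatModel {
    @Override
    public Reply call(String prompt) {
        AnthropicLikeMessage msg = new AnthropicLikeMessage(List.of("Hi from ", "content[]"), "end_turn");
        // Join the text blocks and translate the provider's stop reason into the common vocabulary
        String finish = "end_turn".equals(msg.stopReason()) ? "stop" : msg.stopReason();
        return new Reply(String.join("", msg.textBlocks()), finish);
    }
}

// Business code depends only on the portable interface
class GreetingService {
    private final PortableChatModel model;

    GreetingService(PortableChatModel model) { this.model = model; }

    String greet(String name) { return model.call("Greet " + name).text(); }
}
```

Swapping `new OpenAiLikeAdapter()` for `new AnthropicLikeAdapter()` changes nothing in `GreetingService`. That is the property the adapter layer is meant to guarantee.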

ChatResponse Deep Dive

ChatResponse is the centerpiece of the response model. Let’s examine its structure and the design intent behind it.

Core Responsibilities

Encapsulate the model’s output as a collection of Generation objects.
Carry normalized metadata about the response (e.g., token usage, model ID).
Serve as a stable return type for ChatModel.call(Prompt), enabling the framework to build provider-independent features on top of it.
Enable easy testing — you can create a ChatResponse with synthetic Generation data in unit tests without spinning up a real LLM.

Internal Structure

In simplified form, Spring AI’s ChatResponse is an immutable class: all fields are final and the generations are copied into an unmodifiable list.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChatResponse {

    private final List<Generation> generations;
    private final ChatResponseMetadata metadata;

    public ChatResponse(List<Generation> generations) {
        this(generations, ChatResponseMetadata.EMPTY);
    }

    public ChatResponse(List<Generation> generations, ChatResponseMetadata metadata) {
        // Defensive copy wrapped as unmodifiable: callers cannot mutate the response afterwards
        this.generations = Collections.unmodifiableList(new ArrayList<>(generations));
        this.metadata = metadata;
    }

    public List<Generation> getGenerations() { return generations; }
    public ChatResponseMetadata getMetadata() { return metadata; }

    // Convenience accessor for the first result; null when there are no generations
    public Generation getResult() {
        return generations.isEmpty() ? null : generations.get(0);
    }
}

Key observations:

List<Generation>: A response may contain multiple candidate completions (e.g., via OpenAI’s n parameter). Spring AI models each candidate as a separate Generation, which maps directly to the LLM concept of “choices” or “candidates.”
ChatResponseMetadata: A separate object for usage, model info, and provider-specific data. Keeping it apart from the generations lets each Generation stay focused on content.
Immutability: Both fields are final, and the generations list is defensively copied and wrapped as unmodifiable. An advisor that needs to change the response must therefore build a new ChatResponse. As long as the Generation objects are themselves immutable, instances can be shared safely across threads.
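The defensive copy can be checked with a small sketch. `ResponseSketch` and `GenerationSketch` are hypothetical stand-ins that reuse the simplified constructor logic above. The `withGenerations` helper is an assumed addition that illustrates the copy-on-transform style. None of these are Spring AI types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal stand-ins mirroring the simplified classes above
record GenerationSketch(String text) {}

final class ResponseSketch {
    private final List<GenerationSketch> generations;

    ResponseSketch(List<GenerationSketch> generations) {
        // Defensive copy, then a read-only view: later changes to the caller's list are not visible
        this.generations = Collections.unmodifiableList(new ArrayList<>(generations));
    }

    List<GenerationSketch> getGenerations() { return generations; }

    GenerationSketch getResult() {
        return generations.isEmpty() ? null : generations.get(0);
    }

    // A "transformation" returns a new instance instead of mutating this one
    ResponseSketch withGenerations(List<GenerationSketch> replacement) {
        return new ResponseSketch(replacement);
    }
}
```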

Why a Domain Object and Not a Raw DTO

If Spring AI merely passed through the provider’s DTO, every consumer would have to write an instanceof chain to extract the content. Business logic would depend on the provider’s package. Testing a service that calls an LLM would require mocking OpenAiApi.ChatCompletion, and a switch to Anthropic would require rewriting all those mocks.

ChatResponse is a domain object — it represents an AI response in the language of the application, not the provider. It’s conceptually similar to how Spring Data’s Page abstracts away the underlying database’s pagination mechanics. By owning the model, Spring AI can evolve it independently, add convenience methods, and integrate it deeply with Spring’s ecosystem (e.g., AOP for logging, @Controller return type handling in the future).

Generation Object Analysis

Generation is a distinct concept inside ChatResponse that warrants separate analysis.

classDiagram
    class ChatResponse {
        +List~Generation~ generations
        +ChatResponseMetadata metadata
        +Generation getResult()
    }
    class Generation {
        +AssistantMessage output
        +GenerationMetadata metadata
    }
    ChatResponse "1" *-- "0..*" Generation

Why Generation Exists

A single LLM call can produce multiple completions. OpenAI’s n=3 gives three choices; Gemini returns multiple candidates. Even when only one is requested, the structural consistency of a list is valuable. By distinguishing Generation from ChatResponse, Spring AI:

Separates the content (the assistant’s message) from the response-level metadata (usage, model). You can iterate over generations without dragging along token counts.
Leaves room for multi-candidate features (e.g., showing the user several creative options) without changing the ChatResponse contract.
Maps cleanly to LLM “candidate” concepts, making provider adapters straightforward.
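The sketch below shows what a list of candidates enables for a caller. `Candidate`, `MultiResponse`, and `CandidatePicker` are hypothetical stand-ins, not Spring AI classes. The policy of skipping candidates whose finish reason is not "stop" is an illustrative assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins: each candidate pairs content with its own finish reason
record Candidate(String text, String finishReason) {}

record MultiResponse(List<Candidate> generations) {
    MultiResponse {
        generations = List.copyOf(generations);
    }

    // Convenience accessor in the spirit of getResult(): first candidate or null
    Candidate result() { return generations.isEmpty() ? null : generations.get(0); }
}

final class CandidatePicker {
    // Prefer the first candidate that finished normally; truncated ones ("length") are skipped
    static Optional<Candidate> firstComplete(MultiResponse response) {
        return response.generations().stream()
            .filter(c -> "stop".equals(c.finishReason()))
            .findFirst();
    }

    // Present every option, e.g. for a "pick your favorite" UI
    static List<String> options(MultiResponse response) {
        return response.generations().stream().map(Candidate::text).toList();
    }
}
```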

Internal Structure

public class Generation {

    private final AssistantMessage output;
    private final GenerationMetadata metadata;

    public Generation(AssistantMessage output) {
        this(output, GenerationMetadata.NULL);
    }

    public Generation(AssistantMessage output, GenerationMetadata metadata) {
        this.output = output;
        this.metadata = metadata;
    }

    public AssistantMessage getOutput() { return output; }
    public GenerationMetadata getMetadata() { return metadata; }
}

GenerationMetadata carries generation-level attributes like finishReason, role, and sometimes content-filtering flags. By placing them here rather than directly on AssistantMessage, the framework keeps the message object pristine and reusable — an AssistantMessage can be part of a conversation history without carrying session-specific metadata.

The design mirrors the “candidate” abstraction found in Google’s PaLM and Gemini APIs, where each candidate carries content and safety ratings. Spring AI generalizes this into a portable pair.

AssistantMessage Design Analysis

The AssistantMessage class embodies the AI’s side of the conversation.

Place in the Message Hierarchy

As introduced in the Prompt architecture article, Spring AI models every conversation turn as a Message. AssistantMessage extends AbstractMessage and fixes its type to MessageType.ASSISTANT. It holds the text content and, optionally, tool calls or other structured outputs.

import java.util.List;

public class AssistantMessage extends AbstractMessage {

    private final List<ToolCall> toolCalls;

    public AssistantMessage(String content) {
        this(content, List.of());
    }

    public AssistantMessage(String content, List<ToolCall> toolCalls) {
        super(content, MessageType.ASSISTANT);
        // Copy so the message stays immutable even if the caller's list changes later
        this.toolCalls = toolCalls != null ? List.copyOf(toolCalls) : List.of();
    }

    public List<ToolCall> getToolCalls() { return toolCalls; }
}

Why a Typed Message Instead of a String

Modeling the AI’s reply as an AssistantMessage instead of a plain string gives:

Role clarity: The MessageType is preserved, so the conversation can be reconstructed for future prompts without manual tagging.
Tool call integration: When the LLM requests a function call, the response contains structured tool call data. Spring AI stores this inside the AssistantMessage rather than as a separate side channel, keeping the conversation model unified.
Metadata attachment: Like all messages, AssistantMessage inherits a metadata map for custom attributes, which is useful for sourcing, safety flags, or observability trace IDs.

By returning AssistantMessage-based Generation objects, the framework ensures that the output of one call can be directly fed into the input of the next call (as message history) with no transformation — a cornerstone of agentic loops.
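The history round-trip can be sketched with a sealed message hierarchy (Java 17). `Msg`, `UserMsg`, `AssistantMsg`, `ToyModel`, and `Conversation` are hypothetical, greatly simplified stand-ins for Spring AI's message types. The only point is that a typed assistant reply can join the history without conversion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the typed message hierarchy (not Spring AI's classes)
sealed interface Msg permits UserMsg, AssistantMsg {}
record UserMsg(String text) implements Msg {}
record AssistantMsg(String text) implements Msg {}

// Hypothetical model that answers with a typed assistant message
interface ToyModel {
    AssistantMsg call(List<Msg> history);
}

final class Conversation {
    private final List<Msg> history = new ArrayList<>();
    private final ToyModel model;

    Conversation(ToyModel model) { this.model = model; }

    String ask(String question) {
        history.add(new UserMsg(question));
        AssistantMsg reply = model.call(List.copyOf(history));
        // The reply is already a Msg, so it joins the history with no conversion or role tagging
        history.add(reply);
        return reply.text();
    }

    List<Msg> history() { return List.copyOf(history); }
}
```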

Metadata Normalization Strategy

The greatest challenge in building a multi-provider response model is normalizing heterogeneous metadata. Spring AI introduces ChatResponseMetadata and GenerationMetadata to address this.

ChatResponseMetadata

public class ChatResponseMetadata {

    private final String id;
    private final String model;
    private final Usage usage;
    private final Map<String, Object> providerMetadata;

    // constructor, getters...

    public static final ChatResponseMetadata EMPTY = new ChatResponseMetadata(null, null, null, Map.of());
}

id: The provider’s response ID (for logging/debugging).
model: The model that generated the response (e.g., “gpt-4o”).
usage: A normalized Usage object.
providerMetadata: An escape hatch carrying provider-specific fields that are not yet part of the common model. This preserves extensibility without breaking portability.

GenerationMetadata

public class GenerationMetadata {

    private final String finishReason;
    private final String role;

    // constructor, getters...

    public static final GenerationMetadata NULL = new GenerationMetadata(null, null);
}

finishReason: Normalized string like “stop”, “length”, “tool_calls”, “content_filter”. Providers use different enums/strings, but Spring AI maps them to a consistent set.
role: Ensures the assistant role is explicit, even if the provider doesn’t include it.
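A finish-reason normalizer can be as small as a lookup table. The raw strings below are values these providers document:

- OpenAI: `stop`, `length`, `tool_calls`, `content_filter`
- Anthropic: `end_turn`, `max_tokens`, `stop_sequence`, `tool_use`
- Gemini: `STOP`, `MAX_TOKENS`, `SAFETY`

The `FinishReason` enum and the table itself are an illustrative assumption, not Spring AI's actual mapping.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical normalized vocabulary; the article's model stores these as plain strings
enum FinishReason { STOP, LENGTH, TOOL_CALLS, CONTENT_FILTER, OTHER }

final class FinishReasons {
    // Illustrative per-provider table; keys are lower-cased raw values
    private static final Map<String, FinishReason> TABLE = Map.of(
        "stop", FinishReason.STOP,                      // OpenAI, Gemini (STOP)
        "end_turn", FinishReason.STOP,                  // Anthropic
        "stop_sequence", FinishReason.STOP,             // Anthropic
        "length", FinishReason.LENGTH,                  // OpenAI
        "max_tokens", FinishReason.LENGTH,              // Anthropic, Gemini (MAX_TOKENS)
        "tool_calls", FinishReason.TOOL_CALLS,          // OpenAI
        "tool_use", FinishReason.TOOL_CALLS,            // Anthropic
        "content_filter", FinishReason.CONTENT_FILTER,  // OpenAI
        "safety", FinishReason.CONTENT_FILTER           // Gemini (SAFETY)
    );

    static FinishReason normalize(String raw) {
        if (raw == null) return FinishReason.OTHER;
        // Lower-casing lets enum-style values such as Gemini's "MAX_TOKENS" hit the same entry
        return TABLE.getOrDefault(raw.toLowerCase(Locale.ROOT), FinishReason.OTHER);
    }
}
```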

How Normalization Works in Practice

Consider token usage. OpenAI returns usage.prompt_tokens and usage.completion_tokens. Anthropic returns usage.input_tokens and usage.output_tokens. Gemini returns usageMetadata.promptTokenCount and usageMetadata.candidatesTokenCount. Each adapter populates a common Usage object:

public class Usage {

    private final Integer promptTokens;
    private final Integer completionTokens;
    private final Integer totalTokens;

    // ...
}

The adapter is responsible for the mapping. For Anthropic:

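// Inside AnthropicChatModel adapter (simplified)
Integer inputTokens = nativeResponse.getUsage().getInputTokens();
Integer outputTokens = nativeResponse.getUsage().getOutputTokens();
// Anthropic reports no total, so derive it when both counts are present
Integer totalTokens = (inputTokens != null && outputTokens != null)
    ? inputTokens + outputTokens
    : null;
Usage usage = new Usage(inputTokens, outputTokens, totalTokens);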

This centralized mapping means that any framework component — observability, cost tracking, rate limiting — works with a single Usage type. Enterprise dashboards that accumulate token usage across providers become trivial.

Usage Tracking Design

Token usage is not just a technical metric; in enterprise AI, it’s a cost center and a governance concern. Spring AI’s Usage object is deliberately simple yet powerful.

+--------------------------------+
|             Usage              |
+--------------------------------+
| - promptTokens: Integer        |
| - completionTokens: Integer    |
| - totalTokens: Integer         |
+--------------------------------+

Prompt tokens and completion tokens allow per-interaction cost calculation.
totalTokens may be provided by the provider or computed as a sum.
null values indicate that the provider did not return usage (e.g., streaming mode without usage aggregation).
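Treating null as "not reported" rather than as zero matters once usage is aggregated. The sketch below uses a hypothetical `UsageSketch` record that mirrors the three nullable fields above. It derives a missing total from its parts, as the list describes. `UsageTotals` is likewise an illustrative helper, not a Spring AI API.

```java
import java.util.List;

// Hypothetical stand-in for the normalized Usage; null means "provider did not report it"
record UsageSketch(Integer promptTokens, Integer completionTokens, Integer totalTokens) {

    // Prefer the provider's total; otherwise derive it when both parts are known
    Integer effectiveTotal() {
        if (totalTokens != null) return totalTokens;
        if (promptTokens != null && completionTokens != null) return promptTokens + completionTokens;
        return null;
    }
}

final class UsageTotals {
    // Sum the totals that are known; unknown ones are skipped here and counted by unreported()
    static long sumTotals(List<UsageSketch> usages) {
        long sum = 0;
        for (UsageSketch u : usages) {
            Integer t = (u == null) ? null : u.effectiveTotal();
            if (t != null) sum += t;
        }
        return sum;
    }

    // How many responses could not be counted, so dashboards can flag the gap
    static long unreported(List<UsageSketch> usages) {
        return usages.stream().filter(u -> u == null || u.effectiveTotal() == null).count();
    }
}
```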

Observability Integration

Because Usage is part of ChatResponseMetadata, Spring Boot’s observability stack (Micrometer metrics and tracing, exported to backends such as Zipkin) can record token consumption per request. A custom ChatClient advisor can record usage as a metric with no provider-specific code.
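Spring AI's advisor interfaces have changed between releases, so this sketch does not guess at a signature. It shows the same idea as a plain decorator over a hypothetical `ReplyModel` interface. `TokenCounts`, `ModelReply`, and `UsageRecordingModel` are assumptions. In an application, the `LongAdder` counters could be replaced with Micrometer meters.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-ins (not Spring AI types); null counts mean "not reported"
record TokenCounts(Integer promptTokens, Integer completionTokens) {}
record ModelReply(String text, TokenCounts usage) {}

interface ReplyModel {
    ModelReply call(String prompt);
}

// Decorator that records usage for any provider, since every provider returns the same ModelReply type
final class UsageRecordingModel implements ReplyModel {
    private final ReplyModel delegate;
    private final LongAdder promptTokens = new LongAdder();
    private final LongAdder completionTokens = new LongAdder();

    UsageRecordingModel(ReplyModel delegate) { this.delegate = delegate; }

    @Override
    public ModelReply call(String prompt) {
        ModelReply reply = delegate.call(prompt);
        TokenCounts usage = reply.usage();
        if (usage != null) {
            // Thread-safe accumulation; missing values are simply not added
            if (usage.promptTokens() != null) promptTokens.add(usage.promptTokens());
            if (usage.completionTokens() != null) completionTokens.add(usage.completionTokens());
        }
        return reply;
    }

    long promptTokens() { return promptTokens.sum(); }
    long completionTokens() { return completionTokens.sum(); }
}
```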

Enterprise Cost Governance

For a company using multiple LLM providers, each with different pricing tiers, the normalized Usage lets a central service compute cost: cost = promptTokens * inputPrice + completionTokens * outputPrice. Without a unified Usage model, each integration would need a custom cost calculation, leading to duplication and errors.
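The cost formula above translates directly into code. In this sketch the `Price` record, the per-million-token pricing convention, the model name, and the numbers in the usage example are all placeholders, not real provider pricing. `BigDecimal` is used so that money is not accumulated in floating point.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

// Hypothetical per-model prices, expressed per one million tokens (placeholder values)
record Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {}

final class CostCalculator {
    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);
    private final Map<String, Price> prices;

    CostCalculator(Map<String, Price> prices) { this.prices = Map.copyOf(prices); }

    // cost = promptTokens * inputPrice + completionTokens * outputPrice
    BigDecimal cost(String model, int promptTokens, int completionTokens) {
        Price p = prices.get(model);
        if (p == null) throw new IllegalArgumentException("No price configured for " + model);
        BigDecimal input = p.inputPerMillion().multiply(BigDecimal.valueOf(promptTokens));
        BigDecimal output = p.outputPerMillion().multiply(BigDecimal.valueOf(completionTokens));
        // Divide once at the end to limit rounding error; keep 6 decimal places
        return input.add(output).divide(MILLION, 6, RoundingMode.HALF_UP);
    }
}
```

With placeholder prices of 2.00 (input) and 8.00 (output) per million tokens, 1,000 prompt tokens plus 500 completion tokens come to 0.006.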

ChatResponse Lifecycle

Let’s trace the complete lifecycle from the moment a Prompt is handed to a ChatModel to the ChatResponse being consumed by business logic.

sequenceDiagram
    participant App as Business Service
    participant CM as ChatModel
    participant Adapter as Provider Adapter
    participant SDK as Provider SDK
    participant LLM as LLM API
    App->>CM: call(prompt)
    CM->>Adapter: convertPrompt(prompt)
    Adapter->>SDK: providerRequest
    SDK->>LLM: HTTP request
    LLM-->>SDK: providerResponse (native DTO)
    SDK-->>Adapter: nativeResponse
    Adapter->>Adapter: map to ChatResponse
    Adapter-->>CM: ChatResponse
    CM-->>App: ChatResponse
    App->>App: extract Generation, usage, etc.

Application calls chatModel.call(prompt).
ChatModel delegates to its adapter’s internal createRequest(prompt) to convert the prompt.
Provider SDK is invoked, returning a provider-specific response DTO (e.g., ChatCompletion).
Adapter maps the native response to a ChatResponse, normalizing generations and metadata.
ChatModel returns the ChatResponse to the caller, which can now work with framework-standard types.

This pipeline is identical for every provider; only the adapter’s mapping logic differs. The adapter is the single seam where provider knowledge lives.

Provider Adapter Architecture

The adapter layer is where the real heavy lifting occurs.

Each adapter performs two inverse transformations:

Prompt → Provider request: Detailed in the Prompt architecture article.
Provider response → ChatResponse: This is our current focus.

For OpenAI, the adapter extracts choices, maps each to a Generation with AssistantMessage, and constructs Usage. For Anthropic, it iterates over content blocks, separating text and tool use, and assembles an AssistantMessage with the appropriate tool calls. The logic is non-trivial but encapsulated behind a single method.
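The Anthropic branch of that mapping might look like the following sketch. `ContentBlock`, `ToolCallSketch`, `AssistantSketch`, and `AnthropicLikeMapper` are hypothetical, and the real SDK types differ. For simplicity, the tool input is assumed to arrive already serialized as a JSON string. Anthropic actually sends an object, which an adapter would have to serialize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of Anthropic-style content blocks; real SDK types differ
record ContentBlock(String type, String text, String id, String name, String inputJson) {
    static ContentBlock ofText(String t) { return new ContentBlock("text", t, null, null, null); }

    static ContentBlock ofToolUse(String id, String name, String json) {
        return new ContentBlock("tool_use", null, id, name, json);
    }
}

// Hypothetical normalized targets, shaped like the article's simplified ToolCall and AssistantMessage
record ToolCallSketch(String id, String type, String functionName, String arguments) {}
record AssistantSketch(String content, List<ToolCallSketch> toolCalls) {}

final class AnthropicLikeMapper {
    static AssistantSketch map(List<ContentBlock> blocks) {
        StringBuilder text = new StringBuilder();
        List<ToolCallSketch> calls = new ArrayList<>();
        for (ContentBlock b : blocks) {
            switch (b.type()) {
                // Text blocks are concatenated into a single message body
                case "text" -> text.append(b.text());
                // tool_use blocks become tool calls; "function" mirrors OpenAI's tool type
                case "tool_use" -> calls.add(new ToolCallSketch(b.id(), "function", b.name(), b.inputJson()));
                // Unknown block types are ignored in this sketch
                default -> { }
            }
        }
        return new AssistantSketch(text.toString(), List.copyOf(calls));
    }
}
```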

By adopting a consistent ChatModel interface that returns ChatResponse, the rest of Spring AI (advisors, ChatClient) remains oblivious to these differences.

Design Patterns Used

The ChatResponse architecture is a showcase of classic design patterns applied to a modern AI domain.

Adapter Pattern

Purpose: Convert the interface of a provider’s response DTO into the ChatResponse interface that the framework expects.
Benefits: Each provider adapter isolates the mapping code, enabling provider independence and easy addition of new providers.
Tradeoffs: The adapter must translate every field; if a provider adds a new feature, the adapter must be updated before that feature becomes available in the normalized model.

Composite Pattern

Purpose: ChatResponse is a composite of Generation objects, each of which is itself a composite of AssistantMessage and metadata. The whole framework treats a single Generation and a list of them uniformly through the getGenerations() accessor.
Benefits: Clients can loop over candidates or simply call getResult() for the first. Multi-turn agents can append the first generation’s message to conversation history without caring about the count.
Tradeoffs: Adds an extra layer of indirection; for the common case of n=1, developers must still go through response.getResult().getOutput().

Facade Concept

Purpose: ChatResponse provides a unified, simplified interface to the complex, varied world of provider responses.
Benefits: Reduces dependency count for the rest of the framework; the complexity of native DTOs is hidden behind a single class.
Tradeoffs: The facade may not expose every provider-specific detail; developers who need low-level access must use the providerMetadata map or drop to the SDK.

Strategy Pattern

Purpose: The ChatModel interface defines a strategy for calling an AI model. Each provider’s adapter (e.g., OpenAiChatModel) is a concrete strategy.
Benefits: New providers can be plugged in without altering the advisor pipeline or application code.
Tradeoffs: Strategies must honor the same ChatResponse contract, which may limit innovative features initially.

Dependency Injection

Purpose: All adapters are Spring beans, configured via auto-configuration. The ChatModel to be injected can be selected by profile or qualifier.
Benefits: Loose coupling, easy testing with mocks, and Spring’s full lifecycle management (e.g., @PostConstruct for client warm-up).
Tradeoffs: Requires Spring container; not suitable for non-Spring environments without manual wiring.

Source Code Walkthrough

Let’s examine simplified but representative snippets from the actual Spring AI codebase.

ChatResponse Construction in an Adapter

Inside OpenAiChatModel:

@Override
public ChatResponse call(Prompt prompt) {
    OpenAiApi.ChatCompletionRequest request = createRequest(prompt);
    ResponseEntity<ChatCompletion> responseEntity =
        this.openAiApi.chatCompletionEntity(request);
    ChatCompletion completion = responseEntity.getBody();
    if (completion == null) {
        // No body: return an empty response instead of failing with a NullPointerException
        return new ChatResponse(List.of());
    }

    List<Generation> generations = completion.choices().stream()
        .map(choice -> {
            AssistantMessage assistantMessage = new AssistantMessage(
                choice.message().content(),
                mapToolCalls(choice.message().toolCalls())
            );
            GenerationMetadata genMeta = new GenerationMetadata(
                choice.finishReason() != null ? choice.finishReason() : "stop",
                "assistant"
            );
            return new Generation(assistantMessage, genMeta);
        })
        .toList();

    // usage may be absent, so guard before reading token counts
    Usage usage = completion.usage() == null ? null : new Usage(
        completion.usage().promptTokens(),
        completion.usage().completionTokens(),
        completion.usage().totalTokens()
    );

    // Map.of rejects null values, and system_fingerprint is optional
    Map<String, Object> providerMetadata = completion.systemFingerprint() != null
        ? Map.of("system_fingerprint", completion.systemFingerprint())
        : Map.of();
    ChatResponseMetadata meta = new ChatResponseMetadata(
        completion.id(), completion.model(), usage, providerMetadata
    );

    return new ChatResponse(generations, meta);
}

Analysis:

The mapping is centralized in one method. The rest of the class does not know about ChatCompletion.
Tool calls are mapped to ToolCall objects inside AssistantMessage.
Provider-specific metadata (system_fingerprint) is stored in the escape hatch.
Immutability is respected: all objects are created fresh.

Usage Normalization

The Usage object itself is trivial:

public class Usage {
    private final Integer promptTokens;
    private final Integer completionTokens;
    private final Integer totalTokens;

    public Usage(Integer promptTokens, Integer completionTokens, Integer totalTokens) {
        this.promptTokens = promptTokens;
        this.completionTokens = completionTokens;
        this.totalTokens = totalTokens;
    }
    // getters...
}

Analysis:

All fields are nullable Integer to handle incomplete data gracefully.
Cost computation can be done externally; the framework does not assume pricing models.

AssistantMessage and Tool Calls

public class AssistantMessage extends AbstractMessage {
    private final List<ToolCall> toolCalls;
    // ...
}

public class ToolCall {
    private final String id;
    private final String type;
    private final String functionName;
    private final String arguments; // JSON string
    // getters...
}

Analysis:

Tool calls are stored as a list of value objects, preserving the structure required for the next interaction (sending tool results).
arguments is kept as a raw JSON string to avoid pre-parsing into a specific schema; the framework defers parsing to the application or a future structured-output feature.

ChatResponse and Streaming

Streaming adds complexity because the response arrives as a series of partial chunks rather than a single complete object. Spring AI addresses this with Flux<ChatResponse>.

Streaming Challenges

Partial content: Each chunk may contain a fragment of the message content, not a complete AssistantMessage.
Metadata timing: Usage statistics may only appear in the final chunk.
Tool calls: A tool call may be streamed as a sequence of partial JSON fragments.

Spring AI’s design meets these challenges by emitting each chunk as a ChatResponse whose Generation carries only that chunk’s content delta. Metadata such as the finish reason and, where the provider reports it, token usage typically arrives on the final chunk. Consumers can therefore process incremental updates (e.g., displaying tokens as they arrive) and, by concatenating the deltas, assemble a complete, normalized result at the end.

The adapter’s streaming method maps each provider’s streaming chunk (e.g., ChatCompletionChunk for OpenAI) into a ChatResponse. Because the normalized model supports partial data (empty usage, partial content), the same ChatResponse class works for both synchronous and streaming scenarios — a clean reuse of the domain object.
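Aggregating a stream is a left fold over its chunks. With Reactor, the same fold could be applied to a `Flux<ChatResponse>` (for example via `Flux.reduce`). The sketch uses a plain `List` so that it runs without dependencies. `Chunk`, `Aggregated`, and `StreamAggregator` are hypothetical stand-ins, and the "last non-null value wins" rule for metadata is an assumption.

```java
import java.util.List;

// Hypothetical chunk shape: a text delta plus optional metadata that usually arrives only at the end
record Chunk(String delta, String finishReason, Integer totalTokens) {}
record Aggregated(String text, String finishReason, Integer totalTokens) {}

final class StreamAggregator {
    // Concatenate deltas in arrival order; keep the last non-null metadata values seen
    static Aggregated aggregate(List<Chunk> chunks) {
        StringBuilder text = new StringBuilder();
        String finish = null;
        Integer tokens = null;
        for (Chunk c : chunks) {
            if (c.delta() != null) text.append(c.delta());
            if (c.finishReason() != null) finish = c.finishReason();
            if (c.totalTokens() != null) tokens = c.totalTokens();
        }
        return new Aggregated(text.toString(), finish, tokens);
    }
}
```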

ChatResponse and Tool Calling

Tool calling is where the response model proves its flexibility.

When the LLM decides to invoke a function, the provider returns a tool call instead of plain text. Spring AI captures this inside AssistantMessage.toolCalls. The finishReason is set to "tool_calls". The application can then:

Detect the tool call via generation.getMetadata().getFinishReason().
Extract the function name and arguments from ToolCall.
Execute the function locally.
Create a new Prompt containing the original messages, the AssistantMessage that carries the tool calls, and a ToolResponseMessage with the result.
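The four steps above amount to a loop. The sketch uses hypothetical records (`Message`, `Call`, `Turn`) and a `LoopModel` interface rather than Spring AI's types. The `"tool_calls"` check follows the normalized finish reason named in the text. The `maxRounds` cap is an added safeguard, not something the article specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for messages and tool calls (not Spring AI's classes)
record Call(String id, String name, String arguments) {}
record Message(String role, String text, List<Call> toolCalls, String toolCallId) {}
record Turn(Message output, String finishReason) {}

interface LoopModel {
    Turn call(List<Message> history);
}

final class ToolLoop {
    // Runs until the model stops requesting tools, with a cap to avoid endless loops
    static String run(LoopModel model, Map<String, Function<String, String>> tools,
                      String userText, int maxRounds) {
        List<Message> history = new ArrayList<>();
        history.add(new Message("user", userText, List.of(), null));
        for (int round = 0; round < maxRounds; round++) {
            Turn turn = model.call(List.copyOf(history));
            // The assistant message, including its tool calls, goes into the history as-is
            history.add(turn.output());
            if (!"tool_calls".equals(turn.finishReason())) {
                return turn.output().text();
            }
            for (Call call : turn.output().toolCalls()) {
                Function<String, String> tool = tools.get(call.name());
                String result = tool != null ? tool.apply(call.arguments()) : "unknown tool: " + call.name();
                // Each result is linked back to its request through the tool call id
                history.add(new Message("tool", result, List.of(), call.id()));
            }
        }
        throw new IllegalStateException("Tool loop did not finish within " + maxRounds + " rounds");
    }
}
```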

The normalized ToolCall model bridges providers. OpenAI returns tool_calls[] with function.name and function.arguments (a JSON string). Anthropic returns tool_use content blocks with name and input (a JSON object, which the adapter serializes). The adapter maps both to ToolCall, making the tool-execution logic portable.

This design is essential for future agent workflows, where the assistant may chain multiple tool calls. The ChatResponse acts as the intermediate representation in that loop.

Enterprise Benefits

The architectural decisions around ChatResponse translate directly into business capabilities.

Vendor Independence: A single REST endpoint can be backed by different providers based on configuration. The ChatResponse abstraction means the controller never imports provider SDK classes.
Maintainability: Changing an LLM’s response structure (e.g., upgrading to a new OpenAI API version) only requires updating one adapter. The rest of the application remains unchanged.
Testability: Business services can be unit-tested with new ChatResponse(List.of(new Generation(new AssistantMessage("Mock reply")))). No wiremock, no API keys.
Observability: A ChatClient advisor can log response.getMetadata().getUsage() for every call, using a single code path across all providers.
Cost Governance: A centralized interceptor can track token consumption and enforce per-user or per-department budgets, aggregating usage from the normalized Usage object.
Multi-Provider Strategies: A fallback strategy can switch to a different provider if the first returns a finishReason indicating failure, using the same ChatResponse-based logic.
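The testability and fallback points can be sketched together. `SimpleChatModel`, `Resp`, `Gen`, and `FallbackChatModel` are hypothetical, simplified stand-ins. Treating any finish reason other than "stop" as a failure is an assumption, and a real policy would likely be more nuanced. The stub models are plain lambdas that return synthetic responses, so no network access or API key is involved.

```java
import java.util.List;

// Hypothetical stand-ins shaped like the simplified ChatResponse/Generation above
record Gen(String text, String finishReason) {}

record Resp(List<Gen> generations) {
    Gen result() { return generations.isEmpty() ? null : generations.get(0); }
}

interface SimpleChatModel {
    Resp call(String prompt);
}

// Tries each model in order and returns the first response that finished normally
final class FallbackChatModel implements SimpleChatModel {
    private final List<SimpleChatModel> models;

    FallbackChatModel(List<SimpleChatModel> models) { this.models = List.copyOf(models); }

    @Override
    public Resp call(String prompt) {
        for (SimpleChatModel model : models) {
            try {
                Resp response = model.call(prompt);
                Gen first = response.result();
                // Only a normal completion is accepted; "content_filter", "length", or no output is a failure
                if (first != null && "stop".equals(first.finishReason())) {
                    return response;
                }
            } catch (RuntimeException e) {
                // Provider error (timeout, 5xx, ...): fall through to the next model
            }
        }
        throw new IllegalStateException("No provider produced a usable response");
    }
}
```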

Design Tradeoffs

No design is free of tradeoffs, and ChatResponse is no exception.

Additional Abstraction Layer: Every response is wrapped, which adds a small runtime overhead and another class for developers to learn. However, the overhead is negligible compared to network latency, and the clarity gained far outweighs the learning curve for teams building serious AI features.

Feature Lag Risk: When a provider introduces a novel feature (e.g., Anthropic’s “extended thinking”), the common ChatResponse model cannot immediately expose it. The framework must either extend the normalized model (with careful versioning) or rely on the providerMetadata escape hatch. Developers who need immediate access may have to access the raw provider DTO, breaking the abstraction temporarily.

Lowest Common Denominator Problems: The normalized finish reasons, usage fields, and metadata are a subset of all possible provider features. Some richness is lost. Spring AI mitigates this by keeping ChatOptions and providerMetadata open for extension, but the core ChatResponse remains deliberately minimal.

Mapping Complexity: The adapter code can become intricate, especially when handling streaming and tool calls. This complexity is encapsulated, but it must be written and maintained for each supported provider.

Strengths of the approach far outweigh the weaknesses in an enterprise context, where stability, testability, and portability are top priorities.

Comparison with Other Frameworks

Framework / Approach	Response Model	Provider Independence	Metadata Normalization	Tool Call Support	Streaming
Spring AI	`ChatResponse` with `List<Generation>`, `ChatResponseMetadata`	Complete; adapter pattern	Unified `Usage`, finish reasons, model ID	Typed `ToolCall` inside `AssistantMessage`	`Flux<ChatResponse>` using same model
LangChain4j	`Response<AiMessage>` returned from `ChatLanguageModel.generate()`	Yes, through `ChatLanguageModel` interface	`TokenUsage` with prompt/completion tokens	`ToolExecutionRequest` list on `AiMessage`	Streaming via `StreamingChatLanguageModel` with `StreamingResponseHandler`
Direct OpenAI SDK	`ChatCompletion` DTO	None (OpenAI only)	Raw OpenAI `Usage`	`ChatCompletionMessage.getToolCalls()`	`ChatCompletionChunk` from streaming client
Custom Enterprise Layers	Often custom `AiResponse` wrapping provider-specific DTO	Partial; often hard-coded to a vendor	Ad hoc mapping in each integration	Varies widely; typically hard-coded	Inconsistent

Spring AI’s ChatResponse is distinguished by its tight integration into the advisor pipeline and its explicit separation of Generation and metadata. LangChain4j’s approach is similar. However, it wraps a single AiMessage in a Response that carries TokenUsage and a FinishReason rather than exposing a list of candidates, and streaming is handled through a separate interface. The Spring AI model, with its immutable ChatResponse envelope, aligns better with reactive paradigms and Spring’s functional style.

Lessons for Framework Designers

The design of ChatResponse offers several universal principles for anyone building an AI integration layer.

Stable Response Contracts Are Essential. A clear, versioned return type (ChatResponse) that doesn’t change often allows the ecosystem (advisors, clients, serialization) to evolve independently of providers.
Normalize External Systems at the Boundary. The adapter layer should be the sole place where provider-specific DTOs are mapped. This keeps the core domain pure and testable.
Separate Domain Models from SDK Models. ChatResponse is not a dressed-up ChatCompletion; it’s a purpose-built domain object. This prevents leaking provider abstractions into the application.
Design for Future Providers. The providerMetadata map is a low-cost investment that allows new provider features to be accessed without model changes. This reduces the pressure to constantly rev the normalized API.
Preserve Extensibility Through Composition. Using Generation as an inner object rather than a monolithic response allows the framework to handle multi-candidate scenarios and attach different metadata at different levels.

Future Evolution

The ChatResponse model is built to evolve alongside the AI landscape.

Structured Outputs: As JSON mode and function-calling schemas become common, Generation may carry a parsed object in addition to the raw text, offering type-safe access to structured responses.
Tool Calling: The ToolCall model will likely expand to support nested tool use and streaming tool calls, enabling more sophisticated agent behaviors.
MCP (Model Context Protocol): Spring AI’s response model can map to MCP’s response messages, maintaining a consistent abstraction for context exchange.
Agent Architectures: In a multi-step agent, a sequence of ChatResponse objects forms a trace. The immutable, history-friendly design makes this easy to persist and replay.
Advanced Observability: Usage may be extended with provider-specific cost data or latency metrics, still accessed through the same ChatResponseMetadata.

Because ChatResponse is a first-class domain object, most of these extensions can be added as new, optional parts of the model without breaking existing code, which is the payoff of careful upfront design.

FAQ

1. Why doesn’t Spring AI return the provider’s native DTO directly?
Returning native DTOs would tightly couple the entire application to a specific provider’s SDK. ChatResponse provides a stable, portable contract that lets you switch providers without changing business logic.

2. Why is Generation separated from ChatResponse?
Generation represents a single candidate completion, while ChatResponse is the whole response envelope (possibly multiple candidates plus metadata). This separation allows the framework to handle multi-choice scenarios cleanly and attach metadata at the right level.
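A minimal sketch of the envelope/candidate split, using simplified hypothetical types rather than Spring AI’s own: per-candidate data such as the finish reason lives on each Generation, while data shared by the whole call, such as token usage, lives once on the envelope. The STOP and LENGTH strings are illustrative values, not a normalized enum from any library.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-ins for the article's types.
record GenerationMetadata(String finishReason) {}
record Generation(String text, GenerationMetadata metadata) {}

record Usage(int promptTokens, int completionTokens) {
    int totalTokens() { return promptTokens + completionTokens; }
}

// Envelope: several candidates share one usage figure for the whole call.
record ChatResponse(List<Generation> generations, Usage usage) {
    ChatResponse { generations = List.copyOf(generations); }

    // Convenience accessor for the common single-candidate case.
    Optional<Generation> firstResult() {
        return generations.stream().findFirst();
    }

    // Candidates the model finished on its own rather than hitting a length limit.
    List<Generation> completedResults() {
        return generations.stream()
                .filter(g -> "STOP".equals(g.metadata().finishReason()))
                .toList();
    }
}
```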

3. How does ChatResponse support future providers?
The providerMetadata map in ChatResponseMetadata allows adapters to pass through provider-specific fields without requiring changes to the common model. Over time, popular fields can be promoted to first-class properties.

4. How is token usage standardized across providers?
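Each adapter maps the provider’s usage fields into a common Usage object with promptTokens, completionTokens, and totalTokens. This hides the naming differences between providers: OpenAI reports prompt_tokens and completion_tokens, Anthropic reports input_tokens and output_tokens with no total, and Gemini reports promptTokenCount and candidatesTokenCount. The counts themselves remain tokenizer-specific, so the same text can still yield different numbers on different providers.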
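The mapping itself is mechanical. The sketch below assumes the provider usage objects have already been parsed into maps (a real adapter would work with typed SDK objects) and uses a hypothetical UsageMappers class; the field names are the ones OpenAI, Anthropic and Gemini use in their HTTP APIs. A missing total is derived from its parts, and a missing count stays null instead of becoming zero.

```java
import java.util.Map;

// Common usage shape; null means "the provider did not report it".
record Usage(Integer promptTokens, Integer completionTokens, Integer totalTokens) {
    static Usage of(Integer prompt, Integer completion, Integer reportedTotal) {
        Integer total = reportedTotal;
        if (total == null && prompt != null && completion != null) {
            total = prompt + completion; // derive when the provider omits a total
        }
        return new Usage(prompt, completion, total);
    }
}

// Hypothetical adapter helpers, one per provider naming convention.
final class UsageMappers {
    private UsageMappers() {}

    private static Integer intField(Map<String, ?> raw, String key) {
        Object v = raw.get(key);
        return v instanceof Number n ? n.intValue() : null;
    }

    // OpenAI chat completions: prompt_tokens / completion_tokens / total_tokens
    static Usage fromOpenAi(Map<String, ?> raw) {
        return Usage.of(intField(raw, "prompt_tokens"), intField(raw, "completion_tokens"),
                intField(raw, "total_tokens"));
    }

    // Anthropic Messages API: input_tokens / output_tokens, no total
    static Usage fromAnthropic(Map<String, ?> raw) {
        return Usage.of(intField(raw, "input_tokens"), intField(raw, "output_tokens"), null);
    }

    // Gemini usageMetadata: promptTokenCount / candidatesTokenCount / totalTokenCount
    static Usage fromGemini(Map<String, ?> raw) {
        return Usage.of(intField(raw, "promptTokenCount"), intField(raw, "candidatesTokenCount"),
                intField(raw, "totalTokenCount"));
    }
}
```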

5. Can ChatResponse represent tool calls?
Yes. AssistantMessage holds a list of ToolCall objects. The adapter maps provider-specific tool call formats (e.g., OpenAI’s tool_calls[], Anthropic’s tool_use blocks) into this universal structure.
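Here is a hedged sketch of that mapping. It uses hypothetical, reduced versions of the two provider shapes and a simplified ToolCall record (id, type, name, arguments) rather than Spring AI’s actual classes. It assumes the Anthropic tool_use input object has already been serialized to a JSON string, since OpenAI delivers function arguments as a string while Anthropic delivers them as an object.

```java
import java.util.List;

// Simplified, hypothetical stand-ins; Spring AI's real AssistantMessage/ToolCall differ in detail.
record ToolCall(String id, String type, String name, String arguments) {}

record AssistantMessage(String text, List<ToolCall> toolCalls) {
    AssistantMessage { toolCalls = List.copyOf(toolCalls); }

    boolean hasToolCalls() { return !toolCalls.isEmpty(); }
}

// Hypothetical provider shapes, reduced to the fields the mapping needs.
record OpenAiToolCall(String id, String functionName, String argumentsJson) {}
// Anthropic's tool_use "input" is a JSON object; assume it was already serialized to a string.
record AnthropicToolUse(String id, String name, String inputJson) {}

final class ToolCallMappers {
    private ToolCallMappers() {}

    // An assistant turn that only calls tools may carry no text at all.
    private static String textOrEmpty(String s) {
        return s == null ? "" : s;
    }

    static AssistantMessage fromOpenAi(String content, List<OpenAiToolCall> calls) {
        List<ToolCall> mapped = calls.stream()
                .map(c -> new ToolCall(c.id(), "function", c.functionName(), c.argumentsJson()))
                .toList();
        return new AssistantMessage(textOrEmpty(content), mapped);
    }

    static AssistantMessage fromAnthropic(String text, List<AnthropicToolUse> blocks) {
        List<ToolCall> mapped = blocks.stream()
                .map(b -> new ToolCall(b.id(), "function", b.name(), b.inputJson()))
                .toList();
        return new AssistantMessage(textOrEmpty(text), mapped);
    }
}
```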

6. How does ChatResponse handle streaming?
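In streaming mode, each partial chunk is wrapped in its own ChatResponse whose Generation carries only the delta content for that chunk. Completion metadata such as the finish reason, and token usage where the provider reports it during streaming, typically arrives on the final chunk, so a consumer that needs the complete picture must accumulate the deltas and read the metadata from the last chunk. Reusing the same object model gives a consistent programming model for both synchronous calls and reactive streams.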
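Consumers that need the full answer have to fold the chunks back together. Spring AI exposes streaming as a Reactor Flux&lt;ChatResponse&gt;; to stay dependency-free, this sketch folds a plain List of a hypothetical Chunk type instead, and the same accumulate-then-read-metadata logic could run inside a reactive reduce. It keeps the last non-null finish reason and usage, since those typically arrive late in the stream.

```java
import java.util.List;

// Hypothetical simplified chunk: one candidate, a delta, and optional trailing metadata.
record Usage(int promptTokens, int completionTokens) {}
record Chunk(String delta, String finishReason, Usage usage) {}
record Aggregated(String text, String finishReason, Usage usage) {}

final class StreamAggregator {
    private final StringBuilder text = new StringBuilder();
    private String finishReason;
    private Usage usage;

    // Fold one chunk into the running state.
    void accept(Chunk chunk) {
        if (chunk.delta() != null) {
            text.append(chunk.delta());
        }
        // Metadata usually arrives late; keep the latest non-null values.
        if (chunk.finishReason() != null) {
            finishReason = chunk.finishReason();
        }
        if (chunk.usage() != null) {
            usage = chunk.usage();
        }
    }

    Aggregated result() {
        return new Aggregated(text.toString(), finishReason, usage);
    }

    static Aggregated aggregate(List<Chunk> chunks) {
        StreamAggregator agg = new StreamAggregator();
        chunks.forEach(agg::accept);
        return agg.result();
    }
}
```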

7. Why are finishReason and role stored in GenerationMetadata instead of AssistantMessage?
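AssistantMessage is designed to be a reusable message artifact, for example one entry in the conversation history that is sent back to the model on the next turn. finishReason describes how a particular generation ended, not what the message says, so it belongs with the generation rather than with the message. The role needs no separate field on the message either, because an AssistantMessage’s type already identifies it as an assistant message. This separation keeps the message model clean.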

8. Is ChatResponse immutable?
Yes. The generations list is unmodifiable, and all component objects are effectively immutable. This ensures thread safety and prevents unintended side effects.

9. How does Spring AI’s response model compare to LangChain4j’s?
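Both provide portability, but Spring AI’s ChatResponse is an explicit envelope: a list of Generation candidates plus ChatResponseMetadata. LangChain4j is centered on AiMessage; its lower-level model calls return the AiMessage wrapped in a response object that also carries token usage and the finish reason, while its higher-level AI Services can return plain strings or mapped types. Spring AI’s envelope design aligns well with the advisor chain pattern, since every advisor sees the same response shape.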

10. Can I access the raw provider response if needed?
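Yes, with caveats. Provider-specific fields that an adapter chooses to pass through are available in the providerMetadata map. For the complete native payload, call the provider’s low-level API client directly (for example, OpenAiApi in the OpenAI module) rather than the ChatModel implementation, which also returns a ChatResponse. Either route couples your code to one provider and breaks portability, so it is discouraged for general use.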

11. How does ChatResponse aid observability?
Because Usage and model information are part of the standardized metadata, observability tools (Micrometer, custom advisors) can read token counts and model IDs from the response, and time each call, in a provider-independent way.
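As an illustration of provider independence, the hypothetical recorder below aggregates token counts per model from normalized metadata only and never inspects a provider DTO. It is plain Java so it runs without extra dependencies; in an application the same values would more likely feed Micrometer meters or an advisor, and latency would come from timing the call rather than from the metadata.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical minimal view of normalized response metadata.
record Usage(Integer promptTokens, Integer completionTokens) {}
record ResponseMetadata(String model, Usage usage) {}

// Records token counts per model without knowing which provider produced them.
final class TokenUsageRecorder {
    private final Map<String, LongAdder> promptByModel = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> completionByModel = new ConcurrentHashMap<>();

    void record(ResponseMetadata meta) {
        if (meta == null || meta.usage() == null) {
            return; // nothing reported; do not invent zeros
        }
        String model = meta.model() == null ? "unknown" : meta.model();
        add(promptByModel, model, meta.usage().promptTokens());
        add(completionByModel, model, meta.usage().completionTokens());
    }

    private static void add(Map<String, LongAdder> target, String model, Integer tokens) {
        if (tokens != null) {
            target.computeIfAbsent(model, k -> new LongAdder()).add(tokens);
        }
    }

    long promptTokens(String model) {
        LongAdder a = promptByModel.get(model);
        return a == null ? 0 : a.sum();
    }

    long completionTokens(String model) {
        LongAdder a = completionByModel.get(model);
        return a == null ? 0 : a.sum();
    }
}
```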

12. What happens if a provider doesn’t supply token usage?
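The usage counts are then missing: depending on the adapter and the Spring AI version, you may see null fields or an empty usage object that reports zero tokens. The framework itself tolerates this, but cost calculation logic should treat missing or zero counts as "unknown" rather than "free" before computing charges.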
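A null-safe cost calculation can be sketched as follows. The Price type and its per-million-token rates are hypothetical placeholders, not real pricing. The design choice worth copying is that missing counts produce an empty Optional, so an unreported call is flagged as unknown instead of silently costed at zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Optional;

// Hypothetical normalized usage; null means "not reported".
record Usage(Integer promptTokens, Integer completionTokens) {}

// Hypothetical price table entry, in currency units per one million tokens.
record Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {}

final class CostCalculator {
    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);

    private CostCalculator() {}

    // Empty when usage is missing, so "unknown" is never reported as "free".
    static Optional<BigDecimal> cost(Usage usage, Price price) {
        if (usage == null || usage.promptTokens() == null || usage.completionTokens() == null) {
            return Optional.empty();
        }
        BigDecimal input = price.inputPerMillion().multiply(BigDecimal.valueOf(usage.promptTokens()));
        BigDecimal output = price.outputPerMillion().multiply(BigDecimal.valueOf(usage.completionTokens()));
        return Optional.of(input.add(output).divide(MILLION, 6, RoundingMode.HALF_UP));
    }
}
```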

Conclusion

ChatResponse is not merely a wrapper around provider output; it is a carefully crafted domain model that shields enterprise applications from the chaos of disparate LLM response formats. By defining a stable, immutable, and extensible contract, Spring AI enables vendor independence, simplifies testing, and provides a foundation for advanced capabilities like streaming, tool calling, and agentic workflows.

The design choices — the separation of Generation from ChatResponse, the normalization of Usage and metadata, and the adapter-driven conversion — reflect deep experience with enterprise integration patterns. Framework designers and architects building on top of LLMs would do well to study this model. When your AI response is a first-class domain object, your entire system becomes more resilient, more observable, and ready for the next wave of AI evolution.
