spring-boot-ai
🍀 🤖 Spring Boot AI - Evaluation Testing
In this example, inspired by Building Agents with AWS: Complete Tutorial, we will build a simple AI agent application using Spring AI, highlighting key features like:
- Chat Client API and its Advisors
- Model Context Protocol (MCP)
- Retrieval Augmented Generation (RAG)
- Testing with AI Model Evaluation 🤩
This project is indexed and certified by MCP Review
The original example uses AWS Bedrock, but one of the great things about Spring AI is that with just a few config tweaks, the same code works with any other supported model. This project supports three profiles: Ollama (local), Google Gemini and AWS Bedrock.
The application features an AI agent that helps users book accommodations in tourist destinations.
Through MCP, the agent can use the following tools:
- Weather Tool: Retrieves weather information for a specific city and date.
- Booking Tool: Books accommodations in a city for a specific date.
The Weather tool will be implemented locally using MCP, while the Booking tool will be provided by a remote MCP server. The current date is provided via the system prompt. Additional information about cities will be retrieved from a vector store using RAG.

Implementation
MCP Server
As is often the case with Spring Boot, implementing the MCP Server is pretty straightforward. Following the MCP Server Boot Starter guide, you just need to:
- Add the `spring-ai-starter-mcp-server-webflux` or `spring-ai-starter-mcp-server-webmvc` dependency.
- Create an instance and annotate it with `@Tool` and `@ToolParam`:
```kotlin
@Service // or @Bean / @Component
class BookingTool(private val bookingService: BookingService) {

    @Tool(
        description = "make a reservation for accommodation for a given city and date",
    )
    fun book(
        @ToolParam(description = "the city to make the reservation for")
        city: String,
        @ToolParam(description = "the check-in date, when the reservation begins")
        checkinDate: LocalDate,
        @ToolParam(description = "the check-out date, when the reservation ends")
        checkoutDate: LocalDate
    ): String = bookingService.book(city, checkinDate, checkoutDate) // Delegate to a service
}
```
- Register it as a `MethodToolCallbackProvider`:
```kotlin
@Configuration
class BookingToolConfiguration {

    @Bean
    fun bookingToolCallbackProvider(bookingTool: BookingTool) =
        MethodToolCallbackProvider.builder()
            .toolObjects(bookingTool)
            .build()
}
```
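The `BookingService` the tool delegates to is not shown here (it lives in the project sources). As a rough idea only, a minimal in-memory sketch could look like this; the class shape and the returned message format are assumptions, not the project's actual implementation:

```kotlin
import java.time.LocalDate

// Hypothetical in-memory BookingService, shown only to illustrate
// what the BookingTool delegates to; the real one may differ.
class BookingService {
    private val bookings = mutableListOf<Triple<String, LocalDate, LocalDate>>()

    fun book(city: String, checkinDate: LocalDate, checkoutDate: LocalDate): String {
        require(checkoutDate.isAfter(checkinDate)) { "check-out must be after check-in" }
        bookings += Triple(city, checkinDate, checkoutDate)
        return "Accommodation booked in $city from $checkinDate to $checkoutDate"
    }
}
```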
Chat Server
The Chat Server is a Spring Boot application built with the following dependencies:
- `spring-boot-starter-web` or `-webflux` - to expose a REST API for the chat interface
- `spring-ai-starter-mcp-client` - to use MCP
- `spring-ai-starter-vector-store-pgvector` and `spring-ai-advisors-vector-store` - to enable RAG with PGVector
- `spring-ai-starter-model-ollama` - to use Ollama models
- `spring-ai-starter-model-google-genai` and `spring-ai-starter-model-google-genai-embedding` - to use Google Gemini models
- `spring-ai-starter-model-bedrock` and `spring-ai-starter-model-bedrock-converse` - to use AWS Bedrock models
Components
- MCP Tools
  - Weather Tool - a local MCP tool that queries a WeatherService for the weather in a given city on a given date
  - Booking Tool - a remote MCP tool that connects to the Booking MCP Server to reserve accommodations
- Chat
  - Chat Client - a Spring AI ChatClient configured with:
    - A system prompt to define the AI agent’s role and the current date
    - The AI model autoconfigured by Spring Boot via the active profile
    - The above MCP tools as part of the AI agent’s toolset
  - Chat Service - wraps the Chat Client and adds three advisors:
    - QuestionAnswerAdvisor - fetches context from a vector store and augments the user input (RAG)
    - PromptChatMemoryAdvisor - adds conversation history to the user input (chat memory)
    - SimpleLoggerAdvisor - logs the chat history to the console (for debugging)
  - Chat Controller - exposes a simple REST POST endpoint that takes user input, calls the Chat Service, and returns the AI agent’s response
- Vector Store Initializer - loads some sample data into the vector store at startup
Let's implement this step by step ...
Weather Tool
Here's how the Weather Tool is implemented:
- Create an instance and annotate it with `@Tool` and `@ToolParam`:
```kotlin
@Service // or @Bean / @Component
class WeatherTool(private val weatherService: WeatherService) {

    @Tool(description = "get the weather for a given city and date")
    fun getWeather(
        @ToolParam(description = "the city to get the weather for")
        city: String,
        @ToolParam(description = "the date to get the weather for")
        date: LocalDate
    ): String = weatherService.getWeather(city, date) // Delegate to a service
}
```
- Register it as a `MethodToolCallbackProvider`:
```kotlin
@Configuration
class WeatherToolConfiguration {

    @Bean
    fun weatherToolCallbackProvider(weatherTool: WeatherTool) =
        MethodToolCallbackProvider.builder()
            .toolObjects(weatherTool)
            .build()
}
```
Booking Tool
To set up the Booking Tool as a remote MCP tool, we just need to configure the MCP client SSE connection in application.properties:
```properties
spring.ai.mcp.client.toolcallback.enabled=true
spring.ai.mcp.client.sse.connections.booking-tool.url=http://localhost:8081
```
You can find all the alternative configurations in MCP Client Boot Starter documentation.
Chat Client
We create the Chat Client using Spring AI's `ChatClient.Builder`, which is already autoconfigured via `spring.ai` configuration properties (more on that later in Configuration), and initialize it with a custom system prompt and the available MCP tools:
```kotlin
@Configuration
class ChatClientConfiguration {

    @Bean
    fun chatClient(
        builder: ChatClient.Builder,
        toolCallbackProviders: List<ToolCallbackProvider>
    ): ChatClient {
        return chatClientBuilder(builder, toolCallbackProviders).build()
    }

    private fun chatClientBuilder(
        builder: ChatClient.Builder,
        toolCallbackProviders: List<ToolCallbackProvider>
    ): ChatClient.Builder {
        val system = """
            You are an AI powered assistant to help people book accommodation in touristic cities around the world.
            If there is no information, then return a polite response suggesting you don't know.
            If the response involves a timestamp, be sure to convert it to something human-readable.
            Do not include any indication of what you're thinking.
            Use the tools available to you to answer the questions.
            Just give the answer.
            When booking accommodation for a weekend, assume check-in on Saturday and check-out on Monday.
            Current date: {currentDate}
        """.trimIndent()
        return builder
            .defaultSystem(system)
            .defaultToolCallbacks(*toolCallbackProviders.toTypedArray())
    }
}
```
Chat Service
The Chat Service exposes a single chat method that takes a chat ID and a user question. It calls the Chat Client with the user question along with a set of advisors to enrich the interaction:
- QuestionAnswerAdvisor - retrieves relevant context from a vector store and injects it into the context (RAG)
- PromptChatMemoryAdvisor - retrieves or creates an `InMemoryChatMemoryRepository` for the given chat ID and adds it to the context
- SimpleLoggerAdvisor - logs internal advisor traces to the console (if `logging.level.org.springframework.ai.chat.client.advisor` is set to `DEBUG`)
Additionally, the question and answer are logged to the console.
Here’s the implementation:
```kotlin
@Service // or @Bean / @Component
class ChatService(
    vectorStore: VectorStore,
    private val clock: Clock,
    private val chatClient: ChatClient
) {
    private val logger = LoggerFactory.getLogger(ChatService::class.java)
    private val questionAnswerAdvisor = QuestionAnswerAdvisor.builder(vectorStore).build()
    private val simpleLoggerAdvisor = SimpleLoggerAdvisor()
    private val chatMemory = ConcurrentHashMap<String, PromptChatMemoryAdvisor>()

    fun chat(chatId: String, question: String): String {
        val chatMemoryAdvisor = chatMemory.computeIfAbsent(chatId) {
            PromptChatMemoryAdvisor.builder(
                MessageWindowChatMemory.builder()
                    .chatMemoryRepository(InMemoryChatMemoryRepository())
                    .build()
            ).build()
        }
        return chatClient
            .prompt()
            .system { it.param("currentDate", LocalDate.now(clock)) }
            .user(question)
            .advisors(questionAnswerAdvisor, chatMemoryAdvisor, simpleLoggerAdvisor)
            .call()
            .content().apply {
                logger.info("Chat #$chatId question: $question")
                logger.info("Chat #$chatId answer: $this")
            }!!
    }
}
```
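A note on the memory handling above: `ConcurrentHashMap.computeIfAbsent` guarantees that exactly one advisor is created per chat ID, even under concurrent requests. The same pattern in plain Kotlin, with a simple list standing in for the advisor:

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Simplified stand-in for the per-chat advisor cache: one value per key,
// created lazily and atomically on first access.
val histories = ConcurrentHashMap<String, MutableList<String>>()

fun historyFor(chatId: String): MutableList<String> =
    histories.computeIfAbsent(chatId) { mutableListOf() }
```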
Chat Controller
The Chat Controller exposes a simple REST POST endpoint that takes user input, calls the Chat Service, and returns the AI agent’s response:
```kotlin
@RestController
class ChatController(private val chatService: ChatService) {

    @PostMapping("/{chatId}/chat")
    fun chat(
        @PathVariable chatId: String,
        @RequestParam question: String
    ): String? {
        return chatService.chat(chatId, question)
    }
}
```
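For example, a client could hit this endpoint with a plain JDK `HttpClient` request; the host, port, chat ID, and question below are just placeholders:

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

// Builds a POST request matching the /{chatId}/chat endpoint above.
// Assumes the Chat Server runs on localhost:8080.
fun chatRequest(chatId: String, question: String): HttpRequest {
    val encoded = URLEncoder.encode(question, StandardCharsets.UTF_8)
    return HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/$chatId/chat?question=$encoded"))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build()
}
```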
Vector Store Initializer
It's as simple as using Spring AI’s autoconfigured VectorStore and adding documents to it. This automatically invokes the embedding model to generate embeddings and store them in the vector store:
```kotlin
@Bean
fun vectorStoreInitializer(vectorStore: VectorStore) =
    ApplicationRunner {
        // TODO check if the vector store is empty ...
        // TODO load cities from a JSON file or any other source ...
        cities.forEach { city ->
            val document = Document(
                "name: ${city.name} " +
                    "country: ${city.country} " +
                    "description: ${city.description}"
            )
            vectorStore.add(listOf(document))
        }
    }
```
You can find the full version of vectorStoreInitializer in ChatServerApplication.kt.
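The document text is just a flat concatenation of city fields. Sketched as a standalone helper; note that the `City` type here is an assumption, the real data class is in the project sources:

```kotlin
// Hypothetical City type; the real one is defined in the project sources.
data class City(val name: String, val country: String, val description: String)

// Mirrors the document text built in the initializer above.
fun cityDocumentText(city: City): String =
    "name: ${city.name} country: ${city.country} description: ${city.description}"
```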
Configuration
We use application.properties instead of application.yml because YAML cannot define both spring.ai.model.embedding (a scalar, used by Bedrock and Ollama) and spring.ai.model.embedding.text (nested, used by Gemini) at the same time. Maybe someday this will be solved; in any case, this is a PoC that supports multiple models, and a production application would likely use only one, so this wouldn't be an issue.
In this file, we define global configuration values:
- Disable all model auto-configurations by default.
- Configure the datasource to connect to a PostgreSQL database with PGVector support.
- Set the server port to `8080`.
- Configure the URL of the remote Booking Tool MCP server.
- Set the logging level for chat advisor debug traces.
```properties
spring.application.name=chat-server

spring.datasource.url=jdbc:postgresql://localhost:5432/postgres
spring.datasource.username=postgres
spring.datasource.password=password
spring.datasource.driver-class-name=org.postgresql.Driver

spring.ai.model.chat=none
spring.ai.model.embedding=none
spring.ai.model.embedding.text=none

spring.ai.mcp.client.toolcallback.enabled=true
spring.ai.mcp.client.sse.connections.booking-tool.url=http://localhost:8081

server.port=8080

logging.level.org.springframework.ai.chat.client.advisor=INFO
```
The AI model is configured via Spring profiles. Each profile sets the chat model, embedding model, and vector store dimensions. The active profile must be specified at runtime using SPRING_PROFILES_ACTIVE.
Ollama profile
In application-ollama.yml, we configure Spring AI to use Ollama models:
- Set the base URL for the Ollama server to `http://localhost:11434`.
- Set the chat model to `llama3.1:8b` (must be a tools-enabled model).
- Set the embedding model to `nomic-embed-text`.
- Use `pull-model-strategy: when_missing` to only pull models if they are not available locally.
- Configure PGVector as the vector store with 768 dimensions (matching the embedding model size).
```yaml
spring:
  ai:
    model:
      embedding: "ollama"
      chat: "ollama"
    ollama:
      base-url: "http://localhost:11434"
      init:
        pull-model-strategy: "when_missing"
      chat:
        options:
          model: "llama3.1:8b"
      embedding:
        options:
          model: "nomic-embed-text"
    vectorstore:
      pgvector:
        table-name: "vector_store_ollama"
        dimensions: 768
        initialize-schema: true
```
Gemini profile
In application-gemini.yml, we configure Spring AI to use Google Gemini models:
- Set the Google API key from the `GOOGLE_API_KEY` environment variable.
- Set the chat model to `gemini-2.5-flash`.
- Set the embedding model to `gemini-embedding-001` with 768 dimensions.
- Configure PGVector as the vector store with 768 dimensions.
```yaml
spring:
  ai:
    model:
      chat: "google-genai"
      embedding:
        text: "google-genai"
    google:
      genai:
        api-key: "${GOOGLE_API_KEY}"
        chat:
          options:
            model: "gemini-2.5-flash"
            temperature: 0.7
        embedding:
          api-key: "${GOOGLE_API_KEY}"
          text:
            options:
              model: "gemini-embedding-001"
              dimensions: 768
    vectorstore:
      pgvector:
        table-name: "vector_store_gemini"
        dimensions: 768
        initialize-schema: true
```
Bedrock profile
In application-bedrock.yml, we configure Spring AI to use AWS Bedrock models:
- Set the AWS credentials and region from environment variables.
- Set the chat model using Bedrock Converse API.
- Set the embedding model using Bedrock Cohere.
- Configure PGVector as the vector store with 1024 dimensions (matching the Cohere embedding model size).
```yaml
spring:
  ai:
    model:
      embedding: "bedrock-cohere"
      chat: "bedrock-converse"
    bedrock:
      aws:
        access-key: "${AWS_ACCESS_KEY_ID}"
        secret-key: "${AWS_SECRET_ACCESS_KEY}"
        region: "${AWS_REGION:eu-central-1}"
      converse:
        chat:
          options:
            model: "${AWS_BEDROCK_CHAT_MODEL}"
            max-tokens: 2048
      cohere:
        embedding:
          model: "${AWS_BEDROCK_EMBEDDING_MODEL}"
    vectorstore:
      pgvector:
        table-name: "vector_store_bedrock"
        dimensions: 1024
        initialize-schema: true
```
Test
Test MCP Server
To test the MCP Server, we will use a McpClient to call the book method of the Booking Tool, mocking the downstream BookingService:

See the simplified test implementation below. For the complete implementation, including a test that verifies the list of available tools, refer to McpServerApplicationTest.kt.
```kotlin
@SpringBootTest(webEnvironment = RANDOM_PORT)
class McpServerApplicationTest {

    // 1. Inject the server port (it is random)
    @LocalServerPort
    val port: Int = 0

    // 2. Mock the BookingService instance
    @MockitoBean
    lateinit var bookingService: BookingService

    @Test
    fun `should book`() {
        // 3. Create a McpClient connected to the server
        val client = McpClient.sync(
            HttpClientSseClientTransport("http://localhost:$port")
        ).build()
        client.initialize()
        client.ping()

        // 4. Mock the bookingService using argument captors
        val bookResult = "Booking is done!"
        val cityCaptor = argumentCaptor<String>()
        val checkinDateCaptor = argumentCaptor<LocalDate>()
        val checkoutDateCaptor = argumentCaptor<LocalDate>()
        doReturn(bookResult)
            .whenever(bookingService)
            .book(
                cityCaptor.capture(),
                checkinDateCaptor.capture(),
                checkoutDateCaptor.capture()
            )

        // 5. Call the tool
        val city = "Barcelona"
        val checkinDate = LocalDate.parse("2025-04-15")
        val checkoutDate = LocalDate.parse("2025-04-18")
        val result = client.callTool(CallToolRequest(
            "book",
            mapOf(
                "city" to city,
                "checkinDate" to checkinDate.toEpochDay(),
                "checkoutDate" to checkoutDate.toEpochDay()
            )
        ))

        // 6. Verify the result
        assertThat(result.isError).isFalse()
        assertThat(result.content).singleElement()
            .isInstanceOfSatisfying(TextContent::class.java) {
                // TODO why is text double quoted?
                assertThat(it.text).isEqualTo("\"$bookResult\"")
            }

        // 7. Verify that the bookingService was called with
        // the correct parameters
        assertThat(cityCaptor.allValues).singleElement()
            .isEqualTo(city)
        assertThat(checkinDateCaptor.allValues).singleElement()
            .isEqualTo(checkinDate)
        assertThat(checkoutDateCaptor.allValues).singleElement()
            .isEqualTo(checkoutDate)

        // 8. Close the client
        client.close()
    }
}
```
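Note that the test passes the dates as epoch days (days since 1970-01-01), which are then deserialized back into `LocalDate` on the server side. The round-trip in plain Kotlin:

```kotlin
import java.time.LocalDate

// LocalDate <-> epoch-day round-trip, as used by the tool arguments above.
fun toEpochDay(date: LocalDate): Long = date.toEpochDay()
fun fromEpochDay(day: Long): LocalDate = LocalDate.ofEpochDay(day)
```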
To run the MCP Server tests:

```shell
cd mcp-server
./gradlew test
```
Test Chat Server
To test the Chat Server, we will:
- Override the default profile to `gemini` and disable the MCP client in application-test.yml (overridable via `SPRING_PROFILES_ACTIVE`).
- Replace the remote Booking Tool with a local Booking Test Tool with the same signature.
  - Create the local Booking Test Tool in BookingTestToolConfiguration.kt.
- Mock the downstream services Weather Service and Booking Service with `MockitoBean`.
- Create a fixed Clock to control the current date in ClockTestToolConfiguration.kt.
  - Declare it as `@Primary @Bean` to override the default `Clock` bean.
- Start Docker Compose with PGVector using Testcontainers.

You might’ve noticed that the test doesn’t actually check the MCP client’s SSE connection, as it is disabled. I tried spinning up an McpServer using McpServerAutoConfiguration, and it almost worked. The problem? The client tries to connect before the server is up, which causes the whole application to fail on startup. Maybe it’s just an ordering issue, and hopefully something that can be fixed in the future 🤞
Now for the interesting part, how do we test the AI agent’s response? This is where Evaluation Testing comes in:
One method to evaluate the response is to use the AI model itself for evaluation. Select the best AI model for the evaluation, which may not be the same model used to generate the response.
This aligns with the evaluation techniques described in Martin Fowler’s Evals GenAI pattern:
- Self-evaluation: The LLM evaluates its own response, but this can reinforce its own mistakes or biases.
- LLM as a judge: Another model scores the output, reducing bias by introducing a second opinion.
- Human evaluation: People manually review responses to ensure the tone and intent feel right.
To keep things simple, we’ll go with self-evaluation 🤓
Each test will follow this structure:
```kotlin
@Test
fun `should do something`() {
    // 1. Mock downstream service(s)
    // Optionally use argument captors depending on how you plan
    // to verify parameters in step 5
    // Example for BookingService:
    doReturn("Your booking is done!")
        .whenever(bookingTestService).book(any(), any(), any())

    // 2. Call the chat service
    val chatId = UUID.randomUUID().toString()
    val chatResponse = chatService.chat(
        chatId,
        "Can you book accommodation for Barcelona from 2025-04-15 to 2025-04-18?"
    )

    // 3. Evaluate the response using the AI model
    val evaluationResult = TestEvaluator(chatClientBuilder) { evaluationRequest, userSpec ->
        userSpec.text(
            """
            Your task is to evaluate if the answer given by an AI agent to a human user matches the claim.
            Return YES if the answer matches the claim and NO if it does not.
            After returning YES or NO, explain why.
            Assume that today is ${LocalDate.now(clock)}.
            Answer: {answer}
            Claim: {claim}
            """.trimIndent()
        )
            .param("answer", evaluationRequest.responseContent)
            .param("claim", evaluationRequest.userText)
    }.evaluate(EvaluationRequest(
        "Accommodation has been booked Barcelona from 2025-04-15 to 2025-04-18",
        chatResponse
    ))

    // 4. Assert the evaluation result is successful, show feedback if not
    // (withFailMessage must be set before the assertion is made)
    assertThat(evaluationResult.isPass)
        .withFailMessage { evaluationResult.feedback }
        .isTrue()

    // 5. If applicable, verify the parameters passed to the service
    // You can verify with argument captors or use the `verify` method as in the example below:
    verify(bookingTestService).book(
        eq("Barcelona"),
        eq(LocalDate.parse("2025-04-15")),
        eq(LocalDate.parse("2025-04-18"))
    )
}
```
See the full test implementation in ChatServerApplicationTest.kt.
Each evaluation uses a custom prompt tailored to the specific response being tested, and as you experiment, you'll notice some surprisingly quirky behavior. That’s why I ended up creating a custom TestEvaluator, based on Spring AI’s RelevancyEvaluator and FactCheckingEvaluator, which don't yet offer the level of customization you might want.
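As a rough idea of what such an evaluator does with the model's reply, here's a hypothetical sketch of parsing the YES/NO verdict the prompt asks for; the real TestEvaluator is in the project sources and differs in detail:

```kotlin
// Hypothetical parsing of the evaluator model's "YES/NO plus explanation" reply.
data class EvalVerdict(val pass: Boolean, val feedback: String)

fun parseEvaluation(reply: String): EvalVerdict {
    val trimmed = reply.trim()
    // The prompt asks the model to lead with YES or NO, then explain.
    return EvalVerdict(trimmed.uppercase().startsWith("YES"), trimmed)
}
```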
I had to adjust each prompt after running into odd results. For example, the evaluation model assumed it was still 2023 and refused to believe the AI agent could predict weather for 2025. Or it mistook "you" in the answer as referring to itself instead of the user. The weirdest? One evaluation just answered “NO” but the explanation said, “well, maybe I should have said YES” 🤣
For a production system, you'd definitely need a lot of prompt tuning and testing to get things right, for both the system and the evaluator. I suppose that's part of the "fun" of working with GenAI.
By default, tests use the Gemini profile both locally and in CI (requires GOOGLE_API_KEY).
```shell
cd chat-server
./gradlew test
```

To test locally with Ollama (requires Ollama to be up, results may be flaky without adequate GPU hardware):

```shell
cd chat-server
SPRING_PROFILES_ACTIVE=ollama ./gradlew test
```

To test locally with Bedrock (requires AWS credentials):

```shell
cd chat-server
SPRING_PROFILES_ACTIVE=bedrock ./gradlew test
```
Run
You can configure the application using environment variables or a system.properties file in the root directory. This file is ignored by Git and is loaded by both ./gradlew bootRun and tests.
Example system.properties:
```properties
# AWS Bedrock
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=eu-central-1
AWS_BEDROCK_CHAT_MODEL=...
AWS_BEDROCK_EMBEDDING_MODEL=...

# Google Gemini
GOOGLE_API_KEY=...
```
Running the vector database
All profiles require PGVector for the RAG vector database:
```shell
cd chat-server
docker compose -f docker-compose-vectordb.yml up -d
```

To stop it:

```shell
cd chat-server
docker compose -f docker-compose-vectordb.yml down
```

To stop it and remove all volumes (removes all vector database data):

```shell
cd chat-server
docker compose -f docker-compose-vectordb.yml down -v
```
Running Ollama locally
If you want to use local LLMs, you can run Ollama either via Docker Compose or as a native application (more info at ollama.com).
```shell
cd chat-server
docker compose -f docker-compose-ollama.yml up -d
```

To stop it:

```shell
cd chat-server
docker compose -f docker-compose-ollama.yml down
```
Running the application
- Start MCP server
```shell
cd mcp-server
./gradlew bootRun
```
- Start Chat Server with one of the following profiles:
Ollama (requires Ollama and vector database):

```shell
cd chat-server
SPRING_PROFILES_ACTIVE=ollama ./gradlew bootRun
```

Gemini (requires vector database and GOOGLE_API_KEY):

```shell
cd chat-server
SPRING_PROFILES_ACTIVE=gemini ./gradlew bootRun
```

Bedrock (requires vector database and AWS credentials):

```shell
cd chat-server
SPRING_PROFILES_ACTIVE=bedrock ./gradlew bootRun
```
- Start chatting
A simple chat UI is available at http://localhost:8080/chat.html (disclaimer: this UI was entirely AI-generated as frontend is not the goal of this PoC).
You can also explore the API endpoints at http://localhost:8080/swagger-ui.html.
How to use other AI models
To use any of the other AI models supported by Spring AI, follow these steps:
- Add the required dependencies
- Configure the model in its own `application-<model>.yml` file
- Activate the profile using `SPRING_PROFILES_ACTIVE=<model>`
Documentation
- Spring AI documentation
- Martin Fowler's GenAI patterns
- Inspired by sample project spring-ai-java-bedrock-mcp-rag
- Awesome Spring AI
Happy GenAI coding! 💙