Spring AI With ChromaDB Vector Store

Last updated: November 4, 2025

Written by: Hardik Singh Behl

Reviewed by: Liam Williams


1. Overview

With traditional databases, we typically rely on exact keyword or basic pattern matching to implement our search functionality. While sufficient for simple applications, this approach fails to fully understand the meaning and context behind natural language queries.

Vector stores address this limitation by storing data as numeric vectors that capture their meaning. Similar words end up close to each other, which allows for semantic search, where the relevant results are returned even if they don’t contain the exact keywords used in the query.
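To make the idea of "closeness" concrete, here is a minimal sketch that compares vectors using cosine similarity, one common similarity measure. The three-dimensional vectors and the CosineSimilaritySketch class are made up purely for illustration. A real embedding model produces vectors with hundreds of dimensions, and the metric a vector store actually uses depends on its configuration:

```java
// Toy illustration only: the vectors below are written by hand,
// they are not the output of a real embedding model.
class CosineSimilaritySketch {

    // Returns a value near 1.0 for vectors pointing in the same direction
    // and near 0.0 for unrelated (orthogonal) vectors
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Pretend embeddings for three words
        double[] love = {0.9, 0.1, 0.0};
        double[] romance = {0.8, 0.2, 0.1};
        double[] winter = {0.0, 0.2, 0.9};

        System.out.println("love vs romance: " + cosineSimilarity(love, romance));
        System.out.println("love vs winter:  " + cosineSimilarity(love, winter));
    }
}
```

With these made-up values, the first pair scores much higher than the second, which is the property a semantic search relies on.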

In this tutorial, we’ll explore how to integrate ChromaDB, an open-source vector store, with Spring AI.

To convert our text data into vectors that ChromaDB can store and search, we’ll need an embedding model. We’ll use Ollama to run an embedding model locally.

2. Dependencies

Let’s start by adding the necessary dependencies to our project’s pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-chroma-store-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

The ChromaDB starter dependency enables us to establish a connection with our ChromaDB vector store and interact with it.

Additionally, we import the Ollama starter dependency, which we’ll use to run our embedding model.

Since the current version, 1.0.0-M6, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

Milestone versions are published to this repository rather than to the standard Maven Central repository.

Since we’re using multiple Spring AI starters in our project, let’s also include the Spring AI Bill of Materials (BOM) in our pom.xml:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-M6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

With this addition, we can now remove the version tag from both of our starter dependencies.
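For reference, once the BOM is imported, our two starter declarations from earlier would look like this, with their versions resolved through the BOM:

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-chroma-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
```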

The BOM keeps the versions of our Spring AI dependencies aligned, which avoids version conflicts between them.

3. Setting up Local Test Environment With Testcontainers

To facilitate local development and testing, we’ll use Testcontainers to set up our ChromaDB vector store and Ollama service.

The prerequisite for running the required services via Testcontainers is an active Docker instance.

3.1. Test Dependencies

First, let’s add the necessary test dependencies to our pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>chromadb</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>

These dependencies provide us with the necessary classes to spin up ephemeral Docker instances for both of our external services.

3.2. Defining Testcontainers Beans

Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    public ChromaDBContainer chromaDB() {
        return new ChromaDBContainer("chromadb/chroma:0.5.20");
    }

    @Bean
    @ServiceConnection
    public OllamaContainer ollama() {
        return new OllamaContainer("ollama/ollama:0.4.5");
    }
}

We pin both containers to specific image versions, which were the latest stable releases at the time of writing.

We also annotate our bean methods with @ServiceConnection. This dynamically registers all the properties required to set up a connection with both of our external services.

Even when not using the Testcontainers support, Spring AI automatically connects to ChromaDB and Ollama when running locally on their default ports of 8000 and 11434, respectively.

However, in production, we can override the connection details using the corresponding Spring AI properties:

spring:
  ai:
    vectorstore:
      chroma:
        client:
          host: ${CHROMADB_HOST}
          port: ${CHROMADB_PORT}
    ollama:
      base-url: ${OLLAMA_BASE_URL}

Once the connection details are configured correctly, Spring AI automatically creates beans of type VectorStore and EmbeddingModel for us, allowing us to interact with our vector store and embedding model, respectively. We’ll look at how to use these beans later in the tutorial.

Although @ServiceConnection automatically defines the necessary connection details, we’ll still need to configure a few additional properties in our application.yml file:

spring:
  ai:
    vectorstore:
      chroma:
        initialize-schema: true
    ollama:
      embedding:
        options:
          model: nomic-embed-text
      init:
        chat:
          include: false
        pull-model-strategy: WHEN_MISSING

Here, we enable schema initialization for ChromaDB. Then, we configure nomic-embed-text as our embedding model and instruct Ollama to pull the model if it’s not present in our system. We also exclude chat models from this initialization, since we only need an embedding model.

Alternatively, we can use a different embedding model from Ollama or a Hugging Face model, depending on our requirements.

3.3. Using Testcontainers During Development

While Testcontainers is primarily used for integration testing, we can also use it during our local development.

To achieve this, we’ll create a separate main class in our src/test/java directory:

class TestApplication {

    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}

We create a TestApplication class and, inside its main method, start our main Application class with our TestcontainersConfiguration class.

This setup lets Testcontainers start and manage our external services locally. When we run our Spring Boot application through TestApplication, it connects to the services started via Testcontainers.

4. Populating ChromaDB at Application Startup

Now that we have our local environment set up, let’s populate our ChromaDB vector store with some sample data during application startup.

4.1. Fetching Poetry Records From PoetryDB

For our demonstration, we’ll use the PoetryDB API to fetch poems.

Let’s create a PoetryFetcher utility class for this:

class PoetryFetcher {

    private static final String BASE_URL = "https://poetrydb.org/author/";
    private static final String DEFAULT_AUTHOR_NAME = "Shakespeare";

    public static List<Poem> fetch() {
        return fetch(DEFAULT_AUTHOR_NAME);
    }

    public static List<Poem> fetch(String authorName) {
        return RestClient
          .create()
          .get()
          .uri(URI.create(BASE_URL + authorName))
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }

}

record Poem(String title, List<String> lines) {}

We use RestClient to invoke the PoetryDB API with the specified authorName. To deserialize the API response into a list of Poem records, we pass an anonymous ParameterizedTypeReference declared with the diamond operator, and Java infers the generic response type from the method’s return type.

We also overload our fetch() method without any parameter to retrieve poems by the author Shakespeare. We’ll be using this method in our next section.
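One detail to keep in mind if we call fetch() with other authors: URI.create() rejects raw spaces, so a name such as "Emily Dickinson" would fail as written. The following sketch shows one possible way to percent-encode the name using only the JDK. The AuthorUriSketch class and its authorUri() helper are hypothetical and not part of the original PoetryFetcher, and we assume the API accepts percent-encoded names:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AuthorUriSketch {

    private static final String BASE_URL = "https://poetrydb.org/author/";

    // Hypothetical helper that builds the request URI for an author name
    static URI authorUri(String authorName) {
        // URLEncoder targets form encoding and turns spaces into '+',
        // so we replace them with %20, which a URL path segment expects.
        // A literal '+' in the name is already encoded as %2B at this point.
        String encoded = URLEncoder
          .encode(authorName, StandardCharsets.UTF_8)
          .replace("+", "%20");
        return URI.create(BASE_URL + encoded);
    }

    public static void main(String[] args) {
        System.out.println(authorUri("Shakespeare"));
        System.out.println(authorUri("Emily Dickinson"));
    }
}
```

Names without special characters, such as our default Shakespeare, pass through unchanged.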

4.2. Storing Documents in ChromaDB Vector Store

Now, to populate our ChromaDB vector store with poems during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:

@Component
class VectorStoreInitializer implements ApplicationRunner {

    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = PoetryFetcher
          .fetch()
          .stream()
          .map(poem -> {
              Map<String, Object> metadata = Map.of("title", poem.title());
              String content = String.join("\n", poem.lines());
              return new Document(content, metadata);
          })
          .toList();
        vectorStore.add(documents);
    }

}

In our VectorStoreInitializer, we autowire an instance of VectorStore.

Inside the run() method, we use our PoetryFetcher utility class to retrieve a list of Poem records. Then, we map each poem into a Document with the lines as content and the title as metadata.

Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into its vector representation before storing it. We don’t need to explicitly convert it using the EmbeddingModel bean.

By default, Spring AI uses SpringAiCollection as the collection name to store data in our vector store, but we can override it using the spring.ai.vectorstore.chroma.collection-name property.

5. Testing Semantic Search

With our ChromaDB vector store populated, let’s validate our semantic search functionality:

private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Love and Romance", "Time and Mortality", "Jealousy and Betrayal"})
void whenSearchingShakespeareTheme_thenRelevantPoemsReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .builder()
      .query(theme)
      .topK(MAX_RESULTS)
      .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeLessThanOrEqualTo(MAX_RESULTS)
      .allSatisfy(document -> {
          String title = String.valueOf(document.getMetadata().get("title"));
          assertThat(title)
            .isNotBlank();
      });
}

Here, we pass some common Shakespearean themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.

Next, we call the similaritySearch() method of our vectorStore bean with our searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying our vector store.

The returned documents will contain poems that are semantically related to the given theme, even if they don’t contain the exact keyword.
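To build some intuition for what a top-K similarity search does, here is a conceptual, in-memory sketch. It is not how Spring AI or ChromaDB are implemented. The TinyVectorStore class, its Entry record, the titles, and the hand-written vectors are all hypothetical, and we assume cosine similarity as the ranking measure:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual stand-in for a vector store, for illustration only
class TinyVectorStore {

    record Entry(String title, double[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String title, double[] vector) {
        entries.add(new Entry(title, vector));
    }

    // Ranks all entries by similarity to the query vector, highest first,
    // and returns the titles of the first topK entries
    List<String> similaritySearch(double[] query, int topK) {
        return entries
          .stream()
          .sorted(Comparator.comparingDouble((Entry e) -> cosine(query, e.vector())).reversed())
          .limit(topK)
          .map(Entry::title)
          .toList();
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        TinyVectorStore store = new TinyVectorStore();
        // Made-up vectors standing in for real poem embeddings
        store.add("love poem", new double[] {0.9, 0.1, 0.0});
        store.add("time poem", new double[] {0.1, 0.9, 0.1});
        store.add("jealousy poem", new double[] {0.2, 0.1, 0.9});

        // Pretend this is the embedding of the query "Love and Romance"
        double[] query = {0.8, 0.3, 0.0};
        System.out.println(store.similaritySearch(query, 2));
    }
}
```

In our application, the embedding model produces the vectors and ChromaDB performs the ranking, so we only deal with the SearchRequest and the returned documents.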

6. Conclusion

In this article, we explored how to integrate the ChromaDB vector store with Spring AI.

Using Testcontainers, we started Docker containers for our ChromaDB and Ollama services, creating a local test environment.

We looked at how to populate our vector store with poems from the PoetryDB API during application startup. Then, we used common poetry themes to validate our semantic search functionality.

The code backing this article is available on GitHub.
