Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

Locating a substring in a larger string is a common operation when we work with Java. Usually, we can find a substring’s index using the indexOf() method.

In this tutorial, we’ll explore various approaches to solve an interesting and pragmatic problem: finding the n-th occurrence of a substring in a larger string.

2. Introduction to the Problem

The standard indexOf() method can give us the index of a substring in a string. For example, we can find the index (8) of the substring “a” in “This is a word.“:

int firstIdx = "This is a word.".indexOf("a");
assertEquals(8, firstIdx);

However, the indexOf(substring) method always returns the index of the substring’s first occurrence. Therefore, sometimes, it’s inconvenient when we want to know a substring’s n-th occurrence index using this method. Let’s see an example:

final static String INPUT = "a word, a word, a word, a word";
// indexes of "a":          "0       8       16      24 "

As the example above shows, INPUT contains four “a“s. We can obtain the first “a” index (0) by calling indexOf(“a”) directly. However, we may need solutions to get other indexes of “a”‘s occurrences, such as 8, 16, and 24.

Next, let’s see how to solve this problem.

For simplicity, we’ll leverage unit test assertions to verify whether each approach returns the expected result.

3. Finding the Substring’s Second Occurrence

Before we solve the “finding the n-th occurrence” problem, let’s first find the index of the second (n=2) occurrence of the substring “a”. Then, we’ll extend the “find the second occurrence” solution to various “the n-th occurrence” solutions.

We’ve learned that the indexOf(“a”) returns the index of the first occurrence of “a”. Further, we can pass the fromIndex int parameter to the indexOf() method. The fromIndex parameter indicates the character index we start searching from. 

Therefore, a straightforward idea to get the second occurrence’s index is calling indexOf() twice:

  • Get the index of the first occurrence by calling indexOf(substring) and save the result in a variable, let’s say firstIdx
  • Get the index of the second occurrence by calling indexOf(substring, firstIdx + substring.length())

Next, let’s implement this approach and test with our INPUT string:

int firstIdx = INPUT.indexOf("a");
int secondIdx = INPUT.indexOf("a", firstIdx + "a".length());
assertEquals(8, secondIdx);

Now, some of us might have realized we can call indexOf() n times with the corresponding fromIdx parameter to get the index of the n-th occurrence. For example, we can add another indexOf() call to the test above to get the third occurrence’s index: thirdIdx = INPUT.indexOf(“a”, secondIdx + “a”.length());.

So next, let’s extend the “finding the second occurrence” solution to “finding the n-th occurrence”.

4. Finding the Substring’s N-th Occurrence

Usually, we’d use recursion or looping to implement repeated operations.

4.1. The Recursive Approach

So, let’s first implement the recursive solution:

int nthIndexOf(String input, String substring, int nth) {
    if (nth == 1) {
        return input.indexOf(substring);
    } else {
        return input.indexOf(substring, nthIndexOf(input, substring, nth - 1) + substring.length());
    }
}

The implementation is pretty straightforward. The nth variable works as a counter. We reduce its value in each recursion step.

Next, let’s test the method with our example data:

int result1 = nthIndexOf(INPUT, "a", 1);
assertEquals(0, result1);

int result2 = nthIndexOf(INPUT, "a", 2);
assertEquals(8, result2);

int result3 = nthIndexOf(INPUT, "a", 3);
assertEquals(16, result3);

int result4 = nthIndexOf(INPUT, "a", 4);
assertEquals(24, result4);

int result5 = nthIndexOf(INPUT, "a", 5);
assertEquals(-1, result5);

The test passes if we give it a run. Also, as we can see, when the total occurrence count is less than the nth value, the method returns -1.

4.2. The Iterative Approach

Similarly, we can implement the same idea in an iterative approach:

static int nthIndexOf2(String input, String substring, int nth) {
    int index = -1;
    while (nth > 0) {
        index = input.indexOf(substring, index + substring.length());
        if (index == -1) {
            return -1;
        }
        nth--;
    }
    return index;
}

The test with the same input passes as well:

int result1 = nthIndexOf2(INPUT, "a", 1);
assertEquals(0, result1);

int result2 = nthIndexOf2(INPUT, "a", 2);
assertEquals(8, result2);

int result3 = nthIndexOf2(INPUT, "a", 3);
assertEquals(16, result3);

int result4 = nthIndexOf2(INPUT, "a", 4);
assertEquals(24, result4);

int result5 = nthIndexOf2(INPUT, "a", 5);
assertEquals(-1, result5);

5. The Regex-Based Solution

We’ve seen how to solve the problem using the indexOf() method. Alternatively, we can use Java’s Regex API to solve the problem.

Matcher.find() allows us to find the next matched occurrence in the input string. Therefore, we can call Matcher.find() n times to get the n-th match. Also, we can get each match’s start index using Matcher.start():

int nthOccurrenceIndex(String input, String regexPattern, int nth) {
    Matcher matcher = Pattern.compile(regexPattern).matcher(INPUT);
    for (int i = 0; i < nth; i++) {
        if (!matcher.find()) {
            return -1;
        }
    }
    return matcher.start();
}

Next, let’s create a test to verify whether the regex-based solution does the job correctly:

int result1 = nthOccurrenceIndex(INPUT, "a", 1);
assertEquals(0, result1);

int result2 = nthOccurrenceIndex(INPUT, "a", 2);
assertEquals(8, result2);

int result3 = nthOccurrenceIndex(INPUT, "a", 3);
assertEquals(16, result3);

int result4 = nthOccurrenceIndex(INPUT, "a", 4);
assertEquals(24, result4);

int result5 = nthOccurrenceIndex(INPUT, "a", 5);
assertEquals(-1, result5);

It’s worth noting that this solution allows us to match dynamic substrings matching a pattern in the input. However, on the other hand, the indexOf() based approach only works with fixed substrings.

6. Conclusion

In this article, we’ve learned various ways to locate the n-th occurrence of a substring within a string:

  • The recursive solution based on the indexOf() method
  • The iterative solution based on the indexOf() method
  • Regex-based solution

As always, the complete source code for the examples is available over on GitHub.

Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE
res – REST with Spring (eBook) (everywhere)
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.