Run-Length Encoding and Decoding in Java

Azure Spring Apps is a fully managed service from Microsoft (built in collaboration with VMware), focused on building and deploying Spring Boot applications on Azure Cloud without worrying about Kubernetes.

The Enterprise plan comes with some interesting features, such as commercial Spring runtime support, a 99.95% SLA and some deep discounts (up to 47%) when you are ready for production.

>> Learn more and deploy your first Spring Boot app to Azure.

And, you can participate in a very quick (1 minute) paid user research from the Java on Azure product team.

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

The Jet Profiler was built for MySQL only, so it can do things like real-time query performance, focus on most used tables or most frequent queries, quickly identify performance issues and basically help you optimize your queries.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

Accelerate Your Jakarta EE Development with Payara Server!

With best-in-class guides and documentation, Payara essentially simplifies deployment to diverse infrastructures.

Beyond that, it provides intelligent insights and actions to optimize Jakarta EE applications.

The goal is to apply an opinionated approach to get to what's essential for mission-critical applications - really solid scalability, availability, security, and long-term support:

>> Download and Explore the Guide (to learn more)

The AI Assistant to boost Boost your productivity writing unit tests - Machinet AI.

AI is all the rage these days, but for very good reason. The highly practical coding companion, you'll get the power of AI-assisted coding and automated unit test generation.
Machinet's Unit Test AI Agent utilizes your own project context to create meaningful unit tests that intelligently aligns with the behavior of the code.
And, the AI Chat crafts code and fixes errors with ease, like a helpful sidekick.

Simplify Your Coding Journey with Machinet AI:

>> Install Machinet AI in your IntelliJ

Looking for the ideal Linux distro for running modern Spring apps in the cloud?

Meet Alpaquita Linux: lightweight, secure, and powerful enough to handle heavy workloads.

This distro is specifically designed for running Java apps. It builds upon Alpine and features significant enhancements to excel in high-density container environments while meeting enterprise-grade security standards.

Specifically, the container image size is ~30% smaller than standard options, and it consumes up to 30% less RAM:

>> Try Alpaquita Containers now.

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

And, of course, it can be heavily visual, allowing you to interact with the database using diagrams, visually compose queries, explore the data, generate random data, import data or build HTML5 database reports.

>> Take a look at DBSchema

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

1. Overview

In computer science, data compression techniques play an important role in optimizing storage and transmission efficiency. One such technique that has stood the test of time is Run-length Encoding (RLE).

In this tutorial, we’ll understand RLE and explore how to implement encoding and decoding in Java.

2. Understanding Run-Length Encoding

Run-length Encoding is a simple yet effective form of lossless data compression. The basic idea behind RLE is to represent consecutive identical elements, called a “run” in a data stream by a single value and its count rather than as the original run.

This is particularly useful when dealing with repetitive sequences, as it significantly reduces the amount of space needed to store or transmit the data.

RLE is well suited to compress palette-based bitmap images, such as computer icons and animations. A well-known example is the Microsoft Windows 3.x startup screen, which is RLE compressed.

Let’s consider the following example:

String INPUT = "WWWWWWWWWWWWBAAACCDEEEEE";

If we apply a run-length encoding data compression algorithm to the above string, it can be rendered as follows:

String RLE = "12W1B3A2C1D5E";

In the encoded sequence, each character follows the number of times it appears consecutively. This rule allows us to easily reconstruct the original data during decoding.

It’s worth noting that the standard RLE only works with “textual” input. If the input contains numbers, RLE can’t encode them in a nonambiguous way.

In this tutorial, we’ll explore two run-length encoding and decoding approaches.

3. Character Array Based Solution

Now that we understand RLE and how it works, a classic approach to implementing run-length encoding and decoding is converting the input strings into a char array and then applying the encoding and decoding rules to the array.

3.1. Creating the Encoding Method

The key to implementing run-length encoding is to identify each run and count its length. Let’s first look at the implementation and then understand how it works:

String runLengthEncode(String input) {
    StringBuilder result = new StringBuilder();
    int count = 1;
    char[] chars = input.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        char c = chars[i];
        if (i + 1 < chars.length && c == chars[i + 1]) {
            count++;
        } else {
            result.append(count).append(c);
            count = 1;
        }
    }
    return result.toString();
}

Next, let’s quickly walk through the code above and understand the logic.

First, we use StringBuilder to store each step’s result and concatenate them for better performance.

After initializing a counter and converting the input string to a char array, the function iterates through each character in the input string.

For each character:

If the current character is the same as the next character and we are not at the end of the string, the count is incremented.
If the current character is different from the next character or we are at the end of the string, the count and the current character are appended to the result StringBuilder. The count is then reset to 1 for the next unique character.

Finally, the StringBuilder is converted to a string using toString() and returned as the result of the encoding process.

When we test our INPUT with this encoding method, we get the expected result:

assertEquals(RLE, runLengthEncode(INPUT));

3.2. Creating the Decoding Method

Identifying each run is still crucial to decode a RLE string. A run includes the character and the number of times it appears, such as “12W” or “2C“.

Now, let’s create a decoding method:

String runLengthDecode(String rle) {
    StringBuilder result = new StringBuilder();
    char[] chars = rle.toCharArray();

    int count = 0;
    for (char c : chars) {
        if (Character.isDigit(c)) {
            count = 10 * count + Character.getNumericValue(c);
        } else {
            result.append(String.join("", Collections.nCopies(count, String.valueOf(c))));
            count = 0;
        }
    }
    return result.toString();
}

Next, let’s break down the code and understand step by step how it works.

First, we create a StringBuilder object to hold step results and convert the rle string to a char array for later processing.

Also, we initialized an integer variable count to keep track of the count of consecutive occurrences.

Then, we iterate through each character in the RLE-encoded string. For each character:

If the character is a digit, it contributes to the count. The count is updated by appending the digit to the existing count using the formula 10 * count + Character.getNumericValue(c). This is done to handle multi-digit counts.
If the character is not a digit, a new character is encountered. The current character is then appended to the result StringBuilder repeated count times using Collections.nCopies(). The count is then reset to 0 for the next set of consecutive occurrences.

It’s worth noting that if we work with Java 11 or later, String.valueOf(c).repeat(count) is a better alternative to repeat the character c count times.

When we verify the decoding method using our example, the test passes:

assertEquals(INPUT, runLengthDecode(RLE));

4. Regex Based Solution

Regex is a powerful tool for dealing with characters and strings. Let’s see if we can perform run-length encoding and decoding using regex.

4.1. Creating the Encoding Method

Let’s first look at the input string. If we can split it into a “run array”, then the problem can be easily solved:

Input     : "WWWWWWWWWWWWBAAACCDEEEEE"
Run Array : ["WWWWWWWWWWWW", "B", "AAA", "CC", "D", "EEEEE"]

We cannot split the input string by a character to achieve this. Instead, we must split by a zero-width position, such as the position between ‘W‘ and ‘B‘, ‘B‘ and ‘A“, etc.

It isn’t hard to discover the rules of these positions: the characters before and after the position are different. Therefore, we can build a regex to match the required positions using look-around and back reference: “(?<=(\\D))(?!\\1)“. Let’s quickly understand what this regex means:

(?<=(\\D)) – Positive look-behind assertion ensures the match is after a non-digit character (\\D represents a non-digit character).
(?!\\1) – Negative lookahead assertion ensures the matched position isn’t before the same character as in the positive lookbehind. \\1 refers to the previously matched non-digit character.

The combination of these assertions ensures that the split occurs at the boundaries between consecutive runs of the same character.

Next, let’s create the encoding method:

String runLengthEncodeByRegEx(String input) {
    String[] arr = input.split("(?<=(\\D))(?!\\1)");
    StringBuilder result = new StringBuilder();
    for (String run : arr) {
        result.append(run.length()).append(run.charAt(0));
    }
    return result.toString();
}

As we can see, after we obtain the run array, the rest of the task is simply appending each run’s length and the character to the prepared StringBuilder.

The runLengthEncodeByRegEx() method passes our test:

assertEquals(RLE, runLengthEncodeByRegEx(INPUT));

4.2. Creating the Decoding Method

We can follow a similar idea to decode a RLE-encoded string. First, we need to split the encoded string and obtain the following array:

RLE String: "12W1B3A2C1D5E"
Array     : ["12", "W", "1", "B", "3", "A", "2", "C", "1", "D", "5", "E"]

Once we get this array, we can generate the decoded string by easily repeating each character, such as ‘W‘ 12 times, ‘B‘ once, etc.

We’ll use the look-around technique again to create the regex to split the input string: “(?<=\\D)|(?=\\D+)“.

In this regex:

(?<=\\D) – Positive look-behind assertion ensures that the split occurs after a non-digit character.
| – Indicates the “OR” relation
(?=\\D+) – Positive lookahead assertion ensures the split occurs before one or more non-digit characters.

This combination allows the split to occur at the boundaries between consecutive counts and characters in the RLE-encoded string.

Next, let’s build the decoding method based on the regex-based split:

String runLengthDecodeByRegEx(String rle) {
    if (rle.isEmpty()) {
        return "";
    }
    String[] arr = rle.split("(?<=\\D)|(?=\\D+)");
    if (arr.length % 2 != 0) {
        throw new IllegalArgumentException("Not a RLE string");
    }
    StringBuilder result = new StringBuilder();

    for (int i = 1; i <= arr.length; i += 2) {
        int count = Integer.parseInt(arr[i - 1]);
        String c = arr[i];

        result.append(String.join("", Collections.nCopies(count, c)));
    }
    return result.toString();
}

As the code shows, we included some simple validations at the beginning of the method. Then, while iterating the split array, we retrieved the count and character from the array and appended the character repeated count times to the result StringBuilder.

Finally, this decoding method works with our example:

assertEquals(INPUT, runLengthDecodeByRegEx(RLE));

5. Conclusion

In this article, we first discussed how run-length encoding works and then explored two approaches to implementing run-length encoding and decoding.

As always, the complete source code for the examples is available over on GitHub.

Run-Length Encoding and Decoding in Java

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Understanding Run-Length Encoding

3. Character Array Based Solution

3.1. Creating the Encoding Method

3.2. Creating the Decoding Method

4. Regex Based Solution

4.1. Creating the Encoding Method

4.2. Creating the Decoding Method

5. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course:

REST with Spring

Learn Spring Security ▼▲

Learn Spring Security Core

Learn Spring Security OAuth

Learn Spring

Learn Spring Data JPA

Persistence

REST

Security

Full Archive

Baeldung Ebooks

About Baeldung

Write for Baeldung

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Understanding Run-Length Encoding

3. Character Array Based Solution

3.1. Creating the Encoding Method

3.2. Creating the Decoding Method

4. Regex Based Solution

4.1. Creating the Encoding Method

4.2. Creating the Decoding Method

5. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course: