1. Introduction

Encryption is a vital part of many computer processes. While it can protect the communication between endpoints, it can also secure data at rest. However, strong encryption isn’t always a requirement. For more lax security, we can put together faster, lightweight, and less complex ways to encrypt a file. Sometimes, these can be a first step to a more robust implementation.

In this tutorial, we explore a basic way to transform the contents of a file in an effort to obscure its meaning. First, we talk about bits and bit operations. After that, we concentrate on the way XOR works. Next, we demonstrate a basic use case for the operation. Finally, we implement and explain solutions in three languages that take a stream of data and a key, encrypting the former with the latter.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. Unless otherwise specified, it should work in most POSIX-compliant environments.

2. Bit Operations

Bit operations work on bits. Regular bits can take on one of two values:

  • 0
  • 1

In bit manipulation, there are several basic operations:

  • NOT: unary operation that takes one bit and returns its inverse, so NOT 1 = 0 and vice-versa
  • AND: binary operation that takes two bits and returns 1 only if both are 1
  • OR: binary operation that takes two bits and returns 0 only if both are 0

While AND and OR operations can be implemented using specially constructed chains with the other operator, NOT is indispensable for bit logic. In other words, we can implement all operations via NOT and either AND or OR.

Let’s see the so-called truth tables of each operation over the b1 and b2 bits:

+----+----+----------+-----+----+
| b1 | b2 | NOT (b2) | AND | OR |
+----+----+----------+-----+----+
| 0  | 0  | 1        | 0   | 0  |
| 0  | 1  | 0        | 0   | 1  |
| 1  | 0  | 1        | 0   | 1  |
| 1  | 1  | 0        | 1   | 1  |
+----+----+----------+-----+----+

Here, we see the results of applying OR and AND to b1 and b2 along with the result from NOT on b2. Notably, b1 AND b2 is the same as b2 AND b1. The same applies to OR as well.

For convenience, programming and even electronic hardware environments most often offer NOT, AND, OR, and XOR.

3. The EXclusive OR (XOR) Operation

EXclusive OR (XOR) is a binary operation that takes two bits and returns 0 only if both are equal, 1 otherwise:

+----+----+----+-----+
| b1 | b2 | OR | XOR |
+----+----+----+-----+
| 0  | 0  | 0  | 0   |
| 0  | 1  | 1  | 1   |
| 1  | 0  | 1  | 1   |
| 1  | 1  | 1  | 0   |
+----+----+----+-----+

In this way, XOR is closely related to OR, but eXcludes the case in which both bits are equal to 1.

While it may not be obvious at first glance, using XOR has important benefits:

  • unlike OR and AND, XOR has a 50-50 chance of returning 0 and 1 for truly random inputs, making it an ideal way to hide information
  • XOR can return either of its arguments if we repeat the XOR operation with the resulting XOR value and the other argument

To clarify the latter point, we can perform several operations:

  • b1 XOR b2 = b3
  • b3 XOR b1 = b2
  • b3 XOR b2 = b1

As we can see, b3 is the result of performing XOR over b1 and b2. Consequently, using this result with either b1 or b2 produces b2 or b1 respectively.

4. XOR Use Case

Due to the way it works, one of the most common applications for XOR is data transformation via a repeated key or a key stream.

4.1. Key With Equal Length

Sometimes, we might have a key with the same length as the data. In practice, to achieve this, algorithms usually employ hashing. However, we use a single character to demonstrate.

For instance, let’s create a file that only contains the x character:

$ printf x > file

Now, we can view the file as [-b]inary via the xxd hex editor:

$ xxd -b file
00000000: 01111000                                               x

As we can see, x is 01111000 (ASCII code) in binary.

Now, if we assume our secret key for encrypting this single character is + plus, we go through several steps:

  1. convert + to its binary form
  2. convert the content of the file to its binary form
  3. XOR each bit of + with each bit of x
  4. (optionally) convert the result back to a character

So, let’s create a key file and check its binary form:

$ printf + > key
$ xxd -b key
00000000: 00101011                                               +

At this point, we can perform the XOR operation:

$ echo 'obase=2; '$(( $((2#01111000)) ^ $((2#00101011)) )) | bc
1010011

This is the result after applying a key over content with the same length. In this case, we use Bash arithmetic expansion.

4.2. Repeated Key or Stream

Since having a key with the same length as the data it should encode is rare, repeating the same key is a usual workaround to having a long enough secret:

keykeykeykeykeykeyke
Long classified text

Alternatively, we can employ a key stream.

Either way, the process for applying XOR is similar but requires a loop.

While we can get the correct result in all cases, there are several potential problems with the steps and method above:

  • manual conversion of data to binary code
  • depends on shell arithmetic expansion
  • conversion to and from binary is inefficient
  • lack of leading zero in the result
  • complex pipeline

Because of these shortcomings, we might consider alternative approaches.

5. Using Perl

One common language with a long history of text and general data processing is Perl. Although it gradually seems to be getting more and more obscure, using the perl interpreter can be invaluable when it comes to data manipulation.

5.1. XOR Operator

For example, the ^ operator is a standard part of the syntax and can be used on any basic data type:

$ perl -e 'print 0 ^ 1'
1
$ perl -e 'print 120 ^ 43'
83
$ perl -e 'print 0b01111000 ^ 0b00101011'
83

Notably, perl recognizes 0 and 1 as bits, but can also handle binary and decimal numbers as usual.

5.2. XOR and Conversion

Because of its flexibility, we can also include the binary conversion as part of the XOR operation:

$ perl -e 'printf "%08b", 0b01111000 ^ 0b00101011'
01010011

In this case, we use the printf() function with the %08b format specifier:

  • % indicates this is a format specifier (%% is a percent symbol in the context of a printf() string)
  • 0 is the padding character
  • 8 is the desired minimal size of the output
  • b converts the input to binary

This way, we get a binary result of the proper length without any extra tools outside of Perl.

5.3. Character XOR

Furthermore, we can just use characters instead of their binary form:

$ perl -e 'printf "%08b", ord("x" ^ "+")'
01010011

In this case, the format specifier is the same, but we use ord() to get the decimal value of the result after performing XOR on x and +. Otherwise, we get the result as a character, which may not always be printable.

5.4. String XOR

Similarly, we can upgrade the solution to strings within files:

$ cat file.in
Secret text.
$ cat file.in | perl -0we '
$k = $ARGV[0];
binmode(STDIN);
binmode(STDOUT);
while(sysread(STDIN,$c,length($k))) {
  print $c^substr($k,0,length($c));
}
' key | cat --show-all
8^@^Z^Y^@^MK^Q^\^S^QWa

In summary, this script uses the first passed argument as the key. After that, sysread() gets chunks with the length() of the [$k]ey from the STDIN standard input in binmode() binary mode. Next, it performs an XOR over them with the key itself. The result goes to the STDOUT standard output.

In this case, we pass the input from a file with the cat command and use the –show-all flag to see even unprintable characters.

If we don’t want to pass key as cleartext on the command line, we can also place it in a file and use $(<key.file) command substitution.

Notably, the Perl implementation is stripped down from any error checking. This way, we can encrypt and decrypt with the same script.

6. Using C

As usual, C is a good fit when performing lower-level tasks such as applying the XOR operation over chunks of data. The reason behind this is its ability to control pointers and byte structures, allocate data placeholders of specific sizes, and work with standard streams.

To that end, let’s write a small program without error checks:

$ cat xor.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  size_t kl, n, i;
  unsigned char *k, *b;

  k = argv[1];
  kl = strlen(k);

  b = (unsigned char *) malloc(kl);

  freopen(NULL, "rb", stdin);
  freopen(NULL, "wb", stdout);

  while ((n = fread(b, 1, kl, stdin)) != 0L) {
    for (i = 0; i < n; i++)
      b[i] ^= k[i];
    fwrite(b, 1, n, stdout);
  }

  free(b);

  exit(0);
}

In summary, we get the key from the first command-line argument passed to the program. After that, we reopen stdin and stdout. Next, we read chunks with the key length from the input, XOR them with the key, and then output to stdout. Again, there isn’t any error checking.

At this point, we compile the source via cc:

$ cc -o xor xor.c

Let’s run the resulting executable in the same way as the Perl script from earlier:

$ cat file.in
Secret text.
$ cat file.in | ./xor key | cat --show-all
8^@^Z^Y^@^MK^Q^\^S^QWa

As expected, the result is equivalent. Consequently, we can again use $(<key.file) instead of keyto avoid exposure.

Of course, we can expect this implementation to be slightly faster due to the compiled nature of C.

7. Using Python

Due to its popularity and comprehensiveness, the Python language is a viable alternative for the task at hand.

So, let’s port the Perl code for running the XOR operation over a data stream to Python:

$ cat file.in
Secret text.
$ cat file.in | python3 -c '
import sys

k = sys.argv[1]
kl = len(k)

sys.stdin.mode += "b"
sys.stdout.mode += "b"

while b := sys.stdin.buffer.read(kl):
  o = ""
  for i in range(len(b)):
    o += chr(b[i] ^ ord(k[i]))
  sys.stdout.write(o)
' key
8^@^Z^Y^@^MK^Q^\^S^QWa

Like before, we first get the key as the first argument and set stdin and stdout to [b]inary mode. After that, we read at most [k]ey [l]ength characters from stdin, and XOR them with the respective key byte. Finally, we write the chunk to stdout.

Similar to the previous examples, no error checking or optimization is considered in this code snippet.

8. Summary

In this article, we briefly explained the XOR binary operation and wrote several implementations for weak data encryption via XOR and a key.

In conclusion, although we can use different modules to perform the operation, implementing a custom solution can be a good idea in some cases.