1. Introduction

In this article, we’ll discuss what ObjectId is, how we can generate it, and possible ways of ensuring its uniqueness.

2. ObjectId General Information

Let’s start by explaining what an ObjectId is. An ObjectId is a 12-byte value, usually displayed as a 24-character hexadecimal string, and one of the data types in the BSON specification. BSON is a binary serialization format for JSON-like documents. MongoDB uses ObjectId as the default type of the _id field in documents, and a unique index on the _id field is set up when a collection is created.

This index prevents users from inserting two documents with the same _id into one collection, and it cannot be dropped. However, uniqueness is enforced per collection, so documents with the same _id can be inserted into two different collections.

2.1. ObjectId Structure

An ObjectId can be divided into three parts. Taking the ObjectId 6359388c80616b1fc6d7ec71 as an example, the first part consists of 4 bytes: 6359388c. Those 4 bytes represent the time in seconds since the Unix Epoch. The second part consists of the next 5 bytes, 80616b1fc6, which hold a random value generated once per process. The random value is unique to the machine and process. The last part is the 3 bytes d7ec71, an incrementing counter that starts from a random value.

It’s also worth mentioning that the above structure is valid for MongoDB version 4.0 and above. Before that, an ObjectId consisted of four parts: the first 4 bytes represented seconds since the Unix Epoch, the next 3 bytes were a machine identifier, the next 2 bytes were the process id, and the last 3 bytes were a counter starting from a random value.
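To make the current three-part layout concrete, here is a minimal sketch that splits a 24-character hex string into its timestamp, random value, and counter. ObjectIdParts is a hypothetical helper written for illustration with plain JDK classes; it is not part of the MongoDB driver.

```java
import java.time.Instant;

// Hypothetical helper, not part of the MongoDB driver: splits a 24-character
// hex ObjectId into the three parts of the 4.0+ layout.
final class ObjectIdParts {
    final long timestampSeconds; // first 4 bytes: seconds since the Unix Epoch
    final String randomHex;      // next 5 bytes: per-process random value, kept as hex
    final int counter;           // last 3 bytes: incrementing counter

    ObjectIdParts(String hex) {
        if (hex == null || !hex.matches("[0-9a-fA-F]{24}")) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        this.timestampSeconds = Long.parseLong(hex.substring(0, 8), 16);
        this.randomHex = hex.substring(8, 18);
        this.counter = Integer.parseInt(hex.substring(18, 24), 16);
    }

    // Creation time encoded in the first 4 bytes.
    Instant createdAt() {
        return Instant.ofEpochSecond(timestampSeconds);
    }

    public static void main(String[] args) {
        ObjectIdParts parts = new ObjectIdParts("6359388c80616b1fc6d7ec71");
        System.out.println(parts.createdAt());
        System.out.println(parts.randomHex);
        System.out.println(parts.counter);
    }
}
```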

2.2. ObjectId Uniqueness

The most important point, which is also mentioned in the MongoDB documentation, is that an ObjectId is very likely, but not guaranteed, to be unique when generated. That being said, there is a very slim possibility of generating a duplicate ObjectId. Looking at the structure of ObjectId, the 5-byte random value and the 3-byte counter together allow 2^64, or over 1.8×10^19, distinct ObjectIds within one second.

Even if all ids were generated within the same second on the same machine within the same process, the 3-byte counter alone would still provide 2^24, or over 16 million, distinct values.
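As a quick check of this arithmetic, the sketch below computes the number of values a 3-byte counter can hold and the number of combinations of a 5-byte random value with a 3-byte counter. The class name ObjectIdSpace is made up for this illustration.

```java
import java.math.BigInteger;

// Illustrative arithmetic only; the class name is hypothetical.
final class ObjectIdSpace {
    // 3-byte counter: 2^24 distinct values
    static final long COUNTER_VALUES = 1L << 24;

    // 5-byte random value plus 3-byte counter: 8 bytes, so 2^64 combinations per second
    static final BigInteger PER_SECOND = BigInteger.ONE.shiftLeft(64);

    public static void main(String[] args) {
        System.out.println(COUNTER_VALUES); // 16777216
        System.out.println(PER_SECOND);     // 18446744073709551616
    }
}
```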

3. ObjectId Creation

There are multiple ways of creating an ObjectId in Java. It can be done with either the non-parameterized constructor or one of the parameterized constructors.

3.1. ObjectId Creation With Non-parameterized Constructors

The first and easiest way is to use the new keyword with the non-parameterized constructor:

ObjectId objectId = new ObjectId();

The second is calling the static get() method on the ObjectId class instead of calling the constructor directly. Internally, get() creates the ObjectId the same way as in the first example, through the new keyword:

ObjectId objectId = ObjectId.get();

3.2. ObjectId Creation With Parameterized Constructors

The rest of the examples use parameterized constructors. We can create an ObjectId by passing a Date as a parameter, or both a Date and an int counter. If we use the same Date in both cases, new ObjectId(date) and new ObjectId(date, counter) will give us different ObjectIds.

However, if we create two ObjectIds through new ObjectId(date, counter) with the same Date and the same counter, we’ll get duplicates, since both are built from the same second, on the same machine and process, and with the same counter. Let’s see an example:

@Test
public void givenSameDateAndCounter_whenComparingObjectIds_thenTheyAreEqual() {
    Date date = new Date();
    ObjectId objectIdDate = new ObjectId(date); // 635981f6e40f61599e839ddb
    ObjectId objectIdDateCounter1 = new ObjectId(date, 100); // 635981f6e40f61599e000064
    ObjectId objectIdDateCounter2 = new ObjectId(date, 100); // 635981f6e40f61599e000064

    assertThat(objectIdDate).isNotEqualTo(objectIdDateCounter1);
    assertThat(objectIdDate).isNotEqualTo(objectIdDateCounter2);

    assertThat(objectIdDateCounter1).isEqualTo(objectIdDateCounter2);
}
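To see why the two ids built from the same Date and counter collide, here is a simplified model of the layout using only JDK classes. It is a sketch, not the driver’s implementation: ObjectIdModel and compose are hypothetical names, and PROCESS_RANDOM is an assumed constant standing in for the 5-byte value that is chosen once per process.

```java
import java.util.Date;

// Simplified model of the ObjectId layout, NOT the driver's implementation.
final class ObjectIdModel {
    // Assumption: stands in for the 5-byte random value picked once per process.
    static final long PROCESS_RANDOM = 0xe40f61599eL;

    // Builds the 24-character hex form from a date and a counter.
    static String compose(Date date, int counter) {
        long seconds = date.getTime() / 1000L;
        return String.format("%08x%010x%06x",
            seconds & 0xffffffffL, // 4-byte timestamp in seconds
            PROCESS_RANDOM,        // 5-byte per-process value
            counter & 0xffffff);   // low 3 bytes of the counter
    }

    public static void main(String[] args) {
        Date date = new Date();
        // Same date, same counter, same process value: identical output.
        System.out.println(compose(date, 100).equals(compose(date, 100)));
    }
}
```

Since every input to the id is fixed, nothing is left to tell the two ids apart. Autogenerated ids avoid this because the counter advances on each call.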

Additionally, it’s possible to create an ObjectId by passing its 24-character hexadecimal representation directly as a parameter:

ObjectId objectIdHex = new ObjectId("635981f6e40f61599e000064");

There are a few more ways to create an ObjectId. We can pass a byte[] or a ByteBuffer. If we create an ObjectId by passing an array of bytes to a constructor, we should get the same ObjectId when creating it through a ByteBuffer that wraps the same array of bytes.

Let’s see an example:

@Test
public void givenSameArrayOfBytes_whenComparingObjectIdsCreatedViaDifferentMethods_thenTheObjectIdsAreEqual(){
    byte[] bytes = "123456789012".getBytes();
    ObjectId objectIdBytes = new ObjectId(bytes);

    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    ObjectId objectIdByteBuffer = new ObjectId(buffer);

    assertThat(objectIdBytes).isEqualTo(objectIdByteBuffer);
}

Finally, we can create an ObjectId by passing an int timestamp, in seconds since the Unix Epoch, and an int counter to a constructor.

4. Pros and Cons of ObjectId

As with all things, there are pros and cons worth knowing about.

4.1. Benefits of ObjectId

Since an ObjectId is 12 bytes long, it’s smaller than a 16-byte UUID. So if we have a lot of documents in the database, using ObjectId rather than UUID will save some space. At 4 bytes saved per identifier, around 262,000 usages of ObjectId will save about 1 MB compared to UUID. This seems to be a minimal amount.

Still, if the database is large enough, and a single document can contain more than one ObjectId, then the savings in disk space and RAM might be significant since the documents, in the end, will be smaller.

Secondly, as we learned before, a timestamp is embedded into the ObjectId, which might be useful in some cases. For instance, we can determine which ObjectId was created first, assuming all of them were autogenerated and not created by passing a chosen Date to a parameterized constructor as we’ve seen before.
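As a rough illustration of using the embedded timestamp, the sketch below reads the first 4 bytes of each hex string and sorts ids from oldest to newest. ObjectIdTimestamps is a hypothetical helper that relies only on JDK classes; with the driver, ObjectId.getDate() exposes the same information. Ids created within the same second share a timestamp, so this ordering can’t separate them.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders hex ObjectIds by their embedded creation time.
final class ObjectIdTimestamps {
    // The first 8 hex characters (4 bytes) are seconds since the Unix Epoch.
    static Instant createdAt(String hex) {
        return Instant.ofEpochSecond(Long.parseLong(hex.substring(0, 8), 16));
    }

    // Returns a new list sorted oldest first; ties keep their input order.
    static List<String> oldestFirst(List<String> hexIds) {
        List<String> sorted = new ArrayList<>(hexIds);
        sorted.sort(Comparator.comparing(ObjectIdTimestamps::createdAt));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        ids.add("635981f6e40f61599e839ddb");
        ids.add("6359388c80616b1fc6d7ec71");
        System.out.println(oldestFirst(ids));
    }
}
```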

4.2. Drawbacks of ObjectId

On the other hand, there are identifiers even smaller than a 12-byte ObjectId, which would save even more disk space and RAM. Furthermore, since an ObjectId is just a generated value, there is a possibility of getting a duplicate id. It’s very slim, but it’s still possible.

5. Ensuring the Uniqueness of ObjectId

If we have to guarantee that a generated ObjectId is unique, we can add some code around the insert to make sure it’s not a duplicate.

5.1. Try Catch DuplicateKeyException

Suppose we insert a document whose _id already exists in the collection. In that case, we can catch a DuplicateKeyException and retry the insert operation until it’s successful. This method only works on fields that have a unique index.

Let’s see an example of that. Considering a User class:

public class User {
    public static final String NAME_FIELD = "name";

    private final ObjectId id;
    private final String name;

    // constructor
    // getters
}

We’ll insert a User into the database and then try to insert another one with the same ObjectId. This will cause DuplicateKeyException to be thrown. We can catch that and retry the insert operation of User. However, this time, we’ll generate another ObjectId. For the purpose of this test, we’ll use an embedded MongoDB library and Spring Data with MongoDB.

Let’s see an example:

@Test
public void givenUserInDatabase_whenInsertingAnotherUserWithTheSameObjectId_DKEThrownAndInsertRetried() {
    // given
    String userName = "Kevin";
    User firstUser = new User(ObjectId.get(), userName);
    User secondUser = new User(ObjectId.get(), userName);

    mongoTemplate.insert(firstUser);

    // when
    try {
        mongoTemplate.insert(firstUser);
    } catch (DuplicateKeyException dke) {
        mongoTemplate.insert(secondUser);
    }

    // then
    Query query = new Query();
    query.addCriteria(Criteria.where(User.NAME_FIELD)
      .is(userName));
    List<User> users = mongoTemplate.find(query, User.class);
    assertThat(users).usingRecursiveComparison()
      .isEqualTo(Lists.newArrayList(firstUser, secondUser));
}
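The test above retries exactly once. If we wanted a bounded retry loop instead, it could look roughly like the sketch below. Everything here is a stand-in built on plain JDK classes: InMemoryCollection imitates a collection with a unique index on _id, and IllegalStateException plays the role that DuplicateKeyException plays with mongoTemplate.insert.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Sketch of a bounded retry; all names here are hypothetical.
final class RetryOnDuplicate {
    // Stand-in for a collection with a unique index on _id.
    static final class InMemoryCollection {
        private final Set<String> ids = new HashSet<>();

        void insert(String id) {
            if (!ids.add(id)) {
                throw new IllegalStateException("duplicate key: " + id);
            }
        }

        int size() {
            return ids.size();
        }
    }

    // Asks the generator for a fresh id on each attempt and returns the id
    // that was stored, or fails once maxAttempts is exhausted.
    static String insertWithRetry(InMemoryCollection collection,
                                  Supplier<String> idGenerator,
                                  int maxAttempts) {
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String id = idGenerator.get();
            try {
                collection.insert(id);
                return id;
            } catch (IllegalStateException duplicate) {
                last = duplicate; // remember the failure and try a new id
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

Capping the number of attempts keeps a faulty id generator from looping forever, at the cost of surfacing an error to the caller when the cap is reached.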

5.2. Find and Insert

Another approach, probably not recommended, is to first look for a document with the given ObjectId. If it doesn’t exist, we insert it. Otherwise, we throw an error or generate another ObjectId and try again. This method is unreliable because the find and the insert are two separate operations, so another client could insert the same ObjectId in between, which could lead to inconsistencies.

It’s a common approach to autogenerate the ObjectId and insert the document without any extra uniqueness check. Wrapping every insert in a try-catch for DuplicateKeyException and retrying the operation seems to be overkill. The number of edge cases is very limited, and it’s tough to reproduce such a case without seeding the ObjectId with a Date, counter, or timestamp in the first place.

However, if, for some reason, we can’t afford to have a duplicate ObjectId due to those edge cases, then we can consider catching DuplicateKeyException and retrying, as shown earlier, to ensure uniqueness.

6. Conclusion

In this article, we learned what an ObjectId is, how it’s built, how we can generate it, and possible ways of ensuring its uniqueness. In the end, it seems to be the best idea to trust the autogeneration of ObjectIds.

All code samples can be found over on GitHub.
