Implementing Cross-entity Unique Keys in Kalix - Part 1

Renato Cavalcanti, Principal Engineer, Lightbend

26 September 2022
7 minute read

Introduction

A common question when building event sourced models is how to deal with cross-entity unique keys. The problem appears when the model has a field that is not its primary key but needs to be unique across all entities in the system.

For example, imagine a social media application where a user is identified by some auto-generated identifier, but also by a handle and an email address. Twitter and GitHub for instance are very good examples of that, but there are plenty of other examples out there.

When a user is created the system will generate a unique identifier for it. That identifier is stable and is used as the primary identifier. Alongside it, the user will choose a handle and also give their email address. Both handle and email address must be unique across all users in the application.

We know how to do it when using a classical relational database. We simply add a unique constraint to both columns and we let the database manage it. However, when using event sourced models, the journal is indexed only by the entity’s primary key. Moreover, entities represent a transaction boundary and as such, we can only modify one at a time. We can’t “lock” all entities in order to decide if a new handle is unique or if an email is already in use.

In this series of articles we will show you how to implement cross-entity unique keys by combining some of the existing Kalix components and features.

We will cover the topic in three different articles:

In part 1, we will discuss a flawed approach and analyze why it won't give us a good solution.
In part 2, we will introduce you to a technique to enforce uniqueness across entities.
Finally, in part 3, we will add robustness and flexibility to the system by implementing some workflows using Kalix Timers and discussing some failure scenarios.

For this series we will be implementing our services using the Kalix Java SDK.

NOTE

The full project code can be found here.

The Event Sourced User Account

So let’s start modeling...

We will start by creating our user account as an event sourced entity.

First the domain file: src/main/proto/com/example/user/domain/user_domain.proto

syntax = "proto3";

package com.example.user.domain;

option java_outer_classname = "UserDomain";

message UserState {
  string user_id = 1;
  string full_name = 2;
  string handle = 3;
  string email = 4;
}

message UserCreated {
  string full_name = 1;
  string handle = 2;
  string email = 3;
}

And the service definition: src/main/proto/com/example/user/user_api.proto

syntax = "proto3";

import "google/protobuf/empty.proto";
import "kalix/annotations.proto";

package com.example.user;

option java_outer_classname = "UserApi";

message CreateUser {
  string user_id = 1 [(kalix.field).entity_key = true];
  string handle = 2;
  string email = 3;
  string full_name = 4;
}

service UserService {
  option (kalix.codegen) = {
    event_sourced_entity: {
      name: "com.example.user.domain.UserEntity"
      entity_type: "user"
      state: "com.example.user.domain.UserState"
      events: [
        "com.example.user.domain.UserCreated"
      ]
    }
  };
  option (kalix.service).acl.allow = { principal: ALL };

  rpc Create(CreateUser) returns (google.protobuf.Empty);
}

So far so good. This will give us a UserEntity to implement. The create method and corresponding event handler could be implemented as:

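package com.example.user.domain;

import com.example.user.UserApi;
import com.google.protobuf.Empty;
import kalix.javasdk.eventsourcedentity.EventSourcedEntityContext;

public class UserEntity extends AbstractUserEntity {

  // The entity id is the user_id marked as entity_key in CreateUser.
  private final String entityId;

  public UserEntity(EventSourcedEntityContext context) {
    this.entityId = context.entityId();
  }

  // Assumption: the empty state is null, so that the null check in create()
  // can tell a fresh entity from an existing one.
  @Override
  public UserDomain.UserState emptyState() {
    return null;
  }

  @Override
  public Effect<Empty> create(UserDomain.UserState currentState, UserApi.CreateUser createUser) {
    if (currentState != null) {
      // An event was already applied for this id: reject the command.
      return effects().error("User [" + entityId + "] already created");
    } else {
      UserDomain.UserCreated event =
        UserDomain.UserCreated.newBuilder()
          .setFullName(createUser.getFullName())
          .setHandle(createUser.getHandle())
          .setEmail(createUser.getEmail())
          .build();
      return effects()
        .emitEvent(event)
        .thenReply(__ -> Empty.getDefaultInstance());
    }
  }

  // Event handler: builds the state from the UserCreated event.
  @Override
  public UserDomain.UserState userCreated(UserDomain.UserState currentState,
                                          UserDomain.UserCreated userCreated) {
    return UserDomain.UserState.newBuilder()
      .setFullName(userCreated.getFullName())
      .setEmail(userCreated.getEmail())
      .setHandle(userCreated.getHandle())
      .build();
  }
}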

The first issue with this implementation is that the only field that is truly unique is the userId. It allows the creation of any number of users all sharing the same handle or email address, or both. That’s clearly not what we want.
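To make the problem concrete, here is a minimal plain-Java sketch of what the entity above enforces. It is not Kalix code: `DuplicateHandleDemo` and its methods are hypothetical names, and the map only stands in for a set of entities that can be addressed by user id and nothing else.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a set of User entities. Each entry plays the role
// of one entity and can only be reached through its user id.
class DuplicateHandleDemo {

  // user id -> handle chosen by that user
  private final Map<String, String> handleByUserId = new HashMap<>();

  // Mirrors the create handler: the only check is "does this id already exist?"
  boolean create(String userId, String handle, String email) {
    if (handleByUserId.containsKey(userId)) {
      return false; // plays the role of effects().error(...)
    }
    handleByUserId.put(userId, handle);
    return true;
  }

  // Counts how many users ended up with the given handle.
  long countByHandle(String handle) {
    return handleByUserId.values().stream()
        .filter(h -> h.equals(handle))
        .count();
  }

  public static void main(String[] args) {
    DuplicateHandleDemo store = new DuplicateHandleDemo();
    System.out.println(store.create("id-1", "jdoe", "john@example.com")); // true
    System.out.println(store.create("id-2", "jdoe", "jane@example.com")); // true, same handle accepted
    System.out.println(store.create("id-1", "other", "x@example.com"));   // false, id already taken
    System.out.println(store.countByHandle("jdoe"));                      // 2
  }
}
```

Only the identifier is protected. Nothing in the per-entity check can see that another entity already holds the same handle.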

We will need some form of coordination point where we can have a consistent view of the system and check if the handle and the email are already in use or not. And this needs to be done before we try to create the User entity.

A Kalix Action is the component we are looking for here. Instead of letting the commands hit our User entity directly, we will route them through an Action that will do some pre-checks first.

However, once inside an Action, we can’t query for entities and ask: is anyone already using this handle? And even if we could call each individual entity and check if the handle is free, we still have the problem that when we finish our query, someone may have started to use that same handle.

A common but flawed approach to this problem is to create a Kalix View for the User Entity. Basically, views allow us to create indexes on fields other than the primary key. So, in principle we could query a View from inside our Action and check if there are any entries for the handle and email that we are trying to use. If none is found, we could then proceed by creating the new User.

Will that work? The answer is: sometimes. Views are eventually consistent, so they may not reflect the latest state of the system at the moment we run our query. You may get good results during your tests, but there is no guarantee that such a check will give you the required consistency level.
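The following plain-Java sketch simulates that gap. It is only an illustration under simplifying assumptions, not Kalix code: `LaggingViewDemo`, its queue of pending events and the explicit `projectPendingEvents` step are hypothetical stand-ins for a view that is updated some time after the entity has accepted a command.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of "query the view, then create the user".
class LaggingViewDemo {

  // Handles the view has already indexed.
  private final Set<String> indexedHandles = new HashSet<>();
  // Events written by entities but not yet projected into the view.
  private final Queue<String> pendingEvents = new ArrayDeque<>();
  private int usersCreated = 0;

  // The pre-check: ask the view, then create the user if the handle looks free.
  boolean createIfHandleLooksFree(String handle) {
    if (indexedHandles.contains(handle)) {
      return false;
    }
    usersCreated++;            // the entity accepts the command...
    pendingEvents.add(handle); // ...but the view only learns about it later
    return true;
  }

  // Simulates the asynchronous projection catching up.
  void projectPendingEvents() {
    while (!pendingEvents.isEmpty()) {
      indexedHandles.add(pendingEvents.poll());
    }
  }

  int usersCreated() {
    return usersCreated;
  }

  public static void main(String[] args) {
    LaggingViewDemo system = new LaggingViewDemo();
    System.out.println(system.createIfHandleLooksFree("jdoe")); // true
    System.out.println(system.createIfHandleLooksFree("jdoe")); // true, the view has not caught up yet
    system.projectPendingEvents();
    System.out.println(system.createIfHandleLooksFree("jdoe")); // false, but two users already exist
    System.out.println(system.usersCreated());                  // 2
  }
}
```

In a test the projection usually catches up between two requests, which is why this kind of check can look correct until it runs under real load.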

The lesson to learn about Views is that they are very good at exposing data using different indexes, but they are not our best option if we want to make business decisions that require strict consistency.

When building a Kalix application, it’s important to understand that each call to a component can be performed on different nodes and that there is no global transaction across calls. To make it even clearer, there is no transaction across two calls even when they are performed on the same node. Each component should be considered as living in its own isolation bubble. So, even if Views were strictly consistent, it wouldn’t suffice to query a view, make some decisions based on its results and then perform some operations on another component, like the User entity.
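A short plain-Java sketch of that last point, again with hypothetical names (`CheckThenActDemo`, `isHandleFree`, `createUser`) and no Kalix APIs. Here the index is always up to date, yet the query and the write are still two separate calls, so two requests can interleave between them.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a strictly consistent index that is queried and
// updated through two independent calls.
class CheckThenActDemo {

  private final Set<String> handlesInUse = new HashSet<>();
  private int usersCreated = 0;

  // Call 1: the query.
  boolean isHandleFree(String handle) {
    return !handlesInUse.contains(handle);
  }

  // Call 2: the write. Nothing ties it to the earlier query.
  void createUser(String handle) {
    handlesInUse.add(handle);
    usersCreated++;
  }

  int usersCreated() {
    return usersCreated;
  }

  public static void main(String[] args) {
    CheckThenActDemo system = new CheckThenActDemo();
    // Requests A and B both run their query before either one writes.
    boolean freeForA = system.isHandleFree("jdoe");
    boolean freeForB = system.isHandleFree("jdoe");
    if (freeForA) system.createUser("jdoe");
    if (freeForB) system.createUser("jdoe");
    System.out.println(system.usersCreated()); // 2
  }
}
```

Both requests saw a correct answer at the time they asked. The decision was simply stale by the time each of them acted on it.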

That doesn’t mean that we can’t build strictly consistent models in Kalix. Instead, we need to keep in mind the distributed nature of Kalix and embrace it. The path to strictly consistent models in a distributed system is to chain the operations in the right order, put in place the necessary measures to cope with possible failures and be able to recover from them. This often requires some plumbing when building everything from scratch. The good news is that we can combine some Kalix features to achieve this level of consistency without having to deal with all the complexity associated with it.

In part 2, we will cover how to combine Actions, Value Entities and Event Sourced Entities to implement cross-entity unique keys.