
A Guide to Event Sourcing in Microservices

The Kalix Team at Lightbend
  • 17 October 2022
  • 9 minute read

Introduction

As all DevOps engineers and software architects know, managing a distributed database system while ensuring sufficiently performant response times and failsafe resiliency is no small task. In the demanding world of cloud applications, which need massively parallel data access, the granular scaling of individual services is critical. Scaling particular services this finely requires analytics on the data stores to extract the contextual information that can help accommodate the highly elastic demand on certain functions. Because of the distributed nature of those data stores, this is not easy.

What if helpful information already exists within your data, waiting for you to explore it—and all you need is an adjustment in how you look at data propagation within the system?

First, let’s get back to basics.

Microservices

In Henry Ford's words, “Nothing is particularly hard if you divide it into small jobs.” While he was not referring to software architectures, this statement resonates well with the modern-day microservice design’s mantra of divide-and-conquer (or otherwise deploy).

Microservice architecture is a design methodology that focuses on building an array of single-function modules that integrate to form holistic functionalities through well-defined interfaces and operations. In recent years it has taken the DevOps community by storm, as it works well with Agile workflows, scales even with small and distributed development teams, and lends itself to continuous integration and continuous deployment (CI/CD).

In contrast, the tried-and-true monolithic architecture is a single, streamlined, autonomous application. It benefits from a much easier development and debugging workflow, but it is challenging to scale individual functions without scaling the entire application. This inflexibility makes microservice architecture attractive for applications with unpredictable needs concerning computing resource scalability.

Microservices work exceptionally well as loosely coupled single-function modules. Each microservice and its dev team can decide on its own data store and patterns while maintaining independence. For instance, a service that performs data analytics can choose Amazon Redshift, while another service that performs full-text search can choose Elasticsearch, with little impact on each other. This flexibility is appealing to software architects and, no doubt, also to developers.

As one would expect, this flexibility comes with trade-offs. First, the benefits of the microservice architecture depend on managing inter-dependencies: the more loosely coupled the services are, the easier they are to maintain. But as the application grows in scope and complexity, it becomes increasingly challenging to keep them properly decoupled. Second, making sense of the application's overall state can involve complex queries across multiple services and data aggregation. This complexity raises the question of database management, especially in a microservices topology.

Event Sourcing: Is It Enough to Know Where We Are?

Imagine your coworker, Tom, took a vacation for a week. Upon Tom’s return, it’s normal to ask, “How was the trip?” This oversimplified example points out the obvious: it not only matters where we are now; it is at least as important to know how we got here.

We briefly discussed how microservices allow the flexibility to pick a data store that makes the most sense. Most data stores use relational databases that capture the state, in other words, the most recent values for each table or record of the services. The state of the service holds tremendous value to the application, but it's not the whole story.

Among other things, there are two main drawbacks:

  • Updates to current data records can inadvertently overwrite the previous state. It is possible to keep some transactional records separate from the main data store, but this does not allow the system's state to be recreated or reversed.
  • The state change itself does not capture the reason for the update; the cause of the state update is implicit. Yet the reason for a data-record change, and its associated metadata, is often just as valuable for data analytics.

Let’s walk through an example.

Consider an inventory system whose primary purpose is to keep track of the number of products stored in a warehouse. We first walk through the concepts with descriptive data constructs and then illustrate with pseudo-code for more detailed explanations.

Here we have the system records with four fields: the product identifier SKU, the current inventory count Unit-Count, the date the last inventory arrived Date-incoming, and the date the last inventory shipped Date-outgoing.

A record of the form [SKU, Unit-Count, Date-incoming, Date-outgoing] starts with the following: [XYZ-1, 100, 2022-01-01, 2022-02-01]

Now, on March 1st, an additional 100 units arrived. Then on March 15th, 20 units were sold while another 50 units arrived. Finally, on April 1st, 20 more units were sold, and an inventory count on the same date revealed an earlier miscount: the system holds 20 fewer units than recorded. This sequence of events changes the records as follows:

[XYZ-1, 200, 2022-03-01, 2022-02-01] on March 1st
[XYZ-1, 230, 2022-03-15, 2022-03-15] on March 15th
[XYZ-1, 190, 2022-03-15, 2022-04-01] on April 1st

This is an accurate representation of the current state of the inventory count at any moment in time. However, it cannot reveal why the inventory changed. For instance, the records themselves cannot distinguish the two separate events on March 15th (-20 and +50). More importantly, the last state [XYZ-1, 190, 2022-03-15, 2022-04-01] gives no details on what happened previously. That information was lost! If the data store becomes corrupted, the system's state cannot be recreated without a full backup.
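
To make this concrete, here is a minimal Scala sketch of the state-only model. The InventoryRecord name and its fields are our own illustration, not part of any particular library: every update overwrites the previous values in place, so the history disappears.

import java.time.LocalDate

final case class InventoryRecord(
  sku: String,
  unitCount: Int,
  dateIncoming: LocalDate,
  dateOutgoing: LocalDate
)

object StateOnlyExample {
  def main(args: Array[String]): Unit = {
    var record = InventoryRecord("XYZ-1", 100,
      LocalDate.parse("2022-01-01"), LocalDate.parse("2022-02-01"))

    // March 1st: 100 units arrive; the previous state is overwritten, not kept.
    record = record.copy(unitCount = record.unitCount + 100,
      dateIncoming = LocalDate.parse("2022-03-01"))

    // March 15th: 20 sold and 50 received collapse into one new state;
    // the two separate events are indistinguishable afterwards.
    record = record.copy(unitCount = record.unitCount - 20 + 50,
      dateIncoming = LocalDate.parse("2022-03-15"),
      dateOutgoing = LocalDate.parse("2022-03-15"))

    // April 1st: 20 sold, then a -20 stock-count correction.
    record = record.copy(unitCount = record.unitCount - 20 - 20,
      dateOutgoing = LocalDate.parse("2022-04-01"))

    println(record) // InventoryRecord(XYZ-1,190,2022-03-15,2022-04-01): the "why" is lost
  }
}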

Event sourcing tackles this at a fundamentally different level of data recording and capture. Rather than focusing on the current state, it records the events and stores them as streams of modifications. Take a look at the following equivalent sequence of events.

Stream SKU = XYZ-1:

[Initial state: Unit = 100]
[Unit received: Unit = 100, 2022-03-01]
[Unit sold: Unit = 20, 2022-03-15]
[Unit received: Unit = 50, 2022-03-15]
[Unit sold: Unit = 20, 2022-04-01]
[Unit Adjust: Unit = -20, 2022-04-01]

With this approach, we group events by a stream identifier into aggregate streams, each containing the complete history of events over time. This information allows the system to run analytics on all of our data. More importantly, it allows us to rebuild the summary snapshots shown earlier by replaying the events in chronological order (or, because these particular updates are simple additions, in any order).
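
The replay itself can be expressed as a simple fold over the stream. The following Scala sketch rebuilds the snapshot from the event history above; the event names (UnitsReceived, UnitsSold, UnitsAdjusted) are our own illustration rather than the API of a specific event-sourcing library.

import java.time.LocalDate

sealed trait InventoryEvent { def date: LocalDate }
final case class UnitsReceived(units: Int, date: LocalDate) extends InventoryEvent
final case class UnitsSold(units: Int, date: LocalDate)     extends InventoryEvent
final case class UnitsAdjusted(delta: Int, date: LocalDate) extends InventoryEvent

final case class Snapshot(sku: String, unitCount: Int)

object ReplayExample {
  // One stream per aggregate (here, per SKU), in chronological order.
  val stream: List[InventoryEvent] = List(
    UnitsReceived(100, LocalDate.parse("2022-01-01")), // initial stock
    UnitsReceived(100, LocalDate.parse("2022-03-01")),
    UnitsSold(20,      LocalDate.parse("2022-03-15")),
    UnitsReceived(50,  LocalDate.parse("2022-03-15")),
    UnitsSold(20,      LocalDate.parse("2022-04-01")),
    UnitsAdjusted(-20, LocalDate.parse("2022-04-01"))
  )

  // Replaying is a left fold of the events over an empty snapshot.
  def replay(sku: String, events: List[InventoryEvent]): Snapshot =
    events.foldLeft(Snapshot(sku, 0)) {
      case (s, UnitsReceived(n, _)) => s.copy(unitCount = s.unitCount + n)
      case (s, UnitsSold(n, _))     => s.copy(unitCount = s.unitCount - n)
      case (s, UnitsAdjusted(d, _)) => s.copy(unitCount = s.unitCount + d)
    }

  def main(args: Array[String]): Unit =
    println(replay("XYZ-1", stream)) // Snapshot(XYZ-1,190)
}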

In the Context of Microservices

In the realm of microservices, this event-sourcing approach effectively decouples the receiving and distribution of data. Events are received synchronously as an ordered sequence of data triggers and then distributed asynchronously to the various microservices, each of which can maintain its own state-only data store.

The event data store holds a record of the event process. It allows only create (C), read (R), and delete (D) operations, which is significantly different from the typical CRUD model, where records can also be updated (U). The event sequence can also be maintained as a state-based secondary presentation layer, providing two formats of the same state or events and enabling data analysis that would otherwise be unachievable. More importantly, it serves as a traceability audit trail when one presentation layer becomes corrupt: a microservice storing the current state can be compared against the event sequence as validation.

This also leads to the concept of an event store. Event messages are received and stored in a transactional database as a transaction log. The state persists as the series of events up to a given point in the sequence. New events are appended but never overwrite earlier ones, while each microservice keeps its respective state in its own database, making this effectively a self-auditing system. The event sequence functions as a historical data stream, and the overall system contains sufficient information to reconstruct more than one state at a time.
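
To illustrate the append-only idea, here is a deliberately simplified in-memory event store in Scala. It is a sketch only; a production event store, or a platform such as Kalix, adds durability, transactions, and subscriptions on top of this.

import scala.collection.mutable

// Illustrative append-only event store keyed by stream id (e.g. the SKU).
final class EventStore[E] {
  private val streams = mutable.Map.empty[String, Vector[E]]

  // Create: append a new event to the end of a stream; existing events are never overwritten.
  def append(streamId: String, event: E): Unit =
    streams.update(streamId, streams.getOrElse(streamId, Vector.empty) :+ event)

  // Read: return the full history of a stream, in the order it was written.
  def read(streamId: String): Vector[E] =
    streams.getOrElse(streamId, Vector.empty)

  // There is deliberately no update operation: corrections are new events
  // (like the Unit Adjust entry above), which is what keeps the log auditable.
}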

Event Sourcing Level Up: Event Streaming

Event sourcing addresses specific issues of state-only data store architectures, but it is still commonly viewed as a storage model. This is analogous to the concept of “data at rest,” where reads and writes happen concurrently and the state is interpreted at the microservice level. This is often the case when the source of truth spans multiple sets of derived data that persist in other microservices.

What if there is a significant difference in volume between reads and writes, a scenario that often occurs when massively parallel reads are required? Think of Amazon as an extension of the previous inventory example, with millions of reads (through consumption of the data state, i.e., the inventory count). The system must replicate or cache the data and distribute these read operations through horizontal scaling while maintaining concurrency and reducing data conflicts. This is the concept of “data in motion,” where events are shared as streams across data models, and many microservices with polyglot persistence can consume them.
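
As a rough sketch of that read side, a read-heavy service can consume the event stream asynchronously and maintain its own denormalized view, so the many reads never touch the event log directly. The class and method names below are illustrative assumptions, not a specific framework's API.

import scala.collection.mutable

// Read-side projection: keeps a cheap, local view (current count per SKU)
// built from the event stream. In practice the events would arrive over a
// broker or a platform such as Kalix; here handle() is called per event.
final class InventoryCountView {
  private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Called asynchronously for every event published by the write side;
  // delta is +n for units received, -n for units sold or adjusted down.
  def handle(sku: String, delta: Int): Unit =
    counts.update(sku, counts(sku) + delta)

  // Millions of reads hit this local view instead of the event log.
  def currentCount(sku: String): Int = counts(sku)
}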

Summing Up

We’ve touched on several ideas—so why event source?

The principle of capturing the progression of events, and how they impact the state of a data store, allows specific data analytics or metric monitoring that would otherwise be impossible. For instance, the time between sales of particular products gives insights into product popularity that the inventory count alone could not provide. Other benefits include the following:

  • The event log behaves like a ledger or accounting-style data store, which matters for applications where fail-safety and data traceability are critical. This is becoming increasingly attractive for financial applications.
  • Leveraging asynchronous data distribution makes it easier to scale individual services to demand without risking total failure.
  • Eliminating update operations reduces data-record conflicts. This is especially true with distributed data stores at the microservice level, where concurrency and conflicts are often computationally expensive to resolve.
  • At a business level, event sourcing is tremendously powerful for data transformation and representation in relational models. Moreover, event sourcing and streaming are critical for representing data across multiple locations or domains with high-availability needs.

For developers and architects, these tasks can feel overwhelming, and developing it all in-house may not be the right approach for most of you. This is especially true when the uniqueness of your application lies in delivering a service to your customers, not in managing databases and operational infrastructure. A platform as a service (PaaS), such as Kalix, reduces the complexities of cloud systems and edge applications where stateful and stateless microservices co-exist. A serverless model that abstracts that complexity could help bring event sourcing in a microservices architecture to your application, with ultra-low latency and high resilience.

Through event sourcing, Kalix offers the ultra adaptability modern cloud-based software development companies need. Kalix gives you the freedom to ignore the underlying infrastructure and focus on the task at hand. Leveraging the latest microservices technology, Kalix enables developers to build high-performance, cloud-native applications.

Learn more about putting event sourcing to work for you with Kalix from Lightbend.