What is Event Sourcing architecture?

Sai Prasanth NG

Partner February 04, 2019

Scenario

Let’s consider a fictitious e-commerce application with the following architecture:

The UI layer communicates with the Product service to get a list of all the products, and with the Cart service to add, remove and check out items in a cart. When a cart is checked out, the Cart service communicates with the Order service to create an order. The UI layer also communicates with the Order service to get the details and the state of an order.

Let’s consider the Cart service. In a traditional CRUD-based approach, we would model its database as follows:

A Cart table stores the user_id, the corresponding cart_id for that user, and the state of the cart. A CartItems table stores the products that have been added to the cart. When a product is added to the cart, a row is inserted into CartItems; when an item is removed from the cart, the corresponding row is deleted from CartItems.
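To make the limitation concrete, here is a minimal sketch of that CRUD model using Python’s built-in sqlite3 module. The user_id and cart_id columns come from the description above; the product_id and state columns, the 'OPEN' state value and the add_item/remove_item helpers are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical CRUD schema for the Cart service (product_id and state are assumed names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cart (
    cart_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    state   TEXT NOT NULL
);
CREATE TABLE CartItems (
    cart_id    INTEGER NOT NULL REFERENCES Cart(cart_id),
    product_id TEXT NOT NULL
);
""")

def add_item(cart_id, product_id):
    # Adding a product inserts a row.
    conn.execute("INSERT INTO CartItems (cart_id, product_id) VALUES (?, ?)",
                 (cart_id, product_id))

def remove_item(cart_id, product_id):
    # Removing a product deletes the row, so the fact that it was ever in the cart is lost.
    conn.execute("DELETE FROM CartItems WHERE cart_id = ? AND product_id = ?",
                 (cart_id, product_id))

conn.execute("INSERT INTO Cart (cart_id, user_id, state) VALUES (1, 42, 'OPEN')")
add_item(1, "book")
add_item(1, "pen")
remove_item(1, "pen")

rows = conn.execute("SELECT product_id FROM CartItems WHERE cart_id = 1").fetchall()
print(rows)  # [('book',)]
```

After the removal, nothing left in the database shows that the pen was ever in the cart.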

We have implemented this, pushed it to production, and let’s assume business is going great. Now the business analyst team wants to know how likely people are to buy products that they removed from their cart before checking out. Our database cannot answer this, because it stores only the current state of the cart. We can answer the question by choosing a different architecture called Event Sourcing.

Event Sourcing Architecture

Event sourcing is a style of event-driven architecture in which every change to the application’s state is stored as an event in the database. An event represents a fact: facts can be ignored, but they cannot be retracted or deleted. Continuing with our Cart service example, the lifecycle of a cart becomes a sequence of events: for example, two items are added to the cart, one of them is removed, and the cart is checked out.
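The cart lifecycle described above can be sketched as an append-only list of immutable event records. ItemAdded and ItemRemoved are the event names used later in this article; CartCreated, CartCheckedOut and the single generic Event record are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # frozen=True: an event is an immutable fact
class Event:
    kind: str                          # past-tense event name
    cart_id: int
    product_id: Optional[str] = None   # only set for item events

# The lifecycle of one cart, recorded as an append-only sequence of events.
# ItemAdded and ItemRemoved come from the article; CartCreated and
# CartCheckedOut are hypothetical names used for illustration.
cart_events = [
    Event("CartCreated", cart_id=1),
    Event("ItemAdded", cart_id=1, product_id="book"),
    Event("ItemAdded", cart_id=1, product_id="pen"),
    Event("ItemRemoved", cart_id=1, product_id="pen"),
    Event("CartCheckedOut", cart_id=1),
]

for event in cart_events:
    print(event)
```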

Every state change to the cart is stored as an event, so to calculate the current state we have to apply all the events in order. In the traditional CRUD-based approach, “2 items were added to the cart and then 1 item was removed” leaves exactly the same rows behind as “1 item was added to the cart”, so the two histories are indistinguishable. Event sourcing preserves what actually happened: “2 items were added to the cart and then 1 item was removed from the cart”.
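Applying the events in order is essentially a left fold over the event list. The sketch below assumes the same hypothetical event names as before and a simple CartState with made-up status values; replaying only a prefix of the list gives the state at an earlier point in time.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Event:
    kind: str
    cart_id: int
    product_id: Optional[str] = None

@dataclass
class CartState:
    status: str = "NEW"                        # hypothetical status values
    items: list = field(default_factory=list)

def apply(state: CartState, event: Event) -> CartState:
    # Each event moves the cart from one state to the next.
    if event.kind == "CartCreated":
        state.status = "OPEN"
    elif event.kind == "ItemAdded":
        state.items.append(event.product_id)
    elif event.kind == "ItemRemoved":
        state.items.remove(event.product_id)   # raises ValueError if the item is absent
    elif event.kind == "CartCheckedOut":
        state.status = "CHECKED_OUT"
    return state

def replay(events) -> CartState:
    # Current state = all events applied in order, starting from an empty cart.
    state = CartState()
    for event in events:
        state = apply(state, event)
    return state

events = [
    Event("CartCreated", 1),
    Event("ItemAdded", 1, "book"),
    Event("ItemAdded", 1, "pen"),
    Event("ItemRemoved", 1, "pen"),
]

print(replay(events))      # CartState(status='OPEN', items=['book'])
print(replay(events[:3]))  # CartState(status='OPEN', items=['book', 'pen'])
```

The final state matches what the CRUD tables would hold, but the event list still records that the pen was added and later removed.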

With this architecture, we can answer the question “What is the likelihood of people buying items that they removed from the cart at an earlier point in time?”: we go back through all the ItemRemoved events and check whether the same item appears in an ItemAdded event at a later point in time (and, to count as a purchase, whether that cart was then checked out). We can store these events in a traditional relational database without any issues.
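A minimal sketch of both ideas, assuming a hypothetical cart_events table in SQLite: events are only ever inserted, and a self-join finds ItemRemoved events followed by a later ItemAdded of the same product in the same cart before that cart was checked out. The schema, the CartCheckedOut event and matching per cart (rather than per user) are simplifying assumptions; for example, the query does not check whether the item was removed again before checkout.

```python
import sqlite3

# A minimal append-only event table; the schema is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE cart_events (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- global ordering of events
    cart_id    INTEGER NOT NULL,
    kind       TEXT    NOT NULL,                   -- e.g. 'ItemAdded', 'ItemRemoved'
    product_id TEXT
)""")

def append(cart_id, kind, product_id=None):
    # Events are only ever inserted, never updated or deleted.
    conn.execute("INSERT INTO cart_events (cart_id, kind, product_id) VALUES (?, ?, ?)",
                 (cart_id, kind, product_id))

append(1, "ItemAdded", "pen")
append(1, "ItemRemoved", "pen")
append(1, "ItemAdded", "pen")      # removed, then added back
append(1, "CartCheckedOut")        # ...and bought
append(2, "ItemAdded", "mug")
append(2, "ItemRemoved", "mug")    # removed and never added back

# Removed items that were added back later and whose cart was then checked out.
rows = conn.execute("""
SELECT DISTINCT r.cart_id, r.product_id
FROM cart_events r
JOIN cart_events a
  ON a.cart_id = r.cart_id
 AND a.product_id = r.product_id
 AND a.kind = 'ItemAdded'
 AND a.seq > r.seq
WHERE r.kind = 'ItemRemoved'
  AND EXISTS (SELECT 1 FROM cart_events c
              WHERE c.cart_id = r.cart_id
                AND c.kind = 'CartCheckedOut'
                AND c.seq > a.seq)
""").fetchall()
print(rows)  # [(1, 'pen')]
```

Dividing the number of such rows by the total number of ItemRemoved events would give the likelihood the analysts asked for.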

The current state is kept in memory by applying the events as they occur. Rebuilding it this way becomes computationally expensive as the number of events grows. We can solve this by taking snapshots of the state at regular intervals: to rebuild the state, we load the latest snapshot and only have to process the events that occurred after it.
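The snapshot idea can be sketched as follows, assuming a hypothetical Snapshot record that stores the folded state together with the number of events it already covers. Loading state then starts from the snapshot and replays only the tail of the event list; the result is the same as a full replay.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Event:
    kind: str                 # "ItemAdded" or "ItemRemoved"
    product_id: str

@dataclass(frozen=True)
class Snapshot:
    version: int              # number of events already folded into `items` (assumed field)
    items: Tuple[str, ...]

def apply(items: Tuple[str, ...], event: Event) -> Tuple[str, ...]:
    if event.kind == "ItemAdded":
        return items + (event.product_id,)
    if event.kind == "ItemRemoved":
        remaining = list(items)
        remaining.remove(event.product_id)
        return tuple(remaining)
    return items

def take_snapshot(events) -> Snapshot:
    items: Tuple[str, ...] = ()
    for event in events:
        items = apply(items, event)
    return Snapshot(version=len(events), items=items)

def load_state(snapshot: Optional[Snapshot], events) -> Tuple[str, ...]:
    # Start from the snapshot (if any) and replay only the events recorded after it.
    version, items = (snapshot.version, snapshot.items) if snapshot else (0, ())
    for event in events[version:]:
        items = apply(items, event)
    return items

events = [Event("ItemAdded", "book"), Event("ItemAdded", "pen"), Event("ItemRemoved", "pen")]
snap = take_snapshot(events[:2])          # snapshot taken after the first two events
events.append(Event("ItemAdded", "mug"))  # new events keep arriving after the snapshot

print(load_state(snap, events))   # ('book', 'mug')
print(load_state(None, events))   # ('book', 'mug'), same result via a full replay
```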

Points to keep in mind during implementation

Event names should always be in the past tense (ItemAdded, not AddItem), because an event is created after a change has occurred.
Events should only be created; they should never be modified or deleted.
Queries such as “return all the carts whose value is greater than a particular amount” will be computationally expensive, because the state of every cart must first be rebuilt from its events.

We can solve this by applying the CQRS (Command Query Responsibility Segregation) pattern, i.e. we create projections of the data and store them in a different database that is designed for this type of query.
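A projection for the “carts above an amount” query might look like the sketch below. It assumes, hypothetically, that each item event carries the item’s price in cents; an in-memory dictionary stands in for the separate read database. The read model is updated incrementally as events arrive, so the query never has to replay events.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str
    cart_id: int
    product_id: str
    price_cents: int   # assumption: item events carry the item price

class CartValueProjection:
    """Read model: current total per cart, updated as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(int)   # stands in for a separate query database

    def handle(self, event: Event) -> None:
        if event.kind == "ItemAdded":
            self.totals[event.cart_id] += event.price_cents
        elif event.kind == "ItemRemoved":
            self.totals[event.cart_id] -= event.price_cents

    def carts_above(self, amount_cents: int):
        # Query side: a scan of precomputed totals, no event replay needed.
        return sorted(cid for cid, total in self.totals.items() if total > amount_cents)

projection = CartValueProjection()
for e in [
    Event("ItemAdded", 1, "book", 2500),
    Event("ItemAdded", 1, "pen", 300),
    Event("ItemRemoved", 1, "pen", 300),
    Event("ItemAdded", 2, "mug", 900),
]:
    projection.handle(e)

print(projection.carts_above(1000))  # [1]
```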

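Events are anonymous: the publisher of an event does not address it to any particular receiver.
We need to have an event upgrading strategy in mind, i.e. how to handle old events if we decide to change the structure of an event. Since stored events should not be modified, one common option is to upgrade old events as they are read, leaving the stored events untouched.
Events should be broadcast only within a bounded context.
A command should be used when we know what should be done.
An event, in contrast, doesn’t expect anything in return: it is simply broadcast, and anyone interested can subscribe to it.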
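The upgrading strategy mentioned above can be sketched as read-time “upcasting”. In this hypothetical example, version 1 of ItemAdded stored a product field, while version 2 renames it to product_id and adds a quantity. The field names, the version numbers and the default quantity of 1 are all assumptions. Old events stay untouched in storage; the application only sees upgraded copies.

```python
from typing import Any, Dict

StoredEvent = Dict[str, Any]

def upcast_item_added_v1(event: StoredEvent) -> StoredEvent:
    # Hypothetical v1 -> v2 change: "product" renamed to "product_id", "quantity" added.
    return {
        "kind": "ItemAdded",
        "version": 2,
        "cart_id": event["cart_id"],
        "product_id": event["product"],
        "quantity": 1,  # assumption: v1 events always represented a single unit
    }

# Map (event kind, stored version) to the function that upgrades it by one version.
UPCASTERS = {("ItemAdded", 1): upcast_item_added_v1}

def upcast(event: StoredEvent) -> StoredEvent:
    # Apply upcasters repeatedly until the event reaches its latest version.
    while (event["kind"], event["version"]) in UPCASTERS:
        event = UPCASTERS[(event["kind"], event["version"])](event)
    return event

stored = [
    {"kind": "ItemAdded", "version": 1, "cart_id": 1, "product": "book"},
    {"kind": "ItemAdded", "version": 2, "cart_id": 1, "product_id": "pen", "quantity": 3},
]
# The stored events are left untouched; only the copies handed to the application are upgraded.
current = [upcast(e) for e in stored]
print(current[0])  # {'kind': 'ItemAdded', 'version': 2, 'cart_id': 1, 'product_id': 'book', 'quantity': 1}
```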

Advantages

It forces us to think in business terms (items added, items removed, carts checked out) rather than in the CRUD terms of inserting, updating and deleting rows.
Writes can be fast, because they only append new events and never update existing ones.
It makes testing and debugging easier, because we can go back to any point in the past and apply the events one by one.
The event store acts as an audit log.
We can project the data into different forms as requirements demand.
Immutable data is easy to scale: since stored events never change, they can simply be cached, without having to synchronize cached copies to pick up updates.
We can rebuild the current state of the application from scratch.
We can determine the state of the application at any point in time.
It is easier to plug two systems together, because they communicate with each other via events.
Any application interested in updates can subscribe to the events and keep its own local cache of the data up to date.
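A minimal in-process sketch of subscribers keeping their own local cache. The EventBus class is a hypothetical stand-in for a real message broker, and OrderServiceCache is an illustrative subscriber loosely modelled on the Order service from the scenario. The publisher fires events without knowing who, if anyone, is listening.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

class EventBus:
    """In-process stand-in for a message broker (an assumption for illustration)."""

    def __init__(self):
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        self._subscribers[kind].append(handler)

    def publish(self, event: dict) -> None:
        # Fire-and-forget: the publisher neither knows its subscribers nor waits for a reply.
        for handler in self._subscribers.get(event["kind"], []):
            handler(event)

class OrderServiceCache:
    """Hypothetical subscriber keeping a local copy of checked-out carts."""

    def __init__(self, bus: EventBus):
        self.checked_out_carts = set()
        bus.subscribe("CartCheckedOut", self.on_checked_out)

    def on_checked_out(self, event: dict) -> None:
        self.checked_out_carts.add(event["cart_id"])

bus = EventBus()
orders = OrderServiceCache(bus)
bus.publish({"kind": "ItemAdded", "cart_id": 1, "product_id": "book"})  # nobody subscribed: ignored
bus.publish({"kind": "CartCheckedOut", "cart_id": 1})
print(orders.checked_out_carts)  # {1}
```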

Disadvantages

It can be hard to understand the flow of the system without observing it in production.
It is an unfamiliar style of programming for most developers.