2021-07-01

What are message brokers?

A message broker, or message bus, is a solution that provides asynchronous data transmission from one system to another via an intermediate system.

In contrast to direct connections, where senders know the receivers to which data will be sent and connect directly to them, messaging solutions decouple the sending of data from data consumption. Senders do not know which receivers will see that data or when they will see it.

In addition to routing data, messaging systems can also provide capabilities such as distributed processing, flow control, and resiliency. One example is the dead-letter queue or topic, a place to store messages that could not be delivered.

Messaging is a broad term that covers several models that differ based on how data is moved from senders to recipients. There are two primary categories of messaging models: message queue and publish-subscribe (often referred to as pub-sub).

Pros

Application decoupling. Using message brokers, services do not have knowledge of other services. They are notified of new events, process that information and produce/publish new information. This new information then can be consumed by the required service, or any number of services. Loose coupling allows microservices to be smaller and more independent.
Non-blocking asynchronicity. Microservices should perform as efficiently as possible, and it is a waste of resources to have many threads blocked waiting for a response. With asynchronous messaging, applications can send a request and process another request instead of waiting for a response.
Scalability. Since each service is small and performs only one task, it should be able to grow or shrink as needed. Event-driven architecture and messaging make it easy for microservices to scale since they are decoupled and do not block. This also makes it easy to determine which service is the bottleneck and add instances of just that service, instead of blindly scaling up all services, which can be the case when microservices are chained together with synchronous communications.
Greater resiliency. Messaging platforms that offer guaranteed delivery can act as the source of truth in the event of system failures and enable rapid recovery without data loss. In the case of service failures, the use of messaging allows healthy services to continue working since they are not blocked by the failed service. Once the failed service is back online, it will start processing the messages that had accumulated during the downtime, making the system eventually consistent. Additionally, the message broker handles the retry of unacknowledged messages which frees the services from network retry and error handling logic.
Backpressure control. Message brokers can absorb excess traffic by storing large quantities of messages, which consumers poll at their own pace, alleviating load spikes on the system.

Cons

Message delivery issues. Messages can be duplicated or delivered out of order. To mitigate these issues, consumers can be designed to handle both cases deterministically. For a duplicated message, the consumer checks whether the action has already been executed in the system; if so, it skips the action and acknowledges the message. For an out-of-order message, the consumer attempts the action, fails with an error, and does not acknowledge the message, so the message is requeued and picked up again later, which leads to eventual consistency.
Unreliable data integrity. When a system favors availability over consistency, it always returns the most recent version of the information available to it, even if a network partition means that version cannot be guaranteed to be up to date.

Message queue

A message queue system consists of message producers, message queues, and message consumers. The queue receives messages created by a producer and ensures that each message for a given queue is delivered to and processed by exactly one consumer.

Message queues can support high rates of consumption by adding multiple consumers for each queue, but only one consumer will receive each message. Which consumer receives which message is determined by the implementation of the message queue. To ensure that a message is only processed by one consumer, each message is deleted from the queue once it has been received, processed, and acknowledged by a consumer.

Message queues support use cases where it is important that each message is processed only once, but it is not necessary to process messages in order, such as a task list or work queue. For example, in the case of network or consumer failures, the message queue will try resending an affected message at a later time (not necessarily to the same consumer) and as a result, that message may be processed out of order.
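The acknowledge-and-requeue behavior described above can be sketched with a small in-memory queue. This is an illustration only, not how any particular broker is implemented; the `WorkQueue` class and its `publish`, `receive`, `ack`, and `nack` methods are hypothetical names chosen for this example.

```python
from collections import deque


class WorkQueue:
    """In-memory work queue: each message goes to one consumer and is
    removed only after that consumer acknowledges it."""

    def __init__(self):
        self._pending = deque()   # messages waiting for delivery
        self._in_flight = {}      # delivery tag -> message awaiting ack
        self._next_tag = 0

    def publish(self, message):
        self._pending.append(message)

    def receive(self):
        """Hand the next message to a consumer, or None if the queue is empty."""
        if not self._pending:
            return None
        self._next_tag += 1
        message = self._pending.popleft()
        self._in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Processing succeeded: delete the message for good."""
        del self._in_flight[tag]

    def nack(self, tag):
        """Processing failed: requeue the message for a later attempt."""
        self._pending.append(self._in_flight.pop(tag))


queue = WorkQueue()
for task in ["a", "b", "c"]:
    queue.publish(task)

processed = []
tag, msg = queue.receive()   # first consumer gets "a" and fails
queue.nack(tag)
while (delivery := queue.receive()) is not None:   # second consumer drains the queue
    tag, msg = delivery
    processed.append(msg)
    queue.ack(tag)

print(processed)  # ['b', 'c', 'a']
```

Every message is processed once, but "a" ends up last because it was redelivered after the failure, which is the out-of-order effect mentioned above.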

Publish-subscribe

A publish-subscribe system consists of message publishers, topics, and message subscribers. The topic receives messages created by a publisher, but in contrast to a message queue, it allows each message to be delivered to multiple subscribers.

Publish-subscribe messaging systems ensure that each subscriber receives messages from a topic in the exact order in which they were received by the messaging system. Moreover, subscribers can have two types of subscriptions: ephemeral subscriptions, where the subscription is only active as long as the subscriber is up and running, and durable subscriptions, where the subscription is maintained as long as it is not explicitly deleted.

Publish-subscribe use cases are frequently associated with stateful applications, where the order of received messages is important because they determine the state of the application and will, as a result, impact the correctness of whatever processing logic the application needs to apply.
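As a rough sketch of the difference between ephemeral and durable subscriptions, the following in-memory topic fans each message out to every subscriber in arrival order. The `Topic` class and its method names are assumptions made for this example and do not correspond to a specific broker's API.

```python
class Topic:
    """In-memory topic that delivers each message to every subscriber."""

    def __init__(self):
        self._durable = {}     # name -> inbox, kept while the subscriber is offline
        self._ephemeral = {}   # name -> inbox, dropped when the subscriber disconnects

    def subscribe(self, name, durable=False):
        inbox = []
        (self._durable if durable else self._ephemeral)[name] = inbox
        return inbox

    def disconnect(self, name):
        # An ephemeral subscription ends here; a durable one keeps accumulating.
        self._ephemeral.pop(name, None)

    def publish(self, message):
        # Appending in publish order preserves ordering for each subscriber.
        for inbox in list(self._durable.values()) + list(self._ephemeral.values()):
            inbox.append(message)


topic = Topic()
audit = topic.subscribe("audit", durable=True)
ui = topic.subscribe("ui")

topic.publish("created")
topic.disconnect("audit")
topic.disconnect("ui")
topic.publish("updated")   # published while both subscribers are offline

print(audit)  # ['created', 'updated']
print(ui)     # ['created']
```

Both subscribers see "created", but only the durable subscription still holds "updated" when its subscriber comes back.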

Patterns

Outbox

In operations that modify the state of an application and communicate that change to other services, a failure to publish the event could leave those services inconsistent with the application's state.

In order to avoid that publication error, we can use the outbox pattern, which refers to the action of performing a state change and storing its related event message in a single transaction. Then, a process to publish the event will pull the message from storage and publish it to the message broker. Once it succeeds, it can delete the message or flag it as published.

A caveat is that each message will be published at least once, because if the step to delete or flag the published message fails, it could lead to an event published more than once.
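A minimal sketch of the outbox pattern, assuming a relational store (SQLite here, purely for illustration). The table names, the `place_order` and `relay` functions, and the event shape are all hypothetical; `publish` stands in for whatever call sends a message to the broker.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id):
    # The state change and its event are written in one transaction:
    # either both are committed or neither is.
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        event = {"type": "order_placed", "order_id": order_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish):
    """Publish pending outbox rows, then flag each one as published."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))   # if this raises, the row stays pending
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))


sent = []
place_order(1)
relay(sent.append)
relay(sent.append)   # nothing is pending, so nothing is sent again
print(sent)  # [{'type': 'order_placed', 'order_id': 1}]
```

If the process crashed between `publish(...)` and the `UPDATE`, the next `relay` run would publish the same row again, which is the at-least-once caveat described above.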

Idempotent consumers

In order to avoid processing duplicated messages, consumers can be made idempotent.

One way to achieve this is to store the message id after successfully processing a message. Before handling a new event, the consumer checks the stored ids and discards the event if it has already been processed.
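The id check can be sketched as follows. The `IdempotentConsumer` class is a hypothetical name, and the in-memory set is an assumption made to keep the example short; a real service would likely need durable storage for the processed ids.

```python
class IdempotentConsumer:
    """Wraps a handler so that a redelivered message is not processed twice."""

    def __init__(self, handler):
        self._handler = handler
        self._processed_ids = set()   # assumption: in-memory, for illustration only

    def handle(self, message_id, payload):
        if message_id in self._processed_ids:
            return False   # duplicate: skip the action, the message can still be acknowledged
        self._handler(payload)
        self._processed_ids.add(message_id)   # recorded only after the handler succeeds
        return True


balance = {"total": 0}


def credit(amount):
    balance["total"] += amount


consumer = IdempotentConsumer(credit)
consumer.handle("m-1", 50)
consumer.handle("m-1", 50)   # redelivery of the same message is ignored
consumer.handle("m-2", 25)
print(balance["total"])  # 75
```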

Another way is to use an inbox component, where incoming messages are stored and acknowledged, and duplicated messages are discarded. The consumer then pulls messages from the inbox, processing them and marking them as processed in a single transaction.
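A possible shape for the inbox, again assuming SQLite for illustration. The `inbox` and `counters` tables and the `receive` and `process_pending` functions are hypothetical; the unique message id is what discards duplicates.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE inbox (message_id TEXT PRIMARY KEY, payload TEXT, processed INTEGER DEFAULT 0)"
)
db.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES ('signups', 0)")
db.commit()


def receive(message_id, payload):
    """Store an incoming message; the primary key silently drops duplicates."""
    with db:
        cursor = db.execute(
            "INSERT OR IGNORE INTO inbox (message_id, payload) VALUES (?, ?)",
            (message_id, payload),
        )
    return cursor.rowcount == 1   # True if stored, False if it was a duplicate


def process_pending():
    rows = db.execute("SELECT message_id FROM inbox WHERE processed = 0").fetchall()
    for (message_id,) in rows:
        with db:   # the side effect and the processed flag commit together
            db.execute("UPDATE counters SET value = value + 1 WHERE name = 'signups'")
            db.execute("UPDATE inbox SET processed = 1 WHERE message_id = ?", (message_id,))


receive("m-1", "alice")
receive("m-1", "alice")   # duplicate delivery, discarded
receive("m-2", "bob")
process_pending()
process_pending()          # nothing left to process
signups = db.execute("SELECT value FROM counters WHERE name = 'signups'").fetchone()[0]
print(signups)  # 2
```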

Choreography

For operations that span multiple services we can use event choreography, which splits the operation into a decentralized workflow of events where each service consumes messages, performs an action, and publishes messages. All services react to other events in the system and act accordingly.

Benefits & drawbacks of choreography:

No centralized logic
Useful for small/simple workflows
Difficult to conceptualize if a lot of services are involved
Harder to debug and test if a lot of services are involved
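To make the decentralized flow concrete, here is a small sketch in which three services react to each other's events through a shared in-memory event list that stands in for the broker. The service names, event names, and the `on`, `publish`, and `run` helpers are all invented for this example.

```python
from collections import defaultdict, deque

handlers = defaultdict(list)   # event type -> subscribed service functions
events = deque()               # stands in for the message broker
trace = []


def on(event_type):
    """Decorator that subscribes a service function to an event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


def publish(event_type, data):
    events.append((event_type, data))


@on("order_placed")
def payment_service(data):
    trace.append("payment charged")
    publish("payment_completed", data)


@on("payment_completed")
def shipping_service(data):
    trace.append("shipment scheduled")
    publish("order_shipped", data)


@on("order_shipped")
def notification_service(data):
    trace.append("customer notified")


def run():
    """Deliver events until none are left."""
    while events:
        event_type, data = events.popleft()
        for handler in handlers[event_type]:
            handler(data)


publish("order_placed", {"order_id": 1})
run()
print(trace)  # ['payment charged', 'shipment scheduled', 'customer notified']
```

No function here knows the whole workflow; it only emerges from which service subscribes to which event, which is why larger choreographies become harder to follow.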

Orchestration

Another way to handle operations that span multiple services is orchestration, which centralizes the workflow logic for a business process. An orchestrator coordinates the workflow by sending commands to the appropriate services, consuming the resulting events, and proceeding from there. Orchestrators also store state to track which steps of the workflow have occurred. Because of this, if a step fails, the orchestrator can perform compensating actions to recover.

Benefits & drawbacks of orchestration:

Centralized logic
Easier to understand the workflow since it is defined in a central location
Full control over the workflow steps via commands
Easier to debug and test
Creates a single point of failure
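The stored state and compensating actions can be sketched like this. Direct function calls stand in for the commands and events that would travel through the broker, and the `Orchestrator` class, the step names, and the failing `charge_card` step are hypothetical.

```python
class Orchestrator:
    """Runs workflow steps in order and undoes completed steps on failure."""

    def __init__(self, steps):
        self._steps = steps    # list of (name, action, compensation)
        self.completed = []    # stored state: steps that have already run

    def run(self, order):
        for name, action, compensate in self._steps:
            try:
                action(order)   # stands in for sending a command and awaiting its event
            except Exception:
                # Compensate completed steps in reverse order.
                for _, undo in reversed(self.completed):
                    undo(order)
                self.completed.clear()
                return False
            self.completed.append((name, compensate))
        return True


log = []


def reserve_stock(order):
    log.append("stock reserved")


def release_stock(order):
    log.append("stock released")


def charge_card(order):
    raise RuntimeError("card declined")


def refund_card(order):
    log.append("payment refunded")


workflow = Orchestrator([
    ("reserve", reserve_stock, release_stock),
    ("charge", charge_card, refund_card),
])
succeeded = workflow.run({"order_id": 1})
print(succeeded, log)  # False ['stock reserved', 'stock released']
```

Only the step that actually completed is compensated; the failed charge never happened, so no refund is issued.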

Management

An essential aspect of message brokers is having visibility into how they perform. Some common metrics are:

Throughput: the number of handled messages per second.
Queue length: the number of pending messages waiting to be consumed.
Processing time: the time it takes for a message to be processed by a consumer.

When running a message broker in a system, these metrics should be constantly monitored to proactively make changes in the face of problems to keep everything running smoothly.

If the influx of messages or the processing time increases, a bottleneck may occur depending on the number of consumers or allocated resources and the queue will fill up. If this is a temporary increase, the queue will act as a backpressure control while consumers poll messages at their own pace. A permanent solution is to add more consumers or increase consumer resources so more messages can be processed concurrently.
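The relationship between these metrics can be worked through with simple arithmetic. The sketch below assumes constant rates and consumers that each handle one message at a time; the function names and the example numbers are made up for illustration.

```python
import math


def consumers_needed(arrival_rate, processing_time):
    """Minimum number of consumers whose combined capacity covers the arrival rate.

    arrival_rate: messages per second; processing_time: seconds per message.
    """
    return math.ceil(arrival_rate * processing_time)


def backlog_after(seconds, arrival_rate, processing_time, consumers, initial=0):
    """Queue length after a period of steady traffic, never below zero."""
    drain_rate = consumers / processing_time   # messages per second the consumers can handle
    return max(0.0, initial + (arrival_rate - drain_rate) * seconds)


# 50 messages/s arriving, each taking 0.25 s to process.
print(consumers_needed(50, 0.25))        # 13
print(backlog_after(60, 50, 0.25, 10))   # 600.0 (10 consumers fall behind by 10 messages/s)
print(backlog_after(60, 50, 0.25, 13))   # 0.0   (13 consumers keep up)
```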

If some messages cannot be processed (known as dead letters or poison messages), they can be sent to a dead-letter queue to be handled as needed, either by logging their payload for inspection, sending them to a validation service, or any other action.
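One way this could look in a consumer loop is sketched below. The retry limit `MAX_ATTEMPTS`, the `consume` function, and the `parse_amount` handler are assumptions for this example; many brokers offer their own dead-letter configuration instead of requiring it in consumer code.

```python
from collections import deque

MAX_ATTEMPTS = 3   # assumed retry limit before a message is treated as a dead letter


def consume(queue, dead_letters, handler):
    """Process (message, attempts) pairs, dead-lettering repeated failures."""
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception as error:
            if attempts + 1 >= MAX_ATTEMPTS:
                # Give up: keep the payload and the reason for later inspection.
                dead_letters.append((message, str(error)))
            else:
                queue.append((message, attempts + 1))   # requeue for another attempt


def parse_amount(message):
    return int(message)   # raises ValueError for malformed payloads


queue = deque([("10", 0), ("not-a-number", 0), ("25", 0)])
dead_letters = []
consume(queue, dead_letters, parse_amount)
print([message for message, _ in dead_letters])  # ['not-a-number']
```

The poison message is retried three times and then set aside, so it no longer blocks or recirculates among the healthy messages.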

An important note is that throughput should be treated as a system-wide metric, and attempts to increase it should be considered carefully. If consumers depend on downstream components with limited resources, such as a database, adding consumers will not fix the bottleneck, only transfer it to another layer.