Enabling Reliable and Scalable Event Streams in Distributed Systems


Introduction

Event streams provide a framework for processing and transmitting continuously generated data in distributed systems. Rather than relying on static batch files, streams enable real-time or near-real-time communication between producers and consumers. The challenge lies in efficiently transmitting these streams, ensuring scalability, reliability, and fault tolerance. This subchapter focuses on two key mechanisms: messaging systems and partitioned logs.


Messaging Systems

A messaging system provides the foundation for transmitting events between producers and consumers. Unlike direct communication methods (e.g., HTTP or TCP connections), messaging systems decouple producers from consumers by introducing an intermediary called a message broker.

How Messaging Systems Work

  1. Producers send events (messages) to a broker, often organized by topics.
  2. The broker stores these messages temporarily or persistently.
  3. Consumers subscribe to topics, receiving messages either in real-time or when they are ready to process them.

Key Features of Messaging Systems

  1. Durability with Message Brokers
    Message brokers, such as RabbitMQ and ActiveMQ, provide durability by writing messages to disk. This ensures resilience to failures, enabling consumers to retrieve messages that were stored before the crash of a producer or a consumer node.

  2. Load Balancing
    When multiple consumers subscribe to a topic, the broker can distribute messages in two ways:

    • Load Balancing: Distribute messages among consumers to parallelize processing.
    • Fan-Out: Broadcast all messages to every consumer subscribed to the topic.

Limitations of Messaging Systems

  1. Short-Term Storage: Traditional brokers are optimized for transient workloads and delete messages after they are acknowledged. Therefore, they are unsuitable for long-term message storage.
  2. Message Ordering: Depending on the configuration, messages may arrive out-of-order if brokers redistribute them following consumer failures.

Partitioned Logs

A log-based message broker builds on the durable and append-only log structure seen in replication and storage engines. Well-known examples include Apache Kafka and Amazon Kinesis. These systems address some of the challenges of traditional brokers by retaining messages long-term and providing better reliability when dealing with unbounded streams.

How Partitioned Logs Work

  1. Producers send events to a topic, which is divided into partitions for scalability. Each event in a partition is assigned a monotonically increasing offset, ensuring per-partition message order.
  2. Consumers independently read from assigned partitions at their own pace, tracking offsets to avoid reprocessing.

Advantages of Log-Based Brokers

  1. Persistence for Long-Term Availability: Unlike traditional brokers, logs retain messages until they are explicitly deleted. This allows new consumers to replay past events and catch up on historical data.
  2. Fan-Out Without Performance Loss: Multiple consumers can read the same data from partitions without affecting each other, enabling efficient stream processing and replication tasks.
  3. Efficient Sequential Reads: Consumers read sequentially from partitions, enabling high throughput.

Message Broker vs. Partitioned Log

Aspect Traditional Message Broker Log-Based Broker
Message Retention Messages are deleted after acknowledgment. Messages are retained until explicitly deleted.
Delivery Mechanism Push-based (broker pushes messages to consumer). Pull-based (consumer reads messages from log).
Message Ordering Limited guarantees, may vary during redelivery. Strong per-partition ordering guarantees.
Scalability Limited by broker processing capacity. Horizontal scalability via partitions.

Challenges in Event Transmission

Regardless of the mechanism, transmitting event streams presents inherent challenges:

  1. Backpressure and Overload
    • In traditional brokers, unbounded queues caused by slow consumers risk degrading overall performance.
    • Partitioned logs mitigate this by allowing streams to accumulate independently per partition.
  2. Crash Recovery
    • Brokers use acknowledgments to confirm message delivery, relying on replays to recover unprocessed messages.
  3. Distributed Order Guarantees
    • Partition-level ordering ensures integrity within each topic. However, maintaining order across partitions adds complexity and is generally avoided unless explicitly needed.

Conclusion

Messaging systems and partitioned logs represent two complementary approaches to transmitting event streams in distributed systems. Traditional brokers excel at transient workloads, while log-based brokers offer persistent storage, replayability, and scalability. By understanding the trade-offs between these mechanisms, engineers can design stream-based architectures that balance performance, reliability, and durability—key pillars for modern data-intensive applications.

Series Designing Data-Intensive Applications Part 35 of 41
  1. Designing Reliable Data Systems
  2. What is Scalability in Data Systems?
  3. Building Maintainable Software Systems
  4. Relational Model Versus Document Model
  5. Speaking the Language of Data- A Guide to Query Languages
  6. Unraveling Connections- Exploring Graph-Like Data Models
  7. The Backbone of Databases- Data Structures that Power Storage
  8. Transaction Processing vs. Analytics Let's understand the divide
  9. Understanding Column-Oriented Storage- A Deep Dive into Analytics Optimization
  10. Formats for Encoding Data
  11. Modes of Dataflow in Distributed Systems
  12. Leaders and Followers - The Core of Replication
  13. Problems with Replication Lag - Challenges and Solutions
  14. Multi-Leader Replication in Distributed Databases
  15. Leaderless Replication Flexibility for Distributed Databases
  16. Partitioning and Replication in Scaling Distributed Databases
  17. Partitioning of Key-Value Data- Strategies and Challenges
  18. Partitioning and Secondary Indexes- Balancing Efficiency and Complexity
  19. Efficient Methods for Rebalancing Data in Distributed Systems
  20. Ensuring Accurate Request Routing in Distributed Databases
  21. The Slippery Concept of a Transaction
  22. Exploring Weak Isolation Levels in Databases
  23. Achieving Serializability in Transactions
  24. Faults and Partial Failures in Distributed Systems
  25. Navigating Unreliable Networks in Distributed Systems
  26. The Challenges of Unreliable Clocks in Distributed Systems
  27. Knowledge Truth and Lies in Distributed Systems
  28. Consistency Guarantees in Distributed Systems
  29. Linearizability in Distributed Systems
  30. Understanding Ordering Guarantees in Distributed Systems
  31. Achieving Reliability with Distributed Transactions and Consensus Mechanisms
  32. Leveraging Unix Tools for Efficient Batch Processing
  33. MapReduce and Distributed Filesystems- Foundations of Scalable Data Processing
  34. Advancing Beyond MapReduce- Modern Frameworks for Scalable Data Processing
  35. Enabling Reliable and Scalable Event Streams in Distributed Systems
  36. Synchronizing Databases with Real-Time Streams
  37. Unifying Batch and Stream Processing for Modern Pipelines
  38. Integrating Distributed Systems for Unified Data Pipelines
  39. Unbundling Monolithic Databases for Flexibility
  40. Building Correct Systems in Distributed Environments
  41. Ethical Data Practices for Building Better Systems

Want to get blog posts over email?

Enter your email address and get notified when there's a new post!