Unbundling Monolithic Databases for Flexibility


Introduction

Databases have traditionally consolidated functionalities, combining features like storage, indexing, query processing, and replication into one integrated system. However, as application requirements diversify, handling everything in one database becomes impractical. Unbundling databases proposes breaking monolithic systems into smaller, specialized components. These components work collectively but focus on mastering specific responsibilities, such as full-text search, analytics, or change data capture.

The unbundling approach not only increases flexibility but also aligns with the Unix philosophy of modularity—combining small tools with well-defined purposes to create reliable, scalable systems.


Why Unbundle Databases?

The one-size-fits-all database model often falls short when an application requires niche capabilities that a general-purpose DBMS handles poorly (e.g., full-text search, graph traversals, or machine learning). Breaking the database apart enables:

  1. Specialized Performance: Each component is optimized for specific workloads, improving performance across diverse access patterns.
  2. Resilience and Scalability: Independent components can be scaled, upgraded, and recovered in isolation, enabling more fine-grained fault-tolerance strategies.

The Role of Derived Data

  • Unbundling databases aligns closely with the derived data architecture.
  • Secondary indexes, materialized views, caches, and full-text indexes may be externalized rather than built into a single database, using tools most suited to their tasks (e.g., Elasticsearch for full-text indexing).

This approach retains the flexibility to build pipelines that use multiple components, balancing performance and functionality across specialized systems.
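As a concrete (and deliberately simplified) illustration of derived data, the sketch below maintains an externalized full-text index purely by consuming change events from a system of record. Everything here is a hypothetical in-memory stand-in: in practice the index might live in Elasticsearch and the events might arrive via a change-data-capture stream.

```python
from collections import defaultdict

class DerivedFullTextIndex:
    """A derived full-text index kept up to date from change events."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # doc id -> current text

    def apply(self, event):
        """Apply one change event: {'op': 'upsert'|'delete', 'id': ..., 'text': ...}."""
        doc_id = event["id"]
        # Remove postings for the document's previous version, if any.
        for term in self.docs.pop(doc_id, "").split():
            self.postings[term].discard(doc_id)
        if event["op"] == "upsert":
            self.docs[doc_id] = event["text"]
            for term in event["text"].split():
                self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term, set()))

index = DerivedFullTextIndex()
index.apply({"op": "upsert", "id": 1, "text": "unbundled databases"})
index.apply({"op": "upsert", "id": 2, "text": "monolithic databases"})
index.apply({"op": "delete", "id": 2})
print(index.search("databases"))  # -> [1]
```

Because the index is a pure function of the change stream, it can be rebuilt from scratch by replaying the log, which is what makes externalizing it safe.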


Composing Data Storage Technologies

Unbundling databases means using different specialized systems and coordinating them for broader workflows. Components in an unbundled architecture include:

  1. Materialized Views: Precomputed aggregations for fast query response.
  2. Replication Logs: Coordinating data synchronization across multiple storage systems.
  3. Search Engines: External systems (such as Elasticsearch) tailored for full-text or fuzzy queries.

Where an individual database engine aims for depth (deep optimization of one storage and query model), unbundling aims for breadth: composing several specialized systems to cover a much wider range of workloads than any single piece of software could.

Example:
Imagine using distributed blob storage to hold raw datasets, a graph database for social network queries, and Kafka streams for log-based synchronization, all integrating seamlessly for specific application needs.
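The composition above can be sketched as a single ordered log fanned out to multiple consumers. The "blob store" and "graph store" below are hypothetical in-memory stand-ins for object storage and a graph database; in practice the log would be something like a Kafka topic rather than a Python list.

```python
# One ordered log of events, fanned out to two specialized consumers.
log = [
    {"type": "blob_put", "key": "raw/2024.csv", "bytes": b"a,b,c"},
    {"type": "follow", "src": "alice", "dst": "bob"},
    {"type": "follow", "src": "bob", "dst": "carol"},
]

blob_store = {}  # key -> bytes (stand-in for object storage)
graph = {}       # user -> set of users they follow (stand-in for a graph DB)

def apply_to_blob_store(event):
    if event["type"] == "blob_put":
        blob_store[event["key"]] = event["bytes"]

def apply_to_graph(event):
    if event["type"] == "follow":
        graph.setdefault(event["src"], set()).add(event["dst"])

# Every consumer sees every event in the same order; each picks out
# only the events relevant to its specialty.
for event in log:
    apply_to_blob_store(event)
    apply_to_graph(event)

print(sorted(blob_store))      # -> ['raw/2024.csv']
print(sorted(graph["alice"]))  # -> ['bob']
```

The key design choice is that all consumers read the same ordered log, so the specialized stores stay mutually consistent without talking to each other directly.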


Challenges in Unbundled Systems

While unbundling databases has advantages, challenges arise in orchestrating these components:

  1. Dataflow Complexity: Unbundling demands careful attention to how data flows between systems, possibly requiring custom code for communication and synchronization.
  2. Write Synchronization: Every system participating in the unbundled architecture must receive writes reliably and in a consistent order (e.g., through an event log). If two systems disagree about the order of writes, their states can silently diverge.
  3. Operational Overhead: Each piece of infrastructure introduces its own operational quirks (e.g., scaling, fault tolerance), which can increase administrative costs.
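One common mitigation for the write-synchronization challenge is to have each downstream system track the log offset of the last event it applied, making event application idempotent: redelivered events are skipped, and a restart resumes deterministically from the same ordered log. The sketch below is a hypothetical minimal version of that pattern.

```python
class OffsetTrackedStore:
    """A key-value store that applies log events idempotently by offset."""

    def __init__(self):
        self.state = {}
        self.applied_offset = -1  # offset of the last event applied

    def apply(self, offset, event):
        if offset <= self.applied_offset:
            return  # duplicate delivery: already applied, ignore
        self.state[event["key"]] = event["value"]
        self.applied_offset = offset

store = OffsetTrackedStore()
events = [(0, {"key": "k", "value": 1}), (1, {"key": "k", "value": 2})]
for offset, event in events:
    store.apply(offset, event)

# Redelivery of an old event (e.g. after a consumer restart) is a no-op:
store.apply(0, {"key": "k", "value": 1})
print(store.state["k"])  # -> 2
```

Kafka consumers use essentially this idea when they commit offsets, though real systems must also persist the offset atomically with the state change.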

Designing Applications Around Dataflow

In an unbundled database world, applications are seen as derivation functions feeding off a stream of state changes. These flows involve:

  1. Event Logs for Dataflow Coordination: Systems like Kafka or Pulsar provide a durable, ordered event log that plays the role of a replication log, keeping downstream systems such as caches and materialized views in sync.
  2. Continuous Derivations: Secondary indexes, full-text search indexes, and cached views are derived in near real-time through automation, minimizing manual coordination.
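The "application as derivation function" idea can be made concrete with a small sketch: a materialized view (here, a per-page view counter) computed as a pure function of the event stream. The event shape and names are illustrative assumptions, not a real API.

```python
from collections import Counter

def derive_view(events):
    """Derive a per-page view-count materialized view from an event stream."""
    view = Counter()
    for event in events:
        view[event["page"]] += 1
    return view

events = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
view = derive_view(events)
print(view["/home"])  # -> 2
```

Because the view is a deterministic function of the log, replaying the same events always reproduces the same view, which is what makes continuous, automated derivation trustworthy.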

Dataflow-friendly architectures emphasize loose coupling between tools while offering predictable results.


Conclusion

Unbundling databases enables organizations to leverage highly specialized tools for diverse workloads while maintaining scalable and resilient systems. Although unbundling introduces challenges in coordination and integration, its modular approach fosters long-term extensibility. By managing derived dataflows and embracing the diversity of modern database tools, engineers can create systems tailored to the evolving complexities of data processing. This trend is set to grow as applications increasingly prioritize flexibility and performance.

Series: Designing Data-Intensive Applications, Part 39 of 41
  1. Designing Reliable Data Systems
  2. What is Scalability in Data Systems?
  3. Building Maintainable Software Systems
  4. Relational Model Versus Document Model
  5. Speaking the Language of Data- A Guide to Query Languages
  6. Unraveling Connections- Exploring Graph-Like Data Models
  7. The Backbone of Databases- Data Structures that Power Storage
  8. Transaction Processing vs. Analytics Let's understand the divide
  9. Understanding Column-Oriented Storage- A Deep Dive into Analytics Optimization
  10. Formats for Encoding Data
  11. Modes of Dataflow in Distributed Systems
  12. Leaders and Followers - The Core of Replication
  13. Problems with Replication Lag - Challenges and Solutions
  14. Multi-Leader Replication in Distributed Databases
  15. Leaderless Replication Flexibility for Distributed Databases
  16. Partitioning and Replication in Scaling Distributed Databases
  17. Partitioning of Key-Value Data- Strategies and Challenges
  18. Partitioning and Secondary Indexes- Balancing Efficiency and Complexity
  19. Efficient Methods for Rebalancing Data in Distributed Systems
  20. Ensuring Accurate Request Routing in Distributed Databases
  21. The Slippery Concept of a Transaction
  22. Exploring Weak Isolation Levels in Databases
  23. Achieving Serializability in Transactions
  24. Faults and Partial Failures in Distributed Systems
  25. Navigating Unreliable Networks in Distributed Systems
  26. The Challenges of Unreliable Clocks in Distributed Systems
  27. Knowledge Truth and Lies in Distributed Systems
  28. Consistency Guarantees in Distributed Systems
  29. Linearizability in Distributed Systems
  30. Understanding Ordering Guarantees in Distributed Systems
  31. Achieving Reliability with Distributed Transactions and Consensus Mechanisms
  32. Leveraging Unix Tools for Efficient Batch Processing
  33. MapReduce and Distributed Filesystems- Foundations of Scalable Data Processing
  34. Advancing Beyond MapReduce- Modern Frameworks for Scalable Data Processing
  35. Enabling Reliable and Scalable Event Streams in Distributed Systems
  36. Synchronizing Databases with Real-Time Streams
  37. Unifying Batch and Stream Processing for Modern Pipelines
  38. Integrating Distributed Systems for Unified Data Pipelines
  39. Unbundling Monolithic Databases for Flexibility
  40. Building Correct Systems in Distributed Environments
  41. Ethical Data Practices for Building Better Systems
