Data Mesh: The decentralized data platform

Crishantha Nanayakkara
Oct 7, 2023

An Introduction

Overview

As Enterprise Architects, we have witnessed how enterprise software architectures have evolved over the years.

The “operational plane” of a typical enterprise software architecture has evolved from a centralized monolithic design to a distributed microservices design. Now we are seeing the same paradigm shift slowly being adopted in the “analytical data plane” as well.

So, let's have a look at what it really means.

Enterprise Data Platforms

Over the past few decades, we have witnessed three basic generations of enterprise data platforms:

  1. Generation 01: Data Warehouses
  2. Generation 02: Data Lakes / Lakehouses
  3. Generation 03: Modern Cloud Data Architectures (the hybrid approach)

In the first era, companies invested heavily in moving organization-wide data into Data Warehouses for retrieval and analytical purposes. In the next generation, they gradually moved data workloads to more flexible and scalable Data Lakes / Data Lakehouses, replacing legacy Data Warehouses. Later, most modern data architectures introduced a hybrid model that combines the best of both architectures (the warehouse and the lake). However, these modern data architectures still take a centralized approach, where you collect and process your data in one central place.

Although this approach made life easier by giving you centralized access to your data, it gradually started to show cracks. For example, data consumers can become overly dependent on a central data engineering team just to access their own organization's data. On top of that, the data engineers who manage the central data repository (the warehouse or the lake) may not have the business know-how needed to work with the data that the business domains generate. This points to a clear problem: the ownership of the data.

“Data ownership” is a mandatory requirement for many regulated institutions and ecosystems, such as governments and banks. With evolving “Data Protection Acts” in many regulatory environments (especially in the public and banking sectors), the importance of data ownership and its governance cannot be denied. Hence, adopting a more decentralized approach to your own data is gradually becoming a must.

Data Mesh

A “data mesh” is more of a “process” than a “technology stack or an implementation style”. It is essentially a data architecture approach that delegates responsibility for enterprise data to the areas of the business that have expertise in it.

In the “data mesh” approach, the entire data architecture is broken into smaller, domain-oriented components (Data Products), making it more agile and scalable. This maps directly to what we do with microservices.

With the “data mesh” approach, a group of experts within a selected “domain” manages that domain's datasets. They are responsible for building “data products” within that “domain”, which can later be consumed as APIs inside or outside the organization, depending on the sensitivity of the data.

The respective “domain” owns and controls the “data products” under its purview. These “data products” can be exposed via APIs (e.g. REST), just as in a typical microservices setup.
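
To make the REST idea concrete, here is a minimal sketch, assuming Python's standard-library WSGI support (`wsgiref`) and a hypothetical `farmer` domain that publishes one hypothetical `registrations` data product as read-only JSON. The path, field names, and records are illustrative only; a real data product API would also need authentication, pagination, and versioning.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical records owned by a hypothetical "farmer" domain.
FARMER_REGISTRATIONS = [
    {"farmer_id": "F-001", "region": "north", "crop": "rice"},
    {"farmer_id": "F-002", "region": "south", "crop": "tea"},
]

# The owning domain decides which data products it publishes, and at which path.
ROUTES = {
    "/farmer/registrations": lambda: FARMER_REGISTRATIONS,
}


def _json_response(start_response, status, payload, extra_headers=()):
    """Encode a payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body))),
               *extra_headers]
    start_response(status, headers)
    return [body]


def data_product_api(environ, start_response):
    """A minimal WSGI app serving the domain's data products read-only over HTTP GET."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        return _json_response(start_response, "405 Method Not Allowed",
                              {"error": "data products are read-only here"},
                              extra_headers=[("Allow", "GET")])
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        return _json_response(start_response, "404 Not Found",
                              {"error": "unknown data product"})
    return _json_response(start_response, "200 OK", handler())


def serve(port=8000):
    """Serve the API locally, e.g. GET http://localhost:8000/farmer/registrations."""
    with make_server("localhost", port, data_product_api) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server. The point of the sketch is ownership: the domain team, not a central data team, maintains this code and decides what each path returns.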

Data Mesh Principles

Zhamak Dehghani [2], who conceptualized the Data Mesh design, explains its architecture with four key principles.

  1. Domain-driven decentralization
  2. Data as a product
  3. Self-service data platform
  4. Federated governance

Figure: Data Mesh Principles

Domain-driven Decentralization: Data is owned by those who know it best. Domain teams own their data and grant access to it on request.
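
The “grant access on request” workflow can be sketched as follows. This is a minimal illustration, assuming a hypothetical `DomainTeam` that records which consumers may read which of its data products. In practice the approval decision would come from the domain's data owner, and the grants would live in the platform's access-control system rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    """A consumer's request to read one data product (hypothetical structure)."""
    requester: str  # consuming team or domain
    product: str    # name of the requested data product
    purpose: str    # recorded so the owning domain can judge the request


@dataclass
class DomainTeam:
    """A domain that owns its data products and decides who may read them."""
    name: str
    products: set[str]
    grants: dict[str, set[str]] = field(default_factory=dict)

    def review(self, request: AccessRequest, approve: bool) -> bool:
        # Only the owning domain can decide on access to its own products.
        if request.product not in self.products:
            raise ValueError(f"{self.name} does not own {request.product}")
        if approve:
            self.grants.setdefault(request.product, set()).add(request.requester)
        return approve

    def can_read(self, requester: str, product: str) -> bool:
        return requester in self.grants.get(product, set())


# Hypothetical usage: the farmer domain approves a request from a finance team.
farmer = DomainTeam("farmer", {"registrations"})
farmer.review(AccessRequest("finance", "registrations", "subsidy planning"), approve=True)
```

After the review, `farmer.can_read("finance", "registrations")` is true, while any team that has not been approved is still denied.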

Figure: The Data Mesh, an example

Data as a Product: Domain teams are responsible for the quality of their data, unlike central data lakes, where data pipelines cleanse the data after it is ingested. Domain-level data is shared as a product with other domains when required, and it is discoverable, addressable, trustworthy, and secure. A “Data Product” is known as a “Node” in a Data Mesh domain. For example, the “Farmer” domain has three Data Products / Nodes (see the Agriculture Data Mesh example). Each of these products can work as a “microservice for the data world”: an encapsulated component that combines data, code, and infrastructure.
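
The “data + code + infrastructure” idea can be sketched as a single object. This is a hypothetical model, not a standard API: the `mesh://` address scheme, the field names, and the sample records are assumptions made for illustration. It shows the product carrying its own cleansing logic (quality owned by the domain), a stable address (addressable), and descriptive metadata (discoverable).

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass
class DataProduct:
    """A hypothetical data product ("node") bundling data, code, and infrastructure."""
    domain: str
    name: str
    owner: str                                         # accountable domain team
    description: str                                   # documentation, for discoverability
    source: Callable[[], list[Record]]                 # data: where raw records come from
    transform: Callable[[list[Record]], list[Record]]  # code: domain-owned cleansing
    infrastructure: dict[str, str]                     # infrastructure: storage, refresh, etc.

    @property
    def address(self) -> str:
        # A unique, stable address makes the product addressable (URI scheme is assumed).
        return f"mesh://{self.domain}/{self.name}"

    def read(self) -> list[Record]:
        # Quality is the domain's responsibility: cleansing runs inside the product,
        # not in a central ingestion pipeline.
        return self.transform(self.source())


def raw_registrations() -> list[Record]:
    # Hypothetical raw operational data, including one incomplete record.
    return [{"farmer_id": "F-001", "crop": " Rice "}, {"farmer_id": None, "crop": "tea"}]


def clean(records: list[Record]) -> list[Record]:
    # Drop records without an ID and normalize crop names.
    return [{**r, "crop": str(r["crop"]).strip().lower()} for r in records if r["farmer_id"]]


registrations = DataProduct(
    domain="farmer",
    name="registrations",
    owner="farmer-domain-team",
    description="Registered farmers and their main crop",
    source=raw_registrations,
    transform=clean,
    infrastructure={"storage": "object-store", "refresh": "daily"},
)
```

Here `registrations.read()` returns only the cleaned record for `F-001`, with the crop normalized to `rice`; consumers never see the incomplete raw row.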

Figure: Data as a Product

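Self-serve Data Platform: Data is made available via a centralized self-service platform, which lets domain teams build, publish, and discover data products without depending on a central data engineering team.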
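
One piece of such a platform, a catalog where domains publish products and consumers find them, can be sketched as below. This is a toy in-memory version under stated assumptions: `SelfServeCatalog`, `CatalogEntry`, and the `mesh://` addresses are hypothetical, and a real platform would also provision storage, pipelines, and access control.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """What a domain publishes about a data product (hypothetical fields)."""
    address: str
    domain: str
    description: str
    tags: set[str] = field(default_factory=set)


class SelfServeCatalog:
    """A toy in-memory catalog: domains publish, consumers search, no central team in the loop."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        # Addresses must be unique so every product stays addressable.
        if entry.address in self._entries:
            raise ValueError(f"address already taken: {entry.address}")
        self._entries[entry.address] = entry

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive: substring match on the description, exact match on a tag.
        kw = keyword.lower()
        return sorted(
            e.address
            for e in self._entries.values()
            if kw in e.description.lower() or kw in {t.lower() for t in e.tags}
        )


catalog = SelfServeCatalog()
catalog.publish(CatalogEntry("mesh://farmer/registrations", "farmer", "Registered farmers", {"farmer"}))
catalog.publish(CatalogEntry("mesh://farmer/crop-yields", "farmer", "Seasonal crop yields per region", {"harvest"}))
```

A consumer calling `catalog.search("harvest")` finds the crop-yields product on their own, which is the kind of dependency on a central team that the self-serve principle aims to remove.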

Federated Data Governance: This principle ensures that the autonomous data products in each domain work together. It is established through global standards, such as data interoperability standards, that apply to every data product in every domain of the data mesh. These standards are implemented and enforced centrally by the data mesh platform.
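
A minimal sketch of central enforcement of global standards might look like the following. The specific policies (an owner is required, column types come from an agreed vocabulary, a known sensitivity label is set) and all names are hypothetical examples of such standards. Domains still own their products; the platform only checks every product against the same shared rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProductMetadata:
    """Metadata every data product registers with the platform (hypothetical fields)."""
    address: str
    owner: Optional[str]
    schema: dict[str, str]  # column name -> type name
    classification: str     # sensitivity label


# Hypothetical global standards agreed across domains.
STANDARD_TYPES = {"string", "integer", "decimal", "date", "boolean"}
CLASSIFICATIONS = {"public", "internal", "restricted"}


def has_owner(p: ProductMetadata) -> Optional[str]:
    return None if p.owner else "missing owner"


def uses_standard_types(p: ProductMetadata) -> Optional[str]:
    bad = sorted(col for col, typ in p.schema.items() if typ not in STANDARD_TYPES)
    return f"non-standard types in: {', '.join(bad)}" if bad else None


def has_known_classification(p: ProductMetadata) -> Optional[str]:
    if p.classification in CLASSIFICATIONS:
        return None
    return f"unknown classification: {p.classification}"


# Each policy returns an error message, or None when the product complies.
GLOBAL_POLICIES: list[Callable[[ProductMetadata], Optional[str]]] = [
    has_owner,
    uses_standard_types,
    has_known_classification,
]


def audit(products: list[ProductMetadata]) -> dict[str, list[str]]:
    """Apply every global policy to every product; return violations keyed by address."""
    report: dict[str, list[str]] = {}
    for product in products:
        results = [policy(product) for policy in GLOBAL_POLICIES]
        errors = [msg for msg in results if msg is not None]
        if errors:
            report[product.address] = errors
    return report
```

Keeping policies as plain functions in one shared list means the domains can agree on a new standard once and have the platform apply it to every product in the mesh.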

Figure: Federated Data Governance

Is it a silver bullet?

Like any other software architecture pattern, Data Mesh is not a silver bullet. It is recommended for larger, more complex data architectures that need agility, clear data ownership, and similar qualities. If your data architecture is small, with few domains and limited regulatory requirements, a Data Mesh may be overkill given the effort and complexity involved. This is the same trade-off we experience in a typical microservices adoption.

I hope this gave you a basic understanding of what “Data Mesh” is trying to achieve. There is more to it, and we will drill down into the details in later articles.

Thank You!

References

  1. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, by Zhamak Dehghani: https://martinfowler.com/articles/data-monolith-to-mesh.html
  2. Data Mesh: Delivering Data-Driven Value at Scale [Book], by Zhamak Dehghani
  3. Thoughtworks, Data Mesh: https://www.thoughtworks.com/en-us/insights/books/data-mesh

Crishantha Nanayakkara

Enterprise Architect, Consultant @ FAO (UN), Former CTO, ICTA Sri Lanka