Data Mesh: A decentralized data platform

Crishantha Nanayakkara
Oct 7, 2023

An Introduction

Overview

We have seen how enterprise application architectures, especially in the “operational plane”, have evolved over the last few decades. We moved from monolithic architectures, which are centralized, to microservices architectures, which are distributed.

The same trend is now happening in the “data analytical plane”, which is slowly shifting away from centralized data analytics platforms (from Data Warehouses to Data Lakes) towards decentralized platforms such as Data Mesh.

Enterprise Data Platforms

As explained above, we have witnessed two basic eras in enterprise data platform architectures, that is, in the “data analytical plane”:

  1. Generation 01: The Data Warehouses
  2. Generation 02: The Data Lakes

Some organizations have invested heavily in moving to Data Warehouses, while others moved directly to more robust Data Lakes, bypassing Data Warehouses altogether. Some run both architectures together in a hybrid model that leverages the strengths of each. However, both architectures take a centralized approach, in which you collect and process your data centrally.

Although this approach made it easier to access your data centrally, it can create problems too. For example, data consumers can become overly dependent on a central data engineering team just to get access to their own organizational data. On top of that, the data engineers who manage the central data repository (the warehouse or the lake) may not have the business know-how required to interpret and manipulate data generated by the business domains. This points to a clear problem of data ownership.

“Data ownership” is a mandatory requirement for many regulated institutions and ecosystems, such as governments and banks. With the advent of “Data Protection Acts” in many regulatory environments, we cannot deny the importance of data ownership. Hence, adopting a more decentralized approach to your own data is slowly becoming a must.

Data Mesh

In the simplest terms, Data Mesh is the microservices version of data analytics platforms. If you already know what a microservices architecture does, Data Mesh tries to do the same for analytical data. So it is heading in the right direction if you look at enterprise application development as a whole.

With the “data mesh” approach, the data engineering team can break the entire data architecture into smaller, domain-oriented components, making it more agile and scalable. This maps exactly to what we do with microservices.

With the “data mesh” approach, a group of experts within a selected “domain” manages that domain's datasets. They are responsible for building “data products” within that “domain”, which can later be consumed as APIs inside or outside the organization, depending on the sensitivity of the data.

The respective “domain” owns and controls the “data products” under its purview. These “data products” can be disseminated via APIs (e.g. REST), just as you would in a microservices architecture.
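To make the idea of a data product “disseminated via APIs” a little more concrete, here is a minimal sketch, assuming a hypothetical “sales” domain that exposes a hypothetical `daily-orders` product over a read-only REST-style endpoint. The path layout `/data-products/<domain>/<name>`, the `DataProduct` class and the sample records are all illustrative inventions, not part of any Data Mesh standard; only Python's standard library (a plain WSGI callable) is used.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataProduct:
    domain: str  # the owning domain, e.g. "sales" (hypothetical)
    name: str
    owner: str  # hypothetical contact for the owning domain team
    records: List[dict] = field(default_factory=list)


# Hypothetical registry of the data products the domain chooses to expose.
PRODUCTS: Dict[Tuple[str, str], DataProduct] = {
    ("sales", "daily-orders"): DataProduct(
        domain="sales",
        name="daily-orders",
        owner="sales-data-team",
        records=[  # illustrative sample records only
            {"date": "2023-10-01", "orders": 42},
            {"date": "2023-10-02", "orders": 37},
        ],
    ),
}


def _json_response(start_response, status, payload):
    """Serialize payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def app(environ, start_response):
    """WSGI app serving GET /data-products/<domain>/<name> as JSON."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        return _json_response(start_response, "405 Method Not Allowed",
                              {"error": "data products are read-only here"})
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 3 or parts[0] != "data-products":
        return _json_response(start_response, "404 Not Found",
                              {"error": "unknown path"})
    product = PRODUCTS.get((parts[1], parts[2]))
    if product is None:
        return _json_response(start_response, "404 Not Found",
                              {"error": "unknown data product"})
    return _json_response(start_response, "200 OK", {
        "domain": product.domain,
        "name": product.name,
        "owner": product.owner,
        "data": product.records,
    })
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app)` followed by `serve_forever()` is enough. The design choice worth noting is that the owning domain decides what it publishes and in what shape, just as a microservice owns its API; a production setup would add authentication and access control before exposing sensitive products outside the organization.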

Zhamak Dehghani [2], who conceptualized the Data Mesh design while she was at ThoughtWorks, explains its architecture with four key principles.

  1. Domain ownership of data: domain teams own their data and grant access on request
  2. Data as a product: domain teams are responsible for the quality of their data, unlike central data lakes, where data pipelines cleansed the data after it was ingested
  3. Self-serve data platform: data is made available to consumers via self service
  4. Federated data governance: a framework that establishes accountability for data across domains
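The four principles above can be illustrated in a few lines of code. The following is a minimal sketch under loudly stated assumptions: `DataProduct`, `Catalog`, `governance_violations` and the classification labels `public`/`internal`/`pii` are hypothetical names chosen for illustration, not an API from Dehghani's book or any Data Mesh product. It shows a domain owning a product and granting access on request (principle 1), owner-defined quality checks that must pass before publishing (principle 2), a catalog consumers can search and read from on their own (principle 3), and one global policy applied uniformly to every domain's products (principle 4).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Check = Callable[[List[dict]], bool]


@dataclass
class DataProduct:
    name: str
    owner_domain: str  # principle 1: a domain team owns the product
    classification: str  # hypothetical labels: "public", "internal", "pii"
    records: List[dict] = field(default_factory=list)
    quality_checks: Dict[str, Check] = field(default_factory=dict)  # principle 2
    approved_consumers: Set[str] = field(default_factory=set)

    def failed_checks(self) -> List[str]:
        """Names of the owner-defined quality checks that currently fail."""
        return [n for n, check in self.quality_checks.items() if not check(self.records)]

    def grant(self, consumer: str) -> None:
        """Principle 1: the owning domain grants access on request."""
        self.approved_consumers.add(consumer)


def governance_violations(product: DataProduct) -> List[str]:
    """Principle 4: a hypothetical global policy applied to every domain."""
    violations = []
    if not product.owner_domain:
        violations.append("missing owner domain")
    if product.classification not in {"public", "internal", "pii"}:
        violations.append("unknown classification")
    return violations


class Catalog:
    """Principle 3: consumers discover and read data products themselves."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        problems = governance_violations(product) + product.failed_checks()
        if problems:
            raise ValueError(f"cannot publish {product.name}: {problems}")
        self._products[product.name] = product

    def search(self, domain: str = None) -> List[str]:
        return sorted(p.name for p in self._products.values()
                      if domain is None or p.owner_domain == domain)

    def read(self, name: str, consumer: str) -> List[dict]:
        product = self._products[name]
        if product.classification != "public" and consumer not in product.approved_consumers:
            raise PermissionError(f"{consumer} has not been granted access to {name}")
        return list(product.records)


# Usage with hypothetical sample data.
orders = DataProduct(
    name="daily-orders",
    owner_domain="sales",
    classification="internal",
    records=[{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.5}],
    quality_checks={
        "non_empty": lambda rs: len(rs) > 0,
        "amounts_positive": lambda rs: all(r["amount"] > 0 for r in rs),
    },
)
catalog = Catalog()
catalog.publish(orders)
orders.grant("finance")
print(catalog.search(domain="sales"))  # ['daily-orders']
print(len(catalog.read("daily-orders", consumer="finance")))  # 2
```

Note that quality is enforced by the owning domain's own checks at publish time rather than by a central cleansing pipeline downstream, which is the contrast principle 2 draws with central data lakes.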

Is it a silver bullet?

Like any other software architecture pattern, Data Mesh is never a silver bullet. It is recommended for larger, more complex data architectures that need agility, clear data ownership, and so on. If your data architecture is small, with few domains and a lighter regulatory background, then adopting a Data Mesh may be an anti-pattern given its effort and complexity. This is the same trade-off we experience in a typical microservices adoption.

I hope you now have a basic understanding of what “Data Mesh” is trying to achieve. There is more to it; let's drill down into the details in later articles.

Thank You!

References

  1. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, by Zhamak Dehghani: https://martinfowler.com/articles/data-monolith-to-mesh.html
  2. Data Mesh: Delivering Data-Driven Value at Scale [Book], by Zhamak Dehghani
  3. ThoughtWorks, Data Mesh: https://www.thoughtworks.com/en-us/insights/books/data-mesh


Crishantha Nanayakkara

Enterprise Architect, Consultant @ FAO (UN), Former CTO, ICTA Sri Lanka