Full Stack Blog – Data-oriented architecture

22 May 2023

Data-oriented architecture

What is this?

This is an architecture with a single logical data store and loosely coupled services. Communication between services should be organized through the data layer.

"Data-oriented architecture" means thinking about data and storage first, and only then about the services that process this data. This approach fits applications where the data is the main value.

"In Data-Oriented Architecture (DOA), systems are still organized around small, loosely-coupled components, as in SOA microservices." (quoted from a linked article)

Data-oriented architecture may be the solution when a system contains too many communications between services, which overly complicates its development and support.

Benefits

  • Fewer integration problems
  • Single source of truth
  • Complex calculations can be performed close to the data

Problems

  • The data schema must be designed for the data as a whole
  • Single point of failure
  • "Data is forever": it is hard to remove data

Microservices & Shared database

We can compare microservices and data-oriented architectures. The example below shows the key differences between them.

[Figure: data-oriented architecture]

In a true microservices architecture, each service should use its own storage to isolate its data. In a data-oriented architecture, we use one logical storage, which can be implemented as a database cluster or as a set of different systems for storing data.

You could call this the shared database pattern for microservices. But a "shared database" is still microservices: it is still a set of services that interact with each other. We still design the system as a set of services and only then design the data storage. In DOA we should design the data first, with an eye on business processes.

Shared database pattern: each service freely accesses data owned by other services using local ACID transactions. But data owned by a service is not shared data.

In DOA, data cannot be owned by any service; the data is the root of the application.

Approaches and tools

Multi-model database

To build the data layer in DOA with only one database component, we need a very functional and powerful database. It would be great to be able to store different types of data and also have scalability, durability, etc.

Some interesting examples that can help us:

  • ArangoDB - a multi-model database system that supports three data models (graphs, JSON documents, key/value)
  • MarkLogic data platform - a multi-model database that can store XML, JSON, and binary data, with a semantic store for RDF. Supports XQuery and JavaScript.

GraphQL

GraphQL can be used as an abstraction layer on top of the databases or services that store data. This abstraction layer can act as a monolithic datastore that holds all the data behind it. All other services then interact with each other through the GraphQL API, as if it were the data layer.

[Figure: data-oriented architecture with GraphQL]
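To make the idea of one API in front of several stores more concrete, here is a minimal sketch in plain Python. It is not GraphQL and uses no GraphQL library; it only imitates the idea that each field has a resolver function and that callers never see which store answers. The stores, the sample records, and the names `resolve_plant`, `resolve_nursery_of_plant`, `RESOLVERS` and `query` are all made up for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical backing stores with made-up sample data. In a real system
# these could be separate databases or services behind the API.
PLANTS = {"p1": {"name": "Quercus robur", "nursery_id": "n1"}}
NURSERIES = {"n1": {"name": "Example Nursery", "address": "1 Example Street"}}


def resolve_plant(plant_id: str) -> dict:
    """Fetch a plant record from the plant store."""
    return PLANTS[plant_id]


def resolve_nursery_of_plant(plant_id: str) -> dict:
    """Fetch the nursery of a plant; touches two stores, invisible to the caller."""
    return NURSERIES[PLANTS[plant_id]["nursery_id"]]


# One resolver per field name. Callers only know field names, not stores.
RESOLVERS: Dict[str, Callable[[str], dict]] = {
    "plant": resolve_plant,
    "nursery": resolve_nursery_of_plant,
}


def query(plant_id: str, fields: List[str]) -> Dict[str, Any]:
    """Answer a request for several fields through the single entry point."""
    return {name: RESOLVERS[name](plant_id) for name in fields}
```

A service would call something like `query("p1", ["plant", "nursery"])` and get both records back without knowing that they live in different places.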

Component-to-component interactions

In DOA we should let services interact with each other through the data layer, because in DOA we are trying to reduce direct communication between services.

NOTE: Direct communication between services is not prohibited, but in DOA we try to reduce the number of such connections to zero.

Communication through the data layer can be achieved with the producer/consumer pattern. For example:

[Figure: producer/consumer pattern]
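A minimal sketch of this pattern, assuming an in-memory SQLite database plays the role of the data layer. The table `plant_names`, the `status` column, and the functions `create_data_layer`, `produce` and `consume` are hypothetical names chosen for this example. The producer and the consumer never call each other; they only read and write the shared table.

```python
import sqlite3


def create_data_layer() -> sqlite3.Connection:
    """Create an in-memory database that stands in for the shared data layer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plant_names ("
        " id INTEGER PRIMARY KEY,"
        " raw_name TEXT NOT NULL,"
        " parsed_name TEXT,"
        " status TEXT NOT NULL DEFAULT 'new')"
    )
    return conn


def produce(conn: sqlite3.Connection, raw_name: str) -> int:
    """Producer service: writes a record and knows nothing about consumers."""
    cur = conn.execute("INSERT INTO plant_names (raw_name) VALUES (?)", (raw_name,))
    conn.commit()
    return cur.lastrowid


def consume(conn: sqlite3.Connection) -> int:
    """Consumer service: processes all 'new' records and marks them 'done'."""
    rows = conn.execute(
        "SELECT id, raw_name FROM plant_names WHERE status = 'new' ORDER BY id"
    ).fetchall()
    for row_id, raw_name in rows:
        parsed = raw_name.strip().capitalize()  # toy processing step
        conn.execute(
            "UPDATE plant_names SET parsed_name = ?, status = 'done' WHERE id = ?",
            (parsed, row_id),
        )
    conn.commit()
    return len(rows)
```

In a real deployment the two functions would live in separate services that connect to the same database, and the consumer would poll or be notified instead of being called by hand.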

Or we can organize a processing pipeline with data exchange via the data layer:

[Figure: processing pipeline]

Here, each service is a step that transforms data in our pipeline.
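Such a pipeline could be sketched as follows, assuming each record in the data layer carries a `stage` field that tells which step should pick it up next. The dictionary used as the store, the stage names, and the functions `run_step`, `normalize` and `enrich` are assumptions made for this example only.

```python
from typing import Callable, Dict

# In-memory stand-in for the shared data layer: record id -> record.
DataLayer = Dict[int, dict]


def run_step(
    store: DataLayer,
    from_stage: str,
    to_stage: str,
    transform: Callable[[dict], dict],
) -> int:
    """One service is one step: read records at from_stage, write them back at to_stage."""
    processed = 0
    for record_id, record in list(store.items()):
        if record["stage"] == from_stage:
            updated = transform(dict(record))  # work on a copy of the record
            updated["stage"] = to_stage
            store[record_id] = updated
            processed += 1
    return processed


def normalize(record: dict) -> dict:
    """First step: clean up the raw plant name."""
    record["name"] = record["name"].strip().lower()
    return record


def enrich(record: dict) -> dict:
    """Second step: derive the genus from the first word of the name."""
    record["genus"] = record["name"].split()[0]
    return record
```

Usage, with the steps run one after another on the same store:

```python
store = {1: {"name": "  Quercus robur ", "stage": "raw"}}
run_step(store, "raw", "normalized", normalize)
run_step(store, "normalized", "enriched", enrich)
```

The steps share no code path with each other; the only contract between them is the shape of the record and the stage name.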

Example with plant knowledge graph

As an example for this article, let's use a knowledge graph for plant data.

Description

The main purpose of the application

Store and provide access to data related to plants (trees, flowers, etc.).

Key functions

The knowledge graph should do only two things: store the data and provide access to that data.

  • Store different types of data

    • Plant descriptions
    • Plant images. Metadata and binary
    • Plant sets
    • Plant properties. A lot of different types of properties
    • Plant nurseries. Address, description
    • Nursery owner. Name, contacts
    • Relations between all content types
    • Park. Name, description, address
    • Relations: Park->Plant sets->Plant
    • etc...
  • Discovery functions

    • Users should be able to find any data
    • Recommendations
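One possible way to model these content types and their relations is as typed nodes and labelled edges. The sketch below is only an illustration under that assumption; the class names `Node` and `KnowledgeGraph`, the relation labels `has_set` and `contains`, and the sample identifiers are hypothetical and not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: str
    kind: str  # e.g. "park", "plant_set", "plant", "nursery"
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source id, relation label, target id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add a labelled edge; both ends must already exist."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a relation must exist")
        self.edges.append((source, relation, target))

    def neighbours(self, source: str, relation: str) -> List[Node]:
        """Return the nodes reachable from source over one edge with this label."""
        return [self.nodes[t] for s, r, t in self.edges if s == source and r == relation]

    def plants_in_park(self, park_id: str) -> List[Node]:
        """Follow the relation chain Park -> Plant set -> Plant."""
        plants: List[Node] = []
        for plant_set in self.neighbours(park_id, "has_set"):
            plants.extend(self.neighbours(plant_set.node_id, "contains"))
        return plants
```

A discovery function such as "which plants grow in this park" then becomes a short traversal over the stored relations rather than a call chain between services.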

Algorithms are, of course, very important here, but as you can see, data is the main component of this system. As a result, we can try to use a data-oriented architecture: we can focus on designing the data and on its quality rather than on services and their interactions.

Solution example

[Figure: plant knowledge store with data-oriented architecture]

Here we isolate our data in a single logical storage with only one access method: the data access layer. All data is divided between two repositories. The first is a database used for all non-binary data such as JSON, text, and graphs. The second is binary storage for plant images.
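A rough sketch of such a data access layer, assuming two in-memory dictionaries stand in for the non-binary database and the binary image storage. The class `PlantDataLayer`, its methods, and the choice of a SHA-256 content hash as the image key are assumptions made for illustration, not a description of the real implementation.

```python
import hashlib
from typing import Dict, Optional


class PlantDataLayer:
    """Single entry point that hides the fact that data lives in two repositories."""

    def __init__(self) -> None:
        self._documents: Dict[str, dict] = {}  # stand-in for the non-binary database
        self._blobs: Dict[str, bytes] = {}     # stand-in for the binary image storage

    def save_plant(self, plant_id: str, description: str) -> None:
        """Store the non-binary part of a plant record."""
        self._documents[plant_id] = {"description": description, "image_key": None}

    def attach_image(self, plant_id: str, image: bytes) -> str:
        """Store image bytes in the binary repository and link them to the plant."""
        document = self._documents[plant_id]     # raises KeyError for unknown plants
        key = hashlib.sha256(image).hexdigest()  # content hash used as storage key
        self._blobs[key] = image
        document["image_key"] = key
        return key

    def get_plant(self, plant_id: str) -> dict:
        """Return a copy of the plant record, so callers cannot edit the store directly."""
        return dict(self._documents[plant_id])

    def get_image(self, plant_id: str) -> Optional[bytes]:
        """Return the image bytes for a plant, or None if no image is attached."""
        key = self._documents[plant_id]["image_key"]
        return self._blobs.get(key) if key else None
```

Services only see `PlantDataLayer`; whether the image ends up in the same database or in separate binary storage can change without touching them.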

Benefits:

  • Single source of truth
  • The data schema is designed as one complete repository. This makes it easier to add data, improve its quality and connectivity, maintain it, and correct errors in it.

Direct connections between services are not a good idea, but they are not forbidden in this architecture. We use one for "plant-name-parser", because each request to this service is very simple and can be processed very quickly in real time. From an implementation point of view, a direct connection to this service is the easier way.

More interesting materials

Data-Oriented Architecture (Rajive Joshi, Ph.D.)
Data-Oriented Architecture
Data-Oriented Architecture (2020)

Conclusion

It would be great to get feedback on this article: what is wrong here, any mistakes, or places where my thinking is off. Or maybe you have arguments for why this architecture is the wrong choice.