Data-driven Software Architecture
Part 3/3: Data-driven implementation of LinkedDataHub
October 28, 2020
In our previous posts we explained why the Web needs a data-driven approach and why organizations need to adopt data-driven architecture and a common data model. In this post we will talk about using RDF for data modeling and replacing imperative code with declarative technologies. We will then introduce you to the data-driven design behind our Knowledge Graph management system, LinkedDataHub.
We have described data-driven software architecture in high-level terms: putting data at the center instead of code, and decoupling applications from data. Put simply, data-driven software allows you to build apps in such a way that any differences between them (e.g. domain model, features, user interface) are encoded in application-specific data rather than in imperative code.
Such an approach, as we explained before, requires two processes to be set in motion: decoupling business logic from code and replacing imperative code with declarative technologies. But how is it done?
Decoupling business logic from code
Business logic is knowledge specific to a business domain, codified explicitly in a machine-readable form. That includes models, taxonomies, rules, and constraints. Today these models and constraints are usually implemented as object-oriented class libraries, while the taxonomies might be managed in a relational database. This means they are coupled with code or siloed in legacy data models, which hinders reuse across applications: a major problem that costs enterprises billions or even trillions of dollars globally.
This is where RDF comes to the rescue: a common data model with built-in interchange features and the flexibility to accommodate data from any domain. The methods for encoding domain knowledge in RDF are mature and have been developed and used for almost 20 years. There is an array of complementary technologies that can replace the costly legacy solutions:
| Use case | Legacy approaches | RDF approach | Specifications |
|---|---|---|---|
| Domain modeling | UML models | Ontologies with classes, properties, instances etc. | RDFS, OWL |
| Classifications, code lists | Tables with parent/child keys; tables with nested sets | Taxonomies with broader/narrower concepts; less strict than ontologies | SKOS |
| Constraints | | Graph validation against patterns/shapes | SPARQL, SPIN, SHACL |
| Rules | Stored DB procedures; custom object-oriented implementations | Inference using RDFS/OWL reasoners; explicit pattern-based rules | SPARQL, SPIN, SHACL |
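To make the table concrete, here is a small sketch in Turtle: a tiny SKOS taxonomy (broader/narrower concepts instead of parent/child foreign keys) together with a SHACL shape that validates it. The `ex:` namespace and the concept names are illustrative, not taken from any real application:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <https://example.org/taxonomy#> .

# A small SKOS taxonomy: broader/narrower links replace parent/child keys
ex:Products a skos:ConceptScheme ;
    skos:prefLabel "Products"@en .

ex:Electronics a skos:Concept ;
    skos:prefLabel "Electronics"@en ;
    skos:inScheme ex:Products ;
    skos:topConceptOf ex:Products .

ex:Laptops a skos:Concept ;
    skos:prefLabel "Laptops"@en ;
    skos:inScheme ex:Products ;
    skos:broader ex:Electronics .    # replaces a parent_id foreign key

# A SHACL shape: every concept must have exactly one preferred label
ex:ConceptShape a sh:NodeShape ;
    sh:targetClass skos:Concept ;
    sh:property [
        sh:path skos:prefLabel ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```

Both the taxonomy and the constraint live in the data itself, so any SHACL-capable processor can validate the graph without application-specific validation code.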
From an architectural perspective, the technologies above belong in the Model component of the MVC architecture. With each such artifact (taxonomies, rules, constraints) extracted from legacy code and moved into RDF data, the Model grows thinner. Eventually it becomes a generic, domain-agnostic processor that ingests ontologies, taxonomies and other application-specific data to produce a data-driven application.
Replacing imperative code with declarative technologies
Declarative technologies are the other piece of the data-driven approach: they provide an environment for bringing data and functions together declaratively, so that little new code is needed. To understand why they are important, we need to understand how they differ from the much more widely known imperative languages.
Declarative code defines what outcome it expects without specifying how that outcome should be achieved. Probably the best-known example is an SQL query: it specifies the data projection we want to retrieve, and the query processor can apply different algorithms and optimization techniques to produce it, as long as the result is correct. Moreover, queries can be reused across standards-compliant databases that are implemented in different programming languages and run on different operating systems (although standards compliance across SQL database products is generally poor).
One of the biggest advantages of declarative programming languages is that they allow the developer to operate on a much higher level of abstraction, compared to implementing an algorithm imperatively from scratch. That is because the processing model is already defined in the language's specification and provides users with a convenient framework, allowing them to focus on the specific high-level objective.
Another advantage is that declarative languages can often be executed in parallel, which improves performance. With an imperative language, in contrast, developers have to implement their own parallel algorithms, which is complex and bug-prone.
Domain-specific languages (DSLs) are probably the best-known examples of declarative technologies: query languages (SQL, XQuery, SPARQL etc.), stylesheets (CSS, XSLT) and so on.
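As an RDF-stack counterpart to the SQL example above, a SPARQL query is equally declarative: it states the graph pattern and the projection, and any compliant endpoint decides how to evaluate it. The query below reuses the standard SKOS vocabulary; the data it would run against is illustrative:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# "What we want": every concept, its label, and its broader concept's label.
# "How to get it" (join order, indexes, parallelism) is the engine's job.
SELECT ?concept ?label ?broaderLabel
WHERE {
    ?concept a skos:Concept ;
             skos:prefLabel ?label .
    OPTIONAL {
        ?concept skos:broader ?broader .
        ?broader skos:prefLabel ?broaderLabel .
    }
}
ORDER BY ?label
```

The same query can run unchanged against any SPARQL 1.1-compliant triplestore, which is exactly the portability argument made for SQL above, but with far stricter standards compliance in practice.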
Below are some of the use cases where declarative technologies can replace imperative code.
| Use case | Declarative technologies |
|---|---|
| HTTP API | Linked Data Templates (more on that below), RDF/POST |
| RDF ETL | SPARQL (CSV2RDF, JSON2RDF), XSLT |
| User interface | XSLT, Interactive XSLT (Saxon-JS), RDF/POST |
The XML-based technologies have fallen out of favor over the years, but objectively the XML stack still comprises the most well-standardized and feature-rich data technologies. XSLT specifically is very powerful for both producing and consuming RDF data and has been continuously developed. For example, the latest version 3.0 supports streaming transformations which allow you to transform large files that do not fit into memory.
Implementing the data-driven approach with LinkedDataHub
LinkedDataHub—AtomGraph's open-source Knowledge Graph management system—was designed natively for the RDF data model and implements all the approaches we mentioned above:
- a generic low-code framework which does not require new code to implement new applications
- a uniform Linked Data API with an interactive user interface
- the use of declarative technologies
The Model in a LinkedDataHub application is simply an RDF triplestore (or any SPARQL-compliant data source). Each application can also have its own domain ontology, though one is not required. No object-oriented layer is used to represent the model on top of the RDF API.
The Controller is based on our own Linked Data Templates (LDT) specification, which translates read-write Linked Data requests into SPARQL queries and updates. The Linked Data API is uniform, but the mappings are application-specific and packaged into ontologies, which can be imported and reused by other applications. An LDT ontology defines application behavior rather than merely describing it, as OpenAPI or Hydra do. We see it as a modern implementation of the original Semantic Web vision of ontology-driven agents.
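As a rough illustration of the idea, an LDT template pairs a URI template with a SPARQL query that answers requests matching it. The sketch below is illustrative only: the exact namespace, class, and property names are assumptions about the LDT ontology and should be checked against the specification rather than copied as-is:

```turtle
@prefix ldt: <https://www.w3.org/ns/ldt#> .
@prefix sp:  <http://spinrdf.org/sp#> .
@prefix ex:  <https://example.org/app#> .

# Illustrative sketch: route requests for /concepts/{id} to a SPARQL DESCRIBE.
# Term names (ldt:match, ldt:query) are assumptions; consult the LDT spec.
ex:ConceptTemplate a ldt:Template ;
    ldt:match "/concepts/{id}" ;     # URI template the request must match
    ldt:query [ a sp:Describe ;
        sp:text """DESCRIBE ?this
                   WHERE { ?this a <http://www.w3.org/2004/02/skos/core#Concept> }"""
    ] .
```

The key point is that this mapping is itself RDF data: adding or changing an endpoint means editing triples in an ontology, not writing controller code.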
Main components of the Linked Data Templates architecture
The LDT applications themselves are defined using RDF; in other words, they are part of the Knowledge Graph.
Data-driven approach: the way forward
Our vision for LinkedDataHub is for it to become a common framework for Knowledge Graph exploration and management as well as no-code/low-code application development. We strive to empower end-users with novel, previously impossible features enabled by the data-driven architecture.
One such feature is set-based navigation, also known as parallax navigation.
Another one is a model store (similar to the PowerApps standard entity catalog). It will allow end-users to compose application models from entity definitions interactively, without having to write a line of code.
We are building a community of developers around LinkedDataHub who create their own data-driven applications and provide us with feedback. We are happy to help everyone who wants to get involved and publish or explore their data as Knowledge Graphs. So give LinkedDataHub a try and ⭐ our GitHub repositories.
We are also looking for early adopters from companies interested in data-driven software architecture. If you are an organization looking for ways to manage your data as a Knowledge Graph, contact us for a demo of LinkedDataHub.