Data-driven Software Architecture

Part 1/3: Understanding why the web's potential is still dormant

The web continues to grow and expand in innovative ways. At the same time, few seem to notice that the way web applications are currently built is dragging us ever deeper into technical debt. Software-centric architecture and the legacy technologies used to implement it are at odds with the potential of the web. At best they are wildly inefficient, and at worst they are leading us toward a software apocalypse.

In this first part of our 3-post series on data-driven software architecture, we explain the problems with imperative code and API trends, and what they mean for the enterprise.

Legacy technologies

A lot has changed since the advent of the web, yet, paradoxically, the way we approach building applications and software hasn't. The architecture of end-user software applications has not significantly evolved since the 1970s, when the Model-View-Controller (MVC) pattern was adopted.

The problem with legacy systems

As commercial RDBMSs started appearing around 1980, they solidified the components of this pattern, roughly as follows:

Model-View-Controller pattern

  • Model
    • Database with schema
    • Code that wraps access to the database schema, often using an Object-Relational Mapping (ORM)
    • Code implementing business logic
  • View: renders model data in the UI
  • Controller: changes Model state based on user input
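The division of responsibilities above can be sketched in a few lines. This is a minimal illustration of the pattern, not taken from any particular framework; all the names are hypothetical, and the "database" is just an in-memory array:

```javascript
// Minimal MVC sketch: the Model holds state and business logic,
// the View renders Model data, and the Controller turns user
// input into Model changes.

class Model {
  constructor() {
    this.tasks = [];           // in a real app this would sit behind an ORM and a database schema
    this.listeners = [];
  }
  subscribe(fn) { this.listeners.push(fn); }
  addTask(title) {             // "business logic" lives here
    this.tasks.push({ title, done: false });
    this.listeners.forEach(fn => fn(this.tasks));
  }
}

class View {
  render(tasks) {              // renders Model data in the UI (here: a plain string)
    return tasks.map(t => `[${t.done ? "x" : " "}] ${t.title}`).join("\n");
  }
}

class Controller {
  constructor(model) { this.model = model; }
  handleInput(input) {         // changes Model state based on user input
    this.model.addTask(input);
  }
}

const model = new Model();
const view = new View();
const controller = new Controller(model);

let output = "";
model.subscribe(tasks => { output = view.render(tasks); });
controller.handleInput("write part 2");
console.log(output);           // [ ] write part 2
```

Note that even in this toy version, the Model is imperative code wrapping data access: exactly the part that grows as the schema and the business logic grow.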

MVC remains the dominant pattern in web application frameworks to this day. Its most popular companion components, such as RDBMSs and ORMs, have become legacy: they were never designed for the web, which was invented decades after them.

The problem with APIs

The world is on course to having a trillion programmable endpoints

Tyler Jewell

Enterprises worldwide are constantly developing new APIs to expose their services. As a result, the number of web applications and APIs continues to grow, ever expanding the use of legacy components. The proponents of the API economy might call it a "good thing" as they charge hundreds of dollars per hour as consultants in the self-serving industry built around API design and management.

What most of them fail to notice is that they are dragging themselves ever deeper into technical debt. Let's be clear: a large number of different APIs is a liability. The compound complexity of this sheer number of APIs makes system integration and interoperability increasingly difficult:

  • Most APIs use proprietary, non-standard vocabularies to describe their (meta)data.
  • Identifiers are local to each system and carry no meaning in a broader context.
  • Recent API description formats such as OpenAPI help developers write client code, but that still means more code is required.
  • Sharing data across APIs requires a code-based connector for each API-to-API link, and the number of pairwise connectors grows quadratically with the number of APIs.
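The last point is simple arithmetic: with point-to-point integration, every pair of APIs can require its own connector, so n systems need up to n(n-1)/2 links. A quick back-of-the-envelope calculation (the API counts are illustrative):

```javascript
// Point-to-point integration: each pair of APIs may need its own
// connector, so n APIs require up to n * (n - 1) / 2 connectors.
const connectors = n => n * (n - 1) / 2;

[10, 100, 1000].forEach(n =>
  console.log(`${n} APIs -> up to ${connectors(n)} pairwise connectors`)
);
// 10 APIs -> up to 45 pairwise connectors
// 100 APIs -> up to 4950 pairwise connectors
// 1000 APIs -> up to 499500 pairwise connectors
```

Ten times as many APIs means roughly a hundred times as many potential connectors, each of which is code that has to be written, deployed and maintained.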

The problem with code

With the growing number of web applications and APIs, the size of their codebases grows continuously as well.

Since the Model and the Controller in MVC are usually implemented in imperative programming languages such as Java, C# or JavaScript, the codebase grows with the complexity of the model and with the number of APIs the application exposes. Large codebases used to be the preserve of operating systems, databases and other critical software, but these days even an enterprise ERP system can contain millions of lines of code. Merely cloning such a codebase from a version control repository, or building it, can take hours. The growing complexity makes it impossible for individual developers to comprehend such systems, and the number of bugs increases along with it, since bug counts are roughly proportional to lines of code.

The reuse of models, or rather the lack of it, is another problem with imperative code. To quote Dave McComb of Semantic Arts, a proponent of the data-centric approach who has written a series of books on the topic:

There is virtually no reuse at the business concept level across applications, despite huge potential benefit for doing so.

Data-Centric Revolution

What is worrying is not the current situation but the long-term trend. If the number of APIs and the size of codebases continue to grow at the current rate, let alone an exponential one, where does this trend lead? Millions of APIs and billions of lines of code that take weeks to build? Clearly the current approach will not scale, but we might not notice in time, as humans are fundamentally bad at understanding exponential growth.
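To make that intuition concrete, compare linear growth with steady compound (exponential) growth over twenty years. The starting size and the growth rates here are entirely hypothetical; the point is only how far the two curves diverge:

```javascript
// Illustrative only: a codebase of 1M lines growing linearly
// (+1M lines/year) versus compounding (+25%/year) over 20 years.
let linear = 1_000_000;
let compound = 1_000_000;

for (let year = 1; year <= 20; year++) {
  linear += 1_000_000;   // linear: same absolute increment each year
  compound *= 1.25;      // compound: same relative increment each year
}

console.log(Math.round(linear / 1e6));    // 21  (million lines)
console.log(Math.round(compound / 1e6));  // 87  (million lines)
```

Both curves look similar for the first few years, which is exactly why the compounding one is easy to underestimate.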

More generally, we can argue that programming itself hasn't improved in decades. Most development is still done by typing code into a text editor, rather than through more interactive, model-driven interface paradigms.

Computers had doubled in power every 18 months for the last 40 years. Why hadn't programming changed?

The Coming Software Apocalypse

As software becomes ever larger and more important in our digital society, some experts warn we are headed for a "software apocalypse". There must be a better alternative: one that leads to less code and less complexity over time, not more.


Head to the next part of this series to read about a simpler and smarter way of handling enterprise data and application software. Follow our updates on Twitter and LinkedIn and join the conversation.