A better runtime for component-based web applications

I have an idea but currently don't have the time or resources to work on it. So I'm sharing the idea here, hoping we can at least discuss it, and maybe someone will even feel inspired to take it on.

The idea is based on two predictions. First, I'm convinced that the future of web sites and web applications is component-based platforms (e.g. Drupal modules, WordPress plugins, etc). Second, I believe that the best way to deploy and use web sites or web applications is through a SaaS hosting environment (e.g. WordPress.com, DrupalGardens, Salesforce's Force.com platform, Demandware's SaaS platform, etc). Specifically, I believe that in the big picture on-premise software is a "transitional state". It may take another 15 years, but on-premise software will become the exception rather than the standard. Combined, these two predictions point to a future where we have component-based platforms running in SaaS environments.

To see what I mean, imagine a WordPress.com, Squarespace, Wix or DrupalGardens where you can install every module/plugin available, including your own custom modules/plugins, instead of being limited to those modules/plugins manually approved by their vendors. This is a big deal, because one of the biggest challenges with running web sites or web applications is that almost every user wants to extend or customize the application beyond what is provided out of the box.

Web applications have to be (1) manageable, (2) extensible, (3) customizable and (4) robust. The problem is that we don't have a programming language or an execution runtime that is able to meet all four of these requirements in the context of building and running dynamic component-based applications.

None of PHP, JavaScript, Ruby, Go or Java allows us to build truly robust applications, because their runtimes don't provide proper resource isolation. Often all the components (i.e. Drupal modules, WordPress plugins) run in the same memory space. In the Java world you have Enterprise JavaBeans or OSGi, which add some level of isolation and management, but they still don't provide full component-level isolation or component-level fault containment. As a result, each component pretty much has to trust the other components installed on the system. This means that a single malfunctioning component can corrupt another component's data or functional logic, or harm the performance of the entire platform. In other words, you have to review, certify and test components before installing them on your platform. As a result, most SaaS vendors won't let you install untrusted or custom components.
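To make the trust problem concrete, here is a contrived sketch (not actual Drupal or WordPress code) of two plugins loaded into one shared PHP process; nothing in the runtime stops the second plugin from trampling state the first one depends on:

<?php
// Contrived example: two "plugins" loaded into one shared PHP process.
// The runtime offers no isolation between them.

$registry = []; // shared in-process state, visible to every plugin

function plugin_a_init(array &$registry) {
    // Plugin A stores its configuration in the shared registry.
    $registry['plugin_a']['api_key'] = 'secret-key';
}

function plugin_b_init(array &$registry) {
    // A buggy (or malicious) plugin B can read or corrupt plugin A's data...
    $registry['plugin_a']['api_key'] = null;
    // ...or starve the whole process: while (true) {}
}

plugin_a_init($registry);
plugin_b_init($registry);
var_dump($registry['plugin_a']['api_key']); // NULL: plugin A's data is gone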

What we really need here is an execution runtime that lets you install untrusted components while guaranteeing application robustness at the same time. Such technology would be a total game-changer, as we could build infinitely customizable SaaS platforms that leverage the power of community innovation. You'd be able to install any Drupal module on DrupalGardens, any plugin on WordPress.com, or custom code on Squarespace or Wix. It would fundamentally disrupt the entire industry and help us achieve the assembled web dream.

I've been giving this some thought, and what I think we need is the ability to handle each HTTP request in a micro-kernel-like environment where each software component (e.g. a Drupal module or WordPress plugin) runs in its own isolated process or environment and communicates with the other components through a form of inter-process communication (think remote procedure calls or web service calls). It is a lot harder to implement than it sounds, as the inter-process communication could add huge overhead (e.g. we might need fast or clever ways to safely share data between isolated components without having to copy or transfer a lot of data around). Alternatively, container technology like Docker might help us move in this direction as well. Its goal of a lightweight container is a step towards micro-services, but it is likely to have more communication overhead. In either scenario, Drupal would look a lot like a collection of micro web services (Drupal 10 anyone?).
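As a rough illustration of the shape this could take (a minimal sketch with invented names, not a proposal for how Drupal should actually implement it), a host process could launch each component as its own OS process and exchange small JSON messages with it over pipes:

<?php
// host.php - minimal sketch: run a component in its own process and talk to it
// over pipes with JSON messages (a stand-in for real RPC or web service calls).

$descriptors = [
    0 => ['pipe', 'r'], // component's stdin  (host -> component)
    1 => ['pipe', 'w'], // component's stdout (component -> host)
    2 => ['pipe', 'w'], // component's stderr
];

// "comment_component.php" is a hypothetical, untrusted component. Because it
// runs in its own process, a crash or infinite loop inside it cannot corrupt
// the host's memory, and the operating system can enforce limits on it.
$process = proc_open('php comment_component.php', $descriptors, $pipes);

if (is_resource($process)) {
    fwrite($pipes[0], json_encode(['op' => 'render', 'node_id' => 42]) . "\n");
    fclose($pipes[0]);

    $response = json_decode(fgets($pipes[1]), true);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($process);

    print_r($response);
}

<?php
// comment_component.php - the isolated component: read one request from stdin,
// do its work, write one response to stdout.
$request = json_decode(fgets(STDIN), true);
fwrite(STDOUT, json_encode(['html' => '<p>Comments for node ' . (int) $request['node_id'] . '</p>']) . "\n");

The mechanics (pipes, JSON) don't matter much; the point is that the isolation boundary is drawn by the operating system rather than by programmer discipline, which is exactly where the overhead concern comes from.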

Once we have such a runtime, we can implement and enforce governance and security policies for each component (e.g. limit its memory usage and I/O, set security permissions, and control its access to the underlying platform, such as the database). We'd have real component-level isolation along with platform-level governance: (1) manageable, (2) extensible, (3) customizable and (4) robust.
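For example (a hypothetical policy manifest and image name, not an existing Drupal or Docker feature), the platform could attach a small resource and permission policy to each component and translate it into container limits when the component is launched:

<?php
// Hypothetical per-component policy; the keys, values and image name are
// illustrative, not part of any existing platform.
$policy = [
    'comment_component' => [
        'memory'     => '128m',        // hard memory cap
        'cpu_shares' => 256,           // relative CPU weight
        'network'    => 'none',        // no network access
        'database'   => ['comments'],  // only these tables
    ],
];

// One possible enforcement path: launch the component in a container and let
// the kernel (cgroups) enforce the memory/CPU limits. The database rule would
// have to be enforced by the platform's data API, not by the container.
$p = $policy['comment_component'];
$cmd = sprintf(
    'docker run --rm -m %s --cpu-shares %d --net %s platform/comment-component',
    escapeshellarg($p['memory']),
    $p['cpu_shares'],
    escapeshellarg($p['network'])
);
// exec($cmd, $output);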

Food for thought and discussion?

Comments

Cal Evans (not verified):

Dries,

This is an interesting concept. On the one hand, yes, treating each module/plugin as a web service would isolate it and allow for a lot more freedom when it comes to running untrusted code. On the other hand - and I'm sure you've considered this - the increase in CPU usage would be astronomical. Any module/plugin that needs access to the core app would need to run it itself (e.g. if a module needs access to part of Drupal core, it would have to execute that code, and any code it depends on, independently of the calling app). This would push CPU usage through the roof.

Sharing the core app code with the micro-kernel requests - in PHP, the compiled OPCODE - would be ideal, but then you are reintroducing trust issues. How do you verify that the OPCODE is the one that is running in the core app?

The module running in its own micro-kernel would have to be able to make calls back into the app via some type of trusted API to fetch data, call functions, etc.

If only there were a core app that was designed from the ground up to be an API. Oh...wait... :)

Seriously though, I think that given the right architecture, a future version of Drupal and WordPress - via the new REST API plugin - should be able to work this way.
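To sketch what that trusted API callback might look like (the endpoint, token handling and environment variable below are purely illustrative, not an existing Drupal or WordPress API):

<?php
// Sketch of a sandboxed plugin talking to core through a trusted HTTP API
// instead of calling core functions in-process.

$coreApi = 'http://core.internal/api/v1';
$token   = getenv('COMPONENT_API_TOKEN'); // issued by the platform per component

$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => "Authorization: Bearer $token\r\n",
    ],
]);

// The platform can authenticate the component, rate-limit it, and decide which
// resources this particular token may read or write.
$posts = json_decode(file_get_contents("$coreApi/posts?per_page=5", false, $context), true);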

Of course I disagree that PHP can't be the language to do this with. :)

Cheers!
=C=

August 27, 2014
Chris Weber (not verified):

It's not ready for mass adoption yet, but I think web browsers will offer a key part of the solution when Web Components are widely adopted. See: http://webcomponents.org/

That would give us a way to describe the markup, behaviors and appearance of a component.

A good number of components will only handle client-side logic. These are elements traditionally thought of as presentational. There will also be a number of data-driven objects. Data-driven objects that can have their data acquisition deferred could still be based on web components. Data-driven components that need to be rendered on screen with their data (ASAP) could also be based on web components. In short, everything can be based on web components.

In my talk at Drupal Camp Twin Cities I showed how it's possible today to make web components that consume Drupal 8's RESTful API.

Dries, while you talk about micro-kernels and being able to handle parts of a request with awesome HTTP powers, web components wouldn't need anything more than what HTTP/2 supplies. HTML Imports provide a mechanism for importing knowledge about how to render custom elements. HTTP/2 provides a great way of bringing in multiple resources without the performance-optimization tricks of combining and compressing assets that we currently employ.

Web components might be a year or two away, but when they land they're going to take the world by storm.

August 27, 2014
nod_ (not verified):

Interesting perspective in the talk The Birth & Death of JavaScript (around the 19:00 mark) about an all-JS OS; you could put a NoFlo-style UI on top of that to build it.

One can dream :)

August 28, 2014
Ashish Datta (not verified):

I have no idea what the internals look like, but my understanding was that Salesforce's Apex platform (https://developer.salesforce.com/page/An_Introduction_to_Apex) is pretty close to achieving something like this.

August 28, 2014
Larry Garfield (not verified):

Congratulations, Dries, you just described micro-services. :-) Both the pros and the cons.

I don't think performance is even the biggest hurdle. Every runtime gets faster with every release, and Moore's Law is still in effect. There are ways to write code that can sort-of abstract, but not entirely abstract, the connection between two components so that they could be peer processes or live on separate systems, although as Martin Fowler has pointed out you need to be *really* careful there, since the performance trade-offs are different. (Basically, you'd have to assume separate systems and treat falling back to a single system as an optimization, not the architectural default.) You could also use something other than HTTP as the connecting layer, since HTTP has a fair bit of overhead and complexity. HTTP/2 helps there, but not entirely.

The bigger challenge is the API. Architecturally, you're describing building not one system but dozens of systems that have to talk to each other over APIs. You've seen how we handle that in Drupal. Drupal 7 and earlier, it was through "here's raw data, good luck". Drupal 8, we've tried to split the system up with more robust internal APIs via interfaces. It's paying off but it's hard work. It also means you cannot have arbitrary extension points; you can only have defined extension points. If you decouple the system into multiple systems, you have even fewer extension points.
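To illustrate the difference, a defined extension point is essentially a published contract; the interface below is illustrative, not Drupal 8's actual API:

<?php
// Illustrative only - not Drupal 8's actual interfaces. A defined extension
// point is a contract that core publishes; components can only plug in where
// such a contract exists.

interface CommentStorageInterface {
    /** Return the comments attached to a node. */
    public function loadForNode(int $nodeId): array;
}

// A component supplies an implementation of the published contract, perhaps by
// calling another service over HTTP...
class RemoteCommentStorage implements CommentStorageInterface {
    public function loadForNode(int $nodeId): array {
        $data = file_get_contents("http://comments.internal/node/$nodeId");
        return json_decode($data, true) ?: [];
    }
}

// ...but it cannot hook into internals that core never exposed.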

If you can predict the extension points you need, then you're fine. If not, that becomes extremely limiting.

(Side note: This was actually a topic of conversation at the Chicago Advanced Drupal Meetup Group last night, in the context of Drupal 9/10; how we build a system that can be decoupled or not, depending on the needs of a specific client.)

And to Chris' point, web components are only meaningful in a web-page context. The type of system Dries is describing would only be talking to a web page occasionally. By the time it becomes mainstream, HTML 8 will be only a small part of what happens on the web.

A multi-process Flow-based programming approach is one possibility. However, you're then talking about dozens or hundreds of processes, and that runs into massive complexity issues (before you even talk about performance). The Go community has been discussing that, as Go doesn't support dynamic linking, making runtime plugins basically impossible.

To Cal's point, I will say that mod_php is totally incapable of this sort of environment. The bootstrap cost for each request would be a performance issue, certainly compared to any other system. Shared-nothing only gets us so far. The only way PHP could play in this space is via something like ReactPHP, where there's a persistent process.
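For what it's worth, the persistent-process shape looks roughly like this; plain sockets rather than ReactPHP, just to show the idea of bootstrapping once and then serving many requests:

<?php
// Minimal persistent PHP worker: bootstrap once, then answer many requests.
// Plain sockets for illustration; ReactPHP or a similar event loop would do
// this more robustly.

function expensive_bootstrap(): array {
    // Load configuration, the container, routes, etc. exactly once.
    return ['booted_at' => microtime(true)];
}

function handle(string $request, array $app): string {
    return json_encode(['echo' => $request, 'booted_at' => $app['booted_at']]);
}

$app = expensive_bootstrap();

$server = stream_socket_server('tcp://127.0.0.1:9000', $errno, $errstr);
if ($server === false) {
    die("Could not listen: $errstr\n");
}

while ($conn = stream_socket_accept($server, -1)) {
    $request = trim((string) fgets($conn)); // one request per line
    fwrite($conn, handle($request, $app) . "\n");
    fclose($conn);
}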

That said, to get the robustness you need probably requires a completely different approach to programming. Something like... full on functional in a strongly typed language, where the compiler/runtime has far more control and therefore can protect itself and make optimizations and adjustments without the source code having to change.

Which brings me right back to Erlang, which does almost all of what I describe above. :-) Purely functional (or nearly so), natively distributed (whether on one system or many is irrelevant, because the code is intrinsically parallel), and robust enough that you can replace parts of the program at runtime without shutting the program down.

That's what it would take to build what you describe. Or probably the same concept built with some newer language that doesn't exist yet, one that isn't built on programming language theory dating from the 1970s (as most popular languages today are, including PHP). Having your cake and eating it too is not something any modern language or architecture really manages.

That said... I am not sure I agree with your premise in the first place that self-hosted software is doomed. :-) Even if SaaS may be more efficient, there are strong social benefits to self-hosting and distributed-everything rather than centralization that we *must*, as socially responsible developers, keep in mind. Have a look at the long-term implications of MaidSafe (http://maidsafe.net/) or Storj (http://storj.io/) on what SaaS even means.

August 28, 2014
Lode Vanstechelman (not verified):

Something to take into account as well is access at the filesystem level. Currently, all modules in a multi-site setup are executed as the same user. So if arbitrary code were allowed to execute, one could also write code that browses up to the web root and tries to access other sites' data (e.g. database credentials, private files, etc.). So ideally, every site in such a SaaS environment should run in its own filesystem chroot as well.
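A rough sketch of that last point (it assumes a per-site CLI worker started as root with the posix extension; the path and user/group IDs are made up):

<?php
// Sketch: confine a per-site worker to its own document root so it cannot
// browse other sites' files. Requires a CLI process started as root on a
// POSIX system; the path and IDs are illustrative.

$siteRoot = '/var/www/sites/example-site';

chroot($siteRoot); // the filesystem root is now the site's own directory
chdir('/');

// Drop privileges after the chroot so the worker cannot escape it.
posix_setgid(33); // e.g. www-data
posix_setuid(33);

// From here on, '/settings.php' resolves inside this site's own tree, and
// '/var/www/sites/other-site' is simply not reachable.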

August 28, 2014
B K (not verified):

Dries,
You didn't mention .NET in the list of platforms. Is this an oversight, or do you think that .NET fits the bill? Running components as Windows Services seems to match the behavior you're describing.

August 28, 2014
Dries:

I just found this article called Micro-services and PaaS; it describes the micro-services architecture at Netflix. Compared to Netflix's micro-services, Drupal has "nano-services". Interesting read nonetheless.

Thanks for all the comments so far!

August 29, 2014
Marc Drummond (not verified):

This actually sounds a little bit like the new extensions system in iOS 8. Apps will be able to provide extensions that can work inside other apps, but each extension is sandboxed so that it can't interfere with the other app beyond the ways that are allowed.

August 30, 2014
Rob Colburn (not verified):

If I may bring up a competing platform: James Halliday of the Node community has been evangelizing his concept of a "Federated Architecture". This is similar to micro-services, though a bit more robust, as components of the Federated Architecture are not necessarily services (though they typically are).

Here's James's talk, but I'll briefly summarize it as it relates to component architecture.
http://youtube.com/watch?v=84PE6EF3YWY

Imagine we shattered a given Drupal site into its components; what might we see?
* A service wrapping Entity/Field CRUD
* A service wrapping Authorization, talking to the Entity service.
* A service delivering an HTML-based interface to Entity CRUD
* A service routing all administrative interfaces.
...
This goes on, and we move on to Views, etc. Let's say stock Drupal clocks in at some number of services. Each service is its own executable on Node. They can be deployed/scaled as needed.

Components are composed as a service/executable or set of those.

The first thought is: oh goodness, that's a lot of applications. But they can all start on a single VM and provide deep, dtrace-level visibility into their performance. And as the site grows, they can be distributed as needed. Their memory/CPU can be managed because they are individual processes.

This gets a bit hard to imagine in the confines of PHP, but Composer provides the npm layer, albeit somewhat lacking in npm features currently - but that's only a matter of pull requests. As far as the application structure goes, HHVM points a way forward.

September 3, 2014
Dries:

Here is an interesting presentation on building nano-services with Java 8 and Java EE 7 (different from micro-services):

https://www.youtube.com/watch?v=FKpePxsp6g0

September 4, 2014

