Farrago Architecture

Philosophy

Conceptually, an RDBMS is an unusual combination of two systems with very different natures. The low level system is a computation- and data-intensive kernel in which scalability, performance, and reliability are the key factors. The high level system is simply an application server in which the data model defines relational objects, and the "business logic" is SQL parsing, optimization, execution, and extensibility. Many DBMS implementations fail to recognize this dual nature and have serious implementation flaws as a result. Heavyweight commercial systems treat the entire server as a kernel, which makes development of higher level logic unnecessarily difficult as it is bound by constraints only suitable for the lower level system. Lightweight systems (e.g. most Java databases) go to the opposite extreme, focusing on simplicity rather than performance, which necessarily limits their applicability. One objective of the Farrago project is to demonstrate that it's possible to have it all without creating something too monstrous. Another is to build up a modular plugin framework so that other projects can use Farrago as a base for more specialized SQL engines.

Plugin Architecture

In line with these goals, the Farrago architecture defines a multi-language pluggable framework with system-level extensibility in a variety of directions:

The frameworks span a number of programming, scripting, and modeling languages:

UML is used for defining all metadata models and driving the model-driven code generation build process. Farrago uses the standard CWM metamodel from OMG as a base, and defines its own extension model (FEM) as well. Projects which want to customize the Farrago metadata can define their own model extensions. All definitions which are shared across multiple programming languages are defined in UML.
C++ is used in the server framework for components in which a native code implementation is required for performance, API access, or low-level system access. The C++ portion of the framework (known as Fennel) is not directly pluggable. Instead, extension C++ modules can be defined together with wrapper Java plugins as companions (interfacing via JNI). We use the term module to describe a component which is designed to fit into a predefined interface, while the term plugin is reserved for modules which can be added to an already-installed server via DDL commands. The C++ portion of the framework is optional; it is possible to build a pure-Java DBMS by extending the Farrago framework, although not all of the necessary components are currently provided.
Java is the preferred environment for extensibility in the server framework since as a managed code environment it is much safer than C++ and provides access to a large number of important API's. Java plugins may be pure-Java, or may include C++ modules called via JNI.
Beyond plugins defining the server's behavior, another layer of extensibility exists in support for user-defined types and routines (including stored procedures and SQL scripts).
Finally, access to the server from as many client environments as possible is another important direction for extensibility.

Component Stack

The diagram below illustrates at a coarse granularity the various layers involved in the component stack of the Farrago platform:

The Java VM can be a top-level container for a standalone server (with RMI from client JDBC drivers), or Farrago can be hosted by other containers such as J2EE application servers in a multi-tier configuration. The Farrago framework maintains SQL-specific state per connected session. Parsing, validation, and optimization work against the catalog, which combines a local MDR repository with an extensible namespace system. User-defined SQL/MED namespace plugins can be used to "mount" foreign data sources, causing them to appear as additional top-level catalogs. Query execution plans are implemented as a combination of C++ access paths and generated Java code, with access to both local storage and foreign data (via namespace support).

Technology

Today's best application server technology is Java-based, and Java support for extensibility through language features such as reflection is excellent, so high-level Farrago components are developed in Java. Kernel-level components are implemented in C++ for efficiency (even java.nio can't make up for the lack of pointers in Java). JNI is used to bridge the top and bottom halves.

Farrago relies on the following independent open-source projects (some of which depend in turn on other projects):

Fennel for the C++ kernel.
Boost and STLport for portable C++ class libraries.
Netbeans MDR for all metadata management (system catalog object model and persistence, XMI import/export, etc.)
JavaCC for Java parser generation
OpenJava for Java code generation
Janino for runtime compilation of generated Java code
VJDBC for client/server connectivity
sqlline for a command-line interface
HSQLDB for repository persistence and a GUI front end
Apache Jakarta Commons for various class libraries
JGraphT for graph-theory class libraries
ResGen for internationalization

The system catalog model is based on the Common Warehouse Metamodel, with Farrago-specific extensions.

Build/test tools:

Apache Ant (where would we be without this?)
JUnit for unit testing
Emma for code coverage
Jalopy for code beautification
Macker for architectural enforcement

In addition, our intention is to define adapters for embedding Farrago in various application servers. For a lightweight configuration, it will be deployable as a servlet in a server such as Tomcat. For a heavyweight configuration (e.g. with distributed transaction support and JMX xmonitoring), it will be deployable in a full-fledged application server such as JBoss. Currently supported containers are a standalone RMI server and direct embedding as a serverless JDBC engine.

End $Id: //open/dev/farrago/doc/architecture.html#15 $