Overview
The architecture
doc mentions that Farrago Java code communicates with the Fennel
C++ kernel via JNI. This document presents the detailed design for
that integration.
Design Goals
The JNI interface wrapping Fennel is necessarily quite wide, as it has
to deal with a variety of storage objects and operations which will be
extended over time. This implies that the interface should be
designed carefully with a number of goals in mind.
- Programmer productivity: JNI programming is very tedious and
error-prone. Everything possible should be done to make life easy for
those who have to define and maintain this interface.
- Safety: JNI programming errors can result in undetected
corruption of the JVM and/or the storage manager, and weak typing
makes it very hard to detect these at compile time. The interface
design should attempt to counteract these problems.
- Expressiveness: It must be easy to exchange complex metadata
between Java and C++ code.
- Documentation: The interface must be well documented since it is
accessed by many Farrago components.
- Integration Completeness: The JNI interface must provide
integration support for all aspects of server execution (e.g. tracing,
exception handling, internationalization).
- Efficiency: Interface methods which are on the critical path for
query execution must be implemented with minimal overhead.
Approach
A well known approach which addresses most of the above design goals
is the proxy/peer system in
which C++ code is generated from Java interfaces. For Java code that
needs to call C++, the developer fills in implementations for
generated peer classes. Conversely, C++ code that needs to access
Java makes calls on generated proxy objects which wrap hidden JNI
method invocations. The code generation step addresses productivity,
safety, and expressiveness.
Generic tools such as Jace
are available which automate the Java-to-C++ code generation process.
For Farrago, a lightweight homebrew solution was developed along
similar lines, but tailored to take full advantage of available metadata.
The approach starts with a UML model for the data structures to be
communicated between Java and C++. This model is a subpackage of FEM
(TODO: link to FEM docs), so from it Java interfaces and
implementation classes are generated automatically as part of the
overall catalog build process (TODO: link to build docs). However,
these classes are marked with the
org.netbeans.mdr.transient
tag, so instances created at runtime will never be stored in the
catalog.
Once the MDR-based catalog is created, a custom C++ code generator is
run which produces C++ proxy classes (peers are not yet supported as
there has been little need so far). The code generator uses a
combination of Java reflection and JMI metadata access to transform
the source model. The available metadata allows us to take full
advantage of the type safety afforded by C++ generics. For example,
in Java, an JMI association is accessed via a weakly-typed Collection,
requiring typecasting. The code generator knows the true type of the
association end, so it generates an appropriate instantiation of a C++
template which hides all typecasting. The generated classes make use
of a small runtime framework for calling JNI.
TODO: diagram
It should be noted that the choice of this model-driven approach
brings some design goals into competition. The extra complexity of
dealing with a modeling tool introduces a learning curve and may add a
small drag on productivity (although it's still far superior to
straight JNI programming). However, the benefits for safety,
documentation, and expressiveness are worth the extra hassle.
FennelStorage Native Interface
The only Java class which declares any native methods is
net.sf.farrago.fennel.FennelStorage (TODO: javadoc link). Only a few
native methods are defined. The most important one is a generic
execution method for the command pattern described in the next
section. The other methods are special cases for efficient execution
of tuple streams.
Commands
Execution of Fennel storage manager operations is accomplished via the
Command pattern.
Java code instantiates a command object describing the operation to be
performed, and then passes this object to FennelDbHandle.executeCmd.
C++ code interprets the command via the corresponding generated
proxies and executes it. Here's a UML diagram of the command class hierarchy:
TODO: links to detailed docs for available commands
Handles
Fennel commands usually need to refer to existing storage objects such
as databases and transactions, and sometimes create new ones. These
inter-command references are accomplished via handles. A handle has
two parts:
- a dynamically allocated C++ object which stores the handle state,
including references to other underlying storage objects
- a Java object (instances of class
net.sf.farrago.fem.fennel.FennelHandle) which contains a long integer
representing the pointer to the C++ object
The command model defines associations which allow commands to refer
to input handles or return output handles:
Since specific associations are defined between individual command
and handle subclasses, safety is guaranteed (i.e. a
command can't accidentally pass a transaction handle where a stream
handle is expected).
Commands may refer to more than one handle (e.g. CmdRollback always
refers to a transaction handle, but may also refer to a savepoint handle).
Complex Return Types
In most cases encountered so far, complex structures are passed from
Java to C++ but not in the other direction. The current code
generation infrastructure supports proxies with limited mutator
support. Mutators are only generated for attributes with simple types
(primitives and Strings) whose names begin result. The few
instances where the storage manager must return complex information
are special-cased, e.g. by constructing an XMI string which is
transformed into Java object representation via the MDR import
facility.
Tuple Stream Definition
One of the commands, CmdOpenTupleStream, deserves individual mention.
Its single innocent-looking attribute, tupleStreamDef, is really an
entire submodel. Execution of CmdOpenTupleStream results in the
construction of an entire query execution graph of specialized
TupleStream nodes. The only graph topology currently supported is a
tree, which is why the Input/Consumer association in the model below
is 1-to-n rather than m-to-n:
The leaf nodes derived from TupleStreamDef in the above inheritance
hierarchy represent instantiable Fennel TupleStream types. When
Fennel interprets CmdOpenTupleStream, it walks the recursive
TupleStreamDataflowDef association, constructing the appropriate type
of TupleStream for each node visited, initializing it with the
defined parameters. It also ties the streams together into a dataflow
graph mirroring the TupleStreamDef associations and returns a handle
to this graph as the result of the command.
Tuple Stream Execution
When a TupleStream graph is executed, it may process a very large
number of tuples, and in some cases, these tuples must flow through
the Java virtual machine for filtering, transformation, etc.
FennelStorage defines a separate native interface for this purpose;
the interface is designed for efficiency to the extent allowed by
Java. There are two cases to consider: Java processing of tuples
produced by Fennel, and Fennel processing of tuples produced by Java.
From Fennel to Java
For this case, FennelStorage provides the tupleStreamFetch method,
which takes a stream handle and a byte array as input. The Fennel
implementation fills the byte array with data in the same tuple format
used by Fennel internally. Tuples are stored contiguously, and only
complete tuples are returned. A separate method tupleStreamDescribe
can be called to retrieve a physical description of the stream output
format. This, together with java.nio.ByteBuffer, is used by
implementations of net.sf.farrago.query.FennelTupleReader to unmarshal
the data returned by Fennel.
From Java to Fennel
In the opposite direction, things are a little more complicated.
First of all, the stream definition must tell Fennel how to call back
into Java in order to retrieve tuples. This is accomplished by
specifying an instance of JavaTupleStreamDef in the stream definition
passed to CmdOpenTupleStream. [TBD: Java object handles or class names
or whatever ends up being used.] Fennel uses the attributes in
JavaTupleStreamDef to locate an instance of Java class
JavaTupleStream. During query execution, when a consumer of this
stream requests tuples, Fennel makes a call to
JavaTupleStream.fillBuffer, passing a ByteBuffer. (Note that this
call sequence bypasses the usual proxy code generation infrastructure
for control and efficiency.) This ByteBuffer is actually a direct
reference to C++ memory compliments of java.nio, eliminating the need
to copy (TBD: why we don't do the same thing in the tupleStreamFetch
case). The JavaTupleStream implementation writes into this ByteBuffer
via an instance of net.sf.farrago.query.FennelTupleWriter which has
been provided with the target tuple format. Once the buffer is filled
or no more tuples are available, JavaTupleStream.fillBuffer returns to
Fennel, which continues execution with the consumer stream.
TODO: diagram of a tree involving dataflow in both directions
Configuration Parameters
TBD
Tracing
TBD
Exception Handling
TBD
Internationalization
TBD
End $Id: //open/dev/farrago/doc/design/jni.html#3 $ |