Overview

This document describes how logical Farrago table definitions are mapped to the default BTree-based reference implementation known as FTRS (Fennel Transactional Row Store). It is possible to plug in other mappings such as column-based storage using the CREATE LOCAL DATA WRAPPER command.

Limitations

Every FTRS table must have a primary key and exactly one clustered index (which need not be unique, and need not include the primary key). When a clustered index definition is omitted, the primary key is used. A table may have any number of unclustered indexes, both unique and non-unique.

The primary key requirement is not standard SQL (even though it ought to be). Eventually omission of a primary key will be allowed, and will result in generation of a hidden system-owned surrogate key.

Example DDL

We'll use a single table as an example:


create table ENLISTMENT(
    NAME varchar(128) not null,
    RANK char(3),
    SERIALNO integer not null constraint SERIALNO_PK primary key,
    PLATOONID integer not null)

create clustered index ENLISTMENT_CX on ENLISTMENT(NAME)

create index ENLISTMENT_PLATOONID on ENLISTMENT(PLATOONID)

Since unique constraints result in the creation of system-owned unique indexes for enforcement, three indexes result from this DDL statement. The constraint index will be referred to by its constraint name (SERIALNO_PK).

Each index has a defined key which is the column list specified by the DDL statement which created the index. For this example:

The defined key of SERIALNO_PK is (SERIALNO).
The defined key of ENLISTMENT_CX is (NAME).
The defined key of ENLISTMENT_PLATOONID is (PLATOONID).

Clustered Index Storage

The clustered index is implemented as a BTree which stores all of the table data in the order specified by the index definition. Here's some example data as it would be stored in the clustered index:

NAME	RANK	SERIALNO	PLATOONID
Boyle	CPL	1004	2
Carter	SGT	1001	2
Carter	PVT	1003	2
Lombardi	PVT	1002	1
Pyle	PVT	1000	1

Note that there is a duplicate in the defined key NAME of the clustered index. This is allowed since the clustered index was not specified as unique. However, we need a unique locator for every tuple stored, so we introduce the concept of an index's distinct key. For a unique index, this is identical to the defined key. For a non-unique clustered index, we append the columns of the primary key (leaving out any that were already referenced by the defined key). In this example, the distinct key for ENLISTMENT_CX is (NAME, SERIALNO).

Unclustered Index Storage

The distinct key for a clustered index can be thought of as a logical ROWID. In fact, this is how it is used when an unclustered index is stored. Each tuple in the BTree implementing an unclustered index consists of the unclustered index's defined key plus the columns of the clustered index's distinct key (again leaving out any redundant columns from the unclustered index's defined key). Here is the data stored for SERIALNO_PK:

SERIALNO	NAME
1000	Pyle
1001	Carter
1002	Lombardi
1003	Carter
1004	Boyle

Note that in this case, adding on the clustered index key does not make the unclustered index key "more unique" since it is already unique; but it does give us an indirect access path from SERIALNO to tuple: first search SERIALNO_PK to convert the SERIALNO into a NAME, and then use the (NAME,SERIALNO) combination to search ENLISTMENT_CX, which stores the values for all other columns.

For non-unique index ENLISTMENT_PLATOONID, adding on the distinct key of the clustered index serves both purposes:

PLATOONID NAME SERIALNO

1 Lombardi 1002

1 Pyle 1000

2 Boyle 1004

2 Carter 1001

2 Carter 1003

In this case of a non-unique unclustered index, the full tuple stored in the BTree forms the distinct key for the unclustered index. To summarize distinct keys:

The distinct key of SERIALNO_PK is (SERIALNO).
The distinct key of ENLISTMENT_CX is (NAME,SERIALNO).
The distinct key of ENLISTMENT_PLATOONID is (PLATOONID,NAME,SERIALNO).

Index Coverage

When processing a query, the optimizer chooses an access path based on what indexes are available. A filter on the clustered index defined key can be implemented as a direct search against the clustered index. For example,


select NAME,RANK,SERIALNO
from ENLISTMENT
where NAME='Pyle'

A filter on an unclustered index normally requires an indirect search as described previously. However, if the unclustered index "covers" all columns referenced by the query, then a direct search can be used and the clustered index can be ignored. For example,


select NAME
from ENLISTMENT
where SERIALNO=1000

In this case, SERIALNO_PK can be used for a direct lookup.

So, the coverage for a clustered index is the entire table, while the coverage for an unclustered index is the combination of its defined key with the distinct key of the clustered index. Examples:

The coverage of SERIALNO_PK is (SERIALNO,NAME).
The coverage of ENLISTMENT_CX is (NAME,RANK,SERIALNO,PLATOONID).
The coverage of ENLISTMENT_PLATOONID is (PLATOONID,NAME,SERIALNO).

Collation Key

In addition to coverage, the optimizer is also interested in sort order (e.g. for satisfying ORDER BY or merge join requirements). For each index, a collation key can be defined which is the list of columns according to which tuples are guaranteed to be returned in order by a sequential scan of the index. For a clustered index, this is identical to the distinct key. For an unclustered index, the collation key is identical to the coverage. Examples:

The collation key of SERIALNO_PK is (SERIALNO,NAME).
The collation key of ENLISTMENT_CX is (NAME,SERIALNO).
The collation key of ENLISTMENT_PLATOONID is (PLATOONID,NAME,SERIALNO).

End $Id: //open/dev/farrago/doc/design/TableIndexing.html#4 $