net.sf.farrago.catalog
Class FarragoColumnHistogram

java.lang.Object
  extended by net.sf.farrago.catalog.FarragoColumnHistogram
All Implemented Interfaces:
RelStatColumnStatistics

public class FarragoColumnHistogram
extends Object
implements RelStatColumnStatistics

FarragoColumnHistogram reads and interprets statistics for a column of a Farrago column set. An instance of this class is returned to summarize the result of applying predicate(s) to a column.

TODO: Review statistics analysis for handling of null semantics. Null values are less than all other values (e.g. bars might contain the starting values null,0,1,...). Because stats analysis is based on ranges, only ranges which include consecutive bars are supported. Examples:
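The null ordering and the "consecutive bars" restriction can be illustrated with a small sketch. It is not Farrago code: the class NullFirstBars and its method barIndexOf are hypothetical, and Long values stand in for the histogram's stored bar start values. Only Comparator.nullsFirst and Comparator.naturalOrder are standard JDK calls. The sketch orders null before every other value and maps a value to the last bar whose start is not greater than it.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Farrago): locates the histogram bar that
// holds a value, assuming bar start values are sorted with null first.
final class NullFirstBars {
    // nullsFirst orders null ahead of every non-null value, mirroring the
    // rule that null values are less than all other values.
    static final Comparator<Long> ORDER =
        Comparator.nullsFirst(Comparator.<Long>naturalOrder());

    private NullFirstBars() {}

    // Returns the index of the last bar whose start value is <= value,
    // or -1 if the value sorts before the first bar.
    static int barIndexOf(List<Long> barStarts, Long value) {
        int result = -1;
        for (int i = 0; i < barStarts.size(); i++) {
            if (ORDER.compare(barStarts.get(i), value) > 0) {
                break; // starts are sorted, so no later bar can match
            }
            result = i;
        }
        return result;
    }
}
```

With bar starts null, 0, 1, 5, 9, the value null falls in bar 0, and the range 1 <= x <= 7 spans bars 2 through 3 (barIndexOf returns 2 for 1 and 3 for 7). This is a consecutive run of bars, which is the kind of range the analysis supports.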

Version:
$Id: //open/dev/farrago/src/net/sf/farrago/catalog/FarragoColumnHistogram.java#10 $
Author:
John Pham

Nested Class Summary
private static class FarragoColumnHistogram.HistogramBarCoverage
          Describes which points and ranges lie on a histogram bar
private  class FarragoColumnHistogram.HistogramRange
          Histogram range represents a set of bars in a histogram
 
Field Summary
private  int barCount
           
private  List<FemColumnHistogramBar> bars
           
(package private)  Double cardinality
           
private  FemAbstractColumn column
           
private  FemColumnHistogram histogram
           
private  Timestamp labelTimestamp
           
(package private)  Double selectivity
           
private  SargIntervalSequence sequence
           
 
Constructor Summary
protected FarragoColumnHistogram(FemAbstractColumn column, SargIntervalSequence sequence)
          Deprecated.  
protected FarragoColumnHistogram(FemAbstractColumn column, SargIntervalSequence sequence, Timestamp labelTimestamp)
          Initializes a column statistics reader.
 
Method Summary
private  boolean checkEndpoint(SargEndpoint endpoint)
          Checks whether the given SargEndpoint is infinite or bounded by a literal expression.
protected static int compare(String histValue, RexLiteral coordinate)
          Compares a value in a histogram with a Sarg coordinate
protected  void evaluate()
          Analyzes column histogram to determine the selectivity and cardinality of the specified search condition.
 Double getCardinality()
          Estimates the number of distinct values returned from a relational expression that satisfy a given condition.
private  List<FarragoColumnHistogram.HistogramBarCoverage> getCoverage(SargIntervalSequence sequence)
          Computes the histogram bar coverage of an ordered sequence of intervals.
 Double getSelectivity()
          Estimates the fraction of a relational expression's rows that satisfy a given condition.
private  void readCoverages(List<FarragoColumnHistogram.HistogramBarCoverage> coverages)
          Reads the collected bar coverages and makes final estimates for the requested attributes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

column

private FemAbstractColumn column

sequence

private SargIntervalSequence sequence

labelTimestamp

private Timestamp labelTimestamp

histogram

private FemColumnHistogram histogram

barCount

private int barCount

bars

private List<FemColumnHistogramBar> bars

selectivity

Double selectivity

cardinality

Double cardinality
Constructor Detail

FarragoColumnHistogram

protected FarragoColumnHistogram(FemAbstractColumn column,
                                 SargIntervalSequence sequence)
Deprecated. 

Initializes a column statistics reader. The statistics are not actually analyzed until the user calls evaluate().

Parameters:
column - column to analyze
sequence - optional predicate on the column

FarragoColumnHistogram

protected FarragoColumnHistogram(FemAbstractColumn column,
                                 SargIntervalSequence sequence,
                                 Timestamp labelTimestamp)
Initializes a column statistics reader. The statistics are not actually analyzed until the user calls evaluate().

Parameters:
column - column to analyze
sequence - optional predicate on the column
labelTimestamp - the creation timestamp of the label setting that determines which set of stats should be used; null if there is no label setting
Method Detail

getSelectivity

public Double getSelectivity()
Description copied from interface: RelStatColumnStatistics
Estimates the fraction of a relational expression's rows that satisfy a given condition. This corresponds to the metadata query RelMetadataQuery.getSelectivity(org.eigenbase.rel.RelNode, org.eigenbase.rex.RexNode).

Specified by:
getSelectivity in interface RelStatColumnStatistics
Returns:
an estimated fraction from 0.0 to 1.0, or null if no reliable estimate can be determined
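Because getSelectivity may return null, callers have to decide what to do when no estimate is available. The following caller-side sketch is hypothetical: RowEstimate, estimateRows and the fallback value ASSUMED_DEFAULT_SELECTIVITY are illustrative assumptions, not Farrago or Eigenbase code. It falls back to a default when the result is null and clamps the value to the documented range before scaling an input row count.

```java
// Hypothetical caller (not Farrago code): turns a nullable selectivity into a
// row-count estimate. The fallback value is an assumption for illustration.
final class RowEstimate {
    static final double ASSUMED_DEFAULT_SELECTIVITY = 0.25;

    private RowEstimate() {}

    static double estimateRows(double inputRows, Double selectivity) {
        // null means "no reliable estimate", so fall back to the default.
        double s = (selectivity == null)
            ? ASSUMED_DEFAULT_SELECTIVITY
            : selectivity.doubleValue();
        // Clamp defensively to the documented 0.0..1.0 range.
        s = Math.max(0.0, Math.min(1.0, s));
        return inputRows * s;
    }
}
```

For example, estimateRows(1000.0, null) yields 250.0 under the assumed default, while a reported selectivity of 0.1 yields about 100 rows.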

getCardinality

public Double getCardinality()
Description copied from interface: RelStatColumnStatistics
Estimates the number of distinct values returned from a relational expression that satisfy a given condition.

Specified by:
getCardinality in interface RelStatColumnStatistics
Returns:
an estimate of the number of distinct values that satisfy the predicate, or null if no reliable estimate can be determined

evaluate

protected void evaluate()
Analyzes column histogram to determine the selectivity and cardinality of the specified search condition.


getCoverage

private List<FarragoColumnHistogram.HistogramBarCoverage> getCoverage(SargIntervalSequence sequence)
Computes the histogram bar coverage of an ordered sequence of intervals. Coverage can only be computed if both endpoints of every interval in the sequence are either literal or infinite.

Parameters:
sequence - sequence to lookup
Returns:
a list of HistogramBarCoverage instances, or null if coverage cannot be computed
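The endpoint rule and the bar lookup can be sketched together. This is a simplified, hypothetical model, not Farrago's Sarg classes. CoverageSketch, Kind, Endpoint, barOf and the int[] {firstBar, lastBar} result shape are assumptions. Bar starts are plain sorted longs, and all bounds are treated as inclusive. The sketch returns null as soon as an endpoint is neither infinite nor literal, which matches the documented contract. Otherwise it maps each interval to the consecutive run of bars it touches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Farrago's Sarg classes): an endpoint is infinite,
// a literal, or some other expression such as a dynamic parameter.
final class CoverageSketch {
    enum Kind { INFINITE, LITERAL, OTHER }

    static final class Endpoint {
        final Kind kind;
        final long value; // meaningful only when kind == LITERAL
        Endpoint(Kind kind, long value) { this.kind = kind; this.value = value; }
    }

    private CoverageSketch() {}

    // Mirrors the documented rule: usable only if infinite or literal.
    static boolean checkEndpoint(Endpoint e) {
        return e.kind != Kind.OTHER;
    }

    // For each inclusive [lower, upper] interval, returns {firstBar, lastBar},
    // or null if any endpoint is neither infinite nor literal.
    static List<int[]> getCoverage(long[] barStarts, List<Endpoint[]> intervals) {
        List<int[]> result = new ArrayList<>();
        for (Endpoint[] iv : intervals) {
            Endpoint lo = iv[0], hi = iv[1];
            if (!checkEndpoint(lo) || !checkEndpoint(hi)) {
                return null; // coverage cannot be computed
            }
            int first = (lo.kind == Kind.INFINITE)
                ? 0 : Math.max(barOf(barStarts, lo.value), 0);
            int last = (hi.kind == Kind.INFINITE)
                ? barStarts.length - 1 : barOf(barStarts, hi.value);
            if (last >= first) {
                result.add(new int[] {first, last}); // a consecutive run of bars
            }
        }
        return result;
    }

    // Index of the last bar whose start is <= v (-1 if v precedes all bars).
    private static int barOf(long[] barStarts, long v) {
        int idx = -1;
        for (int i = 0; i < barStarts.length && barStarts[i] <= v; i++) {
            idx = i;
        }
        return idx;
    }
}
```

With bar starts 0, 10, 20, 30, the interval [15, 25] maps to bars 1 through 2, and [25, +infinity) maps to bars 2 through 3. A non-literal endpoint makes the whole result null.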

checkEndpoint

private boolean checkEndpoint(SargEndpoint endpoint)
Checks whether the given SargEndpoint is infinite or bounded by a literal expression.

Parameters:
endpoint - the endpoint to evaluate
Returns:
true if the endpoint is infinite or bounded by a literal; false otherwise

readCoverages

private void readCoverages(List<FarragoColumnHistogram.HistogramBarCoverage> coverages)
Reads the collected bar coverages and makes final estimates for the requested attributes. This implementation looks at each bar separately, estimating how much of each bar is covered. It then accounts for that bar's contribution to the overall result.

Parameters:
coverages - list of coverage for each bar
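The per-bar accounting described above can be sketched under an explicit assumption: rows and distinct values are spread uniformly within each bar. CoverageReader and its array parameters are hypothetical and do not reproduce Farrago's actual bar fields or coverage representation. Here covered[i] is the fraction of bar i selected by the predicate. Each bar adds that fraction of its rows to the selectivity numerator and the same fraction of its distinct values to the cardinality.

```java
// Hypothetical sketch (not Farrago's implementation): combines per-bar
// coverage fractions into overall estimates, assuming rows and distinct
// values are spread uniformly within each bar.
final class CoverageReader {
    Double selectivity; // null until computed, or if the histogram is empty
    Double cardinality;

    // rows[i] and distinct[i] describe bar i; covered[i] is the fraction
    // (0.0..1.0) of that bar selected by the predicate.
    void readCoverages(long[] rows, long[] distinct, double[] covered) {
        double totalRows = 0.0;
        double selectedRows = 0.0;
        double selectedDistinct = 0.0;
        for (int i = 0; i < rows.length; i++) {
            totalRows += rows[i];
            // Each bar contributes in proportion to how much of it is covered.
            selectedRows += covered[i] * rows[i];
            selectedDistinct += covered[i] * distinct[i];
        }
        if (totalRows == 0.0) {
            selectivity = null; // no reliable estimate
            cardinality = null;
        } else {
            selectivity = selectedRows / totalRows;
            cardinality = selectedDistinct;
        }
    }
}
```

Consider bars of 100, 100, and 200 rows that hold 10, 20, and 40 distinct values. Covering all of the first bar and half of the second gives a selectivity of 150/400 = 0.375 and a cardinality of 20.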

compare

protected static int compare(String histValue,
                             RexLiteral coordinate)
Compares a value in a histogram with a Sarg coordinate

Parameters:
histValue - a histogram value
coordinate - a Sarg coordinate, or a null pointer to represent the null value. This method does not handle infinite coordinates.
Returns:
-1 if histValue is less than coordinate, 0 if histValue equals coordinate, or 1 if histValue is greater than coordinate
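The comparison contract, including the null-first ordering, can be sketched as follows. This is hypothetical and makes simplifying assumptions. HistogramCompare is an illustrative name, histogram values are assumed to be decimal integer strings, and the Sarg coordinate is modeled as a Long rather than a RexLiteral. Treating a null histValue as the SQL null is an extra assumption based on the note that bars may start with null. Integer.signum normalizes the result to exactly -1, 0, or 1.

```java
// Hypothetical sketch: assumes histogram values are decimal integer strings
// and models the Sarg coordinate as a Long, with null standing for SQL null.
final class HistogramCompare {
    private HistogramCompare() {}

    static int compare(String histValue, Long coordinate) {
        if (histValue == null || coordinate == null) {
            // Null sorts below every non-null value; two nulls are equal.
            if (histValue == null && coordinate == null) {
                return 0;
            }
            return (histValue == null) ? -1 : 1;
        }
        long value = Long.parseLong(histValue);
        // Normalize to exactly -1, 0, or 1 as the contract requires.
        return Integer.signum(Long.compare(value, coordinate));
    }
}
```

For instance, compare("5", 3L) returns 1, and compare("0", null) also returns 1 because the null coordinate sorts before every value.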