net.sf.farrago.ddl
Class DdlAnalyzeStmt

java.lang.Object
  extended by net.sf.farrago.ddl.DdlStmt
      extended by net.sf.farrago.ddl.DdlAnalyzeStmt
All Implemented Interfaces:
DdlMultipleTransactionStmt, FarragoSessionDdlStmt

public class DdlAnalyzeStmt
extends DdlStmt
implements DdlMultipleTransactionStmt

DdlAnalyzeStmt is a Farrago statement for computing the statistics of a relational expression and storing them in repository.

The following data are collected:

This implementation issues recursive SQL.

Version:
$Id: //open/dev/farrago/src/net/sf/farrago/ddl/DdlAnalyzeStmt.java#25 $
Author:
John Pham, Stephan Zuercher

Nested Class Summary
private  class DdlAnalyzeStmt.ColumnDetail
          ColumnDetail stores details about a column being analyzed.
private  class DdlAnalyzeStmt.ColumnHistogramBar
          Class used to store column histogram bar information
private  class DdlAnalyzeStmt.Histogram
          Class used to store column histogram information
private  class DdlAnalyzeStmt.IndexDetail
          IndexDetails stores details about an index being analyzed.
 
Field Summary
private  List<DdlAnalyzeStmt.ColumnDetail> columnDetails
           
private  List<CwmColumn> columnList
           
private  Map<FemAbstractColumn,DdlAnalyzeStmt.ColumnDetail> columnMap
           
private  boolean computeRowCount
           
private static int DEFAULT_HISTOGRAM_BAR_COUNT
           
private  boolean estimate
           
private  FemAbstractColumnSet femTable
           
private  long femTableDeletedRowCount
           
private  Long femTableRowCount
           
private  LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms
           
private  List<DdlAnalyzeStmt.IndexDetail> indexDetails
           
private static int MAX_HISTOGRAM_BAR_COUNT
           
private static long MIN_SAMPLE_SIZE
           
static String REPEATABLE_SEED
           
private  FarragoRepos repos
           
private  SqlNumericLiteral samplePercent
           
private  Integer sampleRepeatableSeed
           
private  BitSet singleUniqueCols
          BitSet of column ordinal values that are part of a unique or primary key constraint, but only for those constraints that contain a single column.
private  BitSet singleUniqueColsNullable
          BitSet of column ordinal values for unique/primary key constrained columns where the column is nullable.
private  long statsRowCount
           
private  FarragoSessionStmtContext stmtContext
           
private  CwmTable table
           
private  SqlIdentifier tableName
           
(package private)  EigenbaseTimingTracer timingTracer
           
private  SqlPrettyWriter writer
           
 
Constructor Summary
DdlAnalyzeStmt(CwmTable table)
           
 
Method Summary
private  List<DdlAnalyzeStmt.ColumnHistogramBar> buildBars(ResultSet resultSet, long rowsPerBar, FarragoCardinalityEstimator estimator)
          Given a result set from a column distribution query, compute the bars in the Histogram.
private  DdlAnalyzeStmt.Histogram buildEstimatedHistogram(DdlAnalyzeStmt.ColumnDetail column, long tableRowCount, ResultSet resultSet, boolean isUnique, boolean isUniqueNullable)
          Iterate over the sample given in the result set and generate a histogram for the column.
private  void buildFemBars(DdlAnalyzeStmt.Histogram histogram, List<FemColumnHistogramBar> femBars)
          Convert the DdlAnalyzeStmt.ColumnHistogramBar instances in the given DdlAnalyzeStmt.Histogram into FemColumnHistogramBar instances.
private  DdlAnalyzeStmt.Histogram buildHistogram(DdlAnalyzeStmt.ColumnDetail column, long tableRowCount, ResultSet resultSet)
          Iterate over the given result set and generate a histogram for the column.
private  List<FemAbstractColumn> checkCatalogTypes()
          Verifies that statistics can be estimated for the given table.
private  void checkColumnDistributionQuery()
          Validate that the RelDataType of the current stmtContext matches what's expected from a column distribution query.
private  void checkRowCountQuery()
          Validate that the RelDataType of the current stmtContext matches a table row count query.
private  void chooseSamplePercentage(long rowCount)
          Choose an appropriate sampling percentage for a table with the given row count.
 void completeAfterExecuteUnlocked(FarragoSessionDdlValidator ddlValidator, FarragoSession session, boolean success)
          Provides access to the repository after execution of the DDL.
 boolean completeRequiresWriteTxn()
          Checks whether the DdlMultipleTransactionStmt.completeAfterExecuteUnlocked( FarragoSessionDdlValidator, FarragoSession, boolean) method requires a repository write transaction.
private  void computeColumnStats(Map<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms, DdlAnalyzeStmt.ColumnDetail column, long tableRowCount)
          Build a complete histogram and calculate cardinality for the given column by querying it.
private  long computeRowCount()
          Executes a query against the table to compute the row count.
private  long computeRowsLastHistogramBar(long rowCount, long rowsPerBar)
          Compute the number of rows in the last histogram bar based on the DEFAULT_HISTOGRAM_BAR_COUNT.
private  long computeRowsPerHistogramBar(long rowCount)
          Compute the number of rows per histogram bar based on the DEFAULT_HISTOGRAM_BAR_COUNT.
private  void estimateColumnStats(Map<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms, DdlAnalyzeStmt.ColumnDetail column, long tableRowCount, boolean isUnique, boolean isUniqueNullable)
          Build sampled histogram and estimate cardinality for the given column by querying it.
private  void estimateEmptyTableStats(List<DdlAnalyzeStmt.ColumnDetail> columnDetails, LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
          Iterate over the given FemAbstractColumn instances and generate histograms for an empty table.
private  void estimateStats(List<DdlAnalyzeStmt.ColumnDetail> columnDetails, long rowCount, LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
          Iterate over the given FemAbstractColumn instances and compute histograms and cardinality from sampled data.
private  void executeAnalyzeIndexes(FarragoSessionDdlValidator ddlValidator, long rowCount, long deletedRowCount, LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
          Analyzes the table's indexes.
 void executeUnlocked(FarragoSessionDdlValidator ddlValidator, FarragoSession session)
          Executes long-running DDL actions.
private  String getColumnDistributionQuery(SqlIdentifier columnName)
          Generate a query to generate a columns distribution.
private  long getRowCount()
          Retrieve the table's row count.
private  String getRowCountQuery()
          Generate a query to count a table's rows.
private  boolean personalityManagesRowCount(FarragoSessionDdlValidator ddlValidator)
          Returns true if the personality maintains an accurate row count and deleted row count in FemAbstractColumnSet.
private  void prepareIndexDetails()
          Prepares for analyzing table indexes.
 void prepForExecuteUnlocked(FarragoSessionDdlValidator ddlValidator, FarragoSession session)
          Provides access to the repository in preparation for the execution of DdlStmt.
 void setColumns(List<CwmColumn> columnList)
          Sets the list of columns to be analyzed
 void setEstimateOption(boolean estimate)
           
 void setSamplePercent(SqlNumericLiteral percent)
           
private  void setSampleRepeatableSeed(FarragoSessionDdlValidator ddlValidator)
          Looks up the current session variables and sets sampleRepeatableSeed to the repeatable seed value that may be stored there for testing purposes.
private  void updateStats(FarragoRepos repos, long rowCount, boolean updateRowCount, Collection<DdlAnalyzeStmt.Histogram> histograms, List<DdlAnalyzeStmt.IndexDetail> indexDetails)
          Updates catalog records with new statistical data, all within a single MDR write txn.
 void visit(DdlVisitor visitor)
          Invokes a visitor on this statement.
 
Methods inherited from class net.sf.farrago.ddl.DdlStmt
getModelElement, isDropRestricted, postCommit, postExecute, preExecute, preValidate, requiresCommit, runsAsDml
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface net.sf.farrago.session.FarragoSessionDdlStmt
getModelElement, isDropRestricted, postCommit, postExecute, preExecute, preValidate, requiresCommit, runsAsDml
 

Field Detail

DEFAULT_HISTOGRAM_BAR_COUNT

private static final int DEFAULT_HISTOGRAM_BAR_COUNT
See Also:
Constant Field Values

MAX_HISTOGRAM_BAR_COUNT

private static final int MAX_HISTOGRAM_BAR_COUNT
See Also:
Constant Field Values

MIN_SAMPLE_SIZE

private static final long MIN_SAMPLE_SIZE
See Also:
Constant Field Values

REPEATABLE_SEED

public static final String REPEATABLE_SEED
See Also:
Constant Field Values

timingTracer

EigenbaseTimingTracer timingTracer

table

private CwmTable table

columnList

private List<CwmColumn> columnList

estimate

private boolean estimate

computeRowCount

private boolean computeRowCount

samplePercent

private SqlNumericLiteral samplePercent

sampleRepeatableSeed

private Integer sampleRepeatableSeed

femTable

private FemAbstractColumnSet femTable

femTableRowCount

private Long femTableRowCount

femTableDeletedRowCount

private long femTableDeletedRowCount

stmtContext

private FarragoSessionStmtContext stmtContext

writer

private SqlPrettyWriter writer

tableName

private SqlIdentifier tableName

columnDetails

private List<DdlAnalyzeStmt.ColumnDetail> columnDetails

columnMap

private Map<FemAbstractColumn,DdlAnalyzeStmt.ColumnDetail> columnMap

indexDetails

private List<DdlAnalyzeStmt.IndexDetail> indexDetails

repos

private FarragoRepos repos

statsRowCount

private long statsRowCount

histograms

private LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms

singleUniqueCols

private BitSet singleUniqueCols
BitSet of column ordinal values that are part of a unique or primary key constraint, but only for those constraints that contain a single column. Implies cardinality = row count. Used for estimation only.


singleUniqueColsNullable

private BitSet singleUniqueColsNullable
BitSet of column ordinal values for unique/primary key constrained columns where the column is nullable. Implies cardinality = (row count - number of null values). Used for estimation only.

Constructor Detail

DdlAnalyzeStmt

public DdlAnalyzeStmt(CwmTable table)
Method Detail

setColumns

public void setColumns(List<CwmColumn> columnList)
Sets the list of columns to be analyzed

Parameters:
columnList - list of CwmColumn repository objects

setEstimateOption

public void setEstimateOption(boolean estimate)

setSamplePercent

public void setSamplePercent(SqlNumericLiteral percent)

visit

public void visit(DdlVisitor visitor)
Description copied from class: DdlStmt
Invokes a visitor on this statement.

Specified by:
visit in class DdlStmt
Parameters:
visitor - DdlVisitor to invoke

prepForExecuteUnlocked

public void prepForExecuteUnlocked(FarragoSessionDdlValidator ddlValidator,
                                   FarragoSession session)
Description copied from interface: DdlMultipleTransactionStmt
Provides access to the repository in preparation for the execution of DdlStmt. This method is invoked within the context the original repository transaction for the DDL statement. Whether that transaction is a write transaction depends on the DDL statement being executed.

Specified by:
prepForExecuteUnlocked in interface DdlMultipleTransactionStmt
Parameters:
ddlValidator - DDL validator for this statement
See Also:
FarragoReposTxnContext

executeUnlocked

public void executeUnlocked(FarragoSessionDdlValidator ddlValidator,
                            FarragoSession session)
Description copied from interface: DdlMultipleTransactionStmt
Executes long-running DDL actions. This method is invoked outside the context of any repository transaction.

Specified by:
executeUnlocked in interface DdlMultipleTransactionStmt
Parameters:
ddlValidator - DDL validator for this statement
session - reentrant Farrago session which may be used to execute DML statements

completeRequiresWriteTxn

public boolean completeRequiresWriteTxn()
Description copied from interface: DdlMultipleTransactionStmt
Checks whether the DdlMultipleTransactionStmt.completeAfterExecuteUnlocked( FarragoSessionDdlValidator, FarragoSession, boolean) method requires a repository write transaction.

Specified by:
completeRequiresWriteTxn in interface DdlMultipleTransactionStmt
Returns:
true if a write txn must be started before executing the completion step, false if a read txn is sufficient

completeAfterExecuteUnlocked

public void completeAfterExecuteUnlocked(FarragoSessionDdlValidator ddlValidator,
                                         FarragoSession session,
                                         boolean success)
Description copied from interface: DdlMultipleTransactionStmt
Provides access to the repository after execution of the DDL. Typically implementations of this method modify the repository to store the results of DdlMultipleTransactionStmt.executeUnlocked(FarragoSessionDdlValidator, FarragoSession). This method is invoked in a locked repository transaction. The method DdlMultipleTransactionStmt.completeRequiresWriteTxn() controls whether the transaction read-only or not. This method may not access and/or modify repository objects loaded in a previous transaction unless they are reloaded by MOF ID. Be aware that objects may have been modified by another session unless some external mechanism (for instance, the "table-in-use" collection) guarantees that they have not been modified by another statement.

Note that any repository modifications made during the execution of this method will not be post-processed by DdlValidator. For instance, DdlValidator.checkJmiConstraints(RefObject) is not called, and therefore any mandatory default primitives are not automatically set, which will cause errors later if the attributes have not been explicitly initialized. See JmiObjUtil.setMandatoryPrimitiveDefaults(javax.jmi.reflect.RefObject).

Specified by:
completeAfterExecuteUnlocked in interface DdlMultipleTransactionStmt
Parameters:
ddlValidator - DDL validator for this statement
session - reentrant Farrago session which may be used to execute DML statements
success - whether the execution succeeded; detection of failure can be used to recover

checkCatalogTypes

private List<FemAbstractColumn> checkCatalogTypes()
Verifies that statistics can be estimated for the given table.

Returns:
list of FemAbstractColumn instances if the table can be analyzed
Throws:
RuntimeException - if analyze is not support for the table

personalityManagesRowCount

private boolean personalityManagesRowCount(FarragoSessionDdlValidator ddlValidator)
Returns true if the personality maintains an accurate row count and deleted row count in FemAbstractColumnSet.


getRowCount

private long getRowCount()
                  throws SQLException
Retrieve the table's row count. Queries the table if computeRowCount is true, otherwise uses the value stored in femTable.

Returns:
table's row count
Throws:
SQLException - if there's an error querying the table

computeRowCount

private long computeRowCount()
                      throws SQLException
Executes a query against the table to compute the row count.

Returns:
table's row count
Throws:
SQLException - if there's an error querying the table

getRowCountQuery

private String getRowCountQuery()
Generate a query to count a table's rows.


checkRowCountQuery

private void checkRowCountQuery()
Validate that the RelDataType of the current stmtContext matches a table row count query.


setSampleRepeatableSeed

private void setSampleRepeatableSeed(FarragoSessionDdlValidator ddlValidator)
Looks up the current session variables and sets sampleRepeatableSeed to the repeatable seed value that may be stored there for testing purposes.


chooseSamplePercentage

private void chooseSamplePercentage(long rowCount)
Choose an appropriate sampling percentage for a table with the given row count. Sets samplePercent.


estimateStats

private void estimateStats(List<DdlAnalyzeStmt.ColumnDetail> columnDetails,
                           long rowCount,
                           LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
                    throws SQLException
Iterate over the given FemAbstractColumn instances and compute histograms and cardinality from sampled data. Passes information about a column gleaned from its constraints to the single-column estimation method.

Parameters:
columnDetails - collection of columns to analyze
rowCount - row count of the table
histograms - a map of columns to histograms with predictable iteration order
Throws:
SQLException - if a sampling query fails

estimateColumnStats

private void estimateColumnStats(Map<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms,
                                 DdlAnalyzeStmt.ColumnDetail column,
                                 long tableRowCount,
                                 boolean isUnique,
                                 boolean isUniqueNullable)
                          throws SQLException
Build sampled histogram and estimate cardinality for the given column by querying it.

Parameters:
histograms - map in which to store generated Histogram
column - the column to generate a histogram for
tableRowCount - number of rows in the table
isUnique - if true, the column has a uniqueness constraint that applies to it only (e.g. single-column primary key or single-column unique constraint)
isUniqueNullable - if true, the column allows nulls (ignored if isUnique is false)
Throws:
SQLException - if there's an error executing the sampling query

buildEstimatedHistogram

private DdlAnalyzeStmt.Histogram buildEstimatedHistogram(DdlAnalyzeStmt.ColumnDetail column,
                                                         long tableRowCount,
                                                         ResultSet resultSet,
                                                         boolean isUnique,
                                                         boolean isUniqueNullable)
                                                  throws SQLException
Iterate over the sample given in the result set and generate a histogram for the column.

Parameters:
column - the column to generate a histogram for
tableRowCount - number of rows in the table
resultSet - containing column samples aggregated by value
isUnique - if true, the column has a uniqueness constraint that applies to it only (e.g. single-column primary key or single-column unique constraint)
isUniqueNullable - if true, the column allows nulls (ignored if isUnique is false)
Returns:
the column's sampled Histogram
Throws:
SQLException - if there's an error reading the sample

estimateEmptyTableStats

private void estimateEmptyTableStats(List<DdlAnalyzeStmt.ColumnDetail> columnDetails,
                                     LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
Iterate over the given FemAbstractColumn instances and generate histograms for an empty table.

Parameters:
columnDetails - a list of columns
histograms - a map of columns to histograms with predictable iteration order

computeColumnStats

private void computeColumnStats(Map<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms,
                                DdlAnalyzeStmt.ColumnDetail column,
                                long tableRowCount)
                         throws SQLException
Build a complete histogram and calculate cardinality for the given column by querying it.

Parameters:
histograms - map in which to store generated Histogram
column - the column to generate a histogram for
tableRowCount - number of rows in the table
Throws:
SQLException - if there's an error executing the query

buildHistogram

private DdlAnalyzeStmt.Histogram buildHistogram(DdlAnalyzeStmt.ColumnDetail column,
                                                long tableRowCount,
                                                ResultSet resultSet)
                                         throws SQLException
Iterate over the given result set and generate a histogram for the column.

Parameters:
column - the column to generate a histogram for
tableRowCount - number of rows in the table
resultSet - containing column data aggregated by value
Returns:
the column's Histogram
Throws:
SQLException - if there's an error reading the sample

getColumnDistributionQuery

private String getColumnDistributionQuery(SqlIdentifier columnName)
Generate a query to generate a columns distribution. If the estimate flag is set, the query uses the TABLESAMPLE keyword to sample the column's data.


checkColumnDistributionQuery

private void checkColumnDistributionQuery()
Validate that the RelDataType of the current stmtContext matches what's expected from a column distribution query.


computeRowsPerHistogramBar

private long computeRowsPerHistogramBar(long rowCount)
Compute the number of rows per histogram bar based on the DEFAULT_HISTOGRAM_BAR_COUNT.

Parameters:
rowCount - number of rows from the table that will be represented by the histogram

computeRowsLastHistogramBar

private long computeRowsLastHistogramBar(long rowCount,
                                         long rowsPerBar)
Compute the number of rows in the last histogram bar based on the DEFAULT_HISTOGRAM_BAR_COUNT.

Parameters:
rowCount - number of rows from the table that will be represented by the histogram
rowsPerBar - the result of computeRowsPerHistogramBar(long)

buildBars

private List<DdlAnalyzeStmt.ColumnHistogramBar> buildBars(ResultSet resultSet,
                                                          long rowsPerBar,
                                                          FarragoCardinalityEstimator estimator)
                                                   throws SQLException
Given a result set from a column distribution query, compute the bars in the Histogram.

Parameters:
resultSet - column distribution query result set
rowsPerBar - the number of rows per bar
estimator - an estimator to populate with data from the result set; may be null
Returns:
a list of DdlAnalyzeStmt.ColumnHistogramBar instances
Throws:
SQLException - if there's an error reading the result set

buildFemBars

private void buildFemBars(DdlAnalyzeStmt.Histogram histogram,
                          List<FemColumnHistogramBar> femBars)
Convert the DdlAnalyzeStmt.ColumnHistogramBar instances in the given DdlAnalyzeStmt.Histogram into FemColumnHistogramBar instances.


prepareIndexDetails

private void prepareIndexDetails()
Prepares for analyzing table indexes. Must be called within a repository transaction.


executeAnalyzeIndexes

private void executeAnalyzeIndexes(FarragoSessionDdlValidator ddlValidator,
                                   long rowCount,
                                   long deletedRowCount,
                                   LinkedHashMap<DdlAnalyzeStmt.ColumnDetail,DdlAnalyzeStmt.Histogram> histograms)
Analyzes the table's indexes. If estimated statistics are in use, the index statistics are normally estimated as well. An exception is made when the index can provide a more accurate cardinality for a given column than the estimation algorithm.

Parameters:
ddlValidator - used to execute index statistics gathering
rowCount - table's row count
deletedRowCount - number of deleted rows in the table store (usually zero outside LucidDb)
histograms - previously gathered column histograms

updateStats

private void updateStats(FarragoRepos repos,
                         long rowCount,
                         boolean updateRowCount,
                         Collection<DdlAnalyzeStmt.Histogram> histograms,
                         List<DdlAnalyzeStmt.IndexDetail> indexDetails)
Updates catalog records with new statistical data, all within a single MDR write txn.

Parameters:
repos - repository
rowCount - table rowcount
updateRowCount - if true, update the catalog row count
histograms - column histograms
indexDetails - index details, including index statistics