com.lucidera.opt
Class LoptSemiJoinOptimizer

java.lang.Object
  extended by com.lucidera.opt.LoptSemiJoinOptimizer

public class LoptSemiJoinOptimizer
extends Object

LoptSemiJoinOptimizer implements the logic for determining the optimal semijoins to be used in processing joins in a query.

Version:
$Id: //open/dev/farrago/src/com/lucidera/opt/LoptSemiJoinOptimizer.java#20 $
Author:
Zelaine Fong

Nested Class Summary
private  class LoptSemiJoinOptimizer.FactorCostComparator
           
 
Field Summary
private  RelNode[] chosenSemiJoins
          Semijoins corresponding to each join factor, if they are going to be filtered by semijoins.
private  Comparator<Integer> factorCostComparator
           
private  Map<Integer,Map<Integer,SemiJoinRel>> possibleSemiJoins
          Associates potential semijoins with each fact table factor.
private  RexBuilder rexBuilder
          RexBuilder for constructing new RexNodes
private static int thresholdScore
           
 
Constructor Summary
LoptSemiJoinOptimizer(LoptMultiJoin multiJoin, RexBuilder rexBuilder)
           
 
Method Summary
private  RexNode adjustSemiJoinCondition(LoptMultiJoin multiJoin, int leftAdjustment, RexNode semiJoinCondition, int leftIdx, int rightIdx)
          Modifies the semijoin condition to reflect the fact that the RHS is now the second factor into a join and the LHS is the first
 boolean chooseBestSemiJoin(LoptMultiJoin multiJoin)
          Finds the optimal semijoin for filtering the least costly fact table from among the remaining possible semijoins to choose from.
private  double computeScore(RelNode factRel, RelNode dimRel, SemiJoinRel semiJoin)
          Computes a score relevant to applying a set of semijoins on a fact table.
private  SemiJoinRel findSemiJoinIndexByCost(LoptMultiJoin multiJoin, List<RexNode> joinFilters, int factIdx, int dimIdx)
          Given a list of possible filters on a fact table, determine if there is an index that can be used, provided all the fact table keys originate from the same underlying table.
 RelNode getChosenSemiJoin(int factIdx)
           
private  int isSuitableFilter(LoptMultiJoin multiJoin, RexNode joinFilter, int factIdx)
          Determines if a join filter can be used with a semijoin against a specified fact table.
 void makePossibleSemiJoins(LoptMultiJoin multiJoin)
          Determines all possible semijoins that can be used by dimension tables to filter fact tables.
private  RexNode removeExtraFilters(List<Integer> keys, int nFields, RexNode condition)
          Removes from an expression any sub-expressions that reference key values that aren't contained in a key list passed in.
private  void removeJoin(LoptMultiJoin multiJoin, SemiJoinRel semiJoin, int factIdx, int dimIdx)
          Determines whether a join of the dimension table in a semijoin can be removed.
private  void removePossibleSemiJoin(Map<Integer,SemiJoinRel> possibleDimensions, Integer factIdx, Integer dimIdx)
          Removes a dimension table from a fact table's list of possible semijoins
private  LcsTable validateKeys(RelNode factRel, List<Integer> leftKeys, List<Integer> rightKeys, List<Integer> actualLeftKeys)
          Validates the candidate semijoin keys corresponding to the fact table.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

thresholdScore

private static int thresholdScore

rexBuilder

private RexBuilder rexBuilder
RexBuilder for constructing new RexNodes


chosenSemiJoins

private RelNode[] chosenSemiJoins
Semijoins corresponding to each join factor, if they are going to be filtered by semijoins. Otherwise, the entry is the original join factor.


possibleSemiJoins

private Map<Integer,Map<Integer,SemiJoinRel>> possibleSemiJoins
Associates potential semijoins with each fact table factor. The first parameter in the map corresponds to the fact table. The second corresponds to the dimension table and a SemiJoinRel that captures all the necessary semijoin data between that fact and dimension table


factorCostComparator

private final Comparator<Integer> factorCostComparator
Constructor Detail

LoptSemiJoinOptimizer

public LoptSemiJoinOptimizer(LoptMultiJoin multiJoin,
                             RexBuilder rexBuilder)
Method Detail

makePossibleSemiJoins

public void makePossibleSemiJoins(LoptMultiJoin multiJoin)
Determines all possible semijoins that can be used by dimension tables to filter fact tables. Constructs SemiJoinRels corresponding to potential dimension table filters and stores them in the member field "possibleSemiJoins"

Parameters:
multiJoin - join factors being optimized

isSuitableFilter

private int isSuitableFilter(LoptMultiJoin multiJoin,
                             RexNode joinFilter,
                             int factIdx)
Determines if a join filter can be used with a semijoin against a specified fact table. A suitable filter is of the form "factable.col1 = dimTable.col2".

Parameters:
multiJoin - join factors being optimized
joinFilter - filter to be analyzed
factIdx - index corresponding to the fact table
Returns:
index of corresponding dimension table if the filter is appropriate; otherwise -1 is returned

findSemiJoinIndexByCost

private SemiJoinRel findSemiJoinIndexByCost(LoptMultiJoin multiJoin,
                                            List<RexNode> joinFilters,
                                            int factIdx,
                                            int dimIdx)
Given a list of possible filters on a fact table, determine if there is an index that can be used, provided all the fact table keys originate from the same underlying table.

Parameters:
multiJoin - join factors being optimized
joinFilters - filters to be used on the fact table
factIdx - index in join factors corresponding to the fact table
dimIdx - index in join factors corresponding to the dimension table
Returns:
SemiJoinRel containing information regarding the semijoin that can be used to filter the fact table

adjustSemiJoinCondition

private RexNode adjustSemiJoinCondition(LoptMultiJoin multiJoin,
                                        int leftAdjustment,
                                        RexNode semiJoinCondition,
                                        int leftIdx,
                                        int rightIdx)
Modifies the semijoin condition to reflect the fact that the RHS is now the second factor into a join and the LHS is the first

Parameters:
multiJoin - join factors being optimized
leftAdjustment - amount the left RexInputRefs need to be adjusted by
semiJoinCondition - condition to be adjusted
leftIdx - index of the join factor corresponding to the LHS of the semijoin,
rightIdx - index of the join factor corresponding to the RHS of the semijoin
Returns:
modified semijoin condition

validateKeys

private LcsTable validateKeys(RelNode factRel,
                              List<Integer> leftKeys,
                              List<Integer> rightKeys,
                              List<Integer> actualLeftKeys)
Validates the candidate semijoin keys corresponding to the fact table. Ensure the keys all originate from the same underlying table, and they all correspond to simple column references. If unsuitable keys are found, they're removed from the key list and a new list corresponding to the remaining valid keys is returned.

Parameters:
factRel - fact table RelNode
leftKeys - fact table semijoin keys
rightKeys - dimension table semijoin keys
actualLeftKeys - the remaining valid fact table semijoin keys
Returns:
the underlying fact table if the semijoin keys are valid; otherwise null

removeExtraFilters

private RexNode removeExtraFilters(List<Integer> keys,
                                   int nFields,
                                   RexNode condition)
Removes from an expression any sub-expressions that reference key values that aren't contained in a key list passed in. The keys represent join keys on one side of a join. The subexpressions are all assumed to be of the form "tab1.col1 = tab2.col2".

Parameters:
keys - join keys from one side of the join
nFields - number of fields in the side of the join for which the keys correspond
condition - original expression
Returns:
modified expression with filters that don't reference specified keys removed

chooseBestSemiJoin

public boolean chooseBestSemiJoin(LoptMultiJoin multiJoin)
Finds the optimal semijoin for filtering the least costly fact table from among the remaining possible semijoins to choose from. The chosen semijoin is stored in the chosenSemiJoins array

Parameters:
multiJoin - join factors being optimized
Returns:
true if a suitable semijoin is found; false otherwise

computeScore

private double computeScore(RelNode factRel,
                            RelNode dimRel,
                            SemiJoinRel semiJoin)
Computes a score relevant to applying a set of semijoins on a fact table. The higher the score, the better.

Parameters:
factRel - fact table being filtered
dimRel - dimension table that participates in semijoin
semiJoin - semijoin between fact and dimension tables
Returns:
computed score of applying the dimension table filters on the fact table

removeJoin

private void removeJoin(LoptMultiJoin multiJoin,
                        SemiJoinRel semiJoin,
                        int factIdx,
                        int dimIdx)
Determines whether a join of the dimension table in a semijoin can be removed. It can be if the dimension keys are unique and the only fields referenced from the dimension table are its semijoin keys. The semijoin keys can be mapped to the corresponding keys from the fact table (because of the equality condition associated with the semijoin keys). Therefore, that's why the dimension table can be removed even though those fields are referenced elsewhere in the query tree.

Parameters:
multiJoin - join factors being optimized
semiJoin - semijoin under consideration
factIdx - id of the fact table in the semijoin
dimIdx - id of the dimension table in the semijoin

removePossibleSemiJoin

private void removePossibleSemiJoin(Map<Integer,SemiJoinRel> possibleDimensions,
                                    Integer factIdx,
                                    Integer dimIdx)
Removes a dimension table from a fact table's list of possible semijoins

Parameters:
possibleDimensions - possible dimension tables associated with the fact table
factIdx - index corresponding to fact table
dimIdx - index corresponding to dimension table

getChosenSemiJoin

public RelNode getChosenSemiJoin(int factIdx)
Parameters:
factIdx - index corresponding to the desired factor
Returns:
optimal semijoin for the specified factor; may be the factor itself if semijoins are not chosen for the factor