Class GridSearch

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, Summarizable

    public class GridSearch
    extends RandomizableSingleClassifierEnhancer
    implements AdditionalMeasureProducer, Summarizable
    Performs a grid search of parameter pairs for a classifier (Y-axis, default is LinearRegression with the "Ridge" parameter) and the PLSFilter (X-axis, "# of Components") and chooses the best pair found for the actual prediction.

    The initial grid is evaluated with 2-fold CV to determine the values of the parameter pairs for the selected type of evaluation (e.g., accuracy). The best point in the grid is then taken and a 10-fold CV is performed with the adjacent parameter pairs. If a better pair is found, it becomes the new center and another 10-fold CV is performed (a kind of hill climbing). This process is repeated until no better pair is found or the best pair is on the border of the grid.
    In case the best pair is on the border, one can let GridSearch automatically extend the grid and continue the search. Check out the properties 'gridIsExtendable' (option '-extend-grid') and 'maxGridExtensions' (option '-max-grid-extensions <num>').
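The hill-climbing phase described above can be sketched on a toy grid. This is a minimal illustration, not Weka's implementation: the class `HillClimbSketch`, its `Result` type and the `score` function (a stand-in for the 10-fold CV evaluation, higher is better) are hypothetical names, and treating "adjacent" as the eight surrounding cells is an assumption.

```java
import java.util.function.ToDoubleBiFunction;

/** Toy hill climbing over grid indices; a sketch, not Weka's implementation. */
final class HillClimbSketch {

    /** Outcome of the search: best grid cell and whether it lies on the border. */
    static final class Result {
        final int x;
        final int y;
        final boolean onBorder;

        Result(int x, int y, boolean onBorder) {
            this.x = x;
            this.y = y;
            this.onBorder = onBorder;
        }
    }

    /**
     * Climbs from (startX, startY) on a width x height grid of indices.
     * The score function stands in for a cross-validated evaluation.
     */
    static Result climb(int width, int height, int startX, int startY,
                        ToDoubleBiFunction<Integer, Integer> score) {
        int bestX = startX;
        int bestY = startY;
        double best = score.applyAsDouble(bestX, bestY);
        while (true) {
            boolean onBorder = bestX == 0 || bestY == 0
                    || bestX == width - 1 || bestY == height - 1;
            if (onBorder) {
                // Stop on the border; a grid extension would be considered here.
                return new Result(bestX, bestY, true);
            }
            int nextX = bestX;
            int nextY = bestY;
            double nextBest = best;
            // Assumption: "adjacent" means the eight surrounding cells.
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    if (dx == 0 && dy == 0) {
                        continue;
                    }
                    double s = score.applyAsDouble(bestX + dx, bestY + dy);
                    if (s > nextBest) {
                        nextBest = s;
                        nextX = bestX + dx;
                        nextY = bestY + dy;
                    }
                }
            }
            if (nextX == bestX && nextY == bestY) {
                // No neighbour is better: the current center is the result.
                return new Result(bestX, bestY, false);
            }
            bestX = nextX;
            bestY = nextY;
            best = nextBest;
        }
    }

    public static void main(String[] args) {
        // Toy score with a single optimum at grid cell (3, 2).
        Result r = climb(7, 5, 1, 1,
                (x, y) -> -1.0 * ((x - 3) * (x - 3) + (y - 2) * (y - 2)));
        System.out.println(r.x + "," + r.y + " onBorder=" + r.onBorder);
    }
}
```

When the result reports `onBorder == true`, a caller could enlarge the grid and climb again, which is roughly what the 'gridIsExtendable' property enables.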

    GridSearch can handle doubles, integers (values are just cast to int) and booleans (0 is false, otherwise true). float, char and long are supported as well.
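The type handling described above might look like the following sketch. `ValueCastSketch` and `convert` are hypothetical names; the int and boolean rules come from the text, while handling float and long by a plain cast is an assumption. char is left out because the text does not say how it is converted.

```java
/** Sketch of mapping a numeric grid value onto a property type; not Weka's code. */
final class ValueCastSketch {

    /** Converts a grid value to the given property type, or throws if unsupported. */
    static Object convert(double value, Class<?> type) {
        if (type == double.class || type == Double.class) {
            return value;
        }
        if (type == float.class || type == Float.class) {
            return (float) value; // assumption: plain narrowing cast
        }
        if (type == int.class || type == Integer.class) {
            return (int) value; // "values are just cast to int": truncates toward zero
        }
        if (type == long.class || type == Long.class) {
            return (long) value; // assumption: plain cast
        }
        if (type == boolean.class || type == Boolean.class) {
            return value != 0; // 0 is false, otherwise true
        }
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(convert(3.9, int.class));
        System.out.println(convert(0.0, boolean.class));
    }
}
```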

    The best filter/classifier setup can be accessed after the buildClassifier call via the getBestFilter/getBestClassifier methods.
    Note on the implementation: after the data has been passed through the filter, a default NumericCleaner filter is applied to the data in order to avoid numbers that are getting too small and might produce NaNs in other schemes.

    Valid options are:

     -E <CC|RMSE|RRSE|MAE|RAE|COMB|ACC|KAP>
      Determines the parameter used for evaluation:
      CC = Correlation coefficient
      RMSE = Root mean squared error
      RRSE = Root relative squared error
      MAE = Mean absolute error
      RAE = Relative absolute error
      COMB = Combined = (1-abs(CC)) + RRSE + RAE
      ACC = Accuracy
      KAP = Kappa
      (default: CC)
     -y-property <option>
      The Y option to test (without leading dash).
      (default: classifier.ridge)
     -y-min <num>
      The minimum for Y.
      (default: -10)
     -y-max <num>
      The maximum for Y.
      (default: +5)
     -y-step <num>
      The step size for Y.
      (default: 1)
     -y-base <num>
      The base for Y.
      (default: 10)
     -y-expression <expr>
      The expression for Y.
      Available parameters:
       BASE
       FROM
       TO
       STEP
       I - the current iteration value
       (from 'FROM' to 'TO' with stepsize 'STEP')
      (default: 'pow(BASE,I)')
     -filter <filter specification>
      The filter to use (on X axis). Full classname of filter to include, 
      followed by scheme options.
      (default: weka.filters.supervised.attribute.PLSFilter)
     -x-property <option>
      The X option to test (without leading dash).
      (default: filter.numComponents)
     -x-min <num>
      The minimum for X.
      (default: +5)
     -x-max <num>
      The maximum for X.
      (default: +20)
     -x-step <num>
      The step size for X.
      (default: 1)
     -x-base <num>
      The base for X.
      (default: 10)
     -x-expression <expr>
      The expression for the X value.
      Available parameters:
       BASE
       FROM
       TO
       STEP
       I - the current iteration value
       (from 'FROM' to 'TO' with stepsize 'STEP')
      (default: 'pow(BASE,I)')
     -extend-grid
      Whether the grid can be extended.
      (default: no)
     -max-grid-extensions <num>
      The maximum number of grid extensions (-1 is unlimited).
      (default: 3)
     -sample-size <num>
      The size (in percent) of the sample to search the initial grid with.
      (default: 100)
     -traversal <ROW-WISE|COLUMN-WISE>
      The type of traversal for the grid.
      (default: COLUMN-WISE)
     -log-file <filename>
      The log file to log the messages to.
      (default: none)
     -S <num>
      Random number seed.
      (default: 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.functions.LinearRegression)
     
     Options specific to classifier weka.classifiers.functions.LinearRegression:
     
     -D
      Produce debugging output.
      (default no debugging output)
     -S <number of selection method>
      Set the attribute selection method to use. 1 = None, 2 = Greedy.
      (default 0 = M5' method)
     -C
      Do not try to eliminate colinear attributes.
     
     -R <double>
      Set ridge parameter (default 1.0e-8).
     
     
     Options specific to filter weka.filters.supervised.attribute.PLSFilter ('-filter'):
     
     -D
      Turns on output of debugging information.
     -C <num>
      The number of components to compute.
      (default: 20)
     -U
      Updates the class attribute as well.
      (default: off)
     -M
      Turns replacing of missing values on.
      (default: off)
     -A <SIMPLS|PLS1>
      The algorithm to use.
      (default: PLS1)
     -P <none|center|standardize>
      The type of preprocessing that is applied to the data.
      (default: center)
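As a small worked illustration of the COMB setting of -E listed above, the formula can be written out directly. `CombinedMeasureSketch` is a hypothetical helper; the scale on which RRSE and RAE are passed in is an assumption and not taken from the source.

```java
/** Sketch of the COMB evaluation measure: (1-abs(CC)) + RRSE + RAE. */
final class CombinedMeasureSketch {

    /**
     * Combines the three statistics into one value.
     * By construction the value is 0 for a perfect fit (|CC| = 1, no error),
     * so smaller values indicate a better parameter pair.
     */
    static double combined(double cc, double rrse, double rae) {
        return (1.0 - Math.abs(cc)) + rrse + rae;
    }

    public static void main(String[] args) {
        // A strongly negative correlation counts the same as a strongly positive one.
        System.out.println(combined(-0.5, 0.25, 0.25));
    }
}
```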
    Examples:
    • Optimizing SMO with RBFKernel (C and gamma)
      • Set the evaluation to Accuracy.
      • Set the filter to weka.filters.AllFilter, since no special data processing is needed and the filter is not optimized in this case (the data always gets passed through the filter!).
      • Set weka.classifiers.functions.SMO as classifier with weka.classifiers.functions.supportVector.RBFKernel as kernel.
      • Set the XProperty to "classifier.c", XMin to "1", XMax to "16", XStep to "1" and the XExpression to "I". This will test the "C" parameter of SMO for the values from 1 to 16.
      • Set the YProperty to "classifier.kernel.gamma", YMin to "-5", YMax to "2", YStep to "1", YBase to "10" and YExpression to "pow(BASE,I)". This will test the gamma of the RBFKernel with the values 10^-5, 10^-4,..,10^2.
    • Optimizing PLSFilter with LinearRegression (# of components and ridge) - default setup
      • Set the evaluation to Correlation coefficient.
      • Set the filter to weka.filters.supervised.attribute.PLSFilter.
      • Set weka.classifiers.functions.LinearRegression as classifier and use no attribute selection and no elimination of colinear attributes.
      • Set the XProperty to "filter.numComponents", XMin to "5", XMax to "20" (this depends heavily on your dataset, should be no more than the number of attributes!), XStep to "1" and XExpression to "I". This will test the number of components the PLSFilter will produce from 5 to 20.
      • Set the YProperty to "classifier.ridge", YMin to "-10", YMax to "5", YStep to "1" and YExpression to "pow(BASE,I)". This will try ridge parameters from 10^-10 to 10^5.
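How the min, max, step and expression settings in the examples above turn into concrete parameter values can be sketched as follows. `AxisValuesSketch` is a hypothetical helper, and a Java lambda stands in for the expression string, so Weka's own expression parser is not involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

/** Sketch: enumerate the values tested along one grid axis. */
final class AxisValuesSketch {

    /**
     * Applies the expression to every iteration value I from min to max
     * (inclusive) with the given step size.
     */
    static List<Double> values(double min, double max, double step,
                               DoubleUnaryOperator expression) {
        if (step <= 0 || max < min) {
            throw new IllegalArgumentException("need step > 0 and max >= min");
        }
        List<Double> result = new ArrayList<>();
        // Count the steps up front so rounding drift cannot drop the last value.
        int steps = (int) Math.floor((max - min) / step + 1e-9);
        for (int k = 0; k <= steps; k++) {
            double i = min + k * step;
            result.add(expression.applyAsDouble(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Expression "I" with min 1, max 16, step 1: the SMO "C" axis.
        System.out.println(values(1, 16, 1, i -> i));
        // Expression "pow(BASE,I)" with BASE 10, min -5, max 2: the gamma axis.
        System.out.println(values(-5, 2, 1, i -> Math.pow(10, i)));
    }
}
```

With these settings the SMO example evaluates a grid of 16 x 8 parameter pairs in the initial pass.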
    General notes:
    • Turn the debug flag on in order to see some progress output in the console
    • If you want to view the fitness landscape that GridSearch explores, select a log file. The log will then contain Gnuplot data and script blocks for viewing the landscape. Copy and paste those blocks into files named accordingly and run Gnuplot on them.
    Version:
    $Revision: 9733 $
    Author:
    Bernhard Pfahringer (bernhard at cs dot waikato dot ac dot nz), Geoff Holmes (geoff at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    PLSFilter, LinearRegression, NumericCleaner, Serialized Form
    • Field Detail

      • EVALUATION_CC

        public static final int EVALUATION_CC
        evaluation via: Correlation coefficient
        See Also:
        Constant Field Values
      • EVALUATION_RMSE

        public static final int EVALUATION_RMSE
        evaluation via: Root mean squared error
        See Also:
        Constant Field Values
      • EVALUATION_RRSE

        public static final int EVALUATION_RRSE
        evaluation via: Root relative squared error
        See Also:
        Constant Field Values
      • EVALUATION_MAE

        public static final int EVALUATION_MAE
        evaluation via: Mean absolute error
        See Also:
        Constant Field Values
      • EVALUATION_RAE

        public static final int EVALUATION_RAE
        evaluation via: Relative absolute error
        See Also:
        Constant Field Values
      • EVALUATION_COMBINED

        public static final int EVALUATION_COMBINED
        evaluation via: Combined = (1-abs(CC)) + RRSE + RAE
        See Also:
        Constant Field Values
      • EVALUATION_ACC

        public static final int EVALUATION_ACC
        evaluation via: Accuracy
        See Also:
        Constant Field Values
      • EVALUATION_KAPPA

        public static final int EVALUATION_KAPPA
        evaluation via: kappa statistic
        See Also:
        Constant Field Values
      • TAGS_EVALUATION

        public static final Tag[] TAGS_EVALUATION
        evaluation
      • TRAVERSAL_BY_ROW

        public static final int TRAVERSAL_BY_ROW
        row-wise grid traversal
        See Also:
        Constant Field Values
      • TRAVERSAL_BY_COLUMN

        public static final int TRAVERSAL_BY_COLUMN
        column-wise grid traversal
        See Also:
        Constant Field Values
      • TAGS_TRAVERSAL

        public static final Tag[] TAGS_TRAVERSAL
        traversal
      • PREFIX_CLASSIFIER

        public static final java.lang.String PREFIX_CLASSIFIER
        the prefix to indicate that the option is for the classifier
        See Also:
        Constant Field Values
      • PREFIX_FILTER

        public static final java.lang.String PREFIX_FILTER
        the prefix to indicate that the option is for the filter
        See Also:
        Constant Field Values
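The two prefix constants decide whether a property path such as "classifier.ridge" or "filter.numComponents" addresses the classifier or the filter. A minimal sketch of that dispatch follows; `PropertyPathSketch` is a hypothetical helper, and the prefix values "classifier." and "filter." are inferred from the default property names rather than read from the constants themselves.

```java
/** Sketch: split a property path into its target and the remaining path. */
final class PropertyPathSketch {

    // Assumed values, inferred from the defaults "classifier.ridge" and "filter.numComponents".
    static final String PREFIX_CLASSIFIER = "classifier.";
    static final String PREFIX_FILTER = "filter.";

    /** Returns a two-element array: {target, path below the target}. */
    static String[] split(String property) {
        if (property.startsWith(PREFIX_CLASSIFIER)) {
            return new String[] {"classifier", property.substring(PREFIX_CLASSIFIER.length())};
        }
        if (property.startsWith(PREFIX_FILTER)) {
            return new String[] {"filter", property.substring(PREFIX_FILTER.length())};
        }
        throw new IllegalArgumentException(
                "Property must start with '" + PREFIX_CLASSIFIER + "' or '" + PREFIX_FILTER + "': " + property);
    }

    public static void main(String[] args) {
        String[] parts = split("classifier.kernel.gamma");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```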
    • Constructor Detail

      • GridSearch

        public GridSearch()
        the default constructor
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses the options for this object.

        The valid options are the same as those listed in the class description above.
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableSingleClassifierEnhancer
        Parameters:
        options - the options to use
        Throws:
        java.lang.Exception - if setting of options fails
      • filterTipText

        public java.lang.String filterTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setFilter

        public void setFilter​(Filter value)
        Set the filter (only used for setup).
        Parameters:
        value - the filter.
      • getFilter

        public Filter getFilter()
        Get the filter.
        Returns:
        the filter
      • evaluationTipText

        public java.lang.String evaluationTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setEvaluation

        public void setEvaluation​(SelectedTag value)
        Sets the criterion to use for evaluating the classifier performance.
        Parameters:
        value - the evaluation criterion.
      • getEvaluation

        public SelectedTag getEvaluation()
        Gets the criterion used for evaluating the classifier performance.
        Returns:
        the current evaluation criterion.
      • YPropertyTipText

        public java.lang.String YPropertyTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYProperty

        public java.lang.String getYProperty()
        Get the Y property (normally the classifier).
        Returns:
        Value of the property.
      • setYProperty

        public void setYProperty​(java.lang.String value)
        Set the Y property (normally the classifier).
        Parameters:
        value - the Y property.
      • YMinTipText

        public java.lang.String YMinTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYMin

        public double getYMin()
        Get the value of the minimum of Y.
        Returns:
        Value of the minimum of Y.
      • setYMin

        public void setYMin​(double value)
        Set the value of the minimum of Y.
        Parameters:
        value - Value to use as minimum of Y.
      • YMaxTipText

        public java.lang.String YMaxTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYMax

        public double getYMax()
        Get the value of the maximum of Y.
        Returns:
        Value of the maximum of Y.
      • setYMax

        public void setYMax​(double value)
        Set the value of the maximum of Y.
        Parameters:
        value - Value to use as maximum of Y.
      • YStepTipText

        public java.lang.String YStepTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYStep

        public double getYStep()
        Get the value of the step size for Y.
        Returns:
        Value of the step size for Y.
      • setYStep

        public void setYStep​(double value)
        Set the value of the step size for Y.
        Parameters:
        value - Value to use as the step size for Y.
      • YBaseTipText

        public java.lang.String YBaseTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYBase

        public double getYBase()
        Get the value of the base for Y.
        Returns:
        Value of the base for Y.
      • setYBase

        public void setYBase​(double value)
        Set the value of the base for Y.
        Parameters:
        value - Value to use as the base for Y.
      • YExpressionTipText

        public java.lang.String YExpressionTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getYExpression

        public java.lang.String getYExpression()
        Get the expression for the Y value.
        Returns:
        Expression for the Y value.
      • setYExpression

        public void setYExpression​(java.lang.String value)
        Set the expression for the Y value.
        Parameters:
        value - Expression for the Y value.
      • XPropertyTipText

        public java.lang.String XPropertyTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXProperty

        public java.lang.String getXProperty()
        Get the X property to test (normally the filter).
        Returns:
        Value of the X property.
      • setXProperty

        public void setXProperty​(java.lang.String value)
        Set the X property to test (normally the filter).
        Parameters:
        value - the X property.
      • XMinTipText

        public java.lang.String XMinTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXMin

        public double getXMin()
        Get the value of the minimum of X.
        Returns:
        Value of the minimum of X.
      • setXMin

        public void setXMin​(double value)
        Set the value of the minimum of X.
        Parameters:
        value - Value to use as minimum of X.
      • XMaxTipText

        public java.lang.String XMaxTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXMax

        public double getXMax()
        Get the value of the maximum of X.
        Returns:
        Value of the maximum of X.
      • setXMax

        public void setXMax​(double value)
        Set the value of the maximum of X.
        Parameters:
        value - Value to use as maximum of X.
      • XStepTipText

        public java.lang.String XStepTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXStep

        public double getXStep()
        Get the value of the step size for X.
        Returns:
        Value of the step size for X.
      • setXStep

        public void setXStep​(double value)
        Set the value of the step size for X.
        Parameters:
        value - Value to use as the step size for X.
      • XBaseTipText

        public java.lang.String XBaseTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXBase

        public double getXBase()
        Get the value of the base for X.
        Returns:
        Value of the base for X.
      • setXBase

        public void setXBase​(double value)
        Set the value of the base for X.
        Parameters:
        value - Value to use as the base for X.
      • XExpressionTipText

        public java.lang.String XExpressionTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getXExpression

        public java.lang.String getXExpression()
        Get the expression for the X value.
        Returns:
        Expression for the X value.
      • setXExpression

        public void setXExpression​(java.lang.String value)
        Set the expression for the X value.
        Parameters:
        value - Expression for the X value.
      • gridIsExtendableTipText

        public java.lang.String gridIsExtendableTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getGridIsExtendable

        public boolean getGridIsExtendable()
        Get whether the grid can be extended dynamically.
        Returns:
        true if the grid can be extended.
      • setGridIsExtendable

        public void setGridIsExtendable​(boolean value)
        Set whether the grid can be extended dynamically.
        Parameters:
        value - whether the grid can be extended dynamically.
      • maxGridExtensionsTipText

        public java.lang.String maxGridExtensionsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMaxGridExtensions

        public int getMaxGridExtensions()
        Gets the maximum number of grid extensions, -1 for unlimited.
        Returns:
        the max number of grid extensions
      • setMaxGridExtensions

        public void setMaxGridExtensions​(int value)
        Sets the maximum number of grid extensions, -1 for unlimited.
        Parameters:
        value - the maximum number of grid extensions.
      • sampleSizePercentTipText

        public java.lang.String sampleSizePercentTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getSampleSizePercent

        public double getSampleSizePercent()
        Gets the sample size (in percent) for the initial grid search.
        Returns:
        the sample size in percent.
      • setSampleSizePercent

        public void setSampleSizePercent​(double value)
        Sets the sample size (in percent) for the initial grid search.
        Parameters:
        value - the sample size in percent for the initial grid search.
      • traversalTipText

        public java.lang.String traversalTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setTraversal

        public void setTraversal​(SelectedTag value)
        Sets the type of traversal for the grid.
        Parameters:
        value - the traversal type
      • getTraversal

        public SelectedTag getTraversal()
        Gets the type of traversal for the grid.
        Returns:
        the current traversal type.
      • logFileTipText

        public java.lang.String logFileTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getLogFile

        public java.io.File getLogFile()
        Gets current log file.
        Returns:
        the log file.
      • setLogFile

        public void setLogFile​(java.io.File value)
        Sets the log file to use.
        Parameters:
        value - the log file.
      • getBestFilter

        public Filter getBestFilter()
        returns the best filter setup
        Returns:
        the best filter setup
      • getBestClassifier

        public Classifier getBestClassifier()
        returns the best Classifier setup
        Returns:
        the best Classifier setup
      • enumerateMeasures

        public java.util.Enumeration enumerateMeasures()
        Returns an enumeration of the measure names.
        Specified by:
        enumerateMeasures in interface AdditionalMeasureProducer
        Returns:
        an enumeration of the measure names
      • getMeasure

        public double getMeasure​(java.lang.String measureName)
        Returns the value of the named measure
        Specified by:
        getMeasure in interface AdditionalMeasureProducer
        Parameters:
        measureName - the name of the measure to query for its value
        Returns:
        the value of the named measure
      • getValues

        public weka.classifiers.meta.GridSearch.PointDouble getValues()
        returns the parameter pair that was found to work best
        Returns:
        the best parameter combination
      • getGridExtensionsPerformed

        public int getGridExtensionsPerformed()
        returns the number of grid extensions that took place during the search (only applicable if the grid was extendable).
        Returns:
        the number of grid extensions that were performed
        See Also:
        getGridIsExtendable()
      • buildClassifier

        public void buildClassifier​(Instances data)
                             throws java.lang.Exception
        builds the classifier
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - the training instances
        Throws:
        java.lang.Exception - if something goes wrong
      • distributionForInstance

        public double[] distributionForInstance​(Instance instance)
                                         throws java.lang.Exception
        Computes the distribution for a given instance
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance for which distribution is computed
        Returns:
        the distribution
        Throws:
        java.lang.Exception - if the distribution can't be computed successfully
      • toString

        public java.lang.String toString()
        returns a string representation of the classifier
        Overrides:
        toString in class java.lang.Object
        Returns:
        a string representation of the classifier
      • toSummaryString

        public java.lang.String toSummaryString()
        Returns a string that summarizes the object.
        Specified by:
        toSummaryString in interface Summarizable
        Returns:
        the object summarized as a string
      • main

        public static void main​(java.lang.String[] args)
        Main method for running this classifier from commandline.
        Parameters:
        args - the options