Class RandomSubSpace

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

    public class RandomSubSpace
    extends RandomizableIteratedSingleClassifierEnhancer
    implements WeightedInstancesHandler, TechnicalInformationHandler
    This method constructs a decision-tree-based classifier that maintains the highest accuracy on training data and improves generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces.
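The subspace selection described above can be illustrated with a minimal sketch that does not depend on Weka. The class `SubspaceSketch` and the method `randomSubspaces` are hypothetical names introduced here for illustration. The sketch assumes that each ensemble member receives `k` distinct attribute indices drawn with a seeded generator, and that one base learner would then be trained per subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SubspaceSketch {

    /**
     * Draws one pseudorandom subset of k distinct attribute indices per
     * iteration, each taken from the range [0, numAttributes).
     * Hypothetical helper, not part of the Weka API.
     */
    public static int[][] randomSubspaces(int numAttributes, int k, int iterations, long seed) {
        if (k < 1 || k > numAttributes) {
            throw new IllegalArgumentException("k must be in [1, numAttributes]");
        }
        Random random = new Random(seed);
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            indices.add(i);
        }
        int[][] subspaces = new int[iterations][k];
        for (int it = 0; it < iterations; it++) {
            // Shuffle all indices and keep the first k as this member's subspace.
            Collections.shuffle(indices, random);
            for (int j = 0; j < k; j++) {
                subspaces[it][j] = indices.get(j);
            }
            // Sort only for readable output; the order carries no meaning.
            Arrays.sort(subspaces[it]);
        }
        return subspaces;
    }

    public static void main(String[] args) {
        // Three ensemble members, each seeing 4 of 8 attributes.
        for (int[] subspace : randomSubspaces(8, 4, 3, 1L)) {
            System.out.println(Arrays.toString(subspace));
        }
    }
}
```

Using a fixed seed makes the chosen subsets reproducible, which mirrors the role of the seed option documented for this class.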

    For more information, see

    Tin Kam Ho (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20(8):832-844. URL http://citeseer.ist.psu.edu/ho98random.html.

    BibTeX:

     @article{Ho1998,
        author = {Tin Kam Ho},
        journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
        number = {8},
        pages = {832-844},
        title = {The Random Subspace Method for Constructing Decision Forests},
        volume = {20},
        year = {1998},
        ISSN = {0162-8828},
        URL = {http://citeseer.ist.psu.edu/ho98random.html}
     }
     

    Valid options are:

     -P
      Size of each subspace:
       < 1: percentage of the number of attributes
       >=1: absolute number of attributes
     
     -S <num>
      Random number seed.
      (default 1)
     -I <num>
      Number of iterations.
      (default 10)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.trees.REPTree)
     
     Options specific to classifier weka.classifiers.trees.REPTree:
     
     -M <minimum number of instances>
      Set minimum number of instances per leaf (default 2).
     -V <minimum variance for split>
      Set minimum numeric class variance proportion
      of train variance for split (default 1e-3).
     -N <number of folds>
      Number of folds for reduced error pruning (default 3).
     -S <seed>
      Seed for random data shuffling (default 1).
     -P
      No pruning.
     -L
      Maximum tree depth (default -1, no maximum)
    Options after -- are passed to the designated classifier.
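The two interpretations of the -P value listed above can be sketched as a small function. The class `SubspaceSizeSketch` and the method `resolveSubspaceSize` are hypothetical. Rounding to the nearest integer and clamping the result to the available attributes are assumptions made for this sketch, not documented behaviour.

```java
public class SubspaceSizeSketch {

    /**
     * Resolves a -P style value to a number of attributes.
     * Values below 1 are treated as a fraction of totalAttributes,
     * values of 1 or more as an absolute count.
     * Assumption: the result is rounded to the nearest integer and
     * clamped to the range [1, totalAttributes].
     */
    public static int resolveSubspaceSize(int totalAttributes, double p) {
        double raw = (p < 1.0) ? totalAttributes * p : p;
        int k = (int) Math.round(raw);
        return Math.max(1, Math.min(totalAttributes, k));
    }

    public static void main(String[] args) {
        System.out.println(resolveSubspaceSize(20, 0.5)); // fraction of 20 attributes
        System.out.println(resolveSubspaceSize(20, 5));   // absolute number of attributes
    }
}
```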

    Version:
    $Revision: 1.4 $
    Author:
    Bernhard Pfahringer (bernhard@cs.waikato.ac.nz), Peter Reutemann (fracpete@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • RandomSubSpace

        public RandomSubSpace()
        Constructor.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -P
          Size of each subspace:
           < 1: percentage of the number of attributes
           >=1: absolute number of attributes
         
         -S <num>
          Random number seed.
          (default 1)
         -I <num>
          Number of iterations.
          (default 10)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
         -W
          Full name of base classifier.
          (default: weka.classifiers.trees.REPTree)
         
         Options specific to classifier weka.classifiers.trees.REPTree:
         
         -M <minimum number of instances>
          Set minimum number of instances per leaf (default 2).
         -V <minimum variance for split>
          Set minimum numeric class variance proportion
          of train variance for split (default 1e-3).
         -N <number of folds>
          Number of folds for reduced error pruning (default 3).
         -S <seed>
          Seed for random data shuffling (default 1).
         -P
          No pruning.
         -L
          Maximum tree depth (default -1, no maximum)
        Options after -- are passed to the designated classifier.

        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableIteratedSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
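Because -S and -P appear in both option groups with different meanings, the position of the -- separator decides which classifier receives them. The sketch below builds an illustrative options array in the documented form and splits it at the separator. The class `OptionsSketch` and its methods are hypothetical stand-ins written for this example and do not call into Weka.

```java
import java.util.Arrays;

public class OptionsSketch {

    /** Returns the options before the first "--" (those for the meta classifier). */
    public static String[] beforeSeparator(String[] options) {
        return Arrays.copyOfRange(options, 0, indexOfSeparator(options));
    }

    /** Returns the options after the first "--" (those for the base classifier). */
    public static String[] afterSeparator(String[] options) {
        int i = indexOfSeparator(options);
        if (i == options.length) {
            return new String[0]; // no separator, so nothing is passed on
        }
        return Arrays.copyOfRange(options, i + 1, options.length);
    }

    /** Index of the first "--", or options.length if there is none. */
    private static int indexOfSeparator(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("--".equals(options[i])) {
                return i;
            }
        }
        return options.length;
    }

    public static void main(String[] args) {
        String[] options = {
            "-P", "0.5",                              // subspace size
            "-S", "1",                                // random number seed
            "-I", "10",                               // number of iterations
            "-W", "weka.classifiers.trees.REPTree",   // base classifier
            "--",
            "-M", "2",                                // REPTree: minimum instances per leaf
            "-N", "3"                                 // REPTree: folds for reduced error pruning
        };
        System.out.println(Arrays.toString(beforeSeparator(options)));
        System.out.println(Arrays.toString(afterSeparator(options)));
    }
}
```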
      • subSpaceSizeTipText

        public java.lang.String subSpaceSizeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getSubSpaceSize

        public double getSubSpaceSize()
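        Gets the size of each subspace. A value below 1 is interpreted as a fraction of the number of attributes; a value of 1 or more is interpreted as an absolute number of attributes.
        Returns:
        the subspace size, as a fraction of the number of attributes (< 1) or an absolute number of attributes (>= 1).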
      • setSubSpaceSize

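        public void setSubSpaceSize(double value)
        Sets the size of each subspace. A value below 1 is interpreted as a fraction of the number of attributes; a value of 1 or more is interpreted as an absolute number of attributes.
        Parameters:
        value - the subspace size, as a fraction of the number of attributes (< 1) or an absolute number of attributes (>= 1).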
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Builds the classifier.
        Overrides:
        buildClassifier in class IteratedSingleClassifierEnhancer
        Parameters:
        data - the training data to be used for generating the classifier.
        Throws:
        java.lang.Exception - if the classifier could not be built successfully
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Calculates the class membership probabilities for the given test instance.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance to be classified
        Returns:
        predicted class probability distribution
        Throws:
        java.lang.Exception - if distribution can't be computed successfully
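One common way for an ensemble to produce class membership probabilities is to sum the distributions of its members and normalize the result. The sketch below shows that combination step in isolation. It is an assumption about how such a combination can work, not a copy of this class's implementation, and the names `DistributionAverageSketch` and `combine` are hypothetical.

```java
public class DistributionAverageSketch {

    /**
     * Sums the per-member class distributions and normalizes the sums so
     * that they add up to 1. If every entry is zero, the all-zero array is
     * returned unchanged.
     */
    public static double[] combine(double[][] memberDistributions) {
        if (memberDistributions.length == 0) {
            throw new IllegalArgumentException("at least one distribution is required");
        }
        int numClasses = memberDistributions[0].length;
        double[] sums = new double[numClasses];
        for (double[] distribution : memberDistributions) {
            for (int c = 0; c < numClasses; c++) {
                sums[c] += distribution[c];
            }
        }
        double total = 0.0;
        for (double s : sums) {
            total += s;
        }
        if (total == 0.0) {
            return sums;
        }
        for (int c = 0; c < numClasses; c++) {
            sums[c] /= total;
        }
        return sums;
    }

    public static void main(String[] args) {
        // Two members voting on a two-class problem.
        double[][] members = { {0.8, 0.2}, {0.4, 0.6} };
        double[] combined = combine(members);
        System.out.println(combined[0] + " " + combined[1]);
    }
}
```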
      • toString

        public java.lang.String toString()
        Returns a description of the random subspace classifier.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a description of the random subspace classifier as a string
      • main

        public static void main(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - the options