Class XMeans

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class XMeans
    extends RandomizableClusterer
    implements TechnicalInformationHandler
    Cluster data using the X-means algorithm.

    X-Means is K-Means extended by an Improve-Structure part. In this part of the algorithm, an attempt is made to split each center within its region. The choice between keeping a center and replacing it with its children is made by comparing the BIC values of the two structures.
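
The BIC comparison described above can be sketched in plain Java. This is a minimal illustration, not Weka's implementation: the class `BicSplitSketch`, its method names, and the exact BIC formulation (identical spherical Gaussians with a pooled variance estimate, penalized by half the parameter count times log R) are assumptions chosen for the example.

```java
/** Hypothetical sketch of a BIC-based split test; not taken from Weka. */
final class BicSplitSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * BIC of a hard clustering, assuming identical spherical Gaussians
     * around each center. Higher values indicate a better model.
     */
    static double bic(double[][] points, double[][] centers) {
        int r = points.length;      // number of points
        int m = points[0].length;   // number of dimensions
        int k = centers.length;     // number of centers
        if (r <= k) {
            throw new IllegalArgumentException("need more points than centers");
        }

        // Assign every point to its nearest center and accumulate distortion.
        int[] counts = new int[k];
        double distortion = 0.0;
        for (double[] point : points) {
            int best = 0;
            double bestDist = squaredDistance(point, centers[0]);
            for (int c = 1; c < k; c++) {
                double dist = squaredDistance(point, centers[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            counts[best]++;
            distortion += bestDist;
        }

        // Pooled per-dimension variance; the floor avoids log(0).
        double variance = Math.max(distortion / (m * (double) (r - k)), 1e-12);

        // Log-likelihood of the data under the hard assignment.
        double logLikelihood = -0.5 * r * m * Math.log(2 * Math.PI * variance)
                - distortion / (2 * variance);
        for (int count : counts) {
            if (count > 0) {
                logLikelihood += count * Math.log((double) count / r);
            }
        }

        // Free parameters: k-1 mixing weights, k*m coordinates, 1 variance.
        int parameters = (k - 1) + k * m + 1;
        return logLikelihood - 0.5 * parameters * Math.log(r);
    }

    /** True if the two-child structure scores a higher BIC than the parent. */
    static boolean childrenWin(double[][] regionPoints, double[] parent,
                               double[] childA, double[] childB) {
        double parentBic = bic(regionPoints, new double[][] {parent});
        double childBic = bic(regionPoints, new double[][] {childA, childB});
        return childBic > parentBic;
    }

    public static void main(String[] args) {
        double[][] twoGroups = {{-0.1}, {0.0}, {0.1}, {9.9}, {10.0}, {10.1}};
        System.out.println(childrenWin(twoGroups, new double[] {5.0},
                new double[] {0.0}, new double[] {10.0}));
    }
}
```

The sketch scores only the points belonging to one parent region, which mirrors the local nature of the split decision; how Weka estimates the variance and counts parameters may differ.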

    For more information see:

    Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.

    BibTeX:

     @inproceedings{Pelleg2000,
        author = {Dan Pelleg and Andrew W. Moore},
        booktitle = {Seventeenth International Conference on Machine Learning},
        pages = {727-734},
        publisher = {Morgan Kaufmann},
        title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
        year = {2000}
     }
     

    Valid options are:

     -I <num>
      maximum number of overall iterations
      (default 1).
     -M <num>
      maximum number of iterations in the kMeans loop in
      the Improve-Parameter part 
      (default 1000).
     -J <num>
      maximum number of iterations in the kMeans loop
      for the split centroids in the Improve-Structure part
      (default 1000).
     -L <num>
      minimum number of clusters
      (default 2).
     -H <num>
      maximum number of clusters
      (default 4).
     -B <value>
      distance value for binary attributes
      (default 1.0).
     -use-kdtree
      Uses the KDTree internally
      (default no).
     -K <KDTree class specification>
      Full class name of KDTree class to use, followed
      by scheme options.
      e.g.: "weka.core.neighboursearch.kdtrees.KDTree -P"
      (default no KDTree class used).
     -C <value>
      cutoff factor, takes the given percentage of the split
      centroids if none of the children win
      (default 0.0).
     -D <distance function class specification>
      Full class name of Distance function class to use, followed
      by scheme options.
      (default weka.core.EuclideanDistance).
     -N <file name>
      file to read starting centers from (ARFF format).
     -O <file name>
      file to write centers to (ARFF format).
     -U <int>
      The debug level.
      (default 0)
     -Y <file name>
      The debug vectors file.
     -S <num>
      Random number seed.
      (default 10)
    Version:
    $Revision: 9986 $
    Author:
    Gabi Schmidberger (gabi@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
    See Also:
    RandomizableClusterer, Serialized Form
    • Field Detail

      • R_LOW

        public static int R_LOW
        Index in ranges for LOW.
      • R_HIGH

        public static int R_HIGH
        Index in ranges for HIGH.
      • R_WIDTH

        public static int R_WIDTH
        Index in ranges for WIDTH.
      • D_PRINTCENTERS

        public static int D_PRINTCENTERS
        print the centers.
      • D_FOLLOWSPLIT

        public static int D_FOLLOWSPLIT
        follows the splitting of the centers.
      • D_CONVCHCLOSER

        public static int D_CONVCHCLOSER
        have a closer look at converge children.
      • D_RANDOMVECTOR

        public static int D_RANDOMVECTOR
        check on random vectors.
      • D_KDTREE

        public static int D_KDTREE
        check on kdtree.
      • D_ITERCOUNT

        public static int D_ITERCOUNT
        follow iterations.
      • D_METH_MISUSE

        public static int D_METH_MISUSE
        functions may have been misused.
      • D_CURR

        public static int D_CURR
        for current debug.
      • D_GENERAL

        public static int D_GENERAL
        general debugging.
      • m_CurrDebugFlag

        public boolean m_CurrDebugFlag
        Flag: I'm debugging.
    • Constructor Detail

      • XMeans

        public XMeans()
        The default constructor.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this clusterer.
        Returns:
        a description of the clusterer suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClusterer

        public void buildClusterer​(Instances data)
                            throws java.lang.Exception
        Generates the X-Means clusterer.
        Specified by:
        buildClusterer in interface Clusterer
        Specified by:
        buildClusterer in class AbstractClusterer
        Parameters:
        data - set of instances serving as training data
        Throws:
        java.lang.Exception - if the clusterer has not been generated successfully
      • checkForNominalAttributes

        public boolean checkForNominalAttributes​(Instances data)
        Checks for nominal attributes in the dataset. Class attribute is ignored.
        Parameters:
        data - the data to check
        Returns:
        true if nominal attributes are present, false otherwise
      • clusterInstance

        public int clusterInstance​(Instance instance)
                            throws java.lang.Exception
        Assigns a given instance to a cluster.
        Specified by:
        clusterInstance in interface Clusterer
        Overrides:
        clusterInstance in class AbstractClusterer
        Parameters:
        instance - the instance to be assigned to a cluster
        Returns:
        the index of the assigned cluster as an integer
        Throws:
        java.lang.Exception - if the instance could not be assigned to a cluster successfully
      • minNumClustersTipText

        public java.lang.String minNumClustersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMinNumClusters

        public void setMinNumClusters​(int n)
        Sets the minimum number of clusters to generate.
        Parameters:
        n - the minimum number of clusters to generate
      • getMinNumClusters

        public int getMinNumClusters()
        Gets the minimum number of clusters to generate.
        Returns:
        the minimum number of clusters to generate
      • maxNumClustersTipText

        public java.lang.String maxNumClustersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMaxNumClusters

        public void setMaxNumClusters​(int n)
        Sets the maximum number of clusters to generate.
        Parameters:
        n - the maximum number of clusters to generate
      • getMaxNumClusters

        public int getMaxNumClusters()
        Gets the maximum number of clusters to generate.
        Returns:
        the maximum number of clusters to generate
      • maxIterationsTipText

        public java.lang.String maxIterationsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMaxIterations

        public void setMaxIterations​(int i)
                              throws java.lang.Exception
        Sets the maximum number of iterations to perform.
        Parameters:
        i - the number of iterations
        Throws:
        java.lang.Exception - if i is less than 1
      • getMaxIterations

        public int getMaxIterations()
        Gets the maximum number of iterations.
        Returns:
        the number of iterations
      • maxKMeansTipText

        public java.lang.String maxKMeansTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMaxKMeans

        public void setMaxKMeans​(int i)
        Sets the maximum number of iterations to perform in KMeans.
        Parameters:
        i - the number of iterations
      • getMaxKMeans

        public int getMaxKMeans()
        Gets the maximum number of iterations in KMeans.
        Returns:
        the number of iterations
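
The effect of an iteration cap like this one can be pictured with a small stand-alone K-means loop. The following is a sketch under stated assumptions, not Weka's internal loop: the class `CappedKMeansSketch` and its methods are hypothetical, and stopping early when no assignment changes is an assumed convergence rule.

```java
import java.util.Arrays;

/** Hypothetical sketch of a K-means loop with an iteration cap. */
final class CappedKMeansSketch {

    /** Index of the nearest center; ties go to the lowest index. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /**
     * Refines the centers in place and returns the number of iterations run.
     * Stops when no assignment changes or when maxIterations is reached.
     */
    static int run(double[][] points, double[][] centers, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dims = points[0].length;
        int iterations = 0;

        while (iterations < maxIterations) {
            iterations++;

            // Assignment step: attach each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = nearest(points[i], centers);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged before hitting the cap
            }

            // Update step: move each non-empty center to the mean of its points.
            double[][] sums = new double[centers.length][dims];
            int[] counts = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < dims; d++) {
                        centers[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {1}, {9}, {10}};
        double[][] centers = {{0}, {1}};
        int used = run(points, centers, 1000);
        System.out.println(used + " iterations, centers " + Arrays.deepToString(centers));
    }
}
```

A low cap trades accuracy for speed: the loop returns whatever centers it has reached when the cap is hit, even if assignments are still changing.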
      • maxKMeansForChildrenTipText

        public java.lang.String maxKMeansForChildrenTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMaxKMeansForChildren

        public void setMaxKMeansForChildren​(int i)
        Sets the maximum number of KMeans iterations performed on the child centers.
        Parameters:
        i - the number of iterations
      • getMaxKMeansForChildren

        public int getMaxKMeansForChildren()
        Gets the maximum number of KMeans iterations performed on the child centers.
        Returns:
        the number of iterations
      • cutOffFactorTipText

        public java.lang.String cutOffFactorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setCutOffFactor

        public void setCutOffFactor​(double i)
        Sets a new cutoff factor.
        Parameters:
        i - the new cutoff factor
      • getCutOffFactor

        public double getCutOffFactor()
        Gets the cutoff factor.
        Returns:
        the cutoff factor
      • binValueTipText

        public java.lang.String binValueTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getBinValue

        public double getBinValue()
        Gets value that represents true in a new numeric attribute. (False is always represented by 0.0.)
        Returns:
        the value that represents true in a new numeric attribute
      • setBinValue

        public void setBinValue​(double value)
        Sets the distance value between true and false for binary attributes, and between "same" and "different" for nominal attributes.
        Parameters:
        value - the distance
      • distanceFTipText

        public java.lang.String distanceFTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDistanceF

        public void setDistanceF​(DistanceFunction distanceF)
        Sets the distance function.
        Parameters:
        distanceF - the distance function with all options set
      • getDistanceF

        public DistanceFunction getDistanceF()
        Gets the distance function.
        Returns:
        the distance function
      • debugVectorsFileTipText

        public java.lang.String debugVectorsFileTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebugVectorsFile

        public void setDebugVectorsFile​(java.io.File value)
        Sets the file that has the random vectors stored. Only used for debugging purposes.
        Parameters:
        value - the file to read the random vectors from
      • getDebugVectorsFile

        public java.io.File getDebugVectorsFile()
        Gets the file name for a file that has the random vectors stored. Only used for debugging purposes.
        Returns:
        the file to read the vectors from
      • initDebugVectorsInput

        public void initDebugVectorsInput()
                                   throws java.lang.Exception
        Initialises the debug vector input.
        Throws:
        java.lang.Exception - if there is an error opening the debug input file.
      • getNextDebugVectorsInstance

        public Instance getNextDebugVectorsInstance​(Instances model)
                                             throws java.lang.Exception
        Reads an instance from the debug vectors file.
        Parameters:
        model - the data model for the instance.
        Returns:
        the next debug vector.
        Throws:
        java.lang.Exception - if there are no debug vectors in m_DebugVectors.
      • inputCenterFileTipText

        public java.lang.String inputCenterFileTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setInputCenterFile

        public void setInputCenterFile​(java.io.File value)
        Sets the file to read the list of centers from.
        Parameters:
        value - the file to read centers from
      • getInputCenterFile

        public java.io.File getInputCenterFile()
        Gets the file to read the list of centers from.
        Returns:
        the file to read the centers from
      • outputCenterFileTipText

        public java.lang.String outputCenterFileTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOutputCenterFile

        public void setOutputCenterFile​(java.io.File value)
        Sets the file to write the list of centers to.
        Parameters:
        value - the file to write centers to
      • getOutputCenterFile

        public java.io.File getOutputCenterFile()
        Gets the file to write the list of centers to.
        Returns:
        the file to write centers to
      • KDTreeTipText

        public java.lang.String KDTreeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setKDTree

        public void setKDTree​(KDTree k)
        Sets the KDTree class.
        Parameters:
        k - a KDTree object with all options set
      • getKDTree

        public KDTree getKDTree()
        Gets the KDTree class.
        Returns:
        the configured KDTree
      • useKDTreeTipText

        public java.lang.String useKDTreeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setUseKDTree

        public void setUseKDTree​(boolean value)
        Sets whether to use the KDTree or not.
        Parameters:
        value - if true the KDTree is used
      • getUseKDTree

        public boolean getUseKDTree()
        Gets whether the KDTree is used or not.
        Returns:
        true if KDTrees are used
      • debugLevelTipText

        public java.lang.String debugLevelTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebugLevel

        public void setDebugLevel​(int d)
        Sets the debug level. A debug level of 0 means no output.
        Parameters:
        d - the debug level
      • getDebugLevel

        public int getDebugLevel()
        Gets the debug level.
        Returns:
        debug level
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -I <num>
          maximum number of overall iterations
          (default 1).
         -M <num>
          maximum number of iterations in the kMeans loop in
          the Improve-Parameter part 
          (default 1000).
         -J <num>
          maximum number of iterations in the kMeans loop
          for the split centroids in the Improve-Structure part
          (default 1000).
         -L <num>
          minimum number of clusters
          (default 2).
         -H <num>
          maximum number of clusters
          (default 4).
         -B <value>
          distance value for binary attributes
          (default 1.0).
         -use-kdtree
          Uses the KDTree internally
          (default no).
         -K <KDTree class specification>
          Full class name of KDTree class to use, followed
          by scheme options.
          e.g.: "weka.core.neighboursearch.kdtrees.KDTree -P"
          (default no KDTree class used).
         -C <value>
          cutoff factor, takes the given percentage of the split
          centroids if none of the children win
          (default 0.0).
         -D <distance function class specification>
          Full class name of Distance function class to use, followed
          by scheme options.
          (default weka.core.EuclideanDistance).
         -N <file name>
          file to read starting centers from (ARFF format).
         -O <file name>
          file to write centers to (ARFF format).
         -U <int>
          The debug level.
          (default 0)
         -Y <file name>
          The debug vectors file.
         -S <num>
          Random number seed.
          (default 10)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClusterer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
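
The options array that this method parses can be assembled programmatically. The helper below is a hypothetical sketch that covers only a few of the documented flags (-I, -L, -H, -S, -use-kdtree); the class `XMeansOptionsSketch`, the method `buildOptions`, and its range checks are assumptions for illustration and are not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds an option array for a few XMeans flags. */
final class XMeansOptionsSketch {

    /**
     * Builds an array such as {"-I", "5", "-L", "2", "-H", "6", "-S", "10"}.
     * The range checks are assumptions made for this sketch.
     */
    static String[] buildOptions(int maxIterations, int minClusters,
                                 int maxClusters, int seed, boolean useKdTree) {
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be at least 1");
        }
        if (minClusters < 1 || maxClusters < minClusters) {
            throw new IllegalArgumentException("need 1 <= minClusters <= maxClusters");
        }

        List<String> options = new ArrayList<>();
        options.add("-I");                       // maximum number of overall iterations
        options.add(Integer.toString(maxIterations));
        options.add("-L");                       // minimum number of clusters
        options.add(Integer.toString(minClusters));
        options.add("-H");                       // maximum number of clusters
        options.add(Integer.toString(maxClusters));
        options.add("-S");                       // random number seed
        options.add(Integer.toString(seed));
        if (useKdTree) {
            options.add("-use-kdtree");          // flag without a value
        }
        return options.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", buildOptions(5, 2, 6, 10, true)));
    }
}
```

With Weka on the classpath, the returned array would be the kind of value passed to `setOptions` on an `XMeans` instance; flags that are left out fall back to the defaults listed above.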
      • toString

        public java.lang.String toString()
        Returns a string describing this clusterer.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a description of the clusterer as a string
      • getClusterCenters

        public Instances getClusterCenters()
        Returns the centers of the clusters as an Instances object.
        Returns:
        the cluster centers.
      • main

        public static void main​(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - the command-line options