Class BayesianLogisticRegression

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

    public class BayesianLogisticRegression
    extends Classifier
    implements OptionHandler, TechnicalInformationHandler
    Implements Bayesian Logistic Regression for both Gaussian and Laplace Priors.

    For more information, see

    Alexander Genkin, David D. Lewis, David Madigan (2004). Large-scale bayesian logistic regression for text categorization. URL http://www.stat.rutgers.edu/~madigan/PAPERS/shortFat-v3a.pdf.

    BibTeX:

     @techreport{Genkin2004,
        author = {Alexander Genkin and David D. Lewis and David Madigan},
        institution = {DIMACS},
        title = {Large-scale bayesian logistic regression for text categorization},
        year = {2004},
        URL = {http://www.stat.rutgers.edu/\~madigan/PAPERS/shortFat-v3a.pdf}
     }
     

    Version:
    $Revision: 7984 $
    Author:
    Navendu Garg (gargnav at iit dot edu)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      double[] BetaVector
      Array for storing coefficients of Bayesian regression model.
      double Change
      Tracks the change in the summed increments to r(i).
      int ClassIndex
      The class index from the training data
      static int CV_BASED  
      double[] Delta
      Trust Region Radius
      double[] DeltaBeta
      Array to store Regression Coefficient updates.
      double[] DeltaR
      Stores the increments to R(i).
      double[] DeltaUpdate
      Trust Region Radius Update
      static int GAUSSIAN
      Distributions available
      java.lang.String HyperparameterRange
      CV Hyperparameter Range
      double[] Hyperparameters
      Array to store Hyperparameter values for each feature.
      int HyperparameterSelection
      Hyperparameter selection method
      double HyperparameterValue
      Best hyperparameter for test phase
      static double[] InputHyperparameterValues
      Set of values to be used as hyperparameter values during Cross-Validation.
      int iterationCounter
      Iteration counter
      static int LAPLACIAN  
      static double[] LogLikelihood
      Log-likelihood values to be used to choose the best hyperparameter.
      Filter m_Filter
      Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object
      int m_seed
      Seed for randomizing the instances before CV
      int maxIterations
      Maximum number of iterations
      static int NORM_BASED
      Methods for selecting the hyperparameter value
      boolean NormalizeData
      Choose whether to normalize data or not
      int NumFolds
      Number of folds for CV-based hyperparameter selection
      int PriorClass
      Distribution Prior class
      double[] R
      R(i) = (BetaVector · x(i)) · y(i).
      static int SPECIFIC_VALUE  
      static Tag[] TAGS_HYPER_METHOD  
      static Tag[] TAGS_PRIOR  
      double Threshold
      Threshold for binary classification of the probabilistic estimate
      double Tolerance
      Tolerance for the stopping criterion.
    • Method Summary

      Modifier and Type Method Description
      static double bigF(double r, double sigma)
      A convenience function that defines an upper bound (Delta > 0) for values of r(i) reachable by updates within the trust region.
      void buildClassifier(Instances data)
      (1) Stores the training data in the field m_Instances. (2) Calls initialize() to initialize the values.
      double classifyInstance(Instance instance)
      Classifies the given instance using the Bayesian Logistic Regression function.
      static double classSgn(double value)
      Masks the internal class labels by mapping them to -1 and +1.
      double CVBasedHyperparameter()
      Computes the best hyperparameter value by running cross-validation on the training data and computing the likelihood.
      java.lang.String debugTipText()
      Returns the tip text for this property
      Capabilities getCapabilities()
      Returns the capabilities of this classifier, i.e. the kinds of data it can handle.
      java.lang.String getHyperparameterRange()
      Get the range of hyperparameter values to consider during CV-based selection.
      SelectedTag getHyperparameterSelection()
      Get the method used to select the hyperparameter
      double getHyperparameterValue()
      Get the hyperparameter value.
      double getLoglikeliHood(double[] betas, Instances instances)
      Returns the log-likelihood for a given set of betas and instances.
      int getMaxIterations()
      Get the maximum number of iterations to perform
      int getNumFolds()
      Return the number of folds for CV-based hyperparameter selection
      java.lang.String[] getOptions()
      Gets the current settings of the Classifier.
      SelectedTag getPriorClass()
      Get the type of prior to use.
      java.lang.String getRevision()
      Returns the revision string.
      int getSeed()
      Get the seed for randomizing the instances for CV-based hyperparameter selection
      TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      double getThreshold()
      Return the threshold being used.
      double getTolerance()
      Get the tolerance value
      java.lang.String globalInfo()  
      java.lang.String hyperparameterRangeTipText()
      Returns the tip text for this property
      java.lang.String hyperparameterSelectionTipText()
      Returns the tip text for this property
      java.lang.String hyperparameterValueTipText()
      Returns the tip text for this property
      void initialize()
      (1) Initializes m_Beta[j] to 0. (2) Initializes m_DeltaUpdate[j].
      boolean isDebug()
      Returns true if debug is turned on.
      boolean isNormalizeData()
      Returns true if the data is to be normalized first
      java.util.Enumeration listOptions()
      Returns an enumeration describing the available options.
      static double logisticLinkFunction(double r)
      Computes the value of the logistic link function.
      static void main(java.lang.String[] argv)
      Main method for testing this class.
      java.lang.String maxIterationsTipText()
      Returns the tip text for this property
      java.lang.String normalizeDataTipText()
      Returns the tip text for this property
      double normBasedHyperParameter()
      Computes the norm-based hyperparameters and stores them in m_Hyperparameters.
      java.lang.String numFoldsTipText()
      Returns the tip text for this property
      java.lang.String priorClassTipText()
      Returns the tip text for this property
      java.lang.String seedTipText()
      Returns the tip text for this property
      void setDebug(boolean debugMode)
      Set debugging mode.
      void setHyperparameterRange(java.lang.String hyperparameterRange)
      Set the range of hyperparameter values to consider during CV-based selection
      void setHyperparameterSelection(SelectedTag newMethod)
      Set the method used to select the hyperparameter
      void setHyperparameterValue(double hyperparameterValue)
      Set the hyperparameter value.
      void setMaxIterations(int maxIterations)
      Set the maximum number of iterations to perform
      void setNormalizeData(boolean normalizeData)
      Set whether to normalize the data or not
      void setNumFolds(int numFolds)
      Set the number of folds to use for CV-based hyperparameter selection
      void setOptions(java.lang.String[] options)
      Parses a given list of options.
      void setPriorClass(SelectedTag newMethod)
      Set the type of prior to use.
      void setSeed(int seed)
      Set the seed for randomizing the instances for CV-based hyperparameter selection
      void setThreshold(double threshold)
      Set the threshold to use.
      void setTolerance(double tolerance)
      Set the tolerance value
      static double sgn(double r)
      Returns the sign of a given value.
      boolean stoppingCriterion()
      This method implements the stopping criterion function.
      java.lang.String thresholdTipText()
      Returns the tip text for this property
      java.lang.String toleranceTipText()
      Returns the tip text for this property
      java.lang.String toString()
      Outputs the logistic regression model as a string.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • LogLikelihood

        public static double[] LogLikelihood
        Log-likelihood values to be used to choose the best hyperparameter.
      • InputHyperparameterValues

        public static double[] InputHyperparameterValues
        Set of values to be used as hyperparameter values during Cross-Validation.
      • NormalizeData

        public boolean NormalizeData
        Choose whether to normalize data or not
      • Tolerance

        public double Tolerance
        Tolerance for the stopping criterion.
      • Threshold

        public double Threshold
        Threshold for binary classification of the probabilistic estimate
      • TAGS_PRIOR

        public static final Tag[] TAGS_PRIOR
      • PriorClass

        public int PriorClass
        Distribution Prior class
      • NumFolds

        public int NumFolds
        Number of folds for CV-based hyperparameter selection
      • m_seed

        public int m_seed
        Seed for randomizing the instances before CV
      • NORM_BASED

        public static final int NORM_BASED
        Methods for selecting the hyperparameter value
        See Also:
        Constant Field Values
      • TAGS_HYPER_METHOD

        public static final Tag[] TAGS_HYPER_METHOD
      • HyperparameterSelection

        public int HyperparameterSelection
        Hyperparameter selection method
      • ClassIndex

        public int ClassIndex
        The class index from the training data
      • HyperparameterValue

        public double HyperparameterValue
        Best hyperparameter for test phase
      • HyperparameterRange

        public java.lang.String HyperparameterRange
        CV Hyperparameter Range
      • maxIterations

        public int maxIterations
        Maximum number of iterations
      • iterationCounter

        public int iterationCounter
        Iteration counter
      • BetaVector

        public double[] BetaVector
        Array for storing coefficients of Bayesian regression model.
      • DeltaBeta

        public double[] DeltaBeta
        Array to store Regression Coefficient updates.
      • DeltaUpdate

        public double[] DeltaUpdate
        Trust Region Radius Update
      • Delta

        public double[] Delta
        Trust Region Radius
      • Hyperparameters

        public double[] Hyperparameters
        Array to store Hyperparameter values for each feature.
      • R

        public double[] R
        R(i) = (BetaVector · x(i)) · y(i). This is an intermediate value computed from the coefficient vector BetaVector, the input values, and the corresponding class labels.
      • DeltaR

        public double[] DeltaR
        Stores the increments to R(i). It is also used to determine the stopping criterion.
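The trust-region fields above (Delta, DeltaUpdate, DeltaBeta) and the R and DeltaR vectors appear to support the cyclic coordinate descent scheme of Genkin et al. (2004). The sketch below shows one coordinate update for a Gaussian prior with variance tau, as described in that paper: a tentative Newton-like step whose curvature term is bounded by bigF, clipped to the trust region [-delta, delta], followed by updates to r(i) and beta_j and a new radius max(2|step|, delta/2). The class and method names and the argument layout are hypothetical, and how this class maps these steps onto its own fields is an assumption; the Laplace-prior variant is not covered.

```java
/**
 * Hypothetical sketch of one trust-region coordinate update for a Gaussian
 * prior, following the algorithm described in Genkin et al. (2004).
 */
final class GaussianCoordinateStep {

    /** Curvature bound F(r, delta) used inside the trust region. */
    static double bigF(double r, double delta) {
        double absR = Math.abs(r);
        return absR <= delta
                ? 0.25
                : 1.0 / (2.0 + Math.exp(absR - delta) + Math.exp(delta - absR));
    }

    /**
     * Updates beta[j] and r in place and returns the new trust-region radius.
     *
     * @param xj    value of feature j for each instance
     * @param y     class labels mapped to -1/+1
     * @param r     current r(i) = (beta . x(i)) * y(i), updated in place
     * @param beta  coefficient vector; only beta[j] changes
     * @param j     coordinate to update
     * @param tau   Gaussian prior variance for coordinate j
     * @param delta current trust-region radius for coordinate j
     */
    static double update(double[] xj, double[] y, double[] r, double[] beta,
                         int j, double tau, double delta) {
        double numerator = -beta[j] / tau; // slope contributed by the log-prior
        double denominator = 1.0 / tau;    // curvature contributed by the prior
        for (int i = 0; i < xj.length; i++) {
            numerator += xj[i] * y[i] / (1.0 + Math.exp(r[i]));
            denominator += xj[i] * xj[i] * bigF(r[i], delta * Math.abs(xj[i]));
        }
        double tentative = numerator / denominator;
        // Keep the step inside the trust region [-delta, delta].
        double step = Math.min(Math.max(tentative, -delta), delta);
        for (int i = 0; i < xj.length; i++) {
            r[i] += step * xj[i] * y[i];
        }
        beta[j] += step;
        // Widen the region after a large step, shrink it after a small one.
        return Math.max(2.0 * Math.abs(step), delta / 2.0);
    }

    public static void main(String[] args) {
        double[] beta = {0.0};
        double[] r = {0.0, 0.0};
        double next = update(new double[] {1.0, 1.0}, new double[] {1.0, 1.0}, r, beta, 0, 1.0, 1.0);
        System.out.println(beta[0] + " " + next);
    }
}
```

For example, with x_j = (1, 1), y = (+1, +1), beta_j = 0, tau = 1 and delta = 1, the step is 2/3 and the new radius is 4/3; with delta = 0.1 the same step is clipped to 0.1.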
      • Change

        public double Change
        Tracks the change in the summed increments to r(i).
      • m_Filter

        public Filter m_Filter
        Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object
    • Constructor Detail

      • BayesianLogisticRegression

        public BayesianLogisticRegression()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
      • initialize

        public void initialize()
                        throws java.lang.Exception
         (1) Initializes m_Beta[j] to 0.
         (2) Initializes m_DeltaUpdate[j].
        Throws:
        java.lang.Exception
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        • (1) Stores the training data in the field m_Instances.
        • (2) Calls initialize() to initialize the values.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - training data
        Throws:
        java.lang.Exception - if classifier can't be built successfully.
      • classSgn

        public static double classSgn(double value)
        Masks the internal class labels by mapping them to -1 and +1.
        Parameters:
        value - internal class label
        Returns:
        • -1 for internal class label 0
        • +1 for internal class label 1
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • bigF

        public static double bigF(double r,
                                  double sigma)
        A convenience function that defines an upper bound (Delta > 0) for values of r(i) reachable by updates within the trust region.
        Parameters:
        r - (BetaVector · x(i)) · y(i)
        sigma - a parameter with sigma > 0
        Returns:
        the function value
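Genkin et al. (2004) define this bound as F(r, delta) = 0.25 when |r| <= delta and 1 / (2 + exp(|r| - delta) + exp(delta - |r|)) otherwise, which is the largest value the logistic curvature 1 / (2 + exp(r') + exp(-r')) takes over all r' with |r' - r| <= delta. The standalone sketch below assumes bigF follows that definition, with sigma playing the role of delta; the class name is hypothetical.

```java
/** Hypothetical standalone version of the trust-region bound F(r, delta). */
final class TrustRegionBound {

    /**
     * Largest logistic curvature p(1 - p) over the interval [r - delta, r + delta]
     * (definition assumed from Genkin et al., 2004).
     */
    static double bigF(double r, double delta) {
        double absR = Math.abs(r);
        if (absR <= delta) {
            // The interval contains 0, where p(1 - p) peaks at 1/4.
            return 0.25;
        }
        // Otherwise the peak is at the interval end closest to 0, where |r'| = |r| - delta.
        return 1.0 / (2.0 + Math.exp(absR - delta) + Math.exp(delta - absR));
    }

    public static void main(String[] args) {
        System.out.println(bigF(0.0, 1.0)); // 0.25
        System.out.println(bigF(3.0, 1.0));
    }
}
```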
      • stoppingCriterion

        public boolean stoppingCriterion()
        This method implements the stopping criterion function.
        Returns:
        boolean whether to stop or not.
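Genkin et al. (2004) declare convergence when the relative change in the linear scores is small: sum|DeltaR(i)| / (1 + sum|R(i)|) <= tolerance. The sketch below assumes stoppingCriterion applies that test with Tolerance (default 0.0005) and also stops once maxIterations is reached; the class's own bookkeeping, such as the Change field, may differ, and the names here are hypothetical.

```java
/** Hypothetical sketch of the relative-change convergence test from Genkin et al. (2004). */
final class ConvergenceCheck {

    /**
     * Returns true when sum|deltaR(i)| / (1 + sum|R(i)|) <= tolerance,
     * or when the iteration cap has been reached.
     */
    static boolean shouldStop(double[] r, double[] deltaR, double tolerance,
                              int iteration, int maxIterations) {
        double sumDelta = 0.0;
        double sumR = 1.0; // the leading 1 keeps the ratio finite when every R(i) is 0
        for (int i = 0; i < r.length; i++) {
            sumDelta += Math.abs(deltaR[i]);
            sumR += Math.abs(r[i]);
        }
        return sumDelta / sumR <= tolerance || iteration >= maxIterations;
    }

    public static void main(String[] args) {
        double[] r = {1.0, -2.0, 3.0};
        double[] deltaR = {1e-4, 1e-4, 1e-4};
        // Defaults documented for -Tl and -I: 0.0005 and 100.
        System.out.println(shouldStop(r, deltaR, 0.0005, 1, 100)); // true
    }
}
```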
      • logisticLinkFunction

        public static double logisticLinkFunction(double r)
        This method computes the values for the logistic link function.
        f(r) = exp(r) / (1 + exp(r))
        Returns:
        output value
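Evaluated literally, exp(r) / (1 + exp(r)) returns NaN for large positive r, because exp(r) overflows to infinity and infinity / infinity is undefined. The sketch below is an assumption about a numerically safer way to compute the same function, not a description of this class's code; the class name is hypothetical.

```java
/** Hypothetical numerically stable logistic link f(r) = exp(r) / (1 + exp(r)). */
final class LogisticLink {

    static double logistic(double r) {
        if (r >= 0.0) {
            // 1 / (1 + exp(-r)) is algebraically equal and exp(-r) <= 1 here.
            return 1.0 / (1.0 + Math.exp(-r));
        }
        // For negative r, exp(r) lies in (0, 1), so the original form is safe.
        double e = Math.exp(r);
        return e / (1.0 + e);
    }

    public static void main(String[] args) {
        System.out.println(logistic(0.0)); // 0.5
        System.out.println(logistic(800.0));
    }
}
```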
      • sgn

        public static double sgn(double r)
        Returns the sign of a given value.
        Parameters:
        r - the value
        Returns:
        +1 if r > 0, -1 if r < 0
      • normBasedHyperParameter

        public double normBasedHyperParameter()
        Computes the norm-based hyperparameters and stores them in m_Hyperparameters.
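Genkin et al. (2004) describe a norm-based default for the Gaussian prior variance: sigma^2 = d / u, where d is the number of features and u is the mean squared Euclidean norm of the training vectors. The sketch below computes that single value from a plain, non-empty feature matrix with the class column already removed. That normBasedHyperParameter uses exactly this rule is an assumption, and the names are hypothetical.

```java
/** Hypothetical sketch of a norm-based prior variance: d / mean(||x_i||^2). */
final class NormBasedVariance {

    /** @param x feature vectors, one row per training instance (no class column) */
    static double priorVariance(double[][] x) {
        int d = x[0].length;
        double meanSquaredNorm = 0.0;
        for (double[] row : x) {
            for (double v : row) {
                meanSquaredNorm += v * v; // accumulate ||x_i||^2 over all rows
            }
        }
        meanSquaredNorm /= x.length;
        return d / meanSquaredNorm;
    }

    public static void main(String[] args) {
        System.out.println(priorVariance(new double[][] {{1.0, 0.0}, {0.0, 1.0}})); // 2.0
    }
}
```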
      • classifyInstance

        public double classifyInstance(Instance instance)
                                throws java.lang.Exception
        Classifies the given instance using the Bayesian Logistic Regression function.
        Overrides:
        classifyInstance in class Classifier
        Parameters:
        instance - the test instance
        Returns:
        the classification
        Throws:
        java.lang.Exception - if classification can't be done successfully
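A minimal sketch of thresholded classification, assuming the linear score is beta · x, that the logistic link gives the probability of internal class 1, and that class 1 is predicted only when that probability exceeds Threshold (default 0.5, see -S). Which class the probability refers to and how ties are broken are not stated above, so both are assumptions; the names are hypothetical.

```java
/** Hypothetical sketch of thresholded logistic classification. */
final class ThresholdClassifier {

    /** Returns internal class 1.0 if P(class 1 | x) exceeds threshold, else 0.0. */
    static double classify(double[] beta, double[] x, double threshold) {
        double score = 0.0;
        for (int j = 0; j < beta.length; j++) {
            score += beta[j] * x[j];
        }
        double p = 1.0 / (1.0 + Math.exp(-score)); // logistic link
        return p > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] beta = {1.0, -1.0};
        System.out.println(classify(beta, new double[] {2.0, 1.0}, 0.5)); // 1.0
    }
}
```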
      • toString

        public java.lang.String toString()
        Outputs the logistic regression model as a string.
        Overrides:
        toString in class java.lang.Object
        Returns:
        the model as string
      • CVBasedHyperparameter

        public double CVBasedHyperparameter()
                                     throws java.lang.Exception
        Computes the best hyperparameter value by running cross-validation on the training data and computing the likelihood. The method accepts either a range of values or a list of values.
        Returns:
        the hyperparameter value with the maximum likelihood on the training data
        Throws:
        java.lang.Exception
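The -R option of setOptions below documents the two accepted formats, R:start-end,multiplier and L:val(1), val(2), ..., val(n), with default R:0.01-316,3.16. Reading the range form as a geometric grid (start, start × multiplier, start × multiplier², ... while the value does not exceed end) is an assumption suggested by the word "multiplier"; the parser below is a hypothetical sketch, not the class's own code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for "R:start-end,multiplier" and "L:v1, v2, ..." strings. */
final class HyperparameterRange {

    static List<Double> parse(String spec) {
        List<Double> values = new ArrayList<>();
        if (spec.startsWith("R:")) {
            // e.g. "R:0.01-316,3.16": start 0.01, end 316, multiplier 3.16.
            // Splitting on '-' means exponent notation such as 1e-3 is not supported.
            String[] parts = spec.substring(2).split(",");
            String[] bounds = parts[0].split("-");
            double start = Double.parseDouble(bounds[0].trim());
            double end = Double.parseDouble(bounds[1].trim());
            double multiplier = Double.parseDouble(parts[1].trim());
            if (start <= 0.0 || multiplier <= 1.0) {
                throw new IllegalArgumentException("need start > 0 and multiplier > 1");
            }
            // Geometric grid: start, start*m, start*m^2, ... while <= end.
            for (double v = start; v <= end; v *= multiplier) {
                values.add(v);
            }
        } else if (spec.startsWith("L:")) {
            // e.g. "L:0.1, 1, 10": an explicit list of candidate values.
            for (String token : spec.substring(2).split(",")) {
                values.add(Double.parseDouble(token.trim()));
            }
        } else {
            throw new IllegalArgumentException("expected an R: or L: prefix: " + spec);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("L:0.1, 1, 10")); // [0.1, 1.0, 10.0]
        System.out.println(parse("R:0.01-316,3.16").size());
    }
}
```

Under this reading, the default range yields ten candidates, from 0.01 up to roughly 314.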
      • getLoglikeliHood

        public double getLoglikeliHood(double[] betas,
                                       Instances instances)
        Returns:
        the log-likelihood for a given set of betas and instances
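Assuming getLoglikeliHood returns the plain logistic log-likelihood (no prior term) with labels mapped to -1/+1 as by classSgn, each instance contributes log f(r(i)) = -log(1 + exp(-r(i))), where r(i) = (beta · x(i)) · y(i). A hypothetical sketch that evaluates this sum without overflow:

```java
/** Hypothetical sketch of the logistic log-likelihood for labels in {-1, +1}. */
final class LogLikelihoodSketch {

    static double logLikelihood(double[] beta, double[][] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double score = 0.0;
            for (int j = 0; j < beta.length; j++) {
                score += beta[j] * x[i][j];
            }
            double r = score * y[i];
            // log(1 + exp(-r)), rewritten for r < 0 so that exp never overflows.
            double loss = (r >= 0.0) ? Math.log1p(Math.exp(-r)) : -r + Math.log1p(Math.exp(r));
            total -= loss;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -0.25};
        double[][] x = {{1.0, 2.0}, {2.0, 0.0}, {0.0, 1.0}};
        double[] y = {1.0, 1.0, -1.0};
        System.out.println(logLikelihood(beta, x, y));
    }
}
```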
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class Classifier
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -D
          Show Debugging Output
         
         -P <integer>
          Distribution of the Prior (1=Gaussian, 2=Laplacian)
          (default: 1=Gaussian)
         -H <integer>
          Hyperparameter Selection Method (1=Norm-based, 2=CV-based, 3=specific value)
          (default: 1=Norm-based)
         -V <double>
          Specified Hyperparameter Value (use in conjunction with -H 3)
          (default: 0.27)
         -R <string>
          Hyperparameter Range (use in conjunction with -H 2)
          (format: R:start-end,multiplier OR L:val(1), val(2), ..., val(n))
          (default: R:0.01-316,3.16)
         -Tl <double>
          Tolerance Value
          (default: 0.0005)
         -S <double>
          Threshold Value
          (default: 0.5)
         -F <integer>
          Number of folds (use in conjunction with -H 2)
          (default: 2)
         -I <integer>
          Max Number of Iterations
          (default: 100)
         -N
          Normalize the data
         -seed <number>
          Seed for randomizing instances order
          in CV-based hyperparameter selection
          (default: 1)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class Classifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • getOptions

        public java.lang.String[] getOptions()
        Description copied from class: Classifier
        Gets the current settings of the Classifier.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class Classifier
        Returns:
        an array of strings suitable for passing to setOptions
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - the options
      • debugTipText

        public java.lang.String debugTipText()
        Returns the tip text for this property
        Overrides:
        debugTipText in class Classifier
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebug

        public void setDebug(boolean debugMode)
        Description copied from class: Classifier
        Set debugging mode.
        Overrides:
        setDebug in class Classifier
        Parameters:
        debugMode - true if debug output should be printed
      • hyperparameterSelectionTipText

        public java.lang.String hyperparameterSelectionTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getHyperparameterSelection

        public SelectedTag getHyperparameterSelection()
        Get the method used to select the hyperparameter
        Returns:
        the method used to select the hyperparameter
      • setHyperparameterSelection

        public void setHyperparameterSelection(SelectedTag newMethod)
        Set the method used to select the hyperparameter
        Parameters:
        newMethod - the method used to select the hyperparameter
      • priorClassTipText

        public java.lang.String priorClassTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setPriorClass

        public void setPriorClass(SelectedTag newMethod)
        Set the type of prior to use.
        Parameters:
        newMethod - the type of prior to use.
      • getPriorClass

        public SelectedTag getPriorClass()
        Get the type of prior to use.
        Returns:
        the type of prior to use
      • thresholdTipText

        public java.lang.String thresholdTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getThreshold

        public double getThreshold()
        Return the threshold being used.
        Returns:
        the threshold
      • setThreshold

        public void setThreshold(double threshold)
        Set the threshold to use.
        Parameters:
        threshold - the threshold to use
      • toleranceTipText

        public java.lang.String toleranceTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getTolerance

        public double getTolerance()
        Get the tolerance value
        Returns:
        the tolerance value
      • setTolerance

        public void setTolerance(double tolerance)
        Set the tolerance value
        Parameters:
        tolerance - the tolerance value to use
      • hyperparameterValueTipText

        public java.lang.String hyperparameterValueTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getHyperparameterValue

        public double getHyperparameterValue()
        Get the hyperparameter value. Used when the hyperparameter selection method is set to a specific value (-H 3).
        Returns:
        the hyperparameter value
      • setHyperparameterValue

        public void setHyperparameterValue(double hyperparameterValue)
        Set the hyperparameter value. Used when the hyperparameter selection method is set to a specific value (-H 3).
        Parameters:
        hyperparameterValue - the value of the hyperparameter
      • numFoldsTipText

        public java.lang.String numFoldsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getNumFolds

        public int getNumFolds()
        Return the number of folds for CV-based hyperparameter selection
        Returns:
        the number of CV folds
      • setNumFolds

        public void setNumFolds(int numFolds)
        Set the number of folds to use for CV-based hyperparameter selection
        Parameters:
        numFolds - number of folds to use
      • seedTipText

        public java.lang.String seedTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSeed

        public void setSeed(int seed)
        Set the seed for randomizing the instances for CV-based hyperparameter selection
        Parameters:
        seed - the seed to use
      • getSeed

        public int getSeed()
        Get the seed for randomizing the instances for CV-based hyperparameter selection
        Returns:
        the seed to use
      • maxIterationsTipText

        public java.lang.String maxIterationsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMaxIterations

        public int getMaxIterations()
        Get the maximum number of iterations to perform
        Returns:
        the maximum number of iterations
      • setMaxIterations

        public void setMaxIterations(int maxIterations)
        Set the maximum number of iterations to perform
        Parameters:
        maxIterations - maximum number of iterations
      • normalizeDataTipText

        public java.lang.String normalizeDataTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • isNormalizeData

        public boolean isNormalizeData()
        Returns true if the data is to be normalized first
        Returns:
        true if the data is to be normalized
      • setNormalizeData

        public void setNormalizeData(boolean normalizeData)
        Set whether to normalize the data or not
        Parameters:
        normalizeData - true if data is to be normalized
      • hyperparameterRangeTipText

        public java.lang.String hyperparameterRangeTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getHyperparameterRange

        public java.lang.String getHyperparameterRange()
        Get the range of hyperparameter values to consider during CV-based selection.
        Returns:
        the range of hyperparameters as a String
      • setHyperparameterRange

        public void setHyperparameterRange(java.lang.String hyperparameterRange)
        Set the range of hyperparameter values to consider during CV-based selection
        Parameters:
        hyperparameterRange - the range of hyperparameter values
      • isDebug

        public boolean isDebug()
        Returns true if debug is turned on.
        Returns:
        true if debug is turned on