weka.classifiers.rules
Class FRIP

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.rules.FRIP
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, weka.core.AdditionalMeasureProducer, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.RevisionHandler, weka.core.TechnicalInformationHandler, weka.core.WeightedInstancesHandler

public class FRIP
extends weka.classifiers.Classifier
implements weka.core.OptionHandler, weka.core.AdditionalMeasureProducer, weka.core.WeightedInstancesHandler, weka.core.TechnicalInformationHandler

The FRip algorithm is a fuzzification of the JRip algorithm. It was designed not as a standalone classifier but as a base classifier for FR3 (the Fuzzy Round Robin algorithm). The main difference between FRip and JRip is that FRip is only a binary classifier and makes no use of default rules. Furthermore, FRip changes the pruning procedure: pruning during the IREP* runs is permanently deactivated, which was found experimentally to improve the classification rate. The following description from the JRip class was altered to describe the methodology of FRip:

Initialize RS = {}, and for each of the two classes DO:

1. Building stage:
Repeat 1.1 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.

1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).

2. Optimization stage:
After generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and X.1. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N). Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positives, more rules are generated from the residual positives using the building stage again.

3. Delete from the ruleset any rule that would increase the DL of the whole ruleset if it were kept, and add the resultant ruleset to RS.
ENDDO
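
The information gain used in the grow phase (step 1.1), p(log(p/t)-log(P/T)), can be sketched in plain Java. This is a minimal illustration and not the FRIP source: the class and method names are hypothetical, the logarithm base 2 is an assumption (the description does not state the base), and returning 0 for empty coverage is a choice made only for this sketch. Here p and t are the positive and total weights covered after adding a candidate condition, and P and T are the same quantities before adding it.

```java
/** Hypothetical helper for illustration; not part of the FRIP class. */
final class GrowGainSketch {
    private GrowGainSketch() {}

    /** Logarithm to base 2; the base is an assumption of this sketch. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Gain p * (log(p/t) - log(P/T)) of a candidate condition.
     *
     * @param p    positive weight covered after adding the condition
     * @param t    total weight covered after adding the condition
     * @param bigP positive weight covered before adding the condition
     * @param bigT total weight covered before adding the condition
     */
    static double infoGain(double p, double t, double bigP, double bigT) {
        if (p <= 0.0 || t <= 0.0 || bigP <= 0.0 || bigT <= 0.0) {
            return 0.0; // nothing covered: gain treated as zero (assumption)
        }
        return p * (log2(p / t) - log2(bigP / bigT));
    }

    public static void main(String[] args) {
        // Before: 8 of 16 covered instances are positive; after: 4 of 4.
        System.out.println(infoGain(4, 4, 8, 16));
    }
}
```

In the example in `main`, the candidate condition turns a half-positive coverage into a pure one, so the gain is positive; a condition that leaves the positive ratio unchanged would score zero.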

Fuzzification of RS:
For each rule r in every ruleset in RS DO
4. Fuzzification of antecedents:
Apply a greedy strategy to fuzzify the existing antecedents in r in the following way:

4.1 Examine all possible support bounds and select the one which gains the highest purity on the training data.
4.2 Set the maximum support bound determined in 4.1 and restart with 4.1, but without the fuzzified antecedent.
5. Bounding of open rules:
Bound open sided rules by the last known instance value in that dimension.

6. Fuzzification of the bounded antecedent:
Fuzzify the bounded sides of the rule beyond the edge of the known dataspace.
ENDDO
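
The fuzzification steps above do not spell out the shape of the membership function or the exact purity formula. The following is a minimal sketch under stated assumptions: membership is taken to be trapezoidal, falling linearly from 1 at the original (core) bound to 0 at the support bound, and purity is taken to be the membership-weighted share of positive instances. All names are hypothetical, and only an upper-bounded antecedent of the form "x <= core" is shown.

```java
/** Hypothetical illustration of a fuzzified numeric antecedent; not the FRIP source. */
final class FuzzyAntecedentSketch {
    private FuzzyAntecedentSketch() {}

    /**
     * Membership of x in the fuzzified antecedent "x <= core".
     * Assumes support >= core and a linear decrease between the two bounds.
     */
    static double membershipUpper(double x, double core, double support) {
        if (x <= core) {
            return 1.0; // inside the original crisp interval
        }
        if (x >= support) {
            return 0.0; // beyond the support bound
        }
        return (support - x) / (support - core);
    }

    /** Membership-weighted share of positives (assumed purity measure). */
    static double purity(double[] values, boolean[] positive, double core, double support) {
        double pos = 0.0;
        double all = 0.0;
        for (int i = 0; i < values.length; i++) {
            double mu = membershipUpper(values[i], core, support);
            all += mu;
            if (positive[i]) {
                pos += mu;
            }
        }
        return all == 0.0 ? 0.0 : pos / all;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 3.0, 5.0};
        boolean[] positive = {true, false, true};
        // Candidate support bound 4.0 for the crisp condition x <= 2.0.
        System.out.println(purity(values, positive, 2.0, 4.0));
    }
}
```

Under these assumptions, step 4.1 would amount to calling `purity` once per candidate support bound and keeping the bound with the highest value.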

X.1. Pruning:
Incrementally prune each rule, allowing any final sequence of antecedents to be pruned. The pruning metric is (p-n)/(p+n), which equals 2p/(p+n) - 1, so this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that the value is 0.5 if p+n is 0).
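
The relation between the three forms of the pruning metric can be checked with a few lines of Java. This is an illustrative sketch with hypothetical names, not code taken from the class; p and n stand for the positive and negative weights covered by the rule on the pruning data.

```java
/** Hypothetical illustration of the pruning metric in X.1; not the FRIP source. */
final class PruneMetricSketch {
    private PruneMetricSketch() {}

    /** Original metric (p - n) / (p + n); undefined when p + n is 0. */
    static double rawMetric(double p, double n) {
        return (p - n) / (p + n);
    }

    /** Plain accuracy p / (p + n); ranks rules exactly like rawMetric. */
    static double accuracy(double p, double n) {
        return p / (p + n);
    }

    /** Smoothed form (p + 1) / (p + n + 2); yields 0.5 when p + n is 0. */
    static double smoothedAccuracy(double p, double n) {
        return (p + 1.0) / (p + n + 2.0);
    }

    public static void main(String[] args) {
        double p = 3.0;
        double n = 1.0;
        // rawMetric is an increasing linear function of accuracy: 2 * accuracy - 1.
        System.out.println(rawMetric(p, n) + " vs " + (2.0 * accuracy(p, n) - 1.0));
        System.out.println(smoothedAccuracy(0.0, 0.0));
    }
}
```

Because (p-n)/(p+n) is an increasing linear function of p/(p+n), both forms order candidate prunings identically; the smoothed variant additionally avoids a division by zero.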

BibTeX:

 @article{huehn2008,
    author = {Jens Christian Hühn and Eyke Hüllermeier},
    journal = {},
    title = {FR3: A Fuzzy Rule Learner for Inducing Reliable Classifiers},
    year = {2008}
 }
 @inproceedings{Cohen1995,
    author = {William W. Cohen},
    booktitle = {Twelfth International Conference on Machine Learning},
    pages = {115-123},
    publisher = {Morgan Kaufmann},
    title = {Fast Effective Rule Induction},
    year = {1995}
 }
 

Valid options are:

 -F <number of folds>
  Set number of folds for REP
  One fold is used as pruning set.
  (default 3)
 -N <min. weights>
  Set the minimal weights of instances
  within a split.
  (default 2.0)
 -O <number of runs>
  Set the number of runs of
  optimizations. (Default: 2)
 -D
  Set whether to turn on the
  debug mode (Default: false)
 -S <seed>
  The seed of randomization
  (Default: 1)
 -E
  Whether NOT to check the error rate >= 0.5
  in the stopping criteria (default: check)

Date created: 08/07/2008

Version:
$Revision: 1.0 $
Author:
Jens Christian Hühn (huehn@gmx.net)
See Also:
Serialized Form

Nested Class Summary
protected  class FRIP.Antd
          The single antecedent in the rule, which is composed of an attribute and the corresponding value.
protected  class FRIP.NominalAntd
          The antecedent with nominal attribute
protected  class FRIP.NumericAntd
          The antecedent with numeric attribute
protected  class FRIP.RipperRule
          This class implements a single rule that predicts specified class.
 
Field Summary
private  double[][] dataspaceEdges
          The edges of the known dataspace
private  boolean m_CheckErr
          Whether to check the error rate >= 0.5 in the stopping criteria
protected  weka.core.Attribute m_Class
          The class attribute of the data
 weka.core.Instances m_dataAllClasses
           
protected  boolean m_Debug
          Whether debug mode is on
protected  weka.core.FastVector m_Distributions
          The predicted class distribution
private  int m_Folds
          The number of folds to split data into Grow and Prune for IREP
(package private)  double m_MinNo
          The minimal number of instance weights within a split
private  int m_Optimizations
          Runs of optimizations
protected  java.util.Random m_Random
          Random object used in this class
protected  weka.core.FastVector m_Ruleset
          The ruleset
protected  weka.core.FastVector m_RulesetStats
          The RuleStats for the ruleset of each class value
protected  long m_Seed
          The seed to perform randomization
protected  double m_Total
          The total number of possible conditions in a rule
private static double MAX_DL_SURPLUS
          The limit of description length surplus in ruleset generation
private static long serialVersionUID
          for serialization
 
Constructor Summary
FRIP()
           
 
Method Summary
 void buildClassifier(weka.core.Instances instances)
          Builds the FRip classifier. For each class, the ruleset is built in three stages: building, optimization, and fuzzification.
 java.lang.String checkErrorRateTipText()
          Returns the tip text for this property
private  boolean checkStop(double[] rst, double minDL, double dl)
          Checks whether the stopping criterion is met
 java.lang.String debugTipText()
          Returns the tip text for this property
 double[] distributionForInstance(weka.core.Instance datum)
          Classify the test instance with the rule learner and provide the class distributions
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the additional measure names
 java.lang.String foldsTipText()
          Returns the tip text for this property
 weka.core.Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 boolean getCheckErrorRate()
          Gets whether the error rate check is part of the stopping criterion
 double[][] getDataspaceEdges()
          Gets the dataspace edges
 boolean getDebug()
          Gets whether debug information is output to the console
 int getFolds()
          Gets the number of folds
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure
 double getMinNo()
          Gets the minimum total weight of the instances in a rule
 int getOptimizations()
          Gets the number of optimization runs
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 java.lang.String getRevision()
          Returns the revision string.
 weka.core.FastVector getRuleset()
          Get the ruleset generated by FRipper
 weka.classifiers.rules.RuleStats getRuleStats(int pos)
          Get the statistics of the ruleset in the given position
 long getSeed()
          Gets the current seed value to use in randomizing the data
 weka.core.TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing the classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String minNoTipText()
          Returns the tip text for this property
 java.lang.String optimizationsTipText()
          Returns the tip text for this property
protected  weka.core.Instances rulesetForOneClass(double expFPRate, weka.core.Instances data, double classIndex, double defDL)
          Build a ruleset for the given class according to the given data
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setCheckErrorRate(boolean d)
          Sets whether the error rate check is part of the stopping criterion
 void setDataspaceEdges(double[][] d)
          Sets the edges of the dataspace
 void setDebug(boolean d)
          Sets whether debug information is output to the console
 void setFolds(int fold)
          Sets the number of folds to use
 void setMinNo(double m)
          Sets the minimum total weight of the instances in a rule
 void setOptimizations(int run)
          Sets the number of optimization runs
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(long s)
          Sets the seed value to use in randomizing the data
 java.lang.String toString()
          Prints all the rules of the rule learner.
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, forName, makeCopies, makeCopy, runClassifier
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
for serialization

See Also:
Constant Field Values

MAX_DL_SURPLUS

private static double MAX_DL_SURPLUS
The limit of description length surplus in ruleset generation


m_Class

protected weka.core.Attribute m_Class
The class attribute of the data


m_Ruleset

protected weka.core.FastVector m_Ruleset
The ruleset


m_Distributions

protected weka.core.FastVector m_Distributions
The predicted class distribution


m_Optimizations

private int m_Optimizations
Runs of optimizations


m_Random

protected java.util.Random m_Random
Random object used in this class


m_Total

protected double m_Total
The total number of possible conditions in a rule


m_Seed

protected long m_Seed
The seed to perform randomization


m_Folds

private int m_Folds
The number of folds to split data into Grow and Prune for IREP


m_MinNo

double m_MinNo
The minimal number of instance weights within a split


m_Debug

protected boolean m_Debug
Whether debug mode is on


m_CheckErr

private boolean m_CheckErr
Whether to check the error rate >= 0.5 in the stopping criteria


m_RulesetStats

protected weka.core.FastVector m_RulesetStats
The RuleStats for the ruleset of each class value


dataspaceEdges

private double[][] dataspaceEdges
The edges of the known dataspace


m_dataAllClasses

public weka.core.Instances m_dataAllClasses
Constructor Detail

FRIP

public FRIP()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing the classifier

Returns:
a description suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public weka.core.TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface weka.core.TechnicalInformationHandler
Returns:
the technical information about this class

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options. Valid options are:

-F number
The number of folds for reduced error pruning. One fold is used as the pruning set. (Default: 3)

-N number
The minimal weights of instances within a split. (Default: 2)

-O number
Set the number of runs of optimizations. (Default: 2)

-D
Whether to turn on the debug mode.

-S number
The seed of randomization used in FRipper. (Default: 1)

-E
Whether NOT to check the error rate >= 0.5 in the stopping criteria. (Default: check)

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.classifiers.Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -F <number of folds>
  Set number of folds for REP
  One fold is used as pruning set.
  (default 3)
 -N <min. weights>
  Set the minimal weights of instances
  within a split.
  (default 2.0)
 -O <number of runs>
  Set the number of runs of
  optimizations. (Default: 2)
 -D
  Set whether to turn on the
  debug mode (Default: false)
 -S <seed>
  The seed of randomization
  (Default: 1)
 -E
  Whether NOT to check the error rate >= 0.5
  in the stopping criteria (default: check)
 -P
  Whether NOT to use pruning
  (default: use pruning)

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.classifiers.Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
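
As an illustration of the option format, the following sketch assembles a `String[]` from the flags listed above, in the form that `setOptions(String[])` is documented to accept. The helper class and its parameter names are hypothetical; only the flag letters come from this page. The `-P` flag is left out of the sketch.

```java
/** Hypothetical helper that assembles an option array; not part of the FRIP class. */
final class FripOptionsSketch {
    private FripOptionsSketch() {}

    static String[] build(int folds, double minNo, int optimizations,
                          long seed, boolean debug, boolean skipErrorCheck) {
        java.util.List<String> opts = new java.util.ArrayList<>();
        opts.add("-F");
        opts.add(Integer.toString(folds));         // number of folds for REP
        opts.add("-N");
        opts.add(Double.toString(minNo));          // minimal instance weights in a split
        opts.add("-O");
        opts.add(Integer.toString(optimizations)); // number of optimization runs
        opts.add("-S");
        opts.add(Long.toString(seed));             // randomization seed
        if (debug) {
            opts.add("-D");                        // flag without argument
        }
        if (skipErrorCheck) {
            opts.add("-E");                        // do NOT check error rate >= 0.5
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The documented defaults: 3 folds, weight 2.0, 2 runs, seed 1.
        System.out.println(String.join(" ", build(3, 2.0, 2, 1L, false, false)));
    }
}
```

The resulting array could then be handed to `setOptions` on a classifier instance; that call is not shown here because it requires the Weka library on the classpath.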

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.classifiers.Classifier
Returns:
an array of strings suitable for passing to setOptions

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names

Specified by:
enumerateMeasures in interface weka.core.AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure

Specified by:
getMeasure in interface weka.core.AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

foldsTipText

public java.lang.String foldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setFolds

public void setFolds(int fold)
Sets the number of folds to use

Parameters:
fold - the number of folds

getFolds

public int getFolds()
Gets the number of folds

Returns:
the number of folds
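
The role of the fold count can be illustrated with a plain-Java sketch of a grow/prune split in which one fold becomes the pruning set. This is a simplified illustration under stated assumptions: it works on instance indices only, ignores instance weights and any stratification the real class may perform, and uses hypothetical names throughout.

```java
/** Hypothetical grow/prune split on instance indices; not the FRIP source. */
final class GrowPruneSplitSketch {
    private GrowPruneSplitSketch() {}

    /** Returns { growIndices, pruneIndices }; one of 'folds' parts is used for pruning. */
    static int[][] split(int numInstances, int folds, long seed) {
        if (folds < 2) {
            throw new IllegalArgumentException("folds must be at least 2");
        }
        int[] idx = new int[numInstances];
        for (int i = 0; i < numInstances; i++) {
            idx[i] = i;
        }
        // Fisher-Yates shuffle driven by the seed, so a fixed seed is reproducible.
        java.util.Random rnd = new java.util.Random(seed);
        for (int i = numInstances - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        int pruneSize = numInstances / folds;
        int growSize = numInstances - pruneSize;
        int[] grow = java.util.Arrays.copyOfRange(idx, 0, growSize);
        int[] prune = java.util.Arrays.copyOfRange(idx, growSize, numInstances);
        return new int[][] { grow, prune };
    }

    public static void main(String[] args) {
        int[][] parts = split(9, 3, 1L);
        System.out.println(parts[0].length + " grow, " + parts[1].length + " prune");
    }
}
```

With the default of 3 folds, roughly two thirds of the data would be used for growing rules and one third for pruning.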

minNoTipText

public java.lang.String minNoTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMinNo

public void setMinNo(double m)
Sets the minimum total weight of the instances in a rule

Parameters:
m - the minimum total weight of the instances in a rule

getMinNo

public double getMinNo()
Gets the minimum total weight of the instances in a rule

Returns:
the minimum total weight of the instances in a rule

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(long s)
Sets the seed value to use in randomizing the data

Parameters:
s - the new seed value

getSeed

public long getSeed()
Gets the current seed value to use in randomizing the data

Returns:
the seed value

optimizationsTipText

public java.lang.String optimizationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setOptimizations

public void setOptimizations(int run)
Sets the number of optimization runs

Parameters:
run - the number of optimization runs

getOptimizations

public int getOptimizations()
Gets the number of optimization runs

Returns:
the number of optimization runs

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property

Overrides:
debugTipText in class weka.classifiers.Classifier
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean d)
Sets whether debug information is output to the console

Overrides:
setDebug in class weka.classifiers.Classifier
Parameters:
d - whether debug information is output to the console

getDebug

public boolean getDebug()
Gets whether debug information is output to the console

Overrides:
getDebug in class weka.classifiers.Classifier
Returns:
whether debug information is output to the console

checkErrorRateTipText

public java.lang.String checkErrorRateTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCheckErrorRate

public void setCheckErrorRate(boolean d)
Sets whether the error rate check is part of the stopping criterion

Parameters:
d - whether the error rate check is part of the stopping criterion

getCheckErrorRate

public boolean getCheckErrorRate()
Gets whether the error rate check is part of the stopping criterion

Returns:
true if the error rate check is part of the stopping criterion

setDataspaceEdges

public void setDataspaceEdges(double[][] d)
Sets the edges of the dataspace

Parameters:
d - The edges of the dataspace

getDataspaceEdges

public double[][] getDataspaceEdges()
Gets the dataspace edges

Returns:
the dataspace edges
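
The layout of the `double[][]` returned here is not documented on this page. The sketch below shows one plausible way such edges could be derived from numeric data, assuming one row per attribute with the minimum at index 0 and the maximum at index 1. Both the layout and the helper names are assumptions made purely for illustration.

```java
/** Hypothetical computation of per-attribute min/max edges; the layout is an assumption. */
final class DataspaceEdgesSketch {
    private DataspaceEdgesSketch() {}

    /**
     * @param rows one array of numeric attribute values per instance
     * @return edges[a][0] = smallest and edges[a][1] = largest value seen for attribute a
     */
    static double[][] edges(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][2];
        }
        int numAttrs = rows[0].length;
        double[][] result = new double[numAttrs][2];
        for (int a = 0; a < numAttrs; a++) {
            result[a][0] = Double.POSITIVE_INFINITY;
            result[a][1] = Double.NEGATIVE_INFINITY;
        }
        for (double[] row : rows) {
            for (int a = 0; a < numAttrs; a++) {
                result[a][0] = Math.min(result[a][0], row[a]);
                result[a][1] = Math.max(result[a][1], row[a]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] e = edges(new double[][] { {1.0, 5.0}, {3.0, 2.0} });
        System.out.println(java.util.Arrays.deepToString(e));
    }
}
```

Values of this kind would be what step 5 of the class description needs in order to bound an open-sided rule by the last known instance value in a dimension.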

getRuleset

public weka.core.FastVector getRuleset()
Get the ruleset generated by FRipper

Returns:
the ruleset

getRuleStats

public weka.classifiers.rules.RuleStats getRuleStats(int pos)
Get the statistics of the ruleset in the given position

Parameters:
pos - the position of the stats, assumed to be valid
Returns:
the statistics of the ruleset in the given position

getCapabilities

public weka.core.Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
getCapabilities in class weka.classifiers.Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

buildClassifier

public void buildClassifier(weka.core.Instances instances)
                     throws java.lang.Exception
Builds the FRip classifier. For each class, the ruleset is built in three stages: building, optimization, and fuzzification.

Specified by:
buildClassifier in class weka.classifiers.Classifier
Parameters:
instances - the training data
Throws:
java.lang.Exception - if classifier can't be built successfully

distributionForInstance

public double[] distributionForInstance(weka.core.Instance datum)
Classify the test instance with the rule learner and provide the class distributions

Overrides:
distributionForInstance in class weka.classifiers.Classifier
Parameters:
datum - the instance to be classified
Returns:
the distribution

rulesetForOneClass

protected weka.core.Instances rulesetForOneClass(double expFPRate,
                                                 weka.core.Instances data,
                                                 double classIndex,
                                                 double defDL)
                                          throws java.lang.Exception
Build a ruleset for the given class according to the given data

Parameters:
expFPRate - the expected FP/(FP+FN) used in DL calculation
data - the given data
classIndex - the given class index
defDL - the default DL in the data
Throws:
java.lang.Exception - if the ruleset cannot be built properly

checkStop

private boolean checkStop(double[] rst,
                          double minDL,
                          double dl)
Checks whether the stopping criterion is met

Parameters:
rst - the statistic of the ruleset
minDL - the min description length so far
dl - the current description length of the ruleset
Returns:
true if the stopping criterion is met, false otherwise
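
The stopping rule stated in the class description (DL more than 64 bits above the smallest DL so far, no positive examples left, or error rate >= 50%) can be sketched as follows. The signature differs from the private method above on purpose: the contents of the `rst` array are not documented here, so this hypothetical version takes the positive weight and error rate as separate arguments. Tying the 64-bit limit to `MAX_DL_SURPLUS` is likewise an assumption.

```java
/** Hypothetical restatement of the building-stage stopping rule; not the FRIP source. */
final class StopCriterionSketch {
    private StopCriterionSketch() {}

    /** 64 bits, as stated in the class description (assumed to match MAX_DL_SURPLUS). */
    static final double MAX_DL_SURPLUS = 64.0;

    static boolean shouldStop(double dl, double minDL, double positiveWeight,
                              double errorRate, boolean checkErrorRate) {
        if (dl > minDL + MAX_DL_SURPLUS) {
            return true; // description length grew too far beyond the best seen so far
        }
        if (positiveWeight <= 0.0) {
            return true; // no positive examples left to cover
        }
        // The error rate test can be switched off (option -E).
        return checkErrorRate && errorRate >= 0.5;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(170.0, 100.0, 5.0, 0.1, true));
        System.out.println(shouldStop(120.0, 100.0, 5.0, 0.6, false));
    }
}
```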

toString

public java.lang.String toString()
Prints all the rules of the rule learner.

Overrides:
toString in class java.lang.Object
Returns:
a textual description of the classifier

getRevision

public java.lang.String getRevision()
Description copied from interface: weka.core.RevisionHandler
Returns the revision string.

Specified by:
getRevision in interface weka.core.RevisionHandler
Returns:
the revision