I have recently been asked how to address the imbalanced class distribution problem using WEKA's cost-sensitive classifiers. In particular, the weighting method supported by WEKA can be used to simulate stratification, avoiding downsampling the majority class and thus taking advantage of all the available data.
The idea is simple. WEKA supports increasing the weight of examples. If class A's distribution is 1%, most classifiers will learn a trivial rejector, because rejecting everything is 99% accurate. But you can increase the weight of mistakes on class A (false negatives, FN), for instance at a 10:1 ratio. The classifier will then try to avoid false negatives, because each one costs as much as 10 false positives (FP).
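To see why the 10:1 ratio changes a classifier's behavior, here is a minimal sketch (plain Python, not WEKA code; the function name and the example probabilities are mine) of the minimum-expected-cost decision rule that cost-sensitive classification implements:

```python
# Illustrative sketch (not WEKA code): how a 10:1 false-negative cost
# shifts the decision threshold of a probabilistic classifier.

def min_cost_decision(p_a, fn_cost=10.0, fp_cost=1.0):
    """Predict class A iff the expected cost of missing it
    (p_a * fn_cost) exceeds the cost of a false alarm ((1 - p_a) * fp_cost)."""
    return p_a * fn_cost > (1 - p_a) * fp_cost

# With uniform costs the threshold is 0.5; with FN=10 and FP=1 it drops
# to 1 / (1 + 10) ~= 0.09, so borderline examples are now accepted:
print(min_cost_decision(0.20))               # True  (0.20 * 10 > 0.80 * 1)
print(min_cost_decision(0.20, fn_cost=1.0))  # False (0.20 * 1 < 0.80 * 1)
```

In other words, the cost matrix lowers the probability threshold at which the classifier is willing to predict the rare class.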
To do this in the WEKA Explorer:
- Load a data collection in the Preprocess tab, and go to the Classify tab.
- Select meta.CostSensitiveClassifier, and click on the classifier textbox to get its properties.
- Click on the cost matrix field, select a 2x2 matrix, and configure the costs. For instance, set the FN cost to 10.0 and the FP cost to 1.0. True positive and true negative costs should usually be 0.0, because a correct prediction rarely has a cost.
- Click on the classifier field to select the appropriate base classifier. Any classifier that tries to optimize accuracy or error can be made cost-sensitive: decision trees, rule learners, and even Support Vector Machines.
- Go on with your experiment.
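The claim above that weighting simulates stratification can be checked with a quick sketch (plain Python, not WEKA code; the toy dataset and the 10x factor are mine): weighting each minority example by 10 gives a learner exactly the same class frequencies as duplicating each one 10 times would.

```python
# Illustrative sketch (not WEKA code): reweighting minority examples is
# equivalent to oversampling them, without actually copying any data.
from collections import Counter

# Hypothetical toy dataset: 2 examples of class A, 198 of class B (1% prevalence).
labels = ["A"] * 2 + ["B"] * 198

def effective_counts(labels, class_weights):
    """Class frequencies as seen by a weight-aware learner."""
    counts = Counter()
    for label in labels:
        counts[label] += class_weights.get(label, 1)
    return counts

# Weight class A examples 10x, leave class B at 1x:
weighted = effective_counts(labels, {"A": 10})

# The same effect obtained by physically duplicating class A examples:
duplicated = Counter(
    copy for label in labels for copy in [label] * (10 if label == "A" else 1)
)

print(weighted == duplicated)  # True: weighting simulates the resampled data
```

This is why the weighting approach keeps the full majority class: no examples are thrown away, the minority class simply counts for more.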
Here you can see a screenshot of the cost matrix editor. Click on it to get a better view at Picasa.
This is my five-cent tip for WEKA. More coming :-)