One fascinating thing about using Machine Learning or Knowledge Discovery in Databases, especially for text problems, is that with enough training data, surprisingly simple algorithms can achieve excellent results. However, when you want to go further and gain the last small improvements that decide whether your system is usable or completely useless, you often run into outliers. Hawkins defines an outlier as follows:
An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. (Hawkins, D. 1980. Identification of Outliers. Chapman and Hall)
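The definition is deliberately informal: it does not say how much deviation is "so much". For a single numeric feature, one common textbook way to make it concrete is the modified z-score based on the median absolute deviation (MAD), as proposed by Iglewicz and Hoaglin. The sketch below shows only that simple technique. It is not taken from the tutorial mentioned next. The function name `mad_outliers`, the conventional 3.5 cutoff and the toy data (hypothetical document lengths) are all illustrative assumptions.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Illustrative sketch: `mad_outliers` and the default cutoff of 3.5
    are assumptions, not part of any particular library.
    """
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that, unlike the
    # standard deviation, is not inflated by the outliers themselves.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all: this simple rule cannot rank points
    # 0.6745 rescales MAD so scores are comparable to ordinary z-scores
    # when the bulk of the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical document lengths (in tokens); one document is far longer.
lengths = [120, 130, 125, 128, 122, 127, 900]
print(mad_outliers(lengths))  # [900]
```

Using the median and MAD instead of the mean and standard deviation keeps a single extreme value from masking itself. For multivariate data or text features this one-dimensional rule is far too crude, which is exactly where more elaborate techniques come in.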
I put this topic on my research agenda long ago, and I recently came across an interesting tutorial on it by Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek:
Hans-Peter Kriegel, Peer Kröger, Arthur Zimek: Outlier Detection Techniques. Tutorial at 10th SIAM International Conference on Data Mining (SDM 2010), Columbus, Ohio, 2010. [ abstract | slides (pdf) ]
For those interested in attending it in person, there is an opportunity at KDD 2010.