If I had been asked to pick the topic of the next ECML/PKDD Discovery Challenge, I would have chosen this one.
The guys organizing the challenge have access to data from Bibsonomy, a very interesting social site for sharing bookmarks and lists of literature. Like many other sites, it has caught the attention of spammers. According to the dataset statistics, spammers have passed over those strange things called BibTeX records and have focused on tags and bookmarks instead.
- Number of legitimate tag assignments: 816,197 / Number of spam tag assignments: 13,258,759
- Number of legitimate bookmarks: 181,833 / Number of spam bookmarks: 2,059,991
- Number of legitimate BibTeX records: 219,417 / Number of spam BibTeX records: 716
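To make the imbalance easier to see, here is a quick sketch that turns the counts above into spam percentages. The numbers are copied from the list; the `spam_share` helper and the `counts` dictionary are just names I made up for illustration, not anything shipped with the challenge data.

```python
# Counts copied from the dataset statistics above: (legitimate, spam).
# The dictionary layout and the helper name are illustrative assumptions.
counts = {
    "tag assignments": (816_197, 13_258_759),
    "bookmarks": (181_833, 2_059_991),
    "BibTeX records": (219_417, 716),
}


def spam_share(legitimate: int, spam: int) -> float:
    """Return the fraction of items in a category that are spam."""
    total = legitimate + spam
    return spam / total if total else 0.0


for name, (legitimate, spam) in counts.items():
    share = spam_share(legitimate, spam)
    print(f"{name}: {share:.1%} spam")
```

Roughly speaking, more than nine out of ten tag assignments and bookmarks are spam, while well under one percent of the BibTeX records are.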
The challenge will take place in Antwerp, Belgium, on 15 September 2008.