28.1.09

Repeatability Guideline at KDD 2009

It was a (very positive) surprise for me to read the repeatability guideline in the CFP for the 2009 edition of the Knowledge Discovery in Databases conference (the most respected conference in Machine Learning, IMHO):

Repeatability is a cornerstone of any scientific endeavor. To ensure the long term viability of the research output of the SIGKDD community, we require open-source/public distribution of the code and the datasets. In those cases where this is not possible due to proprietary considerations, every effort should be made to provide the binary executable. If proprietary datasets are used, every effort should also be made to apply the approach to similar publicly available datasets. Furthermore, the description of experimental results in submitted papers should be accompanied by all relevant implementation details and exact parameter specifications.

It is a great guideline, one that will be considered in the evaluation of the submitted papers. I suppose that most of us have made the same mistake sometimes (or always): failing to provide enough details to make our experiments reproducible. And we have also seen this in hundreds of others' papers. If this is more than a one-off event, if it becomes a trend, I believe that the field will greatly improve its methods. And not only this field, but others!