Ted Pedersen: Empiricism is Not a Matter of Faith

Ted Pedersen, Associate Professor in the Department of Computer Science at the University of Minnesota, and Principal Investigator and Associate Fellow at the Minnesota Supercomputing Institute, has recently published a paper in Computational Linguistics that is worth a comment. The full reference is:

Ted Pedersen, Empiricism is Not a Matter of Faith, Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008.

This simply wonderful paper deals with one of the plagues of most empirical papers nowadays: the non-reproducibility of experiments. Our papers in statistical Natural Language Processing, Machine Learning, and other fields usually show impressive tables with hundreds of interesting (statistically significant) numbers that support our new cutting-edge idea to improve a rather cryptic feature of a rather unused parser, tagger or the like. They also discuss the main ideas of our approach, within the obvious space limitations. With this information, can anybody reimplement our software and achieve the same results? Or improve on them?

Unfortunately, nobody will be able to reproduce most of our experiments. Moreover, we will not be able to either, because the intern or postdoc has left the team, the software was lost, etc. Ted simply calls for:

  • Approaching experimental software development as if it were going to be used by others. You do not have to worry too much about excessive "software engineering", but design and program it with others in mind. It is extra work, but you will be able to make your software available to other researchers so that they can repeat your experiments and get the same results. Also, you can improve it over time and get more papers out of it, possibly with other interested researchers.
  • Releasing software as open source. This will help your software spread and improve, and it will keep your copyright protected. You will also be able to give your software (probably programmed in part by one of your students) to a new student as an introduction to the topic he/she will work on, and he/she will improve the software in turn.

A natural consequence is that reviewers, as they get used to papers that come with released software, will start asking for the same from new papers. Experiments will then be easy to reproduce, and the field will greatly improve its methods.

Simple and strong ideas. Just a very good thought in these times, in which we are under so much pressure to produce high-impact papers and research proposals.