LETOR - Benchmark for Learning to Rank for Information Retrieval

LETOR is a package of benchmark data sets for research on LEarning TO Rank, released by Microsoft Research Asia. The package contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines for the OHSUMED data collection and the '.gov' data collection.

Version 1.0 of LETOR was released in March 2007, and version 2.0 in January 2008. Since the release of LETOR 2.0, we have received valuable feedback from many people, including bug reports and feasibility studies of the tools. Based on this feedback, the LETOR 3.0 project was launched several months ago. LETOR 3.0 is now available at http://research.microsoft.com/users/LETOR/.

LETOR 3.0 contains the following significant updates:

  1. Four new datasets were added: homepage finding 2003, homepage finding 2004, named page finding 2003, and named page finding 2004. Together with the three datasets (OHSUMED, topic distillation 2003, and topic distillation 2004) from LETOR 2.0, there are seven datasets in LETOR 3.0;
  2. A more reasonable document sampling strategy was adopted. As a result, the documents associated with each query have changed in the three datasets carried over from LETOR 2.0.
  3. More features for learning were extracted.
  4. Metadata for each document was provided to enable research on features for learning to rank.
  5. Experimental results for baseline algorithms were provided.
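To illustrate what the feature files in such a package look like, here is a minimal parsing sketch. It assumes an SVMlight-style line format (relevance label, `qid:` pair, then `featureid:value` pairs, with an optional trailing comment); the exact number of features and the comment syntax may differ between LETOR versions and datasets, and the example line below is hypothetical.

```python
def parse_letor_line(line):
    """Split one SVMlight-style line into (relevance, qid, features, comment).

    Assumed layout: "<label> qid:<id> <fid>:<value> ... # <comment>".
    """
    body, _, comment = line.partition("#")
    tokens = body.split()
    relevance = int(tokens[0])            # graded relevance judgment
    qid = tokens[1].split(":", 1)[1]      # query identifier after "qid:"
    features = {}
    for tok in tokens[2:]:                # remaining tokens are id:value pairs
        fid, value = tok.split(":", 1)
        features[int(fid)] = float(value)
    return relevance, qid, features, comment.strip()

# Hypothetical example line in the assumed format:
example = "2 qid:10 1:0.032 2:0.0 3:0.1 # docid = GX000-00-0000000"
rel, qid, feats, note = parse_letor_line(example)
```

Grouping parsed lines by `qid` then yields the per-query document lists that ranking algorithms train on.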

Very, very interesting :-)