International Conference on Weblogs and Social Media - Data Challenge

Data Challenge - Call for Participation

http://www.icwsm.org/2009/data

Continuing the ICWSM tradition, ICWSM 2009 is making a dataset available to researchers in the blog and social media fields. Researchers are invited to download the dataset, explore it, learn something interesting about it, and submit a paper about it to ICWSM 2009.

The dataset, provided by Spinn3r.com, is a set of 44 million blog posts made between August 1st and October 1st, 2008. Each post includes the text as syndicated, as well as metadata such as the blog's homepage and timestamps. The data is formatted in XML and is further arranged into tiers that roughly approximate search engine ranking. The total size of the dataset is 142 GB uncompressed (27 GB compressed).
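As a rough illustration of working with XML at this scale, the sketch below streams posts out of a file one at a time instead of loading the whole document into memory. The element names (`item`, `title`, `link`) and the sample document are assumptions made for illustration only; the actual schema is documented with the dataset and may differ.

```python
import io
import xml.etree.ElementTree as ET


def iter_posts(source, item_tag="item"):
    """Yield one dict per post element, streaming through the XML.

    `item_tag` and the child field names are hypothetical; adjust them
    to match the schema documented with the dataset.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            # Collect the text of each direct child of the post element.
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # drop the parsed children to limit memory use


# Made-up sample standing in for one dataset file.
SAMPLE = b"""<dataset>
  <item><title>First post</title><link>http://example.org/1</link></item>
  <item><title>Second post</title><link>http://example.org/2</link></item>
</dataset>"""

posts = list(iter_posts(io.BytesIO(SAMPLE)))
print(len(posts), posts[0]["title"])
```

If the files use XML namespaces, ElementTree reports tags in the form `{namespace}name`, so `item_tag` and the field lookups would need the qualified names.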

For details on how to get the dataset, including a usage agreement, see the data page on the conference website, http://www.icwsm.org/2009/data/. There are also a mailing list and a Google Code site for sharing ideas and resources.

This dataset spans a number of big news events (the Olympics; both US presidential nominating conventions; the beginnings of the financial crisis; ...) as well as everything else you might expect to find posted to blogs. ICWSM invites research studies of this data, including but not limited to:

  • link analysis
  • social network extraction
  • tracing the evolution of news
  • blog search and filtering
  • psychological, sociological, ethnographic, or personality-based studies
  • analysis of influence among bloggers
  • blog summarization and discourse analysis

Instructions for submitting papers to ICWSM may be found at http://icwsm.org/2009/cfp.shtml. When submitting a paper, indicate that it makes use of the dataset. Dataset papers will be reviewed for the main conference, and additionally for presentation at the data challenge workshop to take place on May 20th, 2009 (the last day of the conference). It is anticipated that several dataset papers may appear in the main conference, and the data challenge workshop will provide an opportunity for in-depth discussion of the dataset in a more focused forum.

The organizers will be making a collaborative website available for sharing tools, indexes, or other extracts of the dataset. Please see the ICWSM website for links.

Posted by Ian Soboroff, NIST, and Akshay Java, UMBC (ICWSM 2009 Data Chairs) to the WEBIR Mailing List.