Nihil Obstat: A list of datasets for opinion mining in Twitter

10.1.13


In a recent thread on the SentimentAI group (list), several people contributed links to datasets for training and testing opinion mining / sentiment classifiers over Twitter. I list them here in case anyone finds this information useful:

Three datasets provided by Hassan Saif: an annotated subset of the Stanford Twitter Sentiment Corpus, and two topic-specific datasets on the Health Care Reform and the Obama-McCain Debate.
The Stanford Twitter Corpus itself, provided by Alec Go and others at Sentiment140. You can download the ST Corpus directly (70 MB).
The Sanders Analytics Twitter Sentiment Corpus, provided by Niek Sanders.
The mejaj datasets, provided by Nibir Bora and others.
The SemEval-2013: Sentiment Analysis in Twitter evaluation campaign (or competition) dataset. Note that the competition is still active, so you can join it! Check the dates on the SemEval-2013 website.
The RepLab 2012 Profiling task dataset. The profiling task is a bit different from the standard sentiment classification task. For instance, factual tweets can imply a bad reputation ("Lehman Brothers goes bankrupt") and negative sentiment tweets can imply a good reputation ("R.I.P. Michael Jackson. We'll miss you").
UPDATE (8/10/2013): Contributed by Eugenio Martínez Cámara (thanks!), the Spanish-language dataset used in the TASS workshop organized at the annual meeting of the SEPLN.

You can find the SentimentAI thread on Twitter datasets here.
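For readers who want to start experimenting, here is a minimal Python sketch of reading a Sentiment140-style CSV such as the Stanford Twitter Corpus listed above. The column order and the polarity coding (0 = negative, 2 = neutral, 4 = positive) are assumptions on my part and are not stated in this post, so please check them against the documentation of the file you download. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column order for a Sentiment140-style CSV (not stated in this post).
COLUMNS = ["polarity", "tweet_id", "date", "query", "user", "text"]
# Assumed polarity coding: 0 = negative, 2 = neutral, 4 = positive.
LABELS = {"0": "negative", "2": "neutral", "4": "positive"}


def read_tweets(handle):
    """Yield (label, text) pairs from an open CSV file handle."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        record = dict(zip(COLUMNS, row))
        label = LABELS.get(record["polarity"])
        if label is not None:
            yield label, record["text"]


# Made-up sample rows, for illustration only (not taken from any corpus).
SAMPLE = (
    '"4","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","I love this phone"\n'
    '"0","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","Worst service ever, really"\n'
    '"2","3","Mon Apr 06 22:19:53 PDT 2009","NO_QUERY","user_c","Waiting for the bus"\n'
)

tweets = list(read_tweets(io.StringIO(SAMPLE)))
counts = Counter(label for label, _ in tweets)
print(counts)
```

To read a real file, replace the `io.StringIO(SAMPLE)` handle with something like `open(path, newline="", encoding=...)`, where `path` is wherever you saved the download; the right encoding depends on the file, so it may need to be set explicitly.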

8 comments:

Unknown said...: Thanks, Jose. I am starting my research project on this topic, and this has been very helpful!

Great blog!; 6:46 a.m.
Unknown said...: This comment has been removed by the author.; 6:08 p.m.
Unknown said...: This comment has been removed by the author.; 6:09 p.m.
Unknown said...: The list should also include the Spanish-language tweet corpus from the TASS workshop organized by the SEPLN: http://www.daedalus.es/TASS2013/corpus.php; 6:10 p.m.
Jose Maria Gomez Hidalgo said...: Thanks, Eugenio. Added!; 8:22 p.m.
Unknown said...: Excellent compilation, Jose María. We are working on a comparison of tools and are looking for public corpora that we can use with different APIs.

Many thanks,
Antonio; 7:12 p.m.
Jose Maria Gomez Hidalgo said...: Many thanks, Antonio. I am very glad you find it useful.; 11:55 a.m.
Andres Mendez said...: Hi, since a couple of years have passed since this post was published, I would like to ask whether there is any new Spanish-language corpus for sentiment analysis (particularly tweets)?
Thanks; 8:32 p.m.
