4.12.09

Learning over Social Network Datasets & Opinion Mining post with Rapid Miner

A recent link from Jose Carlos Cortizo, received via FB via TW, has drawn my attention to Social Network Datasets. It begins with the Social Tagging list by Markus Strohmaier at his blog Intentialicious. But the comments include several other lists of datasets, most prominently:

Which in fact leads me to ask: do opinion mining datasets apply? OK, it is not Social Tagging (at least, not thematic tags). So I share the only two Opinion Mining/Sentiment Analysis datasets in Spanish I am aware of:

And after a quick search for Opinion Mining datasets, I found Bruno Ohana's blog, with a very interesting post presenting a short tutorial on Opinion Mining with Rapid Miner. Strictly speaking it is not really Opinion Mining (it does not use any sentiment features; it approaches the task like any other classification task: bag of words, etc.), but I find it very interesting because it is a great tutorial on using this suite for text classification!
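To make the "bag of words, like any other classification task" point concrete, here is a minimal sketch in plain Python. It is not Bruno Ohana's Rapid Miner process; it swaps in a small multinomial Naive Bayes with add-one smoothing as a stand-in classifier. The function names (`tokenize`, `train`, `predict`) and the four toy reviews are invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only word tokens (word order is discarded later)."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def train(docs):
    """Collect per-label word counts from (text, label) pairs.

    No sentiment lexicon is used: every word is just a feature.
    """
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, made up for this sketch (not from any real corpus).
TOY_REVIEWS = [
    ("great movie, I loved the acting", "pos"),
    ("wonderful plot and great cast", "pos"),
    ("boring movie, I hated the plot", "neg"),
    ("terrible acting and a boring script", "neg"),
]

MODEL = train(TOY_REVIEWS)
print(predict(MODEL, "great cast"))
print(predict(MODEL, "boring script"))
```

Note that nothing here knows that "great" is positive or "boring" is negative in advance; the polarity is learned purely from label co-occurrence, which is exactly why this is text classification rather than Opinion Mining with sentiment features.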

3 comments:

JoSeK said...

Great post! More publicly available datasets on Sentiment Analysis and Opinion Mining are needed in order to foster research in those areas.

Iosu Santurtun said...

thx! not so many spanish datasets around the interwebs ;)

Jose Maria Gomez Hidalgo said...

Thanks for your support. There are new datasets in Spanish, some of them from social networks:

* The TASS Corpus for the Workshop on Sentiment Analysis at SEPLN, sentiment analysis in Spanish on Twitter: http://www.daedalus.es/TASS2013/corpus.php

* The PAN 2012 Sexual Predator Identification task involves a corpus which includes chats in Spanish: http://www.uni-weimar.de/medien/webis/research/events/pan-12/pan12-web/authorship.html#corpus

* The PAN 2013 Author Profiling task includes texts in Spanish as well: http://www.uni-weimar.de/medien/webis/research/events/pan-13/pan13-web/author-identification.html

If you know of more labeled/tagged corpora in Spanish, please let me know.