Web Data Mining: Exploring Hyperlinks, Contents and Usage Data
Bing Liu, Springer, December, 2006
The book, recently re-edited, is a comprehensive review of the algorithms, techniques and datasets available for different types of Web mining, including Web structure, Web content and Web usage mining. It covers a wide range of topics (chapters), such as Association Rules, Supervised and Unsupervised Learning, Information Retrieval and Web Search, Link Analysis, Web Crawling, Opinion Mining, and Web Usage Mining.
Since the book is conceived as a handbook for advanced students and above, it is supported by several slide sets by Bing Liu himself, including one on Opinion Mining (chapter 11). I have found these slides especially interesting because:
- The presentation is quite practical and application-oriented, avoiding the traditional discussions on how emotions are represented in text, and so on.
- It is quite accessible as well, with clear details on the basic techniques, and frequent references to more advanced developments.
- It separates long text (e.g. movie reviews) from short text (e.g. Twitter comments) techniques, with a clear motivation.
- And last, but not least, it is written by one of the "gurus" of the field, so it is authoritative.
It has given me new ideas on how to approach Opinion Mining on short texts, especially a clever one: bootstrapping on the features (words and phrases) instead of on the texts. That is, in each iteration, the vocabulary is enriched at the same time as the collection gets tagged. This is somewhat borrowed from work by Riloff and Wiebe at EMNLP'03, discussed on slide 29.
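To make the idea more concrete, here is a minimal sketch of how I picture such a loop. It is a toy illustration of my own, not the procedure from the slides nor the method of Riloff and Wiebe. The function name `bootstrap_lexicon`, the seed words, the `min_count` threshold and the promotion rule (a word is added only if it appears with a single polarity) are all assumptions made for the example.

```python
from collections import Counter


def tag_texts(tokenized, pos, neg):
    """Tag each text whose known words lean towards one polarity."""
    labels = {}
    for i, tokens in enumerate(tokenized):
        score = sum(w in pos for w in tokens) - sum(w in neg for w in tokens)
        if score > 0:
            labels[i] = "pos"
        elif score < 0:
            labels[i] = "neg"
    return labels


def bootstrap_lexicon(texts, seed_pos, seed_neg, iterations=5, min_count=2):
    """Toy bootstrapping on features: grow the lexicon while tagging texts.

    Hypothetical rule: an unknown word joins a polarity set when it occurs
    in at least `min_count` texts of that polarity and in none of the other.
    """
    pos, neg = set(seed_pos), set(seed_neg)
    tokenized = [t.lower().split() for t in texts]

    for _ in range(iterations):
        # Tag the collection with the current vocabulary.
        labels = tag_texts(tokenized, pos, neg)

        # Count, per polarity, the tagged texts containing each unknown word.
        pos_counts, neg_counts = Counter(), Counter()
        for i, label in labels.items():
            unknown = set(tokenized[i]) - pos - neg
            (pos_counts if label == "pos" else neg_counts).update(unknown)

        # Promote words that were seen with one polarity only.
        new_pos = {w for w, c in pos_counts.items()
                   if c >= min_count and neg_counts[w] == 0}
        new_neg = {w for w, c in neg_counts.items()
                   if c >= min_count and pos_counts[w] == 0}
        if not new_pos and not new_neg:
            break  # nothing new learned, so stop early
        pos |= new_pos
        neg |= new_neg

    # Final tagging pass with the enriched vocabulary.
    return pos, neg, tag_texts(tokenized, pos, neg)


texts = [
    "great screen superb",
    "great camera superb",
    "awful battery laggy",
    "awful keyboard laggy",
    "superb sound",   # contains no seed word
    "laggy menus",    # contains no seed word
]
pos, neg, labels = bootstrap_lexicon(texts, {"great"}, {"awful"})
```

In this toy run the last two texts contain no seed word, so they can only be tagged once "superb" and "laggy" have been learned from the texts tagged in the first pass. A real system would need a much more careful promotion rule to avoid drifting on neutral words.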