Nihil Obstat: December 2010

30.12.10

Microsoft Research vs. Google Research - a quick look at two systems

Two different ways of thinking (and acting) about research, but the same interests most of the time. You can get a list of Microsoft Research projects; two of them are very interesting to me because I am used to relying on their competitors from Google:

Microsoft Bing Translator (vs. Google Translate). Although Google Translate has scored top in competitions (with some criticism of the quality metrics, however), I believe this one is clearly worth exploring. Over the next few weeks I will be testing both on my daily writing; let's see what happens.
Microsoft Academic Search (vs. Google Scholar). Here we have an interesting point, as the result lists for the same search are quite different, perhaps reflecting two ways of doing things and even of thinking about the business.

Comparing Academic vs. Scholar by, of course, searching for myself, I get the following two result lists (screenshots):

Microsoft Academic Search

[Screenshot: MS Academic Search results]

Google Scholar

[Screenshot: Google Scholar results]

Comparing:

MS Academic Search:
Fewer results, probably better precision
Typed results (persons, journals, etc.)
Structured view

Google Scholar:
More results, probably better recall
Fewer typed results (citations, which can be excluded)
Papers and citations mixed

For the purpose of getting an overview of somebody's publication record, I like MS Academic Search more. For getting the most out of impact information, Google Scholar is probably better. In fact, I have had to restrict the search in Scholar using phrase search ("word1 word2") in order to get (many) fewer false positives.
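
To make the precision vs. recall trade-off a bit more concrete, here is a minimal sketch in Python. All the data is made up for illustration (the paper identifiers and the sizes of the result lists are my assumptions, not real output from either search engine): a short, clean result list scores high on precision, while a long, noisy one scores high on recall.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # results that are actually correct
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: 10 papers really written by the author.
relevant = {f"p{i}" for i in range(1, 11)}

# Hypothetical short, clean list: 6 results, all of them correct.
short_list = {f"p{i}" for i in range(1, 7)}

# Hypothetical long list: 9 correct papers plus 11 false positives.
long_list = {f"p{i}" for i in range(1, 10)} | {f"x{i}" for i in range(1, 12)}

print(precision_recall(short_list, relevant))  # (1.0, 0.6)
print(precision_recall(long_list, relevant))   # (0.45, 0.9)
```

In this toy setting, a phrase search acts like a filter on the long list: it removes false positives, which raises precision, possibly at the cost of some recall.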

How to generate spam in Facebook (and other Social Networks)

This is not a brilliant discovery; moreover, it is being done right now and it has been reported in several places. But I believe it is interesting... and I find it fun to write about.

The goal of spam is to get your potential reader to receive and read the message. Well, the final goal is to get the fake product purchased, but you have to start by getting the message read! Social Networks usually send you updates and other messages about what is happening with you and your contacts. For instance, somebody is now following you on Twitter, you have a message on LinkedIn, or you have been tagged in a picture on Facebook. The idea is to use these messages to get your attention; you will read them, because you trust them and they relate to personal topics!

Just follow this procedure:

Get several accounts in the Social Network.
Collect a list of users to send the spam to.
If it is possible to personalize a "contact request" message, send messages to a hundred or so other users. Not too many, or this behavior will not go unnoticed by the Social Network. E.g. this is possible in LinkedIn.
If it is possible to tag people in pictures, create several pictures with your favorite Rolex-Viagra-whatever message and the link, post several in each of your fake accounts, and tag the target users in each one (4-5 users per picture...). E.g. this is possible in Facebook.
(...) Check and use whatever other method to send an alert to a user in the Social Network, and exploit it in the same fashion.
Obviously, you will be getting your accounts blocked. Get more!

In order to do this effectively, you need to automate all the steps in the process. Using the APIs, it is not difficult, but you have to automatically solve several CAPTCHAs along the way. Check how at "Strong CAPTCHA Guidelines".

What makes me smile about this is that spam has increasingly become a matter of social engineering, and that is just what Social Networks enable! OK, perhaps these attacks are not so feasible...

29.12.10

Forthcoming CFPs

Here is a list of forthcoming CFPs:

Ninth International Workshop on Content-Based Multimedia Indexing, 13-15 June 2011, Madrid, Spain. Deadline: 14 January 2011
Twenty-Eighth International Conference on Machine Learning (ICML), 28 June - 02 July, 2011. Deadline: 01 February 2011
Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 16-22 July, 2011, Barcelona, Spain. Deadline: 24 January, 2011
- Fifth International AAAI Conference on Weblogs and Social Media, 17-21 July 2011, Barcelona, Spain. Deadline: 31 January, 2011
- Check other co-located events that may be of interest.
Virus Bulletin 2011, 5-7 October 2011, Barcelona, Spain. Deadline: 11 March 2011

28.12.10

CFP: Second Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA)

Second Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA)
To be held in conjunction with the ACL-HLT 2011 Joint Conference
Portland, Oregon, USA, June 24, 2011

Scope

Recent years have marked the beginning and expansion of the Social Web, in which people freely express and respond to opinion on a whole variety of topics. While the growing volume of subjective information available allows for better and more informed decisions of the users, the quantity of data to be analyzed imposed the development of specialized Natural Language Processing (NLP) systems that automatically detect subjectivity in text and subsequently extract, classify and summarize the opinions available on different topics. Although these research fields have been highly dynamic in the past years, dealing with subjectivity in text has proven to be a complex, interdisciplinary problem that remains far from being solved. Its challenges include the need to address the issue from different perspectives and at different levels, depending on the characteristics of the textual genre, the language(s) treated and the final application for which the analysis is done.

Inspired by the objectives the organizers aimed at in the first edition of this workshop and the final outcome, the purpose of WASSA 2.011 is to create a framework for presenting and discussing the challenges related to subjectivity and sentiment analysis in NLP, from a theoretical and practical point of view. Moreover, taking into account that subjectivity-related phenomena have also been studied by other disciplines, such as Psychology, Philosophy, Economics, with WASSA 2.011 the organizers would also like to open the door to an interdisciplinary dialogue on the nature, implications and applications of the topic(s) discussed. They envisage WASSA as a forum to discuss the achievements obtained so far and to analyse the different approaches to tackle the difficulties researchers are confronted with in this research area.

Topics

Affect, emotion, feeling, subjectivity, sentiment - concept definition and related NLP tasks;
Resources for subjectivity and sentiment analysis;
Subjectivity and opinion retrieval, extraction, categorization, aggregation and summarization;
Topic and sentiment studies and applications of topic-sentiment analysis;
Mass opinion estimation based on NLP and statistical models;
Domain, topic and genre dependency of sentiment analysis;
Ambiguity issues and word sense disambiguation of subjective language;
Proposals involving the computational treatment of large amounts of data;
Pragmatic analysis of the opinion mining task;
Use of Semantic Web technologies for subjectivity and sentiment analysis;
Improvement of NLP tasks using subjectivity and/or sentiment analysis;
Adaptation of traditional tasks to the opinion scenario: opinion IR, QA, summarization;
Intrinsic and extrinsic evaluation methodologies;
Real-world applications of opinion mining systems.

The organizers also encourage participants to provide demos of their systems, thus giving them the opportunity to obtain feedback on their achievements and issues. At the same time, with the help of demos, it is aimed at enriching the discussion forum with application-specific topics for debate.

Dates

Paper due date: March 15, 2011
Notification of acceptance: April 20, 2011
Camera-ready deadline: May 06, 2011

21.12.10

FBI funny spam

Interesting spam:

FBI Headquarters in Washington, D.C.
Federal Bureau of Investigation
J. Edgar Hoover Building
935 Pennsylvania Avenue, NW
Washington, D.C. 20535-0001
Tel: 203-413-1789
<hidden link>

Attn: Beneficiary,

VIEW THE ATTACHMENT FOR YOUR CONFIRMATION

It is not surprising: I receive invitations from the FBI every day; moreover, I open the attachments and I have switched off the antivirus, because I DO TRUST XD

15.12.10

Donde dice... The Bulletin of the Fundación del Español Urgente

Another bit of casual browsing, this time looking for the origin of the expression "A buenas horas, mangas verdes", has led me to discover both the Fundéu BBVA (Fundación del Español Urgente) and its periodic bulletin, "Donde dice...". The Fundación del Español Urgente responds to the need described by Víctor García de la Concha, Director of the Real Academia Española and president of the foundation:

"By its very name, the Fundación del Español Urgente proclaims its will to attend to the immediate, to pressing current affairs. It was born, indeed, to give an urgent answer to the linguistic doubts that continually assail us. Specifically, to those that arise for the journalist who is writing a news story: how should the name of that rare Tibetan tribe that has just made the news be transliterated? Does talibán, in its Spanish version, need a plural marker that it already carries in its Arabic form? Talibán, then, or talibanes?" -- Víctor García de la Concha, p. 1, "Donde dice..." issue 12, July-August-September 2008.

One of the first tasks this foundation has taken on is the launch of an international seminar devoted to "El lenguaje de los jóvenes" (the language of young people), an event extensively covered in the aforementioned issue of the bulletin "Donde dice...". I strongly recommend reading that issue, since almost all the contributions it includes are of great value.

I was particularly struck by the article "El rostro bárbaro del mañana", by the writer José Ángel Mañas, which is the transcript of the inaugural lecture he gave at the seminar. In my opinion it is a brilliant speech, for its clarity and for its love of an ever-changing language, which in the hands of young people seems to have sunk into barbarism -- when they are the ones who make the language of tomorrow, contributing richness and beauty and not always impoverishing communication. As Mañas proclaims:

"Let the anglicisms come in, and naturally we will see which ones take root, because they are useful, because they fill a conceptual gap and enrich the language, or because we like them, and which ones turn out to be merely passing fads, for the former will remain and the latter will disappear just as naturally as they arrived. (...) Let us accept everything that brings richness, new nuances, increased polysemy; but let us fight against everything that implies poverty or logical imprecision." -- José Ángel Mañas, p. 9, "Donde dice..." issue 12, July-August-September 2008.

In short, let the linguistic ecosystem regulate itself, but with a watchful eye on what adds value rather than subtracting it. I could not agree more.

8.12.10

Latest contributions to Novática

A list of my latest contributions to Novática:

José María Gómez Hidalgo. Experiencias de investigación en la universidad y en la empresa. Novática, Revista de la Asociación de Técnicos de Informática, No. 206, July-August 2010, year XXXVI.

With Manuel Maña, the Referencias Autorizadas (authoritative references), section on Information Access and Retrieval:

Referencias Autorizadas, issue 205 - The Mallet package, and music recommender systems.
Referencias Autorizadas, issue 204 - Facebook Open Graph, Web People Search Evaluation, and the book Estimating the Query Difficulty for Information Retrieval.
Referencias Autorizadas, issue 203 - The article "From Frequency to Meaning: Vector Space Models of Semantics", the Yahoo! Research Learning to Rank Challenge, and the Snappy Words system.

CAPTCHA advertisers

When a technology is mature, somebody finds out how to exploit it to increase the biggest business on the Web, which is advertising. Since the early days of CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), first used by AltaVista to protect its "Add URL" service and avoid Search Engine Spam, the technology has evolved into commercial services that promise to turn your Web form protection into revenue.

All these services are based on a simple idea: replace the text of the CAPTCHA with a commercial message (your slogan, etc.) in a richer layout (more "advertising"). The attention paid to solving the challenge is redirected to the message, the advertiser gets a click (and possibly more sales), and the form publisher gets a share of the business.

Some of the available services include:

These services are still not widespread, and we do not know how long they will remain unbroken -- it is a matter of time before they get the attention of hackers and spammers; or researchers :-)