31.3.09

First International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement

November 6, 2009, Hong Kong

This workshop seeks to bring together researchers in both computer science and social sciences who are interested in developing and using topic-sentiment analysis methods to measure mass opinion, and to foster communications between the research community and industry practitioners as well.

The increasing amount of user-generated content on the Internet and social media and the digitization of a large number of government and institutional documents provide new opinion-rich data sources for researchers to examine individual and group perceptions of products, organizations, and social issues at a large scale, and thus contribute to research and practice in the areas of political science, social policy, communications, and business intelligence.

On the other hand, researchers are tackling the problem of processing large amounts of opinion-rich data using various approaches. The increasing number of relevant publications in top data mining, information retrieval and natural language processing conferences (KDD, SIGIR, ACL, WWW, etc.) attests to the growing interest in automatic opinion analysis. Both TREC and TAC (Text Analysis Conference) have set up individual tracks for opinion retrieval and analysis tasks.

In recent years topic detection and tracking techniques have been well developed to identify the issues discussed in a large text collection. Sentiment analysis is catching up in detecting the polarity of opinions expressed in texts. However, real-world applications often have to take both topics and sentiments into consideration for precise opinion measurement. Topic and sentiment alignment is crucial for opinion retrieval, extraction, categorization, and aggregation on various issues. Topics and sentiments could also have sophisticated interactions. For example, the choice of topics and the attention distribution among topics might bear hidden opinions as well.
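To make "topic and sentiment alignment" a bit more concrete, here is a minimal sketch of my own (not part of the call): it attaches a sentence-level polarity to every topic mentioned in the same sentence and sums the result per topic. The lexicons, the sample sentences and all function names are invented for illustration; real systems rely on learned topic and sentiment models rather than keyword lists.

```python
from collections import defaultdict

# Hypothetical toy lexicons, invented for illustration only.
TOPIC_KEYWORDS = {
    "battery": {"battery", "charge", "charging"},
    "screen": {"screen", "display"},
}
POSITIVE = {"great", "good", "bright", "love"}
NEGATIVE = {"poor", "bad", "dim", "hate"}


def tokenize(sentence):
    """Lowercase a sentence and strip basic punctuation from each token."""
    return [w.strip(".,;:!?\"'").lower() for w in sentence.split()]


def sentence_polarity(tokens):
    """Return +1, -1 or 0 from a simple count of lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def align_topic_sentiment(sentences):
    """Sum sentence polarities per topic mentioned in the same sentence."""
    totals = defaultdict(int)
    for sentence in sentences:
        tokens = tokenize(sentence)
        polarity = sentence_polarity(tokens)
        for topic, keywords in TOPIC_KEYWORDS.items():
            if keywords & set(tokens):
                totals[topic] += polarity
    return dict(totals)


# Invented sample opinions.
reviews = [
    "The battery is great.",
    "The screen is dim and the display looks bad.",
    "I love the battery life.",
]
opinion_by_topic = align_topic_sentiment(reviews)
```

Aggregating per topic rather than per document is the point of the exercise: a single review can be positive about one issue and negative about another, and a document-level score would hide that.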

How do we build synergistic topic and sentiment models for text documents? How do we tackle the domain-dependency problem of sentiment analysis? How do we identify users' needs and integrate them into the design of opinion analysis systems? What are the successful applications of topic-sentiment analysis for mass opinion measurement? What lessons have the pioneers learned? How do we evaluate the automatic mass opinion measuring tools with regard to the reliability and validity? This workshop solicits submissions to address these problems and more.

We hope this workshop can advance research in topic-sentiment analysis, make connections between the research community and industry practitioners, and encourage the development of high-performance tools and systems that can work at web scale for real-world applications.

Suggested topics include, but are not limited to:

  • Opinion retrieval, extraction, categorization, and aggregation
  • Topic and sentiment alignment in opinion analysis
  • Applications of topic-sentiment analysis, e.g. corporate reputation measurement, political orientation categorization, customer preference study, public opinion study
  • Issues in using topic-sentiment analysis as a new research method for mass opinion estimation, such as reliability, validity, sample bias, etc.
  • Sentiment identification and filtering at various levels of text granularity
  • Domain-dependency of sentiment analyzers
  • Evaluation methodologies
  • Performance issues, scalability and efficiency
  • Web-based system demonstration
  • Novel algorithms, tools and systems
  • Construction of benchmark data sets

Important dates

  • Individual workshop papers due: July 20, 2009
  • Notification of Acceptance: August 10, 2009
  • Camera ready: August 15, 2009
  • Early registration deadline: August 15, 2009
  • Workshop: November 6, 2009

Seen via the SIG-IRList.

Recent news on botnets

Although botnets are not a new topic, and the Storm Worm already marked a before and an after, they seem to be in fashion, especially as far as specialization is concerned. Recent items via Hispasec (una al día):

  • GhostNet, a spy network at the service of... The University of Toronto has published an interesting research paper on "GhostNet", a botnet designed for a very specific purpose: espionage. This is not a botnet that grows exponentially and focuses on acquiring more nodes to increase its power; in this case the targets were handled in a highly personalized way, according to their importance. While a botnet can range from 30,000 to more than 100,000 bots, the number of infected machines in "GhostNet" is only 1,295, an almost insignificant population. (29/03/2009).
  • Routers, modems and botnets. For some weeks DroneBL has been suffering a distributed denial-of-service attack from a botnet called 'psyb0t'. Nothing new, considering that DroneBL offers a free service publishing real-time IP blacklists, which is not exactly a way to win admirers among malware authors, spammers, etc. The interesting part came when they gathered information about their attacker. (25/03/2009).
  • Does Conficker learn faster than Internet users? Yet another version of Conficker (this time called B++ by some vendors) hits systems (and the media). It is the viral scourge of the year, with a striking parallel in many respects to what came to be called the "Storm worm", which became the nightmare of all of 2007 and part of 2008. Conficker infection levels keep rising, and that first version, which only exploited a Microsoft vulnerability, is now far behind. Have we learned nothing? Can another textbook piece of malware appear and evolve in exactly the same way as one we already suffered two years ago? (01/03/2009).

30.3.09

Google Ads: From context to behaviour, through privacy

I have read on Larry Dignan's blog at ZDNet that Google says it will be making online advertising more behavioral. This means that Google will be collecting as much navigation information as it can to make its ads more relevant. Is this bad or good?

Well, as with many other things in technology (and life), nothing is black or white (IMHO). The main pro is that they can transform boring ads (ads that fit the context of the page they are in, or in other words, ads that fit your short-term interests) into useful ads (ads that take into account your long-term interests, less intrusive, more appealing). But at what cost?

The cost in user modeling, ad targeting, etc. is always the same: user privacy. Google has followed a reworked, cleverer, and considerably more useful and respectful approach than the old Internet portals did. The original portals tried to capture all user traffic by offering a lot of services and information inside the portal itself. The old Yahoo!, Infoseek, Altavista and others (some of them now dead) followed that approach, and lost out to the fresh air of Google's "I just want you to begin your Web experience here". Google has instead been offering more and more beta services to capture more and more of our net traffic, with two wins:

  • As services are betas, they do not have to give support. Use them at your own risk.
  • The more services you use (starting at Gmail and then more and more), the more they know about your interests.

The technology is the same one that online marketing corporations have been using for years, that is, cookies in your browser, served by affiliated sites:

These ads will associate categories of interest - say sports, gardening, cars, pets - with your browser, based on the types of sites you visit and the pages you view.

Cookies are the most widespread privacy damaging technology (apart from rootkits, etc.).
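To see how little machinery this takes, here is a minimal sketch of my own of the idea in the quote: page visits are counted per category against a cookie identifier, and the top categories become the "interests" tied to that browser. The site names, the category mapping and the cookie id are all hypothetical; nothing here reflects how Google actually implements it.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from visited site to interest category (an assumption).
SITE_CATEGORY = {
    "goals.example": "sports",
    "roses.example": "gardening",
    "engines.example": "cars",
}

# cookie id -> counts of categories seen for that browser
profiles = defaultdict(Counter)


def record_visit(cookie_id, site):
    """Count one page view against the browser's cookie, if the site is categorized."""
    category = SITE_CATEGORY.get(site)
    if category is not None:
        profiles[cookie_id][category] += 1


def top_interests(cookie_id, n=2):
    """Return the n categories most often seen for this cookie."""
    return [category for category, _ in profiles[cookie_id].most_common(n)]


# Invented browsing history for one browser.
for site in ["goals.example", "roses.example", "goals.example", "unknown.example"]:
    record_visit("cookie-123", site)

interests = top_interests("cookie-123")
```

Note that the profile is keyed by the cookie, not by a person: deleting the cookie (or opting out) is what breaks the link between the browsing history and the ads.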

OK, my point is not to say "Google is being evil again". At least they tell you that they will be automatically processing information about you; obviously you can just forgo those services, and you can also opt out.

Just weigh it up for yourself.

ECML PKDD Discovery Challenge on Social Networking Tag Recommendation

ECML PKDD Discovery Challenge Call for Participation
Bled, Slovenia, September 7-11 2009

This year's discovery challenge deals with three tasks in the area of tag recommendations for social bookmarking services. The first task covers content-based and the second task graph-based tag recommendations. An additional third task allows participants to deliver online recommendations to a running social bookmarking service. The dataset the challenge is based on is a snapshot of the social bookmark and publication sharing system Bibsonomy <http://www.bibsonomy.org/>. More details about the tasks can be found at the challenge website.
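As a rough illustration of what a content-based tag recommender does (the first task), here is a minimal sketch of my own: it suggests tags for a new post by scoring the tags of previously tagged posts according to how many title words they share with it. The sample posts, the tags and the scoring rule are invented; this is not the challenge baseline and it does not use the Bibsonomy data format.

```python
from collections import Counter

# Hypothetical, invented training posts: (title, tags).
TAGGED_POSTS = [
    ("Support vector machines for text classification", ["ml", "text"]),
    ("Spam filtering with naive Bayes text classifiers", ["spam", "ml", "text"]),
    ("Growing roses in small gardens", ["gardening"]),
]


def words(title):
    """Split a title into a set of lowercase words."""
    return set(title.lower().split())


def recommend_tags(title, k=2):
    """Score each known tag by word overlap between the new title and tagged titles."""
    query = words(title)
    scores = Counter()
    for post_title, tags in TAGGED_POSTS:
        overlap = len(query & words(post_title))
        for tag in tags:
            scores[tag] += overlap
    # Keep only tags supported by at least one shared word.
    return [tag for tag, score in scores.most_common(k) if score > 0]


suggested = recommend_tags("Text classification of spam")
```

A graph-based recommender (the second task) would instead exploit the user-resource-tag links, for example which tags a given user or similar users have already assigned.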

Important dates

  • March 25, 2009 Tasks and datasets available online.
  • July 6th, 2009 Test dataset will be released (by midnight CEST).
  • July 8th, 2009 Result submission deadline (by midnight CEST).
  • July 10th, 2009 Workshop paper submission deadline.
  • July 14th, 2009 Notification of winners, publication of results on webpage, notification of paper acceptance.
  • August 5th, 2009 Workshop proceedings (camera-ready) deadline.
  • September 7-11, 2009 ECML PKDD Workshop

26.3.09

Selected SIGIR 2009 Workshops

Among the (all interesting) workshops at the 32nd Annual ACM SIGIR Conference (the best event on Information Retrieval research), I would like to pick some that are very much to my taste:

  • Information Retrieval and Advertising: While computational advertising is still a relatively young research field, its significance is enormous as it provides the primary business model behind most of today's Web experience. Online advertising systems employ many IR techniques alongside approaches developed in statistical modeling and machine learning, large-scale data processing, optimization, microeconomics, and human-computer interaction. The purpose of this workshop is to bring together researchers from the different areas relevant to online advertising, strengthen collaborations between industry and academia, and provide a forum for discussion and presentation of late-breaking research. Date for papers: May 19, 2009
  • Search in Social Media: Social applications are the fastest growing segment of the web. While there has been progress on searching particular kinds of social media, such as blogs, search in others (facebook/myspace/flickr) are not as well understood. The purpose of this workshop is to focus the attention of the research community on this emerging topic, and to bring together information retrieval and social media researchers to consider the following questions: How should we search in social media? What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media? What works and what doesn't? Date for papers: June 8, 2009
  • Understanding the user - Logging and interpreting user interactions in information search and retrieval: Modern information search systems can benefit greatly from using additional information about the user and the user's behavior. Feedback data based on direct interaction (e.g., clicks, scrolling, etc.) as well as on general user profiles/preferences has been proven valuable for personalizing the retrieval process. New technology has made it inexpensive and easy to collect more feedback data and more different types of data (e.g., gaze, emotional, or biometric data). The workshop focuses on discussing and identifying most promising research directions with respect to logging, interpreting, integrating, and using feedback data. Ultimately, it will be aimed at arranging a commonly shared collection of user interaction logging tools for various purposes and based on a variety of feedback data sources. The workshop brings together researchers from IR as well as from human-computer interaction. Date for papers: May 18, 2009

I see the first and the last as very complementary. The second one is a must, especially with the great Marti Hearst among the chairs.

Robotics news (1/X)

News related to Robotics and Virtual Reality received via NotiWeb, the Madri+d newsletter:

  • A robot able to obey human gestures. Imagine a world with robotic butlers at your disposal 24 hours a day, responding without complaint to every order and able to satisfy every whim at once. SOURCE | El Mundo Digital.
  • A bionic eye lets a blind man see light after 30 years. A blind British man can perceive light after 30 years of total darkness thanks to a bionic eye implanted in a British hospital, the public broadcaster BBC reported. SOURCE | La Razón digit@l.
  • A helmet designed for hyper-realistic virtual travel. What was life like in ancient Greece? What sounds were heard in its streets? What smells filled the air? What did its food taste like? Until now, virtual reality could only stimulate some senses separately, usually sight and hearing, and that was its limit; but this may change in the very near future thanks to a project being developed by several British universities that could stimulate all five senses at once. SOURCE | El Mundo Digital.

Internet news (1/X)

News related to the Internet received via NotiWeb, the Madri+d newsletter:

25.3.09

Alias-i LingPipe list of NLP Tools

The people at Alias-i, who provide the Natural Language Processing package LingPipe (now mostly used in biomedicine, providing a suite of NLP tools: Part Of Speech Tagging, Named Entity Recognition, etc.), have kindly collected a list of NLP packages, including a large number of open-source ones.

I love this comment they make:

Search, Speech, Translation, OCR, ...? We have intentionally not listed competitors focused on things other than basic language processing tools. Companies in these businesses are more likely to be LingPipe customers than LingPipe competitors.

A good point for LingPipe people is that they provide an extensive number of tutorials on so many appealing topics that you could spend a month there testing and testing. Just do not miss the one on Sentiment Analysis (a black pearl in a box of white pearls!).

24.3.09

SEMAPRO 2009: The Third International Conference on Advances in Semantic Processing

October 11-16, 2009 - Sliema, Malta

The topics suggested by the conference can be discussed in terms of concepts, state of the art, research, standards, implementations, running experiments, applications, and industrial case studies. Authors are invited to submit complete unpublished papers, not under review at any other conference or journal, in the following topic areas, among others.

  • Basics on semantics
  • Ontology fundamentals for semantic processing
  • Semantic technologies
  • Semantic Deep Web
  • Semantic reasoning
  • Semantic content searching
  • Hypertext and hypermedia semantics
  • Semantic voice-video-speech (VVS) searching
  • Semantic multimedia
  • Semantic social media
  • Semantic networking
  • Domain-oriented semantic applications
  • Economics and governance of semantics technologies
  • Semantic applications/platforms/tools

Important deadlines:

  • Submission (full paper): May 20, 2009
  • Notification: June 25, 2009
  • Registration: July 12, 2009
  • Camera ready: July 15, 2009

GENIA treebank corpus version 1.0

The GENIA treebank corpus version 1.0 is now available at the GENIA project homepage: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/

The corpus contains 1,999 PubMed abstracts with part-of-speech and syntactic tree annotation.

Seen via the BioNLP list.

Workshop on Multimedia Information Retrieval

Next week, on March 30 and 31, as part of the International Week of Technological Innovation organized at the Universidad Europea de Madrid, Andreas Nürnberger will give a workshop on Multimedia Information Retrieval.

Attendance is free, and the workshop will be held in English at the UEM campus in Villaviciosa de Odón.

Workshop on Multimedia Information Retrieval
Prof. Andreas Nürnberger (Otto-von-Guericke-Universität Magdeburg)

Monday, March 30, 15:30 to 18:30, Sala de Grados, Building A
Tuesday, March 31, 10:30 to 13:00, Sala de Grados, Building A

Universidad Europea de Madrid
Campus de Villaviciosa de Odón
http://www.uem.es/es/como-llegar/campus-villaviciosa-de-odon

Contact: Diego Gachet Páez

23.3.09

SpamCop Real Time Spam Statistics

I have found several antispam vendors that feature realtime statistics about the percentage of spam received, the size, number or location of botnets, etc. The big numbers are important, interesting and useful, but sometimes details are also very interesting.

SpamCop, affiliated with Ironport, provides real-time statistics about the sources of spam and the URLs that it advertises. It is great fun to reload the page and see the spam sources (addresses and IPs) and advertised Web sites from the last 10-30 minutes. In the second case, you can also trace the IP address of the Web site.

Worth a look!

Free seminar on Social Network Security at UPM

Date: Wednesday, May 6, 2009
Time: 09:00 to 14:00
Registration: free, but prior registration is required
Venue: Sala de Grados 3004, EUITT-UPM, Madrid
How to get there: http://www.euitt.upm.es/escuela/como_llegar

On May 6, 2009, the Escuela Universitaria de Ingeniería Técnica de Telecomunicación of the Universidad Politécnica de Madrid will host the seminar "Seguridad en redes sociales: ¿están nuestros datos protegidos?" (Security in social networks: is our data protected?).

Attendance is free, although prior registration is required.

Organized by the Cátedra UPM Applus+ de Seguridad y Desarrollo de la Sociedad de la Información, the seminar will run from 09:00 to 14:00 and will analyze and debate issues related to the security of our personal data in this kind of social network, virtual environments that have gained considerable public prominence in recent years and have become a true and very topical mass phenomenon.

The recent report "Estudio sobre la privacidad de los datos personales y la seguridad de la información en las redes sociales online", prepared by the Instituto Nacional de Tecnologías de la Comunicación (INTECO) and the Agencia Española de Protección de Datos (AEPD) and dated February 2009, states:

"The prominence of these online social spaces is not free of risks or possible malicious attacks. This is a concern for the national, European and international organizations with responsibilities in the areas affected by the use of these networks, which have promoted the drafting of rules and recommendations aimed at guaranteeing safe access for users, with special attention to minors and persons lacking legal capacity, to these new online possibilities."

For this reason, the seminar is aimed at the general public and, very especially, at young people, teenagers and parents.

The seminar will feature distinguished guests:

  • Mr. Artemi Rallo Lombarte, Director of the Agencia Española de Protección de Datos (AEPD)
  • Mr. Arturo Canalda González, Children's Ombudsman (Defensor del Menor) of the Comunidad de Madrid
  • Ms. María Teresa González Aguado, University Ombudsperson (Defensora Universitaria) of UPM
  • Mr. Antonio Troncoso Reigada, Director of the Agencia de Protección de Datos de la Comunidad de Madrid (APDCM)
  • Mr. Emilio Aced Félez, Deputy Director for File Registration and Consultancy at APDCM
  • Ms. Gemma Déler Castro, Director of the IT & Telecom BU at Applus+
  • Mr. Ícaro Moyano Díaz, Director of Communication at Tuenti
  • Mr. Pablo Pérez San-José, Manager of the Observatorio de la Seguridad de la Información, Instituto Nacional de Tecnologías de la Comunicación (INTECO)

The planned talks are the following:

  • "Minors and new technologies", by Mr. Arturo Canalda.
  • "Social networks as a new environment of trust", by Mr. Ícaro Moyano.
  • "Social networks: a new frontier for the privacy of digital babies", by Mr. Emilio Aced.
  • "Diagnosis of information security and privacy in online social networks", by Mr. Pablo Pérez.

There will also be a panel discussion lasting an hour and a half, during which attendees can put their questions to a group of experts and debate the subject.

PRE-REGISTRATION: because seating is limited, pre-registration is required; it is recommended to do so on the Cátedra UPM Applus+ Web page, following the instructions given there.

The seminar will be streamed by video through the services of UPM's Gabinete de Tele-educación (GATE), from a URL that will be announced shortly on the Cátedra's Web site. In this case there is no need to register, since viewing is open.

The leaflet with the seminar program can be downloaded in PDF format from the Cátedra UPM Applus+ Web page.

For further information, please contact Ms. Beatriz Miguel Gutiérrez by e-mail at bmiguelATeuitt.upm.es or by phone at 91 336 7842; the phone line is attended only from 09:00 to 11:30.

A certificate of attendance with 3 CPE credits will be given to anyone who requests it.

17.3.09

BBVA Open Talent: Wipley: Social Gaming Platform

The Wipley project has been shortlisted in the BBVA Open Talent entrepreneurs' competition. Besides having the pleasure of knowing its authors, I find it a very interesting proposal that deserves not only my vote but also the voting widget. Vote from its page at BBVA Open Talent: Wipley: Social Gaming Platform, or from the widget: