Nihil Obstat: December 2009

22.12.09

Night and day (01): spam-sending countries

Cisco 2009 Annual Security Report: Brazil overtakes the United States as the top spam sender (09/12/2009, Silicon News)

McAfee: United States, first in spam (14/12/2009, Revista de Internet)

21.12.09

CFP: Fourteenth European Conference on Digital Libraries (ECDL 2010)

Fourteenth European Conference on Digital Libraries (ECDL 2010)
September 6-10, 2010
Glasgow, UK

The European Conference on Digital Libraries (ECDL) is the leading European scientific forum on digital libraries and associated technical, practical, and social issues, bringing together researchers, developers, content providers and users in the field. ECDL 2010, the 14th conference in this series, will be organised by the University of Glasgow. The proceedings will be published as a volume of Springer's Lecture Notes in Computer Science (LNCS) series.

Topics of interest include, but are not limited to:

Digital Libraries and Mobility
Digital Library Architectures
Digital Library Infrastructure
Digital Preservation and Curation
Information Mining in Digital Libraries
Information Retrieval in Digital Libraries
Interoperability of Digital Library Systems and Services
Knowledge Organisation Systems
Metadata Standards and Protocols in Digital Library Systems
Multilinguality in Digital Libraries
Multimedia Digital Libraries
Personal Information Management and Personal Digital Libraries
Personalisation in Digital Library Systems and Settings
Policies for Digital Library Systems
Social Networking, Web 2.0 and Collaborative Interfaces in Digital Libraries
User Interfaces for Digital Libraries
User Studies for and Evaluation of Digital Library Systems and Applications
Visualisation in Digital Libraries

Important dates

Abstract submission: February 26, 2010
Full paper submission: March 1, 2010
Notification of acceptance: May 3, 2010
Submission of final version: May 24, 2010

19.12.09

The publisher Software Press Sp. has made the January 2010 issue of Linux+ available as a free download. From now on, the
magazine will be published online. This issue carries the first in a series of articles on programming with Artificial Intelligence that I am
preparing. It introduces the fundamental concepts of Machine Learning, with examples using the WEKA library.

The reference is:

Gómez Hidalgo, J.M. Programando con inteligencia (artificial). Linux+ (ISSN: 1732-7121), No. 62, January 2010.

Read it and comment :-)

NAACL-HLT 2010 Workshop on Active Learning for Natural Language Processing (ALNLP)

June 5 or 6, 2010, Los Angeles, CA

Labeled training data is required to achieve state-of-the-art performance for many machine learning solutions to NLP tasks. While traditional supervised methods rely exclusively on existing labeled data to induce a model, active learning allows the learner to select unlabeled data for labeling in an effort to reduce annotation costs without sacrificing performance. Thus, active learning appears promising for NLP applications where unlabeled data is readily available (e.g., web pages, audio recordings, minority language data), but obtaining labels is cost-prohibitive.
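To make the selection step concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, in plain Python. It is only an illustration and does not come from the workshop call: the one-dimensional data, the midpoint-of-class-means model, and the names `boundary`, `active_learn` and `oracle` are all assumptions chosen to keep the example short. The `oracle` function stands in for the human annotator whose effort we want to save.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = List[Tuple[float, int]]


def boundary(labeled: Labeled) -> float:
    """Toy model: the threshold is the midpoint between the two class means.

    Assumes `labeled` holds at least one example of each class (0 and 1).
    """
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def active_learn(pool: Sequence[float],
                 oracle: Callable[[float], int],
                 seeds: Labeled,
                 budget: int) -> Tuple[float, Labeled]:
    """Label at most `budget` pool items, always querying the most uncertain one."""
    labeled = list(seeds)
    seen = {x for x, _ in seeds}
    unlabeled = [x for x in pool if x not in seen]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = boundary(labeled)
        # Uncertainty sampling: the item closest to the current boundary
        # is the one this toy model is least sure about.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the costly annotation step
    return boundary(labeled), labeled


if __name__ == "__main__":
    # Hypothetical pool and annotator: the true class flips at 0.6.
    pool = [i / 20 for i in range(21)]
    oracle = lambda x: int(x >= 0.6)
    threshold, labeled = active_learn(pool, oracle, [(0.0, 0), (1.0, 1)], budget=5)
    print(threshold, labeled)
```

Only the seed examples plus `budget` queried items are ever labeled; the rest of the pool stays unannotated, which is where the cost saving would come from in a real annotation project.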

Ample recent work has demonstrated the effectiveness of active learning over a diverse range of applications. Despite these findings, active learning has not yet been widely adopted for many ongoing large-scale corpora annotation efforts -- resulting in a dearth of real-world case studies and copious research questions. Machine learning literature has primarily focused on active learning in the context of classification, devoting less attention to issues specific to NLP including annotation user studies, incorporation of semantic information, and more complex prediction tasks (e.g. parsing, machine translation).

TOPICS

The aim of this workshop is to foster innovation and discussion that advances our understanding in these and other practical issues for active learning in NLP. Topics of particular interest include:

Alternative query types: labeling features rather than instances, mixed-resolution queries for structured instances, etc.
Creative ways for obtaining data via active learning (e.g., online games, Mechanical Turk)
Managing multiple, possibly non-expert annotators (e.g., "crowdsourcing" environments)
Reusability: using data acquired with one active learner to train other model classes
Domain adaptation and active learning
Multi-task active learning
Criteria for stopping and monitoring active learning progress
Active learning in coordination with semi-supervised or unsupervised learning approaches
Interactive active learning interfaces and other HCI issues
Parallelization of active learning and its computational challenges
Software engineering considerations for active learning and NLP
Theoretical analysis of active learning

We also welcome case-study papers describing the application of active learning in real-world annotation projects and lessons learned thereby. Additionally, we would consider papers with insights applicable to NLP from other machine learning communities (e.g., computer vision, bioinformatics, and data mining), where annotation costs are also high.

IMPORTANT DATES

March 1, 2010: Paper Submission Deadline
March 30, 2010: Notification of acceptance
June 5 or 6, 2010: Workshop held in conjunction with NAACL-HLT

18.12.09

Funny CAPTCHAS

Some time ago I re-posted an XKCD joke about CAPTCHAs. Now it is time for more jokes and/or funny CAPTCHAs:

CFP: Fourth Workshop on Information Credibility on the Web (WICOW 2010)

Fourth Workshop on Information Credibility on the Web (WICOW 2010)
In conjunction with the 19th World Wide Web Conference 2010
April 26-30 (one day) 2010, Raleigh, NC, USA

WORKSHOP DESCRIPTION

As computers and computer networks become more common, a huge amount of information such as that found in Web documents has been accumulated and circulated. Such information gives many people a framework for organizing their private and professional lives. However, in general, the quality control of Web content is insufficient due to low publishing barriers. As a result, there is a lot of mistaken or unreliable information on the Web that can have detrimental effects on users. This situation calls for technology that would facilitate judging the trustworthiness of content and the quality and accuracy of the information that users encounter on the Web. Such technology should be able to handle a wide range of tasks: extracting credible information related to a given topic, organizing this information, detecting its provenance, clarifying background, facts, and other related opinions and the distribution of them, and so on. The problem of Web information reliability and Web data quality has also become apparent in view of the recent emergence of many popular Web 2.0 applications.

TOPICS

The aim of this workshop is to provide a forum for discussion on issues related to information credibility criteria and the process of its evaluation. We invite submissions on any aspect of information credibility on the Web. Topics include, but are not limited to:

Information credibility evaluation and its applications
Web content analysis for credibility evaluation
Author's intent detection
Content quality and credibility in Web archiving
Credibility of Web search results
Search models for trustworthy content on the Web
Conflicting opinion detection
News credibility
Multimedia content credibility
Credibility evaluation of user-generated content (e.g., Wikipedia, Q&A)
Information credibility evaluation in social networks
Analysis of information dissemination on the Web
Spatial and temporal aspects in information credibility on the Web
Information credibility theory and fundamentals
Estimation of information age, provenance and validity
Estimation of author's and publisher's reputation
Sociological and psychological aspects of information credibility estimation
User studies for information credibility evaluation
Persuasive technologies
Information credibility in online advertising
Web spam detection
Data consistency and provenance
Processing uncertain data and information
Modeling trust on the Web
Credible interaction on the Web
Credibility and trust in e-commerce

IMPORTANT DATES

January 25, 2010 - Paper submission deadline
February 12, 2010 - Notification of acceptance
February 19, 2010 - Camera ready deadline
April 26-30 (one day), 2010 - Workshop

15.12.09

Linux+: Protecting Web forms with reCAPTCHA

A new article in Linux+, this time focused on CAPTCHAs, with practical reCAPTCHA examples. The lead paragraph:

Protegiendo formularios Web con reCAPTCHA (Protecting Web forms with reCAPTCHA)

The essence of the Web, and even more so of Web 2.0, is interactivity. Gone are the days when only a privileged few had the chance to publish content on the Web. Today, hundreds of millions of people feed the network with their comments, photos and videos, and use social networks such as Facebook or Tuenti every day to stay in touch with friends and colleagues, or to meet other people, entertain themselves and have fun. But the opportunities to publish content are also opportunities for spammers and crackers to abuse them, spreading billions of junk emails, infecting millions of computers, and committing fraud on a large scale.

The full reference:

Gómez Hidalgo, J.M. Protegiendo formularios Web con reCAPTCHA. Linux+ (ISSN: 1732-7121), No. 61, December 2009.

Finally, since Linux+ has made the first six issues of 2009 freely available on the Web, I have taken the liberty of extracting a couple of my articles from this year, linked in the references below:

Gómez Hidalgo, J.M. Privacidad en Flickr. Linux+ (ISSN: 1732-7121), No. 53, April 2009.

Gómez Hidalgo, J.M., Puertas Sánz, E. Filtrado de pornografía usando análisis de imagen. Linux+ (ISSN: 1732-7121), No. 51, February 2009.

7.12.09

IEEE Multimedia Magazine Special Issue on "Knowledge Discovery Over Community-Contributed Multimedia Data: Opportunities and Challenges"

The explosive growth of digital photos and videos; the prevalence of capture devices; and the advent of media-sharing services, such as Flickr and YouTube, have drastically increased the volume of community-contributed multimedia resources. Billions of photos, videos, and music shared on Web sites profoundly impact human society and pose a new challenge for designing efficient indexing, search, mining, and visualization methods for manipulating such large-scale media. Besides plain visual or audio signals, social media are augmented with rich context (such as user-provided tags, comments, geolocations, time, and device metadata), benefiting a wide variety of potential applications such as annotation, search, recommendation, advertising, and visualization.

The goal of this special issue is to present a concise reference of state-of-the-art efforts in knowledge discovery over large-scale social media, and in particular the entailed opportunities and challenges given the nascent status of this arena. Specifically, the special issue is intended to present both survey and original research articles (in a tutorial manner readable by nonspecialists) on emerging theoretical and practical deployments as well as illustrative applications for annotation, indexing and search, mining, recommendation, advertising, and visualization over social media. It also focuses on the rich context information and its mobile usage for social media. The Special Issue organizers believe it will offer a timely collection of information to benefit the researchers and practitioners working in the broad multimedia community.

Topics of interest include, but are not limited to

social media annotation and tagging,
large-scale social media search,
event detection and summarization in social media,
visualization of social media (for example, event summarization and 3D scene navigation),
personalized media recommendation,
contextual media advertising,
modeling and mining context in social media,
context-aware mobile multimedia applications,
social media as training data for classification/detection learning,
benchmark data for large-scale social media applications,
distributed/parallel algorithms and platforms for large-scale social media computation, and
novel and challenging applications of social media.

Important Dates:

15 December 2009: Full submission due.
1 March 2010: Notification of acceptance.
5 May 2010: Revisions due.
15 June 2010: Final versions due.

4.12.09

Learning over Social Network Datasets & Opinion Mining post with Rapid Miner

A recent URL received via FB via TW from Jose Carlos Cortizo has drawn my attention to Social Network Datasets. It begins with the Social Tagging list by Markus Strohmaier at his blog Intentialicious. But the comments include several other lists of datasets, most prominently:

Folksonomy data sets listed by the Tagora project.
Social Tagging datasets @ UNED NLP Group (already included in Markus's compilation).
An incredibly long list of datasets (most not involving Social Tagging, nor even Social Networks, but interesting anyway...) collected by Peter Skomoroch @ Data Wrangling.

Which leads me to ask: do opinion mining datasets count? OK, it is not Social Tagging (at least, not thematic tags). Still, here are the only two Opinion Mining/Sentiment Analysis datasets in Spanish I am aware of:

Movie review corpus in Spanish by Fermín L. Cruz Mata
The SFU Spanish Review Corpus, by Maite Taboada & Julian Brooke

And after a quick search on Opinion Mining datasets, I have found the blog by Bruno Ohana, with a very interesting post that presents a short tutorial about Opinion Mining with Rapid Miner. While it is not really Opinion Mining (it does not use any sentiment features and approaches the task as any other classification task: bag of words, etc.), I find it very interesting because it is a great tutorial on using this suite for text classification!
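For readers who have not seen the bag-of-words approach, here is a minimal sketch of what it amounts to: each review is reduced to word counts and fed to an ordinary classifier, with no sentiment lexicon involved. This is not the Rapid Miner process from the tutorial; it is a hand-rolled multinomial Naive Bayes in plain Python, and the four toy reviews and the names `tokenize`, `train` and `classify` are made up for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def tokenize(text: str) -> List[str]:
    """Bag of words: lowercase, split on whitespace, ignore word order."""
    return text.lower().split()


def train(docs: List[Tuple[str, str]]) -> Model:
    """Count words per label and documents per label."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    label_counts: Counter = Counter()
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text: str, model: Model) -> str:
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set, not taken from any real corpus.
    reviews = [
        ("a great and moving film", "pos"),
        ("great acting and a wonderful story", "pos"),
        ("a boring and predictable plot", "neg"),
        ("terrible acting and a boring film", "neg"),
    ]
    model = train(reviews)
    print(classify("a wonderful and great story", model))
    print(classify("boring and terrible plot", model))
```

Nothing in the code knows that "great" is positive; it only learns that the word co-occurs with the "pos" label, which is exactly why this counts as plain text classification rather than sentiment analysis proper.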

2.12.09

Amazon Payments Announces Parental Controls for Online Shopping

A way to control how your children use Amazon, or even to let them buy under control and supervision. A message I received:

We're excited to introduce to you, and your whole family, a new way to checkout online. It's Amazon PayPhrase - the easy-to-remember shortcut to your Amazon.com shipping and payment settings. With their own PayPhrase, teens and college students can shop online within limits you set, and you don't have to share your credit card number or account credentials.

A PayPhrase is a phrase you create, such as "slam dunk," "totally awesome," or "Jake's Allowance." You specify the shipping address and credit card for the PayPhrase, in addition to monthly spending limits and approvals.

Setup a PayPhrase for your teen to buy clothes or for your student to buy text books. With their own PayPhrase and PIN, your teen or student can shop online within spending limits you set or you can approve each purchase.

We invite you to learn more at www.amazon.com/allowance.

I believe it is a good idea...

CFP: Tenth ACM Symposium on Document Engineering (DocEng 2010)

Tenth ACM Symposium on Document Engineering
Manchester, UK, September 21-24, 2010

The ACM Symposium on Document Engineering provides an annual international forum for presentations and discussions on principles, tools and processes that improve our ability to create, manage and maintain documents. Proceedings are available through the ACM Digital Library.

TOPICS & TECHNOLOGIES

Document Representations - Standards (ODF, PDF), models, type representation, metadata (MPEG-7, RDF), Style Sheets (CSS, XSL), Markup Languages (SGML, XML), Multimedia (MPEG-4, SMIL, MHEG, NCL), Multilingual representations, temporal aspects
Document Manipulation - Document transformation (XSLT, XQuery), Adaptive Documents, Document presentation (typography, formatting, layout)
Document Systems - Workflow, cooperation, web services, social networking, engineering life cycle
Document System Components - Security, APIs (SAX, DOM), synchronization, System performance
Document collections - Databases: Storage, Indexing, Retrieval, Content Management Systems, E-books
Document Linking - Techniques (XLink, XPath, XPointer), Blogs, Wikis, Integration with other digital artefacts
Document generation - authoring tools and systems, variable data printing, automatically generated documents
Document Analysis - structure, layout and content analysis, categorization, classification, character recognition

IMPORTANT DATES

Full papers & working sessions

Abstracts due April 2, 2010
Papers due April 16, 2010
Acceptance notice by May 14, 2010

Short papers, posters & demos

Abstracts due May 21, 2010
Papers due May 28, 2010
Acceptance notice by June 18, 2010

All papers

Revised versions due July 2, 2010