Cisco 2009 Annual Security Report: Brazil overtakes the United States as the top spam sender (09/12/2009, Silicon News)
McAfee: United States, first in spam (14/12/2009, Revista de Internet)
José María Gómez Hidalgo's blog
My opinions about technology, the Internet, security and Artificial Intelligence
Fourteenth European Conference on Digital Libraries (ECDL 2010)
September 6-10, 2010
Glasgow, UK
The European Conference on Digital Libraries (ECDL) is the leading European scientific forum on digital libraries and associated technical, practical, and social issues, bringing together researchers, developers, content providers and users in the field. ECDL 2010, the 14th conference in this series, will be organised by the University of Glasgow. The proceedings will be published as a volume of Springer's Lecture Notes in Computer Science (LNCS) series.
Topics of interest include, but are not limited to:
Important dates

The publisher Software Press Sp. has made the January 2010 issue of Linux+ available as a free download. From now on, the magazine will be online. This issue carries the first in a series of articles on programming with Artificial Intelligence that I am preparing. It introduces the fundamental concepts of Machine Learning with examples from the WEKA library.
The reference is:
Gómez Hidalgo, J.M. Programando con inteligencia (artificial). Linux+ (ISSN: 1732-7121), Issue 62, January 2010.
Read it and comment :-)
NAACL-HLT 2010 Workshop on Active Learning for Natural Language Processing (ALNLP)
June 5 or 6, 2010, Los Angeles, CA
Labeled training data is required to achieve state-of-the-art performance for many machine learning solutions to NLP tasks. While traditional supervised methods rely exclusively on existing labeled data to induce a model, active learning allows the learner to select unlabeled data for labeling in an effort to reduce annotation costs without sacrificing performance. Thus, active learning appears promising for NLP applications where unlabeled data is readily available (e.g., web pages, audio recordings, minority language data), but obtaining labels is cost-prohibitive.
Ample recent work has demonstrated the effectiveness of active learning over a diverse range of applications. Despite these findings, active learning has not yet been widely adopted for many ongoing large-scale corpora annotation efforts -- resulting in a dearth of real-world case studies and copious research questions. Machine learning literature has primarily focused on active learning in the context of classification, devoting less attention to issues specific to NLP including annotation user studies, incorporation of semantic information, and more complex prediction tasks (e.g. parsing, machine translation).
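To make the idea in the workshop description concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling. It is only an illustration under my own assumptions: the 1-D features, the nearest-centroid learner, the `oracle` function standing in for a human annotator, and all the names are hypothetical and do not come from the workshop call.

```python
def train_centroids(labeled):
    """Toy learner: the per-class mean of 1-D features from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def uncertainty(x, centroids):
    """Higher value = more uncertain (smaller margin between the two nearest classes)."""
    dists = sorted(abs(x - c) for c in centroids.values())
    return -(dists[1] - dists[0])


def active_learning(seed, pool, oracle, budget):
    """Repeatedly ask the oracle to label the pool item the model is least sure about.

    The seed set must contain at least two classes (an assumption of this sketch).
    """
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        centroids = train_centroids(labeled)
        query = max(pool, key=lambda x: uncertainty(x, centroids))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # the only place labels are paid for
    return train_centroids(labeled), labeled


# Hypothetical data: the true boundary sits at 5.0, the annotator is simulated.
seed = [(0.0, "neg"), (10.0, "pos")]
pool = [1.0, 4.8, 9.0, 5.5]
oracle = lambda x: "pos" if x >= 5 else "neg"

centroids, labeled = active_learning(seed, pool, oracle, budget=2)
print(labeled[2:])  # the two items the learner chose to have labeled
```

With a budget of two labels, the learner spends both on the items closest to the class boundary and leaves the easy ones (1.0 and 9.0) unlabeled, which is the cost saving the description refers to.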
TOPICS
The aim of this workshop is to foster innovation and discussion that advances our understanding in these and other practical issues for active learning in NLP. Topics of particular interest include:
We also welcome case-study papers describing the application of active learning in real-world annotation projects and lessons learned thereby. Additionally, we would consider papers with insights applicable to NLP from other machine learning communities (e.g., computer vision, bioinformatics, and data mining), where annotation costs are also high.
IMPORTANT DATES
Some time ago I re-posted an XKCD joke about CAPTCHAs. Now it is time for more jokes and/or funny CAPTCHAs:
XD
Fourth Workshop on Information Credibility on the Web (WICOW 2010)
In conjunction with the 19th World Wide Web Conference 2010
April 26-30 (one day) 2010, Raleigh, NC, USA
WORKSHOP DESCRIPTION
As computers and computer networks become more common, a huge amount of information such as that found in Web documents has been accumulated and circulated. Such information gives many people a framework for organizing their private and professional lives. However, in general, the quality control of Web content is insufficient due to low publishing barriers. As a result, there is a lot of mistaken or unreliable information on the Web that can have detrimental effects on users. This situation calls for technology that would facilitate judging the trustworthiness of content and the quality and accuracy of the information that users encounter on the Web. Such technology should be able to handle a wide range of tasks: extracting credible information related to a given topic, organizing this information, detecting its provenance, clarifying background, facts, and other related opinions and their distribution, and so on. The problem of Web information reliability and Web data quality has also become apparent in view of the recent emergence of many popular Web 2.0 applications.
TOPICS
The aim of this workshop is to provide a forum for discussion on issues related to information credibility criteria and the process of its evaluation. We invite submissions on any aspect of information credibility on the Web. Topics include, but are not limited to:
IMPORTANT DATES

A new article in Linux+, this time focused on CAPTCHAs, with practical examples of reCAPTCHA. The lead:
Protegiendo formularios Web con reCAPTCHA ("Protecting Web forms with reCAPTCHA")
The essence of the Web, and even more so of Web 2.0, is interactivity. Gone are the days when only a privileged few had the chance to publish content on the Web. Today, hundreds of millions of people feed the network with their comments, photographs and videos, and use social networks such as Facebook or Tuenti daily to stay in touch with friends and colleagues, or to meet other people and have fun. But the opportunities to publish content are also opportunities for spammers and crackers to abuse them, spreading billions of junk e-mails, infecting millions of computers, and committing fraud on a large scale.
The full reference:
Gómez Hidalgo, J.M. Protegiendo formularios Web con reCAPTCHA. Linux+ (ISSN: 1732-7121), Issue 61, December 2009.
Finally, since Linux+ has made the first six issues of 2009 available on the Web for free, I have taken the liberty of extracting a couple of my articles from this year, linked in the references below:
Gómez Hidalgo, J.M. Privacidad en Flickr. Linux+ (ISSN: 1732-7121), Issue 53, April 2009.
Gómez Hidalgo, J.M., Puertas Sanz, E. Filtrado de pornografía usando análisis de imagen. Linux+ (ISSN: 1732-7121), Issue 51, February 2009.
IEEE Multimedia Magazine Special Issue on "Knowledge Discovery Over Community-Contributed Multimedia Data: Opportunities and Challenges"
The explosive growth of digital photos and videos; the prevalence of capture devices; and the advent of media-sharing services, such as Flickr and YouTube, have drastically increased the volume of community-contributed multimedia resources. Billions of photos, videos, and music shared on Web sites profoundly impact human society and pose a new challenge for designing efficient indexing, search, mining, and visualization methods for manipulating such large-scale media. Besides plain visual or audio signals, social media are augmented with rich context (such as user-provided tags, comments, geolocations, time, and device metadata), benefiting a wide variety of potential applications such as annotation, search, recommendation, advertising, and visualization.
The goal of this special issue is to present a concise reference of state-of-the-art efforts in knowledge discovery over large-scale social media, and in particular the entailed opportunities and challenges given the nascent status of this arena. Specifically, the special issue is intended to present both survey and original research articles (in a tutorial manner readable by nonspecialists) on emerging theoretical and practical deployments as well as illustrative applications for annotation, indexing and search, mining, recommendation, advertising, and visualization over social media. It also focuses on the rich context information and its mobile usage for social media. The Special Issue organizers believe it will offer a timely collection of information to benefit the researchers and practitioners working in the broad multimedia community.
Topics of interest include, but are not limited to
Important Dates:
A link from Jose Carlos Cortizo, which reached me via Facebook via Twitter, has drawn my attention to Social Network Datasets. It starts with the Social Tagging list by Markus Strohmaier at his blog Intentialicious. But the comments include several other lists of datasets, most prominently:
Which in fact leads me to ask: do opinion mining datasets count? OK, it is not Social Tagging (at least, not thematic tags). So I share the only two Opinion Mining/Sentiment Analysis datasets in Spanish I am aware of:
And after a quick search on Opinion Mining datasets, I found the blog by Bruno Ohana, with a very interesting post that presents a short tutorial on Opinion Mining with RapidMiner. While it is not really Opinion Mining (it does not use any sentiment features and simply approaches the task as any other classification task: bag of words, etc.), I find it very interesting because it is a great tutorial on using this suite for text classification!
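For readers who want to see what "just bag of words" means in practice, here is a minimal sketch in plain Python. It is not the RapidMiner process from the tutorial: the tiny training set, the whitespace tokenizer, the choice of multinomial Naive Bayes with add-one smoothing, and the function names are all my own assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Bag of words: lowercase and split on whitespace, ignoring word order."""
    return text.lower().split()


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts the classifier needs."""
    word_counts = defaultdict(Counter)  # label -> word -> frequency
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Multinomial Naive Bayes with add-one smoothing; unseen words are skipped."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, invented for this example.
reviews = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]
model = train(reviews)
print(predict(model, "great wonderful plot"))
print(predict(model, "awful boring movie"))
```

Note that nothing here knows about sentiment: the same code would classify topics or spam, which is exactly why the tutorial is really about generic text classification.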
A way to control how your children use Amazon, or even to let them buy under your control and supervision. A message I received:
We're excited to introduce to you, and your whole family, a new way to check out online. It's Amazon PayPhrase - the easy-to-remember shortcut to your Amazon.com shipping and payment settings. With their own PayPhrase, teens and college students can shop online within limits you set, and you don't have to share your credit card number or account credentials.
A PayPhrase is a phrase you create, such as "slam dunk," "totally awesome," or "Jake's Allowance." You specify the shipping address and credit card for the PayPhrase, in addition to monthly spending limits and approvals.
Set up a PayPhrase for your teen to buy clothes or for your student to buy textbooks. With their own PayPhrase and PIN, your teen or student can shop online within spending limits you set, or you can approve each purchase.
We invite you to learn more at www.amazon.com/allowance.
I believe it is a good point...
Tenth ACM Symposium on Document Engineering
Manchester, UK, September 21-24, 2010
The ACM Symposium on Document Engineering provides an annual international forum for presentations and discussions on principles, tools and processes that improve our ability to create, manage and maintain documents. Proceedings are available through the ACM Digital Library.
TOPICS & TECHNOLOGIES
IMPORTANT DATES
Full papers & working sessions
Short papers, posters & demos
All papers