My reflections on technology, the Internet, security and artificial intelligence.
As data mining enters the mainstream, with more and more applications on the market, and reaches the average user through a range of online applications, people become more and more conscious of what can be done and, more importantly, of how it is being done. A lack of information about the data, the methods, and so on lets applications show things you did not expect (and that are sometimes far from reality).
"You have to feel it yourself," Aaron Zinman, a researcher at the MIT Sociable Media Group, may have thought. "And you will, with Personas." So he has prepared an online piece, Personas, framed within the installation by the Sociable Media Group at the MIT Museum. You can get a feeling for the installation by watching the MIT TV video:
The philosophy of the installation is (quoting Aaron):
In a world where fortunes are sought through data-mining vast information repositories, the computer is our indispensable but far from infallible assistant. Personas demonstrates the computer's uncanny insights and its inadvertent errors, such as the mischaracterizations caused by the inability to separate data from multiple owners of the same name. It is meant for the viewer to reflect on our current and future world, where digital histories are as important if not more important than oral histories, and computational methods of condensing our digital traces are opaque and socially ignorant.
In Personas, the user enters his/her name and gets a bunch of categories that are expected to explain what is around him/her on the Web. The application runs in two steps:
Collecting information about the name by querying Yahoo! with specially crafted queries, and post-processing the hits to avoid hate speech and other irrelevant material.
Applying an unsupervised categorization process named Latent Dirichlet Allocation, which assigns a number of keywords and weights (shown as the size of the final bars) to the name. The basic data for this categorization has been collected from 2 million queries.
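To get a rough feeling for what the second step does, here is a minimal sketch of Latent Dirichlet Allocation in plain Python. It is not the code behind Personas: the inference method (collapsed Gibbs sampling), the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy "hits" are all my own assumptions for illustration. It only shows how a set of texts can be turned into topic weights and per-topic keywords without any labels.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to how much the document
                # likes the topic and how much the topic likes the word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic weights per document (each row sums to 1).
    mixtures = [
        [(doc_topic[d][k] + alpha) / (len(doc) + alpha * num_topics)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Most frequent words still assigned to each topic.
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return mixtures, top_words


# Hypothetical, hand-made "hits" for a name; real hits would come from a search engine.
hits = [
    "spam filtering email security research".split(),
    "email spam security filtering paper".split(),
    "football match team league goal".split(),
    "team league football goal season".split(),
]
mixtures, top_words = lda_gibbs(hits, num_topics=2)
for weights in mixtures:
    print([round(x, 2) for x in weights])
print(top_words)
```

The topic weights per document play the role of the bar sizes in Personas, and the top words per topic play the role of the keywords. Note that the topics have no names of their own; any label such as "sports" or "fame" has to be attached afterwards.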
I strongly recommend going through the explanation in the "read more" link inside Personas.
For instance, Personas starts with the query field:
The process is fully "visual":
And you get your Personas characterization.
To what extent does this information show a real picture of me? Well, at least almost all the hits found by the system are mine, but the correlation is, eh... say, a bit strange. Sports? Genealogy?... FAME?
OK, for me the goal is achieved. What is wrong with the process (if something is wrong at all, and I am not just disturbed)? Try to guess without precise information about the process. That is the goal. And it is achieved.
2009/10/10 -- Workshop proposals due
2009/10/26 -- Abstracts due for papers and demos
2009/11/02 -- Papers and demos due
2009/11/15 -- Tutorial proposals due
The European Bioinformatics Institute (EBI) and the National Centre for Text Mining (University of Manchester) are organising a joint training event at the EBI, on October 5th / 6th, 2009.
The purpose of this event is to teach basic techniques in information retrieval (IR) and information extraction (IE) in the biomedical domain and to give hands-on training on existing solutions provided by the two centres. This seminar will give you the opportunity to meet the experts behind the established solutions.
Intended audience: Biomedical researchers, biocurators, bioinformaticians, medical informaticians and any other researcher active in biomedical research.
In cooperation with: BCS-IRSG, ACM SIGIR, The Open University, Dublin City University, University of Essex
The European Conference on Information Retrieval provides an opportunity for both new and established researchers to present research papers reporting new, unpublished, and innovative research results within information retrieval.
The Program Chairs invite the submission of original research papers and posters in all areas of Information Retrieval, including but not limited to:
Enterprise Search, Intranet, Desktop, Adversarial IR
Web IR
Digital libraries
IR Theory and Formal Models
Web log analysis
Distributed IR, peer to peer IR, Mobile IR, Fusion/Combination
Multimedia IR
Cross-language retrieval, Multilingual retrieval, Machine translation for IR