29.4.08

Statistics on spam/porn in Spain

Spain has always stood out in security statistics, especially for the wrong reasons. And now it seems Madrid does too. Here are three statistics, some of them not recent:

  • According to recent Sophos statistics on spam sending (PDF), we remain well placed among the top twelve countries, mostly because of the large number of zombies (computers covertly controlled by hackers and used to send spam, launch distributed denial-of-service attacks, or host illicit content: phishing servers, child pornography, etc.). Spain is currently the eleventh-largest spam-sending country, with 3.1% of the total (the USA sends 15.4%).
  • G Data, a German security company, warns that Madrid is the city with the most zombie computers in the world. It is not clear how they obtained this statistic, but it certainly does not sound implausible.
  • The "Libro Rojo del Cibercrimen" (Red Book of Cybercrime) by Francesc Canals has been presented recently; it introduces a new vocabulary of terms from this field and is well worth a look. At the presentation, Benjamín Blanco, chief inspector of the technological crime section of the Policía Nacional, referred to the Anesvad report (PDF) that ranks Spain as the second country in the world in visits to Web pages with child pornography. Interestingly, Anesvad collected this information using a child pornography honeypot named Niphasex.

Awareness (among users), self-regulation (by operators), technical quality (of security systems) and resources (for the State security forces and for the judiciary) are what it takes to start getting these plagues under control. At the very least, follow the ten recommendations (PDF) that Inteco gives Internet users for protecting their PCs, through the Centro de Alerta Antivirus.

28.4.08

Palo Alto Networks - Report on Application Usage and Risk

Palo Alto Networks, a provider of firewalling solutions, has released a report on the use of the Internet by corporate employees. The report is based on the analysis of 350,000 corporate end users at 20 large organizations across financial services, healthcare, government, retail and education. Their main findings are:

  1. End users are actively circumventing IT control mechanisms
    • External proxies, the kind IT does not support, such as CGIProxy and KProxy, were present in 80% of the customer networks.
    • Encrypted tunneling applications such as TOR (The Onion Router) were found 15% of the time.
    • Web-based file transfer and storage applications such as Megaupload, YouSendIt and MediaMax were detected in 30% of the sites.
  2. Port 80 is much more than Web surfing
    • Over 90% of the applications traversing port 80 are not "web browsing".
    • Most applications (over 50%) using port 80 are not business related.
    • Webmail was found in 95% of the cases while IM use was found in 100% of the cases.
    • Google applications such as Google Docs and Google Desktop are in use in 60% of the sites.
  3. Bandwidth hogging applications are more common than ever
    • Video over HTTP is consuming significant bandwidth in 100% of the sites.
    • Streaming audio was present in 95% of the cases.
    • Peer-to-peer file sharing applications were found in 90% of the sites assessed, indicating that enterprise control efforts are falling short.
    • Applications such as TvAnts and UUSee that use P2P as the underlying video streaming technology were found in 25% of the sites.

They note that "acceptable application use policies are inconsistent", setting up the basis for further abuse.

25.4.08

Papers about CAPTCHAs

This is a very short selection of papers about CAPTCHAs. A CAPTCHA (source, Wikipedia) is a test used to prevent bots from submitting Web forms automatically, although it has several other applications. You must have seen one, at least! Want an example? OK, just comment on this post :-) Well, take this one by the very CAPTCHA creators:

I have uploaded a number of papers to CiteUlike. It is so easy...

If you want to check them, just click on the link: papers about CAPTCHAs. Or if you prefer to get informed when I upload more, subscribe to the RSS feed :-) (If I were you, I would rather subscribe to my whole library feed).

Four strips

If you are reading these lines, you are probably aware of these four strips:

  • Xkcd (English) - A geek-philosophy look at otherwise bizarre situations.
  • PhdComics (English) - I am glad I did not have this when I was doing my PhD; I would have quit!
  • Dilbert (Spanish / English) - Better in English. A pessimistic view of an otherwise sad work life.
  • Tira Ecol (Spanish) - Real situations of those initiated into the mysteries (of computing).

Enjoy them, and post a comment if you know others like them :-)

What I miss in Xampp...

Xampp is an Apache distribution bundled with the most common modules, which makes it easy to install and start up right away, with really simple administration. Depending on the version, Xampp includes:

  • Linux: Apache, MySQL, PHP & PEAR, Perl, ProFTPD, phpMyAdmin, OpenSSL, GD, Freetype2, libjpeg, libpng, gdbm, zlib, expat, Sablotron, libxml, Ming, Webalizer, pdf class, ncurses, mod_perl, FreeTDS, gettext, mcrypt, mhash, eAccelerator, SQLite and IMAP C-Client.
  • MS Windows: Apache, MySQL, PHP + PEAR, Perl, mod_php, mod_perl, mod_ssl, OpenSSL, phpMyAdmin, Webalizer, Mercury Mail Transport System for Win32 and NetWare Systems v3.32, Ming, JpGraph, FileZilla FTP Server, mcrypt, eAccelerator, SQLite, and WEB-DAV + mod_auth_mysql.

I miss two things:

  • A spam filter, SpamAssassin if possible (it is surely feasible on Linux, at least).
  • A CMS for blogs, for instance WordPress.

24.4.08

Recent reports regarding spam

I want to post here two reports somehow related to spam that I have come across recently. They are the following:

Just enjoy them :-)

22.4.08

Searching in Social Networks vs. Zotero

Every day I discover a feature in Zotero that makes me love it more.

When I came across CiteUlike, I quickly found that the search capabilities of the system were very limited. In fact, there are two mutually exclusive search operations: by keyword and by tag. Searching by keyword works as usual, but is very restricted if you are used to Google :-( Unfortunately tags are no better, as you can search for only one tag at a time. I sorely miss boolean tag expressions!

Correction: I have had 5 more minutes to check the question mark beside the Search text field, and I have discovered that the full power of Lucene is behind the scenes; that includes searches qualified with metadata fields (author:Gómez, tag:filter), boolean syntax, and wildcards (simple regexes).

Other social networks are looking for more advanced search paradigms, like Flickr with its new "clusters" feature. Given a set of pictures filed under a tag, this feature presents them organized in groups (clusters). The user selects a cluster and gets its elements organized again into smaller groups. I infer that the groups are built according to an image similarity measure based on shared tags. This interaction paradigm was first presented as Scatter/Gather [1], more than a decade ago.
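To make my guess about shared tags concrete, here is a minimal sketch in Python. It is only an illustration of grouping by tag overlap (Jaccard similarity with a greedy single pass); it is neither Flickr's actual method nor the Scatter/Gather algorithm of [1], and the photo names, tags and the 0.5 threshold are all made up.

```python
def jaccard(tags_a, tags_b):
    """Share of tags two pictures have in common (0.0 to 1.0)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_shared_tags(photos, threshold=0.5):
    """Greedy grouping: a photo joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for name, tags in photos.items():
        for group in groups:
            seed_tags = photos[group[0]]
            if jaccard(tags, seed_tags) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical pictures, all filed under the tag "jaguar".
photos = {
    "p1": {"jaguar", "cat", "zoo"},
    "p2": {"jaguar", "cat", "jungle"},
    "p3": {"jaguar", "car", "classic"},
    "p4": {"jaguar", "car", "racing"},
}

clusters = group_by_shared_tags(photos)
print(clusters)
```

With these made-up tags the animal pictures and the car pictures end up in separate groups, which is the kind of split the "clusters" feature shows for an ambiguous tag.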

Regarding tags, I believe that boolean syntax is a must, at least as an "advanced search" feature. Zotero does include the feature, as it has a tag selection box that lets you select two or more tags. OK, it is a simple "and", but it is something. +1 for Zotero, +1 for Flickr, -1 for CiteUlike, +2 for CiteUlike (for being clever enough to use Lucene).
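What I mean by a simple "and" over tags (and the "or" I would also like to have) can be sketched in a few lines of Python. This is just an illustration over a made-up library; it says nothing about how Zotero or CiteUlike implement their searches.

```python
def items_with_all_tags(library, selected_tags):
    """Boolean AND: keep items that carry every selected tag."""
    wanted = set(selected_tags)
    return [title for title, tags in library.items() if wanted <= set(tags)]


def items_with_any_tag(library, selected_tags):
    """Boolean OR: keep items that carry at least one selected tag."""
    wanted = set(selected_tags)
    return [title for title, tags in library.items() if wanted & set(tags)]


# Hypothetical reference library: title -> tags.
library = {
    "Paper A": {"spam", "filter", "bayes"},
    "Paper B": {"spam", "captcha"},
    "Paper C": {"captcha", "ocr"},
}

print(items_with_all_tags(library, ["spam", "filter"]))
print(items_with_any_tag(library, ["bayes", "ocr"]))
```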

Besides, Zotero's advanced search covers boolean keyword expressions over the whole range of fields (title, author, etc.) with a straightforward interface (click on the advanced search button). Another +1 for Zotero!

[1] Douglass Cutting, David Karger, Jan Pedersen, and John W. Tukey. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, Proceedings of the 15th Annual International ACM/SIGIR Conference, Copenhagen, 1992 (preprint, ACM).

A quote by Udi Manber

In an interview, Udi Manber (currently vice president in charge of search quality for Google, and previously a senior vice president at Amazon and Yahoo's chief scientist), has said:

Interviewer: What do you think a person expects from a Web search?
Udi Manber: They want to get an answer. Our goal is very simple: We want to return to the user the answer that they need.

So simple and so hard. These words should be set in stone. Most business failures come from ignoring users' demands and opinions. Besides, getting to know what the user wants to know means being able to determine their information need, as it is called in Information Retrieval. The very first problem, still unresolved. Fortunately, it is still on the agenda of the search engine itself.

18.4.08

Darwin papers online - reflection on online cultural stuff

The University of Cambridge, with the help of some others, has put nearly all of Charles Darwin's papers online. The project is named The Complete Work of Charles Darwin Online. These papers range from influential books like On the Origin of Species to personal letters, reprints, notes and drawings. No doubt this virtual library will be of help for teachers and students and, why not, for researchers.

As more and more content from libraries and museums is put online, the value of the net increases. This includes, for instance, the virtual visit to the Museo Thyssen in Madrid, or the Biblioteca Virtual Miguel de Cervantes (digital panhispanic library).

But a major criticism of these otherwise valuable works is their licenses. For instance, the license of the Darwin library states:

The materials found on the Website may not be posted or in any way mirrored on the WWW or any other part of the Internet except at the following publication site: http://darwin-online.org.uk.

What a pity. In fact, I have taken the picture above from a more generous website, Wikipedia.

Freedom of Speech vs. Xenophobia and Racism

It is very difficult to keep a suitable balance between two fundamental human rights: freedom of speech, and freedom from racism and discrimination. As you try to limit speech that promotes discrimination, you quickly find yourself limiting freedom of speech, either in the media or on the Internet. Mr. Michael Head, member of the European Commission against Racism and Intolerance (ECRI - "All human beings are born free and equal in dignity and rights"), was the rapporteur in charge of summarizing the Expert Seminar: Combating Racism while Respecting Freedom of Expression, held in Strasbourg (16-17 November 2006 - get the proceedings in PDF). A paragraph in the proceedings that strongly caught my attention is the following:

Freedom of expression and freedom from racism and racial discrimination are not conflicting, but complementary rights. We should keep in mind that human rights are interdependent and interconnected. This means that (i) there can be no such thing as two conflicting human rights and that (ii) human rights need to be interpreted in light of each other. (pg. 7)

To what extent is this proposition just intellectual wording? Having spent more than five years building Internet filters to prevent abuse in certain situations (where Internet access is intended for work or study, as in the workplace or at school), I find this is still a fundamental question. I am convinced that filters are not the solution, but a helpful tool in some situations.

In other words: when does security prevail over freedom? (Note that I do not answer the question.)

As a side note, the Council of Europe promoted a treaty named Convention on Cybercrime in 2002. This treaty, regarded as limiting freedom of speech by the Electronic Privacy Information Center, and its complementary Additional Protocol to the Convention on cybercrime, concerning the criminalisation of acts of a racist and xenophobic nature committed through computer systems, were regarded in the expert meeting as useful tools for combating racism while protecting freedom of speech. I must note that Spain signed but has not ratified the first convention (which means nothing: Spain is not legally required to implement the measures proposed in the convention), and simply ignored the additional protocol.

16.4.08

Google I/O, the Google developer event

Google I/O

A two-day developer gathering in San Francisco, May 28-29, 2008

Two days of in-depth, technical sessions on how to build the next generation of web applications with Google and open technologies. An excellent opportunity to share experiences with other users of the powerful Google APIs, like the recent Google AJAX Language API.

http://code.google.com/events/io/

15.4.08

WhyFLOSS - Free conferences on open IT technologies in Madrid

The 4th edition of the WhyFLOSS Conference takes place in May this year, with FREE admission and with CERTIFICATES OF ATTENDANCE and PRESENTATION. It is an international event organized by Neurowork, held in Spain and Argentina, and this will be its second time in the city of Madrid.

With strong support from the Escuela de Informática of the Universidad Politécnica de Madrid, Campus Sur, a variety of talks on open IT technologies will be presented.

The CALL FOR TALK PROPOSALS, the FREE ONLINE REGISTRATION and SPONSORSHIP are now open.

For more information:

TALK PROPOSALS
http://www.whyfloss.com/es/conference/madrid08/paper

FREE ONLINE REGISTRATION
http://www.whyfloss.com/es/conference/madrid08/register

SPONSORSHIP REQUESTS
http://www.whyfloss.com/es/conference/madrid08/sponsor

PREVIOUS EDITIONS
http://www.whyfloss.com/es/conference/editions

ABOUT NEUROWORK
http://www.neurowork.net

For any questions, email conference@whyfloss.com

What makes me love CiteUlike and Zotero

One of the most tedious tasks in research is managing bibliographic references for papers and such. In particular, it is extremely boring and discouraging to copy and paste reference fields (authors, title, proceedings or journal, etc.) into your favourite reference manager, and to build a bibliography for a particular work (that is, to include references from your reference manager).

Two tools are very helpful for these tasks:

An interesting analysis would be to test how effective both are at capturing references from the web. This will have to wait, although as in Xkcd, "it would make a great LiveJournal entry" :-)

My CiteUlike user library is at my CiteUlike home. I promise I have NOT yet read all the papers :-) Great, it uses reCAPTCHA technology!

I promote CiteUlike: CiteULike

I promote Zotero: Get Zotero

11.4.08

CAPTCHAs broken? A few thoughts

After this post about the breaking of Google CAPTCHAs by Justin Mason, the renowned author of SpamAssassin, it was clear enough to me that CAPTCHAs had been broken by human beings (in other words, they had not been broken at all).

Now there is evidence of spammers hiring people in India to solve CAPTCHAs by hand, in order to register email accounts used for sending spam. So I must rethink it: CAPTCHAs have actually been broken if there is a cost-efficient method for solving them. And paying a few $$$ a day for solving hundreds of them is quite efficient. Jeremy Jaynes, a convicted spammer, was spending $50,000 a month on sending spam (and making up to $750,000 a month). If spending much less than that is not cost-efficient, please tell me what is.

No doubt the next generation of search engine spam will use human farms to post links in blogs and blog comments to promote Web pages :-(
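The arithmetic behind "quite efficient" can be written down as a small Python sketch. The $50,000 and $750,000 monthly figures are the ones quoted above for Jaynes; the daily wage and the number of CAPTCHAs solved per day are pure assumptions of mine, standing in for "a few $$$ a day" and "hundreds of them".

```python
# Figures quoted in the post (per month, US dollars).
MONTHLY_SPEND = 50_000
MONTHLY_REVENUE = 750_000

# Assumptions for illustration only; not reported figures.
ASSUMED_DAILY_WAGE = 3.0          # "a few $$$ a day"
ASSUMED_CAPTCHAS_PER_DAY = 300    # "hundreds of them"


def revenue_per_dollar(revenue, spend):
    """How many dollars come back for each dollar spent."""
    return revenue / spend


def cost_per_captcha(daily_wage, solved_per_day):
    """Labour cost of one hand-solved CAPTCHA."""
    return daily_wage / solved_per_day


ratio = revenue_per_dollar(MONTHLY_REVENUE, MONTHLY_SPEND)
unit_cost = cost_per_captcha(ASSUMED_DAILY_WAGE, ASSUMED_CAPTCHAS_PER_DAY)

print(f"Revenue per dollar spent: {ratio:.0f}x")
print(f"Assumed cost per solved CAPTCHA: ${unit_cost:.2f}")
```

Under these assumptions a solved CAPTCHA costs about one cent, which is tiny next to a business that returns fifteen dollars per dollar spent.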

8.4.08

The beginning of a new spam epidemic?

Alexander Klink at Cynops GmbH has made public a new vulnerability in the Microsoft CryptoAPI that allows spammers to check whether a given email address is valid. As described in his white paper, "HTTP over X.509 - a whitepaper":

Microsoft Outlook and Windows Live Mail (the successor of Outlook Express) both support the S/MIME standard for signed and encrypted emails. When opening an S/MIME-signed email (even using just the preview pane), the applications will try to fetch the URIs specified in the certificate using the Microsoft CryptoAPI. This vulnerability could for example be used by spammers to verify email addresses and that their email has not been filtered by a spam filter on the mail server. Note that this is computationally cheap for the spammer, as the S/MIME signature does not even have to be valid. Combined with IP geolocation, the spammer could also learn where the user fetches his mail from, which could be used in targeted advertising or phishing attacks.

Might this be the beginning of a new spam epidemic? I believe spammers have few opportunities like this one to reach so many users. The kind of things spammers have routinely been doing with images (no longer loaded automatically by email clients) clearly shows that they will not waste such an opportunity.

Happily, my email client Thunderbird is not affected by this vulnerability :-)

7.4.08

Free Knowledge Days at the UEM

GLUEM and the Computer Science area of the UEM have organized the first Free Knowledge Days at the UEM for the week of April 14 to 18. The event includes a series of talks closely tied to free software, among other areas.

The founders of Menéame.net will be at the event. Jorge Cortell will talk about the free software model applied to companies. Bram de Jong and Jaume Ferrete from FreeSound will present their novel music project. Álvaro López will talk about the Cherokee web server. Javier Echeverría and Vicente J. Ruiz Jurado will talk about the importance of free software in different fields, and Fernando Palacio from Safe Creative will explain how this alternative to intellectual property works. On the security side, Fernando Acero will talk about e-government and electronic voting, while Adrián Yanes will talk about the evolution of identity in telecommunication networks.

The venue is the Universidad Europea de Madrid. The full schedule of talks can be found on the event's web page. Everyone interested in free knowledge is invited to attend.

2.4.08

Article about imgSeek in Linux+

The article about imgSeek that I sent to Linux+ has come out in the April issue. My friends at Linux+ have been kind enough to feature it as the cover story :-)

Linux+ April cover

The title of the article is "Organizando tus imágenes con imgSeek" (Organizing your images with imgSeek), and its lead paragraph is:

Image analysis techniques are maturing at an astonishing pace. Good proof of this is the presence on the market of more and more practical applications, such as optical character recognition and human face recognition (with applications to biometric security and to the classification of images in web albums). In particular, there is now free software able to help users organize their personal photographs using well-established techniques, with interesting and useful results. In this article we review imgSeek, a cross-platform system for organizing photo collections that allows content-based searches and not only keyword searches.

A short introduction to imgSeek is my post "Búsqueda en imágenes por el contenido: imgSeek". I am still working on turning it into a blocker of pornographic images. The scripts are done; now I need to integrate it with a caching proxy such as Squid.