The list I am subscribed to carries announcements of new data collections from time to time. These resources are extremely valuable for Bio-NLP research; we must disseminate them and strongly appreciate their builders' work:
- The Corpora for Named Entity Recognition of Chemical Compounds by the folks at Fraunhofer SCAI.
- The data for the TREC Chemistry Track.
- The GENIA project corpus by the Tsujii Laboratory (University of Tokyo).
A number of tools are available at the Bio-NLP Resources page compiled by Martin Krallinger and his group.