NAACL-HLT 2010 Workshop on Active Learning for Natural Language Processing (ALNLP)
June 5 or 6, 2010, Los Angeles, CA
Labeled training data is required to achieve state-of-the-art performance for many machine learning solutions to NLP tasks. While traditional supervised methods rely exclusively on existing labeled data to induce a model, active learning allows the learner to select unlabeled data for labeling in an effort to reduce annotation costs without sacrificing performance. Thus, active learning appears promising for NLP applications where unlabeled data is readily available (e.g., web pages, audio recordings, minority language data), but obtaining labels is cost-prohibitive.
Ample recent work has demonstrated the effectiveness of active learning over a diverse range of applications. Despite these findings, active learning has not yet been widely adopted in many ongoing large-scale corpus annotation efforts, which has left a dearth of real-world case studies and many open research questions. The machine learning literature has primarily focused on active learning in the context of classification, devoting less attention to issues specific to NLP, including annotation user studies, incorporation of semantic information, and more complex prediction tasks (e.g., parsing, machine translation).
The aim of this workshop is to foster innovation and discussion that advance our understanding of these and other practical issues for active learning in NLP. Topics of particular interest include:
- Alternative query types: labeling features rather than instances, mixed-resolution queries for structured instances, etc.
- Creative ways of obtaining data via active learning (e.g., online games, Mechanical Turk)
- Managing multiple, possibly non-expert annotators (e.g., "crowdsourcing" environments)
- Reusability: using data acquired with one active learner to train other model classes
- Domain adaptation and active learning
- Multi-task active learning
- Criteria for stopping and monitoring active learning progress
- Active learning in coordination with semi-supervised or unsupervised learning approaches
- Interactive active learning interfaces and other HCI issues
- Parallelization of active learning and its computational challenges
- Software engineering considerations for active learning and NLP
- Theoretical analysis of active learning
We also welcome case-study papers describing the application of active learning in real-world annotation projects and the lessons learned from them. Additionally, we will consider papers from other machine learning communities where annotation costs are also high (e.g., computer vision, bioinformatics, and data mining), provided they offer insights applicable to NLP.
- March 1, 2010: Paper submission deadline
- March 30, 2010: Notification of acceptance
- June 5 or 6, 2010: Workshop held in conjunction with NAACL-HLT