CSE


Defense Event
AI Seminar
Natural Language Processing Seminar

Practical Natural Language Processing for Minority Languages

Ben King


 
Monday, April 20, 2015
6:00pm - 8:00pm
3725 Beyster Bldg.

About the Event

Most work in Computational Linguistics and Natural Language Processing (NLP) focuses on English or other languages that have text corpora of hundreds of millions of words. In this thesis, we present methods for automatically building NLP tools for minority languages with minimal need for human annotation in these languages. We start with language identification, the problem of recognizing a text’s language in the absence of an explicit label. We focus specifically on word-level language identification, an understudied variant that is necessary for processing Web text, and develop highly accurate machine learning methods for this problem. From there we move on to the problems of part-of-speech (POS) tagging and dependency parsing. For both problems, we adapt tools built for English and other well-supported languages for use on a minority language that lacks large annotated corpora. By projecting annotations from many different languages across parallel text, we are able to create accurate tools in the low-resource target language.

Additional Information

Sponsor(s): Professor Dragomir R. Radev

Open to: Public