Multilingual Complex Named Entity Recognition with Data and Context Augmentation

Chahyon Ku, London Lowmanstone IV, Asal Shavandi, Josh Spitzer-Resnick

University of Minnesota, Twin Cities

Proposal Paper
Proposal Slides
Midterm Paper
Final Paper
Final Slides
Code

We extend the XLM-RoBERTa + Conditional Random Field (CRF) baseline with data augmentation and context augmentation.

Data Augmentation

Translation-based data augmentation (MulDA) using the Google Cloud API, which takes advantage of the multilingual dataset.
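A minimal sketch of how translation-based augmentation can keep entity labels aligned. It assumes the placeholder approach (swap each entity for a placeholder, translate the sentence, translate each entity separately, then put the entities back). Whether this project follows exactly these steps is an assumption. The names `augment_by_translation` and `fake_translate` and the `__ENT0__` placeholder format are hypothetical. The `translate` argument stands in for a call to a real translation service. A real service may alter or drop placeholders, which this sketch does not handle.

```python
from typing import Callable, Dict, List, Tuple

# (start token index, end token index exclusive, entity label); spans must not overlap.
Span = Tuple[int, int, str]


def augment_by_translation(
    tokens: List[str],
    spans: List[Span],
    translate: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Translate a labeled sentence and return (tokens, BIO tags) in the target language."""
    # Step 1: replace every entity span with a placeholder token.
    pieces: List[str] = []
    entities: Dict[str, Tuple[str, str]] = {}
    cursor = 0
    for i, (start, end, label) in enumerate(sorted(spans)):
        pieces.extend(tokens[cursor:start])
        placeholder = f"__ENT{i}__"  # hypothetical placeholder format
        pieces.append(placeholder)
        entities[placeholder] = (" ".join(tokens[start:end]), label)
        cursor = end
    pieces.extend(tokens[cursor:])

    # Step 2: translate the sentence with placeholders in place.
    translated = translate(" ".join(pieces))

    # Step 3: translate each entity on its own and substitute it back,
    # emitting BIO tags for the entity tokens and "O" for everything else.
    out_tokens: List[str] = []
    out_tags: List[str] = []
    for tok in translated.split():
        if tok in entities:
            surface, label = entities[tok]
            ent_tokens = translate(surface).split()
            if not ent_tokens:
                continue  # translation came back empty; skip this entity
            out_tokens.extend(ent_tokens)
            out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(ent_tokens) - 1))
        else:
            out_tokens.append(tok)
            out_tags.append("O")
    return out_tokens, out_tags


def fake_translate(text: str) -> str:
    """Stand-in translator for demonstration: upper-cases words, keeps placeholders."""
    return " ".join(w if w.startswith("__ENT") else w.upper() for w in text.split())


new_tokens, new_tags = augment_by_translation(
    ["steve", "jobs", "founded", "apple"],
    [(0, 2, "PER"), (3, 4, "CORP")],
    fake_translate,
)
```

Translating entities separately from the sentence means the tags can be rebuilt from the placeholder positions, so no word alignment step is needed.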

Context Augmentation

Following the procedure of KB-NER [1], we use Elasticsearch to index Wikipedia articles in 100 languages and query that index to augment the training data with context information.
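A minimal sketch of the context augmentation step, under stated assumptions. Retrieved passages are appended to the input sentence after a separator token, and a mask records which positions belong to the original sentence so that only those positions are tagged. The function names, the `top_k` parameter, and the word-overlap retriever are hypothetical. In the actual pipeline the `retrieve` callable would query the Elasticsearch index of Wikipedia articles. The `</s>` separator is the XLM-RoBERTa separator token; using it between passages is an assumption.

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str, int], List[str]]


def augment_with_context(
    tokens: List[str],
    retrieve: Retriever,
    top_k: int = 2,
    sep: str = "</s>",
) -> Tuple[List[str], List[bool]]:
    """Append retrieved passages to a sentence.

    Returns the augmented token list and a mask that is True only for
    the original sentence tokens (the positions that receive NER tags).
    """
    passages = retrieve(" ".join(tokens), top_k)
    aug_tokens = list(tokens)
    label_mask = [True] * len(tokens)
    for passage in passages[:top_k]:
        context = passage.split()
        aug_tokens.append(sep)
        aug_tokens.extend(context)
        label_mask.extend([False] * (len(context) + 1))
    return aug_tokens, label_mask


def make_overlap_retriever(corpus: List[str]) -> Retriever:
    """In-memory stand-in for a search index: ranks passages by shared words."""
    def retrieve(query: str, k: int) -> List[str]:
        query_words = set(query.lower().split())
        ranked = sorted(
            corpus,
            key=lambda p: len(query_words & set(p.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve


retriever = make_overlap_retriever([
    "Apple Inc. is a technology company",
    "Bananas are yellow",
    "Steve Jobs co-founded Apple",
])
aug_tokens, label_mask = augment_with_context(
    ["Steve", "Jobs", "founded", "Apple"], retriever, top_k=1
)
```

Passing the retriever in as a callable keeps the augmentation logic independent of the search backend, so the in-memory stand-in can be swapped for an index query without changing `augment_with_context`.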