Talk title: Named Entity Recognition with Few or Zero Annotated Training Data



Host: Associate Professor Bin Wang, Northeastern University

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing. Most proposed models are data-driven and built on deep neural networks, so they inevitably require large amounts of human-annotated training data. However, sufficient human-annotated data is hard to obtain in many specific yet important domains, such as biomedicine and national public security. This motivates us to explore effective methods that can tackle NER with few or even zero human-annotated examples. We propose a stacking model that builds on existing NER tools and needs only a small amount of annotated data: with 100 annotated sentences, it achieves performance competitive with neural models trained on around 10,000 sentences. To handle the case where no human-annotated data is available, we propose a generalized distant supervision method that obtains high-quality machine-annotated data and uses it to train a span-level classifier. Given the entity types predicted by this classifier, the named entities are then inferred with a dynamic programming algorithm. Experiments show that, given sufficient raw text, our model achieves results similar to or even better than those of semi-supervised models.
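The abstract does not say how the dynamic programming step works. The following is only a minimal sketch of one common way to turn span-level predictions into a set of named entities, and it is not the speaker's method. It assumes that the span classifier assigns each candidate span a best entity type and a score (for example, log-odds against "not an entity"), and that entities must not overlap. Under those assumptions, decoding reduces to a weighted-interval-scheduling recurrence over token positions. The names `decode_spans` and `span_scores` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def decode_spans(n: int, span_scores: Dict[Span, Tuple[str, float]]) -> List[Tuple[int, int, str]]:
    """Pick non-overlapping typed spans that maximize the total score.

    Hypothetical helper: span_scores maps (start, end) to (entity_type, score),
    where the score is assumed to come from some span-level classifier.
    """
    best = [0.0] * (n + 1)  # best[k]: best total score over tokens 0..k-1
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)  # span ending at k, if any
    for k in range(1, n + 1):
        # Option 1: token k-1 is not part of any entity.
        best[k] = best[k - 1]
        # Option 2: some scored span [start, k) ends exactly at token k-1.
        for start in range(k):
            entry = span_scores.get((start, k))
            if entry is None:
                continue
            label, score = entry
            if best[start] + score > best[k]:
                best[k] = best[start] + score
                back[k] = (start, label)
    # Follow back-pointers from the end of the sentence to recover the spans.
    spans: List[Tuple[int, int, str]] = []
    k = n
    while k > 0:
        if back[k] is None:
            k -= 1
        else:
            start, label = back[k]
            spans.append((start, k, label))
            k = start
    spans.reverse()
    return spans


# Toy example with made-up scores for "Yifang Sun works at UNSW Sydney".
scores = {
    (0, 1): ("PER", 0.5),
    (0, 2): ("PER", 2.0),
    (4, 5): ("ORG", 1.5),
    (4, 6): ("ORG", 3.0),
    (5, 6): ("LOC", 1.2),
}
print(decode_spans(6, scores))  # [(0, 2, 'PER'), (4, 6, 'ORG')]
```

In this toy case, the longer span "UNSW Sydney" (score 3.0) beats the overlapping pair "UNSW" + "Sydney" (1.5 + 1.2 = 2.7). This is the kind of conflict that span-level predictions leave open and that a global decoding step resolves.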


Dr. Yifang Sun is a Research Associate in the School of Computer Science and Engineering, University of New South Wales, Australia. His research interests include machine learning, natural language processing, and query processing on massive data of different types. His current work focuses mainly on knowledge graph construction, high-dimensional nearest neighbor search, and security issues in retrieval systems. He has published more than 10 papers in top-tier conferences and journals such as VLDB, ICDE, TKDE, AAAI, and ACL.