Technical Reports
HPL-2010-166
Fine Grained Classification of Named Entities In Wikipedia
Tkachenko, Maksim; Ulanov, Alexander; Simanovsky, Andrey
HP Laboratories
Keyword(s): named entity recognition; Wikipedia; classification
Abstract: This report describes a study on classifying Wikipedia articles into an extended set of named entity classes. We employed a semi-automatic method to extend the Wikipedia class annotation and created a training set for 15 named entity classes. We implemented two classifiers. A binary named-entity classifier decides whether an article is about a named entity. A support vector machine (SVM) classifier trained on a variety of Wikipedia features determines the class of a named entity. Combining the two classifiers improved classification quality and yielded results better than the state of the art.
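The two-classifier combination described in the abstract can be pictured as a cascade: the binary classifier filters out articles that are not about named entities, and only the remaining articles reach the class-assigning stage. The following is a minimal sketch of that control flow, not the report's implementation. The function names, the `infobox` field, and the class labels are hypothetical, and simple rules stand in for the trained binary classifier and the SVM.

```python
from typing import Callable, Dict, Optional

# Hypothetical article record; the report's actual Wikipedia features are not listed here.
Article = Dict[str, str]


def classify_article(
    article: Article,
    is_named_entity: Callable[[Article], bool],
    entity_class: Callable[[Article], str],
) -> Optional[str]:
    """Two-stage cascade: reject non-entities first, then assign a class."""
    if not is_named_entity(article):
        return None  # article is not about a named entity
    return entity_class(article)


def toy_is_named_entity(article: Article) -> bool:
    # Assumption for illustration: an article with an infobox is a named entity.
    # In the report, this stage is a trained binary classifier.
    return bool(article.get("infobox"))


def toy_entity_class(article: Article) -> str:
    # Assumption for illustration: map the infobox type to a made-up label.
    # In the report, this stage is an SVM over Wikipedia features.
    labels = {"person": "PERSON", "settlement": "LOCATION"}
    return labels.get(article.get("infobox", ""), "OTHER")


if __name__ == "__main__":
    turing = {"title": "Alan Turing", "infobox": "person"}
    gravity = {"title": "Gravity", "infobox": ""}
    print(classify_article(turing, toy_is_named_entity, toy_entity_class))
    print(classify_article(gravity, toy_is_named_entity, toy_entity_class))
```

Passing the two stages as callables keeps the cascade independent of how each classifier is trained, so either stand-in could be swapped for a real model without changing `classify_article`.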
10 Pages
External Posting Date: October 21, 2010. Approved for External Publication
Internal Posting Date: October 21, 2010