Abstract—Many natural language processing tasks, including part-of-speech tagging, chunking, and named entity recognition, can be framed as assigning labels to words. Existing methods such as hidden Markov models, maximum entropy Markov models, and conditional random fields have been applied to labeling sequential data, but they rely on large amounts of training data and cannot handle out-of-lexicon words. In this paper, we propose a new method based on word representations and conditional random fields to address these problems. We preprocess the input features by computing word similarity from word representations, which capture the semantic similarity of words and are learned from vast amounts of unlabeled data. The preprocessed features are then used as the input features for training a conditional random fields model. Experimental results show that our approach improves labeling accuracy over the existing methods.
Index Terms—Conditional random fields, label sequential data, word representations, word similarity.
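The preprocessing idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: each out-of-lexicon token is replaced by its most similar in-lexicon word under cosine similarity. The helper names (`cosine`, `nearest_in_lexicon`, `preprocess`), the toy vectors, and the toy lexicon are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_in_lexicon(word, embeddings, lexicon):
    """Return the in-lexicon word most similar to `word`, or None if unknown."""
    if word not in embeddings:
        return None
    candidates = [w for w in lexicon if w in embeddings and w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(embeddings[word], embeddings[w]))

def preprocess(tokens, embeddings, lexicon):
    """Map each out-of-lexicon token to its nearest in-lexicon neighbor.

    Assumed behavior (not from the paper): tokens with no word
    representation are left unchanged.
    """
    result = []
    for token in tokens:
        if token in lexicon:
            result.append(token)
        else:
            result.append(nearest_in_lexicon(token, embeddings, lexicon) or token)
    return result

# Toy word representations (hypothetical values, for illustration only).
EMBEDDINGS = {
    "london": [0.9, 0.1, 0.0],
    "paris": [0.8, 0.2, 0.1],
    "runs": [0.0, 0.9, 0.3],
    "walks": [0.1, 0.8, 0.4],
}
# Words assumed to appear in the labeled training data.
LEXICON = {"london", "runs"}

print(preprocess(["paris", "walks", "london", "xyz"], EMBEDDINGS, LEXICON))
# → ['london', 'runs', 'london', 'xyz']
```

Under these assumptions, the mapped tokens would then be used to build the input features of a conditional random fields model, so that words unseen in the labeled data share features with similar seen words.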
The authors are with the Institute of Automation, Chinese Academy of Sciences, China (e-mail: email@example.com).
Cite: Xiuying Wang, Bo Xu, Changliang Li, and Wendong Ge, "Labeling Sequential Data Based on Word Representations and Conditional Random Fields," International Journal of Machine Learning and Computing, vol. 5, no. 6, pp. 439-444, 2015.