General Information
    • ISSN: 2010-3700
    • Frequency: Bimonthly
    • DOI: 10.18178/IJMLC
    • Editor-in-Chief: Dr. Lin Huang
    • Executive Editor:  Ms. Cherry L. Chen
    • Abstracting/Indexing: Engineering & Technology Digital Library, Google Scholar, Crossref, ProQuest, Electronic Journals Library, DOAJ and EI (INSPEC, IET).
    • E-mail: ijmlc@ejournal.net
IJMLC 2015 Vol.5(5): 384-387 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.538

Structured Vectors for Chinese Word Representations

Changliang Li, Bo Xu, Xiuying Wang, Gaowei Wu, Guanhua Tian, and Wendong Ge
Abstract—The use of word representations has been a key reason for the success of many NLP tasks. Much work has focused on improving the learning of word representations, and most approaches treat each word as an atomic unit. However, in some languages, for example Chinese, some words cannot be recognized correctly. This degrades the ability of word embeddings to capture semantic information. This paper addresses this shortcoming by proposing structured embeddings for word representations. Our method uses sub-word and atomic unit embeddings to represent word embeddings. We build structured vectors for Chinese word representations based on this method, and evaluate them on SemEval-2012 Task 4: Measuring Chinese word similarity. The results show that our method is remarkably effective in capturing semantic information and outperforms the previous best result by a large margin. Our method can be extended to other languages that do not have a trivial word segmentation process.

Index Terms—Word embeddings, word segmentation, semantic information.
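
The abstract does not spell out how sub-word and atomic unit embeddings are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that a structured vector is the concatenation of a word-level vector and the mean of the word's character vectors, and that an unknown word backs off to its character mean. The toy embeddings and the names `CHAR_VECS`, `WORD_VECS`, and `structured_vector` are hypothetical.

```python
import math

# Hypothetical toy embeddings; real vectors would be learned from a corpus.
CHAR_VECS = {
    "电": [0.9, 0.1, 0.0],
    "脑": [0.1, 0.8, 0.1],
    "话": [0.2, 0.1, 0.9],
}
WORD_VECS = {
    "电脑": [0.5, 0.5, 0.0],  # "computer"
    # "电话" ("telephone") is deliberately absent, as if segmentation
    # had failed to recognize it as a word.
}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def structured_vector(word):
    """Return [word-level part ; character-level part] as one flat list.

    Assumption: concatenation and averaging are illustrative choices,
    not taken from the paper.
    """
    char_part = mean([CHAR_VECS[ch] for ch in word])
    # Assumption: an unknown word backs off to its character mean.
    word_part = WORD_VECS.get(word, char_part)
    return word_part + char_part


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


similarity = cosine(structured_vector("电脑"), structured_vector("电话"))
print(round(similarity, 3))
```

Under these assumptions, a word missing from the word-level vocabulary still receives a usable vector from its characters, which is the kind of robustness to segmentation errors the abstract describes. A similarity score of this form is what a word similarity benchmark would compare against human judgments.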

The authors are with the Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, 100190, Beijing, China (e-mail: changliang.li@ia.ac.cn, xubo@ia.ac.cn, xiuying.wang@ia.ac.cn, gaowei.wu@ia.ac.cn, guanhua.tian@ia.ac.cn, wending.ge@ia.ac.cn).


Cite: Changliang Li, Bo Xu, Xiuying Wang, Gaowei Wu, Guanhua Tian, and Wendong Ge, "Structured Vectors for Chinese Word Representations," International Journal of Machine Learning and Computing, vol. 5, no. 5, pp. 384-387, 2015.

Copyright © 2008-2015. International Journal of Machine Learning and Computing. All rights reserved.