IJMLC 2022 Vol.12(2): 57-62 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2022.12.2.1079

A Neural Language Model with Attention Mechanism Based on Convolutional Neural Network

Seongik Park, Ki Yong Lee, and Yanggon Kim

Abstract—The convergence of artificial intelligence (AI) and natural language processing (NLP) has rapidly increased the demand for analyzing natural language, which contains many ambiguities not present in formal languages. For this reason, the language model (LM), a statistical approach, has played a key role in this area. Recently, the emerging field of deep learning, which applies deep neural networks to machine learning tasks, has been applied to language modeling and has achieved more remarkable results than traditional language models. One of the important techniques behind the success of neural network-based LMs is the attention mechanism, which makes a neural network pay attention to specific words in the input sentence when generating each output word. However, although the attention mechanism has improved the performance of many neural network models, it requires a very large number of parameters to reach state-of-the-art performance. This is because the attention mechanism encodes the context of a word by simply accumulating the outputs of the network for all input words, which may cause information loss. To compensate for this limitation, we propose an extension of the attention mechanism that replaces this accumulation with a convolutional neural network. With far fewer parameters, our model achieves performance comparable to recent state-of-the-art models on popular benchmark datasets, yielding perplexity scores of 58.4 on the Penn Treebank dataset and 50.1 on the WikiText-2 dataset, respectively.
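The contrast the abstract draws can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the hidden states, query, kernel width, and convolution filter are all made up for demonstration. Standard attention collapses the weighted hidden states into a context vector by summation; the abstract's idea is to instead pass the weighted states through a convolution, which preserves local ordering information that a plain sum discards.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d, k = 5, 8, 3                  # sequence length, hidden size, kernel width
H = rng.normal(size=(T, d))        # hidden states h_1..h_T from an RNN
q = rng.normal(size=(d,))          # query vector (e.g., current decoder state)

# Standard attention: softmax-weighted *sum* of the hidden states.
scores = H @ q
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()               # attention weights, sum to 1
context_sum = alpha @ H            # shape (d,): simple accumulation

# CNN-based alternative sketched in the abstract: keep the attention
# weighting, but replace the summation with a 1-D convolution slid over
# the weighted hidden states.
weighted = alpha[:, None] * H                    # (T, d) weighted states
W = rng.normal(size=(k, d)) / np.sqrt(k * d)     # illustrative conv filter
conv = np.array([(weighted[t:t + k] * W).sum()   # valid convolution
                 for t in range(T - k + 1)])
context_cnn = np.maximum(conv, 0.0)              # ReLU over conv outputs

print(context_sum.shape, context_cnn.shape)      # (8,) (3,)
```

The summed context is a single vector regardless of sequence length, while the convolutional variant yields one activation per window position; how such features are projected back to the model dimension is a design choice not specified in the abstract.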

Index Terms—Attention mechanism, deep learning, neural language model, neural network.

Seongik Park and Yanggon Kim are with the Department of Computer & Information Sciences at Towson University, USA (e-mail: spark32@students.towson.edu, ykim@towson.edu).
Ki Yong Lee is with the Division of Computer Science at Sookmyung Women's University, South Korea (e-mail: kiyonglee@sookmyung.ac.kr).


Cite: Seongik Park, Ki Yong Lee, and Yanggon Kim, "A Neural Language Model with Attention Mechanism Based on Convolutional Neural Network," International Journal of Machine Learning and Computing vol. 12, no. 2, pp. 57-62, 2022.

Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

General Information

  • E-ISSN: 2972-368X
  • Abbreviated Title: Int. J. Mach. Learn.
  • Frequency: Quarterly
  • DOI: 10.18178/IJML
  • Editor-in-Chief: Dr. Lin Huang
  • Executive Editor: Ms. Cherry L. Chen
  • Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI
  • E-mail: ijml@ejournal.net
  • APC: 500 USD
