Extraction of Trend Keywords and Stop Words from Thai Facebook Pages Using Character n-Grams

Home > Archive > 2018 > Volume 8 Number 6 (Dec. 2018) >

IJMLC 2018 Vol.8(6): 589-594 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2018.8.6.750

Nattapong Ousirimaneechai and Sukree Sinthupinyo

Abstract: In the era of data and information, insight into user behavior, such as trends, is commonly used in real-time marketing to improve gross profit, so it is beneficial to know the trends in social media. Word tokenization and stop word lists are the conventional tools for keyword extraction. For Thai text on social media, however, there are still no efficient word tokenization tools or stop word lists for extracting trends from platforms such as Facebook. In this research, we therefore propose an algorithm that requires no word tokenization tools and no external stop word list to extract Trend Keywords. The core idea is to use Character n-Grams, instead of Word n-Grams, to tokenize and process the text, and then to combine the n-Grams into keywords. We then separate Trend Keywords from other keywords by using our algorithm to generate a stop word list that filters out stop words. To evaluate the results, human annotators classify the retrieved Trend Keywords, and we compare them with the Trend Keywords from a baseline method. Our algorithm identifies more keywords than the baseline method. The precision of the generated stop word list is 97.6%, and the precision of the Trend Keywords is 40% with a 1-month generated stop word list. With a 2-month generated stop word list, the precision rises to 44%, at the cost of more processing time to build the list.

Index Terms: Information retrieval, keyword extraction, social media mining, stop words.
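
The tokenizer-free idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how n-Grams are combined into keywords or how the stop word list is generated, so the document-frequency rule below and the names `char_ngrams`, `document_frequency`, `build_stop_list`, `trend_candidates`, `min_ratio`, and `top_k` are all assumptions made for illustration. The sample strings are English only to keep the example easy to read; the functions slice Unicode code points, so Thai text can be passed in unchanged.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every overlapping character n-gram of text (no word tokenizer)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def document_frequency(posts, n):
    """Count, for each character n-gram, the number of posts that contain it."""
    df = Counter()
    for post in posts:
        # set() so that an n-gram repeated inside one post is counted once
        df.update(set(char_ngrams(post, n)))
    return df


def build_stop_list(history_posts, n, min_ratio):
    """Assumed rule: an n-gram found in at least min_ratio of the
    historical posts is treated as a stop n-gram."""
    df = document_frequency(history_posts, n)
    threshold = min_ratio * len(history_posts)
    return {gram for gram, count in df.items() if count >= threshold}


def trend_candidates(current_posts, stop_list, n, top_k):
    """Rank n-grams of the current posts by document frequency,
    skipping anything in the generated stop list."""
    df = document_frequency(current_posts, n)
    ranked = [(g, c) for g, c in df.most_common() if g not in stop_list]
    return ranked[:top_k]


# Hypothetical data: older posts build the stop list, newer posts are mined.
history = ["the cat", "the dog", "a bird"]
current = ["the fox", "red fox", "fox den"]

stop_list = build_stop_list(history, n=3, min_ratio=0.6)
top = trend_candidates(current, stop_list, n=3, top_k=1)
print(stop_list)  # the two 3-grams "the" and "he " (set order may vary)
print(top)        # [('fox', 3)]
```

In this sketch a longer history window only changes the input to `build_stop_list`, which mirrors the trade-off the abstract reports between a 1-month and a 2-month list: more posts to scan, and a stop list built from more evidence.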

The authors are with Chulalongkorn University, Thailand (e-mail: 6070188521@student.chula.ac.th, sukree.s@chula.ac.th).

Cite: Nattapong Ousirimaneechai and Sukree Sinthupinyo, "Extraction of Trend Keywords and Stop Words from Thai Facebook Pages Using Character n-Grams," International Journal of Machine Learning and Computing, vol. 8, no. 6, pp. 589-594, 2018.

General Information

E-ISSN: 2972-368X
Abbreviated Title: Int. J. Mach. Learn.
Frequency: Quarterly
DOI: 10.18178/IJML
Editor-in-Chief: Dr. Lin Huang
Executive Editor: Ms. Cherry L. Chen
Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
E-mail: ijml@ejournal.net
