Abstract—This paper discusses an approach to creating a
Filipino WordNet: a semi-supervised learning approach using
a Decision Tree and Language Modeling. It takes
advantage of the information found on the web and is intended to help
future NLP researchers working on the Filipino language. The approach
uses words from a dictionary as preliminary data and as seeds
for the search engine to start crawling the WWW. To decide whether
a word is part of the Filipino language, the word first
passes through the Code-Switching Points Module (CSPD). CSPD
scores the word using the frequency counts of word bigrams
and unigrams from language models trained on
an existing, available corpus. After scoring, a Filipino
Stemmer extracts the stem of the word and checks whether the stem
is part of the language. Once the words are scored
and stemmed, the archive evaluates whether the word is Filipino.
To test the accuracy of the system, we collected articles
from around the web and grouped them into two categories: Plain
Filipino and Bilingual. The results show that the F-measure for
the Plain Filipino category ranges from 65.65% to 96.85% with
an average of 85.64%, while for the Bilingual category it ranges from 60% to
100% with an average of 88.17%.
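As a rough illustration of the scoring step described above, the following minimal sketch scores a word from unigram and bigram frequency counts. The abstract does not state how the two counts are combined, so the linear interpolation, the `weight` parameter, the function names `train_counts` and `score`, and the toy corpus are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def train_counts(sentences):
    """Count unigrams and bigrams in a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(word, prev, unigrams, bigrams, weight=0.5):
    """Combine bigram and unigram relative frequencies.

    The interpolation formula is an assumption; the source only says
    that bigram and unigram frequency counts are used.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return weight * p_bi + (1 - weight) * p_uni


# Toy Filipino corpus, invented for illustration only.
toy_corpus = [
    ["ang", "bata", "ay", "kumakain"],
    ["ang", "aso", "ay", "tumatakbo"],
]
unigrams, bigrams = train_counts(toy_corpus)

seen = score("bata", "ang", unigrams, bigrams)        # word attested in the corpus
unseen = score("computer", "ang", unigrams, bigrams)  # word absent from the corpus
```

A word attested in the Filipino corpus receives a higher score than an unseen one. In a full pipeline this score would presumably be one of the features passed on, together with the stemmer output, to the final decision step.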
Index Terms—Corpus building, information retrieval, data and web mining, lexicography.
R. A. Sagum is with the Department of Computer Science, College of Computer and Information Sciences (CCIS), Polytechnic University of the Philippines, Philippines (e-mail: firstname.lastname@example.org).
A. D. Ramos and M. T. Llanes are with the Polytechnic University of the Philippines, Philippines.
Cite: Ria Ambrocio Sagum, Aldrin D. Ramos, and Monique T. Llanes, "FICOBU: Filipino WordNet Construction Using Decision Tree and Language Modeling," International Journal of Machine Learning and Computing, vol. 9, no. 1, pp. 103-107, 2019.