
IJMLC 2019 Vol.9(2): 222-229 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2019.9.2.790

SAP: Standard Arabic Profiling Toolset for Textual Analysis

Khalid M. O. Nahar, Ahmed F. Al Eroud, Malek Barahoush, and Abdallah M Al-Akhras
Abstract—This paper presents the Standard Arabic Profiling (SAP) toolset, which helps researchers perform textual analysis and compare different Arabic corpora. Because tools for the Arabic language are needed, SAP is designed to simplify the textual analysis process. The toolset consists of three profilers. The Part-of-Speech (POS) profiler gives a statistical analysis of a given document. The vocabulary profiler gives the user an indication of the vocabulary used in a document with reference to the Open Source Arabic Corpus (OSAC) of two news agencies (CNN and BBC); it does so by computing the similarity between the document and the corpus using the log-likelihood measure. Lastly, the newly added readability profiler is used to 1) assess the readability level of a document according to the Flesch Reading Ease readability formula, and 2) measure the simplicity and ambiguity levels of the document. We describe the current part-of-speech profiler of this toolset and how its functionality can be extended to embrace vocabulary and readability profiling.

Index Terms—Arabic natural language processing, part-of-speech tagging (POST), text analysis, software.
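
The two measures named in the abstract can be sketched as follows. This is a minimal illustration and not the SAP implementation: the function names are hypothetical, the log-likelihood form is the common two-corpus word-frequency comparison (assumed here, since the abstract does not give the exact formula), and the Flesch Reading Ease function takes word, sentence, and syllable counts as inputs because the abstract does not say how syllables are counted for Arabic text.

```python
import math
from collections import Counter


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease score computed from raw counts.

    Higher scores mean easier text. How to count syllables in Arabic
    is left to the caller (an assumption of this sketch).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood score for one word across two texts.

    a: frequency of the word in the document
    b: frequency of the word in the reference corpus
    c: total tokens in the document
    d: total tokens in the reference corpus
    """
    if c + d <= 0:
        raise ValueError("at least one text must be non-empty")
    # Expected frequencies if the word were equally likely in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def vocabulary_profile(doc_tokens, ref_tokens):
    """Score every word of the document against the reference corpus."""
    doc, ref = Counter(doc_tokens), Counter(ref_tokens)
    c, d = len(doc_tokens), len(ref_tokens)
    return {w: log_likelihood(doc[w], ref[w], c, d) for w in doc}
```

A word whose relative frequency is the same in both texts scores zero; larger scores mark words whose usage in the document departs more strongly from the reference corpus. Tokenization is assumed to happen before `vocabulary_profile` is called.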

Khalid M. O. Nahar and Malek Barahoush are with the Department of Computer Sciences, Faculty of IT and Computer Sciences, Yarmouk University, Irbid, 21163, Jordan (Corresponding author: Khalid M. O. Nahar; e-mail: khalids@yu.edu.jo).
Ahmed F. Al Eroud and Abdallah M Al-Akhras are with the Department of Computer Information System, Faculty of IT and Computer Sciences, Yarmouk University, Irbid, 21163, Jordan.


Cite: Khalid M. O. Nahar, Ahmed F. Al Eroud, Malek Barahoush, and Abdallah M Al-Akhras, "SAP: Standard Arabic Profiling Toolset for Textual Analysis," International Journal of Machine Learning and Computing, vol. 9, no. 2, pp. 222-229, 2019.
