Abstract—Social media platforms have attracted overwhelming interest worldwide because they let users post their own content, some of which has not been validated and may be false or misleading. Several studies have assessed the credibility of social media posts to tackle this problem. Most existing online assessment systems evaluate the credibility of every post. This practice may be suboptimal, because much content is not newsworthy (e.g., selfies and personal opinions, to which the notion of credibility does not apply). Assigning a credibility score to a non-newsworthy post may confuse users. In addition, a recent study has shown that including such irrelevant non-newsworthy data degrades the quality of credibility assessment. Therefore, identifying and excluding non-newsworthy posts is crucial to reliable credibility assessment. This article investigates how effective different types of post features are for automatically removing non-newsworthy posts. Three types of post features, i.e., text, topic, and social features, were evaluated with two classification methods: machine learning and cosine similarity. Our findings reveal the importance of social features and their combination for identifying non-newsworthy posts.
Index Terms—Social network analysis, credibility measurement, information credibility, supervised machine learning, TF-IDF.
The authors are with the Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen, 40002 Thailand (e-mail: email@example.com, firstname.lastname@example.org, email@example.com).
Cite: Chaluemwut Noyunsan, Tatpong Katanyukul, and Kanda Runapongsa Saikaew, "Non-Newsworthy Message Removal for Efficient Credibility Assessment," International Journal of Machine Learning and Computing, vol. 7, no. 6, pp. 203-207, 2017.