IJMLC 2018 Vol.8(1): 74-79 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2018.8.1.666

Learning Random Forest from Histogram Data Using Split Specific Axis Rotation

Ram B. Gurung, Tony Lindgren, and Henrik Boström
Abstract—Machine learning algorithms for data containing histogram variables have not been explored to any major extent. In this paper, an adapted version of the random forest algorithm is proposed to handle variables of this type, assuming identical structure of the histograms across observations, i.e., the histograms for a variable all use the same number and width of bins. The standard approach of representing bins as separate variables may lead the learning algorithm to overlook the underlying dependencies. In contrast, the proposed algorithm handles each histogram as a unit. When evaluating splits on a histogram variable during tree growth, the proposed algorithm employs a sliding window of fixed size to constrain the sets of bins that are considered together. A small number of all possible sets of bins are randomly selected, and principal component analysis (PCA) is applied locally to all examples in a node. Split evaluation is then performed on each principal component. Results from applying the algorithm to both synthetic and real-world data are presented, showing that the proposed algorithm outperforms the standard approach of using random forests with bins represented as separate variables, with respect to both AUC and accuracy. In addition to introducing the new algorithm, we elaborate on how real-world data for predicting NOx sensor failure in heavy-duty trucks was prepared, demonstrating that predictive performance can be further improved by adding variables that represent changes of the histograms over time.
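The split-evaluation procedure described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function and parameter names (`evaluate_histogram_variable`, `window_size`, `n_windows`) and the use of Gini gain as the split criterion are illustrative assumptions. For one histogram variable at a tree node, a few sliding-window positions are sampled at random, PCA is applied locally to the windowed bins, and candidate thresholds are scored on each principal component:

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity of a class-label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_on_component(scores, y):
    """Scan thresholds along one projected axis; return (best gain, threshold)."""
    parent = gini_impurity(y)
    order = np.argsort(scores)
    s, ys = scores[order], y[order]
    best_gain, best_thr = 0.0, None
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            continue  # no threshold between equal values
        left, right = ys[:i], ys[i:]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / len(ys)
        gain = parent - child
        if gain > best_gain:
            best_gain, best_thr = gain, (s[i] + s[i - 1]) / 2.0
    return best_gain, best_thr

def evaluate_histogram_variable(H, y, window_size=3, n_windows=2, rng=None):
    """H: (n_samples, n_bins) values of one histogram variable at a node.
    Randomly pick a few sliding-window positions, rotate each window's
    bins via a local PCA, and score splits on every principal component."""
    rng = np.random.default_rng(rng)
    n_positions = H.shape[1] - window_size + 1
    positions = rng.choice(n_positions,
                           size=min(n_windows, n_positions),
                           replace=False)
    best_gain, best_info = 0.0, None  # info = (window start, component, threshold)
    for start in positions:
        W = H[:, start:start + window_size]
        Wc = W - W.mean(axis=0)                   # centre before PCA
        _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
        proj = Wc @ Vt.T                          # principal-component scores
        for c in range(proj.shape[1]):
            gain, thr = best_split_on_component(proj[:, c], y)
            if gain > best_gain:
                best_gain, best_info = gain, (start, c, thr)
    return best_gain, best_info
```

Because the PCA is recomputed from the examples reaching each node, the rotation is split-specific: different nodes may rotate the same bins along different axes, which is what lets the tree exploit dependencies between adjacent bins that a bins-as-separate-variables representation would treat independently.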

Index Terms—Histogram random forest, histogram data, random forest, PCA, histogram features.

Ram B. Gurung and Tony Lindgren are with the Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden (e-mail: gurung@dsv.su.se, tony@dsv.su.se).
Henrik Boström was with the Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden. He is now with the School of Information and Communication Technology, KTH Royal Institute of Technology, Kista, Sweden (e-mail: bostromh@kth.se).

[PDF]

Cite: Ram B. Gurung, Tony Lindgren, and Henrik Boström, "Learning Random Forest from Histogram Data Using Split Specific Axis Rotation," International Journal of Machine Learning and Computing vol. 8, no. 1, pp. 74-79, 2018.

Copyright © 2008-2018. International Journal of Machine Learning and Computing. All rights reserved.
E-mail: ijmlc@ejournal.net