Abstract—To prevent the misuse of sensitive data, its privacy must be adequately maintained without compromising its usability. Privacy preservation has thus become an essential prerequisite to data mining. Various methods, including association rule mining, k-anonymization, and data hiding, have been suggested for this purpose. In this paper, a novel technique is proposed that uses the concept of moments to preserve the privacy of data streams while also compressing the data. The technique uses overlapping, fixed-size sliding windows to calculate a sequence of moments, which constitutes the compressed and privacy-preserved data stream. The group of points in each window is mapped to a single point in the two-dimensional plane, namely the centroid (moment) of the graph represented by the group. This approach is promising because the details of the data are maintained in the moments representing them. Applying the technique to a data stream reduces its size substantially, since each window of points is replaced by a single point, which makes analysis of the resultant stream faster. The method also inherently achieves privacy preservation, as the actual values cannot be retrieved from the moments. The technique has been tested on real-world as well as synthetic data sets.
Index Terms—Privacy preservation, data compression, data mining, data streams, moments, centroids
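As an illustration of the windowing scheme summarized in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that each stream value is plotted against its integer position in the stream and that the centroid of a window is the arithmetic mean of its points; the abstract does not give the exact moment definition. The function name `window_centroids` and the parameters `window` and `step` are hypothetical names introduced here.

```python
from typing import List, Sequence, Tuple


def window_centroids(
    values: Sequence[float], window: int, step: int
) -> List[Tuple[float, float]]:
    """Map each sliding window of a stream to one 2-D centroid.

    Assumption (not taken from the paper): the i-th value is the point
    (i, values[i]) and the centroid is the mean of the window's points.
    Windows overlap whenever step < window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    centroids: List[Tuple[float, float]] = []
    # Slide a fixed-size window over the stream, advancing by `step`.
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        # Mean of the consecutive positions start .. start + window - 1.
        t_mean = start + (window - 1) / 2
        # Mean of the values that fall inside the window.
        x_mean = sum(chunk) / window
        centroids.append((t_mean, x_mean))
    return centroids
```

For example, `window_centroids([1, 2, 3, 4, 5, 6], window=4, step=2)` covers the six input values with two overlapping windows and returns one centroid per window. Under these assumptions, many different windows share the same centroid, which is consistent with the abstract's statement that the actual values cannot be retrieved from the moments.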
Anushree Goutam Ringne and Deeksha Sood are students at the Indian Institute of Technology Roorkee, Roorkee, India (e-mail: anu03uec@iitr.ernet.in; deeksuec@iitr.ernet.in).
Durga Toshniwal is an Assistant Professor at the Indian Institute of Technology Roorkee, Roorkee, India (e-mail: firstname.lastname@example.org).
Cite: Anushree Goutam Ringne, Deeksha Sood, and Durga Toshniwal, "Compression and Privacy Preservation of Data Streams using Moments," International Journal of Machine Learning and Computing, vol. 1, no. 5, pp. 473-478, 2011.