Folksonomy Graphs Based Context-Aware Recommender System Using Spectral Clustering

The advent of collaborative information systems has accelerated the growth of the web into a huge repository of all kinds of resources. Web users can share and annotate any identifiable thing, resource or item on the web. The social web has also empowered users through tagging, a practice that enables a collaborative classification, or folksonomy, of their shared resources. Still, the abundant web content is mostly unorganized, which makes it hard for users to find and discover items of interest. Thus, many major websites and company platforms integrate recommender systems into their user interfaces. Recommender systems assist users in searching and exploring, and provide them with relevant items matching their preferences. This article presents a folksonomy graphs based context-aware recommender system of resources. The generated graphs express the semantic relatedness between resources by modelling the user-resource-tag folksonomy relationship and integrating contextual information. The proposed approach incorporates spectral clustering to deal with the graph partitioning problem. The experimental evaluation on the Goodbooks-10k dataset shows relevant performance results for book recommendations. Future perspectives will integrate graph theory and network analysis to improve resource recommendation.


I. INTRODUCTION
The number of resources available on the web is growing tremendously. A web resource is anything that can be obtained from the web (e.g., websites, videos, articles, pictures, etc.). From the web users' perspective, the web is a huge repository of items. Among other uses, end-users annotate and enrich existing web content by freely tagging the web resources that interest them with their own tags for indexing purposes [1]. Yet, the fundamental purpose of using the web is to find, in overloaded search spaces, the information that fits the user's preferences and needs. Users rely on the web to meet their intended purposes by seeking resources or items of interest. The lack of complete indexing and organization of web resources makes them hard to reach [1]. Information filtering systems, such as recommender systems, improve the chances of finding and discovering relevant items: they suggest items that the user might not yet have found by searching. A recommender system is an information retrieval tool that suggests personalized recommendations to help users find the resources that best fit their preferences and needs. A well-performing recommender system answers users' inquiries not only by retrieving adequate and relevant information but also by proposing personalized items. The recommendation process is concerned with information filtering using the traditional filtering approaches, namely collaborative filtering (CF), content-based (CB) filtering, and hybrid filtering [2]. The CF approach is based on users' past behavior: it analyzes similar users who share the same interests and their behavior regarding similar items. A CF-based recommender system operates on the user rating matrix and recommends items rated by similar users in the past (memory-based recommendations).
The CB filtering approach focuses on the contents of the items that the user previously rated. Recommendations are based on the similarity between item contents, suggesting the items that best match those the user rated in the past. Hybrid recommender systems combine two or more filtering techniques. Current research has extended the classical two-dimensional approaches (users x items). To provide better-personalized recommendations, recent recommender systems try to leverage contextual information in their recommendation process; they are called context-aware recommender systems (CARS). The term "context" is commonly defined as [3]: "Any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves". In recommender systems, contextual information can identify the geographical situation (spatial context), the period of time (temporal context) and the permanent features (static context) of the users and/or the items. The static context characterizes attributes of the users and/or the items that remain unchanged over time, like the metadata of a book: its authors, language, date of publication, category, etc.
With the arrival of social networks and collaborative software, users have extensively used tags to annotate items available on the web. Tags are a kind of contextual information characterizing multimedia content. The emergence of social interactions has led to the creation of collaborative tagging systems, known as folksonomies. A folksonomy is a collection of tags created by folks to describe shared content. It emerges from the large number of user-resource-tag relations. The users' annotations, or tags, emerge from the wisdom of the crowd. Folksonomy, a folk taxonomy, is a community classification of annotated resources and a fine example of collective intelligence.
In this article, we aim to construct folksonomy graphs to enhance the recommendation process. The folksonomy's knowledge is formalized by the user-resource-tag relationship. In recommender system terminology, the term "item" denotes what the system recommends to users [4]. In this paper, the terms "item" and "resource" are used interchangeably. The resources' descriptions are enriched with contextual information from the application domain as well as with their assigned descriptive tags. The concept of a folksonomy graphs based recommender system is analogous to the knowledge graphs used to enhance search engines. In fact, thousands of large and medium-sized companies manage and maintain their knowledge in the form of a corporate knowledge graph [5]. For instance, Facebook, Amazon and other large companies have constructed knowledge graphs that incorporate their large amounts of data. The folksonomy graphs will contain relevant knowledge about the items, emerging from their descriptive metadata and their assigned tags. The recommender system can be empowered by reasoning tasks over the folksonomy graphs, applied to the large amounts of data produced by social interactions. The article describes the use of spectral clustering as a pre-processing step before constructing the folksonomy graphs used by the recommendation algorithm.
The rest of the paper is organized as follows: Section II presents related work on recent recommender system approaches. Section III depicts the proposed approach of folksonomy graphs based context-aware recommender system using spectral clustering. Section IV describes the evaluation and experimental results on the Goodbooks-10k [6] dataset for book recommendations. Finally, the conclusion and future directions are delineated in Section V.

II. RELATED WORK
Recommender systems (RS) have become popular in the daily online user experience. They offer recommendations of information of interest, such as web content, things, events, places and people. Many research fields, like information retrieval, machine learning and human-computer interaction, have contributed to recommendation methods. The common concern is to model the user's interests in order to increase the chances of finding interesting items. A recommender system is a subclass of information filtering system that aims to assist the user's search behavior by suggesting items that best meet their interests and preferences. The filtering approaches most commonly distinguished in recommender systems are collaborative filtering (CF), content-based (CB) filtering and hybrid filtering [7].
In the CF recommendation method, the user profile is built by filtering information from the user's behavior, such as ratings, assigned tags and comments, or implicit feedback such as liking items. There are two categories of CF: user-based CF, which measures the similarity between users' profiles (e.g., the nearest-neighbor method), and item-based CF, which uses the target user's ratings to find similarities between items. A CF-based RS suggests items liked by users with similar preferences. In CB filtering, the user profile is constructed from the items the user previously preferred, by filtering the metadata describing those items, such as keywords, category and other descriptive features. A CB-based RS suggests items similar to those the user preferred in the past. Knowledge recommender systems [8] have emerged with the large amount of generated knowledge. They deal with knowledge overload by filtering the most relevant knowledge matching the user's preferences. The knowledge recommendation approach is applied to overcome the cold start problem and to help users in decision making. A recommender system establishes similarity relationships among items or users to generate recommendations; the authors of [9] incorporate similarity relationships to improve the accuracy of the recommender system. The processing of ubiquitous information over the web has drawn researchers' attention to context-aware recommender systems (CARSs), which combine context-awareness with information filtering to offer the most accurate recommendations [10]. Generally, context is information characterizing the situations surrounding the items to be recommended. CARSs integrate different types of contextual data, such as temporal, spatial and environmental data. For instance, mobile applications in the tourism domain have improved significantly by using contextual information.
The user's current context is modeled by contextual information collected from the diverse range of sensors in his/her smartphone [11]. However, CARSs face several challenges that affect the precision of recommendations [12], such as sparsity, cold start and scalability issues.
In recommender systems, the sparsity issue occurs when there are insufficient data to extract descriptive metadata, ratings and contextual information about the items. CF-based recommender systems face this issue because their filtering technique depends on the ratings of similar users. For example, the MovieLens data is represented by a user-item matrix whose dimension grows with the users' ratings. This matrix suffers from data sparseness because most users do not rate a large number of the items [13]. The cold start problem occurs when a user or an item is new to the system and has insufficient ratings or records. Most CF-based recommender systems fail to offer accurate recommendations because of the cold start issue [7], which has been addressed by CB filtering. A scalable system is capable of handling a huge volume of data efficiently and effectively. However, current recommender systems face scalability issues that increase processing time and reduce the accuracy of the recommendations [12].
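A common way to quantify the sparseness of a user-item matrix is the fraction of its cells that hold no rating. The sketch below computes it with NumPy on a small hypothetical 4 x 5 rating matrix (values invented for illustration, not MovieLens data), where 0 means "not rated".

```python
import numpy as np

# Hypothetical user-item rating matrix: 4 users x 5 items, 0 = not rated.
ratings = np.array([
    [5, 0, 0, 1, 0],
    [0, 0, 4, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 0, 0, 2],
])

# Density = observed ratings / all possible (user, item) pairs.
density = np.count_nonzero(ratings) / ratings.size
sparsity = 1.0 - density
print(f"sparsity = {sparsity:.2f}")  # 15 of the 20 cells are empty
```

On real rating data this ratio is usually far closer to 1, which is why neighbourhood-based CF struggles to find overlapping ratings.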
The proposed folksonomy graphs based context-aware recommender system aims not only to improve recommendation precision but also to mitigate the aforementioned issues.

III. PROPOSED APPROACH
The social web has fostered the use of collaborative tagging over the years, drawing attention to the analysis of user-resource-tag inter-connectivity as a way to improve recommender systems. The proposed approach models the folksonomy characteristics by inferring, from the user-resource-tag relations, graphs of tags and a graph of items that are defined and linked to one another (see Fig. 1). Spectral clustering is applied as a pre-processing step before constructing the graph of resources, in order to reduce the scalability issue. For each graph, we examine the relationships between its entities and identify actionable knowledge.

International Journal of Machine Learning and Computing, Vol. 10, No. 1, January 2020
A community of users U = {us} annotates a set of resources R = {rk} with a set of tags T = {ti}, where h, m and n are respectively the total numbers of users, resources and tags.
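To make the notation concrete, a folksonomy can be stored as a list of (user, resource, tag) triples, from which U, R, T and their sizes h, m and n follow directly. The triples below are a hypothetical toy example, not data from the article.

```python
# Hypothetical toy folksonomy: each triple (user, resource, tag) records that
# a user annotated a resource with a tag.
folksonomy = [
    ("u1", "r1", "fantasy"),
    ("u1", "r2", "magic"),
    ("u2", "r1", "magic"),
    ("u2", "r3", "history"),
    ("u3", "r3", "war"),
]

# Derive the sets U, R and T from the observed annotations.
U = sorted({u for u, _, _ in folksonomy})
R = sorted({r for _, r, _ in folksonomy})
T = sorted({t for _, _, t in folksonomy})

# h, m and n are the total numbers of users, resources and tags.
h, m, n = len(U), len(R), len(T)
print(h, m, n)  # 3 3 4
```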

A. Step 1: Spectral Clustering
Spectral clustering serves as a pre-processing phase before the construction of the graph of resources. It addresses the graph partitioning problem by transforming the current space so that connected data points are brought close to each other and form clusters. In this context, the data points to be clustered are the resources.

1) Clustering
Clustering is one of the most widely used techniques for exploratory data analysis. Its goal is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar. Spectral clustering has become increasingly popular due to its promising performance in graph-based clustering. It can be solved efficiently with standard linear algebra software and very often outperforms traditional algorithms such as k-means. Unlike k-means, which assumes spherical clusters, spectral clustering makes no assumptions about the shape of the clusters: it emphasizes connectivity between data points rather than compactness around cluster centers. The goal of spectral clustering is to cluster data that is connected.

2) Background
The set of resources R = {rk} represents the data points, where each resource rk denotes a data entry. Each rk ∈ ℝ^f, where f is the number of features describing rk, such as tags and spatial, temporal and static contextual features.
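One possible way to obtain the vectors rk ∈ ℝ^f is to one-hot encode each resource's tags and append scaled contextual features. The sketch below assumes a hypothetical encoding with one binary feature per tag plus a min-max scaled publication year as the only temporal feature; the article does not specify its exact feature encoding.

```python
import numpy as np

# Hypothetical resources: a tag set plus one temporal feature (publication year).
resources = {
    "r1": {"tags": {"fantasy", "magic"}, "year": 1997},
    "r2": {"tags": {"magic"}, "year": 2000},
    "r3": {"tags": {"history", "war"}, "year": 1960},
}

# Vocabulary of tags: one binary feature per tag.
tag_vocab = sorted(set().union(*(res["tags"] for res in resources.values())))

# Min-max scale the year into [0, 1] so it is comparable to the binary tag features.
years = np.array([res["year"] for res in resources.values()], dtype=float)
years_scaled = (years - years.min()) / (years.max() - years.min())

rows = []
for res, year in zip(resources.values(), years_scaled):
    tag_features = [1.0 if t in res["tags"] else 0.0 for t in tag_vocab]
    rows.append(tag_features + [year])

# Feature matrix: row k is the vector r_k in R^f, here f = len(tag_vocab) + 1.
X = np.array(rows)
print(X.shape)
```

Scaling matters here because the similarity measure used next is distance-based, so an unscaled year would dominate the binary tag features.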

3) Similarity matrix
Given an enumerated set of data points R, the similarity (or adjacency) matrix is defined as a symmetric matrix A, where Aij ≥ 0 measures the similarity between two data points, or resources, ri and rj.
Aij = 1 when ri and rj have the same features.
The data points, or resources, should be in the same cluster when they are close and in different clusters when they are far apart. However, data points in the same cluster may also be far apart, or even farther apart than points in different clusters.
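The goal is to transform the f-dimensional space so that when two points ri and rj are close, they are always in the same cluster, and when they are far apart, they are in different clusters. A common way to define similarity is the Gaussian kernel (1):
$$A_{ij} = \exp\left(-\frac{\lVert r_i - r_j \rVert^2}{2\sigma^2}\right) \tag{1}$$
where σ is a scale parameter that controls how fast the similarity decays with the distance between ri and rj.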
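The Gaussian kernel can be evaluated for all pairs at once with NumPy broadcasting. This is a minimal sketch; the value σ = 1 is an arbitrary choice for illustration, since the article does not report the scale it used.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Return A with A_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    # Pairwise differences via broadcasting: shape (m, m, f).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)  # squared Euclidean distances, (m, m)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Three hypothetical resources; the first two have identical features.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = gaussian_similarity(X, sigma=1.0)
print(np.round(A, 3))
```

Identical resources get A[0, 1] = 1, matching the rule that Aij = 1 for equal features, while A[0, 2] = exp(-1) ≈ 0.368 because the squared distance is 2 and 2σ² = 2.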

4) Unnormalized graph Laplacian
The unnormalized graph Laplacian (2) is defined as the difference of two matrices, the degree matrix D and the similarity matrix A:
$$L = D - A, \qquad D_{ii} = \sum_{j=1}^{m} A_{ij} \tag{2}$$
where D is diagonal and Dii is the degree of the data point ri.

5) Process of spectral clustering
1) Construct the similarity matrix using the Gaussian kernel.
2) Compute the first k eigenvectors of its Laplacian matrix, i.e., those associated with the k smallest eigenvalues, to define a feature vector for each object. This embeds the data points, the resources rk, in a low-dimensional space in which the clusters are more obvious.
3) Apply a classical clustering algorithm, like k-means, to partition the resources into k classes.
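The three-step process can be sketched in NumPy using the standard unnormalized Laplacian L = D − A, where D is the diagonal degree matrix. This is an illustrative sketch, not the authors' implementation. The plain k-means helper, with a deterministic farthest-point initialisation, is a choice made here so that the example needs no extra library. In practice, a library implementation such as scikit-learn's `SpectralClustering` would normally be used.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), symmetric with entries in (0, 1].
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def unnormalized_laplacian(A):
    # L = D - A with D_ii = sum_j A_ij; self-loops A_ii cancel on the diagonal.
    return np.diag(A.sum(axis=1)) - A

def kmeans(points, k, n_iter=100):
    # Farthest-point initialisation: start from points[0], then repeatedly add
    # the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    # Lloyd iterations: assign to the nearest center, then recompute the means.
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(X, k, sigma=1.0):
    A = gaussian_similarity(X, sigma)        # step 1: similarity matrix
    L = unnormalized_laplacian(A)
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues come back in ascending order
    embedding = eigvecs[:, :k]               # step 2: k smallest -> one row per resource
    return kmeans(embedding, k)              # step 3: cluster the embedded rows

# Two hypothetical, well-separated groups of resource vectors.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.8]])
labels = spectral_clustering(X, k=2)
print(labels)
```

Because each row of L sums to zero, the constant vector always has eigenvalue 0. When the similarity graph is nearly disconnected, the k smallest eigenvectors are close to the cluster indicator vectors. This is why k-means on the embedded rows separates the groups easily.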

B. Step 2: Graph of Resources
Let GR = (VR, ER) be an undirected graph of resources with vertex set VR and edge set ER. GR models the relationships among the resources R, which are represented by the vertices VR and linked by the edges ER. Two resources ri and rj are linked by a weighted edge (3), denoted W(ri, rj).
The two resources ri and rj are each described by their own set of tags. Each cluster resulting from spectral clustering has its own graph of resources. The annotated resources can be grouped into categories based on the descriptive tags jointly assigned to them. The graphs of resources bring annotated resources together to formalize their similarity. Besides, the weighted edge W(ri, rj) considers the contextual features (spatial, temporal and static contextual information) that jointly describe the two resources, to better capture their relatedness. The recommender system can therefore explore item-item similarities through the graphs of resources.
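Since the exact form of W(ri, rj) in eq. (3) is not reproduced here, the sketch below uses a hypothetical stand-in. It blends the Jaccard overlap of the two tag sets with the fraction of shared contextual attributes, weighted by an assumed parameter `alpha`. It then builds one cluster's graph of resources as an adjacency dictionary. Only the overall structure (tags plus contextual features feeding an edge weight) follows the text.

```python
from itertools import combinations

# Hypothetical resources of one cluster: tag sets and static contextual features.
resources = {
    "r1": {"tags": {"fantasy", "magic", "wizards"}, "context": {"lang": "en", "author": "A"}},
    "r2": {"tags": {"magic", "wizards"}, "context": {"lang": "en", "author": "A"}},
    "r3": {"tags": {"fantasy", "dragons"}, "context": {"lang": "fr", "author": "B"}},
}

def edge_weight(a, b, alpha=0.5):
    """Hypothetical W(r_i, r_j): blend of tag overlap and shared context (not eq. (3))."""
    tag_sim = len(a["tags"] & b["tags"]) / len(a["tags"] | b["tags"])  # Jaccard index
    keys = a["context"].keys() & b["context"].keys()
    ctx_sim = sum(a["context"][k] == b["context"][k] for k in keys) / max(len(keys), 1)
    return alpha * tag_sim + (1 - alpha) * ctx_sim

def build_resource_graph(resources):
    # Undirected weighted graph as an adjacency dict; zero-weight edges are skipped.
    graph = {r: {} for r in resources}
    for ri, rj in combinations(resources, 2):
        w = edge_weight(resources[ri], resources[rj])
        if w > 0:
            graph[ri][rj] = graph[rj][ri] = w
    return graph

G_R = build_resource_graph(resources)
print(G_R)
```

Here r1 and r2 share most tags and all context, so W(r1, r2) = 5/6. r2 and r3 share nothing and are left unconnected.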

C. Step 3: Graph of Tags
Tag cleansing and analysis: the uncontrolled vocabulary yields a large and varied volume of extracted tags. The proposed approach therefore cleans up the tags and selects the relevant ones (e.g., removing all special characters and establishing a blacklist of keywords).
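The cleaning step can be sketched with Python's `re` module. The normalisation rules (lower-casing, turning hyphens and underscores into spaces, dropping other non-alphanumeric characters) and the blacklist entries are hypothetical examples. The article does not list its actual rules or keywords.

```python
import re

# Hypothetical blacklist, stored in cleaned form (e.g. reading-status tags).
BLACKLIST = {"to read", "owned", "favorites"}

def clean_tag(tag):
    # Lower-case, turn separators into spaces, drop remaining special characters.
    tag = tag.lower().strip()
    tag = re.sub(r"[-_]+", " ", tag)
    tag = re.sub(r"[^a-z0-9 ]", "", tag)
    return re.sub(r"\s+", " ", tag).strip()

def clean_tags(raw_tags, blacklist=BLACKLIST):
    # Keep non-empty, non-blacklisted tags once each, in first-seen order.
    cleaned = []
    for raw in raw_tags:
        tag = clean_tag(raw)
        if tag and tag not in blacklist and tag not in cleaned:
            cleaned.append(tag)
    return cleaned

print(clean_tags(["Fantasy!!", "to-read", "sci_fi", "fantasy", "***"]))
```

Note that the ASCII-only character class would also strip accented letters. A multilingual tag set would need a Unicode-aware rule instead.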
The folksonomy contains inconsistent tags, an issue addressed by retaining only the most relevant tags. Tags are extracted according to their frequency of appearance (low-frequency tags are removed) and to the degree of frequency [14] of each tag ti (4), denoted DF(rk, ti), where the resource rk ∈ R is described by a set of tags T. The degree of frequency of a tag ti combines two frequencies. The first is the frequency depending on users, FU(rk, ti): the number of users who assigned the tag ti to the resource rk, divided by the total number of users |U|. The second is the frequency depending on the tag, DT(rk, ti): the number of times the tag ti describes the resource rk, divided by the total number of tags |T|.
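The two frequencies can be computed directly from (user, resource, tag) annotations, following the definitions in the text. Eq. (4), which combines them into DF, is not reproduced here. The product used below is therefore only a hypothetical placeholder, and so is the threshold `MIN_DF` used to drop low-frequency tags.

```python
from collections import Counter, defaultdict

# Hypothetical (user, resource, tag) annotations.
annotations = [
    ("u1", "r1", "fantasy"), ("u2", "r1", "fantasy"), ("u3", "r1", "fantasy"),
    ("u1", "r1", "magic"), ("u2", "r2", "history"),
]
users = {u for u, _, _ in annotations}  # U, with |U| = h
tags = {t for _, _, t in annotations}   # T, with |T| = n

tag_counts = Counter((r, t) for _, r, t in annotations)  # times t describes r
taggers = defaultdict(set)                               # distinct users per (r, t)
for u, r, t in annotations:
    taggers[(r, t)].add(u)

def fu(r, t):
    """FU(r, t): users who assigned t to r, divided by |U|."""
    return len(taggers.get((r, t), ())) / len(users)

def dt(r, t):
    """DT(r, t): number of times t describes r, divided by |T|."""
    return tag_counts[(r, t)] / len(tags)

def df(r, t):
    # Hypothetical placeholder: the article's eq. (4) defines the actual combination.
    return fu(r, t) * dt(r, t)

MIN_DF = 0.2  # hypothetical threshold for removing low-frequency tags
relevant = sorted(pair for pair in tag_counts if df(*pair) >= MIN_DF)
print(relevant)
```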
Building the graph of tags: the graph of tags GT = (VT, ET) is built from weighted edges W(ti, tj) relating pairs of tags ti and tj. The tags are the nodes, or vertices VT, of the undirected graph GT, linked together by the edges ET. The edge weight W(ti, tj) captures the semantic relationship between two tags (5). For instance, two tags ti and tj describing a resource rk may hold the same meaning but with different degrees of specificity. Their weighted edge W(ti, tj) measures how strongly ti and tj are semantically related, based on their joint usage in users' annotations and resources' descriptions.
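Eq. (5) is not reproduced here, so the sketch below substitutes a hypothetical co-occurrence weight. It takes the Jaccard index of the sets of resources that two tags describe. This captures "joint usage" in a simple way, but it is not the authors' formula.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, resource, tag) annotations.
annotations = [
    ("u1", "r1", "fantasy"), ("u1", "r1", "magic"),
    ("u2", "r2", "fantasy"), ("u2", "r2", "magic"),
    ("u3", "r3", "fantasy"), ("u3", "r3", "dragons"),
    ("u4", "r4", "history"),
]

# Resources described by each tag.
resources_of = defaultdict(set)
for _, r, t in annotations:
    resources_of[t].add(r)

def build_tag_graph(resources_of):
    """Undirected graph of tags; hypothetical W(t_i, t_j) = Jaccard of resource sets."""
    graph = defaultdict(dict)
    for ti, tj in combinations(sorted(resources_of), 2):
        shared = resources_of[ti] & resources_of[tj]
        if shared:  # tags that never co-describe a resource stay unlinked
            w = len(shared) / len(resources_of[ti] | resources_of[tj])
            graph[ti][tj] = graph[tj][ti] = w
    return dict(graph)

G_T = build_tag_graph(resources_of)
print(G_T)
```

"fantasy" and "magic" co-describe two of the three resources either of them describes, giving a weight of 2/3. "history" never co-occurs with another tag and so has no edges.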

D. Step 4: Recommendation Algorithm
The recommender system takes advantage of the emergent graphs of resources and tags (see Algorithm 1, Folksonomy Graph-based recommendations). The graph of resources GR captures the strength of similarity among resources. Once the algorithm selects the resources previously rated by a user, it recommends their semantically related resources. The relatedness of two resources ri and rj is given by the weighted edge W(ri, rj). For a resource r, the folksonomy graphs based recommendation algorithm suggests its top k related resources. The graph of tags GT extends the recommendations by traversing the semantic relationships between the tags describing the resources, since it captures the relationships between resources annotated with related tags. Consequently, the recommender system explores this tag knowledge graph to extract and recommend further related resources having similar descriptive tags.
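The two-stage idea can be sketched as follows. Resources adjacent to the user's rated resources in GR are scored by their edge weight W. The score is then extended with evidence from GT linking the user's tags to each candidate's tags. The scoring rule, the `tag_weight` parameter and the toy graphs are all hypothetical; this is not a transcription of the article's Algorithm 1.

```python
def tag_relatedness(tags_a, tags_b, G_T):
    # Strongest link between any tag of a and any tag of b (1.0 for a shared tag).
    best = 0.0
    for ti in tags_a:
        for tj in tags_b:
            w = 1.0 if ti == tj else G_T.get(ti, {}).get(tj, 0.0)
            best = max(best, w)
    return best

def recommend(rated, G_R, G_T, tags_of, top_k=3, tag_weight=0.5):
    """Hypothetical folksonomy-graph recommender: resource graph plus tag graph."""
    user_tags = set().union(*(tags_of.get(r, set()) for r in rated))
    scores = {}
    for cand in tags_of:
        if cand in rated:
            continue
        # Resource-graph evidence: strongest edge W(r, cand) from a rated resource r.
        graph_score = max((G_R.get(r, {}).get(cand, 0.0) for r in rated), default=0.0)
        # Tag-graph evidence: related tags extend the set of candidates.
        tag_score = tag_relatedness(user_tags, tags_of[cand], G_T)
        score = graph_score + tag_weight * tag_score
        if score > 0:
            scores[cand] = score
    # Highest score first; ties broken by resource id for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))[:top_k]

# Hypothetical toy graphs.
tags_of = {"r1": {"fantasy", "magic"}, "r2": {"magic"}, "r3": {"dragons"}, "r4": {"history"}}
G_R = {"r1": {"r2": 0.8}, "r2": {"r1": 0.8}}
G_T = {"fantasy": {"dragons": 0.4}, "dragons": {"fantasy": 0.4}}
print(recommend({"r1"}, G_R, G_T, tags_of))
```

In this toy run, r2 is found through the resource graph. r3 is reached only through the tag edge fantasy–dragons, which illustrates how GT extends the recommendations beyond direct resource neighbours.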

IV. EVALUATION AND RESULTS
To evaluate the proposed approach, we used the Goodbooks-10k [6] dataset for book recommendations. The dataset provides ten thousand books tagged and rated by users. The books are described by metadata such as ISBN, authors, year and title. Besides, the books are rated (from 1 to 5) and tagged by users with 34,252 tags. We divided the dataset of books into 2 clusters. The two resulting groups of books are more coherently clustered with spectral clustering than with the k-means clustering method (see Fig. 2 and Fig. 3).

The book recommendations are evaluated offline (automatic evaluation). The previously well-rated books (rating ≥ 3) are considered the ground truth, i.e., the books relevant to the user, and a recommended relevant book counts as a true positive Tp. That is, the evaluation measures how closely the recommendations match the actual preferences of the user. For each user, we select his/her previously rated books to generate automatic book recommendations with the folksonomy graph-based recommendations algorithm. The accuracy metrics, namely precision P, recall R and F1-measure F (6), are calculated from the numbers of books that are either relevant or not and either recommended or not:
$$P = \frac{T_p}{T_p + F_p}, \qquad R = \frac{T_p}{T_p + F_n}, \qquad F = \frac{2 \cdot P \cdot R}{P + R} \tag{6}$$
where Fp and Fn denote the false positives and false negatives. Four possible outcomes are shown in the confusion matrix (see Table I).
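The precision, recall and F1-measure for one user can be computed from sets of recommended and relevant books. The sketch assumes the ground-truth rule rating ≥ 3 for relevance and uses hypothetical ratings and recommendations.

```python
def precision_recall_f1(recommended, relevant):
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # recommended and relevant
    fp = len(recommended - relevant)  # recommended but not relevant
    fn = len(relevant - recommended)  # relevant but not recommended
    p = tp / (tp + fp) if recommended else 0.0
    r = tp / (tp + fn) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical user: books rated >= 3 form the ground truth (assumed threshold).
ratings = {"b1": 5, "b2": 4, "b3": 2, "b4": 3}
relevant = {b for b, s in ratings.items() if s >= 3}
p, r, f = precision_recall_f1(["b1", "b2", "b3", "b5"], relevant)
print(round(p, 3), round(r, 3), round(f, 3))
```

Here two of the four recommended books are relevant (P = 0.5), and two of the three relevant books are recommended (R = 2/3), giving F = 4/7. Averaging these per-user values over the selected users would yield summary figures of the kind reported for the 10 active users.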
The experimental evaluation (see Table II, Fig. 4) reports the three metrics evaluating the recommendations of the proposed approach for 10 randomly selected active users. The accuracy metrics evaluate whether the folksonomy graph-based recommendations algorithm correctly predicts the relevant books that the users previously rated well. The high precision values indicate that almost all of the recommendations are relevant to the users.
The proposed folksonomy graphs based CARS is compared to a hybrid RS that combines CB and CF filtering [15]. The hybrid RS algorithm recommends books with similar content to the 10 active users, and its recommendations are also based on the books' similarity with the users' profiles using collaborative filtering. The precision-recall curve (see Fig. 5) shows that both the precision and the recall of the proposed "folksonomy graphs based CARS" are higher than those of the hybrid RS.

V. CONCLUSION AND PERSPECTIVES
Today's search engines guarantee the availability of content published on the web. Yet, social information systems have greatly accelerated the growth of the web with all kinds of resources. This abundance of shared resources is mostly unorganized, which makes it hard for users to search and explore. The arrival of the social web has enabled users to annotate shared resources with their own tags, creating a collaborative classification, or folksonomy. The proposed approach exploits the contextual information coming from the application domain and analyzes the folksonomy relationships to generate graphs of resources and tags, which form the knowledge base of the recommender system. The purpose of a recommender system is to help users find and discover items of interest. The central recommendation challenge, predicting users' preferences, has driven considerable research. This paper describes a folksonomy graphs based context-aware recommender system of resources.
The Goodbooks-10k dataset was used to evaluate the accuracy and effectiveness of the proposed approach for book recommendations. The experimental evaluation yielded higher accuracy measures for the proposed folksonomy graphs based CARS algorithm than for the hybrid RS, attesting to its relevance. Future work will extend the experiments to online evaluation. Future perspectives will also focus on integrating additional contextual information to improve the description of resources. They will also draw on graph theory and network analysis to generate and adjust the graphs of resources and thus enhance their recommendations.