Implementation of Data Mining Analysis to Determine the Tuna Fishing Zone Using DBSCAN Algorithm

The aim of this study is to map tuna fishing zones based on daily fish catch data from the Indian Ocean. The study is expected to deliver a map of potential tuna fishing zones based on the number of catches together with their spatial data. The study uses a data mining approach, with the DBSCAN algorithm as the method for clustering the data. The results show that Bigeye tuna dominated the catch in the west monsoon, while Yellowfin tuna dominated the catch in the east monsoon. Based on trials with the DBSCAN algorithm, the optimal Eps and MinPts values for generating convergent clusters are 1.5 and 5, respectively.

catches will be processed to determine spatial data on tuna fishing areas using the DBSCAN algorithm, with RapidMiner as the tool for program execution. The contribution of this study can be used for further study in many ways, one of which is to deliver a map of potential tuna fishing zones.

A. Data Mining
Data mining is a branch of science that combines databases, statistics, artificial intelligence, and machine learning [5]. Examples of data mining tasks include finding the names most commonly used in a US state, or grouping the documents returned by a search engine according to their context [6]. The ultimate goal of data mining is to obtain important information from raw data. The first stage is data input. The second stage is preprocessing, which includes feature selection, dimensionality reduction, and normalization; its purpose is to prepare the input data for the data mining process. The third stage is the data mining process itself, which has four core tasks: predictive modeling, association analysis, cluster analysis, and anomaly detection. The last stage is postprocessing, which produces the final results of data mining [7].

B. Clustering
Clustering groups a number of data items or objects into clusters (groups) so that the objects within a cluster are as similar as possible to one another and different from the objects in other clusters [8]. In this study, the tuna catch data are grouped in order to determine the tuna catch areas. The objects being clustered are tuna catches, so that across a wide area covered by tuna data the catch areas can be determined with the clustering method.

C. Spatial Analysis
Spatial analysis is analysis that is constrained by several factors, such as space, communication, and transformation. Spatial data show the position, size, and possible topological relationships (shape and layout) of objects on earth [9]. Spatial analysis also involves finding relationships and characteristics that may exist implicitly in spatial databases [10].
Because of the large amount of spatial data that can be obtained from satellite imagery, medical devices, video cameras, etc., it is expensive and often unrealistic for users to examine spatial data in detail. Spatial analysis aims to automate this process of knowledge discovery.
Thus, it plays an important role in:
a) Extracting spatial patterns and features
b) Capturing the intrinsic relationships between spatial and non-spatial data
c) Presenting data order in a concise manner and at a higher conceptual level
d) Helping to rearrange spatial databases to accommodate semantic data and to achieve better performance [11].
This study applies spatial analysis to captured data obtained using satellite imagery. Because the volume of data is large, very good data management is needed to produce up-to-date data or information.

Muhammad Ramadhani and Devi Fitrianah
International Journal of Machine Learning and Computing, Vol. 9, No. 5, October 2019

D. Density Based Spatial Clustering of Applications with Noise (DBSCAN)
The DBSCAN algorithm is one of the algorithms used for clustering (grouping) data. Within a given radius, the neighborhood of an object in a cluster must contain at least a minimum number of data points. All objects that are not included in any cluster are considered noise [12]. DBSCAN is well suited to this study because the study determines the spatial areas of fish catches over a fairly wide range, and DBSCAN can delineate such areas directly from spatial data. According to Ester, DBSCAN is very efficient for large-scale spatial databases [13]. The DBSCAN algorithm has 2 parameters, Eps and MinPts, and relies on the notion of density reachability:
a) Eps: the maximum distance between two points for them to be considered neighbors. If the distance between two points is lower than or equal to this value (Eps), the points are considered neighbors.
b) MinPts (MinPoints): the minimum number of points needed to form a dense region. For example, if MinPts is set to 5, at least 5 points are needed to form a dense region.
c) Density reachable: an object p is density reachable from an object q with respect to Eps and MinPts in a set of objects D if there is a chain of objects p1, p2, …, pn, with p1 = q and pn = p, such that pi+1 is directly density reachable from pi with respect to Eps and MinPts, for 1 ≤ i ≤ n, pi ∈ D [14].
These parameters are required when determining spatial areas with DBSCAN; they govern how clusters are formed in the data being processed [15].
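The roles of Eps and MinPts described above can be illustrated with a minimal pure-Python sketch of DBSCAN. It is not the RapidMiner implementation used in this study. It assumes plain Euclidean distance on (longitude, latitude) pairs, counts a point as its own neighbor, and uses -1 as the noise label. The function names and the sample coordinates are hypothetical.

```python
from math import hypot

NOISE = -1  # label used in this sketch for noise points (an assumed convention)


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points extend the chain
                queue.extend(j_neighbors)
    return labels


# Hypothetical catch positions: two dense groups and one isolated point.
points = [
    (110.0, -10.0), (110.5, -10.0), (110.0, -10.5), (110.5, -10.5), (110.2, -10.2),
    (120.0, -15.0), (120.5, -15.0), (120.0, -15.5), (120.5, -15.5), (120.2, -15.2),
    (130.0, -5.0),
]
labels = dbscan(points, eps=1.5, min_pts=5)
print(labels)
```

With Eps = 1.5 and MinPts = 5, each dense group becomes one cluster and the isolated point is labeled as noise, because fewer than 5 points lie within its Eps radius and no core point can reach it.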

E. Related Works on Density Based Spatial Clustering of Applications with Noise (DBSCAN) and Capture Fisheries
Several studies discuss DBSCAN implementations. A. R. Ajiboye, A. G. Akintola, and A. O. Ameen [16] discuss anomaly detection using the DBSCAN implementation in RapidMiner, covering the whole process from the dataset and modeling to the cluster results. The best results in that work were obtained with Eps = 1 and MinPts = 5, which produced 2 dense clusters and 1 noise cluster.
The next related study, on the determination of tuna fishing areas, is by D. Fitrianah, H. Fahmi, A. N. Hidayanto, and A. M. Arymurthy [17]. That work discusses potential fishing zones, whereas this study discusses the spatial areas of tuna fishing. The two look similar, but this study focuses on the spatial aspect while [17] focuses on the temporal aspect. Although several stages are similar, the results of the two studies are very different.

III. METHODOLOGY
In this section, the author explains the stages of the methodology, as shown in Fig. 1.

A. Data Collection
The author obtained tuna catch data from the sea around Bali island, at approximately 0.05°-21.15°S and 95°-139°E. The collected data comprise 35,000 tuna catch records from 1978-1991. The data were obtained from PT. Perikanan Nasional in Indonesia and were also used in the study of tuna potential fishing zones by D. Fitrianah, A. N. Hidayanto, J. L. Gaol, H. Fahmi, and A. M. Arymurthy [18].

B. Pre-Processing Data
The data obtained must first go through pre-processing to get the best values when the program is executed. Data cleaning and reduction are carried out in this phase to repair damaged data. The raw data are of poor quality because many values are empty and some attributes are not needed. The results of the pre-processing are shown in Table II.
The author had to preprocess the dataset manually, checking 35,000 records one by one. The preprocessing uses only 2 methods:
a) Cleaning: Data cleaning is used to handle missing values so that the data can be processed properly, because data with missing values cannot be processed [19]. Missing values can be deleted or replaced with a value of 0. In this tuna catch data, almost 1,000 records had missing values, and handling them took a long time because the preprocessing was done manually.
b) Reduction: In this step the author selects the data and focuses on simplifying, abstracting, and transforming the raw data obtained [20]. This technique is used to drop attributes that are not used in this study. For the fish type attribute, for example, the author uses only 3 types of fish, namely Yellowfin, Bigeye Tuna, and Albacore, and the others are removed to make the data easier to process. In addition to the type of fish, the author also needs the longitude and latitude attributes as variables for determining the tuna catch area.
Table II shows the resulting data that meet the needs of the study. The pkID attribute is the ID of the data row. The date attribute determines whether a record belongs to the west monsoon or the east monsoon, and only the last 3 years of data were used in this study.
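The cleaning and reduction steps above were carried out manually in this study. As an illustration only, the same logic could be scripted roughly as follows. The sketch assumes each record is a dictionary. Apart from pkID and date, the field names (species, longitude, latitude, vessel) are hypothetical, rows with missing values are deleted rather than replaced with 0, and the last 3 years are taken to be 1989-1991.

```python
from datetime import date

# Species kept in the study; the exact spellings in the dataset are assumed.
KEEP_SPECIES = {"Yellowfin", "Bigeye Tuna", "Albacore"}
# Attributes kept after reduction; names other than pkID and date are assumed.
KEEP_FIELDS = ("pkID", "date", "species", "longitude", "latitude")


def preprocess(rows, first_year=1989, last_year=1991):
    """Clean and reduce raw catch records into the attributes needed for clustering."""
    cleaned = []
    for row in rows:
        # Cleaning: delete rows in which any required attribute is missing.
        if any(row.get(field) in (None, "") for field in KEEP_FIELDS):
            continue
        # Reduction: keep only the three tuna species used in the study.
        if row["species"] not in KEEP_SPECIES:
            continue
        # Reduction: keep only the last three years of data.
        if not first_year <= row["date"].year <= last_year:
            continue
        # Reduction: drop every attribute that is not needed.
        cleaned.append({field: row[field] for field in KEEP_FIELDS})
    return cleaned


# Hypothetical raw records.
rows = [
    {"pkID": 1, "date": date(1990, 1, 15), "species": "Bigeye Tuna",
     "longitude": 112.5, "latitude": -11.0, "vessel": "A"},
    {"pkID": 2, "date": date(1990, 7, 3), "species": "Yellowfin",
     "longitude": None, "latitude": -9.5, "vessel": "B"},   # missing longitude
    {"pkID": 3, "date": date(1985, 7, 3), "species": "Albacore",
     "longitude": 115.0, "latitude": -12.0, "vessel": "C"},  # outside 1989-1991
    {"pkID": 4, "date": date(1991, 8, 20), "species": "Skipjack",
     "longitude": 118.0, "latitude": -10.0, "vessel": "D"},  # species not used
    {"pkID": 5, "date": date(1991, 8, 21), "species": "Yellowfin",
     "longitude": 119.0, "latitude": -10.5, "vessel": "E"},
]
result = preprocess(rows)
print([r["pkID"] for r in result])
```

Only the records with pkID 1 and 5 survive in this example, and the unused vessel attribute is dropped from both.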

C. Using Model (DBSCAN)
Data that have been checked and preprocessed enter the execution stage, which produces the results of the DBSCAN algorithm. As shown in Fig. 2, the modeling uses RapidMiner to process the collected data. This section explains the operators used in the RapidMiner model with the DBSCAN algorithm.
a) Read Excel operator: This operator imports the dataset file from Excel into RapidMiner so that it can be processed. In this operator we can also specify which attributes will be used during execution.
b) Clustering (DBSCAN) operator: This operator performs clustering with DBSCAN. DBSCAN (density-based spatial clustering of applications with noise) is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. The operator asks for the MinPts and Eps values, which the author sets to Eps = 1.5 and MinPts = 5. These values determine how the clusters of tuna catches are formed.
c) Filter Examples operator: This operator selects which Examples of an ExampleSet are kept and which are removed. The author uses it to clean up the noise in the clustering result by removing cluster_0 and cluster_1, so that they do not cover the other clusters that have been formed.
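The filtering step in item c) amounts to dropping every example whose cluster label is in a chosen set. The sketch below mirrors that idea outside RapidMiner. The label names follow the text above, while the record layout and the function name are hypothetical.

```python
def filter_examples(examples, removed_clusters):
    """Keep only the examples whose cluster label is not in removed_clusters."""
    return [ex for ex in examples if ex["cluster"] not in removed_clusters]


# Hypothetical clustered examples.
examples = [
    {"pkID": 1, "cluster": "cluster_0"},
    {"pkID": 2, "cluster": "cluster_2"},
    {"pkID": 3, "cluster": "cluster_1"},
    {"pkID": 4, "cluster": "cluster_3"},
    {"pkID": 5, "cluster": "cluster_2"},
]
kept = filter_examples(examples, {"cluster_0", "cluster_1"})
print([ex["pkID"] for ex in kept])
```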

D. Cluster Analysis
This is the final step before reaching the results. At this stage the author presents the clustering results for several types of fish (Yellowfin, Albacore, and Bigeye Tuna), associated with the west monsoon (October-April) and the east monsoon (April-October) of 1989-1991.

... cluster was also dominated by Bigeye tuna, but the difference was that there were 3 clusters dominated by Yellowfin tuna and 1 cluster dominated by Albacore tuna. This indicates that between the 2 seasons of this year the movement of fish was not very significant and changed only a little.

This indicates that the change of season in that year did not have much impact on the cluster results, but the catch data produced are very different, considering that the West Monsoon data are not as large as the East Monsoon data.
The two seasons in 1991 differed greatly, as seen in Fig. 5 and Fig. 8. In Fig. 3 to Fig. 5 (West Monsoon) and Fig. 6 to Fig. 8 (East Monsoon), the red circles drawn by the author mark fish points that have not changed even though the year and the season have changed. Almost every chart has a red circle, which indicates a fish point that has not changed.
Only Fig. 4 has no persistent fish point, because the number of clusters or the amount of fish caught in that season and year was small. This indicates that some tuna have a habitat at a fixed longitude and latitude.
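The season split used in this analysis can be sketched as a small helper that labels a catch date as west or east monsoon. The text gives the seasons as October-April and April-October, so the two boundary months overlap. This sketch assumes, purely for illustration, that October-March counts as west monsoon and April-September as east monsoon. The function name is hypothetical.

```python
from datetime import date


def monsoon_season(catch_date):
    """Label a date as 'west' or 'east' monsoon.

    Assumed split: April-September is east monsoon, October-March is west
    monsoon. The assignment of April and October is an assumption of this sketch.
    """
    return "east" if 4 <= catch_date.month <= 9 else "west"


# Hypothetical catch dates.
for d in (date(1990, 1, 15), date(1990, 7, 1), date(1990, 11, 30)):
    print(d.isoformat(), monsoon_season(d))
```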

IV. RESULT AND DISCUSSION
This is the final stage, in which the author discusses the Eps and MinPts values that give the best clusters and the reason for using Eps = 1.5 and MinPts = 5. These values are the most optimal because they form several clusters with several dense points, as shown in Fig. 3 to Fig. 8. The author also shows results for some Eps and MinPts values other than those specified.

Fig. 9 is the result of DBSCAN's spatial cluster analysis using several Eps and MinPts values. The comparison changes only the Eps value, because Eps is what most influences the formation of clusters: it is the radius around a point within which a cluster is formed [21]. Fig. 10 does show several clusters, but there is still too much noise and the clusters are not dense, so an Eps value of 2.0 is not optimal. Therefore, Eps = 1.5 is the optimal value for forming dense clusters without causing much noise.

After several attempts to create groups of spatial areas of tuna fishing with different MinPts and Eps values, the optimal values for forming spatial clusters of tuna catches were obtained. As seen in Table IV and Table V, tuna continue to move as the seasons and years progress. In Table IV (West Monsoon), the spatial clusters are mostly dominated by Bigeye tuna, while in Table V (East Monsoon), the spatial clusters are mostly dominated by Yellowfin tuna. Thus, these waters are dominated by Bigeye tuna in the West Monsoon and by Yellowfin tuna in the East Monsoon. This suggests that tuna is a type of fish that moves according to the season and the direction of the wind.
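To see how the Eps value affects density and noise, one can count core points and noise points for several Eps values while holding MinPts fixed. The sketch below does only that counting. It does not reproduce Fig. 9 or Fig. 10, and the coordinates are hypothetical. It assumes Euclidean distance, treats a point as core when at least MinPts points (including itself) lie within Eps, and treats a point as noise when it is neither core nor within Eps of a core point.

```python
from math import hypot


def core_and_noise_counts(points, eps, min_pts):
    """Return (number of core points, number of noise points) for one parameter setting."""
    n = len(points)
    # Neighborhood of each point: indices within eps, including the point itself.
    neighbors = [
        [j for j in range(n)
         if hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    # Noise: not a core point and not a border point of any core point.
    noise = [i for i in range(n)
             if i not in core and not any(j in core for j in neighbors[i])]
    return len(core), len(noise)


# Hypothetical catch positions: two dense groups and one isolated point.
points = [
    (110.0, -10.0), (110.5, -10.0), (110.0, -10.5), (110.5, -10.5), (110.2, -10.2),
    (120.0, -15.0), (120.5, -15.0), (120.0, -15.5), (120.5, -15.5), (120.2, -15.2),
    (130.0, -5.0),
]
for eps in (0.3, 1.5):
    n_core, n_noise = core_and_noise_counts(points, eps, min_pts=5)
    print(f"Eps={eps}: core points={n_core}, noise points={n_noise}")
```

On this toy data, a very small Eps leaves every point as noise, while Eps = 1.5 turns both dense groups into core points and leaves only the isolated point as noise.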

V. CONCLUSION
Using the DBSCAN algorithm, spatial clustering of fishing grounds can provide good results in creating spatial groupings of tuna fishing areas. The Eps and MinPts values must be adjusted to give optimal cluster formation based on the pattern of the fish locations themselves. The optimal Eps and MinPts values are 1.5 and 5, as in Fig. 3 to Fig. 8. The experiment was carried out several times to obtain these optimal values, as in Table III. DBSCAN forms spatial clusters when the optimal Eps and MinPts values are used. The density of points also affects the results, as in Fig. 9 and Fig. 10, which do not form clusters because the Eps and MinPts values are not optimal. As the MinPts value increases, the clusters formed become smaller, while as the Eps value increases, fewer clusters are formed.
As we can see in Table IV and Table V, the west monsoon season is dominated by Bigeye tuna, while the east monsoon season is dominated by Yellowfin tuna. This finding is obtained by analyzing the clusters that have been formed.
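One way to read species dominance from a set of clustered records is to count the species within each cluster and take the most frequent one. The sketch below assumes that dominance is measured by the number of records rather than by catch weight. The field names and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_species(records):
    """Map each cluster label to the species with the most records in that cluster."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["cluster"]][rec["species"]] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


# Hypothetical clustered catch records for one season.
records = [
    {"cluster": "cluster_2", "species": "Bigeye Tuna"},
    {"cluster": "cluster_2", "species": "Bigeye Tuna"},
    {"cluster": "cluster_2", "species": "Yellowfin"},
    {"cluster": "cluster_3", "species": "Yellowfin"},
    {"cluster": "cluster_3", "species": "Yellowfin"},
    {"cluster": "cluster_3", "species": "Albacore"},
]
dominance = dominant_species(records)
print(dominance)
```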