Early Cost Estimation in Customized Furniture Manufacturing Using Machine Learning

Accurate cost estimation at the early stage of a construction project is a key factor in the success of most projects. In customized furniture manufacturing, many difficulties arise when the cost must be estimated during the early design stage, yet it is important to estimate the product cost as early in the manufacturing process as possible. Cost estimation is the prediction of the cost, which commonly includes calculating the costs of materials, labor, sales, overhead, and other items. Historical data on previously manufactured products can be used to estimate the cost of new products. In this paper, we propose an early cost estimation approach based on machine learning techniques. An experimental investigation based on real customized furniture manufacturing data is performed, the results are presented, and insights are given.


I. INTRODUCTION
Customization of products has been a challenging trend in recent years. Competitive pressure, complex customer requirements, and the expectations of interested parties impose additional requirements on manufacturers and products. This has particularly affected the manufacturing of customized furniture. Companies therefore need decision support tools to respond immediately to individual orders and to assess production procedures, costs, and timing properly. Estimating costs as precisely and promptly as possible has become critical in customized manufacturing [1].
More and more customers want to stand out and order exclusive furniture. Customization changes pricing policy: as a specific feature of customized furniture manufacturing, it significantly influences the furniture cost. Among the factors that determine the success of manufacturing, cost is evidently as essential as quality and functionality. The cost depends on furniture complexity (the furniture consists of many components, some of which are complex), production volume (a large volume decreases the cost of one item), design overhead (a new design needs additional designer time, whereas an existing design can simply be customized), and production peculiarities (much manual work, use of expensive machine tools). A further issue arises when the cost must be estimated before manufacturing, while information about the actual costs is most limited; this is called early cost estimation. The accuracy of this estimate can strongly affect company profits: a price that is too low reduces profits, whereas a price that is too high can deter customers.
Manuscript received July 18, 2019; revised August 14, 2020. This work was supported by the European Regional Development Fund according to the supported activity 'Research Projects Implemented by World-class Researcher Groups' under Measure No. 01.2.2-LMT-K-718. All authors are with the Mykolas Romeris University, Ateities str. 20, LT-08303 Vilnius, Lithuania (e-mail: olga.kurasova@gmail.com).
Different approaches have been developed for early cost estimation; however, they are lacking in the furniture industry. Cost estimation is defined as the process of predicting the cost of a product before all stages of product development have been completed. As a rule, the accuracy of a cost estimate reflects the information currently available. Furniture manufacturing has a great variety of products and prices, from small items to large, expensive furniture sets. Cost estimation of a product in the early design stage helps to accelerate product entry to the market, reduce costs, and improve quality, ensuring a high level of competitiveness in the market [2]. It is very important for companies to react properly and promptly to the growing complexity of customized manufacturing in order to achieve maximum precision in their pricing. Thus, machine learning-based cost estimation approaches should be adapted to the furniture production domain. Machine learning is widely used in various fields of production [3]-[7] and can be an effective and accurate technique for estimating the cost of customized furniture. The objective of this research is to propose a machine learning-based approach to estimate the cost of customized furniture in the early design phase.
The rest of the paper is organized as follows. A comprehensive review of related works is performed and presented in Section II. Section III is devoted to machine learning-based cost estimation. A use case analysis is presented in Section IV. Finally, Section V concludes the research.

II. COST ESTIMATION TECHNIQUES
related to a set of activities and materials before they have been executed [9], [10]. First, manufacturing enterprises need cost estimates for their financial plans and strategies. Second, cost estimation is especially important in customized manufacturing, where customers want to know the product price in advance. This preliminary price should be as close to the final price as possible. To calculate the price, the cost drivers should be identified with respect to customer requirements. The process usually takes a lot of time and requires many resources, so the time needed for cost estimation should be reduced in order to attract customers.
A simple way to calculate the furniture price is $P = C + R$, where $P$ is the price of the product, $C$ is the total cost of production, and $R$ is the profit. The cost consists of direct costs and overhead costs. Direct costs include direct material costs, direct labor costs, and other direct costs. Production costs, administrative costs, and disposal costs belong to overhead costs.
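As a worked illustration of the simple pricing rule above (price = total production cost + profit, where the total cost = direct costs + overhead costs), the following Python sketch computes a price from hypothetical cost components. The function name `furniture_price` and all amounts are illustrative assumptions, not data from the study.

```python
def furniture_price(direct_material, direct_labor, other_direct, overhead, profit):
    """Price = total cost + profit, where total cost = direct costs + overhead.

    All arguments are hypothetical monetary amounts in the same currency.
    """
    direct_cost = direct_material + direct_labor + other_direct
    total_cost = direct_cost + overhead  # total cost of production
    return total_cost + profit           # price of the product


# Hypothetical order: 300 material + 150 labor + 20 other direct, 80 overhead, 110 profit
price = furniture_price(300.0, 150.0, 20.0, 80.0, 110.0)
print(price)  # 660.0
```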
Several important elements are related to cost estimation [11]. First, the cost estimating technique requires extensive historical data, including not only the features that influence the cost but also the costs of previous products. Second, it is necessary to identify the human and financial resources: how many people (cost estimators) are needed, what competence they require to estimate the cost, and what financial support is needed. It is also essential to know both company and customer expectations, the outcome, and the intended usage of the estimate. Finally, the schedule is an important element: how much time is needed to collect the required data and to complete the estimate, given the available resources and data.
In the literature, cost estimation techniques are divided into qualitative and quantitative ones [8] (Fig. 1). Qualitative cost estimation techniques are based on comparing a new product with previously manufactured products to identify their similarities. The similarities make it possible to reuse past data for the new product, so there is no need to estimate the cost from scratch. Thus, together with the cost estimator's experience, previous design and manufacturing data provide useful information for calculating a reliable cost estimate of a new product that is similar to a previously manufactured one. Qualitative cost estimation techniques can be categorized into intuitive and analogical techniques. Intuitive cost estimation techniques are grounded in the cost estimator's experience [12], and this knowledge is used to generate cost estimates. The expert knowledge can be stored in the form of rules, decision trees, and judgments in a dedicated database to facilitate decision-making and the preparation of cost estimates for new products. When analogical techniques (sometimes called top-down) are used, the cost of a new product is estimated from the costs of similar known products. Here, similarity is commonly evaluated without any quantitative measure.
Quantitative cost estimation techniques are based on a detailed analysis of the product design, its features (called cost drivers), and the corresponding manufacturing processes, instead of simply relying on past data or the knowledge of a cost estimator. Although these techniques are known to provide more accurate results, their use is normally restricted to the final design phases because they require a detailed product design. Quantitative techniques can be further categorized into parametric and analytical techniques [13].
In parametric techniques (sometimes called statistical), historical data and empirical examinations are evaluated to gain information on the causal link between product features and costs [13]. In other words, parametric cost estimates are obtained from statistical relationships between historical costs and other product characteristics. Regression analysis is widely used for cost estimation as a well-defined mathematical approach [14], [15]. Moreover, it makes it possible to explain the significance of each variable and the relationships among variables. Artificial neural networks are also suitable for cost estimation; however, they require a large amount of historical data for accurate training.
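A parametric estimate can be sketched as an ordinary least-squares fit of a linear cost estimating relationship to historical data. The sketch below assumes two hypothetical cost drivers (board area in m² and number of parts) and uses synthetic, noise-free historical costs so that the fitted coefficients can be checked; it is one simple instance of regression analysis, not the specific model used in the cited works.

```python
import numpy as np

# Hypothetical historical products: [board area in m2, number of parts]
X_hist = np.array([
    [1.0, 10],
    [2.5, 24],
    [0.8, 6],
    [4.0, 40],
    [3.2, 18],
], dtype=float)
# Synthetic historical costs generated as 50 + 120 * area + 3 * parts (noise-free)
y_hist = 50 + 120 * X_hist[:, 0] + 3 * X_hist[:, 1]

# Add an intercept column and fit the linear relationship by least squares
A = np.column_stack([np.ones(len(X_hist)), X_hist])
coef, *_ = np.linalg.lstsq(A, y_hist, rcond=None)

# Estimate the cost of a new product with 2.0 m2 of board and 15 parts
new_product = np.array([1.0, 2.0, 15.0])  # leading 1.0 multiplies the intercept
estimate = float(new_product @ coef)
print(round(estimate, 2))  # 335.0
```

With real data the costs would be noisy, so the fitted coefficients would only approximate the underlying relationship, and their significance could be assessed as mentioned above.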
Analytical cost techniques (sometimes called bottom-up estimating and engineering build-up) provide a detailed decomposition of costs, and the total cost is usually computed as the sum of parts such as material costs and labor and machine costs multiplied by the time required to perform the manufacturing tasks. Analytical approaches depict the relevant processes of product creation in detail, derive the costs incurred, and aggregate them properly [13]. The analytical approach requires decomposing a product into elementary units, operations, and activities that represent the different resources consumed during the production cycle and expressing the cost as the sum of all these components [8].
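The bottom-up logic can be illustrated with a short sketch in which the total cost is the material cost plus, for each manufacturing operation, the labor and machine rates multiplied by the operation time. The `Operation` structure, the operation names, and all rates are hypothetical; a real analytical estimate would decompose the product much more finely.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    hours: float         # time required to perform the task
    labor_rate: float    # labor cost per hour
    machine_rate: float  # machine cost per hour (0 for purely manual work)


def bottom_up_cost(material_cost, operations):
    """Total cost = material cost + sum of (labor rate + machine rate) * time."""
    processing = sum((op.labor_rate + op.machine_rate) * op.hours for op in operations)
    return material_cost + processing


ops = [
    Operation("cutting", hours=1.5, labor_rate=20.0, machine_rate=10.0),
    Operation("edge banding", hours=0.5, labor_rate=20.0, machine_rate=8.0),
    Operation("assembly", hours=2.0, labor_rate=18.0, machine_rate=0.0),
]
print(bottom_up_cost(120.0, ops))  # 215.0
```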
Moreover, some analogical techniques can be classed as quantitative when a numerical similarity measure is used to identify the analogous product [16]. Analogy estimates are based on comparing the new product with similar ones and can be applied when the new product is similar to a number of products manufactured earlier. In this case, the total cost of the product is determined on the basis of accumulated experience. Cost estimation by analogy reasons from functional or geometrical similarity to a similar product, using similarity measures that describe the level of correspondence of the relevant characteristics [13].
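A quantitative analogy estimate can be sketched as a nearest-neighbor lookup: the most similar past product, under a numerical similarity measure, supplies the estimate. Here the measure is assumed to be the Euclidean distance between standardized feature vectors, and the features, products, and costs are hypothetical.

```python
import numpy as np

# Hypothetical past products: [width m, height m, depth m, number of parts]
past_features = np.array([
    [0.8, 2.0, 0.6, 30],
    [1.2, 0.9, 0.5, 12],
    [0.4, 0.4, 0.4, 5],
], dtype=float)
past_costs = np.array([950.0, 420.0, 110.0])


def analogy_estimate(new_item, features, costs):
    """Return the cost of the most similar past product and its index.

    Similarity is measured as the Euclidean distance between standardized
    feature vectors (an assumed choice of numerical similarity measure).
    """
    mean, std = features.mean(axis=0), features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z_past = (features - mean) / std
    z_new = (new_item - mean) / std
    distances = np.linalg.norm(z_past - z_new, axis=1)
    best = int(np.argmin(distances))
    return costs[best], best


cost, idx = analogy_estimate(np.array([1.1, 0.8, 0.5, 14.0]), past_features, past_costs)
print(idx, cost)  # 1 420.0
```

Standardization keeps features measured in different units (meters versus part counts) from dominating the distance simply because of their scale.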

Fig. 1. Cost estimation techniques: qualitative and quantitative.

III. MACHINE LEARNING-BASED COST ESTIMATION
Customized furniture manufacturing companies can be categorized into two groups. Companies in the first group are responsible for the design, so their process covers everything from design creation to final manufacturing. In these cases, the final cost depends on the complexity of the design, and the manufacturer and the customer can negotiate the price by simplifying or elaborating the design.
Companies of the other type receive orders with a design sketch (quotation), and the cost of the furniture must be estimated. Estimating the cost from sketch analysis would be accurate because sketches reveal the complexity of items, which influences manufacturing cost. However, applying machine learning techniques to sketches raises a problem with historical data: an image recognition problem has to be solved, and the training process requires many data (images) to obtain accurate results. In particular, many similar sketches are needed to train image recognition algorithms accurately, which is itself a further problem in customized furniture manufacturing.
We propose to solve the cost estimation problem by analyzing numerical manufacturing data. Different manufacturing companies use different tools for data storage as well as different methodologies for cost estimation. It would be appropriate to store the data in an Enterprise Resource Planning (ERP) system or another system; however, it is typical, especially for small companies, that ERP systems are not used or their functionality is not fully exploited. Thus, data extraction is also a relevant problem in this domain. Moreover, if the data volume is large, analyzing the data with machine learning techniques becomes a challenge [17]. It is important to compose a proper data set for cost estimation in customized furniture manufacturing. The following categories can be identified: 1) item measurements (length, height, width, weight, volume of the bounding box); 2) material data (the materials used for production, material cost); 3) operational data (operation list and the time required to complete each operation); 4) labor data (much manual work, use of expensive machine tools); 5) production time (the customer's requirement for production time); 6) batch size (producing more identical products reduces the cost of one product); 7) manufacturing complexity (a qualitative parameter indicating the uniqueness of the item and the complexity of the work).
Cost estimation is a time-consuming problem for many furniture manufacturing companies, especially for companies that produce customized items. When a quotation for a product is received, it must first be determined whether the product is new for the company. If the product has been manufactured before, its cost is known. However, if the ordered product is new, it is necessary to estimate the cost components, such as material quantities and cost, operation and labor cost, and packing cost. After that, overhead is added, and the quotation answer is ready to be sent to the customer. This process is depicted in Fig. 2(a).
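The manual quotation workflow of Fig. 2(a) can be summarized in a few lines of Python. This is a hypothetical sketch: the function `quote_cost`, the dictionary of previously manufactured products, and the assumption that overhead is a fixed percentage of the summed components are illustrative choices, not details taken from the company's process.

```python
from typing import Dict, Optional


def quote_cost(product_id: str,
               history: Dict[str, float],
               material_cost: Optional[float] = None,
               operation_labor_cost: Optional[float] = None,
               packing_cost: Optional[float] = None,
               overhead_rate: float = 0.15) -> float:
    """Hypothetical sketch of the manual quotation workflow.

    A previously manufactured product reuses its known cost; for a new product
    the estimated components are summed and overhead (assumed to be a fixed
    rate of that sum) is added.
    """
    if product_id in history:
        return history[product_id]
    if None in (material_cost, operation_labor_cost, packing_cost):
        raise ValueError("a new product needs all cost components to be estimated")
    components = material_cost + operation_labor_cost + packing_cost
    return components * (1.0 + overhead_rate)


history = {"wardrobe-A12": 1250.0}  # hypothetical known product and cost
print(quote_cost("wardrobe-A12", history))  # 1250.0
print(quote_cost("desk-new", history, 200.0, 120.0, 30.0, overhead_rate=0.25))  # 437.5
```

In the proposed approach, the operation, labor, and packing components of the new-product branch are the parts that a machine learning model would estimate instead of a person.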
The use of machine learning can significantly shorten this process. We propose to replace the estimation of operation, labor, and packing costs with machine learning-based estimation, so that the final cost is estimated only from material data (see Fig. 2(b)).
Various prediction approaches can be used to solve the cost estimation problem: 1) Linear regression. It is the simplest regression model, in which the relationship between a dependent variable and one or more independent variables is linear. Its advantage is that the results are easy to interpret; thus, linear regression is widely used even though more accurate prediction approaches exist. 2) Decision tree-based regression. Many methods build a regression model in the form of a tree structure. The following methods are currently the most popular and give the most accurate prediction results: Decision tree, Random forest, Extra tree, AdaBoost, and Gradient boosting. 3) K-neighbors regression. It is a non-parametric regression method in which the input consists of the k closest training examples in the feature space and the output, the predicted value, is obtained by averaging the values of these k nearest neighbors. Because of their simplicity, k-neighbors-based methods are often used for big data analysis. 4) Artificial neural networks (ANNs). ANNs are a rapidly developing group of artificial intelligence techniques that are trained by machine learning paradigms and can handle various complex data analysis problems, including prediction. In most cases, they outperform other methods; however, a large amount of historical training data is required to achieve the most accurate results. All the mentioned methods have their advantages and disadvantages; therefore, it makes sense to apply all of them and choose the best results as the final ones.
Moreover, all these algorithms are implemented in popular machine learning libraries (e.g., scikit-learn, which builds on NumPy), so their use does not cause any technical difficulties.
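To make the k-neighbors regression described above concrete, the following NumPy sketch predicts a cost by averaging the target values of the k nearest training items under the Euclidean distance. The data are hypothetical; with its default uniform weights, scikit-learn's `KNeighborsRegressor` computes the same kind of average (tie-breaking between equidistant neighbors may differ).

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=3):
    """k-neighbors regression: predict the mean target of the k closest training rows.

    Distances are Euclidean; ties are broken by the order returned by argsort.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Distances between every query row and every training row
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Hypothetical one-feature example: cost grows with board area (m2)
X = [[1.0], [2.0], [3.0], [10.0]]
y = [100.0, 200.0, 300.0, 1000.0]
print(knn_regress(X, y, [[2.1]], k=3))  # [200.]
```

Because the prediction is an average of observed costs, k-neighbors regression cannot extrapolate beyond the range of historical costs, which is one of the trade-offs that motivates comparing several methods.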

IV. USE CASE ANALYSIS
The real historical data for 1026 products provided by a Lithuanian furniture manufacturing company are used in the use case analysis and experimental investigation. The data, gathered over the last five years, include the real costs of these products. The products include items of various sizes and complexity (from small pieces of furniture to large furniture kits). The data set consists of the values of the following product features (Table I): 1) $y$ is the cost (predicted/dependent variable); 2) $x_1, \dots, x_m$ are operational times (explanatory/independent variables); 3) $x_{m+1}, \dots, x_n$ are quantitative material features (explanatory variables).
Solving data analysis problems commonly requires identifying the most important features. Several feature selection approaches can be employed, e.g., principal component analysis (PCA) [18], univariate feature selection methods (e.g., SelectKBest), and others. Applying SelectKBest, the five most important features (from Table I) have been identified: 'm2', 'qty_parts', 'm', 'qty_unique_parts', and 'qty'. All of these features belong to the set $x_{m+1}, \dots, x_n$, i.e., they are material features.
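A univariate selection of the five most informative features can be sketched with scikit-learn's `SelectKBest`. The scoring function (`f_regression`, a univariate linear-regression F-test) is an assumption, since the section does not state which one was used, and the data are synthetic: the feature names mirror those reported above, but the values and weights are invented so that the first five features drive the cost and the last three are noise.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n_items = 500
# Synthetic stand-in for Table I: 8 hypothetical features
feature_names = ["m2", "qty_parts", "m", "qty_unique_parts", "qty",
                 "noise_a", "noise_b", "noise_c"]
X = rng.uniform(0.0, 10.0, size=(n_items, len(feature_names)))
weights = np.array([40.0, 20.0, 25.0, 20.0, 30.0, 0.0, 0.0, 0.0])
y = X @ weights + rng.normal(0.0, 5.0, size=n_items)

# Score each feature with an F-test against the cost and keep the top 5
selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)
selected = [name for name, keep in zip(feature_names, selector.get_support()) if keep]
print(selected)
```

Univariate scores rate each feature in isolation, so two strongly correlated features (such as area and length) may both be kept even if one adds little information beyond the other.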
In practice, the cost is estimated by analogy. Usually, this is done intuitively: the person responsible for costing decides which products are similar. However, when machine learning techniques are applied, products must be compared using mathematical similarity metrics. Several metrics can be used: the Euclidean, standardized Euclidean, Cityblock (Manhattan), and Hamming distances.
The exploration process is as follows: 1) similarities are computed by various metrics between all pairs of items, with the features $x_1, \dots, x_n$ as inputs; 2) similarities are also computed by the same metrics between all pairs of items, with the cost $y$ as input; 3) correlation coefficients are computed between the similarities obtained at Steps 1 and 2. High correlation coefficients indicate that items that are similar according to the similarity metric are also similar in cost. The experimental investigation shows that the strongest agreement between feature similarity and cost similarity is obtained with Minkowski-type distances (Cityblock, Euclidean, and standardized Euclidean).
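Steps 1-3 can be sketched with SciPy's `pdist`, which returns the distances (dissimilarities) between all pairs of items, and NumPy's `corrcoef`; Pearson correlation is an assumption, since the type of correlation coefficient is not specified above. The data are synthetic. The Hamming distance is omitted because, for continuous features, almost every pair differs in every coordinate, which makes the distances nearly constant and the correlation uninformative.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Synthetic stand-in data: 50 items, 4 numeric features, cost roughly linear in them
X = rng.uniform(0.0, 5.0, size=(50, 4))
y = X @ np.array([100.0, 60.0, 30.0, 10.0]) + rng.normal(0.0, 20.0, size=50)

# Step 2: pairwise distances between items in terms of cost only
d_cost = pdist(y.reshape(-1, 1), metric="euclidean")

correlations = {}
for metric in ["euclidean", "seuclidean", "cityblock"]:
    # Step 1: pairwise distances between items in terms of their features
    d_feat = pdist(X, metric=metric)
    # Step 3: Pearson correlation between the two condensed distance vectors
    correlations[metric] = np.corrcoef(d_feat, d_cost)[0, 1]
    print(f"{metric:>10s}: r = {correlations[metric]:.3f}")
```

Both distance vectors list the item pairs in the same condensed order, which is what makes an element-wise correlation between them meaningful.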
To solve the cost estimation problem, the relationship between the features and the cost $y$ needs to be identified, so cost estimation can be formulated as a prediction problem. Statistical methods express this relationship through mathematical functions, whereas other machine learning techniques build more sophisticated models from historical data. Various prediction algorithms are used in this investigation. During training, 10-fold cross-validation is applied. The average values (avg) and standard deviations (std) of the coefficient of determination $R^2$ and the root mean square error (RMSE) are presented in Table II and Fig. 3.
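The evaluation protocol (10-fold cross-validation, reporting the average and standard deviation of $R^2$ and RMSE) can be sketched with scikit-learn's `cross_validate`. The data are synthetic, and the three models are a subset of the approaches listed earlier with default or arbitrary settings, so the printed numbers are not comparable to Table II.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(2)
# Synthetic stand-in for the historical data set (features X, cost y)
X = rng.uniform(0.0, 10.0, size=(300, 5))
y = 50 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + rng.normal(0.0, 10.0, size=300)

models = {
    "Linear regression": LinearRegression(),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingRegressor(random_state=0),
}
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold cross-validation
scoring = {"r2": "r2", "rmse": "neg_root_mean_squared_error"}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    r2 = scores["test_r2"]
    rmse = -scores["test_rmse"]  # scikit-learn reports RMSE as a negated score
    results[name] = (r2.mean(), r2.std(), rmse.mean(), rmse.std())
    print(f"{name:18s} R2 avg={r2.mean():.3f} std={r2.std():.3f} "
          f"RMSE avg={rmse.mean():.1f} std={rmse.std():.1f}")
```

Using the same `KFold` object for every model keeps the folds identical across algorithms, so differences in the averages reflect the models rather than the data splits.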
Two cases are analyzed: 1) all features $x_1, \dots, x_n$ are used for prediction; 2) only the five features identified as the most important by the SelectKBest method are used. The experimental investigation shows that the highest values of $R^2$ are reached by the Random forest (0.842) and Gradient boosting (0.84) algorithms, both of which are decision tree-based methods. As expected, ANNs did not show the best results because the training data set is not large. The results also show that reducing the number of features affects accuracy only slightly, although the prediction results do worsen when the number of features is reduced to five. Grid search is used to search over specified hyper-parameter values for the algorithms. Fig. 4 and Fig. 5 depict the best results obtained with all features and with only five features, respectively. The x-axis corresponds to the true cost values, and the y-axis to the predicted cost values. The results demonstrate that, in the best case, $R^2$ reaches 0.977 for the training data and 0.839 for the testing data. Moreover, lower costs are predicted more accurately than higher costs. Thus, further research could address cost estimation that takes cost groups into consideration, for which machine learning-based clustering methods can be employed.
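Hyper-parameter grid search can be sketched with scikit-learn's `GridSearchCV`, here for a Random forest with 10-fold cross-validation and $R^2$ as the selection criterion. The grid values and the synthetic data are assumptions made for illustration; the grids actually searched in the study are not reported in this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, size=(200, 5))  # synthetic stand-in features
y = 40 * X[:, 0] + 15 * X[:, 1] ** 2 + rng.normal(0.0, 10.0, size=200)

# Hypothetical grid of hyper-parameter values to search over
param_grid = {
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    scoring="r2",
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated $R^2$ of the best combination; comparing it with the $R^2$ of `search.best_estimator_` on its own training data shows the kind of gap between training and testing accuracy reported above.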

V. CONCLUSIONS
This paper has investigated the early cost estimation problem faced by customized furniture manufacturing. Cost estimation is usually a complicated and time-consuming process because many components must be evaluated in the early design stage, when information is most limited. Moreover, many human resources are required. Applying machine learning techniques makes it possible to simplify and accelerate this process by providing accurate and effective cost estimates. However, when machine learning techniques are used, it is crucial to have a proper set of historical data for the training process and to identify the essential data features to obtain accurate results.
In the experimental investigation, real historical data for 1026 products provided by a Lithuanian furniture manufacturing company are used. The study has shown that the cost of customized furniture can be estimated quite precisely when only data on the quantities of materials are used. The use of machine learning techniques can reduce the time required to estimate the cost in the early design phase and accelerate product entry to the market.
She is now at Mykolas Romeris University as a principal researcher as well as at the Institute of Data Science and Digital Technologies of Vilnius University as a principal researcher and professor. Her research interests include data mining methods, optimization theory and applications, artificial intelligence, neural networks, visualization of multidimensional data, multiple criteria decision support, parallel computing, and image processing. She is the author of more than 70 scientific publications.
Virginijus Marcinkevičius received a PhD in computer science from Vytautas Magnus University. He is a senior researcher at Mykolas Romeris University. His research interests include machine learning, artificial intelligence, cybersecurity, and natural language processing.
Viktor Medvedev received a doctoral degree (PhD) in computer science from the Institute of Mathematics and Informatics jointly with Vilnius Gediminas Technical University in 2008. His research interests include artificial intelligence, visualization of multidimensional data, dimensionality reduction, neural networks, data mining, and parallel computing.