Viet-Ha Nhu, Pijush Samui, Deepak Kumar, Anshuman Singh, Nhat-Duc Hoang, Dieu Tien Bui
Soil is heterogeneous: its varied chemical and physical attributes make the prediction of soil parameters tedious and challenging, and the difficulty grows with the number of variables. This study investigates the feasibility of principal component analysis (PCA) as a dimensionality reduction technique that expresses the input variables as principal components (PCs), which reduces model complexity and mitigates multicollinearity. Twelve soil attributes, namely sample depth, sand percentage, silt percentage, clay percentage, moisture content, dry density, wet density, void ratio, liquid limit, plastic limit, liquidity index, and plasticity index, were employed as influencing factors to estimate the coefficient of compression of soil. The PCs, selected according to the variance they explain, were then used as predictors to build minimax probability machine regression (MPMR), multivariate adaptive regression splines (MARS), and genetic programming regression (GPR) models. The predictive accuracy of the models was assessed via five statistical fitness parameters. In the training phase, the PCA-MARS model showed good results (RMSE = 0.004, r = 0.981, and NSE = 0.963). In the testing phase, PCA-MARS performed best (RMSE = 0.006, r = 0.963, and NSE = 0.912), followed by PCA-GPR and PCA-MPMR. These findings indicate that the PCA-based MARS model can serve as a new and reliable data-driven approach for estimating soil parameters, and that it can help save the time and cost spent on estimating them.
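The three fitness measures reported above can be illustrated with a minimal sketch. It assumes that RMSE is the root mean square error, r the Pearson correlation coefficient, and NSE the Nash-Sutcliffe efficiency; the abstract does not expand these abbreviations, so those readings are assumptions. The function names and sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus residual sum of squares over
    the total sum of squares of the observations about their mean."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical compression-coefficient values, for illustration only.
    observed = [0.021, 0.034, 0.047, 0.052, 0.068]
    predicted = [0.024, 0.031, 0.049, 0.050, 0.065]
    print("RMSE:", rmse(observed, predicted))
    print("r:   ", pearson_r(observed, predicted))
    print("NSE: ", nse(observed, predicted))
```

Under these definitions a perfect model gives RMSE = 0 and r = NSE = 1, while a model that always predicts the observed mean gives NSE = 0, which is why values of r and NSE close to 1 alongside a small RMSE are read as a good fit.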