TY - JOUR
T1 - Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions
AU - Otchere, Daniel Asante
AU - Ganat, Tarek Omar Arbi
AU - Ojero, Jude Oghenerurie
AU - Tackie-Otoo, Bennet Nii
AU - Taki, Mohamed Yassir
N1 - Funding Information:
The authors express their sincere appreciation to Universiti Teknologi PETRONAS and the Centre of Research in Enhanced Oil Recovery for financially supporting this work through YUTP grant (015LCO-105).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Feature selection, a critical data preprocessing step in machine learning, is an effective way of removing irrelevant variables and thus reducing the dimensionality of the input features. Removing uninformative or, even worse, misinformative input columns helps train a machine learning model on more generalised data, yielding better performance on new and unseen data. In this paper, eight feature selection techniques paired with a gradient boosting regressor model were evaluated through a statistical comparison of their prediction errors and computational efficiency in characterising a shallow marine reservoir. Analysis of the results shows that the best techniques for selecting relevant logs for permeability, porosity and water saturation prediction were the Random Forest, SelectKBest and Lasso regularisation methods, respectively. These techniques not only reduced the features of the high-dimensional dataset but also achieved low prediction errors, based on MAE and RMSE, and improved computational efficiency. This indicates that Random Forest, SelectKBest and Lasso regularisation can identify the best input features for permeability, porosity and water saturation predictions, respectively.
AB - Feature selection, a critical data preprocessing step in machine learning, is an effective way of removing irrelevant variables and thus reducing the dimensionality of the input features. Removing uninformative or, even worse, misinformative input columns helps train a machine learning model on more generalised data, yielding better performance on new and unseen data. In this paper, eight feature selection techniques paired with a gradient boosting regressor model were evaluated through a statistical comparison of their prediction errors and computational efficiency in characterising a shallow marine reservoir. Analysis of the results shows that the best techniques for selecting relevant logs for permeability, porosity and water saturation prediction were the Random Forest, SelectKBest and Lasso regularisation methods, respectively. These techniques not only reduced the features of the high-dimensional dataset but also achieved low prediction errors, based on MAE and RMSE, and improved computational efficiency. This indicates that Random Forest, SelectKBest and Lasso regularisation can identify the best input features for permeability, porosity and water saturation predictions, respectively.
KW - Decision tree algorithm
KW - Dimensionality reduction techniques
KW - Ensemble machine learning
KW - Feature selection techniques
KW - Reservoir characterisation
UR - http://www.scopus.com/inward/record.url?scp=85113218886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113218886&partnerID=8YFLogxK
U2 - 10.1016/j.petrol.2021.109244
DO - 10.1016/j.petrol.2021.109244
M3 - Article
AN - SCOPUS:85113218886
SN - 0920-4105
VL - 208
JO - Journal of Petroleum Science and Engineering
JF - Journal of Petroleum Science and Engineering
M1 - 109244
ER -