TY - JOUR
T1 - Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions
AU - Otchere, Daniel Asante
AU - Ganat, Tarek Omar Arbi
AU - Ojero, Jude Oghenerurie
AU - Tackie-Otoo, Bennet Nii
AU - Taki, Mohamed Yassir
N1 - Funding Information:
The authors express their sincere appreciation to Universiti Teknologi PETRONAS and the Centre of Research in Enhanced Oil Recovery for financially supporting this work through YUTP grant (015LCO-105).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Feature selection, a critical data preprocessing step in machine learning, is an effective way of removing irrelevant variables and thus reducing the dimensionality of the input features. Removing uninformative or, even worse, misinformative input columns helps train a machine learning model on more generalised data, yielding better performance on new and unseen data. In this paper, eight feature selection techniques paired with a gradient boosting regressor model were evaluated through a statistical comparison of their prediction errors and computational efficiency in characterising a shallow marine reservoir. Analysis of the results shows that the best techniques for selecting relevant logs for permeability, porosity and water saturation prediction were the Random Forest, SelectKBest and Lasso regularisation methods, respectively. These techniques not only reduced the features of the high-dimensional dataset but also achieved low prediction errors, based on MAE and RMSE, and improved computational efficiency. This indicates that Random Forest, SelectKBest and Lasso regularisation can identify the best input features for permeability, porosity and water saturation predictions, respectively.
AB - Feature selection, a critical data preprocessing step in machine learning, is an effective way of removing irrelevant variables and thus reducing the dimensionality of the input features. Removing uninformative or, even worse, misinformative input columns helps train a machine learning model on more generalised data, yielding better performance on new and unseen data. In this paper, eight feature selection techniques paired with a gradient boosting regressor model were evaluated through a statistical comparison of their prediction errors and computational efficiency in characterising a shallow marine reservoir. Analysis of the results shows that the best techniques for selecting relevant logs for permeability, porosity and water saturation prediction were the Random Forest, SelectKBest and Lasso regularisation methods, respectively. These techniques not only reduced the features of the high-dimensional dataset but also achieved low prediction errors, based on MAE and RMSE, and improved computational efficiency. This indicates that Random Forest, SelectKBest and Lasso regularisation can identify the best input features for permeability, porosity and water saturation predictions, respectively.
KW - Decision tree algorithm
KW - Dimensionality reduction techniques
KW - Ensemble machine learning
KW - Feature selection techniques
KW - Reservoir characterisation
UR - http://www.scopus.com/inward/record.url?scp=85113218886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113218886&partnerID=8YFLogxK
U2 - 10.1016/j.petrol.2021.109244
DO - 10.1016/j.petrol.2021.109244
M3 - Article
AN - SCOPUS:85113218886
SN - 0920-4105
VL - 208
JO - Journal of Petroleum Science and Engineering
JF - Journal of Petroleum Science and Engineering
M1 - 109244
ER -