An LP-based hyperparameter optimization model for language modeling

Amir Hossein Akhavan Rahnama; Mehdi Toloo; Nezer Jacob Zaidenberg

doi:10.1007/s11227-018-2236-6

Corresponding author for this work: Mehdi Toloo

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are run over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find the hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.
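As a rough illustration of the grid search baseline mentioned in the abstract, the sketch below tunes a single interpolation weight of a toy bigram language model by minimizing held-out perplexity. It is not the LP model proposed in the paper. The corpus, the function names (`train_counts`, `perplexity`, `grid_search`), and the choice of an interpolated bigram model with add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def train_counts(tokens):
    """Return unigram and bigram counts for a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams, lam):
    """Perplexity of an interpolated bigram model on `tokens`.

    P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w).
    The unigram term uses add-one smoothing, so the mixture stays
    strictly positive as long as lam < 1.
    """
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        p_uni = (unigrams[word] + 1) / (total + vocab)
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        log_prob += math.log(lam * p_bi + (1 - lam) * p_uni)
        n += 1
    return math.exp(-log_prob / n)


def grid_search(held_out, unigrams, bigrams, grid):
    """Return (best_lambda, best_perplexity) over the candidate grid."""
    scored = ((lam, perplexity(held_out, unigrams, bigrams, lam)) for lam in grid)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical toy data; the paper itself uses a SwiftKey dataset.
train_tokens = "the cat sat on the mat the cat ate".split()
held_out = "the cat sat on the mat".split()

unigrams, bigrams = train_counts(train_tokens)
grid = [i / 10 for i in range(10)]  # 0.0 .. 0.9; lam = 1.0 is excluded to avoid log(0)
best_lam, best_ppl = grid_search(held_out, unigrams, bigrams, grid)
print(f"best lambda = {best_lam}, perplexity = {best_ppl:.3f}")
```

Grid search only evaluates the candidate values it is given, so the best value it finds depends on the grid resolution. That limitation is the motivation the abstract gives for treating perplexity minimization as an optimization problem instead.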

Original language	English
Pages (from-to)	2151-2160
Number of pages	10
Journal	Journal of Supercomputing
Volume	74
Issue number	5
DOIs	https://doi.org/10.1007/s11227-018-2236-6
Publication status	Published - May 1 2018
Externally published	Yes

ASJC Scopus subject areas

Software
Theoretical Computer Science
Information Systems
Hardware and Architecture

Cite this

@article{fbfd36bb94364121815cc773d20f8448,
  title = "An LP-based hyperparameter optimization model for language modeling",
  abstract = "In order to find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the models{\textquoteright} hyperparameters. These search algorithms opt the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.",
  keywords = "Hyperparameter optimization, Language model, Linear programming, Machine learning, Optimization, n-Grams",
  author = "Rahnama, {Amir Hossein Akhavan} and Toloo, Mehdi and Zaidenberg, {Nezer Jacob}",
  note = "Publisher Copyright: {\textcopyright} 2018, Springer Science+Business Media, LLC, part of Springer Nature.",
  year = "2018",
  month = may,
  day = "1",
  doi = "10.1007/s11227-018-2236-6",
  language = "English",
  volume = "74",
  pages = "2151--2160",
  journal = "Journal of Supercomputing",
  issn = "0920-8542",
  publisher = "Springer Netherlands",
  number = "5",
}

TY  - JOUR
T1  - An LP-based hyperparameter optimization model for language modeling
AU  - Rahnama, Amir Hossein Akhavan
AU  - Toloo, Mehdi
AU  - Zaidenberg, Nezer Jacob
PY  - 2018/5/1
Y1  - 2018/5/1
N2  - In order to find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the models’ hyperparameters. These search algorithms opt the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.
AB  - In order to find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the models’ hyperparameters. These search algorithms opt the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.
KW  - Hyperparameter optimization
KW  - Language model
KW  - Linear programming
KW  - Machine learning
KW  - Optimization
KW  - n-Grams
UR  - http://www.scopus.com/inward/record.url?scp=85040232951&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85040232951&partnerID=8YFLogxK
U2  - 10.1007/s11227-018-2236-6
DO  - 10.1007/s11227-018-2236-6
M3  - Article
AN  - SCOPUS:85040232951
SN  - 0920-8542
VL  - 74
SP  - 2151
EP  - 2160
JO  - Journal of Supercomputing
JF  - Journal of Supercomputing
IS  - 5
ER  - 