Hyperparameters in Machine Learning

Introduction

Hyperparameters are the parameters that are explicitly set by the user to control the learning process of a machine learning algorithm. The prefix “hyper” in hyperparameter means top-level, indicating that these are the controlling parameters that govern how the model is trained and, in turn, how well it performs.

The user selects the value of each hyperparameter before the training stage and passes it to the learning algorithm. A hyperparameter's value cannot be changed while training is in progress; changing it is a procedure carried out externally to the model, followed by retraining.

How do the model parameters differ from hyperparameters?

Parameters in a machine learning or deep learning model are the variables that the algorithm derives from the given data during the learning process in order to perform classification or prediction. Unlike hyperparameters, model parameters are internal to the algorithm and depend on the nature of the data, so the user does not set their values directly. In essence, model parameters are what the training process of a machine learning algorithm figures out.

In the equation of a line, $y = mx + C$, the values of the coefficients $m$ and $C$ are learned by the model from the data set during the training stage. Here, $m$ and $C$ are examples of model parameters and hence cannot be controlled directly by the user.

However, the values of hyperparameters can be adjusted iteratively to obtain an optimized model.
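To make the distinction concrete, the following sketch fits a straight line with scikit-learn's LinearRegression (an illustrative assumption; the $y = mx + C$ example above does not name a library). The fit_intercept argument is a hyperparameter set by the user, while the learned slope and intercept are model parameters.

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that roughly follows y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.1, 4.9, 7.2])

# fit_intercept is a hyperparameter: chosen by the user before training
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

# coef_ (the slope m) and intercept_ (C) are model parameters learned from the data
print(model.coef_, model.intercept_)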

Examples of Hyperparameters

Each type of model has its own specific set of hyperparameters.

Some examples are the $K$ in K-nearest neighbors, the number of branches in a decision tree, the train-test split ratio in machine learning and deep learning models in general, the number of clusters in clustering algorithms, and the learning rate, batch size, and number of epochs in deep neural networks.

Hyperparameter Optimization

The process of finding the best hyperparameter values to ensure good model performance is called hyperparameter tuning or hyperparameter optimization.

Some of the hyperparameter optimization techniques are as follows:

Manual Search

As the name suggests, different combinations of hyperparameter values are tried manually by trial and error: the model is trained with each combination, and the values that give the best output are selected. These hyperparameter values are then used at the test stage to assess the performance of the model. The process is repeated iteratively until the results are satisfactory.
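The following sketch illustrates a manual search, assuming a K-nearest-neighbors classifier and the iris data set that appears later in this article; the candidate values of $K$ are arbitrary.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Try each candidate value of K by hand and keep the one with the best validation score
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(k, knn.score(X_val, y_val))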

Grid Search and Random Search

To avoid the labor associated with manual search, we can use the GridSearchCV or RandomizedSearchCV classes from the scikit-learn library. These classes carry out the search for us: they form combinations of the given hyperparameter values, build a model for each combination, and evaluate each model with cross-validation. They then report the best-performing model together with its score.

The following code snippet shows how to perform grid-search hyperparameter optimization with the scikit-learn library.

from sklearn import datasets           # datasets lives in sklearn, not sklearn.svm
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
# Candidate values for each hyperparameter of the SVC model
hyper_parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Try every combination in the grid with cross-validation and keep the best one
clf = GridSearchCV(SVC(), hyper_parameters)
clf.fit(iris.data, iris.target)
print(clf.best_params_)    # best-performing combination of hyperparameter values

The random search approach picks hyperparameter values and their combinations at random rather than trying them all. This reduces the number of training runs and can avoid the time cost associated with the grid search method.

The following code snippet shows how to perform random-search hyperparameter optimization with the scikit-learn library.

from sklearn.model_selection import RandomizedSearchCV

# Sample hyperparameter combinations at random rather than trying them all;
# n_iter caps how many of the four possible combinations are tried
clf = RandomizedSearchCV(SVC(), hyper_parameters, n_iter=4)
clf.fit(iris.data, iris.target)
print(clf.best_params_)

In addition to the techniques explained above, there are more sophisticated and advanced methods for hyperparameter tuning or optimization, such as Bayesian search, automated hyperparameter tuning frameworks, and specialized tuning of artificial neural networks.
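As one example of such a method, the sketch below uses BayesSearchCV from the third-party scikit-optimize (skopt) package; this is an assumption, since that package is separate from scikit-learn and must be installed on its own.

from sklearn import datasets
from sklearn.svm import SVC
from skopt import BayesSearchCV  # assumes scikit-optimize is installed

iris = datasets.load_iris()

# Search space: a categorical choice for the kernel and a log-uniform range for C
opt = BayesSearchCV(
    SVC(),
    {'kernel': ['linear', 'rbf'], 'C': (1e-2, 1e2, 'log-uniform')},
    n_iter=16,  # number of parameter settings the Bayesian optimizer samples
)
opt.fit(iris.data, iris.target)
print(opt.best_params_)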

Hyperparameter tuning in Deep Learning

The basic architecture of a deep neural network consists of an input layer, hidden layers, and an output layer. Each hidden layer is built from several neurons.

[Figure: Architecture of a three-layer neural network]

The number of neurons in the hidden layers is a significant hyperparameter to optimize. The number of neurons can be pre-determined by the user based on the application.
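A minimal sketch of tuning the number of neurons, assuming scikit-learn's MLPClassifier as the network; hidden_layer_sizes controls how many neurons each hidden layer contains, and the candidate sizes are arbitrary.

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()

# Each tuple is one candidate architecture: the numbers are neurons per hidden layer
grid = GridSearchCV(
    MLPClassifier(max_iter=1000, random_state=0),
    {'hidden_layer_sizes': [(8,), (16,), (32, 16)]},
    cv=3,
)
grid.fit(iris.data, iris.target)
print(grid.best_params_)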

Activation function

Each hidden layer has a hyperparameter called the activation function, which decides how the layer's output is computed from its input. There are multiple activation functions with different formulas for handling the input data. The output of each layer is passed to the next layer as input, and this process continues until the final output is produced.
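For illustration, here is a minimal NumPy sketch of three common activation functions; the formulas are standard, and the input values are arbitrary.

import numpy as np

def relu(x):
    # ReLU passes positive inputs through and zeroes out negative ones
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(sigmoid(x))
print(np.tanh(x))  # tanh squashes inputs into the range (-1, 1)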

Learning rate

An optimizer is assigned to update the weights of the neurons in a deep neural network so that the loss is minimized. The learning rate, which controls the size of each weight update, is the main hyperparameter associated with the optimizer (the weights themselves are model parameters learned during training). If we increase the learning rate, the model learns faster but may overshoot and fail to reach the minimum loss. Conversely, a very low learning rate makes training slow and may leave the model far from the minimum within the available training time. Hence, a trade-off between higher and lower learning rates is necessary, and it is achieved by tuning the learning-rate hyperparameter.
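A minimal sketch of this trade-off, again assuming MLPClassifier, whose learning_rate_init parameter sets the initial learning rate of the optimizer; the candidate values are arbitrary.

from sklearn import datasets
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
for lr in [0.0001, 0.001, 0.01, 0.1]:
    net = MLPClassifier(learning_rate_init=lr, max_iter=500, random_state=0)
    net.fit(iris.data, iris.target)
    # Final training loss: a very high rate may overshoot the minimum,
    # while a very low rate may not have come close to it yet
    print(lr, net.loss_)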

Epochs

Another hyperparameter to be tuned in a neural network is the number of epochs. An epoch is one complete pass of the training data through the network: the count is one when the data is passed forward and backward once, two when it is passed through twice in this manner, and so on. The number of epochs has to be optimized to obtain a good model while avoiding overfitting and underfitting.
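A minimal sketch, again assuming MLPClassifier, where max_iter plays the role of the number of epochs for its stochastic solvers; the candidate values are arbitrary.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, random_state=0)

# Too few epochs underfits, too many can overfit; compare validation scores
for epochs in [10, 50, 200, 1000]:
    net = MLPClassifier(max_iter=epochs, random_state=0)
    net.fit(X_train, y_train)
    print(epochs, net.score(X_val, y_val))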

Batch size

Batch size is a hyperparameter that arises when we split a large data set into batches and feed these batches to a deep neural network. A smaller batch size gives noisier gradient updates, which can act as a regularizer but makes each pass over the data slower, while a larger batch size speeds up training on parallel hardware but may generalize less well. So, we have to tune the batch size to ensure the best performance of the model.
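A minimal sketch, again assuming MLPClassifier, whose batch_size parameter controls how many samples are used for each gradient update with the stochastic solvers; the candidate values are arbitrary.

from sklearn import datasets
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
for batch_size in [8, 32, 128]:
    net = MLPClassifier(batch_size=batch_size, max_iter=500, random_state=0)
    net.fit(iris.data, iris.target)
    print(batch_size, net.loss_)  # final training loss for each batch size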

In addition, simpler problems usually require fewer layers, while complex problems require more. We can optimize the number of layers by treating it as a hyperparameter and iterating over candidate values with a for-loop, as sketched below.
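A minimal sketch of that for-loop, again assuming MLPClassifier: the number of hidden layers is varied by changing the length of hidden_layer_sizes, and each candidate is scored with cross-validation.

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
for n_layers in [1, 2, 3]:
    # (16,) * n_layers builds a network with n_layers hidden layers of 16 neurons each
    net = MLPClassifier(hidden_layer_sizes=(16,) * n_layers,
                        max_iter=1000, random_state=0)
    scores = cross_val_score(net, iris.data, iris.target, cv=3)
    print(n_layers, scores.mean())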

Conclusion

The motive behind hyperparameter tuning is to improve the performance of the model. Each model has its own specific set of hyperparameters, so we have to understand the significance of each one in its context and handle it with care and caution on every new dataset. Choosing the right hyperparameters, and the right combination of them, determines the quality of the tuning and, in turn, the performance of the model.
