Machine learning Interview Questions (Part-4)

Differentiate between local optimization and global optimization

Optimization is a process of assigning a set of inputs to an objective function that can extract the best possible output from the objective function. Local optimization tries to find extrema points, i.e., maxima or minima of the objective function in some specific or predefined region of the input space.

Global optimization uses the entire input space to find the extrema (maxima or minima) for the objective function. If the objective function takes more than one global optima, then the optimization problem is called multimodal optimization.

In the figure $x_1$ is the local extremum as $x_1$ uses only the negative region of input space to decide the minimum. On the other hand, $x_2$ uses the entire input space to find the minimum and hence it is a global extremum and the process is global optimization.

What is inductive bias in machine learning? Explain with some examples

Inductive bias refers to a set of assumptions followed by the learning algorithm in order to predict the output from the input which the algorithm has not encountered earlier. Inductive bias plays a crucial role in the capacity of a machine learning model to generalize its inferences from the training data (observations) on unseen data. A weak inductive bias causes the model to approach local optima whereas strong inductive bias leads the model to global optimum.

Examples

1. Assumption that hyperplane separated negative class from positive class in support vector machines
2. Assumption that output varies linearly with input when we apply linear regression

What is meant by instance-based learning?

Instance based learning is a procedure in machine learning in which the algorithm builds hypotheses from training instances. These training instances will be stored in memory. Each time the algorithm encounters a new query point, comparison of the new problem with these training instances will be performed instead of explicit generalization. Instance based learning is also called memory-based learning. Decision tree, K-nearest neighbor etc. are examples of instance-based learning.

The advantage of instance-based learning is its adaptability to new data. Also, we can use small approximations to assign target functions instead of using complete instance sets. The disadvantage with instance-based learning is each query point demands the creation of a new local model which in turn increases the computational complexity.

State Bayes’ theorem and explain its significance in data science

Bayes’ theorem describes how the probability of occurrence of an event is related to a condition. Bayes theorem deals with conditional probability of an event, means probability of an event depends on the previous events already occurred,
The formula for Bayes’ theorem is,
$$P(\frac{A}{B}) = \frac{P(\frac{B}{A}) \times P(A)}{P(B)}$$
Here,
$P(\frac{A}{B})$ is called posterior, means it measures how likely $A$ happens given that $B$ has happened
$P(\frac{B}{A})$ is called likelihood means how likely $B$ happens given that $A$ has happened already
$P(A)$ is called prior as it is a measure of how likely $A$ happens
$P(B)$ is called marginalization and it describes how likely $B$ happens

Bayes theorem is used in machine learning for classification purposes in terms of Naive Bayes classifier. Also, Bayes theorem serves in the development of Bayesian neural networks in deep learning.

What is Naive Bayes?

Naive Bayes is a classification algorithm in supervised machine learning. It makes predictions on the basis of probability using Bayes’ theorem. Spam filtration, article classification etc. are some of the applications of Naive Bayes classifier. The term “Naïve” indicates the occurrence of an event is independent of occurrence of another event, meaning each occurrence is Naïve. Bayes in Naïve Bayes conveys it is based on Bayes’ theorem.

Explain how hyperparameters are different from parameters in machine learning?. What is the significance of hyperparameters?

Parameters are variables in machine learning which is internal to the model. The model parameters will be calculated from the training data by the model by self-learning process. Coefficients of independent variables or weights in linear regression model and support vector machines are examples of parameters in the model.

The parameters which are explicitly defined by the user to control the learning process in a model are called hyperparameters. Hyperparameters are external to the model. Hyperparameters are essential to optimize the model and are independent of the data set. The value of hyperparameter can be extracted by the process of hyperparameter tuning. $K$ in $K$-nearest neighbors’ algorithm and learning rate in neural networks are examples of hyperparameters.

Explain Eigenvalues and Eigenvectors

Eigen is a German word which means characteristic. So, eigenvectors and eigenvalues are established to express the characteristics of a matrix. In simple words eigenvectors are those vectors which are unaffected by a transformation. Eigenvalues quantify the amount by which the eigenvector is stretched.

Mathematically, the eigenvector $v$ and the eigenvalue $\lambda$ involved in a matrix transformation operation of matrix $\hat{A}$ is,

$$\hat{A} v = \lambda v$$

Explain the significance of p-value

In machine learning, the null hypothesis is an assumption which claims there is no relation between two types of features to be analyzed. P-value measures under the given condition that the null hypothesis is true, how likely it is to get a particular result. 0.05 is the reference level generally followed to understand p-value. If the p-value is less than 0.05, it is considered as a low value of p and it indicates our assumption that the null hypothesis is true is false. In such a circumstance we reject the null hypothesis and consider an alternate hypothesis.

How would you understand the optimum number of clusters in a clustering algorithm? Explain any one of the methods

The optimum number of clusters can be defined from the methods such as the Elbow method, Average Silhouette method, Gap statistic method etc.

The procedure to calculate optimum number of clusters using Elbow method in K-means cluster algorithm is as follows,

1. Compute K-means clustering by varying the value of $k$, for instance $k$ = 1 to 10
2. Calculate the value of Within clusters Sum of Squares (WSS) for each $k$
3. Plot WSS with respect to the number of clusters in each $k$
4. The region of bend in the plot is considered as the optimum number of clusters