This article discusses interview questions commonly asked for machine learning job roles.

**Explain label encoding and one hot encoding. How is the dimensionality of a dataset affected by these methods in machine learning?**

Machine learning algorithms can only work with numbers, yet datasets often consist of both categorical and numerical features. We need to convert the categorical features into numerical form for the algorithms to use them. This process is called categorical encoding. Label encoding and one hot encoding are two popular categorical encoding methods.

Label encoding converts each category in a categorical feature to a unique integer, typically assigned in alphabetical order. It does not affect the dimensionality of the data, since each category is represented by a single number in the same column.

For example:

| Name | Age |
|------|-----|
| Ramu | 13  |
| Sara | 15  |
| Amit | 12  |
| Riya | 16  |

Reference table

After label encoding, the reference table is modified as below:

| Name | Age |
|------|-----|
| 1    | 13  |
| 3    | 15  |
| 0    | 12  |
| 2    | 16  |

In one hot encoding, each unique category in the categorical variable is represented by a binary vector and treated as a separate feature. This binary vector is called a one hot vector. One hot encoding increases the dimensionality of the dataset, as it creates a new column for each category in the categorical variable. It is therefore most useful when the number of categories is small.

After one hot encoding, the reference table is modified as shown below:

| Amit | Ramu | Riya | Sara | Age |
|------|------|------|------|-----|
| 0    | 1    | 0    | 0    | 13  |
| 0    | 0    | 0    | 1    | 15  |
| 1    | 0    | 0    | 0    | 12  |
| 0    | 0    | 1    | 0    | 16  |

**What is Box-Cox transformation?**

In machine learning, the target variable is the variable we try to estimate as the output. The Box-Cox transformation is a statistical method that transforms a positive-valued target variable so that the data approximately follows a normal distribution. It applies a power transformation, $(y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\ln y$ for $\lambda = 0$, which can map skewed, power-law-like distributions toward normality. Converting to a normal distribution makes it easier to analyse the data based on its central tendency, and we can extract information from confidence intervals. This in turn enhances the predictive power of the model.
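A minimal sketch of the transform in plain Python; the sample data is made up, and in practice $\lambda$ is chosen by maximum likelihood (e.g. `scipy.stats.boxcox` does this automatically):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single positive value y for parameter lam.
    lam = 0 reduces to the natural log transform."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# A right-skewed, power-law-like sample (assumed data).
data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

# With lam = 0 the transform is the natural log: the exponential spacing
# becomes even spacing, pulling in the long right tail.
transformed = [box_cox(y, 0) for y in data]
print([round(t, 3) for t in transformed])
```

After the transform the values are evenly spaced, which is what "pulling a skewed distribution toward normality" looks like on this toy sample.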

**What is time series data analysis in machine learning?**

Time series data analysis in machine learning studies the response of the target variable by considering time as an independent variable. Time series data consists of observations collected at successive time intervals. Time series analysis is used in areas such as weather forecasting, stock market prediction, etc.

**What is the exploding gradient problem?**

Exploding gradients are a difficulty that often occurs while training deep neural networks with the backpropagation method. In a deep neural network with n hidden layers, n derivatives are multiplied together during backpropagation. If these derivatives are large, the gradient increases exponentially as we propagate backwards through the model, causing large error gradients to accumulate. This is called the problem of exploding gradients. It makes the model unstable and makes it difficult to converge to good weight values. We can tackle this problem by reducing the number of layers, initializing the weights properly, or clipping the gradients.
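A toy numeric sketch of the effect (the per-layer derivative of 1.5 is an assumption, not a real network's value):

```python
# Toy illustration: in backprop through n layers, n local derivatives multiply.
# If each derivative is > 1, the product (the gradient) grows exponentially.

def backprop_gradient(local_derivative, n_layers):
    grad = 1.0
    for _ in range(n_layers):
        grad *= local_derivative
    return grad

print(backprop_gradient(1.5, 10))  # ~57.7: already large
print(backprop_gradient(1.5, 50))  # ~6.4e8: exploded

# One common remedy (gradient clipping): cap the gradient's magnitude.
def clip(grad, threshold):
    return max(-threshold, min(threshold, grad))

print(clip(backprop_gradient(1.5, 50), 5.0))  # capped at 5.0
```

The same multiplication with derivatives below 1 shrinks the gradient toward zero instead, which is the mirror-image vanishing gradient problem.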

**Explain the concept of marginalization in machine learning**

Marginalization is a statistical technique for obtaining the probability distribution of one random variable by summing its joint probability distribution over all values of the other variables.

If we want to know the probability of someone being happy regardless of which game they play, we can marginalize over the game variable as follows,

$$P(\text{happiness}) = \sum_{\text{game}} P(\text{happiness}, \text{game}) = P(\text{happiness}, \text{cricket}) + P(\text{happiness}, \text{football}) + P(\text{happiness}, \text{tennis}) + \dots$$
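A small numeric sketch of marginalizing out the game variable; the joint probabilities below are made-up values for illustration:

```python
# Hypothetical joint distribution P(happiness, game); all values are assumed.
joint = {
    ("happy", "cricket"): 0.20,
    ("happy", "football"): 0.25,
    ("happy", "tennis"): 0.15,
    ("unhappy", "cricket"): 0.10,
    ("unhappy", "football"): 0.15,
    ("unhappy", "tennis"): 0.15,
}

# Marginalize out 'game': sum the joint probabilities over all games.
p_happy = sum(p for (h, g), p in joint.items() if h == "happy")
print(round(p_happy, 3))  # 0.6
```

The game variable disappears from the result, which is why the operation is called "summing out" a variable.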

**Is logistic regression a classification or a regression technique? Explain with reason**

Logistic regression is a classification technique. The basis function used in logistic regression is the sigmoid function, also called the logistic function. The logistic function maps any real-valued input to a value between 0 and 1. Based on whether the output is greater than 0.5, or equal to or less than 0.5, we assign the target variable to class 1 or class 0. This is a binary classification task; hence logistic regression is a classification technique.
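The sigmoid-plus-threshold decision rule described above can be sketched in a few lines (the input `z` would be the model's linear combination of features):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    """Binary decision rule used in logistic regression."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))         # 0.5: the decision boundary
print(predict_class(2.0))   # 1
print(predict_class(-2.0))  # 0
```

The regression-like part is only the linear model inside the sigmoid; the thresholded output is what makes the overall method a classifier.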

**Explain the difference between KNN and K-Means algorithms**

| K-Nearest Neighbour (KNN) | K-Means |
|---|---|
| Supervised learning algorithm used for classification/regression purposes | Unsupervised learning algorithm used for clustering of input data |
| K indicates the number of nearest neighbours to a query point | K indicates the number of clusters |
| KNN can be used for datasets of any size | K-means is usually performed on large datasets, and it is faster compared to KNN |
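To make the contrast concrete, here is a minimal KNN classifier in plain Python (scikit-learn's `KNeighborsClassifier` is the standard implementation; the toy points are assumptions):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs; distances are Euclidean."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labeled training data: KNN needs labels, unlike K-means.
train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (1.5, 1.5), k=3))  # A
```

K-means, by contrast, would take only the unlabeled points and invent K cluster assignments itself, which is the supervised/unsupervised distinction in the table.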

**What are ensemble models and why do you need them in machine learning?**

Ensemble models make use of the aggregate output of several diverse machine learning models for the final prediction. An ensemble model collects the outputs of different machine learning algorithms, or of models trained on different datasets, and aggregates the outputs of these base learners to produce a single final prediction on new data. The objective of ensemble modeling is to reduce the generalization error of the base learners.
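One common aggregation strategy is hard majority voting, sketched below (the base-learner outputs are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate class predictions from several base learners by majority
    vote (hard voting), one common way ensembles combine outputs."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three base learners for the same sample.
base_learner_outputs = ["cat", "dog", "cat"]
print(majority_vote(base_learner_outputs))  # cat
```

For regression, the analogous aggregation is averaging the base learners' numeric predictions instead of voting.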

**Explain a few methods to handle outliers in data**

Outliers are data points that differ significantly from the rest of the data. They occur due to erroneous experimental observations or inconsistent measurements. Outliers can degrade the quality of predictions and hence need to be handled.

If the number of outliers is small, we can delete those observations. Transforming the variables using methods such as scaling, the Box-Cox transformation, or the log transformation can be another effective way to manage outliers. If the number of outliers is considerably large, we need to treat them with dedicated statistical modeling techniques.
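A common first step is flagging outliers with the interquartile-range (IQR) rule, sketched here in plain Python (the sample data is made up; the 1.5×IQR multiplier is the conventional choice):

```python
import statistics

def iqr_outliers(data):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], a common rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

data = [12, 13, 12, 14, 15, 13, 14, 98]  # 98 is an obvious outlier
print(iqr_outliers(data))  # [98]
```

Once flagged, each outlier can be deleted, capped to the fence values, or passed to a transformation as described above.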

**Name a few recommendation systems and explain how they work**

Some of the popular recommendation systems are listed below:

- Collaborative filtering

Collaborative filtering is based on the methodology "customers who bought this also bought". Online shopping sites such as Amazon, Flipkart, etc. use collaborative filtering recommendation systems. It works on the principle that a customer who buys a particular item is more likely to buy a similar or related item in future.

- Content-based filtering

This is another popular recommendation system used in ecommerce portals and video streaming apps. Content-based filtering is specific to the user. It collects a customer's search history as input data and recommends similar items to that particular customer.

- Candidate generation network

Candidate generation networks use Deep Neural Networks (DNNs) to analyze a user's online history, such as frequently visited websites, comments, and likes, as input data, and use Google Brain's TensorFlow to predict the user's future preferences.

- Knowledge based recommender system

Knowledge-based recommendation systems are based on the "if this then that" principle. This type of recommendation system is based not on user history but on interaction with the user: users can provide feedback to improve the search results. Knowledge-based recommendation systems know what content should be recommended in which context.
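The "customers who bought this also bought" idea from collaborative filtering can be sketched at its simplest as item co-occurrence counting; the purchase baskets below are made-up data, and real systems use similarity measures over much larger user-item matrices:

```python
from collections import Counter

# Hypothetical purchase histories (assumed data).
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
]

def also_bought(item, baskets):
    """Item-based collaborative filtering at its simplest:
    count items co-purchased with `item` and rank by frequency."""
    co = Counter()
    for basket in baskets:
        if item in basket:
            co.update(basket - {item})
    return [i for i, _ in co.most_common()]

print(also_bought("phone", baskets))  # ['case', 'charger']
```

Content-based filtering, by contrast, would compare item attributes (genre, keywords, price) against one user's own history rather than counting across users.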

The next set of interview questions can be found here