We have published seven modules of machine learning interview questions and answers on this website so far. This article is yet another addition to the series, covering questions frequently asked for machine learning job roles.
Which is more computationally complex: gradient descent or stochastic gradient descent?
Gradient descent is the more computationally complex of the two. For every update of the parameters, it computes the gradient of the loss function using all the data points in the training set. Stochastic gradient descent, on the other hand, performs each update using only a single data point (or a small batch) selected from the training data, which makes each step far cheaper.
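A minimal sketch of the two update rules for linear regression with squared loss (the function names and learning rate are illustrative):

```python
import numpy as np

def gradient_descent_step(w, X, y, lr=0.01):
    """Batch GD: the gradient is computed over ALL n training points."""
    n = len(y)
    grad = (2.0 / n) * X.T @ (X @ w - y)
    return w - lr * grad

def sgd_step(w, X, y, lr=0.01):
    """SGD: the gradient is estimated from ONE randomly chosen point."""
    i = np.random.randint(len(y))
    xi, yi = X[i], y[i]
    grad = 2.0 * xi * (xi @ w - yi)
    return w - lr * grad
```

Each batch step costs O(n·d) while each stochastic step costs O(d), which is why SGD scales to large datasets despite its noisier updates.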
Explain Laplace smoothing
Laplace smoothing is a method employed in the Naïve Bayes algorithm to overcome the problem of zero probability values for unseen features in test data. A zero probability arises when a query point contains a feature value that never appeared during the training stage, which would otherwise drive the entire posterior product to zero.
Laplace smoothing tackles the zero probability by adding a smoothing parameter $\alpha$ to the numerator of each likelihood estimate and balancing the equation by adding $K\alpha$ to the denominator. Here $K$ represents the number of distinct values the feature can take.
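With count notation, the smoothed likelihood of a feature value $x_i$ given class $y$ takes the standard form below; setting $\alpha = 1$ gives classic Laplace (add-one) smoothing:

$$P(x_i \mid y) = \frac{\text{count}(x_i, y) + \alpha}{\text{count}(y) + K\alpha}$$

Even a feature value never seen during training now receives the small but nonzero probability $\alpha / (\text{count}(y) + K\alpha)$ instead of zero.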
Write down the pseudo code for proportional sampling
- Normalize all the data points, i.e., divide each value by the sum of all values so that they lie in (0, 1) and sum to 1
- Calculate the cumulative sum of the normalized data
- Draw a random value from the uniform distribution over (0, 1)
- Return the first data point whose cumulative sum is greater than or equal to the random value, so larger values are returned proportionally more often (see the sketch below)
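A minimal NumPy sketch of these steps (the function name is illustrative):

```python
import numpy as np

def proportional_sample(values):
    """Pick one element of `values` with probability proportional to its
    magnitude. A minimal sketch assuming all values are non-negative."""
    values = np.asarray(values, dtype=float)
    probs = values / values.sum()   # normalize so the weights sum to 1
    cum = np.cumsum(probs)          # cumulative sum, ending at 1.0
    r = np.random.uniform(0, 1)     # uniform draw from (0, 1)
    idx = np.searchsorted(cum, r)   # first index with cum[idx] >= r
    return values[idx]
```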
Explain the difference between K-means and K-means++
K-means and K-means++ are clustering methods in unsupervised learning. They differ in how the initial centroids are selected. K-means initializes the centroids randomly, so the quality of the final clustering depends heavily on that initialization and can converge to poor, overlapping clusters. K-means++ overcomes this drawback by spreading the initial centroids apart: each new centroid is chosen from the data points with probability proportional to its squared distance from the nearest centroid already selected, which typically yields better separated and more consistent clusters.
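A minimal sketch of the K-means++ seeding step (the function name is illustrative; in practice one would simply use sklearn.cluster.KMeans with init="k-means++", its default):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """K-means++ initialization: spread the k starting centroids apart."""
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random data point.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest chosen centroid.
        d2 = np.min(
            ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1),
            axis=1,
        )
        # Sample the next centroid proportionally to that squared distance.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```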
How would you tackle overfitting in a deep learning model?
- Apply regularization (e.g., an L1 or L2 penalty) on the weights
- Decrease the complexity of the model with dropout: adding dropout layers randomly deactivates a fraction of the neurons at each training step, as sketched below
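A hedged Keras sketch combining both ideas (the layer sizes, penalty strength, and dropout rate are illustrative choices, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# L2 weight regularization plus dropout, two common defenses against overfitting.
model = tf.keras.Sequential([
    layers.Dense(
        128, activation="relu",
        kernel_regularizer=regularizers.l2(1e-4),  # penalize large weights
    ),
    layers.Dropout(0.5),  # randomly deactivate half the units each training step
    layers.Dense(10, activation="softmax"),
])
```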
What is the role of calibration in machine learning models?
Calibration ensures that the probability values a model outputs reflect the true likelihood of the predicted outcomes. Because the model makes many assumptions during training, its raw scores may be poorly calibrated: a prediction of 0.9 may not actually correspond to a 90% chance of being correct. Calibration corrects these scores so that the outputs can be interpreted as genuine probabilities.
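One common way to do this is scikit-learn's CalibratedClassifierCV; a minimal sketch (the choice of base model, calibration method, and fold count here is illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap an uncalibrated model; Platt scaling ("sigmoid") is fit on held-out folds.
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)  # now interpretable as probabilities
```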
What is the benefit of stratified sampling?
In stratified sampling, the population is divided into homogeneous subgroups (strata), such as the class labels, and data points are drawn from each stratum in proportion to its size. This guarantees that every sample preserves the distribution of the stratifying variable, which is especially valuable for imbalanced datasets.
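In scikit-learn this is a one-liner via the stratify argument; a small sketch on synthetic imbalanced labels (the split ratio is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 90% class 0, 10% class 1.
X = np.arange(200).reshape(-1, 1)
y = np.array([0] * 180 + [1] * 20)

# stratify=y keeps the 90/10 class ratio identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(y_train.mean(), y_test.mean())  # both ≈ 0.10
```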
What is the significance of the confidence interval?
A confidence interval quantifies the uncertainty of an estimate. Rather than reporting a single value, it gives a range that, at a stated confidence level (e.g., 95%), is expected to contain the true population parameter.
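A small sketch of a 95% confidence interval for a sample mean using the normal approximation (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)  # synthetic measurements

mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
low, high = mean - 1.96 * sem, mean + 1.96 * sem  # 1.96 ≈ z-value for 95%

print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```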
State central limit theorem
The central limit theorem states that if we repeatedly draw sufficiently large random samples with finite variance from a population, the distribution of the sample means approaches a normal distribution centered at the population mean, regardless of the population's own distribution.
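A quick numerical illustration, drawing from a heavily skewed (exponential) population (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed, mean = 2.0

# Means of 5,000 samples of size 50 each.
sample_means = np.array([
    rng.choice(population, size=50).mean() for _ in range(5000)
])

print(population.mean())    # ≈ 2.0
print(sample_means.mean())  # ≈ 2.0, and the means are ~normally distributed
print(sample_means.std())   # ≈ population.std() / sqrt(50)
```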
What is the fundamental assumption of Naïve Bayes algorithm?
The Naïve Bayes classifier is based on the assumption of conditional independence among the input features given the class label. To be clear, Naïve Bayes assumes that the presence of a particular feature tells us nothing about the presence of any other feature once the class is known.
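Formally, for features $x_1, \ldots, x_n$ and class label $y$, this assumption lets the joint likelihood factorize into a product of per-feature terms:

$$P(x_1, \ldots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$

so the posterior is proportional to $P(y) \prod_{i} P(x_i \mid y)$, which is what makes the algorithm fast to train and evaluate.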
The previous set of machine learning interview questions and answers can be found here