# ML INTERVIEW QUESTIONS - PART 1

IN THIS INTERVIEW PREP SERIES, WE LOOK AT IMPORTANT INTERVIEW QUESTIONS ASKED FOR DATA SCIENTIST, ML, AND DL ROLES.

IN EACH PART, WE WILL DISCUSS A FEW ML INTERVIEW QUESTIONS.

#### 1) What is PDF?

A probability density function (PDF) is a function that describes the probability distribution (the relative likelihood of outcomes) of a continuous random variable. The probability of any single exact value is zero; instead, the area under the PDF over an interval (its integral) gives the probability of the random variable falling within that interval.
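As a small illustration of the interval idea, the sketch below numerically integrates the PDF of a standard normal distribution over [-1, 1] and compares the result with the value obtained from the CDF. The helper name `pdf_interval_probability` and the choice of distribution are assumptions made for this example only.

```python
from statistics import NormalDist


def pdf_interval_probability(dist, a, b, steps=10_000):
    """Approximate P(a <= X <= b) by integrating the PDF with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the PDF between -1 and 1 (about 0.6827 for a standard normal).
approx = pdf_interval_probability(standard_normal, -1.0, 1.0)

# The same probability computed from the cumulative distribution function.
exact = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(round(approx, 4), round(exact, 4))
```

Both numbers agree to several decimal places, which is the "area under the curve" reading of a PDF.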

#### 2) What is Confidence Interval?

A confidence interval is a range of values, computed from a sample, that is expected to contain the true value of a parameter (such as the mean) with a stated level of confidence, for example 95%. Confidence intervals measure the degree of uncertainty or certainty in a sampling method.
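A minimal sketch of a confidence interval for a mean, assuming the normal approximation (for small samples a t-distribution would usually be preferred). The sample values and the function name `mean_confidence_interval` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error


heights_cm = [168, 172, 171, 169, 175, 170, 174, 166, 173, 172]  # made-up sample
low, high = mean_confidence_interval(heights_cm)
print(low, high)
```

Raising the confidence level (for example to 0.99) widens the interval, because more certainty requires a larger range.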

#### 3) Can KL divergence be used as a distance measure?

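No. KL divergence is not a true distance metric: it is not symmetric, so KL(P‖Q) generally differs from KL(Q‖P), and it does not satisfy the triangle inequality.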
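The lack of symmetry is easy to check numerically. The sketch below computes the KL divergence in both directions for two small discrete distributions; the distributions `p` and `q` are arbitrary values chosen for this example.

```python
from math import log


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.9, 0.1]
q = [0.5, 0.5]

kl_pq = kl_divergence(p, q)  # divergence of q from p
kl_qp = kl_divergence(q, p)  # divergence of p from q

print(kl_pq, kl_qp)  # the two values differ, so KL is not symmetric
```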

#### 4) What is Log-normal distribution?

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution.
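A quick way to see this relationship is to draw log-normal samples and look at their logarithms. In the sketch below the parameters `mu` and `sigma` are arbitrary choices; the logs of the samples should have a mean close to `mu` and a standard deviation close to `sigma`.

```python
import random
from math import log
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 1.0, 0.5  # parameters of the underlying normal distribution
samples = [random.lognormvariate(mu, sigma) for _ in range(20_000)]

# Taking the log of each sample should recover a normal distribution
# with mean close to mu and standard deviation close to sigma.
log_samples = [log(x) for x in samples]

print(mean(log_samples), stdev(log_samples))
```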

#### 5) What is Spearman Rank Correlation Coefficient?

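The Spearman rank correlation coefficient is computed by applying the Pearson correlation coefficient to the ranks of the two variables instead of their raw values. It therefore measures how well the relationship between the variables can be described by a monotonic function.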
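The definition can be followed almost literally in code: rank both variables, then compute Pearson on the ranks. This is a minimal sketch with hypothetical helper names (`average_ranks`, `pearson`, `spearman`); ties receive the average of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

print(pearson(x, y), spearman(x, y))
```

Because `y` increases whenever `x` does, the Spearman value is (up to rounding) 1, while the Pearson value is lower since the relationship is not linear.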

#### 6) Why is “Naive” Bayes naive?

Naive Bayes assumes that the features are conditionally independent of each other given the class. This assumption rarely holds in practice, but it greatly simplifies the computation of the conditional probabilities, because the joint likelihood becomes a product of per-feature likelihoods. Naive Bayes is called "naive" because of this assumption.

#### 7) What is the “Crowding problem” in t-SNE?

This happens when the datapoints are distributed in a region on a high-dimensional manifold around a datapoint i, and we try to model the pairwise distances from i to the other datapoints in a two-dimensional map. For example, it is possible to have 11 datapoints that are mutually equidistant in a ten-dimensional manifold, but it is not possible to model this faithfully in a two-dimensional map. Therefore, if the small distances are modeled accurately in the map, most of the moderately distant datapoints have to be placed much too far away in the two-dimensional map.

#### 8) What are the limitations of PCA?

PCA is useful mainly when the variables are strongly correlated.

If the relationships between the variables are weak, PCA does not reduce the data well. Inspect the correlation matrix to check this.

PCA results are difficult to interpret clearly, because each principal component is a linear combination of the original variables.

#### 9) Name 2 failure cases of KNN

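KNN tends to fail when the query point is an outlier (it is far from all training points, so its nearest neighbors are not meaningful) or when the data is extremely random and carries no information about the target.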

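#### 10) Name 5 assumptions of linear regression

• Linear relationship between the features and the target
• Multivariate normality (the residuals are approximately normally distributed)
• No or little multicollinearity among the features
• No auto-correlation in the residuals
• Homoscedasticity (the residuals have constant variance)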

#### 11) Why are log probabilities used in Naive Bayes?

The calculation of the likelihood of different class values involves multiplying a lot of small numbers together. This can lead to floating-point underflow, where the product is rounded to zero. It is therefore good practice to work with the logarithms of the probabilities, which turns the product into a sum and avoids this underflow.
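The underflow is easy to reproduce. In this sketch, 100 hypothetical per-feature likelihoods of 1e-5 are multiplied directly and then combined in log space; the values are made up purely to trigger the effect.

```python
from math import log

# Hypothetical per-feature likelihoods for one class (made-up values).
probabilities = [1e-5] * 100

# Direct multiplication: the true value is 1e-500, far below what a
# 64-bit float can represent, so the product collapses to 0.0.
product = 1.0
for p in probabilities:
    product *= p

# Summing logs keeps the same information in a representable range.
log_sum = sum(log(p) for p in probabilities)

print(product)   # 0.0
print(log_sum)   # a large negative number, roughly -1151.3
```

Since the logarithm is monotonic, the class with the highest log-probability sum is also the class with the highest probability product.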

#### 12) How are numerical features handled in Gaussian NB?

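Each numerical feature is assumed to follow a Gaussian distribution within each class. For every class, the mean and variance of each feature are estimated from the data points belonging to that class, and the resulting Gaussian density is used as the likelihood of a feature value given the class.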
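A minimal sketch of this idea, not a reference implementation: the function names, the tiny variance floor of `1e-9`, and the toy data are all assumptions made for illustration. It estimates a per-class mean and variance for every feature and scores a query point with summed log densities.

```python
from math import log, pi
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate class priors plus a per-class mean and variance for every feature."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        model[label] = {
            "log_prior": log(len(rows) / len(X)),
            "means": [mean(c) for c in columns],
            # Small floor (an assumption) avoids division by zero for constant features.
            "variances": [pvariance(c) + 1e-9 for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the Gaussian density with mean mu and variance var, evaluated at x."""
    return -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, params in model.items():
        log_likelihood = sum(
            log_gaussian(v, m, s)
            for v, m, s in zip(x, params["means"], params["variances"])
        )
        scores[label] = params["log_prior"] + log_likelihood
    return max(scores, key=scores.get)


# Toy data with two well-separated classes (made-up values).
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.2], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.1]))
```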

#### 13) How do you get feature importance in Naive Bayes?

Naive Bayes classifiers don’t offer an intrinsic method to evaluate feature importance. Naive Bayes methods work by estimating the conditional and unconditional probabilities associated with the features and predicting the class with the highest probability.

#### 14) Differentiate between GD and SGD.

In both gradient descent (GD) and stochastic gradient descent (SGD), you update a set of parameters in an iterative manner to minimize an error function.

In SGD, only one data point is used per iteration to compute the gradient of the loss function and update the parameters, whereas in GD all the data points are used to compute the gradient for each update.
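The difference shows up in where the parameter update sits relative to the loop over the data. The sketch below fits a single weight `w` in the model y = w * x with squared error; the learning rate, epoch count, and toy data are arbitrary choices for this example.

```python
import random


def gradient(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2 * (w * x - y) * x


def gradient_descent(xs, ys, lr=0.01, epochs=200):
    w = 0.0
    for _ in range(epochs):
        # One update per pass, using the average gradient over ALL points.
        g = sum(gradient(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * g
    return w


def stochastic_gradient_descent(xs, ys, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # One update per data point, using that point's gradient only.
            w -= lr * gradient(w, x, y)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated by y = 3x with no noise

w_gd = gradient_descent(xs, ys)
w_sgd = stochastic_gradient_descent(xs, ys)
print(w_gd, w_sgd)  # both approach 3.0 on this noise-free data
```

On noisy data the SGD path would fluctuate around the minimum, while each GD step costs a full pass over the dataset.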

#### 15) Do you know the train and run time complexities for an SVM model?

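Train time complexity: approximately O(n²) (kernel SVM solvers typically scale between O(n²) and O(n³))

Run time complexity: O(k*d)

n = number of training points, k = number of support vectors, d = dimensionality of the data set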
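The run-time cost follows from the shape of the decision function: one kernel evaluation of cost O(d) for each of the k support vectors. The sketch below illustrates this with an RBF kernel; the support vectors, dual coefficients, and bias are hypothetical hand-picked values, not the output of a trained model.

```python
from math import exp


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel between two d-dimensional points: O(d) work."""
    return exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svm_decision(x, support_vectors, dual_coefs, bias):
    """Decision value for x: k kernel evaluations of O(d) each, so O(k * d)."""
    return sum(
        coef * rbf_kernel(sv, x)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Hypothetical model: k = 2 support vectors in d = 2 dimensions.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
dual_coefs = [-1.0, 1.0]  # signed coefficients (alpha_i * y_i), made up
bias = 0.0

print(svm_decision([0.0, 0.0], support_vectors, dual_coefs, bias))  # negative side
print(svm_decision([2.0, 2.0], support_vectors, dual_coefs, bias))  # positive side
```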

#### 16) Why is RBF kernel SVM compared to kNN?

They are not identical, but they are related. Both kNN and RBF kernel methods are non-parametric and can be viewed as ways of estimating the probability density of the data from the points near a query.

Notice that these two algorithms approach the same problem differently: kernel methods fix the size of the neighborhood (h) and then count the number of points K that fall inside it, whereas kNN fixes the number of points K and then determines the region of space that contains those points.
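The contrast between fixing the neighborhood size and fixing the number of points can be sketched with two one-dimensional density estimates. To keep the counting explicit, this example uses a uniform window rather than an RBF kernel; the function names and data are assumptions for illustration.

```python
def fixed_window_density(x, data, h):
    """Fix the window half-width h, then count the K points that fall inside."""
    k = sum(1 for p in data if abs(p - x) <= h)
    return k / (len(data) * 2 * h)


def knn_density(x, data, k):
    """Fix K, then find the half-width that just contains the K nearest points."""
    distances = sorted(abs(p - x) for p in data)
    radius = distances[k - 1]
    return k / (len(data) * 2 * radius)


data = [1.0, 1.2, 1.4, 1.6, 5.0, 9.0]  # made-up 1-D sample

# Dense region: both estimates report a relatively high density.
print(fixed_window_density(1.3, data, h=0.5), knn_density(1.3, data, k=4))

# Sparse region: the fixed window is empty, while kNN widens its region
# until it contains K points and still returns a small positive density.
print(fixed_window_density(7.0, data, h=0.5), knn_density(7.0, data, k=2))
```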

#### 17) What decides overfitting and underfitting in DT?

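The depth of the tree, controlled by the max_depth parameter, decides overfitting and underfitting in Decision Trees: a very deep tree tends to overfit the training data, while a very shallow tree tends to underfit.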

#### 18) What is Non-negative Matrix Factorization?

Non-negative Matrix Factorization (NMF) decomposes a non-negative matrix into 2 smaller matrices whose elements are all non-negative (greater than or equal to zero) and whose product approximates the original matrix.
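One common way to compute such a factorization is with multiplicative update rules, which keep every entry non-negative as long as the starting values are non-negative. The sketch below is a plain-Python illustration under that assumption; the rank, iteration count, `eps` constant, and example matrix are arbitrary choices, and the result is an approximation of `V` rather than an exact decomposition.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def reconstruction_error(V, W, H):
    """Sum of squared differences between V and W @ H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m) with non-negative entries."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H


# Small made-up ratings-style matrix (rows: users, columns: items).
V = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]

W, H = nmf(V, rank=2)
print(reconstruction_error(V, W, H))
```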

#### 19) What is the Netflix Prize problem?

The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest.

#### 20) What are word embeddings?

Word embeddings are a type of word representation in a vector space that allows words with similar meaning to have a similar representation.
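"Similar representation" is usually measured with cosine similarity between the word vectors. The sketch below uses hand-made three-dimensional toy vectors, not vectors from a trained model, just to show the comparison.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy vectors for illustration only (not from a trained model).
embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.05, 0.90],
}

sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_apple = cosine_similarity(embeddings["king"], embeddings["apple"])

print(sim_king_queen, sim_king_apple)  # the first value is much closer to 1
```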