Machine Learning Interview Questions

The term Machine Learning refers to the automated detection of meaningful patterns in data. The value of Machine Learning is that it lets you continually learn from data and make predictions about new or future cases. Both the quality and the quantity of the data affect learning and prediction performance. Machine Learning continues to be one of the most promising and in-demand career paths for skilled professionals. This guide is designed to help you prepare thoroughly for a Machine Learning interview.

Ready to dive in? Then let's get started!

What is Machine learning?

Machine learning is a scientific discipline, grounded in probability theory and statistics, that builds a model from sample data and uses the model to make predictions or decisions.

Why do you need machine learning?

Machine learning is needed for tasks that are too difficult for humans to program directly. Some tasks involve datasets so large, or calculations so complex, that it is impractical for humans to work out all of the nuances and code for them explicitly. Machine learning instead uses experience (data) to get better at a task, as measured by a performance metric.

Difference between Artificial Intelligence and Machine Learning?

Artificial Intelligence (AI) is a general term for the field that tries to mimic human behaviour and intelligence. Any method or approach capable of doing this falls under Artificial Intelligence. Machine Learning is a subset of Artificial Intelligence that implements AI by learning patterns from data and then making predictions based on those patterns.

What are the different types of Machine Learning algorithms?

There are some variations in how the types of ML algorithms are defined, but they are commonly categorized by their purpose. The main categories are:
  1. Supervised Learning
  2. Unsupervised Learning
  3. Semi-supervised Learning
  4. Reinforcement Learning
Based on the above categories, here is the list of commonly used machine learning algorithms.
  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. Support Vector Machines (SVM)
  5. Naive Bayes
  6. kNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boosting algorithms
  11. Q-Learning
  12. Temporal Difference (TD)

How do you decide which Machine Learning algorithm to use?

The answer to the question varies depending on many factors, including:

  1. The size, quality, and nature of data.
  2. The available computational time.
  3. The urgency of the task.
  4. What you want to do with the data.
There is no single best algorithm that works for all cases. As a simple starting point, consider what inputs you have and what outputs you want, which often narrows down the choices. For more guidance, scikit-learn.org publishes a "Choosing the right estimator" flowchart that can be helpful even when you are not using the sklearn library.

List the various approaches to machine learning

The different approaches in Machine Learning are:

  1. Concept Vs. Classification Learning: For concept-learning problems, we ask: what kind of X will give us Y? For classification problems, we ask: given X, what will Y be?
  2. Inductive Vs. Analytical Learning: Inductive learning is the process of learning by example, where a system tries to induce a generalization from training data. Analytical learning stems from the idea that when not enough training examples are provided, it may be possible to "replace" the "missing" examples with prior knowledge and deductive reasoning.
  3. Statistical Vs. Symbolic Learning: Statistical learning uses data, in the form of numbers or numeric representations, to generalize and predict unknown cases. Symbolic learning reasons with "symbols", which are generally the variables in logical statements.

What is inductive bias in machine learning?

Technically, suppose you are trying to learn B from A, and initially the hypothesis space for B is infinite. To learn anything at all, you need to reduce that space. You do this through assumptions about the hypothesis space, which are called the inductive bias. Every machine learning algorithm that can generalize beyond the training data it sees has some type of inductive bias: the assumptions the model makes in order to learn the target function and generalize beyond the training data. For example, linear regression assumes the output is a linear function of the inputs. The stronger the inductive bias, the better the sample efficiency, provided the assumptions match the problem; this can be understood in terms of the bias-variance trade-off.

What is a model learning rate? Is a high learning rate always good?

There are two types of parameters in machine learning: parameters the model learns from data (such as weights) and hyperparameters. Hyperparameters are values that the practitioner sets to control how the learning algorithm learns and to tune the model's performance. The learning rate is a configurable hyperparameter that controls how much the model weights change in response to the estimated error each time they are updated. It is a small positive value, often in the range between 0.0 and 1.0. The best learning rate depends on the problem at hand, on the architecture of the model being optimized, and even on the state of the model in the current optimization process. The learning rate controls how quickly the model adapts to the problem. If the learning rate is high, each update takes a large step, so the model may converge quickly, but it can overshoot the minimum of the error, oscillate around it, or even diverge. So a high learning rate is not always good. If the learning rate is low, each update takes a small step, so the model takes longer to converge and is less likely to overshoot, but training may stall before reaching a good minimum. In practice, you tune the learning rate to balance speed and stability.
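To make this concrete, here is a minimal sketch assuming plain gradient descent on the toy function f(w) = w², whose gradient is 2w. The function name `gradient_descent`, the starting point, and the chosen learning rates are illustrative assumptions, not values from any particular model.

```python
def gradient_descent(lr, w0=5.0, steps=20):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # step against the gradient, scaled by the learning rate
    return w


for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With these settings, lr=0.01 is still far from the minimum at w = 0 after 20 steps, lr=0.1 gets close, lr=0.9 also gets close but flips sign on every step (oscillation), and lr=1.1 moves further away on each step (divergence). On this function each step multiplies w by (1 - 2*lr), which is why rates above 1.0 diverge here; other loss surfaces have different thresholds.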

What is Training set and Test set?

When performing Machine Learning, you can test the effectiveness of your algorithm by splitting the data into a Training set and a Test set.
  1. Training Set: a sample of data randomly drawn from the dataset and used to train the model.
  2. Test Set: a held-out sample of data used to provide an unbiased evaluation of the final model fit on the training set.

A common Training/Test split is 80%/20% or 70%/30%.
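Below is a minimal NumPy sketch of a random 80%/20% split, assuming the features `X` and targets `y` are arrays with one row per sample. The helper `train_test_split_indices` and the toy data are hypothetical; scikit-learn's `train_test_split` provides the same functionality in practice.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly split row indices 0..n_samples-1 into disjoint train and test sets."""
    rng = np.random.default_rng(seed)  # fixed seed so the split is reproducible
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 50 samples with 2 features each
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

train_idx, test_idx = train_test_split_indices(len(X), test_fraction=0.2)
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)  # (40, 2) (10, 2)
```

Splitting indices rather than the arrays themselves keeps `X` and `y` aligned, since the same index arrays select matching rows from both.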

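How do you choose a classifier based on the training set size?

First, identify your problem: the choice depends on what kind of data you have and what your desired task is, since different estimators suit different types of data and problems. Training set size then matters through the bias-variance trade-off. With a small training set, high-bias/low-variance classifiers such as Naive Bayes or linear models tend to generalize better, because low-bias/high-variance classifiers such as kNN or deep decision trees are more likely to overfit. With a large training set, low-bias/high-variance classifiers can pay off, because there is enough data to keep their variance in check.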


What is Model Selection in Machine Learning?

Model selection refers to selecting one final machine learning model for a particular problem. To do this, you compare the relative performance of candidate models on the required criteria. The loss function, and the metric that represents it, therefore become fundamental for selecting a model that performs well without overfitting.

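Common model selection techniques:

  1. Resampling methods, which estimate performance on data not used for training:
       a. Random Split
       b. Time-Based Split
       c. K-Fold Cross-Validation
       d. Stratified K-Fold
       e. Bootstrap
  2. Probabilistic measures, which score a model by its in-sample fit penalized by its complexity (e.g., AIC and BIC)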
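K-Fold Cross-Validation is one of the most common resampling methods for model selection. The sketch below assumes a synthetic regression dataset and mean squared error as the selection metric, and compares a linear model against a mean-only baseline; the helper names (`k_fold_indices`, `linear_model`, `mean_baseline`, `cross_val_mse`) are hypothetical. In practice, scikit-learn's `KFold` and `cross_val_score` do the same job.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def linear_model(X_train, y_train, X_val):
    """Ordinary least squares with an intercept column."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([X_val, np.ones(len(X_val))]) @ coef


def mean_baseline(X_train, y_train, X_val):
    """Ignore the features and always predict the training mean."""
    return np.full(len(X_val), y_train.mean())


def cross_val_mse(X, y, fit_predict, k=5):
    """Average validation mean squared error of `fit_predict` across k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        preds = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        scores.append(np.mean((preds - y[val_idx]) ** 2))
    return float(np.mean(scores))


# Synthetic data with a known linear relationship plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

for name, model in [("linear", linear_model), ("mean baseline", mean_baseline)]:
    print(f"{name}: CV MSE = {cross_val_mse(X, y, model):.4f}")
```

The model with the lower cross-validated error is the one you would select. Because every sample is used for validation exactly once, the estimate depends less on one lucky or unlucky split than a single Random Split does.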

Difference between a parametric and non-parametric model?

A parametric model has a fixed, finite number of parameters regardless of the sample size. The model's form, and therefore the number of parameters it needs, is set before it sees the data, and that number does not grow with the number of training instances. So the complexity of the model stays bounded even if the amount of data is unbounded, which makes parametric models less flexible. Linear regression is a typical example. Algorithms that do not make strong assumptions about the form of the mapping function are called non-parametric models. These models do not assume a specific form for the mapping between input and output data, which means the data distribution cannot be described by a fixed, finite set of parameters. They work well when you have a lot of data and little prior knowledge. In these models, the (effective) number of parameters can grow with the sample size, which makes them more flexible. k-nearest neighbors is a typical example.
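The contrast can be sketched with two tiny 1-D regressors, assuming NumPy: a straight-line fit (parametric) and a k-nearest-neighbors average (non-parametric). The class names and the `n_parameters` helper are hypothetical and exist only to make the parameter count visible; counting stored training values as "parameters" for kNN is an illustrative convention.

```python
import numpy as np


class LinearModel:
    """Parametric: always 2 parameters (slope, intercept), however many samples it sees."""

    def fit(self, x, y):
        self.slope, self.intercept = np.polyfit(x, y, deg=1)
        return self

    def predict(self, x):
        return self.slope * np.asarray(x, dtype=float) + self.intercept

    def n_parameters(self):
        return 2


class KNNRegressor:
    """Non-parametric: stores every training point, so its size grows with the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=float)
        self.y_train = np.asarray(y, dtype=float)
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # Distance from every query point to every stored training point
        dists = np.abs(x[:, None] - self.x_train[None, :])
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        # Average the targets of the k nearest training points
        return self.y_train[nearest].mean(axis=1)

    def n_parameters(self):
        # The stored training data plays the role of the model's parameters
        return self.x_train.size + self.y_train.size


rng = np.random.default_rng(0)
for n in (10, 1000):
    x = rng.uniform(0, 10, size=n)
    y = np.sin(x) + rng.normal(scale=0.1, size=n)
    linear = LinearModel().fit(x, y)
    knn = KNNRegressor(k=5).fit(x, y)
    print(f"n={n}: linear={linear.n_parameters()} params, kNN stores {knn.n_parameters()} values")
```

The linear model keeps 2 parameters at both sample sizes, while the kNN model grows from 20 to 2000 stored values. That growth is what lets kNN follow the sine curve, which a straight line cannot.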

What does this error message mean?
"if using all scalar values you must pass an index"

This error means pandas cannot tell how many rows the DataFrame should have. When pandas builds a DataFrame from a dictionary, it expects each value to be array-like (for example a list, array, Series, or dict) so it can infer the number of rows. If every value is a scalar, there is nothing to infer the row count from, so you must supply an index, that is, a row label for each row. You can solve this issue with any of the following methods:
import pandas as pd
x, y = 100, 200
df = pd.DataFrame({'X': [x], 'Y': [y]})  # wrap each scalar in a list: one row
print(df)
Or keep the scalar values and pass an index:
df = pd.DataFrame({'X': x, 'Y': y}, index=[0])  # index=[0] labels the single row
print(df)
You can also use pd.DataFrame.from_records, which is convenient when you already have the dictionary in hand; it takes a list of records, one dictionary per row:
df = pd.DataFrame.from_records([{'X': x, 'Y': y}])
print(df)
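A small sketch that reproduces the error and then applies the index fix, assuming a recent pandas version (the exact wording of the message may vary between versions). `make_frame` is a hypothetical helper written only for this illustration.

```python
import pandas as pd


def make_frame(x, y):
    """Build a one-row DataFrame; an all-scalar dict without an index raises ValueError."""
    try:
        return pd.DataFrame({'X': x, 'Y': y})
    except ValueError as err:
        print(f"ValueError: {err}")
        # Fall back to an explicit index so pandas knows there is exactly one row
        return pd.DataFrame({'X': x, 'Y': y}, index=[0])


df = make_frame(100, 200)
print(df)
```

Catching the error like this is only for demonstration; in real code you would simply build the DataFrame with one of the fixes above from the start.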