# Machine Learning Interview Questions

The term **Machine Learning** refers to the automated detection of meaningful patterns in data. Its value is that it lets a system keep learning from data and **predict the future**. Both the quality and the quantity of the data affect learning and prediction performance.

**Machine Learning** continues to evolve as one of the most promising and in-demand career paths for skilled professionals. This guide is designed to help you prepare thoroughly for a **Machine Learning interview**.

Ready to dive in? Then let's get started!

## What is Machine learning?

Machine learning is a scientific discipline, grounded in probability theory, that **creates a model** based on sample data and uses the model to **make a prediction** or choose a strategy.

## Why do you need machine learning?

Machine learning is needed for tasks that are **too difficult for humans** to program directly. Some tasks involve datasets so large, or **calculations so complex**, that it is impractical for humans to work out all of the nuances and code for them explicitly. **Machine learning** is instead a means of using experience to get better at a task, as measured by a performance metric.

## Difference between Artificial Intelligence and Machine Learning?

Artificial Intelligence (AI) is a general term for the field that tries to **mimic human behaviour** and intelligence. Any method or approach capable of doing this comes under **Artificial Intelligence**. Machine Learning is a subset of Artificial Intelligence that implements AI by learning patterns from data and then **making predictions** based on these patterns.

## What are Different Types of Machine Learning algorithms?

There are several ways to define the types of **ML algorithms**, but they are commonly categorized by purpose. The main categories are:

- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning

Some commonly used **machine learning** algorithms:

- Linear Regression
- Logistic Regression
- Decision Tree
- Support Vector Machines (SVM)
- Naive Bayes
- kNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boosting algorithms
- Q-Learning
- Temporal Difference (TD)

## How do you decide which Machine Learning Algorithm to use?

The answer to the question varies depending on many factors, including:

- The size, quality, and nature of data.
- The available computational time.
- The urgency of the task.
- What you want to do with the data.

There is no **single best algorithm** that works for all cases. As a simple starting point, consider what inputs you have and what outputs you want, which often narrows down the choices. For more guidance, **scikit-learn.org** has published an algorithm-selection infographic that can be helpful even when you're not using the sklearn library.

## List down various approaches for machine learning

The different approaches in Machine Learning are:

- Concept Vs. Classification Learning
- Inductive Vs. Analytical Learning
- Statistical Vs. Symbolic Learning

**Concept Vs. Classification Learning**: For concept-learning problems, we are asking: what kind of X will give us Y? For **classification problems**, we want to know: given X, what will Y be?

**Inductive Vs. Analytical Learning**: Inductive learning is the process of learning by example, where a system tries to induce a general rule from training data. **Analytical learning** stems from the idea that when not enough training examples are provided, it may be possible to "replace" the "missing" examples with prior knowledge and deductive reasoning.

**Statistical Vs. Symbolic Learning**: Statistical learning uses data, numbers or numeric representations, to generalise and predict unknown cases. Symbolic learning reasons with **"symbols"**, which are generally the variables in logical statements.

## What is inductive bias in machine learning?

When you are trying to learn B from A, the **hypothesis space** for B is initially infinite. To learn anything at all, you need to **reduce the scope**. You do this through assumptions about the hypothesis space, and these assumptions are called the inductive bias. Every **machine learning algorithm** with any ability to generalize beyond the training data it sees has some type of inductive bias: the assumptions the model makes in order to learn the **target function** and to generalize beyond the training data. The stronger the inductive bias, the better the sample efficiency, as long as the assumptions fit the problem; this can be understood in terms of the **bias-variance trade-off**.
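As a rough illustration of the idea above, the sketch below trains two learners on the same four points and asks both to predict far outside the training range. The helper names (`fit_line`, `predict_line`, `predict_nearest`) and the toy data are assumptions made for this example, not part of any library. The linear model assumes the trend continues; the nearest-neighbour rule assumes nearby inputs share an output.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(params, x):
    """Inductive bias: the relationship is a straight line everywhere."""
    slope, intercept = params
    return slope * x + intercept


def predict_nearest(xs, ys, x):
    """Inductive bias: the output equals that of the closest training input."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


# Toy training data generated by y = 2x (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

line = fit_line(xs, ys)
linear_guess = predict_line(line, 10.0)         # extrapolates the trend
neighbour_guess = predict_nearest(xs, ys, 10.0)  # repeats the closest label
print(linear_guess, neighbour_guess)
```

Both learners fit the training points exactly, yet they disagree at `x = 10.0`. The data alone cannot say which is right; only the assumptions differ.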

## What is a model learning rate? Is a high learning rate always good?

There are two types of parameters in machine learning: **machine learnable parameters** and **hyper-parameters**. Hyper-parameters are the ones that machine learning practitioners set to specific values to control the way **machine learning algorithms** learn and to tune the performance of the model.

The **model learning rate** is a configurable hyper-parameter that controls how much to change the model in response to the estimated error each time the model weights are updated. It is a small positive value, often in the range between 0.0 and 1.0. The best **learning rate** depends on the problem at hand, on the architecture of the model being optimized, and even on the state of the model in the current optimization process.

The **learning rate** controls how quickly the model adapts to the problem, and a high learning rate is not always good. If the learning rate is high, each update changes the weights by a large step, so the model may converge fast, but it may also overshoot the **true error minima** or even diverge. This means a faster but potentially erroneous model. If the learning rate is low, the weights change by small steps, so the model takes a long time to converge but is much less likely to overshoot the true error minima. This means a slower but usually more accurate model.
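The trade-off can be seen on a toy problem. The sketch below runs plain gradient descent on the loss `w ** 2`, whose minimum is at `w = 0`, with three learning rates. The function name `gradient_descent`, the starting point, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimise loss(w) = w ** 2 with a fixed learning rate (toy example)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w ** 2
        w -= learning_rate * gradient   # step against the gradient
    return w


too_low = gradient_descent(0.01)    # small steps: still far from 0
reasonable = gradient_descent(0.1)  # ends very close to the minimum
too_high = gradient_descent(1.1)    # each step overshoots; w grows in size

print(abs(too_low), abs(reasonable), abs(too_high))
```

With the low rate the weight is still well away from zero after 50 steps, with the moderate rate it is nearly there, and with the high rate every update jumps past the minimum by more than the previous distance, so the weight moves away from it.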

## What is Training set and Test set?

When performing **Machine Learning**, you can test the effectiveness of your algorithm by splitting the data into a training set and a test set.

**Training Set:** A sample of data randomly drawn from the dataset for the purpose of training the model.

**Test Set:** A sample of data, held out from training, used to provide an unbiased evaluation of the final model fit on the training set.

The training/test split is usually 80%/20% or 70%/30%.
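A minimal sketch of an 80%/20% random split, using only the standard library, is shown here. The helper `split_train_test` and the fixed seed are assumptions for the example; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut off the test portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    test_rows = shuffled[:n_test]
    train_rows = shuffled[n_test:]
    return train_rows, test_rows


data = list(range(100))  # stand-in for 100 labelled examples
train_set, test_set = split_train_test(data, test_fraction=0.2)
print(len(train_set), len(test_set))
```

Shuffling before cutting matters when the data is stored in some order (for example, sorted by label), because an unshuffled cut would give the two sets different distributions.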

## How do you choose a classifier based on a training set size?

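First of all, you need to identify your problem. The choice depends on what kind of data you have and what your desired task is, because different estimators are better suited to different types of data and different problems. As a common rule of thumb, when the training set is small, simpler models with high bias and low variance (such as Naive Bayes) tend to work better because they are less likely to overfit; when the training set is large, more flexible models with low bias and high variance (such as kNN) can take advantage of the extra data.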

## What is Model Selection in Machine Learning?

Model selection refers to selecting one final **machine learning model** for a particular problem. For this task you need to compare the relative performance of candidate models against the required criteria. The **loss function**, and the metric that represents it, therefore becomes fundamental for selecting a model that performs well without overfitting.

Types of model selection:

- Resampling methods
  - Random Split
  - Time-Based Split
  - K-Fold Cross-Validation
  - Stratified K-Fold
  - Bootstrap
- Probabilistic measures
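To make one of the resampling methods concrete, here is a small sketch of how K-Fold Cross-Validation partitions sample indices. The generator `k_fold_indices` is a hypothetical helper written for this example; it does no shuffling and no stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in the test fold exactly once, so every candidate model is scored on all of the data while never being evaluated on rows it was trained on in that round. The scores from the k rounds are then typically averaged to compare models.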

## Difference between a parametric and non-parametric model?

A **parametric model** has a fixed, finite number of parameters that does not depend on the sample size. This means that the model already knows the number of parameters it requires, regardless of its data. The **parameters** are also independent of the number of training instances, so the complexity of the model is bounded even if the amount of **data is unbounded**. This makes parametric models less flexible.

Algorithms that do not make strong assumptions about the form of the mapping function are called **non-parametric models**. These models do not assume a specific form for the mapping function between input and output data. This means that the **data distribution** cannot be defined in terms of a finite set of parameters. These models are good when you have a lot of data and no prior knowledge. In a non-parametric model, the (effective) number of parameters can grow with the sample size, which makes it more flexible.
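The difference in how model size scales can be sketched with two density estimators. The example assumes a Gaussian fit as the parametric model and a Gaussian-kernel density estimate as the non-parametric one; the function names, the dictionary layout, and the bandwidth value are illustrative choices, not a standard API.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Parametric: always two numbers, however many samples there are."""
    return {"mean": statistics.mean(samples),
            "std": statistics.pstdev(samples)}


def fit_kde(samples, bandwidth=0.5):
    """Non-parametric: the fitted model keeps every training sample."""
    return {"points": list(samples), "bandwidth": bandwidth}


def kde_density(model, x):
    """Average of Gaussian bumps centred on each stored point."""
    h = model["bandwidth"]
    points = model["points"]
    total = sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)
    return total / (len(points) * h * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
small = [rng.gauss(0.0, 1.0) for _ in range(10)]
large = [rng.gauss(0.0, 1.0) for _ in range(1000)]

for samples in (small, large):
    gaussian = fit_gaussian(samples)
    kde = fit_kde(samples)
    print(len(samples), "samples ->",
          len(gaussian), "Gaussian parameters,",
          len(kde["points"]), "stored KDE points")
```

The Gaussian summary stays at two values for both datasets, while the kernel estimate grows with the data, which is what lets it follow shapes the Gaussian cannot represent.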

## What does this error message say?

"ValueError: If using all scalar values, you must pass an index"

This error message says that the pandas **DataFrame** needs an index. When pandas creates a data frame from a dictionary, it expects each value to be a list (or another list-like object) or a dict. If you give it only **scalar** values, it cannot tell how many rows to create, so you also need to supply an index. What it is essentially asking for is a row label for the **dictionary** items. You can solve this issue by using the following methods:

    import pandas as pd

    x = 100
    y = 200

    # Wrap each scalar in square brackets so every column is a one-item list
    df = pd.DataFrame({'X': [x], 'Y': [y]})
    df

Or use **scalar values** and pass an index:

    # Keep the scalars and supply the row index explicitly
    df = pd.DataFrame({'X': x, 'Y': y}, index=[0])
    df

You can also use **pd.DataFrame.from_records**, which is more convenient when you already have the dictionary in hand:

    # Each dictionary in the list becomes one row
    df = pd.DataFrame.from_records([{'X': x, 'Y': y}])
    df
