# A Beginner's Guide to Scikit-Learn

Scikit-learn is one of the fundamental **Python libraries** for data analysis. It is an open-source package that is relatively simple, efficient, and accessible. This **Python scientific library** focuses on bringing machine learning to non-specialists using a general-purpose high-level language, and it is mostly concerned with processing, analyzing, and modelling data. **Scikit-learn** has minimal dependencies and is distributed under the simplified Berkeley Software Distribution (BSD) license, encouraging its use in both organizational and academic settings. Since it depends on the **scientific Python** environment, it can easily be incorporated into projects outside the traditional range of statistical **data analysis**.

## Installing scikit-learn
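
Scikit-learn is usually installed from PyPI with `pip install scikit-learn`, or with `conda install scikit-learn` in a conda environment. Once it is installed, a quick check like the one below (a minimal sketch, not part of any official procedure) confirms that the package imports correctly:

```python
import sklearn

# If the import succeeds, the package is installed in this environment.
# Printing the version helps when comparing against the documentation.
print(sklearn.__version__)
```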

## A simple machine learning model using Scikit Learn
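
As a minimal sketch of such a model, the example below loads the wine dataset, splits it, trains a logistic regression classifier, and scores it on the held-out data. The split ratio, `random_state`, and `max_iter` values are assumptions chosen for illustration, not settings prescribed by this guide.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the features (X) and class labels (y).
X, y = load_wine(return_X_y=True)

# Hold out 25% of the samples for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# max_iter is raised (an assumption of this sketch) so the solver
# has room to converge on the unscaled features.
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict on unseen data and report the fraction of correct predictions.
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```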

#### output

## Step by Step explanation...

## Load wine data set

The wine dataset is a classic, easy multi-class **classification dataset** with 178 samples, 13 numeric features, and 3 classes.
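
Loading the dataset can be sketched as follows, using `load_wine` from `sklearn.datasets`. The variable names are illustrative.

```python
from sklearn.datasets import load_wine

# load_wine() returns a Bunch object holding the data and its metadata.
wine = load_wine()
X, y = wine.data, wine.target

print(X.shape)             # (178, 13): 178 samples, 13 features
print(wine.target_names)   # names of the three wine classes
print(wine.feature_names)  # names of the 13 measured features
```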

## Logistic Regression classifier

Logistic regression is a fundamental classification technique. It belongs to the group of **linear classifiers** and is somewhat similar to polynomial and **linear regression**. With scikit-learn, the next step is to instantiate the logistic **regression estimator** and store it in a variable.
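
Creating the estimator might look like the sketch below. The `max_iter` value is an assumption made here to give the solver more iterations than the default; the variable name `clf` is illustrative.

```python
from sklearn.linear_model import LogisticRegression

# Instantiate the estimator; nothing is trained yet at this point.
clf = LogisticRegression(max_iter=10000)

# Hyperparameters can be inspected before fitting.
print(clf.get_params()["max_iter"])
```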

## train_test_split Function

The **train_test_split()** function splits a single dataset into two subsets: **training data**, used to fit the model, and **testing data**, used to evaluate it.
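
A minimal sketch of the split on the wine data is shown below. The 25% test size, the fixed `random_state`, and the use of `stratify` are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# random_state makes the shuffle reproducible; stratify=y keeps the
# class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```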

## fit() method

The **fit()** method **trains your model** on the dataset you provide. Fitting the model to the training data is the training part of the **modelling process**.
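
The sketch below fits a logistic regression classifier on a training split of the wine data. The estimator settings and split parameters are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=10000)

# fit() learns the model parameters from the training data only.
clf.fit(X_train, y_train)

# After fitting, learned attributes (with a trailing underscore) exist.
print(clf.classes_)
print(clf.coef_.shape)  # one row of coefficients per class
```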

## predict() method

After it is trained, the model can be used to **make predictions** on previously unseen data, usually with a **predict()** method call.
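
Continuing with the same assumed setup (wine data, logistic regression, a 25% test split), prediction on held-out samples can be sketched like this:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# predict() returns one class label per row of the input.
y_pred = clf.predict(X_test)

# predict_proba() returns one probability per class for each row.
y_proba = clf.predict_proba(X_test)

print(y_pred[:5])
print(y_proba[:5].round(3))
```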

## Random forest classifier

A forest is made up of trees. A **random forest** builds decision trees on randomly selected samples of the data, gets a prediction from each tree, and selects the final answer by means of voting.
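
A random forest can be trained with the same `fit()` and `predict()` calls as any other estimator. In the sketch below, the number of trees and the `random_state` are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# n_estimators is the number of decision trees in the forest.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Each fitted tree is available in estimators_.
print(len(forest.estimators_))

forest_accuracy = forest.score(X_test, y_test)
print(f"Test accuracy: {forest_accuracy:.2f}")
```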

## Simple Imputer

**SimpleImputer** is a **scikit-learn** class that helps handle missing data in a dataset. It replaces the **NaN values** with a specified placeholder, such as the column mean, the median, the most frequent value, or a constant.
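
The small example below illustrates mean imputation on a hand-made array. The data is invented for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# A toy feature matrix with two missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
])

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

# Column 0 mean is (1 + 3) / 2 = 2; column 1 mean is (2 + 4) / 2 = 3.
print(X_filled)
```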

## Model Evaluation

Once a model has been trained, you need to measure how good it is at **predicting** on new data. This step is known as **model evaluation**, and the metric you choose depends on the task you are trying to solve.
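
For a classification task, accuracy and a confusion matrix are two common starting points. The labels below are made up purely to show the calls.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels: the true classes and what a model predicted.
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

# Accuracy is the fraction of predictions that match the true label:
# 4 of the 5 predictions are correct here.
acc = accuracy_score(y_true, y_pred)
print(acc)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```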

## Classification Report

A **classification report** is used to measure the quality of predictions from a classification algorithm, that is, how many predictions are correct and how many are not. The **classification report** displays the precision, recall, F1, and support scores for each class of the model.
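
A report for a classifier trained on the wine data can be produced with `classification_report` from `sklearn.metrics`. The model and split settings in this sketch are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42,
    stratify=wine.target
)

clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# target_names labels each row of the report with the class name.
report = classification_report(
    y_test, y_pred, target_names=wine.target_names
)
print(report)
```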

#### output

The classification report summarizes the key metrics in a **classification problem**.

Heading | Description |
---|---|
precision | of the samples predicted as a given class, the fraction that actually belong to it |
recall | of the samples that actually belong to a given class, the fraction the model found |
f1-score | harmonic mean of precision and recall |
support | number of occurrences of the given class in your dataset |
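
To make the definitions in the table concrete, the sketch below computes the four values per class on a tiny, invented set of labels using `precision_recall_fscore_support`.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy labels, invented for illustration.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# Each returned array holds one value per class (0, 1, 2).
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred
)

# Class 0: one sample predicted as 0 and it is correct -> precision 1.0;
# two true samples of class 0, one found -> recall 0.5.
print(np.round(precision, 3))
print(np.round(recall, 3))
print(np.round(f1, 3))
print(support)
```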