A Beginner's Guide to Scikit-Learn
Scikit-learn is one of the fundamental Python libraries for data analysis. It is an open-source package that is relatively simple, efficient and accessible. This scientific Python library focuses on bringing machine learning to non-specialists using a general-purpose high-level language, and it mostly deals with processing, analyzing and modelling data. Scikit-learn has minimal dependencies and is distributed under the simplified Berkeley Source Distribution (BSD) license, encouraging its use in both organizational and academic settings. Since it depends on the scientific Python environment, it can easily be incorporated into projects outside the traditional range of statistical data analysis.
A simple machine learning model using Scikit-learn
Step by Step explanation...
Load wine data set
The wine dataset is a classic and very easy multi-class classification dataset.
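As a minimal sketch of this step, the dataset can be loaded with `load_wine` from `sklearn.datasets`. The variable names `wine`, `X` and `y` are illustrative choices, not something the original walkthrough prescribes.

```python
from sklearn.datasets import load_wine

# Load the wine dataset that ships with scikit-learn.
wine = load_wine()

# X holds the feature matrix, y holds the class label of each sample.
X, y = wine.data, wine.target

print(X.shape)             # (number of samples, number of features)
print(wine.target_names)   # names of the classes to predict
```

Printing the shape and the class names is a quick way to confirm that the data loaded as expected before any modelling starts.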
Logistic Regression classifier
Logistic regression is a fundamental classification technique. It belongs to the group of linear classifiers and is somewhat similar to polynomial and linear regression. The next step, with Scikit-learn, is to create the logistic regression estimator and save it as an object.
The train_test_split() function splits a single dataset into two parts: one for training the model and one for testing it.
The fit() method trains your model on the dataset you provide. Fitting your model to the training data is essentially the training part of the modelling process.
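Creating the estimator might look like the sketch below. The name `log_reg` and the `max_iter=10000` setting are assumptions made for illustration (a higher iteration limit gives the solver more room to converge on unscaled features); the original text does not specify any parameters.

```python
from sklearn.linear_model import LogisticRegression

# Create the estimator object; nothing is trained at this point.
# max_iter=10000 is an illustrative assumption, not a recommended value.
log_reg = LogisticRegression(max_iter=10000)

# The object only stores its configuration until fit() is called.
print(log_reg.get_params()["max_iter"])
```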
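A short sketch of the split on the wine data follows. The `test_size`, `random_state` and `stratify` values are illustrative assumptions; any reasonable split would do.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Hold out 25% of the samples for testing (an illustrative choice).
# stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, which helps when comparing models on the same held-out data.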
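The training step can be sketched as follows. The split settings and `max_iter` value are assumptions for illustration only.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_iter is raised as an illustrative assumption to help convergence.
model = LogisticRegression(max_iter=10000)

# fit() learns the model coefficients from the training data only.
model.fit(X_train, y_train)

print(model.classes_)      # class labels seen during training
print(model.coef_.shape)   # one row of coefficients per class
```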
After it is trained, the model can be used to make predictions on previously unseen data, usually with a predict() method call.
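Continuing with the same assumed setup (illustrative split and `max_iter`), a prediction on held-out data might look like this:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=10000)  # illustrative setting
model.fit(X_train, y_train)

# predict() returns one predicted class label per row of X_test.
y_pred = model.predict(X_test)

print(y_pred[:5])   # first few predicted labels
print(y_test[:5])   # the corresponding true labels
```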
Random forest classifier
A forest is made up of trees. A random forest builds decision trees on randomly selected data samples, gets a prediction from each tree and selects the best solution by means of voting.
An imputer such as scikit-learn's SimpleImputer class is helpful for handling missing data in the predictive model dataset. It replaces the NaN values with a specified placeholder.
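A minimal sketch with `RandomForestClassifier` is shown below; `n_estimators=100` and the `random_state` values are illustrative assumptions. Note that scikit-learn's implementation combines the trees by averaging their predicted class probabilities rather than counting one hard vote per tree.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build 100 decision trees, each on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(len(forest.estimators_))        # number of fitted trees
score = forest.score(X_test, y_test)  # mean accuracy on the test split
print(score)
```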
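The class described here is assumed to be `SimpleImputer` from `sklearn.impute`, since the original text does not name it. The tiny array below is made-up toy data, and the `"mean"` strategy is one illustrative choice of placeholder.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with two missing entries (hypothetical values for illustration).
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
])

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Other strategies such as `"median"`, `"most_frequent"` or `"constant"` follow the same pattern.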
Once a model has been trained you need to measure how good the model is at predicting on new data. This step is known as model evaluation and the metric that you choose will be determined by the task you are trying to solve.
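For a classification task, one simple metric is accuracy. The sketch below uses `accuracy_score` from `sklearn.metrics`; the model settings and the split are assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=10000)  # illustrative setting
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of test samples whose predicted label matches the true label.
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```

Accuracy is easy to read but can mislead when classes are imbalanced, which is one reason per-class metrics are often reported as well.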
A classification report is used to measure the quality of predictions from a classification algorithm: how many predictions are correct and how many are incorrect. The classification report visualizer displays the precision, recall, F1, and support scores for the model.
The classification report summarizes the key metrics of a classification problem.
|precision||of the samples predicted as a given class, the fraction that actually belong to it|
|recall||of the samples that actually belong to a given class, the fraction that the model finds|
|f1-score||harmonic mean of precision and recall|
|support||number of occurrences of the given class in your dataset|
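The metrics in the table can be produced in text form with `classification_report` from `sklearn.metrics`. This is a sketch under the same assumed setup as the earlier steps (illustrative split and `max_iter`).

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, random_state=0
)

model = LogisticRegression(max_iter=10000)  # illustrative setting
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# One row per class with precision, recall, f1-score and support.
report = classification_report(
    y_test, y_pred, target_names=wine.target_names
)
print(report)
```

Here the support column counts the true samples of each class in the test split, since that is the data passed to the report.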