ValueError: Unknown label type: 'unknown'

The Unknown label type: 'unknown' error is raised by scikit-learn when a classifier cannot interpret the Y values (the target) that you pass to it. There is a mismatch between what the estimator accepts and what you are actually passing, for example an array versus a DataFrame, or a 1D list versus a 2D list. Before fitting, scikit-learn inspects Y to work out what type of problem you want to solve (regression or classification), so the first thing to check is what type of data is in your Y variable. A classifier expects label-like values such as integers or strings. If Y has the generic object dtype, its type is typically reported as 'unknown'. If Y holds non-integer floats, it is reported as 'continuous', and the same error appears with that word in the message instead.
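To see how scikit-learn classifies a given Y before calling fit(), you can inspect it with type_of_target from sklearn.utils.multiclass. The snippet below is a minimal sketch; the three arrays are made-up examples, not data from any particular project.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets, for illustration only.
y_labels = np.array([0, 1, 1, 0])                # integer class labels
y_floats = np.array([0.5, 1.2, 3.3])             # non-integer floats
y_object = np.array([0, 1, 1, 0], dtype=object)  # integers stored as generic objects

# Collect the target type scikit-learn infers for each array.
inferred = {
    "labels": type_of_target(y_labels),
    "floats": type_of_target(y_floats),
    "object": type_of_target(y_object),
}

for name, kind in inferred.items():
    print(f"{name}: {kind}")
```

On the scikit-learn versions I am aware of, the integer array is reported as 'binary', the float array as 'continuous', and the object array as 'unknown'. The last two are the values a classifier rejects.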

Solutions:


  1. Group your Y values into bins (classes, for example 0, 1, 2, 3) and apply a classification model to the binned data.
  2. Often your Y values have dtype object, so sklearn cannot recognize the label type. Add the line y = y.astype('int') before you pass the variable to the classifier. This assumes the values really are whole numbers.
  3. When you pass Y values to rf.fit(X, Y), a 1D array-like is expected. Selecting a column from a pandas DataFrame with double brackets, e.g. df[['col']], returns a 2D DataFrame rather than a 1D Series. Select the column with single brackets, or flatten the 2D result to 1D before calling fit().
  4. If you want your predictions to be continuous values, use a regression method (e.g. RandomForestRegressor) to predict Y.
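
The four fixes above can be sketched on a small example. Everything here is an assumption made for illustration: the DataFrame, its column names (feature, label, price), and the choice of three bins are hypothetical, and pd.cut is just one possible way to build the bins.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "label": np.array([0, 0, 1, 1, 2, 2], dtype=object),  # integers stored as object
    "price": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],              # continuous target
})
X = df[["feature"]]  # features stay 2D

# Fix 1: bin a continuous target into integer class codes (0, 1, 2).
y_binned = pd.cut(df["price"], bins=3, labels=False)

# Fix 2: cast object-typed labels to a concrete integer dtype.
y_int = df["label"].astype("int")

# Fix 3: df[["label"]] is 2D, so flatten it to 1D before fitting.
y_1d = df[["label"]].to_numpy().ravel().astype("int")

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y_binned)
clf.fit(X, y_int)
clf.fit(X, y_1d)

# Fix 4: keep the continuous target and switch to a regressor.
reg = RandomForestRegressor(n_estimators=10, random_state=0)
reg.fit(X, df["price"])
```

Which fix applies depends on the data: casting only makes sense when the values are genuinely discrete labels, while binning or switching to a regressor is the choice when Y is truly continuous.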