Scikit-learn Dataset

In scikit-learn, when I run X, y = make_moons(500, noise=0.2) and print X and y, I see that they look like arrays with a bunch of entries but no commas. I have data that I want to use instead of the scikit-learn moons dataset, but I don't understand what data type these scikit-learn datasets are and how I can make my data follow this data type.



Solution 1:[1]

The first element, X, is a 2D array:

array([[-6.72300890e-01,  7.40277997e-01],
        [ 9.60230259e-02,  9.95379113e-01],
        [ 3.20515776e-02,  9.99486216e-01],
        [ 8.71318704e-01,  4.90717552e-01],
        ....
        [ 1.61911895e-01, -4.55349012e-02]])

It contains the x- and y-coordinates of the points.

The second element of the tuple, y, is an array containing the labels (0 or 1 for binary classification):

array([0, 0, 0, 0, 1, ... ])
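Both objects are plain NumPy ndarrays, which answers the second part of the question: your own data just needs the same layout, an (n_samples, n_features) float array for X and an (n_samples,) integer label array for y. A minimal sketch of the conversion (my_points and my_labels are made-up placeholder data):

```python
import numpy as np
from sklearn.datasets import make_moons

X, y = make_moons(500, noise=0.2)
print(X.shape)  # (500, 2): one row of coordinates per sample
print(y.shape)  # (500,): one label per sample

# Your own data in the same layout:
my_points = [[0.1, 0.9], [1.2, -0.3], [0.5, 0.5]]  # placeholder values
my_labels = [0, 1, 0]
X_own = np.asarray(my_points, dtype=float)
y_own = np.asarray(my_labels)
```

Any estimator that accepts the make_moons output will accept X_own and y_own built this way.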

To use this data in a simple classification task, you could do the following:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Create dataset
X, y = make_moons(500, noise=0.2)

# Split dataset in a train part and a test part
train_X, test_X, train_y, test_y = train_test_split(X, y)

# Create the Logistic Regression classifier
log_reg = LogisticRegression()

# Fit the logistic regression classifier
log_reg.fit(train_X, train_y)

# Use the trained model to predict on the train and test samples
train_y_pred = log_reg.predict(train_X)
test_y_pred = log_reg.predict(test_X)

# Print classification report on the training data
print(classification_report(train_y, train_y_pred))

# Print classification report on the test data
print(classification_report(test_y, test_y_pred))

The results are:

On training data

              precision    recall  f1-score   support

           0       0.88      0.87      0.88       193
           1       0.86      0.88      0.87       182

    accuracy                           0.87       375
   macro avg       0.87      0.87      0.87       375
weighted avg       0.87      0.87      0.87       375

On test data

              precision    recall  f1-score   support

           0       0.81      0.89      0.85        57
           1       0.90      0.82      0.86        68

    accuracy                           0.86       125
   macro avg       0.86      0.86      0.86       125
weighted avg       0.86      0.86      0.86       125

As we can see, the F1-score is similar on the train and test sets, so the model is not overfitting.

Solution 2:[2]

Let's say you have something like this:

if ( condition1 ) {
  //some code 1
  if ( condition2 ) {
    //some code 2
    if ( condition3 ) {
      //some code 3
    } else {
      return false;
    }
  } else {
    return false;
  }
} else {
  return false;
}

Since the function returns false whenever a condition is not met, you can test each negated condition directly and return early:

if ( !condition1 ) {
    return false;
}
//some code 1
if ( !condition2 ) {
    return false;
}
//some code 2
if ( !condition3 ) {
    return false;
}
//some code 3

This doesn't reduce the number of if statements, but it avoids the deep nesting and the else branches.
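The same early-return ("guard clause") refactor can be sketched in Python; the function name and the three condition flags here are hypothetical stand-ins:

```python
def process(condition1, condition2, condition3):
    # Guard clauses: bail out as soon as a condition fails,
    # keeping the happy path at a single indentation level.
    if not condition1:
        return False
    # some code 1
    if not condition2:
        return False
    # some code 2
    if not condition3:
        return False
    # some code 3
    return True
```

The behavior is identical to the nested version: the function returns False on the first failing condition and only reaches the end if all three hold.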

Solution 3:[3]

You can also try the switch statement. For many situations it will produce cleaner code.

<?php
if ($i == 0) {
    echo "i equals 0";
} elseif ($i == 1) {
    echo "i equals 1";
} elseif ($i == 2) {
    echo "i equals 2";
}

switch ($i) {
    case 0:
        echo "i equals 0";
        break;
    case 1:
        echo "i equals 1";
        break;
    case 2:
        echo "i equals 2";
        break;
}
?>

The switch statement also works with strings:

<?php
switch ($i) {
    case "apple":
        echo "i is apple";
        break;
    case "bar":
        echo "i is bar";
        break;
    case "cake":
        echo "i is cake";
        break;
}
?>
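For comparison, languages without a switch statement often express the same string dispatch as a lookup table; a minimal Python sketch (the messages mapping and the fallback string are assumptions for illustration):

```python
# A dict lookup plays the role of switch-on-strings.
messages = {
    "apple": "i is apple",
    "bar": "i is bar",
    "cake": "i is cake",
}

i = "bar"
# .get() supplies a default, like a switch's default: case.
print(messages.get(i, "unknown"))  # prints "i is bar"
```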

Good luck! :)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Benjamin Breton
Solution 2
Solution 3 Maximo Migliari