Why do we need "MultiOutputClassifier" if we can get the same results without it?

I am learning about multi-label classification.

It is the case when you have data like this:

Year   Actor     Budget      |   Genre
------------------------------------------------
2004   Tom C.    40,000,000  |   Action, Drama
2016   Mel G.    54,000,000  |   Comedy, Action, Family
2021   Eva K.    3,000,000   |   Comedy, Romance
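Before fitting, a Genre column like this is usually turned into a binary indicator matrix with one column per label. A small sketch using scikit-learn's MultiLabelBinarizer on the genre lists from the table above (the data is the table's, the variable names are mine):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Genre lists copied from the table above
genres = [
    ["Action", "Drama"],
    ["Comedy", "Action", "Family"],
    ["Comedy", "Romance"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(genres)   # shape: (3 movies, 5 distinct genres)

print(mlb.classes_)  # the column order of Y
print(Y)             # one 0/1 indicator column per genre
```

Each row of `Y` can then be used as a multi-label target for a classifier.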

I saw an example using MultiOutputClassifier, but I do not see the value of this wrapper, since models still work without it, without any problem.

Here is the example. You will see that line (1), which does not use MultiOutputClassifier, gives results similar to line (2), which does.

So in that case, why would anyone use MultiOutputClassifier?

from sklearn.datasets import make_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import shuffle
import numpy as np
import datetime



def time_to_sec(dt):
    # Convert a timedelta to seconds
    return dt.total_seconds()


X, y1 = make_classification(n_samples=100000, n_features=100, n_informative=30, n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
Y = np.vstack((y1, y2, y3)).T
n_samples, n_features = X.shape # 100000, 100
n_outputs = Y.shape[1] # 3
n_classes = 3


forest = RandomForestClassifier(random_state=1)
Tx = datetime.datetime.now()
forest.fit(X, Y).predict(X)  # <------------------------------- (1)
Ty = datetime.datetime.now()
Sec1 = time_to_sec(Ty - Tx)

multi_target_forest = MultiOutputClassifier(forest, n_jobs=-1)
Tx = datetime.datetime.now()
multi_target_forest.fit(X, Y).predict(X)  # <------------------ (2)
Ty = datetime.datetime.now()
Sec2 = time_to_sec(Ty - Tx)

print("Time spent for line (1) = " + str(Sec1))
print("Time spent for line (2) = " + str(Sec2))


Solution 1

Not every model supports multi-output supervised learning. MultiOutputClassifier is a convenient way to train independent single-output models, so you don't have to manually for-loop over every output and clone your base model. You also benefit from parallelization (see the n_jobs parameter).
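For example, LogisticRegression rejects a 2-D target outright, while the same estimator wrapped in MultiOutputClassifier handles it by fitting one model per column. A minimal sketch (dataset construction mirrors the question's code, just smaller):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

X, y1 = make_classification(n_samples=200, n_features=10, random_state=1)
y2 = shuffle(y1, random_state=1)
Y = np.vstack((y1, y2)).T          # two outputs, shape (200, 2)

try:
    LogisticRegression().fit(X, Y)  # a 2-D y is rejected
except ValueError as e:
    print("plain LogisticRegression:", e)

# The wrapper clones the base estimator once per output column
wrapped = MultiOutputClassifier(LogisticRegression()).fit(X, Y)
print(wrapped.predict(X).shape)     # one prediction column per output
```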


Note that RandomForestClassifier takes possible relationships between outputs into account by averaging the impurity reduction across all outputs when growing its trees. So it is not equivalent to MultiOutputClassifier, since it builds a single overall model.
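One way to see the structural difference: the native multi-output forest is a single ensemble whose trees consider all outputs jointly, whereas the wrapper fits a separate forest per output. A sketch inspecting the fitted estimators (variable names are mine):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

X, y1 = make_classification(n_samples=500, n_features=20, random_state=1)
y2 = shuffle(y1, random_state=1)
Y = np.vstack((y1, y2)).T                      # two outputs

# One forest: each tree predicts both outputs at once
joint = RandomForestClassifier(random_state=1).fit(X, Y)

# Two independent forests, one per output column
per_output = MultiOutputClassifier(
    RandomForestClassifier(random_state=1)
).fit(X, Y)

print(len(joint.estimators_))       # trees in the single joint forest
print(len(per_output.estimators_))  # fitted base forests: one per output
```

Both objects return predictions of the same shape, but the models behind them differ, which is why timings and results in the question's benchmark need not match exactly.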

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Sanjar Adilov