Is it possible to automatically infer the class_weight from flow_from_directory in Keras?

I have an imbalanced multi-class dataset and I want to use the class_weight argument from fit_generator to give weights to the classes according to the number of images of each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?



Solution 1:[1]

Just figured out a way of achieving this.

from collections import Counter
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)

train_generator.classes holds the class index of each image, so Counter(train_generator.classes) counts the number of images in each class.

Note that these weights may not be ideal for convergence, but you can use them as a starting point for other occurrence-based weighting schemes.
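To make the counting step concrete, here is a minimal sketch on a small hand-written label list. The `labels` list is a hypothetical stand-in for `train_generator.classes`; everything else follows the snippet above.

```python
from collections import Counter

# Hypothetical stand-in for train_generator.classes: one class index per image.
labels = [0, 0, 0, 0, 1, 1, 2]

counter = Counter(labels)                    # Counter({0: 4, 1: 2, 2: 1})
max_val = float(max(counter.values()))       # 4.0, the size of the largest class
class_weights = {class_id: max_val / num_images
                 for class_id, num_images in counter.items()}

print(class_weights)  # {0: 1.0, 1: 2.0, 2: 4.0}
```

The largest class gets weight 1.0 and every smaller class gets a proportionally larger weight.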

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868

Solution 2:[2]

Alternatively, you can simply do:

from sklearn.utils import class_weight
import numpy as np

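# Keyword arguments are used here because recent scikit-learn releases
# no longer accept these parameters positionally.
class_weights = class_weight.compute_class_weight(
                class_weight='balanced',
                classes=np.unique(train_generator.classes),
                y=train_generator.classes)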

You can then set (as per comment above):

model.fit_generator(..., class_weight=class_weights)

Solution 3:[3]

I tried both solutions, and the sklearn.utils.class_weight one gives better accuracy, though I am not sure why. The two do not yield the same class weights.
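One possible reading of the difference, offered as a sketch rather than a diagnosis: both schemes are inversely proportional to the class counts, so they should differ only by a constant factor. The code below uses a hypothetical `labels` list and reimplements the 'balanced' heuristic from its documented formula, n_samples / (n_classes * class_count), instead of calling scikit-learn.

```python
from collections import Counter

labels = [0, 0, 0, 0, 1, 1, 2]   # hypothetical class indices
counter = Counter(labels)
n_samples = len(labels)
n_classes = len(counter)

# Solution 1: largest class count divided by each class count.
max_based = {c: max(counter.values()) / n for c, n in counter.items()}

# 'balanced' heuristic as documented by scikit-learn:
# n_samples / (n_classes * class_count).
balanced = {c: n_samples / (n_classes * n) for c, n in counter.items()}

# The ratio is the same for every class, so only the overall scale differs.
ratios = {c: max_based[c] / balanced[c] for c in counter}
print(ratios)
```

If that holds for your data, the relative emphasis between classes is identical and only the overall magnitude of the loss changes, which may interact with the learning rate.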

Solution 4:[4]

As suggested in the article here, a good way to assign class weights is to use:

(1 / class_count) * (total_count/2)

Thus, slightly modifying the method suggested above by Fábio Perez:

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
class_weight = {class_id : (1/num_images)*(total)/2.0 for class_id, num_images in counter.items()}
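As a worked example of this formula, the sketch below applies it to a hypothetical two-class `labels` list. The assumption here (the linked article is not reproduced in this thread) is that the division by 2 corresponds to the number of classes; `n_classes` is an illustrative generalization, not part of the original answer.

```python
from collections import Counter

labels = [0, 0, 0, 0, 0, 0, 1, 1]   # hypothetical two-class labels
counter = Counter(labels)
total = float(sum(counter.values()))

# Formula from the answer: (1 / class_count) * (total / 2)
class_weight = {c: (1 / n) * (total / 2.0) for c, n in counter.items()}

# Assumed generalization: divide by the number of classes instead of 2.
n_classes = len(counter)
generalized = {c: (1 / n) * (total / n_classes) for c, n in counter.items()}

print(class_weight)
print(generalized)
```

With two classes the two dictionaries coincide, and the weighted number of samples stays equal to `total`, so the overall loss magnitude is roughly preserved.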

Solution 5:[5]

The code suggested by Pasha Dembo works well. However, you should convert the result to a dictionary before passing it to model.fit_generator:

from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
           class_weight='balanced',
           classes=np.unique(train_generator.classes),
           y=train_generator.classes)

train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)
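For illustration, this is what the `dict(enumerate(...))` step produces. The `weights` list is a hypothetical stand-in for the array returned by `compute_class_weight`.

```python
# Hypothetical weights, in the order of the sorted class indices.
weights = [0.58, 1.17, 2.33]

# enumerate pairs each weight with its position: 0, 1, 2, ...
train_class_weights = dict(enumerate(weights))

print(train_class_weights)  # {0: 0.58, 1: 1.17, 2: 2.33}
```

This relies on the class indices being exactly 0, 1, ..., n-1 with every class present in `train_generator.classes`. If a class directory might be empty, pairing the weights with `np.unique(train_generator.classes)` via `zip` is likely the safer choice.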

Solution 6:[6]

from sklearn.utils import class_weight
import numpy as np
class_weights = dict(zip(np.unique(traingen.classes),class_weight.compute_class_weight(
                        class_weight = 'balanced',
                        classes = np.unique(traingen.classes), 
                        y = traingen.classes)))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Pasha Dembo
Solution 3 David Brown
Solution 4 Aman Agrawal
Solution 5 DCCoder
Solution 6 Soheil