Dependent hyperparameters with Keras Tuner

My goal is to tune over possible network architectures that meet the following criteria:

  1. Layer 1 can have any number of hidden units from this list: [32, 64, 128, 256, 512]

Then, the number of hidden units to explore for each subsequent layer should depend on the selection made for the layer above it. Specifically:

  1. Layer 2 can have the same or half as many units as layer 1.
  2. Layer 3 can have the same or half as many units as layer 2.
  3. Layer 4 can have the same or half as many units as layer 3.

As I am currently implementing it, the hp.Choice options for layers 2, 3 and 4 never update after they have been established for the first time.

For example, suppose that on the first pass of the tuner num_layers = 4, so all four layers get created. If layer 1 selects 256 hidden units, the options become:

Layer 2 --> [128, 256]

Layer 3 --> [64, 128]

Layer 4 --> [32, 64]

Layers 2, 3 and 4 then stay stuck with these choices on every iteration that follows, rather than adapting to whatever layer 1 selects next.

This means that in later iterations, when the number of hidden units in layer 1 changes, the options for layers 2, 3 and 4 no longer satisfy the intended constraint that each subsequent layer contains either the same number of hidden units as the previous layer or half as many.
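To make the search space I am after concrete, here is a small standalone sketch (just an illustration, not part of my tuner code) that enumerates every architecture the tuner should be able to reach:

from itertools import product

def enumerate_architectures(first_layer_options=(32, 64, 128, 256, 512), max_layers=4):
    architectures = []
    for first in first_layer_options:
        for depth in range(1, max_layers + 1):
            # each subsequent layer either halves (0) or keeps (1) the previous width
            for halve_or_keep in product((0, 1), repeat=depth - 1):
                arch = [first]
                for keep in halve_or_keep:
                    arch.append(arch[-1] if keep else arch[-1] // 2)
                architectures.append(arch)
    return architectures

print(len(enumerate_architectures()))   # 75 candidate architectures
print(enumerate_architectures()[:3])    # [[32], [32, 16], [32, 32]]

My current implementation: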

import tensorflow as tf
from tensorflow.keras import layers


def build_and_tune_model(hp, train_ds, normalize_features, ohe_features, max_tokens, passthrough_features):
    
    all_inputs, encoded_features = get_all_preprocessing_layers(train_ds,
                                                            normalize_features=normalize_features,
                                                            ohe_features=ohe_features,
                                                            max_tokens=max_tokens,
                                                            passthrough=passthrough_features)

    
    
    # Possible values for the number of hidden units in layer 1.
    # Defining here because we will always have at least 1 layer.
    layer_1_hidden_units = hp.Choice('layer1_hidden_units', values=[32, 64, 128, 256, 512])

    # Possible number of layers to include
    num_layers = hp.Choice('num_layers', values=[1, 2, 3, 4])
    
    print("================= starting new round =====================")
    print(f"Layer 1 hidden units = {hp.get('layer1_hidden_units')}")
    print(f"Num layers is {hp.get('num_layers')}")
    
    
    all_features = layers.concatenate(encoded_features)
    
    x = layers.Dense(layer_1_hidden_units,
                     activation="relu")(all_features)

    
    if hp.get('num_layers') >= 2:
        
        with hp.conditional_scope("num_layers", [2, 3, 4]):
            
            # Layer 2 hidden units can either be half the layer 1 hidden units or the same.
            layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)),
                                                                            hp.get('layer1_hidden_units')])

            
            print("\n==========================================================")
            print(f"In layer 2")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_2_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 3:
        
        with hp.conditional_scope("num_layers", [3, 4]):
        
            # Layer 3 hidden units can either be half the layer 2 hidden units or the same.
            layer_3_hidden_units = hp.Choice('layer3_hidden_units', values=[(int(hp.get('layer2_hidden_units') / 2)),
                                                                            hp.get('layer2_hidden_units')])


            print("\n==========================================================")
            print(f"In layer 3")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_3_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 4:
        
        with hp.conditional_scope("num_layers", [4]):
        
            # Layer 4 hidden units can either be half the layer 3 hidden units or the same.
            # Extra stipulation applied here, layer 4 hidden units can never be less than 8.
            layer_4_hidden_units = hp.Choice('layer4_hidden_units', values=[max(int(hp.get('layer3_hidden_units') / 2), 8),
                                                                            hp.get('layer3_hidden_units')])


            print("\n==========================================================")
            print(f"In layer 4")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print(f"layer_4_hidden_units = {hp.get('layer4_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_4_hidden_units,
                             activation="relu")(x)

    
    output = layers.Dense(1, activation='sigmoid')(x)
    
    model = tf.keras.Model(all_inputs, output)
    
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  metrics = ['accuracy'],
                  loss='binary_crossentropy')
    
    print(">>>>>>>>>>>>>>>>>>>>>>>>>>>> End of round <<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    
    return model

Does anyone know the correct way to tell Keras Tuner to explore all possible options for each layer's hidden units, where the search space satisfies the criteria that each layer after the first is allowed to have the same number of hidden units as the previous layer or half as many, and the first layer can have a number of hidden units from the list [32, 64, 128, 256, 512]?



Solution 1:[1]

To answer this, we first need to understand how hyperparameters and their values are selected before control ever reaches our code. Keras Tuner first selects all the active hyperparameters from the hyperparameter space; a hyperparameter is active when its associated condition is satisfied (by default, hyperparameters have no condition attached). The tuner then generates a value for each active hyperparameter from that hyperparameter's list of possible values. In other words, the selection of hyperparameters and their values is already finished before control reaches our application; inside the build function the code merely pulls the already generated values. That is why the hyperparameters never appear to update once they have been established for the first time.
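You can see the same behaviour in isolation with a bare HyperParameters container. Below is a minimal sketch (the names are purely illustrative), assuming recent keras_tuner behaviour where a values list is only honoured the first time a name is registered:

import keras_tuner as kt

hp = kt.HyperParameters()

# First registration: the values list is recorded and a value is returned.
first = hp.Choice('layer1_hidden_units', values=[32, 64, 128, 256, 512])
second = hp.Choice('layer2_hidden_units', values=[first // 2, first])

# Re-registering the same name with a different values list does not replace the
# existing hyperparameter; the stored value is simply returned again.
again = hp.Choice('layer2_hidden_units', values=[8, 16])
assert again == second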

In your case, consider a scenario: say that in the first trial the tuner generates 256 as the unit count for the first layer. The code below then creates a hyperparameter 'layer2_hidden_units' for the second layer with the possible set of values [128, 256]:

layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)),  hp.get('layer1_hidden_units')])

In the second trial, before control reaches your application, the tuner has already drawn a value from the list [128, 256], say 128. The value of the hyperparameter 'layer2_hidden_units' is therefore 128, and your application just pulls that already generated value, regardless of what layer 1 selected this time.

The solution to your problem is to generate the hyperparameter names dynamically, like below:

hidden_units = hp.Choice('units_layer_' + str(layer_index), values=[(int(hp.get('layer1_hidden_units') / 2)), hp.get('layer1_hidden_units')])

# where 
# hp.get('layer1_hidden_units') = 256 and layer_index = 2
# or hp.get('layer1_hidden_units') = 128 and layer_index = 1
# and so on...

Now take the scenario we already discussed, where Keras Tuner selected 256 as the unit count for the first layer in the first trial. In that same trial, the code above lets the tuner register hyperparameters for the remaining layers as units_layer_2 = [128, 256], units_layer_1 = [64, 128] and units_layer_0 = [32, 64].

But now we face a second challenge: all of these hyperparameters stay active in forthcoming trials even when some of them are no longer required. For example, if in the second trial the selected unit count for the first layer is 64, the tuner will still activate units_layer_2 = [128, 256] and units_layer_1 = [64, 128]. That means we now need to disable them by putting them under a conditional scope, as below:

with hp.conditional_scope(parent_units_name, parent_units_value):
   hidden_units = hp.Choice(child_units_name, values=child_units_value)

The final code looks like this:

# List possible units
possible_units = [32, 64, 128, 256, 512]

possible_layer_units = []
for index, item in enumerate(possible_units[:-1]):
    possible_layer_units.append([item, possible_units[index + 1]])

# possible_layer_units = [[32, 64], [64, 128], [128, 256], [256, 512]] 
# where list index represent layer number 
# and list element represent list of unit possibilities for each layer

first_layer_units = hp.Choice('first_layer_units', values=possible_units)

# Then add first layer
all_features = layers.concatenate(encoded_features)  
x = layers.Dense(first_layer_units, activation="relu")(all_features)

# Get the number of hidden layers based on first layer unit count
hidden_layer_count = possible_units.index(first_layer_units)
if hidden_layer_count > 0:
    iter_count = 0
    for hidden_layer_index in range(hidden_layer_count - 1, -1, -1):
        if iter_count == 0:
            # Collect HP 'units' details for the second layer
            # Suppose first_layer_units = 512, then
            # HP example: <units_layer_43=[256, 512] condition={first_layer_units:[256, 512]}>
            # where for units_layer_43, 4 indicates there will be total 5 layers and 3 indicates 4th layer from last
            # we are using total hidden layer count in HP name to avoid an issue while getting the unit count value.
            parent_units_name = 'first_layer_units'
            parent_units_value = possible_layer_units[hidden_layer_index]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = parent_units_value
        else:
            # Collect HP 'units' details for the next layers
            # Suppose units_layer_43 = 256, then
            # HP example: <units_layer_42=[128, 256] condition={units_layer_43:[256, 512]}>
            parent_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index + 1)
            parent_units_value = possible_layer_units[hidden_layer_index + 1]
            child_units_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_units_value = possible_layer_units[hidden_layer_index]

        # Add and Activate child HP under parent HP using conditional scope
        with hp.conditional_scope(parent_units_name, parent_units_value):
            hidden_units = hp.Choice(child_units_name, values=child_units_value)
            
        # Add remaining NN layers one by one
        x = layers.Dense(hidden_units, activation="relu")(x)

        iter_count += 1

This way, only the hyperparameters whose associated condition is satisfied get activated. In our case, if the selected unit count for the first layer in the second trial is 64, the hyperparameters 'units_layer_2' and 'units_layer_1' are disabled by the conditional scope and only 'units_layer_0' is kept active.
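For completeness, here is a sketch of how a build function like this might be wired into a tuner. The wrapper, search budget, directory and dataset names below are assumptions for illustration, not something taken from the question:

import functools
import keras_tuner as kt

# Bind the extra arguments so the tuner sees the usual single-argument build(hp) signature.
build_fn = functools.partial(
    build_and_tune_model,                # the build function from the question
    train_ds=train_ds,                   # assumed to exist in the surrounding script
    normalize_features=normalize_features,
    ohe_features=ohe_features,
    max_tokens=max_tokens,
    passthrough_features=passthrough_features,
)

tuner = kt.RandomSearch(
    build_fn,
    objective='val_accuracy',            # the model is compiled with an accuracy metric
    max_trials=30,                       # illustrative budget
    directory='tuning',                  # illustrative paths
    project_name='layer_width_search',
)

tuner.search(train_ds, validation_data=val_ds, epochs=10)  # val_ds assumed to exist
best_model = tuner.get_best_models(num_models=1)[0]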

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1