How to properly use tf.metrics.accuracy?
I am having trouble using the accuracy function from tf.metrics for a multi-class classification problem with logits as input.
My model output looks like:
logits = [[0.1, 0.5, 0.4],
          [0.8, 0.1, 0.1],
          [0.6, 0.3, 0.2]]
And my labels are one-hot encoded vectors:
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]
When I try something like tf.metrics.accuracy(labels, logits), it never gives the correct result. I am obviously doing something wrong, but I can't figure out what it is.
Solution 1:[1]
TL;DR
The accuracy function tf.metrics.accuracy calculates how often predictions match labels based on two local variables it creates, total and count, which are used to compute the frequency with which the predictions match the labels.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
                                  predictions=tf.argmax(logits, 1))
print(sess.run([acc, acc_op]))
print(sess.run([acc]))
# Output
#[0.0, 0.66666669]
#[0.66666669]
- acc (accuracy): simply returns the current metric value using total and count; it doesn't update the metric.
- acc_op (update op): updates the metric.
To understand why acc returns 0.0, go through the details below.
Details using a simple example:
logits = tf.placeholder(tf.int64, [2,3])
labels = tf.Variable([[0, 1, 0], [1, 0, 1]])
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
                                  predictions=tf.argmax(logits, 1))
Initialize the variables:
Since metrics.accuracy creates two local variables total and count, we need to call local_variables_initializer() to initialize them.
sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
#[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
# <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
Understanding update ops and accuracy calculation:
print('acc:',sess.run(acc, {logits:[[0,1,0],[1,0,1]]}))
#acc: 0.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [0.0, 0.0]
The above returns 0.0 for accuracy because total and count are still zero, in spite of the matching inputs.
print('ops:', sess.run(acc_op, {logits:[[0,1,0],[1,0,1]]}))
#ops: 1.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [2.0, 2.0]
With the new inputs, the accuracy is calculated when the update op is called. Note: since all the logits and labels match, we get an accuracy of 1.0, and the local variables total and count give the total number of correct predictions and the total number of comparisons made, respectively.
Now we call accuracy with the new inputs (not the update ops):
print('acc:', sess.run(acc,{logits:[[1,0,0],[0,1,0]]}))
#acc: 1.0
The accuracy call doesn't update the metric with the new inputs; it just returns the value computed from the two local variables. Note: the logits and labels don't match in this case. Now calling the update op again:
print('op:',sess.run(acc_op,{logits:[[0,1,0],[0,1,0]]}))
#op: 0.75
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [3.0, 4.0]
The metric is updated with the new inputs: of the 4 comparisons made so far, 3 predictions were correct, so total/count = 3/4 = 0.75.
More information on how to use the metrics during training and how to reset them during validation can be found here.
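As a minimal sketch of the reset idea (my addition, not from the linked page): because total and count are ordinary local variables, re-running their initializer zeroes the stream.
# Reset the streaming metric (e.g. between training and validation) by
# re-initializing its local variables; stream_vars was collected above.
sess.run(tf.variables_initializer(stream_vars))
print('[total, count]:', sess.run(stream_vars))
#[total, count]: [0.0, 0.0]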
Solution 2:[2]
In TF 2.0, if you are using the tf.keras API, you can define a custom class myAccuracy which inherits from tf.keras.metrics.Accuracy and overrides the update_state method like this:
import tensorflow as tf

class myAccuracy(tf.keras.metrics.Accuracy):
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert one-hot labels and logits to class indices before
        # delegating to the stock Accuracy metric.
        y_true = tf.argmax(y_true, 1)
        y_pred = tf.argmax(y_pred, 1)
        return super(myAccuracy, self).update_state(y_true, y_pred, sample_weight)
Then, when compiling the model you can add metrics in the usual way.
from my_awesome_models import discriminador

discriminador.compile(tf.keras.optimizers.Adam(),
                      loss=tf.nn.softmax_cross_entropy_with_logits,
                      metrics=[myAccuracy()])

from my_puzzling_datasets import train_dataset, test_dataset

discriminador.fit(train_dataset.shuffle(70000).repeat().batch(1000),
                  epochs=1, steps_per_epoch=1,
                  validation_data=test_dataset.shuffle(70000).batch(1000),
                  validation_steps=1)
# Train for 1 steps, validate for 1 steps
# 1/1 [==============================] - 3s 3s/step - loss: 0.1502 - accuracy: 0.9490 - val_loss: 0.1374 - val_accuracy: 0.9550
Or evaluate your model over the whole dataset:
discriminador.evaluate(test_dataset.batch(TST_DSET_LENGTH))
#> [0.131587415933609, 0.95354694]
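As a quick sanity check (my addition, not part of the original answer), the custom metric reproduces the expected 2/3 accuracy on the toy logits and labels from the question:
m = myAccuracy()
# argmax of the one-hot labels is [1, 0, 2]; argmax of the logits is
# [1, 0, 0], so 2 of the 3 predictions match.
m.update_state([[0, 1, 0], [1, 0, 0], [0, 0, 1]],
               [[0.1, 0.5, 0.4], [0.8, 0.1, 0.1], [0.6, 0.3, 0.2]])
print(m.result().numpy())
#> 0.6666667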
Solution 3:[3]
Applied to a CNN, you can write:
x_len=24*24
y_len=2
x = tf.placeholder(tf.float32, shape=[None, x_len], name='input')
fc1 = ... # cnn's fully connected layer
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
layer_fc_dropout = tf.nn.dropout(fc1, keep_prob, name='dropout')
y_pred = tf.nn.softmax(layer_fc_dropout, name='output')  # softmax over the dropout-regularized layer
logits = tf.argmax(y_pred, axis=1)
y_true = tf.placeholder(tf.float32, shape=[None, y_len], name='y_true')
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(y_true, axis=1), predictions=tf.argmax(y_pred, 1))
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
def print_accuracy(x_data, y_data, dropout=1.0):
    accuracy = sess.run(acc_op, feed_dict={y_true: y_data, x: x_data, keep_prob: dropout})
    print('Accuracy: ', accuracy)
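A hypothetical call, assuming x_val and y_val (placeholder names of mine) hold validation images flattened to 24*24 floats and one-hot labels of length 2:
# x_val: shape [N, 576], y_val: shape [N, 2] (hypothetical arrays)
print_accuracy(x_val, y_val)
# Note: acc_op is a streaming op, so repeated calls accumulate across
# evaluations unless the local variables are re-initialized (see Solution 1).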
Solution 4:[4]
Extending the answer to TF 2.0, the tutorial here explains clearly how to use the metrics for accuracy and loss: https://www.tensorflow.org/beta/tutorials/quickstart/advanced
Notice that it mentions that the metrics are reset after each epoch:
train_loss.reset_states()
train_accuracy.reset_states()
test_loss.reset_states()
test_accuracy.reset_states()
When labels and predictions are one-hot encoded:
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    train_loss(loss)
    train_accuracy(tf.argmax(labels, 1), tf.argmax(predictions, 1))
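For context, one plausible set of metric definitions consistent with the train_step above (an assumption of mine; the linked tutorial itself uses SparseCategoricalAccuracy with integer labels):
train_loss = tf.keras.metrics.Mean(name='train_loss')
# train_accuracy is fed two tensors of class indices above, so the plain
# Accuracy metric (exact index match) fits that call signature.
train_accuracy = tf.keras.metrics.Accuracy(name='train_accuracy')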
Solution 5:[5]
Here is how I use it:
test_accuracy = tf.keras.metrics.Accuracy()

# Use the Dataset API or a normal dataset built from lists / NumPy arrays
ds_test_batch = zip(x_test, y_test)

predicted_classes = np.array([])
for (x, y) in ds_test_batch:
    # training=False is needed only if there are layers with different
    # behaviour during training versus inference (e.g. Dropout).
    # Adjust the input shape to match your input during training.
    logits = model(x.reshape(1, -1), training=False)
    prediction = tf.argmax(logits, axis=1, output_type=tf.int64)
    predicted_classes = np.concatenate([predicted_classes, prediction.numpy()])
    test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Javier JC |
| Solution 3 | Tobias Ernst |
| Solution 4 | yuva-rajulu |
| Solution 5 | Abdalrahman M. Amer |
