How can I increase the speed of my DQN neural network with TensorFlow 2.7?

My computer configuration:
GPU: RTX 970, 4 GB memory
OS: Windows 10 Professional
CPU: Intel Xeon E5-2680 v2 × 2
RAM: 128 GB

I use TensorFlow 2.7 to create a DQN neural network to solve GYM's MountainCar problem.
Source code is here:

I used debug mode and found that self.target_net.predict() and self.evaluate_net.predict() each take about 70 ms per call.

%timeit qs = self.evaluate_net.predict(observation[np.newaxis])
75.6 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

However, when I debug build_network() and call model.summary(), the model has only 4,547 parameters in total. In another project I built a CNN with 1.6M parameters across 9 layers, and a single forward pass took only 8 ms. Why does this much smaller network run so slowly?
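For tiny models like this, Model.predict() is dominated by fixed per-call overhead (it builds a tf.data pipeline and dispatches a full prediction loop on every call), not by the network itself. A minimal sketch, using a stand-in MLP with the same 2-input / 64 / 64 / 3-output shape as the summary above (names and sizes are assumptions for illustration):

```python
import numpy as np
import tensorflow as tf

# Stand-in MLP with the same 4,547-parameter shape as in the question.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3),
])

obs = np.zeros((1, 2), dtype=np.float32)

# Slow path: predict() sets up a prediction loop on every call.
qs_predict = model.predict(obs, verbose=0)

# Fast path: call the model directly and convert the tensor to NumPy.
qs_direct = model(obs).numpy()

print(qs_predict.shape, qs_direct.shape)  # (1, 3) (1, 3)
```

Both paths compute the same forward pass; for single observations in a tight RL loop, the direct call avoids predict()'s setup cost entirely.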

model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_3 (Dense)             (None, 64)                192

 dense_4 (Dense)             (None, 64)                4160

 dense_5 (Dense)             (None, 3)                 195

=================================================================
Total params: 4,547
Trainable params: 4,547
Non-trainable params: 0
_________________________________________________________________

The difference between the two programs is that the slow one prints the following message at startup. Could that be the cause? Answers on Stack Overflow say TensorFlow needs to be recompiled from source to enable these instructions:

I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

I uninstalled TensorFlow and, following the Intel® Optimization for TensorFlow* Installation Guide, installed intel_tensorflow 2.7. It ran even slower, and with gpus = tf.config.experimental.list_physical_devices(device_type='GPU') the GPU could no longer be found. The message above still appeared.

I also tried setting TF_ENABLE_ONEDNN_OPTS=1, but it made no difference.
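One thing worth checking: TensorFlow reads this environment variable at import time, so setting it after TensorFlow has already been imported has no effect. A minimal sketch of the required ordering (the GPU listing line is just a sanity check):

```python
import os

# Must be set BEFORE TensorFlow is first imported in the process,
# otherwise the flag is ignored.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf  # noqa: E402  (import deliberately after the env var)

print(tf.config.list_physical_devices("GPU"))
```

Setting the variable in the shell before launching Python (e.g. `set TF_ENABLE_ONEDNN_OPTS=1` on Windows) achieves the same thing.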



Solution 1:[1]

I found a way to speed things up somewhat: calling the model directly instead of using predict(). After changing the two DQNAgent methods below, the time dropped from 70 seconds per episode to 35 seconds, doubling the speed.

The modified code is as follows:


class DQNAgent:

    def learn(self, observation, action, reward, next_observation, done):
        self.replayer.store(observation, action, reward, next_observation, done)  # store experience

        # experience replay: sample a batch and build TD targets
        observations, actions, rewards, next_observations, dones = self.replayer.sample(self.batch_size)
        # call the network directly instead of predict() to avoid per-call overhead
        next_qs = self.target_net(next_observations)
        next_max_qs = next_qs.numpy().max(axis=-1)
        us = rewards + self.gamma * (1. - dones) * next_max_qs
        targets = self.evaluate_net(observations).numpy()
        targets[np.arange(us.shape[0]), actions] = us
        self.evaluate_net.fit(observations, targets, verbose=0)

        if done:  # sync target network at episode end
            self.target_net.set_weights(self.evaluate_net.get_weights())
        return

    def decide(self, observation):  # epsilon-greedy strategy
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.action_n)
        # direct call is much faster than predict() for a single observation
        qs = self.evaluate_net(observation[np.newaxis])
        return np.argmax(qs)

Now I just need to optimize this code:

self.evaluate_net.fit(observations, targets, verbose=0)
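fit() carries the same kind of per-call overhead as predict(). One option (a sketch, not the author's code; model shape and hyperparameters are assumptions) is to replace it with a custom training step compiled with tf.function, which is traced once and then runs as a graph:

```python
import numpy as np
import tensorflow as tf

# Stand-in for evaluate_net, same shape as the network in the question.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function  # traced once, then executed as a compiled graph
def train_step(observations, targets):
    with tf.GradientTape() as tape:
        predictions = model(observations, training=True)
        loss = loss_fn(targets, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# One gradient step on a random batch, in place of model.fit(..., verbose=0).
obs = np.random.rand(64, 2).astype(np.float32)
tgt = np.random.rand(64, 3).astype(np.float32)
loss = train_step(obs, tgt)
```

Inside learn(), `self.evaluate_net.fit(observations, targets, verbose=0)` would become a single `train_step(observations, targets)` call.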

At one point I thought there was something wrong with my algorithm, but I later found that the following code does solve the MountainCar problem:

https://github.com/DanielPalaio/MountainCar-v0_DeepRL

Comparing his algorithm with mine, the main difference is the hidden layer sizes: his are [256, 256] while mine are [64, 64]. After I changed my hidden layers to [1024, 1024], my code also solved the MountainCar problem.

In addition, his code uses model.train_on_batch() while mine uses model.fit(); running train_on_batch() on my machine raised an error, so I had to fall back to fit().
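For reference, train_on_batch is a method of a compiled Keras Model, not of the tf module, so it fails if the model was never compiled; that is one plausible source of the error above (an assumption, since the original traceback is not shown). A minimal working sketch:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(3),
])
# train_on_batch requires a compiled model (optimizer + loss).
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(64, 2).astype(np.float32)
y = np.random.rand(64, 3).astype(np.float32)

# Single gradient step with none of fit()'s epoch/callback machinery.
loss = model.train_on_batch(x, y)
```

Because it skips fit()'s setup work, train_on_batch is usually the better match for one-batch-at-a-time RL training loops.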

I find plain DQN very slow to train; DoubleDQN and DuelingDQN are indeed faster.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
