Randomly changing for loop values

I've been working on a deep Q-learning snake game in my free time, with plans to add genetic algorithm components to it. To that end, I was setting up loops that would let me create a population of snakes, each of which runs for some number of episodes per generation, over some number of generations.

It should be simple. Just some nested for loops. Only, I've been getting some pretty wild results from my for loops.

This is the code in question:

import numpy as np

def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0

    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()

    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2

            # Set agent number
            game.agent_num = agent_num

            for cur_episode in range(1, max_episodes+1):
                # Get old state
                state_old = agent.get_state(game)

                # Get move
                final_move = agent.get_action(state_old)

                # Perform move and get new state
                reward, done, score = game.play_step(final_move)
                state_new = agent.get_state(game)

                # Train short memory
                agent.train_short_memory(state_old, final_move, reward, state_new, done)

                # Remember
                agent.remember(state_old, final_move, reward, state_new, done)

                # Snake died
                if done:
                    # Train long memory, plot result
                    game.reset()
                    agent.episode = cur_episode
                    game.agent_episode = cur_episode
                    agent.train_long_memory()

                    if score > game.top_score:
                        game.top_score = score
                        agent.model.save()

                    total_score += score
                    game.mean_score = np.round((total_score / cur_episode), 3)
                    
                    print(f"Agent{game.agent_num}")
                    print(f"Episode: {cur_episode}")
                    print(f"Generation: {cur_gen}")
                    print(f"Score: {score}")
                    print(f"Top Score: {game.top_score}")
                    print(f"Mean: {game.mean_score}\n")

And this is the output it gives:

Agent0
Episode: 3
Generation: 7
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 3
Generation: 14
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 7
Generation: 20
Score: 1
Top Score: 1
Mean: 0.143

Agent0
Episode: 10
Generation: 26
Score: 0
Top Score: 1
Mean: 0.1

Agent0
Episode: 6
Generation: 28
Score: 1
Top Score: 1
Mean: 0.333

Agent0
Episode: 5
Generation: 37
Score: 0
Top Score: 1
Mean: 0.4

Agent0
Episode: 3
Generation: 43
Score: 0
Top Score: 1
Mean: 0.667

Agent0
Episode: 1
Generation: 45
Score: 1
Top Score: 1
Mean: 3.0

Agent0
Episode: 2
Generation: 49
Score: 0
Top Score: 1
Mean: 1.5

The generation number steadily ticks up every second until it hits 49 and ends the loop, while the episode number appears to change at random every time the snake dies. It's bizarre. I've never seen anything like this and have no idea what in my code could possibly cause it.
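
One likely explanation, assuming `game.play_step` advances the game by a single move (the post does not show it): each pass of the innermost loop plays one move, so `cur_episode` counts moves rather than finished games. With `max_episodes=10`, a generation ends after ten moves, and the print only fires on the move where the snake happens to die. The sketch below reproduces that loop shape with a hypothetical `StubGame` (not the real `SnakeGameAI`) and contrasts it with a variant that plays each episode to completion before the counter advances.

```python
import random


class StubGame:
    """Hypothetical stand-in for SnakeGameAI: the snake dies on a random move."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def play_step(self):
        # Returns True ("done") on roughly one move in eight.
        return self.rng.random() < 0.125

    def reset(self):
        pass


def step_per_iteration(max_episodes=10, max_generations=5, seed=0):
    """Same loop shape as the question: one move per inner iteration."""
    game = StubGame(seed)
    log = []
    for cur_gen in range(max_generations):
        for cur_episode in range(1, max_episodes + 1):
            if game.play_step():
                game.reset()
                # Only logged on the move where the snake died.
                log.append((cur_gen, cur_episode))
    return log


def episode_per_iteration(max_episodes=10, max_generations=5, seed=0):
    """Plays every episode until the snake dies, then advances the counter."""
    game = StubGame(seed)
    log = []
    for cur_gen in range(max_generations):
        for cur_episode in range(1, max_episodes + 1):
            done = False
            while not done:
                done = game.play_step()
            game.reset()
            log.append((cur_gen, cur_episode))
    return log


if __name__ == "__main__":
    print("one move per iteration:   ", step_per_iteration())
    print("one episode per iteration:", episode_per_iteration())
```

In the first function the logged pairs skip generations and show scattered "episode" numbers, much like the output above; in the second, every generation logs episodes 1 through 10 in order. If this is the cause, wrapping the body of the episode loop in a `while not done:` loop would be one way to restructure the original code.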



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
