'Game of the nim, minimax

There are 3 piles (1 pile - 7 matches, 2 pile - 5 matches, 3 pile - 3 matches) you can take any number of matches, but only from one pile, the one who takes the last match loses. https://en.wikipedia.org/wiki/Nim

There is a code that generates all possible moves and stores them in a dictionary ("current position" : "possible positions")

def get_moves(initialstate):
    memo = {}
    memo[initialstate] = [move for move in move_list(initialstate)]
    return memo

def move_list(state):
    for i in range(state[0]):
        yield (i, state[1], state[2])
    for i in range(state[1]):
        yield (state[0], i, state[2])
    for i in range(state[2]):
        yield (state[0], state[1], i)

print(get_moves((7,5,3)))

how to convert this dictionary into a tree and how to implement minimax?

I want something like:

function  minimax(node, depth, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node
    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, minimax(child, depth − 1, FALSE))
        return value
    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, minimax(child, depth − 1, TRUE))
        return value

I changed the heuristic function, but the computer still chooses the wrong moves, why? From state [0,1,3] there is a winning state [0,1,0], why does the computer not choose this option?

def minimax(position, depth, max_player):
    print(position, evaluate(position))
    if depth == 0:
        return evaluate(position), position
    
    if max_player:
        maxEval = float('-inf')
        best_move = None
        for move in get_moves(position):
            evaluation = minimax(move, depth-1, False)[0]
            maxEval = max(maxEval, evaluation)
            if maxEval == evaluation:
                best_move = move
        
        return maxEval, best_move
    else:
        minEval = float('inf')
        best_move = None
        for move in get_moves(position):
            evaluation = minimax(move, depth-1, True)[0]
            minEval = min(minEval, evaluation)
            if minEval == evaluation:
                best_move = move
        
        return minEval, best_move

def evaluate(position):
    if position == (0,0,0):
        return -100
    if position == (0,0,1) or position == (1,0,0) or position == (0,1,0):
        return 100
    return 1 if (position[0] ^ position[1] ^ position[2]) == 0 else 0

def get_moves(initialstate):
    memo = [move for move in move_list(initialstate)]
    return memo

def move_list(state):
    for i in range(state[0]):
        yield (i, state[1], state[2])
    for i in range(state[1]):
        yield (state[0], i, state[2])
    for i in range(state[2]):
        yield (state[0], state[1], i)
        
print(list(minimax((0,1,3),12,True)[1]))




Solution 1:[1]

This should be no different from any other implementation of minimax, e.g. for tic tac toe. For each turn you go through all possible moves (draw a stick from any of the piles) and see what the corresponding game state is (win/loss/draw). A draw in this case would be that there still are sticks left in all piles, a win is that opponent draws last stick in a pile, and a loss is that you draw the last stick in a pile.

You don't need to store it in a dict. You can have a general "board" which holds the current game state (sticks in each pile). Then in each minimax iteration you get all possible moves from the current board state (e.g. get_possible_moves(board)).

Solution 2:[2]

A few issues:

  • The minimax function will return None as value when there are no possible moves (i.e. when the state is (0,0,0)). Instead the evaluate function should be ran in that case, just like when depth == 0.

  • As you use the concept of a maximizing and minimizing player, the base case is not correct: return -100 would indicate that this is a win for the minimizing player, while in really depends who did the last move. The same goes for the next return 100.

    For this reason I would suggest to not work with the concept of maximizing/minimizing players, but consider both players as maximizing, and toggle the sign of the value that is returned.

  • In the XOR part, I would return -1 instead of 0, so that it really is the opposite value of 1.

  • As this flavour of the Nim game is the "misère" version (see Wikipedia), the base case should include any case where the largest pile has a size of 1. So, not only (1,0,0), but also (1, 1, 1) and (1, 0, 1) ... etc.

Here is a correction:

def minimax(position, depth):
    val = evaluate(position)
    # Use heuristic when depth is reached OR when no more moves possible
    if depth == 0 or val == -100:  
        return -val, None  # No move associated with this value
    
    maxEval = -1000
    best_move = None
    for move in get_moves(position):
        # Negate returned value to make it relevant for current player
        evaluation = -minimax(move, depth-1)[0]
        maxEval = max(maxEval, evaluation)
        if maxEval == evaluation:
            best_move = move
    return maxEval, best_move

def evaluate(position):
    # Evaluate the board in the view of 
    #     the player that last moved
    if max(position) <= 1:  # Largest pile size is at most 1?
        # Must leave odd number of non-empty piles to win
        return 100 if sum(position) % 2 else -100
    # Must leave a XOR sum of zero. Use 1 or -1.
    return 1 if (position[0] ^ position[1] ^ position[2]) == 0 else -1

Finally, as this "heuristic" is not really a heuristic, but a perfect evaluation of the state, I don't see the use of minimax here, as the best move will be found with a depth of 1. Passing a higher value of depth will not lead to better moves. If you want that a player will play a most-promising move when in a loosing position, you'll need to add a real heuristic, which expresses what exactly that "most-promising" means; like:

  • "choose the move for which the other player has the least possibilities to play a winning move", or
  • "choose the move which delays the loss for as long as possible", or
  • "choose a move which maximizes the number of piles that are non-empty" (reasoning that the game is easier for a human to play correctly when it has fewer non-empty piles)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 eligolf
Solution 2