'Game of the nim, minimax
There are 3 piles (1 pile - 7 matches, 2 pile - 5 matches, 3 pile - 3 matches) you can take any number of matches, but only from one pile, the one who takes the last match loses. https://en.wikipedia.org/wiki/Nim
There is a code that generates all possible moves and stores them in a dictionary ("current position" : "possible positions")
def get_moves(initialstate):
memo = {}
memo[initialstate] = [move for move in move_list(initialstate)]
return memo
def move_list(state):
for i in range(state[0]):
yield (i, state[1], state[2])
for i in range(state[1]):
yield (state[0], i, state[2])
for i in range(state[2]):
yield (state[0], state[1], i)
print(get_moves((7,5,3)))
how to convert this dictionary into a tree and how to implement minimax?
function minimax(node, depth, maximizingPlayer) is
if depth = 0 or node is a terminal node then
return the heuristic value of node
if maximizingPlayer then
value := −∞
for each child of node do
value := max(value, minimax(child, depth − 1, FALSE))
return value
else (* minimizing player *)
value := +∞
for each child of node do
value := min(value, minimax(child, depth − 1, TRUE))
return value
I changed the heuristic function, but the computer still chooses the wrong moves, why? From state [0,1,3] there is a winning state [0,1,0], why does the computer not choose this option?
def minimax(position, depth, max_player):
print(position, evaluate(position))
if depth == 0:
return evaluate(position), position
if max_player:
maxEval = float('-inf')
best_move = None
for move in get_moves(position):
evaluation = minimax(move, depth-1, False)[0]
maxEval = max(maxEval, evaluation)
if maxEval == evaluation:
best_move = move
return maxEval, best_move
else:
minEval = float('inf')
best_move = None
for move in get_moves(position):
evaluation = minimax(move, depth-1, True)[0]
minEval = min(minEval, evaluation)
if minEval == evaluation:
best_move = move
return minEval, best_move
def evaluate(position):
if position == (0,0,0):
return -100
if position == (0,0,1) or position == (1,0,0) or position == (0,1,0):
return 100
return 1 if (position[0] ^ position[1] ^ position[2]) == 0 else 0
def get_moves(initialstate):
memo = [move for move in move_list(initialstate)]
return memo
def move_list(state):
for i in range(state[0]):
yield (i, state[1], state[2])
for i in range(state[1]):
yield (state[0], i, state[2])
for i in range(state[2]):
yield (state[0], state[1], i)
print(list(minimax((0,1,3),12,True)[1]))
Solution 1:[1]
This should be no different from any other implementation of minimax, e.g. for tic tac toe. For each turn you go through all possible moves (draw a stick from any of the piles) and see what the corresponding game state is (win/loss/draw). A draw in this case would be that there still are sticks left in all piles, a win is that opponent draws last stick in a pile, and a loss is that you draw the last stick in a pile.
You don't need to store it in a dict. You can have a general "board" which holds the current game state (sticks in each pile). Then in each minimax iteration you get all possible moves from the current board state (e.g. get_possible_moves(board)).
Solution 2:[2]
A few issues:
The
minimax
function will returnNone
as value when there are no possible moves (i.e. when the state is(0,0,0)
). Instead theevaluate
function should be ran in that case, just like whendepth == 0
.As you use the concept of a maximizing and minimizing player, the base case is not correct:
return -100
would indicate that this is a win for the minimizing player, while in really depends who did the last move. The same goes for the nextreturn 100
.For this reason I would suggest to not work with the concept of maximizing/minimizing players, but consider both players as maximizing, and toggle the sign of the value that is returned.
In the XOR part, I would return -1 instead of 0, so that it really is the opposite value of 1.
As this flavour of the Nim game is the "misère" version (see Wikipedia), the base case should include any case where the largest pile has a size of 1. So, not only
(1,0,0)
, but also(1, 1, 1)
and(1, 0, 1)
... etc.
Here is a correction:
def minimax(position, depth):
val = evaluate(position)
# Use heuristic when depth is reached OR when no more moves possible
if depth == 0 or val == -100:
return -val, None # No move associated with this value
maxEval = -1000
best_move = None
for move in get_moves(position):
# Negate returned value to make it relevant for current player
evaluation = -minimax(move, depth-1)[0]
maxEval = max(maxEval, evaluation)
if maxEval == evaluation:
best_move = move
return maxEval, best_move
def evaluate(position):
# Evaluate the board in the view of
# the player that last moved
if max(position) <= 1: # Largest pile size is at most 1?
# Must leave odd number of non-empty piles to win
return 100 if sum(position) % 2 else -100
# Must leave a XOR sum of zero. Use 1 or -1.
return 1 if (position[0] ^ position[1] ^ position[2]) == 0 else -1
Finally, as this "heuristic" is not really a heuristic, but a perfect evaluation of the state, I don't see the use of minimax here, as the best move will be found with a depth of 1. Passing a higher value of depth
will not lead to better moves. If you want that a player will play a most-promising move when in a loosing position, you'll need to add a real heuristic, which expresses what exactly that "most-promising" means; like:
- "choose the move for which the other player has the least possibilities to play a winning move", or
- "choose the move which delays the loss for as long as possible", or
- "choose a move which maximizes the number of piles that are non-empty" (reasoning that the game is easier for a human to play correctly when it has fewer non-empty piles)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | eligolf |
Solution 2 |