$30
KTH, DD2380 ARTIFICIAL INTELLIGENCE
A2: Games
Deadline
For grade A you must have your implementations accepted by Kattis with the required
scores by Oct 1 2018, 20:00 sharp and present in one of the "in-time" slots by Oct 2 2018,
12:00. Any assignment accepted by Kattis/presented after these deadlines will result in
max. grade B.
1 INTRODUCTION TO GAME THEORY
Game theory studies systems where two or more agents try to maximize their own gain by
operating on a shared stated. In this assignment, we are interested in a special class of such
problems, namely one where two players are in direct conflict (which are commonly called
zero-sum games).
To put it formally, a game consists of the following parts:
P = {A,B} is the list of players (which will always be the same in our case because we
will only analyze two-player games).
S is the list of possible game states.
s0 is the initial state, the game always starts in the initial state.
µ : P × S → P (S) (where P (S) is the set of all subsets of S) is a function that given a
player and a game state returns the possible game states that the player may achieve
with one legal move by the current player.
γ : P × S → R is a utility function that given a player and a state says how “useful” the
state is for the player.
A game is called a zero-sum game if γ(A,s)+γ(B,s) = 0 or equivalently γ(A,s) = −γ(B,s).
Informally, each component of the specification has a role in structuring the gameplay. More
concretely we have:
1
Figure 1.1: Initial game state for checkers.
Not all states in S need to be reachable by a legal sequence of moves.
µ is the part of the specification that dictates the rules of the game (by dictating which
are the legal moves).
γ is the part of the specification that drives the game forward, and it is not unique (any
rescaling of γ is also a utility function).
We say that the game is over when it reaches a state st such that µ(p,s) = ; and player p is
about to play. In this case s is called a terminal state, and we say that:
Player A wins if γ(A,s) γ(B,s).
Player B wins if γ(B,s) γ(A,s).
The game is tied if γ(A,s) = γ(B,s) = 0.
Hence the objective of player A is to reach a terminal state st that maximizes γ(A,st) and
since γ(B,s) = −γ(A,s), the objective of player B is to reach a terminal state s
0
t
that minimizes
γ(A,s
0
t
).
EXAMPLE: CHECKERS
• S is any 8×8 checker board where only the black squares are occupied, and there are at
most 12 checkers of each color.
• s0 is the board with 12 checkers of each color positioned in opposite sides of the board,
as shown in figure 1.1.
• µ assigns to a game state (s, A) the set µ(s, A) comprising the states s
0
that can be obtained from s by moving a white checker (or a red checker in case of B).
Describe the possible states, initial state, transition function.
2
Describe the terminal states of both checkers and tic-tac-toe.
1.1 HEURISTIC FUNCTIONS
Note that in the previous example we did not describe any utility function. This is because
utility functions are used as a theoretical device and intuitively they describe how a perfect
player would play (at each step the perfect player would choose the next state with the highest utility). It is natural to assume that such a function is difficult to obtain, so instead one
generally uses heuristic functions.
Simply put, a heuristic function is one that approximates a utility function. Usually the process for obtaining such a function is by analyzing a simpler problem, so that the resulting
function is easier to compute. With that said, a rather simple heuristic function for checkers
could be given by:
ν(A,s) = #{white checkers}−#{red checkers}
This function ν belongs to a class of heuristic functions called evaluation functions meaning
that it can be efficiently computed using only the game state1
.
Why is ν(A,s) = #{white checkers}−#{red checkers} a valid heuristic function for checkers
(knowing that A plays white and B plays red)?
When does ν best approximate the utility function, and why?
Can you provide an example of a state s where ν(A,s) 0 and B wins in the following
turn? (Hint: recall the rules for jumping in checkers)
1.2 GAME TREES AND MORE HEURISTICS
From the questions in the previous section (specifically the last one) it should be evident that
a naive heuristic function is, in general not enough to guarantee a victory against a smart (and
sometimes, even random) player. This shortcoming is shared by most evaluation functions.
In this section we introduce the minimax algorithm that, given a naive heuristic function produces a heuristic function that better fits a utility function. To this end we start by introducing
the following heuristic function:
η(A,s) = |w(s, A)| −|l(s, A)| η(B,s) = |w(s,B)| −|l(s,B)|
1Note that given any heuristic function it would be, theoretically possible to create a hash-table associating each
possible state with its utility function, and this would constitute an evaluation function.
3
s0 depth = 0
depth = 1
depth = 2
depth = 3
A
B
A
Figure 1.2: Each node corresponds to a game state, The blue nodes have even depth, the red
nodes have odd depth, and there is an edge from a blue node to a red node (with
corresponding states sb,sr
, respectively) if in state sb, player A can perform a move
that will change the state into sr and there is an edge from a red node to a blue
node (with corresponding states s
0
r
,s
0
b
, respectively) if in state s
0
r
, player B can perform a move that transforms state into s
0
b
.
Where w(s,p) is the set of terminal states at which player p wins and which are reachable
from state s, and l(s,p) is the set of terminal states at which player p loses and which are
reachable from state s (and | · | denotes the list length).
It is easy to see that η satisfies η(A,s) + η(B,s) = 0, and for a terminal state st η(A,st) 0 if
player A wins, η(A,st) < 0 if player B wins, and η(A,st) = 0 if it is a draw. Also, note that η is
not an evaluation function, as it’s computation requires one to filter through valid sequences
of moves.
Now, for simplicity assume that player A always starts the game. In order to calculate η(A,s)
it is convenient to represent the game as a game tree, GT = (V,E) with a set of nodes V , each
of which corresponds to a game state2
and a set of edges E ⊆ V ×V satisfying: Given s,s
0 ∈ V ,
the edge (s,s
0
) (also denoted s → s
0
) is in E if and only if s0 ∈ µ(A,s) and depth(s) is even, or
s
0 ∈ µ(B,s) and depth(s) is odd (see illustration in figure 1.2).
A game tree can be used to search for a winning strategy at each point of the game. Particularly, the complete game tree is a game tree with s0 as root and where every possible move is
expanded until it reaches a terminal state. We can use game trees to solve games, i.e., to find
a sequence of moves that one of the players can follow to guarantee a win. Thus, if we have
access to the complete game tree, we can calculate η(A,s) by checking all terminal states that
are reachable from s by a path of strictly increasing depth.
Will η suffer from the same problem (referred to in the last question) as the evaluation
function ν?
2Note that there may be more than one node with the same state.
4
Figure 1.3: Consider this chessboard setup, with black to play. While playing white an algorithm that only uses a heuristic function such as η may treat the black queen
moving horizontally to a4 in the same way as moving diagonally to a6 while any
player would be able to see that moving the queen to a4 would result in the loss
of the queen, and most likely the loss of any chance of mate by the black player,
whereas moving the queen diagonally would result in mate in two turns.
1.3 MINIMAX ALGORITHM
Now even though η represents a step in the right direction as it takes information about how
the game might proceed from every point, it still has a large drawback. It assumes player
B plays randomly, and assigns the same weight to moves of player B independent of their
outcome (see Figure 1.3 for an example).
In the specific case of η, consider for instance η(A,s) = 10 for a given state s. This just means
that there are 10 more reachable terminal states that result in a victory for player A than for
player B, however it tells nothing of the distribution of such states, and how much B can do
to counteract those moves.
The way around this problem is assuming that the opponent is also aware that it has to optimize it’s chances of winning, and therefore when calculating a heuristic function that requires traversing the tree, we should only consider the transitions that maximize the heuristic
function for player B (therefore minimizing the heuristic for player A). This can be done by
employing the minimax algorithm, the pseudocode is written in Algorithm 1.
5
Algorithm 1: Pseudocode for minimax algorithm
1 in t minimax ( s ta te , player )
2 / / s t a t e : the curren t s t a t e we are analyzing
3 / / p layer : the curren t p layer
4 / / re turn s a h e u r i s t i c value tha t approximates a u t i l i t y func t ion o f the s t a t e
5
6 i f µ(state, player) = ; / / terminal s t a t e
7 return γ(player, state)
8 / / γ i s an eva luat ion func t ion
9
10 e l s e / / can search deeper
11
12 i f player = A
13 bestPossible = −∞
14 for each child in µ(A, state)
15 v = minimax(child,B)
16 bestPossible = max(bestPossible, v)
17 return bestPossible
18
19 e l s e / / player = B
20 bestPossible = +∞
21 for each child in µ(B, state)
22 v = minimax(child, A)
23 bestPossible = min(bestPossible, v)
24 return bestPossible
Note that in Algorithm 1 requires one to evaluate the tree until it reaches a terminal state.
However, given any evaluation function which can assign a heuristic for the game, we can
truncate the search at a given depth by including the number of levels left to analyze as an
argument and decreasing it in each step.
1.4 ALPHA-BETA PRUNING
Now, Algorithm 1 requires one to search the complete game tree (at least until a given depth).
However, this may not be the case. The idea behind the alpha-beta pruning algorithm is that
in some situations, parts of the tree may be ignored if we know a priori that no better result
may be achieved.
To understand how this is possible, consider that player A is traversing the tree in order to
find the next best move using the minimax algorithm. Instead of just keeping track of the
best node found so far, the player can also keep track of the best heuristic value computed so
far, α as well as the worst such value β (which yields the best result for B). At each transition
in the tree α should be updated, if it is player A’s turn, and otherwise β should be updated.
Finally, the remainder of a branch should be disregarded whenever α is greater then β since
that indicates the presence of a non-desirable state.
6
Algorithm 2: Pseudocode for minimax algorithm with alpha-beta pruning
1 in t alphabeta ( s ta te , depth , α , β , player )
2 / / s t a t e : the curren t s t a t e we are analyzing
3 / / α: the curren t be s t value ach ievab le by A
4 / / β: the curren t be s t value ach ievab le by B
5 / / p layer : the curren t p layer
6 / / re turn s the minimax value o f the s t a t e
7
8 i f depth = 0 or µ(state, player) = ;
9 / / terminal s t a t e
10 v = γ(player, state)
11
12 e l s e i f player = A
13 v = −∞
14 for each child in µ(A, state)
15 v = max( v, alphabeta(child, depth-1, α, β, B) )
16 α = max(α, v)
17 i f β ≤ α
18 break / / β prune
19
20 e l s e / / player = B
21 v = ∞
22 for each child in µ(B, state)
23 v = min(v , alphabeta(child, depth-1, α, β, A))
24 β = min(β, v)
25 i f β ≤ α
26 break / / α prune
27 return v
2 ASSIGNMENT: N-DIMENSIONAL TIC-TAC-TOE
In this assignment we consider a special case of n-dimensional generalization of Tic-Tac-Toe
game. The rules are simple. There is an n-dimensional hypercube H, consisting of 4n
cells.
Two players, A and B, take turns marking blank cells in H. The first player to mark 4 cells
along a row wins. Here by a row we mean any 4 cells, whose centers lie along a straight line in
H. For instance, in the case n = 2 any horizontal, vertical and diagonal row is winning. In the
case n = 3, winning rows lie along the 48 orthogonal rows (those which are parallel to one of
the edges of the cube), the 24 diagonal rows, or the 4 main diagonals of the cube, making 76
winning rows in total. Player A always starts the game.
In this assignment, our goal is to find the best possible move for player X given a particular
state of the game. The number n and the maximal execution time of the program depend on
the grade level.
3 ASSIGNMENT: CHECKERS
For obtaining a higher grade, the student is expected to be able to use the general framework
developed for tic-tac-toe and adapt it to play checkers.
Note, that this should amount to simply designing a new heuristic function that operates on
a checker board rather than a tic-tac-toe board.
7
4 GRADE LEVEL E AND D
In order to get E or D, one has to write a program computing the best possible next move for
player X, given a particular state of the game for n = 2. In this case, it is enough to implement
Minimax algorithm on a complete game tree, and speed up the search using the minimax
algorithm with alpha-beta pruning. Due to time constraints, one cannot analyse the complete game tree. Therefore the student should investigate th influence of he maximum search
depth and come up with a suitable evaluation function. The total execution time for 100 different board states should not exceed 1s. As an example of a simple evaluation function, one
can consider the following.
eval(Board) =
X
r∈Rows
myMarks(r )+
X
c∈Columns
myMarks(c)+
X
d∈Diagonals
myMarks(d)
The problem can be found on Kattis:
https://kth.kattis.com/problems/kth.ai.tictactoe2d
To achieve a passing grade the student has to obtain, at least, 22 out of 25 points on Kattis.
Note: The Kattis score does not guarantee the student is given the corresponding grade level,
the grade level will depend on later oral examination.
8
5 GRADE LEVEL C
In order to get C, one has to write a program computing the best possible next move for player
X, given a particular state of a game of 3D tic-tac-toe. Like in the previous grade level, due to
time constraints, the total execution time for analyzing 100 different board states should not
exceed 6s. Just as before, the student should investigate the influence of search depth in the
execution time and performance of the algorithm.
https://kth.kattis.com/problems/kth.ai.tictactoe3d
To achieve this grade the student must obtain at least 20 points out of 25 on Kattis.
Note: The Kattis score does not guarantee the student is given the corresponding grade level,
the grade level will depend on later oral examination.
9
6 GRADE LEVEL B AND A
In order to get B or A, the student is expected to be able to have completed levels E through
C and adapt the algorithms where necessary to the checkers problem which can be found on
Kattis:
https://kth.kattis.com/problems/kth:ai:checkers.
Having completed levels E through C, it should be a straight-forward to make minimax with
alpha-beta pruning work for checkers, however, the student will have to further optimize the
algorithm to the problem in order to get the necessary score. To this end the student may
pursue several avenues, from iterative deepening to implementing repeated states-checking
to move ordering. In other words, it might be possible to get to the required score by fiddling
the depth and the evaluation heuristics, but this is not the point of this exercise and will not
yield A-B grades.
To achieve this grade the student must obtain at least 20 points out of 25 on Kattis.
Note: The Kattis score does not guarantee the student is given the corresponding grade level,
the grade level will depend on later oral examination.
10