$30
CSE 573 HW1
100 points
Instructions:
1) The homework should be done individually. Don’t forget to write your name.
2) We highly recommend typing your homework, but writing and scanning also works.
3) Keep your answers brief but provide enough explanations and details to let us know that
you have understood the topic.
4) The assignment is due on Feb 12.
Topics: Points
Search 20
Heuristics for Informed Search 15
Alpha-Beta pruning 15
Expectimax 25
Value Function 15
Markov Decision Processes (MDPs) 10
Problem 1. Search [20 points]
In the following state graph the agent wants to move from the start state (a) to the goal state
(G). The agent can move in the direction indicated by the edges.
At every node, the agent observes a list of next possible destinations (children of the current
node) in alphabetical order and adds them to the fringe, in that order.
In the following questions, umber the nodes in your search tree corresponding to the expansion
order (see below for an example).
A) What is the path returned by Depth First Search (DFS) (ignoring costs on edges)? Show
your search tree. (4 points)
B) What is the path returned by Uniform Cost Search (UCS) given the costs of the edges in
the figure? Show the expanded search tree. (4 points)
C) Are the functions H and admissible heuristics for the graph? Why or why not? If 1 H2
not, specify a node that has an inadmissible heuristic value and provide the interval of
values that would make the heuristic admissible at that node. (4 points)
D) What is the path returned by greedy search using heuristic H2 ? Show the expanded
tree. (4 points)
E) What is the path returned by A* search using heuristic H1 ? Show the expanded tree. (4
points)
Problem 2. Heuristics for Informed Search [15 Points]
A delivery robot is moving in an n*m maze. A simple version of the maze is shown in the figure.
The robot is programmed to deliver multiple parcels to their destinations. Each parcel starts at
some node in the maze and has its own delivery destination. The initial position of the parcels is
shown in the figure, and the number on each parcel is its target destination. At every step the
robot can take one of the following actions:
● Move: Move in one of these directions: {Up, Right, Down, Left}
● Pick: Pick-up a parcel in a location
● Drop: Put down a parcel at a location.
The cost of each move action is 1, and the costs of Pick and Drop are zero. The robot starts at
square number 1 and can move through dotted lines - but the solid lines represent walls. The
robot wants to deliver all parcels to their destinations with minimal cost.
A) If the robot can carry only one parcel at a time, define an admissible heuristic function for
searching the space. Explain in plain english why the heuristic is admissible. Is your
heuristic consistent? Why? Make sure your heuristic is not h(x) = 0. (5 points)
B) If the robot can carry multiple parcels at a time, is the function h(p) = “count of packages
that are not delivered” admissible and consistent? Why or why not? (4 points)
C) If the robot can carry multiple parcels at a time, define two admissible heuristic functions.
Explain in plain english why the heuristic is admissible. Are your heuristics consistent?
Why? Make sure your heuristic is not h(x) = 0, and it is not a function of actual cost
because it is not practical to compute the actual cost in a general case. Hint: the
heuristic function can be a function of carried parcels and un-carried parcels (6 points)
Problem 3. Alpha-Beta pruning [15 points]
Below is the tree showing the states in a 2-player game played by two rational agents. This tree
shows the 2-level expansion of decisions, and the values at the leaves are the utility values at
those states.
A) Show the values of every intermediate node after performing the minimax
algorithm. (5 points)
B) Use the α − β pruning algorithm to determine the branches that need to be cut. (5
points)
C) Change the value of one leaf node so as to maximize the number of leaves
Alpha-Beta needs to explore in the resulting tree. (5 points)
Problem 4 - Expectimax [25 points]
A) In the expectimax search tree tree shown below, fill in the values for interior nodes.
Assume the uniform distribution for the expectation nodes (circular nodes). (2 Points)
B) In the tree above, suppose the opponent chooses the left branch with probability p -
instead of with equal probability. Find the range of values for p that will change the
optimal decision for the root max node, relative to part A. (5 Points)
Small PacMan Maze: In the map below, the pacman and the ghost can move to
any adjacent squares (the action space is {U, D, R, L} ). The movement of the ghost is
random. Pacman starts first and alternates turns with the ghost. The game starts in the shown
figure, and it ends in either one of the two cases.
● Winning: Pacman (maximizer) eats all the dots.
● Losing: The ghost catches the pacman.
Pacman will receive the score of +1 for eating a dot and a penalty score of -2 if it is caught by
the ghost. The final score of the pacman is calculated as the number of dots eaten by the
pacman and the penalty if the ghost caught the pacman.
●
●
C) Draw the expectimax search tree for the first four turns (starting with pacman and then
playing in turns). The ghost moves horizontally with probability of q and it moves
vertically with probability of 1-q. Use the letter ‘P’ for pacman and ‘G’ for ghost when
drawing game states. On the tree please distinguish the max-nodes, terminal nodes and
the expectation nodes. (Hint: You don’t need to expand the whole tree; you can
calculate the values of non-terminal states at depth 4.) (8 points)
D) Calculate the expected score of Pacman if it plays optimally for q=⅓. (6 points)
E) Explain in plain English what Pacman’s strategy should be (4 points).
Problem 5. Evaluation Function [15 Points]
Game of Hex was invented by John Nash in the 1940s. Hex game consists of a rhombus game
map divided into n * n hexagons. Each player in a 2-player game has a marker (Blue and red). At
each round a player can place a marker on an unmarked hexagon, and players alternate turns. The
goal for the players is to link their opposite sides of the board in an unbroken chain. Whoever
connects their sides first wins and receives +1 point and the opponent receives -1. It has been
proven that draw is impossible, hence there’s always a winner.
A) Players can play optimally using a minimax algorithm; Why is expanding the whole game
tree not practical? How can one approximate the value of states under resource limits?
What are the things that we need to consider when we design the evaluation function so
we can evaluate different stages of this game? (5 points)
B) Define an evaluation function that approximates the value of each state. (5 points)
C) If the size of the game is n * n and each time the agent considers next three moves (2 of
itself and 1 for minimizer). What is the Big-θ time cost of each decision? (5 points)
Problem 6. MDP [10 Points]
You are on a competition show similar to American Idol. You will be asked to compete in four
rounds, one at a time. In each round you can either be eliminated, meaning you have to drop
out of the competition, or pass, which means you can proceed to the next round. $500 will be
awarded if you pass the first round (denoted R1), and $100,000 will be awarded if your pass the
last round (denoted R4) and win. Your probability of passing a round is as follows:
P(R1=pass)=4/5, P(R2=pass)=3/4, P(R3=pass)=1/2, P(R4=pass)=1/5
6.1. Model this problem as an MDP. Specify all of the necessary parameters. [6 pts]
6.2. What is the value of each state when γ = 1 ? [4 pts]