COMP 424 – Artificial Intelligence Game Playing

Instructor: Jackie CK Cheung and Readings: R&N Ch 5

Quick recap

Copyright By cscodehelp代写 加微信 cscodehelp

Standard assumptions (except for the lecture on uncertainty):

• Discrete (vs continuous) state space

• Deterministic (vs stochastic) environment

• Observable (vs unobservable) environment

• Static (vs changing) environment

• There is only a single AI agent

COMP-424: Artificial intelligence 2

Quick recap

Standard assumptions (except for the lecture on uncertainty):

• Discrete (vs continuous) state space

• Deterministic (vs stochastic) environment

• Observable (vs unobservable) environment

• Static (vs changing) environment

• There is only a single AI agent

• In the Tic-tac-toe exercise, we pushed uncertainty from the other

agent’s moves into “the environment” (AND nodes)

• Doesn’t account for the fact that the other player has its own goal

COMP-424: Artificial intelligence 3

Today’s lecture

• Adversarial search → games with 2 players

• Planning ahead in a world where other agents are planning

against us

• Minimax search

• Evaluation functions

• Alpha-beta pruning

• State-of-the-art game playing programs

COMP-424: Artificial intelligence 4

Game playing

• One of the oldest, best studied domains in AI! Why?

• They’re fun! People enjoy games and are good at playing them

• Many games are hard:

• State spaces can be huge and complicated

• Chess has branching factor of 35 and games go to 50 moves per player → 35100 nodes in the search tree!

• Games may be stochastic and partially observable

• Real-time constraints (e.g., fixed amount of time between moves)

• Games require the ability to make some decision even when calculating the optimal decision is infeasible

• Clear, clean description of the environment and actions

• Easy performance indicator: winning is everything

COMP-424: Artificial intelligence 5

Types of games

• Perfect information vs. Imperfect information

• Perfect: players observe the exact state of the game

• Imperfect: information is hidden from players

• Fully observable vs. partially observable

• Deterministic vs. Stochastic

• Deterministic: state changes are fully determined by player moves • Stochastic: state changes are partially determined by chance

COMP-424: Artificial intelligence 6

What type of game?

Perfect information? Yes

Checkers Othello Chess Go

Mastermind Solitaire*

Backgammon

Scrabble Monopoly

Most card games

Deterministic

Stochastic

COMP-424: Artificial intelligence

Game playing as search

• Consider 2-player, turn-taking, perfect information, deterministic games

• Can we formulate them as a search problem?

COMP-424: Artificial intelligence 8

Game playing as search

• Consider 2-player, turn-taking, perfect information, deterministic games

• Can we formulate them as a search problem? Yes!

COMP-424: Artificial intelligence 9

Game playing as search

• Consider 2-player, turn-taking, perfect information, deterministic games

• Can we formulate them as a search problem? Yes! State, s: the state of the board + which player moves

Operators, o:

Transition fcn:

Legal moves in a given state Defines the result of a move States in which the game is over

Terminal states: (won/lost/drawn)

Utility fcn: Defines a numeric value for the game for player p, when the game ends in terminal

Simple case: +1 for win, -1 for loss, 0 for draw. More complex cases: points won, money, …

COMP-424: Artificial intelligence 10

Game playing as search

• Consider 2-player, turn-taking, perfect information, deterministic games

• Can we formulate them as a search problem? Yes!

• We want to find a strategy (a way of picking moves) that

maximizes utility

• i.e., maximize the probability of winning or the expected points,

minimize the cost, …

COMP-424: Artificial intelligence 11

Game search challenge

• It’s not quite the same as simple searching

• There’s an opponent! That opponent is adversarial!

COMP-424: Artificial intelligence 12

Game search challenge

• It’s not quite the same as simple searching

• There’s an opponent! That opponent is adversarial!

• The opponent has its own goals which don’t match our goals

• Opponent tries to make things good for itself and bad for us

• We must simulate the opponent’s decisions

COMP-424: Artificial intelligence 13

Quick aside

• In games, there’s an adversarial opponent

• It has its own goals which don’t match our goals

• A special case: the zero-sum game

COMP-424: Artificial intelligence 14

Quick aside

In games, there’s an adversarial opponent

• It has its own goals which don’t match our goals

• A special case: the zero-sum game

Zero-sum games

• Each player’s gain or loss of utility is exactly balanced by the

losses or gains of the utility of the other player

• If I win, you lose: U(p1) = 1 → U(p2) = -1 U(p1) + U(p2) = 0

COMP-424: Artificial intelligence

Game search challenge

• It’s not quite the same as simple searching

• There’s an opponent! That opponent is adversarial!

• It has its own goal which does not match our goal

• We must simulate the opponent’s decisions

• Key idea: Define

• a max player (who wants to maximize the utility)

• a min player (who wants to minimize it)

COMP-424: Artificial intelligence 16

Example: Game Tree for Tic-Tac-Toe

Term for the move of one player. A full move is two ply.

Search Tree:

A tree superimposed on the full game tree that examines enough nodes for the player to determine what move to make.

COMP-424: Artificial intelligence 17

Minimax search

An algorithm for finding the optimal strategy: the best move to play at each state (node)

How it works:

Expand the complete search tree until terminal states have been reached

Compute utilities of the terminal states

Back up from the leaves towards the current game state:

• At each min node: back up the worst value among its children

• At each max node: back up the best value among its children

COMP-424: Artificial intelligence 18

Minimax search

• Expand the search tree to the terminal states

• Compute utilities at terminal states

• Back up from the leaves towards the current game state

• At each min node: back up the worst value among the children

• At each max node: back up the best value among the children

COMP-424: Artificial intelligence 19

Minimax search

COMP-424: Artificial intelligence

Minimax search

COMP-424: Artificial intelligence

Minimax search

COMP-424: Artificial intelligence

Minimax search

COMP-424: Artificial intelligence

Minimax search

COMP-424: Artificial intelligence

Minimax search

The minimax value at each node tells us how to play an optimal game!

COMP-424: Artificial intelligence 25

Minimax Algorithm

operator MinimaxDecision() for each legal operator o:

apply the operator o and obtain the new game state s.

Value[o] = MinimaxValue(s)

return the operator o with the highest value Value[o].

double MinimaxValue(s)

if isTerminal(s), return Utility(s). for each state s’ in Successors(s)

let Value(s’) = MinimaxValue(s’).

if Max’s turn to move in s, return maxs’Value(s’). if Min’s turn to move in s, return mins’Value(s’).

COMP-424: Artificial intelligence 26

• Apply minimax search to the following game search tree

COMP-424: Artificial intelligence 27

Apply minimax search to the following game search tree 7

27 Max4 27

COMP-424: Artificial intelligence 28

Properties of Minimax search

• Complete? • Optimal?

• Time complexity?

• Space complexity?

COMP-424: Artificial intelligence 29

Properties of Minimax search

• Complete? If the game tree is finite

• Optimal? Against an optimal opponent, yes

• It maximizes the worst-case outcome for Max

• There might exist better strategies for suboptimal opponents

• Time complexity? O(bm)

• Space complexity? O(bm) if we use DFS

COMP-424: Artificial intelligence 30

Properties of Minimax search

• Complete? If the game tree is finite

• Optimal? Against an optimal opponent, yes

• It maximizes the worst-case outcome for Max

• There might exist better strategies for suboptimal opponents

• Time complexity? O(bm)

• Space complexity? O(bm) if we use DFS

• So is Minimax suitable for solving chess?

COMP-424: Artificial intelligence 31

Properties of Minimax search

• Complete? If the game tree is finite

• Optimal? Against an optimal opponent, yes

• It maximizes the worst-case outcome for Max

• There could be superior strategies for suboptimal opponents

• Time complexity? O(bm)

• Space complexity? O(bm) if we use DFS

• So is Minimax suitable for solving chess?

• In chess: b≈35, m≈100, so an exact solution is impossible!

COMP-424: Artificial intelligence 32

Coping with resource limitations

• Suppose we have 100 seconds to make a move, and we can search 104 nodes per second

• Then we can only search 106 nodes

(Or even fewer, if we spend time deciding which nodes to search)

• Possible approach:

• Use a cutoff test (e.g., based on a depth limit)

• Use an evaluation function for the nodes where we cut the search (since they’re not terminal states, we don’t know the true utility)

• Need to think about real-time search

COMP-424: Artificial intelligence 33

Evaluation functions

• An evaluation function v(s) estimates the expected utility

of a state (e.g., the likelihood of winning from that state)

• Performance depends strongly on the quality of v(s)

• If it is too inaccurate it will lead the agent astray

• Desiderata:

• It should order the terminal states according to their true utility

• It should be relatively quick to compute

• For nonterminal states, it should be strongly correlated with the actual chances of winning

• An evaluation function can be designed by an expert or learned from experience

COMP-424: Artificial intelligence 34

Evaluation functions

• Most evaluation functions work by calculating features of the state (e.g., # of pawns, queens, etc.)

• Features define categories or equivalence classes of states

• Typically, we compute features separately and then combine

them for an aggregate value

• E.g., if the features of the board are independent, use a weighted linear function:

v(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

• Independence is a strong, likely incorrect assumption. But it could still be useful!

• An extra bishop is worth more in the endgame; its weight should depend on the “move number” feature

COMP-424: Artificial intelligence 35

Example: Chess

Black to move White to move White slightly better Black winning

• Linear evaluation function: v(s) = w1 f1(s) + w2 f2(s) f1(s) = (# white queens) – (# black queens)

f2(s) = (# white pawns) – (# black pawns)

COMP-424: Artificial intelligence 36

Example: Chess

Black to move White to move White slightly better Black winning

• Linear evaluation function: v(s) = w1 f1(s) + w2 f2(s)

f1(s) = (# white queens) – (# black queens) f2(s) = (# white pawns) – (# black pawns)

w1 = 9 w2 = 3

COMP-424: Artificial intelligence

How precise should the evaluation fcn be?

• For deterministic games, all that matters is that the fcn preserve the ordering of the nodes

• Thus, the move chosen is invariant under monotonic transformations of the evaluation function

• In deterministic games, payoff acts as an ordinal utility function

COMP-424: Artificial intelligence 38

Minimax with an evaluation function

• Use the evaluation function to evaluate non-terminal nodes

• This helps make a decision without searching until the end of the game

• Minimax cutoff algorithm:

Same as standard Minimax, except stop at some maximum depth m, use the evaluation function on those nodes, back up from there

COMP-424: Artificial intelligence 39

Minimax cutoff in chess

• How many moves ahead can we search in chess?

If our hardware could search 106 nodes in the available time, then

minimax cutoff with b=35 could search 4 moves ahead

• Is that good?

4 moves ahead ≈ novice player

8 moves ahead ≈ human master, typical PC 12 moves ahead ≈ Deep Blue, Kasparov

• Key idea:

Instead of exhaustive search, let’s search a few lines of play, but deeply We need pruning!

COMP-424: Artificial intelligence 40

α-β pruning

• Simple idea: if a path looks worse than what we already have, then discard it (prune)

• If the best move at a node cannot change (regardless of what we would find by searching), there’s no need to search further!

• α-β is a standard technique for deterministic, perfect information games

• How does it work?

• It uses two parameters, α and β, to track bounds on the utility or

evaluation values

COMP-424: Artificial intelligence 41

α-β pruning

• Proceed like Minimax, with DFS and then back ups

• Additionally, keep track of:

• the best (highest) leaf value found for Max (in α)

• the best (lowest) leaf value found for Min (in β)

• At max nodes, update α only

• At min nodes, update β only

• Prune in the event of inconsistency (α ≥ β)

COMP-424: Artificial intelligence 42

Initialize α and β

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

COMP-424: Artificial intelligence

In this search, we pruned away two leaf nodes compared to Minimax.

COMP-424: Artificial intelligence 54

• The textbook’s version of this example (Edition 3, p. 168, Figure 5.5) shows the correct steps, but with incorrect and confusing alpha and beta values, which don’t match their algorithm!

COMP-424: Artificial intelligence 55

α-β pruning algorithm

double MaxValue(s,α,β)

if cutoff(s), return Evaluation(s).

for each state s’ in Successors(s)

let α = max { α, MinValue(s’,α,β) }. if α ≥ β, return β.

double MinValue(s,α,β)

if cutoff(s), return Evaluation(s). for each state s’ in Successors(s)

let β = min { β, MaxValue(s’,α,β) }.

if α ≥ β, return α. return β.

COMP-424: Artificial intelligence 56

Important lessons of α-β pruning

• Pruning can greatly increase efficiency!

• Pruning does not affect the final result.

• The best moves are same as returned by Minimax (assuming the opponent is optimal and the evaluation function is correct.)

• Order matters for search efficiency!

• α-β pruning demonstrates the value of reasoning about

which computations are important!

COMP-424: Artificial intelligence 57

Efficiency of α-β pruning

• With bad move ordering, time complexity is O(bm) • The same as Minimax since nothing is pruned

• With perfect ordering, time complexity is O(bm/2)

• This means double the search depth for the same resources!

• In chess: the difference between a novice and expert agent

• On average, O(b3m/4), if we expect to find the max/min after b/2 expansions

• Randomizing the move ordering can achieve the average

• An Evaluation function can be used to order the nodes

• A useful ordering fcn for chess: try captures first, then threats, then forward moves, then backward moves

COMP-424: Artificial intelligence 58

Drawbacks of α-β

• If the branching factor is really big, search depth is still too limited. E.g., Go, where branching factor is about 300

• Optimal play is guaranteed against an optimal opponent if search proceeds to the end of the game. But the opponent may not be optimal!

• If heuristics are used, this assumption turns into the opponent playing optimally according to the same heuristic function as the player.

• This is a very big assumption! What if the opponent plays very differently?

COMP-424: Artificial intelligence 59

Forward pruning

Idea (for domains with large branching factor): Only explore the n best moves for current state (according to the evaluation function)

• Unlike α-β pruning, this can lead to suboptimal solutions

• But it can be very efficient with a good evaluation function

COMP-424: Artificial intelligence 60

State-of-the-art game playing programs

Chinook (Schaeffer et al., U. of Alberta)

• Best checkers player (since 1990’s)

• Plain α-β search, performed on standard PCs

• Evaluation function based on expert features of the board

• Opening move database

• HUGE endgame database!

Chinook has perfect information for all checkers positions involving 8

or fewer pieces on the board (a total of 443,748,401,247 positions)

• Only a few moves in middle of the game are actually searched

• They’ve now done an exhaustive search for checkers, and through optimal play can force at least a draw

COMP-424: Artificial intelligence 62

Deep Blue (IBM)

• Specialized chess processor, special-purpose memory architecture

• Very sophisticated evaluation function (expert features, tuned weights)

• Database of standard openings/closings

• Uses a version of α-β pruning (with undisclosed improvements)

• Can search up to 40-deep in some branches

• Can search over 200 billion positions per second!

• Overall, an impressive engineering feat

COMP-424: Artificial intelligence 63

Computer Chess

• Now, several computer programs running on regular hardware are on par with human champions (e.g., Fritz, Stockfish)

• ELO ratings of top chess bots are estimated at over 3500

• C.f., rating of 2500 needed to become a Grandmaster

• Top human players at just over 2800

• Human chess players use chess bots as a way to analyze and improve their game

• Interesting preview of how humans and AI can cooperate?

AlphaGo (DeepMind, 2016)

• Uses Monte Carlo tree search (more on this next class!) to simulate future game states

• Uses deep reinforcement learning (i.e., multi-layer neural networks + reinforcement learning) to learn how to value moves (policy network) and board states (value network):

• These are machine learning techniques

• Requires access to a database of previous games by expert players

• Performance:

• March 2016: 4W-1L record against

• 2017: 60W-0L record after further training against human pros

• AlphaZero (2017) removed the need for human game data, learned to play “from scratch” against itself!

COMP-424: Artificial intelligence 65

MuZero (DeepMind, 2020)

Some possible strategies for AI in games

• Design a compact state representation

• Search using iterative deepening for real-time play

• Use alpha-beta pruning with an evaluation function

• Order moves using the evaluation function

• Tune the evaluation function using domain knowledge, trial-and-error, learning

• Searching deeper is often more important than having a good evaluation function

• Consider using different strategies for opening, middle and endgame (including look-ups)

• Consider that the opponent might not be optimal

• Decide where to spend the computation effort!

COMP-424: Artificial intelligence 67

• Understand the different types of games

• Understand Minimax and alpha-beta pruning, what they compute, how to implement, and common methods to make them work better (e.g., node ordering)

• Understand the role of the evaluation function and desirable characteristics

• Get familiar with the state-of-the-art for solving some common games

COMP-424: Artificial intelligence 68

Extra Reading (R&N)

• Stochastic games

• Add chance nodes to the search try,

• Minimax → Expectiminimax

• α-β applies (with modifications for chance nodes)

• More sophisticated move ordering schemes

• killer moves

• transposition tables

• More sophisticated cutoffs

• quiescence search

• singular extensions

• Multiplayer games

COMP-424: Artificial intelligence 69

Human or computer – who is better?

• 1994: Chinook

程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com