# COMP3620 / 6320 – S1 2022
This lecture will be recorded.
• Turn off your camera if you do not want to be in the recording.
• If you cannot see the whole slide, go to View Options -> Zoom Ratio and select Fit to Window.

If you want to ask a question, either:
• Raise your hand and I will open your mic, or
• Type the question in the chat box

Last Lecture: Tic-tac-toe

Last Lecture: Minimax
• Perfect play for deterministic, two-player, zero-sum, perfect-information games
• Idea: choose the move to the position with the highest minimax value
➢ the best achievable utility against the best possible opponent
E.g. a 2-ply game:

Last Lecture: α–β pruning
[Figure: a 2-ply game tree with leaf values 3, 12, 8, 2, 4, 6, 14, 2, 5; α–β pruning skips branches that cannot affect the root's minimax value]

Outline for today
• Handling large state spaces in games
• Stochastic games

Imperfect decisions in real-time
• Approach: limit search depth and estimate expected utility
[Figure: a game tree cut off at a fixed depth; values at the cutoff are expected utilities estimated by Eval, values at true terminal nodes are exact utilities]
Suppose we have 100 seconds and explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program
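A quick sanity check of this arithmetic (the time budget and branching factor are the slide's; the rounding heuristic for the depth is my own):

```python
import math

SECONDS_PER_MOVE = 100     # time budget from the slide
NODES_PER_SECOND = 10**4   # assumed search speed
BRANCHING = 35             # typical branching factor for chess

budget = SECONDS_PER_MOVE * NODES_PER_SECOND   # 10**6 nodes per move

# With good move ordering, alpha-beta explores roughly b**(d/2) nodes,
# so the reachable depth d satisfies 35**(d/2) ≈ 10**6.
depth = round(2 * math.log(budget, BRANCHING))

print(budget, depth)   # 1000000 8
```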

Changes to Minimax
• Use Cutoff test instead of Terminal test
– Cutoff(s,d): true iff the state s encountered at depth d in the tree must be considered as a leaf (or s is terminal).
• e.g., depth limit, estimated number of nodes expanded
• Use Eval instead of Utility
– Eval(s,p): an evaluation function that estimates the expected utility of cutoff state s w.r.t. player p, and correlates with the chances of winning
– should order the terminal states in the same way as Utility
– should not take too long
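The two changes can be sketched in a few lines; the game interface below (`is_terminal`, `utility`, `eval`, `successors`) and the toy game are hypothetical, not from the slides:

```python
def h_minimax(state, depth, max_player, game, depth_limit):
    """Minimax with Cutoff(s, d) replacing the terminal test and
    Eval(s) replacing Utility at cutoff states."""
    if game.is_terminal(state):
        return game.utility(state)        # exact utility at true leaves
    if depth >= depth_limit:              # Cutoff(s, d): depth limit
        return game.eval(state)           # estimated expected utility
    values = [h_minimax(s, depth + 1, not max_player, game, depth_limit)
              for s in game.successors(state)]
    return max(values) if max_player else min(values)

# Toy 2-ply game (MAX at the root, then MIN) to exercise the function:
class TinyGame:
    tree = {'root': ['a', 'b'], 'a': [3, 12], 'b': [2, 4]}
    def is_terminal(self, s): return isinstance(s, int)
    def utility(self, s): return s
    def eval(self, s): return 0           # crude placeholder estimate
    def successors(self, s): return self.tree[s]

print(h_minimax('root', 0, True, TinyGame(), depth_limit=4))  # 3
```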

Evaluation functions
What would be a good evaluation function for tic-tac-toe?

Evaluation functions
[Figure: two chess positions; captions: "Black to move: White slightly better" and "White to move: Black winning"]
For chess, typically a linear weighted sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + . . . + wn fn(s)
e.g., w2 = 5 with f2(s) = (number of white rooks) − (number of black rooks), etc.
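The weighted sum translates directly into code; the state representation and feature set below are illustrative, not the slide's:

```python
def linear_eval(state, features, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical state: material counts only.
state = {'white_rooks': 2, 'black_rooks': 1,
         'white_pawns': 5, 'black_pawns': 6}
features = [
    lambda s: s['white_rooks'] - s['black_rooks'],   # rook balance
    lambda s: s['white_pawns'] - s['black_pawns'],   # pawn balance
]
weights = [5, 1]   # a rook is worth roughly five pawns

print(linear_eval(state, features, weights))   # 5*1 + 1*(-1) = 4
```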

Observation: Exact values don’t matter
[Figure: the same 2-ply tree evaluated with leaf values (1, 2, 2, 4) and with the monotonically transformed values (1, 20, 20, 400); MAX's chosen move is identical in both]
• Behavior is preserved under any monotonic transformation of Eval
• Only the order matters:
– payoff in deterministic games acts as an ordinal utility function
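A small check of this observation on a deterministic 2-ply tree; squaring serves as an example monotonic transformation (on non-negative values):

```python
def minimax_choice(subtrees):
    # MAX's chosen move: the MIN child with the largest minimum leaf.
    values = [min(leaves) for leaves in subtrees]
    return values.index(max(values))

tree = [[3, 12], [2, 4], [1, 8]]
print(minimax_choice(tree))                               # 0

squared = [[v * v for v in leaves] for leaves in tree]    # monotone transform
print(minimax_choice(squared))                            # still 0
```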

Stochastic Games

Stochastic games: backgammon
[Figure: backgammon boards with points numbered 0–25, showing the legal move pairs generated by one dice roll]

Stochastic games in general
• Chance is introduced by dice, card-shuffling, or coin flipping.
• "Chance" can be seen as a special kind of player whose move is the outcome of a random event, which determines the space of legal moves down the tree.
• Chance is not adversarial: the value of a chance position is the expectation (average) of the values of the results, taken over all possible outcomes.

Minimax in Stochastic Games
• ExpectiMinimax gives perfect play; it works like Minimax, except we must also handle chance nodes
[Figure: MAX's move at the root, then MIN's coin flip (two outcomes with probability 0.5 each), then MIN's move, with leaf values 2, 4, 7, 4, 6, 0, 5, −2; the left chance node is worth 0.5·min(2,4) + 0.5·min(7,4) = 3 and the right 0.5·min(6,0) + 0.5·min(5,−2) = −1]
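ExpectiMinimax can be sketched on an explicit tree. The tagged-tuple representation below is my own, but the leaf values (2, 4, 7, 4, 6, 0, 5, −2) and the 0.5/0.5 coin flip come from the slide's example:

```python
# Nodes: ('max', children) / ('min', children) / ('chance', [(prob, child), ...])
# Leaves are plain numbers.

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                       # terminal: utility value
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: expectation over outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# Coin-flip example from the slide:
tree = ('max', [
    ('chance', [(0.5, ('min', [2, 4])), (0.5, ('min', [7, 4]))]),
    ('chance', [(0.5, ('min', [6, 0])), (0.5, ('min', [5, -2]))]),
])
print(expectiminimax(tree))   # 0.5*2 + 0.5*4 = 3.0
```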

Stochastic games in practice
• Time complexity: O(b^m · n^m)
– b: max. branching factor (number of legal moves)
– n: max. number of outcomes of a chance event
– m: max. depth of the state space
• Dice rolls increase the effective branching factor
• n = 21 distinct rolls with 2 dice
• b ≈ 20 legal moves in Backgammon (can be ~4,000 with a 1-1 roll)
• ≈ 10^9 nodes at depth 4 of the tree
• α–β pruning is much less effective
• the value of lookahead is diminished
• TD-Gammon [1992] uses depth-2 search + a very good Eval
• ≈ world-champion level
• evaluation function learnt over millions of games
• method combined reinforcement learning with neural nets
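The 10^9 figure can be reproduced with a rough count: MAX moves at the root, and each further ply pairs a chance node with a move node. The exact layering here is my reading of the slide's numbers:

```python
b = 20    # typical legal moves in backgammon
n = 21    # distinct rolls of two dice
m = 4     # search depth in plies

# Root move, then (chance layer x move layer) for each remaining ply:
nodes = b * (n * b) ** (m - 1)
print(nodes)   # 1481760000, on the order of 10**9
```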

Evaluation function: exact values DO matter
[Figure: two chance nodes with outcome probabilities 0.9 and 0.1 over leaf pairs (2, 3) and (1, 4): the expectations are 2.1 and 1.3, so MAX prefers the left. Rescaling the leaves order-preservingly to (20, 30) and (1, 400) changes the expectations to 21 and 40.9, so MAX now prefers the right]
• Behavior is preserved only by positive linear transformations of Eval
• Hence Eval should be proportional to the expected payoff determined by Utility
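The flip can be verified numerically with the figure's values (probabilities 0.9/0.1, leaves (2, 3) and (1, 4), rescaled to (20, 30) and (1, 400)):

```python
def chance_value(probs, leaves):
    # value of a chance node: expectation over its outcomes
    return sum(p * v for p, v in zip(probs, leaves))

probs = (0.9, 0.1)
left, right = (2, 3), (1, 4)
print(round(chance_value(probs, left), 2),
      round(chance_value(probs, right), 2))    # 2.1 1.3  -> MAX goes left

# Order-preserving but non-linear rescaling of the leaves:
left_t, right_t = (20, 30), (1, 400)
print(round(chance_value(probs, left_t), 2),
      round(chance_value(probs, right_t), 2))  # 21.0 40.9 -> MAX goes right
```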

Pruning in stochastic game trees
A version of α-β pruning is possible:

[Figure: the left CHANCE node (outcome probabilities 0.5 each) has resolved to 1.5, so MAX's best so far is 1.5; at the right chance node, the first MIN child has resolved to 0]
Can we prune here?
• No: the value of the chance node could still be high enough to be MAX's choice, and we need to find out exactly how high it is.

Pruning in stochastic game trees
[Figure: continuing the search, the first leaf of the right chance node's second MIN child is 1]
Can we prune here?
• Yes, because
– the value of that MIN node will be at most 1,
– hence the value of the CHANCE node will be at most 0.5·0 + 0.5·1 = 0.5,
– which is provably insufficient to be MAX's choice (whose best so far is 1.5).
[Figure: the completed tree; MIN node values 2, 1, 0, 1, so the left chance node is worth 1.5 and the right at most 0.5]

Pruning in stochastic game trees
More pruning occurs if we can bound the leaf values
⇒ Suppose our Eval function returns values in [−2, +2]; then
[Figure: the same tree with leaf values 2, 2, 2, 1, 0, 1, 1, 2; the left chance node resolves to 1.5, and at the right chance node the first outcome (probability 0.5) resolves to 0]
Can we prune here?
• Yes, because
– we know that the right-hand MIN node will be worth at most +2 (the upper bound on Eval),
– therefore the CHANCE node is worth at most 0.5·0 + 0.5·2 = 1,
– which is not high enough to be MAX's choice.
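The pruning test at a chance node reduces to an upper-bound computation: outcomes not yet explored can contribute at most the Eval upper bound each. The function name and interface below are my own sketch:

```python
def chance_upper_bound(seen, hi):
    """Upper bound on a chance node's value.
    seen: (probability, exact_value) pairs already resolved;
    hi:   upper bound on Eval for the unexplored outcomes."""
    exact = sum(p * v for p, v in seen)
    remaining = 1.0 - sum(p for p, _ in seen)
    return exact + remaining * hi

alpha = 1.5   # value of the chance node MAX has already secured
# First outcome (probability 0.5) resolved to 0; Eval bounded by +2:
bound = chance_upper_bound([(0.5, 0)], hi=2)
print(bound, bound < alpha)   # 1.0 True -> prune the remaining outcome
```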

Other techniques to tame complexity
• Symmetry pruning
• Monte Carlo sampling
• Iterative deepening
• Pattern databases
• Deep learning
• Monte Carlo Tree Search (MCTS)
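Of these, Monte Carlo sampling is simple enough to sketch: estimate a position's value as the average result of random playouts from it. The toy playout below is a stand-in for a real game simulator:

```python
import random

def rollout_value(state, playout, n=10_000, seed=0):
    # Average outcome of n random playouts from `state`.
    rng = random.Random(seed)
    return sum(playout(state, rng) for _ in range(n)) / n

# Toy "game": from state p, a playout is won (+1) with probability p.
def playout(p, rng):
    return 1 if rng.random() < p else 0

print(round(rollout_value(0.7, playout), 2))   # close to 0.7
```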

Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of the human world champion in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board (500 billion states). In 2007, checkers became the largest game to be completely solved (about 5 × 10^20 states!) at the time.
• Chess: Deep Blue defeated the human world champion in a six-game match in 1997. Deep Blue searched 200 million positions per second, i.e. up to 30 billion per move (depth 14), used a very sophisticated evaluation function (8000 features), and singular extensions to extend some search lines up to 40 ply.
• Go: a branching factor b > 300 made this game more challenging. Monte Carlo Tree Search (MCTS) is the method of choice. Zen defeated a 9-dan professional in 2013. In 2016, AlphaGo won 4-1 against Lee Sedol, using deep learning to learn a value function and a policy to guide MCTS.
• General game playing: playing any game given only the rules of the game. Poker: the next big thing! But it's a stochastic game!

Summary
• A game is defined by an initial state, a successor function, a terminal test, and a utility function
• The minimax algorithm selects optimal actions for two-player zero-sum games of perfect information by a depth-first exploration of the game tree
• α-β pruning does not compromise optimality but increases efficiency by eliminating provably irrelevant subtrees
• It is not feasible to consider the whole game tree (even with α-β), so we need to cut the search off at some point and apply an evaluation function that gives an estimate of the expected utility of a state
• Game trees and minimax can be extended to stochastic games by introducing chance nodes whose value is the expectation of that of their successors
• The value of α-β and lookahead is limited in stochastic games ➢the evaluation function needs to compensate