# Reinforcement Learning

In-class tutorial: Worked examples

[DP, MC, basics of TD]

Subramanian Ramamoorthy

School of Informatics

17 January 2017

Plan for the Session

• Problems chosen to illustrate concepts covered in earlier lectures

• We will work out problems on the board and take questions to clarify concepts

• These slides provide the outline sketch of the questions to be covered

07/02/17 Reinforcement Learning 2

0. Interpretation of V and Q


Using the task of selecting a club to play the game of golf, discuss the meaning of V and Q.

What are:

- States
- Actions
- Rewards

What do you understand by the shape and numbers in this figure?
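The distinction can be sketched in code. The states, clubs, and numbers below are invented for illustration (the slide's figure uses its own values); the point is that V scores a state under a policy, Q scores a state-action pair, V(s) = Q(s, π(s)), and V*(s) = max over actions of Q*(s, a).

```python
# Hypothetical golf Q-table: a state is a coarse position, an action is
# a club choice, and values are (made-up) expected returns under a
# -1-per-stroke style reward.
Q = {
    ("fairway", "driver"): -3.0,   # long club gets near the green
    ("fairway", "putter"): -4.0,   # too many short strokes remain
    ("green",   "putter"): -1.0,   # roughly one putt to hole out
    ("green",   "driver"): -3.0,   # overshoots, must come back
}

def V(state, policy):
    """V is Q evaluated at the action the policy picks in that state."""
    return Q[(state, policy[state])]

def V_star(state):
    """The optimal V is the best Q over all actions in the state."""
    return max(q for (s, a), q in Q.items() if s == state)

policy = {"fairway": "driver", "green": "putter"}
print(V("fairway", policy))  # -3.0
print(V_star("green"))       # -1.0
```

Since this particular policy is greedy with respect to Q, V and V* coincide here; a worse policy (putter on the fairway) would give V below V*.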

I. Interpretation of Vπ and π

• Cells = States

• NSEW actions resulting in movement by 1 cell

• Actions taking the agent off the grid have no effect but incur a reward of -1

• All other actions result in a reward of 0
  – except those that move the agent out of the special states A and B.


Inspect and interpret Vπ

I. Interpretation of Vπ


I. Interpretation of V* and π*


Calculate and show that Bellman’s equation holds for the centre state, to understand the nature of V*.
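The same check can be run numerically. The sketch below assumes the standard 5x5 gridworld from the lectures (Sutton & Barto's layout: A = (0, 1) teleports to A' = (4, 1) with reward +10, B = (0, 3) to B' = (2, 3) with reward +5, off-grid moves leave the state unchanged with reward -1, all other rewards 0, γ = 0.9); if your lecture example differs, adjust the constants.

```python
import itertools

GAMMA = 0.9
N = 5
A, A2, B, B2 = (0, 1), (4, 1), (0, 3), (2, 3)  # assumed special states
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # N, S, W, E

def step(s, a):
    """Deterministic transition: (next state, reward)."""
    if s == A:
        return A2, 10.0
    if s == B:
        return B2, 5.0
    r, c = s[0] + a[0], s[1] + a[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return s, -1.0  # off-grid: no movement, reward -1

def backup(V, s):
    """Right-hand side of the Bellman optimality equation at s."""
    return max(rew + GAMMA * V[s2] for a in ACTIONS for s2, rew in [step(s, a)])

V = {s: 0.0 for s in itertools.product(range(N), repeat=2)}
for _ in range(200):                 # value iteration until near-convergence
    V = {s: backup(V, s) for s in V}

centre = (2, 2)
print(round(V[centre], 1), round(backup(V, centre), 1))
```

At the fixed point V*(centre) ≈ 17.8, and applying the backup once more returns the same number: that self-consistency is exactly what the slide asks you to verify by hand for the centre state.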

Interpreting V: Cost-to-go


Finding the shortest path in a graph using optimal substructure; a straight line indicates a single edge; a wavy line indicates a shortest path between the two vertices it connects (other nodes on these paths are not shown); the bold line is the overall shortest path from start to goal. [From Wikipedia]

Understanding the recursion: if the shortest path from LA to NY must include Chicago, then the shortest path from LA to Chicago can be computed separately from the last leg.
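A minimal sketch of this decomposition, using an invented city graph with made-up distances (the figure's actual graph is not reproduced here):

```python
import heapq

# Hypothetical road graph; edge weights are illustrative distances.
graph = {
    "LA":      {"Phoenix": 4, "Denver": 9},
    "Phoenix": {"Denver": 6, "Chicago": 15},
    "Denver":  {"Chicago": 9},
    "Chicago": {"NY": 8},
    "NY":      {},
}

def shortest(src):
    """Dijkstra's algorithm: shortest distance from src to every node."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

d_la = shortest("LA")
# Optimal substructure: since the LA -> NY shortest path passes through
# Chicago, it splits into shortest subpaths LA -> Chicago and Chicago -> NY.
print(d_la["NY"], d_la["Chicago"] + shortest("Chicago")["NY"])
```

This is the same principle the Bellman equation exploits: the value of a state decomposes into an immediate cost plus the optimal cost-to-go from the successor.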

II. Value/Policy Iteration using Grid World

• Calculate initial steps of Policy Evaluation using a grid world example seen in our earlier lectures


Vπ and Greedy π at k = 2

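The first sweeps can be checked with a short script. This assumes the grid world from the lectures is the usual 4x4 example (terminal corners, reward -1 per step, γ = 1, equiprobable random policy); change the constants if yours differs.

```python
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}       # assumed terminal corners
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    r, c = s[0] + a[0], s[1] + a[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c)
    return s  # off-grid actions leave the state unchanged

def sweep(V):
    """One synchronous policy-evaluation sweep under the random policy."""
    new = {}
    for s in V:
        if s in TERMINALS:
            new[s] = 0.0
        else:  # equiprobable policy, reward -1 per step, gamma = 1
            new[s] = sum(0.25 * (-1.0 + V[step(s, a)]) for a in ACTIONS)
    return new

V = {(r, c): 0.0 for r in range(N) for c in range(N)}
for k in range(2):                          # stop at k = 2, as on the slide
    V = sweep(V)
print(V[(0, 1)], V[(1, 1)])
```

After two sweeps a state adjacent to a terminal corner is at -1.75 (often shown rounded as -1.7) while interior states sit at -2.0; the greedy policy read off these values already points toward the nearest terminal.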

III. MC Value Evaluation

• Work out some steps of the MC value evaluation process for the 5-state Markov chain example (for a random walker who goes one step to the left or right with equal probability)

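A sketch of first-visit MC on this chain, assuming the usual setup (states A–E, start at C, absorb off either end, reward +1 on exiting right and 0 otherwise, γ = 1):

```python
import random

random.seed(0)
STATES = ["A", "B", "C", "D", "E"]

def episode():
    """Run one random walk from C; return visited states and the return."""
    i, visited = 2, []
    while 0 <= i < len(STATES):
        visited.append(STATES[i])
        i += random.choice([-1, 1])       # fair coin: left or right
    return visited, 1.0 if i == len(STATES) else 0.0

returns = {s: [] for s in STATES}
for _ in range(5000):
    visited, G = episode()
    # First-visit MC: record one return per state per episode. With
    # gamma = 1 and reward only at termination, the return from the
    # first visit equals G, so a set suffices here.
    for s in set(visited):
        returns[s].append(G)

V = {s: sum(g) / len(g) for s, g in returns.items()}
print({s: round(v, 2) for s, v in V.items()})
```

The estimates should approach the known true values 1/6, 2/6, …, 5/6 for A through E, which is a handy sanity check when working the first few episodes by hand.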

IV. Understanding MC through modified random walk


• The transition probabilities for state C are as shown. For all other states, the transitions are based on a fair coin flip. The square is an absorbing terminal state with reward as shown.

Perform some initial steps of the calculation of Vπ using first-visit MC.

Discuss MC with Exploring Starts, etc.

Exploring starts: every state-action pair has a non-zero probability of being the starting pair.

V. Cliff Walking: TD


Discuss the SARSA and Q-learning procedures with respect to this example.
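The contrast between the two update rules can be made concrete. These are the standard SARSA and Q-learning updates; the state names, Q values, and rewards in the demonstration are illustrative stand-ins for a step taken next to the cliff.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    """On-policy TD: bootstrap from the action a2 actually taken next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """Off-policy TD: bootstrap from the greedy action in s2."""
    best = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

# Illustrative step near the cliff edge: from s2, "up" is safe (value 0)
# but the agent's epsilon-greedy behaviour actually picked the bad
# action, whose value already reflects falling off (-100).
actions = ["safe", "bad"]
Q0 = {("s", "right"): 0.0, ("s2", "safe"): 0.0, ("s2", "bad"): -100.0}
Qs, Qq = dict(Q0), dict(Q0)

sarsa_update(Qs, "s", "right", -1.0, "s2", "bad")
q_learning_update(Qq, "s", "right", -1.0, "s2", actions)
print(Qs[("s", "right")], Qq[("s", "right")])
```

SARSA's estimate is dragged down hard by the exploratory action it actually sampled, while Q-learning backs up the greedy value and barely moves. This is the mechanism behind the classic cliff-walking result: under an ε-greedy behaviour policy, SARSA learns the longer safe path and Q-learning the risky edge-hugging one.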