Linear Function Approximation Example

Svetlin Penkov
School of Informatics

Mar 17, 2017

Reinforcement Learning

Grid Game

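[Figure: the game grid, showing the four possible prize locations P0 to P3, the repair station R, and the cells marked M where monsters can appear.]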

Source: Artificial Intelligence: Foundations of Computational Agents
http://artint.info/html/ArtInt_272.html#davids-simple-game-features-ex

Prize

● The prize could be in one of the corners, or there could be no prize.
● r(P) = +10
● When the prize is taken it disappears until a new prize is respawned with a certain probability.

Damage

● At each timestep a monster can appear in any of the M cells.
● If a monster appears in the agent’s cell, the agent gets damaged.
● If the agent is already damaged when this happens, it receives r(D, M) = -10.
● The agent can get repaired by visiting the R cell.
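
The prize and damage rules can be collected into a single reward function. The following is only a minimal sketch: the function name and its three boolean arguments are hypothetical, and summing the two rewards when both events happen in one step is an assumption. The slides only give the two reward values.

```python
PRIZE_REWARD = 10     # r(P) = +10 from the Prize slide
DAMAGE_PENALTY = -10  # r(D, M) = -10 from the Damage slide


def reward(took_prize, monster_in_cell, already_damaged):
    """Immediate reward for one timestep (hypothetical helper)."""
    r = 0
    if took_prize:
        r += PRIZE_REWARD
    # The penalty applies only when a monster appears in the cell of an
    # agent that was already damaged; the first hit just causes damage.
    if monster_in_cell and already_damaged:
        r += DAMAGE_PENALTY
    return r
```

For example, `reward(False, True, False)` is 0 because the first monster encounter only damages the agent.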

State

● Fully observable environment
● Represent the state as (X, Y, P, D)
○ X: X position of the agent
○ Y: Y position of the agent
○ P: position of the prize (P = 4 means no prize)
○ D: 1 if the agent is damaged, otherwise 0
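
One possible way to hold this state in code is a named tuple. This is a sketch, not part of the original slides: the field names and the `NO_PRIZE` constant are illustrative choices, and the slides do not say where the coordinate origin is.

```python
from collections import namedtuple

# Fields mirror (X, Y, P, D) from the slide; the names are illustrative.
State = namedtuple("State", ["x", "y", "prize", "damaged"])

NO_PRIZE = 4  # P = 4 encodes "no prize"; 0..3 stand for P0..P3

# An undamaged agent at an arbitrary example position, with no prize on the grid.
s = State(x=2, y=3, prize=NO_PRIZE, damaged=0)
```

Because a named tuple is hashable, the same object could also serve as a key in a tabular Q function for comparison with the linear approximation.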

Linear Function Approximation

● State-action value function:

Qw(s, a) = w0 + w1F1(s, a) + … + wnFn(s, a)

● What features can we choose?
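
The linear form of Qw(s, a) amounts to a bias plus a weighted sum of feature values. A minimal sketch, assuming the features have already been evaluated for one state-action pair (the function name `q_value` is hypothetical):

```python
def q_value(weights, features):
    """Qw(s, a) = w0 + w1*F1(s, a) + ... + wn*Fn(s, a).

    weights  -- [w0, w1, ..., wn], where w0 is the bias term
    features -- [F1(s, a), ..., Fn(s, a)] evaluated for one (s, a)
    """
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features (bias w0)")
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))
```

With weights `[2.0, -1.0, -0.4]` and features `[1, 0]`, only the bias and the first feature contribute, giving 2.0 - 1.0 = 1.0.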

Possible Features

Feature      Value

F1(s, a)     1 if action a would most likely take the agent from state s into a location where a monster could appear; 0 otherwise
F2(s, a)     1 if action a would most likely take the agent into a wall; 0 otherwise
F3(s, a)     1 if action a would most likely take the agent toward a prize; 0 otherwise
F4(s, a)     1 if the agent is damaged and action a would most likely take it to the repair station; 0 otherwise
F5(s, a)     1 if the agent is damaged and action a would most likely take it to a monster; 0 otherwise

F6(s, a)     1 if the agent is damaged in state s; 0 otherwise
F7(s, a)     1 if the agent is not damaged in state s; 0 otherwise
F8(s, a)     1 if the agent is damaged and there is a prize in the direction of action a; 0 otherwise
F9(s, a)     1 if the agent is not damaged and there is a prize in the direction of action a; 0 otherwise
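
Features F6 to F9 depend only on the damage flag and on where the prize lies relative to the agent, so they are easy to sketch. Everything geometric below is an assumption made for illustration: a 5x5 grid with coordinates 0..4, y growing upward, P0/P1 in the top corners and P2/P3 in the bottom corners, and four actions named "up", "down", "left", "right".

```python
GRID_MAX = 4  # assumed 5x5 grid, coordinates 0..4

# Assumed (x, y) corner of each prize index, with y growing upward.
PRIZE_CELLS = {0: (0, GRID_MAX), 1: (GRID_MAX, GRID_MAX),
               2: (0, 0), 3: (GRID_MAX, 0)}
# Assumed action names and their unit steps.
ACTION_STEPS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def prize_in_direction(state, action):
    """True if stepping along `action` reduces the distance to the prize."""
    x, y, prize, _damaged = state
    if prize not in PRIZE_CELLS:  # P = 4: no prize on the grid
        return False
    px, py = PRIZE_CELLS[prize]
    dx, dy = ACTION_STEPS[action]
    return (px - x) * dx + (py - y) * dy > 0


def f6(state, action):
    return 1 if state[3] else 0


def f7(state, action):
    return 1 - f6(state, action)


def f8(state, action):
    return 1 if state[3] and prize_in_direction(state, action) else 0


def f9(state, action):
    return 1 if not state[3] and prize_in_direction(state, action) else 0
```

Note that F6 and F7 always sum to 1, so together with the bias w0 they are linearly dependent; the weights are then not uniquely determined, although the update rule still works.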

F10(s, a)    distance from the left wall if the prize is at location P0
F11(s, a)    distance from the right wall if the prize is at location P0
F12(s, a) to F29(s, a)    similar to F10 and F11, for the other wall and prize combinations
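
Unlike the binary features, F10 and F11 are numeric. A minimal sketch under assumptions not stated on the slide: x runs from 0 at the left wall to 4 at the right wall, and the feature is 0 whenever the prize is not at P0.

```python
GRID_MAX_X = 4  # assumed: x = 0 at the left wall, x = 4 at the right wall


def f10(state, action):
    """Distance from the left wall if the prize is at P0, else 0 (assumed)."""
    x, _y, prize, _damaged = state
    return x if prize == 0 else 0


def f11(state, action):
    """Distance from the right wall if the prize is at P0, else 0 (assumed)."""
    x, _y, prize, _damaged = state
    return GRID_MAX_X - x if prize == 0 else 0
```

The action argument is unused here; it is kept only so that every feature shares the F(s, a) signature.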

Training with SARSA

● Let δ = r + γQ(s’, a’) - Q(s, a), then update each weight with wi ← wi + ηδFi(s, a)

Q(s, a) = 2.0 - 1.0 * F1(s, a) - 0.4 * F2(s, a) - 1.3 * F3(s, a)
        - 0.5 * F4(s, a) - 1.2 * F5(s, a) - 1.6 * F6(s, a)
        + 3.5 * F7(s, a) + 0.6 * F8(s, a) + 0.6 * F9(s, a)
        - 0.0 * F10(s, a) + 1.0 * F11(s, a) + …
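
The weight update wi ← wi + ηδFi(s, a) can be sketched in a few lines. This is an illustration under stated assumptions, not the course's reference code: the function names are hypothetical, the default values of η and γ are arbitrary, and the bias w0 is updated by treating it as the weight of a constant feature F0 = 1.

```python
def q_value(w, f):
    """Linear Q: w[0] is the bias, f[i - 1] holds Fi(s, a)."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))


def sarsa_update(w, f, r, f_next, eta=0.01, gamma=0.9):
    """Return the weights after one SARSA step.

    f      -- feature values for the current pair (s, a)
    f_next -- feature values for the next pair (s', a')
    eta, gamma -- step size and discount (default values are arbitrary)
    """
    delta = r + gamma * q_value(w, f_next) - q_value(w, f)
    # Assumption: the bias is the weight of a constant feature F0 = 1.
    full_f = [1.0] + list(f)
    return [wi + eta * delta * fi for wi, fi in zip(w, full_f)]
```

Starting from zero weights with one active feature, a reward of 10 and η = 0.1 gives δ = 10, so both the bias and that feature's weight move to 1.0.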

Setup

● The README file provides lots of information
● On a DICE machine:

Sensing

● The same agent interface for playing Enduro

● The sensing capabilities of the agent are enhanced
○ Cars
○ Speed
○ Grid

● Plain 2-dimensional array containing [x, y] points in pixel coordinates.

● Note: The image which is displayed is a scaled version of the actual game frame.

Cars

● Dictionary containing the size and location of each car in the game frame

● (x, y) top left pixel coordinate
● (w, h) size in pixels

{ 'self': (x, y, w, h),
  'others': [(x1, y1, w1, h1),
             (x2, y2, w2, h2),
             (x3, y3, w3, h3)]}
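
As an illustration of working with this dictionary, the sketch below finds the opponent whose bounding-box center is closest to the agent's. The helper names are hypothetical and the pixel values in the usage example are made up; only the dictionary layout comes from the slide.

```python
def box_center(box):
    """Center of an (x, y, w, h) box whose (x, y) is the top left corner."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def nearest_opponent(cars):
    """Return the box in cars['others'] closest to cars['self'], or None."""
    if not cars["others"]:
        return None
    sx, sy = box_center(cars["self"])

    def squared_distance(box):
        cx, cy = box_center(box)
        return (cx - sx) ** 2 + (cy - sy) ** 2

    return min(cars["others"], key=squared_distance)


# Made-up pixel values, for illustration only.
example_cars = {"self": (70, 150, 16, 10),
                "others": [(40, 100, 12, 8), (80, 140, 14, 9)]}
closest = nearest_opponent(example_cars)
```

A quantity such as the distance to the closest opponent is one candidate for a feature Fi(s, a) in the linear approximation.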

Speed

● Speed relative to the opponents, in the range [-50, 50]

● The speed is set to -50 when the agent collides

● If the agent is moving as fast as possible, its speed is 50

Grid

Row 0

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 2 0 0 0 0

Row 10

Col 0 Col 9
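
The grid shown above can be searched for occupied cells with a short helper. This sketch rests on assumptions the slide does not state: that the value 2 marks the agent's car and 1 marks other cars, and that the rows are indexed in the order they are printed. The helper name is hypothetical, and plain nested lists are used so the example has no dependencies.

```python
def find_cells(grid, value):
    """Return (row, col) pairs of every cell equal to `value`."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v == value]


# Rebuild the 11 x 10 example from the slide, in printed order.
grid = [[0] * 10 for _ in range(11)]
grid[4][5] = 1
grid[8][2] = 1
grid[10][2] = 1
grid[10][5] = 2

agent_cells = find_cells(grid, 2)  # assumed: 2 marks the agent
other_cells = find_cells(grid, 1)  # assumed: 1 marks other cars
```

Counts or positions derived this way, such as whether a car occupies the agent's column, could be turned into features for the linear value function.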

Questions?
