CS代考计算机代写 algorithm DNA CS 561a: Introduction to Artificial Intelligence
CS 561a: Introduction to Artificial Intelligence
CS 561, Sessions 45
1
This time: informed search
Informed search:
Use heuristics to guide the search
Best first
A*
Heuristics
Hillclimbing
Simulated annealing
CS 561, Sessions 45
2
Bestfirst search
Idea:
use an evaluation function for each node; estimate of “desirability”
expand most desirable unexpanded node.
Implementation:
QueueingFn = insert successors in decreasing order of desirability
Special cases:
greedy search
A* search
CS 561, Sessions 45
3
Romania with step costs in km
374
329
253
CS 561, Sessions 45
4
Greedy search
Estimation function:
h(n) = estimate of cost from n to goal (heuristic)
For example:
hSLD(n) = straightline distance from n to Bucharest
Greedy search expands first the node that appears to be closest to the goal, according to h(n).
CS 561, Sessions 45
5
CS 561, Sessions 45
6
CS 561, Sessions 45
7
CS 561, Sessions 45
8
CS 561, Sessions 45
9
Properties of Greedy Search
Complete?
Time?
Space?
Optimal?
CS 561, Sessions 45
10
Properties of Greedy Search
Complete? No – can get stuck in loops
e.g., Iasi > Neamt > Iasi > Neamt > …
Complete in finite space with repeatedstate checking.
Time? O(b^m) but a good heuristic can give
dramatic improvement
Space? O(b^m) – keeps all nodes in memory
Optimal? No.
CS 561, Sessions 45
11
A* search
Idea: avoid expanding paths that are already expensive
evaluation function: f(n) = g(n) + h(n) with:
g(n) – cost so far to reach n
h(n) – estimated cost to goal from n
f(n) – estimated total cost of path through n to goal
A* search uses an admissible heuristic, that is,
h(n) h*(n) where h*(n) is the true cost from n.
For example: hSLD(n) never overestimates actual road distance.
Theorem: A* search is optimal
CS 561, Sessions 45
12
A* search
A* search uses an admissible heuristic, that is,
h(n) h*(n) where h*(n) is the true cost from n.
Theorem: A* search is optimal
Note: A* is also optimal if the heuristic is consistent, i.e.,
(a consistent heuristic is admissible (by induction), but the converse is not always true)
Actual cost
From N to P
CS 561, Sessions 45
13
CS 561, Sessions 45
14
CS 561, Sessions 45
15
CS 561, Sessions 45
16
CS 561, Sessions 45
17
CS 561, Sessions 45
18
CS 561, Sessions 45
19
1
Optimality of A* (standard proof)
Suppose some suboptimal goal G2 has been generated and is in the queue. Let n be an unexpanded node on a shortest path to an optimal goal G1.
CS 561, Sessions 45
20
Optimality of A* (more useful proof)
CS 561, Sessions 45
21
fcontours
How do the contours look like when h(n) =0?
CS 561, Sessions 45
22
Properties of A*
Complete?
Time?
Space?
Optimal?
CS 561, Sessions 45
23
Properties of A*
Complete? Yes, unless infinitely many nodes with f f(G)
Time? Exponential in [(relative error in h) x (length of solution)]
Space? Keeps all nodes in memory
Optimal? Yes – cannot expand fi+1 until fi is finished
CS 561, Sessions 45
24
Proof of lemma: pathmax
CS 561, Sessions 45
25
Admissible heuristics
CS 561, Sessions 45
26
Admissible heuristics
CS 561, Sessions 45
27
Relaxed Problem
Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.
If the rules of the 8puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
CS 561, Sessions 45
28
This time
Iterative improvement
Hill climbing
Simulated annealing
CS 561, Sessions 45
29
Iterative improvement
In many optimization problems, path is irrelevant;
the goal state itself is the solution.
Then, state space = space of “complete” configurations.
Algorithm goal:
– find optimal configuration (e.g., TSP), or,
– find configuration satisfying constraints
(e.g., nqueens)
In such cases, can use iterative improvement algorithms: keep a single “current” state, and try to improve it.
CS 561, Sessions 45
30
Iterative improvement example: vacuum world
Simplified world: 2 locations, each may or not contain dirt,
each may or not contain vacuuming agent.
Goal of agent: clean up the dirt.
If path does not matter, do not need to keep track of it.
CS 561, Sessions 45
31
Iterative improvement example: nqueens
Goal: Put n chessgame queens on an n x n board, with no two queens on the same row, column, or diagonal.
Here, goal state is initially unknown but is specified by constraints that it must satisfy.
CS 561, Sessions 45
32
Hill climbing (or gradient ascent/descent)
Iteratively maximize “value” of current state, by replacing it by successor state that has highest value, as long as possible.
CS 561, Sessions 45
33
Hill climbing
Note: minimizing a “value” function v(n) is equivalent to maximizing –v(n),
thus both notions are used interchangeably.
Notion of “extremization”: find extrema (minima or maxima) of a value function.
CS 561, Sessions 45
34
Hill climbing
Problem: depending on initial state, may get stuck in local extremum.
CS 561, Sessions 45
35
Minimizing energy
Let’s now change the formulation of the problem a bit, so that we can employ new formalism:
– let’s compare our state space to that of a physical system that is subject to natural interactions,
– and let’s compare our value function to the overall potential energy E of the system.
On every updating,
we have DE 0
CS 561, Sessions 45
36
Minimizing energy
Hence the dynamics of the system tend to move E toward a minimum.
We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.
CS 561, Sessions 45
37
Local Minima Problem
Question: How do you avoid this local minimum?
starting
point
descend
direction
local minimum
global minimum
barrier to local search
CS 561, Sessions 45
38
Boltzmann machines
h
The Boltzmann Machine of
Hinton, Sejnowski, and Ackley (1984)
uses simulated annealing to escape local minima.
To motivate their solution, consider how one might get a ballbearing traveling along the curve to “probably end up” in the deepest minimum. The idea is to shake the box “about h hard” — then the ball is more likely to go from D to C than from C to D. So, on average, the ball should end up in C’s valley.
CS 561, Sessions 45
39
Consequences of the Occasional Ascents
Help escaping the
local optima.
desired effect
Might pass global optima
after reaching it
adverse effect
(easy to avoid by
keeping track of
bestever state)
CS 561, Sessions 45
40
Simulated annealing: basic idea
From current state, pick a random successor state;
If it has better value than current state, then “accept the transition,” that is, use successor state as current state;
Otherwise, do not give up, but instead flip a coin and accept the transition with a given probability (that is lower as the successor is worse).
So we accept to sometimes “unoptimize” the value function a little with a nonzero probability.
CS 561, Sessions 45
41
Demo
CS 561, Sessions 45
42
Boltzmann’s statistical theory of gases
In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states.
The 19th century physicist Ludwig Boltzmann developed a theory that included a probability distribution of temperature (i.e., every small region of the gas had the same kinetic energy).
Hinton, Sejnowski and Ackley’s idea was that this distribution might also be used to describe neural interactions, where low temperature T is replaced by a small noise term T (the neural analog of random thermal motion of molecules). While their results primarily concern optimization using neural networks, the idea is more general.
CS 561, Sessions 45
43
Boltzmann distribution
At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:
where E(A) and E(B) are the energies associated with states A and B.
CS 561, Sessions 45
44
Simulated annealing
Kirkpatrick et al. 1983:
Simulated annealing is a general method for making likely the escape from local minima by allowing jumps to higher energy states.
The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.
He heats the metal, then slowly cools it as he hammers the blade into shape.
If he cools the blade too quickly the metal will form patches of different composition;
If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.
CS 561, Sessions 45
45
Simulated annealing in practice
set T
optimize for given T
lower T (see Geman & Geman, 1984)
repeat
Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.
Caveat: this algorithm has no end (Geman & Geman’s T decrease schedule is in the 1/log of the number of iterations, so, T will never reach zero), so it may take an infinite amount of time for it to find the global minimum.
CS 561, Sessions 45
46
Simulated annealing algorithm
Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.
Note: goal here is to
maximize E.
–
CS 561, Sessions 45
47
Simulated annealing algorithm
Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.
Algorithm when goal
is to minimize E.
<


CS 561, Sessions 45
48
Note on simulated annealing: limit cases
Boltzmann distribution: accept “bad move” with E<0 (goal is to maximize E) with probability P(E) = exp(E/T)
If T is large: E < 0
E/T < 0 and very small
exp(E/T) close to 1
accept bad move with high probability
If T is near 0: E < 0
E/T < 0 and very large
exp(E/T) close to 0
accept bad move with low probability
CS 561, Sessions 45
49
Note on simulated annealing: limit cases
Boltzmann distribution: accept “bad move” with E<0 (goal is to maximize E) with probability P(E) = exp(E/T)
If T is large: E < 0
E/T < 0 and very small
exp(E/T) close to 1
accept bad move with high probability
If T is near 0: E < 0
E/T < 0 and very large
exp(E/T) close to 0
accept bad move with low probability
Random walk
Deterministic
uphill
50
Genetic Algorithms
50
CS 561
51
How do you find a solution in a large complex space?
Ask an expert?
Adapt existing designs?
Trial and error?
51
What has been traditionally done in the heat exchanger world is to take a best “guess” at a design with the help of an expert. Traditional heat exchanger designs are usually fully mathematically described.
This initial guess can then be used as a basis, and various parameters are “tweaked”. The performance of the heat exchanger can be recalculated to see if the modified design is an improvement on the original one.
Surely there must be a better way? There is  we don’t need to look very far to find examples of optimisation in Nature.
CS 561
52
Example: Traveling Sales Person (TSP)
Classic Example: You have N cities, find the shortest route such that your salesperson will visit each city once and return.
This problem is known to be NPHard
As a new city is added to the problem, computation time in the classic solution increases exponentially O(2n) … (as far as we know)
Dallas
Houston
San Antonio
Austin
Mos Eisley
Is this the shortest path???
A Texas Sales Person
What if………
Lets create a whole bunch of random sales people and see how well they do and pick the best one(s).
Salesperson A
Houston > Dallas > Austin > San Antonio > Mos Eisely
Distance Traveled 780 Km
Salesperson B
Houston > Mos Eisley > Austin > San Antonio > Dallas
Distance Traveled 820 Km
Salesperson A is better (more fit) than salesperson B
Perhaps we would like sales people to be more like A and less like B
Question:
do we want to just keep picking random sales people like this and keep testing them?
CS 561
53
CS 561
54
Represent problem like a DNA sequence
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Houston
Mos Eisely
San Antonio
Austin
Each DNA sequence is an encoding of a
possible solution to the problem.
DNA – Salesperson A
DNA – Salesperson B
The order of the cities in
the genes is the order of
the cities the TSP will take.
54
One the fitness has been assigned, pairs of chromosomes representing heat exchanger designs can be chosen for mating.
The higher the fitness, the greater the probability of the design being selected. Consequently, some of the weaker population members do not mate at all, whilst superior ones are chosen many times.
It is even statistically possible for a member to be chosen to mate with itself. This has no advantage, as the offspring will be identical to the parent.
CS 561
55
Ranking by Fitness:
Here we’ve created three different salespeople. We then checked to see how
far each one has to travel. This gives us a measure of “Fitness”
Note: we need to be able to measure fitness in polynomial time, otherwise we are in trouble.
Travels Shortest Distance
55
One the population has been formed, (either randomly in the initial generation, or by mating in subsequent generations), each population member needs to be assessed against the desired properties – such a rating is called a “fitness”.
The design parameters represented by the zeros and ones in the binary code of each chromosome are fed into the mathematical model describing the heat exchanger. The output parameters for each design are used to give the fitness rating. A good design has a high fitness value, and a poor design a lower value.
Let’s breed them!
We have a population of traveling sales people. We also know their fitness based on how long their trip is. We want to create more, but we don’t want to create too many.
We take the notion that the salespeople who perform better are closer to the optimal salesperson than the ones which performed more poorly. Could the optimal sales person be a “combination” of the better sales people?
We create a population of sales people as solutions to the problem.
How do we actually mate a population of data???
CS 561
56
CS 561
57
Crossover:
Exchanging information through some part of information (representation)
Once we have found the best sales people we will in a sense mate them. We can do this in several ways. Better sales people should mate more often and poor sales people should mate lest often.
Sales People City DNA
Parent 1 F A B  E C G D
Parent 2 D E A  C G B F
Child 1 F A B  C G B F
Child 2 D E A  E C G D
Sales person A (parent 1)
Sales person B (parent 2)
Sales person C (child 1)
Sales person D (child 2)
57
The mating process is analogous to crossover carried out in living cells.
A pair of binary strings are used. A site along the length of the string is chosen randomly. In this example it is shown between the 6th and 7th bits, but it could be anywhere.
Both members of the pair are severed at that site, and their latter portions are exchanged. Two parents form two children, and these two “daughter” designs become members of the population for the next generation.
This process takes place for each pair selected, so the new population has the same number of members as the previous generation.
Crossover Bounds (Houston we have a problem)
Not all crossed pairs are viable. We can only visit a city once.
Different GA problems may have different bounds.
CS 561
58
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Houston
Austin
San Antonio
Mos Eisely
Dallas
Houston
Mos Eisely
Houston
Austin
San Antonio
Dallas
Austin
San Antonio
Mos Eisely
Parents
Children
Not Viable!!
TSP needs some special rules for crossover
Many GA problems also need special crossover rules.
Since each genetic sequence contains all the cities in the travel, crossover is a swapping of travel order.
Remember that crossover also needs to be efficient.
CS 561
59
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Mos Eisely
Houston
San
Antonio
Austin
San Antonio
Dallas
Houston
Austin
Mos Eisely
Parents
Children
Viable
Dallas
Houston
Austin
San Antonio
Mos Eisely
What about local extrema?
With just crossover breading, we are constrained to gene sequences which are a cross product of our current population.
Introduce random effects into our population.
Mutation – Randomly twiddle the genes with some probability.
Cataclysm – Kill off n% of your population and create fresh new salespeople if it looks like you are reaching a local minimum.
Annealing of Mating Pairs – Accept the mating of suboptimal pairs with some probability.
Etc…
CS 561
60
CS 561
61
In summation: The GA Cycle
Fitness
Selection
Crossover
Mutation
New Population
(see, e.g., Wikipedia on Fitness
proportionate selection)
61
Summary of the previous steps to the model.
Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.
CS 561
62
Demo
62
Summary of the previous steps to the model.
Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.
GA and TSP: the claims
Can solve for over 3500 cities (still took over 1 CPU years).
Maybe holds the record.
Will get within 2% of the optimal solution.
This means that it’s not a solution per se, but is an approximation.
CS 561
63
GA Discussion
We can apply the GA solution to any problem where the we can represent the problems solution (even very abstractly) as a string.
We can create strings of:
Digits
Labels
Pointers
Code Blocks – This creates new programs from strung together blocks of code. The key is to make sure the code can run.
Whole Programs – Modules or complete programs can be strung together in a series. We can also rearrange the linkages between programs.
The last two are examples of Genetic Programming
CS 561
64
Things to consider
How large is your population?
A large population will take more time to run (you have to test each member for fitness!).
A large population will cover more bases at once.
How do you select your initial population?
You might create a population of approximate solutions. However, some approximations might start you in the wrong position with too much bias.
How will you cross bread your population?
You want to cross bread and select for your best specimens.
Too strict: You will tend towards local minima
Too lax: Your problem will converge slower
How will you mutate your population?
Too little: your problem will tend to get stuck in local minima
Too much: your population will fill with noise and not settle.
CS 561
65
GA is a good no clue approach to problem solving
GA is superb if:
Your space is loaded with lots of weird bumps and local minima.
GA tends to spread out and test a larger subset of your space than many other types of learning/optimization algorithms.
You don’t quite understand the underlying process of your problem space.
NO I DONT: What makes the stock market work??? Don’t know? Me neither! Stock market prediction might thus be good for a GA.
YES I DO: Want to make a program to predict people’s height from personality factors? This might be a Gaussian process and a good candidate for statistical methods which are more efficient.
You have lots of processors
GA’s parallelize very easily!
CS 561
66
Why not use GA?
Creating generations of samples and cross breading them can be resource intensive.
Some problems may be better solved by a general gradient descent method which uses less resource.
However, resourcewise, GA is still quite efficient (no computation of derivatives, etc).
In general if you know the mathematics, shape or underlying process of your problem space, there may be a better solution designed for your specific need.
Consider Kernel Based Learning and Support Vector Machines?
Consider Neural Networks?
Consider Traditional Polynomial Time Algorithms?
Etc.
CS 561
67
More demos: motorcycle design
CS 561
68
CS 561, Sessions 45
69
Summary
Bestfirst search = general search, where the minimumcost nodes (according to some measure) are expanded first.
Greedy search = bestfirst with the estimated cost to reach the goal as a heuristic measure.
– Generally faster than uninformed search
– not optimal
– not complete.
A* search = bestfirst with measure = path cost so far + estimated path cost to goal.
– combines advantages of uniformcost and greedy searches
– complete, optimal and optimally efficient
– space complexity still exponential
Hill climbing and simulated annealing: iteratively improve on current state
– lowest space complexity, just O(1)
– risk of getting stuck in local extrema (unless following proper simulated annealing schedule)
Genetic algorithms: parallelize the search problem
B
C
A
Basin of
Attraction for C
D
E
EMBED Word.Picture.8
_997354951.unknown
)
/
)
(
exp(
)
/
)
(
exp(
)
(
)
(
exp
)
(
)
(
T
A
E
T
B
E
T
B
E
A
E
B
P
A
P
=
÷
ø
ö
ç
è
æ
–
–
=
/docProps/thumbnail.jpeg