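1. Exercise: Admissible Heuristic
A question about the A* algorithm
Admissible heuristic
h(x) <= cost-to-goal
That is, a heuristic function is admissible if, for every node n, it satisfies:
h(n) <= h*(n)
n is a node
h is the heuristic function
h(n) is the cost from n to the goal as estimated by h
h*(n) is the optimal (true lowest) cost from n to the goal
In other words, an admissible heuristic never overestimates the remaining cost. See the Wikipedia article for details: https://en.wikipedia.org/wiki/Admissible_heuristic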
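The condition h(n) <= h*(n) can be checked directly on a small grid. The sketch below is only an illustration and is not part of the course material: it assumes 4-connected moves that each cost 1, and the names true_costs, is_admissible, example_grid, and example_goal are hypothetical. It computes h*(n) with a breadth-first search from the goal and compares it with two candidate heuristics.

```python
from collections import deque

def true_costs(grid, goal):
    """Breadth-first search from the goal: h*(n) for every free cell that
    can reach it, assuming 4-connected moves that each cost 1."""
    rows, cols = len(grid), len(grid[0])
    cost = {tuple(goal): 0}
    queue = deque([tuple(goal)])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            x2, y2 = x + dx, y + dy
            inside = 0 <= x2 < rows and 0 <= y2 < cols
            if inside and grid[x2][y2] == 0 and (x2, y2) not in cost:
                cost[(x2, y2)] = cost[(x, y)] + 1
                queue.append((x2, y2))
    return cost

def is_admissible(h, grid, goal):
    """True if h(n) <= h*(n) for every cell that can reach the goal."""
    return all(h(n, goal) <= c for n, c in true_costs(grid, goal).items())

def manhattan(n, goal):
    return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

def doubled_manhattan(n, goal):
    # Deliberately too large: overestimates the cell right next to the goal.
    return 2 * manhattan(n, goal)

# Hypothetical 3x4 map: 0 = free cell, 1 = obstacle.
example_grid = [[0, 0, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]]
example_goal = (0, 3)

print(is_admissible(manhattan, example_grid, example_goal))          # True
print(is_admissible(doubled_manhattan, example_grid, example_goal))  # False
```

The doubled heuristic fails because the cell next to the goal gets an estimate of 2 while its true cost is 1.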
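2. Exercise: Admissible Heuristic 2
3. Exercise: Bad Heuristic
If the heuristic function is not admissible, that is, if it overestimates the cost for some node, A* may end up returning a suboptimal path to the goal.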
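A tiny example makes this failure concrete. The sketch below is an illustration only: the three-node graph, the heuristic tables, and the a_star helper are all hypothetical and not taken from the course. The best route S -> A -> G costs 2, but an overestimate at A pushes A* toward the direct edge S -> G, which costs 3.

```python
import heapq

def a_star(graph, h, start, goal):
    """Minimal A*: returns (cost, path) as soon as the goal is popped."""
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float('inf')):
            continue  # stale queue entry, a cheaper route was found later
        for neighbor, step_cost in graph[node].items():
            g2 = g + step_cost
            if g2 < best_g.get(neighbor, float('inf')):
                best_g[neighbor] = g2
                heapq.heappush(frontier,
                               (g2 + h[neighbor], g2, neighbor, path + [neighbor]))
    return None

# Hypothetical graph: edge weights are step costs.
graph = {'S': {'A': 1, 'G': 3},
         'A': {'G': 1},
         'G': {}}

admissible_h = {'S': 2, 'A': 1, 'G': 0}    # never above the true cost
inadmissible_h = {'S': 0, 'A': 5, 'G': 0}  # h(A) = 5, but the true cost is 1

print(a_star(graph, admissible_h, 'S', 'G'))    # (2, ['S', 'A', 'G'])
print(a_star(graph, inadmissible_h, 'S', 'G'))  # (3, ['S', 'G'])
```

With the inadmissible table, node A enters the queue with f = 1 + 5 = 6, so the goal reached through the direct edge (f = 3) is popped first and the cheaper route is never explored.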
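4. Exercise: Diagonal Motion
The earlier exercises used the Manhattan distance to estimate h. If the map also allows diagonal moves between grid cells, that estimate has to be adjusted: whenever a diagonal step costs less than two straight steps, the Manhattan distance can overestimate the true cost and is no longer admissible.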
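The notes do not say how much a diagonal step costs, so the sketch below covers two common assumptions: a diagonal step costs the same as a straight step (Chebyshev distance), or it costs sqrt(2) while a straight step costs 1 (often called octile distance). The function names and the example points are illustrative, not taken from the exercise.

```python
import math

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chebyshev(a, b):
    """Obstacle-free cost when a diagonal step costs the same as a straight step."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def octile(a, b):
    """Obstacle-free cost when a straight step costs 1 and a diagonal step costs sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

start, target = (0, 0), (3, 3)
print(manhattan(start, target))         # 6
print(chebyshev(start, target))         # 3
print(round(octile(start, target), 3))  # 4.243
```

Three diagonal steps reach (3, 3) from (0, 0), so the true cost is 3 under the first assumption and about 4.243 under the second. The Manhattan value of 6 overestimates both.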
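5. Exercise: Stochastic Motion
With stochastic motion, an action is no longer guaranteed to move the robot in the intended direction. Cell values are therefore stored as floating-point numbers, and the probability of each possible outcome is factored into them.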
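As a rough sketch of how the probabilities enter a single value update, the function below computes the expected cost of one action: the step cost, plus the value of the intended cell weighted by success_prob, plus the values of the two neighbouring directions weighted by (1 - success_prob) / 2 each. The helper name expected_action_cost and its parameters are hypothetical. The cell values are copied from the expected output listed at the end of the exercise code, and a cell that is off the grid or blocked would contribute collision_cost instead of a cell value.

```python
def expected_action_cost(cost_step, success_prob, v_intended, v_slip_a, v_slip_b):
    """Expected cost of one action under the stochastic motion model.

    v_intended is the value of the cell the action aims at; v_slip_a and
    v_slip_b are the values of the two cells the robot may slip into
    (use collision_cost for a cell that is off the grid or an obstacle).
    """
    failure_prob = (1.0 - success_prob) / 2.0
    return (cost_step
            + success_prob * v_intended
            + failure_prob * v_slip_a
            + failure_prob * v_slip_b)

# Cell (0, 2) of the 4x4 test grid, action 'v' (down), success_prob = 0.5.
v_down = 183.69314862430264  # intended cell (1, 2)
v_left = 274.85364957758316  # slip cell (0, 1)
v_goal = 0.0                 # slip cell (0, 3), which is the goal
cost = expected_action_cost(1, 0.5, v_down, v_left, v_goal)
print(round(cost, 3))  # 161.56
```

The result agrees with the value listed for cell (0, 2) in the expected output, which is what a converged value grid should satisfy for its best action.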
Exercise code:
# -*- coding: utf-8 -*-
# --------------
# USER INSTRUCTIONS
#
# Write a function called stochastic_value that
# returns two grids. The first grid, value, should
# contain the computed value of each cell as shown
# in the video. The second grid, policy, should
# contain the optimum policy for each cell.
#
# --------------
# GRADING NOTES
#
# We will be calling your stochastic_value function
# with several different grids and different values
# of success_prob, collision_cost, and cost_step.
# In order to be marked correct, your function must
# RETURN (it does not have to print) two grids,
# value and policy.
#
# When grading your value grid, we will compare the
# value of each cell with the true value according
# to this model. If your answer for each cell
# is sufficiently close to the correct answer
# (within 0.001), you will be marked as correct.
delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right
delta_name = ['^', '<', 'v', '>'] # Use these when creating your policy grid.
# ---------------------------------------------
# Modify the function stochastic_value below
# ---------------------------------------------
def stochastic_value(grid, goal, cost_step, collision_cost, success_prob):
    # Probability(stepping left) = prob(stepping right) = failure_prob
    failure_prob = (1.0 - success_prob) / 2.0
    value = [[collision_cost for col in range(len(grid[0]))] for row in range(len(grid))]
    policy = [[' ' for col in range(len(grid[0]))] for row in range(len(grid))]
    isChanged = True
    # Sweep the grid until no cell value improves any further.
    while isChanged:
        isChanged = False
        for x in range(len(grid)):
            for y in range(len(grid[0])):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        policy[x][y] = '*'
                        isChanged = True
                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        # a is the intended action; i = -1 and i = 1 are the
                        # two neighbouring directions the robot may slip into.
                        v2 = cost_step
                        for i in range(-1, 2):
                            a2 = (a + i + len(delta)) % len(delta)
                            x2 = x + delta[a2][0]
                            y2 = y + delta[a2][1]
                            if i == 0:
                                p2 = success_prob
                            else:
                                p2 = failure_prob
                            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]) \
                                    and grid[x2][y2] == 0:
                                v2 += value[x2][y2] * p2
                            else:
                                # Leaving the grid or hitting an obstacle counts as a collision.
                                v2 += collision_cost * p2
                        if v2 < value[x][y]:
                            value[x][y] = v2
                            policy[x][y] = delta_name[a]
                            isChanged = True
    return value, policy
# ---------------------------------------------
# Use the code below to test your solution
# ---------------------------------------------
grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
goal = [0, len(grid[0])-1] # Goal is in top right corner
cost_step = 1
collision_cost = 1000
success_prob = 0.5
value, policy = stochastic_value(grid, goal, cost_step, collision_cost, success_prob)
for row in value:
    print(row)
for row in policy:
    print(row)
# Expected outputs:
#
# [471.9397246855924, 274.85364957758316, 161.5599867065471, 0],
# [334.05159958720344, 230.9574434590965, 183.69314862430264, 176.69517762501977],
# [398.3517867450282, 277.5898270101976, 246.09263437756917, 335.3944132514738],
# [700.1758933725141, 1000, 1000, 668.697206625737]
#
# ['>', 'v', 'v', '*']
# ['>', '>', '^', '<']
# ['>', '^', '^', '<']
# ['^', ' ', ' ', '^']