Lesson 13: Problem Set 4

1.   Exercise: Admissible Heuristic

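A question about the A* algorithm.

Admissible heuristic:

  h(x) <= cost-to-goal

In other words, if the heuristic function is admissible, then every node n satisfies:

  h(n) <= h*(n)

See the Wikipedia article for details: https://en.wikipedia.org/wiki/Admissible_heuristic

Here:

    n is a node

    h is the heuristic function

    h(n) is the cost of getting from n to the goal as estimated by h

    h*(n) is the optimal (true minimum) cost of getting from n to the goal

Put differently, an admissible heuristic never overestimates the remaining cost.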
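The condition h(n) <= h*(n) can be checked by brute force on a small map. The sketch below is only an illustration under stated assumptions: the 3x3 grid, the goal cell, and the helper names true_costs and is_admissible are hypothetical and not part of the problem set, and motion is assumed to be 4-connected with a cost of 1 per step.

```python
from collections import deque

def true_costs(grid, goal):
    """Breadth-first search from the goal: h*(n) for every reachable free cell."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            r2, c2 = r + dr, c + dc
            if (0 <= r2 < rows and 0 <= c2 < cols
                    and grid[r2][c2] == 0 and (r2, c2) not in dist):
                dist[(r2, c2)] = dist[(r, c)] + 1
                queue.append((r2, c2))
    return dist

def is_admissible(h, grid, goal):
    """True if h(n) <= h*(n) holds for every reachable cell n."""
    return all(h(n, goal) <= cost for n, cost in true_costs(grid, goal).items())

def manhattan(n, goal):
    return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

def doubled_manhattan(n, goal):
    return 2 * manhattan(n, goal)

# Hypothetical map: 0 = free cell, 1 = obstacle.
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
goal = (2, 2)

print(is_admissible(manhattan, grid, goal))          # True
print(is_admissible(doubled_manhattan, grid, goal))  # False
```

Doubling the Manhattan distance breaks admissibility because the cell next to the goal then gets an estimate of 2 while its true cost is 1.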

 

2.   Exercise: Admissible Heuristic 2

 

3.   Exercise: Bad Heuristic

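If the heuristic is not actually admissible, that is, if it overestimates the cost to the goal for some node, A* search may end up returning a suboptimal path to the goal.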
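A small example makes this concrete. The sketch below is not the course's code: the four-node graph, the two heuristic tables, and the a_star function are hypothetical, chosen so that the cheapest route S -> B -> G costs 3 while S -> A -> G costs 4. The search stops as soon as the goal is popped from the open list.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* over a weighted graph; returns (cost, path) for the first goal popped."""
    open_heap = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for nxt, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(open_heap, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None

# Hypothetical graph: node -> list of (neighbor, edge cost).
graph = {"S": [("A", 1), ("B", 2)],
         "A": [("G", 3)],
         "B": [("G", 1)],
         "G": []}

admissible = {"S": 2, "A": 2, "B": 1, "G": 0}    # never above the true cost
overestimate = {"S": 2, "A": 1, "B": 5, "G": 0}  # h(B) = 5, but the true cost is 1

print(a_star(graph, admissible, "S", "G"))    # (3, ['S', 'B', 'G'])
print(a_star(graph, overestimate, "S", "G"))  # (4, ['S', 'A', 'G'])
```

With the overestimating table, node B looks so expensive that the goal is reached through A before B is ever expanded.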

4.   Exercise: Diagonal Motion

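Earlier exercises used the Manhattan distance to estimate h. If the map also allows diagonal moves between cells, that estimate has to be adjusted, because the Manhattan distance can then overestimate the remaining cost and is no longer admissible.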
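Two common replacements for 8-connected grids are sketched below. Which one fits depends on the cost of a diagonal step, which these notes do not state, so both variants are assumptions: the Chebyshev distance assumes a diagonal step costs 1, and the octile distance assumes it costs sqrt(2). The function names and the sample cells are illustrative only.

```python
import math

def manhattan(a, b):
    """Distance when only horizontal and vertical unit steps are allowed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chebyshev(a, b):
    """Distance when a diagonal step costs the same as a straight step (1)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def octile(a, b):
    """Distance when a straight step costs 1 and a diagonal step costs sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

start, goal = (0, 0), (3, 3)
print(manhattan(start, goal))  # 6
print(chebyshev(start, goal))  # 3
print(octile(start, goal))
```

On an empty grid three diagonal steps reach the goal, so the Manhattan estimate of 6 is above the true cost under either cost assumption.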

5.   Exercise: Stochastic Motion

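With stochastic motion, a move only goes in the intended direction with some probability. The cell values are therefore stored as floating-point numbers, and the probability of each possible outcome is factored into the value computation.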
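For a single action, the value is the step cost plus the probability-weighted values of the three cells the robot can end up in. The sketch below spells that out; the helper name expected_action_cost and its parameter names are hypothetical. The numbers are the converged values from the expected output listed with the exercise code (cost_step = 1, success_prob = 0.5, collision_cost = 1000), applied to the cell just left of the goal.

```python
def expected_action_cost(forward, side_a, side_b, cost_step, success_prob):
    """Step cost plus the expected value of the cell reached.

    forward is the value of the intended neighbor; side_a and side_b are the
    values of the two neighbors the robot may slip into instead. A neighbor
    that is off the grid or blocked counts as collision_cost.
    """
    failure_prob = (1.0 - success_prob) / 2.0
    return cost_step + success_prob * forward + failure_prob * (side_a + side_b)

# Cell (0, 2), action 'v': forward is (1, 2), the sides are (0, 1) and the goal.
down = expected_action_cost(forward=183.69314862430264,
                            side_a=274.85364957758316,
                            side_b=0.0,
                            cost_step=1, success_prob=0.5)

# Same cell, action '>': forward is the goal, one side is off the grid
# (collision_cost = 1000), the other side is (1, 2).
right = expected_action_cost(forward=0.0,
                             side_a=1000,
                             side_b=183.69314862430264,
                             cost_step=1, success_prob=0.5)

print(down)   # about 161.56, the listed value of cell (0, 2)
print(right)  # about 296.92
```

Heading straight for the goal is the worse choice here because a quarter of the time the robot slips off the grid and pays the collision cost, which is why the expected policy for that cell is 'v'.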

Exercise code:

# -*- coding: utf-8 -*-
# --------------
# USER INSTRUCTIONS
#
# Write a function called stochastic_value that
# returns two grids. The first grid, value, should
# contain the computed value of each cell as shown
# in the video. The second grid, policy, should
# contain the optimum policy for each cell.
#
# --------------
# GRADING NOTES
#
# We will be calling your stochastic_value function
# with several different grids and different values
# of success_prob, collision_cost, and cost_step.
# In order to be marked correct, your function must
# RETURN (it does not have to print) two grids,
# value and policy.
#
# When grading your value grid, we will compare the
# value of each cell with the true value according
# to this model. If your answer for each cell
# is sufficiently close to the correct answer
# (within 0.001), you will be marked as correct.

 

delta = [[-1, 0],  # go up
         [0, -1],  # go left
         [1, 0],   # go down
         [0, 1]]   # go right

delta_name = ['^', '<', 'v', '>']  # Use these when creating your policy grid.

 

# ---------------------------------------------
#  Modify the function stochastic_value below
# ---------------------------------------------

 

def stochastic_value(grid, goal, cost_step, collision_cost, success_prob):
    # Each of the two sideways slips (one step to either side of the
    # intended direction) happens with probability failure_prob.
    failure_prob = (1.0 - success_prob) / 2.0
    rows, cols = len(grid), len(grid[0])
    # Start every cell at collision_cost; the sweeps below only lower values.
    value = [[collision_cost for col in range(cols)] for row in range(rows)]
    policy = [[' ' for col in range(cols)] for row in range(rows)]

    changed = True
    while changed:  # sweep the grid until no cell improves
        changed = False

        for x in range(rows):
            for y in range(cols):
                if goal[0] == x and goal[1] == y:
                    if value[x][y] > 0:
                        value[x][y] = 0
                        policy[x][y] = '*'
                        changed = True
                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        # a is the index of the intended action in delta.
                        v2 = cost_step
                        # i = 0 is the intended move; i = -1 and i = 1 are
                        # the two neighbouring directions in delta.
                        for i in range(-1, 2):
                            a2 = (a + i) % len(delta)
                            x2 = x + delta[a2][0]
                            y2 = y + delta[a2][1]
                            p2 = success_prob if i == 0 else failure_prob

                            if 0 <= x2 < rows and 0 <= y2 < cols and grid[x2][y2] == 0:
                                v2 += value[x2][y2] * p2
                            else:
                                # Leaving the grid or hitting an obstacle.
                                v2 += collision_cost * p2

                        if v2 < value[x][y]:
                            value[x][y] = v2
                            policy[x][y] = delta_name[a]
                            changed = True

    return value, policy

 

# ---------------------------------------------
#  Use the code below to test your solution
# ---------------------------------------------

 

grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
goal = [0, len(grid[0]) - 1]  # Goal is in top right corner
cost_step = 1
collision_cost = 1000
success_prob = 0.5

value, policy = stochastic_value(grid, goal, cost_step, collision_cost, success_prob)
for row in value:
    print(row)
for row in policy:
    print(row)

 

# Expected outputs:
#
# [471.9397246855924, 274.85364957758316, 161.5599867065471, 0],
# [334.05159958720344, 230.9574434590965, 183.69314862430264, 176.69517762501977],
# [398.3517867450282, 277.5898270101976, 246.09263437756917, 335.3944132514738],
# [700.1758933725141, 1000, 1000, 668.697206625737]
#
# ['>', 'v', 'v', '*']
# ['>', '>', '^', '<']
# ['>', '^', '^', '<']
# ['^', ' ', ' ', '^']
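The grading notes accept a value grid when every cell is within 0.001 of the true value, so an exact comparison of floats is not needed. A minimal sketch of such a check follows; the helper name grids_close and the perturbed copies are hypothetical, and the reference grid is the expected output listed above.

```python
def grids_close(actual, expected, tol=0.001):
    """True if both grids have the same shape and every cell differs by less than tol."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        if any(abs(a - e) >= tol for a, e in zip(row_a, row_e)):
            return False
    return True

expected_value = [
    [471.9397246855924, 274.85364957758316, 161.5599867065471, 0],
    [334.05159958720344, 230.9574434590965, 183.69314862430264, 176.69517762501977],
    [398.3517867450282, 277.5898270101976, 246.09263437756917, 335.3944132514738],
    [700.1758933725141, 1000, 1000, 668.697206625737],
]

# Hypothetical results: one slightly off, one clearly off.
slightly_off = [[v + 0.0005 for v in row] for row in expected_value]
clearly_off = [[v + 0.01 for v in row] for row in expected_value]

print(grids_close(slightly_off, expected_value))  # True
print(grids_close(clearly_off, expected_value))   # False
```

In practice the first argument would be the value grid returned by stochastic_value for the test grid above.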

 
