http://blog.csdn.net/pipisorry/article/details/48814183
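Each row of the input matrix is one observation, and the result is the metric distance between every pair of rows. Distance matrix computation from a collection of raw observation vectors stored in a rectangular array.
Note that these functions compute pairwise distances, not similarities. For example, after computing the cosine distance you must take 1 - distance to recover the cosine similarity. This follows directly from the definition of the cosine distance, 1 - u·v / (||u|| ||v||).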
Pairwise distances between observations in n-dimensional space.
The larger the value, the less related the two observations are.
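The relationship between cosine distance and cosine similarity can be checked with a small sketch. The array X below is made-up illustration data, not taken from the original post, and the manual formula is only a cross-check of the definition.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Two hypothetical observations (rows) in 2-D space; values are illustrative only.
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])

cos_dist = pdist(X, metric='cosine')   # condensed vector with a single entry
cos_sim = 1.0 - cos_dist               # distance -> similarity

# Cross-check against the definition u.v / (|u| |v|).
u, v = X
manual_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos_dist, cos_sim, manual_sim)
```

A distance of 0 therefore corresponds to a similarity of 1 (same direction), and larger distances mean less similar vectors.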
Y = pdist(X, 'euclidean')  # d = sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2)
Y = pdist(X, 'minkowski', p=p)  # pass p by keyword; recent SciPy versions do not accept it positionally
...
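As a minimal sketch of the calls above, the following compares a few metrics on a small made-up array (the data and variable names are assumptions for illustration). It also shows that the result is a condensed vector of length n(n-1)/2, ordered as d(0,1), d(0,2), d(1,2).

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three hypothetical observations (rows) in 3-D space.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 2.0, 2.0],
              [4.0, 6.0, 3.0]])

d_euclid = pdist(X, 'euclidean')       # default metric
d_mink2 = pdist(X, 'minkowski', p=2)   # Minkowski with p=2 equals Euclidean
d_mink1 = pdist(X, 'minkowski', p=1)   # Minkowski with p=1 equals city block
d_city = pdist(X, 'cityblock')

print(d_euclid)   # one entry per pair of rows: (0,1), (0,2), (1,2)
print(d_mink1)
```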
Computes distance between each pair of the two collections of inputs.
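The simplest form of XA and XB is a 2-D array holding a single row vector each. The inputs must be 2-D, otherwise you get ValueError: XA must be a 2-dimensional array. In that case the result is simply the metric distance between the two vectors.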
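A hedged sketch of cdist follows, using made-up arrays XA and XB. It shows the shape of the result, the single-vector case after reshaping to 2-D, and the error raised for 1-D input.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical collections: 2 observations in XA, 3 in XB, all in 2-D space.
XA = np.array([[0.0, 0.0],
               [1.0, 0.0]])
XB = np.array([[0.0, 3.0],
               [4.0, 0.0],
               [1.0, 1.0]])

D = cdist(XA, XB, metric='euclidean')   # D[i, j] = distance(XA[i], XB[j])
print(D.shape)                          # (2, 3)

# Distance between two single vectors: reshape each to one row first.
u = np.array([3.0, 4.0])
v = np.array([0.0, 0.0])
d_uv = cdist(u.reshape(1, -1), v.reshape(1, -1))
print(d_uv)                             # a 1x1 array

# Passing 1-D arrays directly is rejected.
try:
    cdist(u, v)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Unlike pdist, the result is a full rectangular matrix with one row per observation in XA and one column per observation in XB.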
Converts a vector-form distance vector to a square-form distance matrix, and vice-versa.
Note: when a square matrix is passed in, distance matrix 'X' must be symmetric and its diagonal must be zero.
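The round trip and the validity requirement can be sketched as follows. The values are made up for illustration, and the ValueError on an asymmetric matrix relies on squareform's default input checks.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical condensed distances for 3 observations: d(0,1), d(0,2), d(1,2).
y = np.array([1.0, 2.0, 3.0])

D = squareform(y)        # condensed vector -> 3x3 symmetric matrix, zero diagonal
y_back = squareform(D)   # square matrix -> condensed vector again
print(D)
print(y_back)

# An asymmetric matrix is not a valid distance matrix and is rejected.
bad = np.array([[0.0, 1.0],
                [2.0, 0.0]])
try:
    squareform(bad)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```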
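Example 1 (dis is assumed to be scipy.spatial.distance, e.g. from scipy.spatial import distance as dis)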
x
array([[0, 2, 3],
[2, 0, 6],
[3, 6, 0]])
y=dis.pdist(x)
y
array([ 4.12310563, 5.83095189, 8.54400375])
z=dis.squareform(y)
z
array([[ 0. , 4.12310563, 5.83095189],
[ 4.12310563, 0. , 8.54400375],
[ 5.83095189, 8.54400375, 0. ]])
type(z)
numpy.ndarray
type(y)
numpy.ndarray
Example 2
print(sim)
print(spatial.distance.cdist(sim[0].reshape((1, 2)), sim[1].reshape((1, 2)), metric='cosine'))
print(spatial.distance.pdist(sim, metric='cosine'))
[[-2.85 -0.45]
[[ 0.14790689]]
[ 0.14790689]
皮皮blog
is_valid_dm(D[, tol, throw, name, warning]) Returns True if input array is a valid distance matrix.
is_valid_y(y[, warning, throw, name]) Returns True if the input array is a valid condensed distance matrix.
num_obs_dm(d) Returns the number of original observations that correspond to a square, redundant distance matrix.
num_obs_y(Y) Returns the number of original observations that correspond to a condensed distance matrix.
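A short sketch of these helper functions, on made-up data, might look like this. The array X and the "broken" matrix are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.distance import (pdist, squareform,
                                    is_valid_dm, is_valid_y,
                                    num_obs_dm, num_obs_y)

# Four hypothetical observations in 2-D space.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])

y = pdist(X)        # condensed form, length 4*3/2 = 6
D = squareform(y)   # square, redundant form, shape (4, 4)

print(is_valid_y(y), is_valid_dm(D))   # both forms pass validation
print(num_obs_y(y), num_obs_dm(D))     # both recover the 4 observations

# Adding a constant makes the diagonal nonzero, so validation fails.
broken = D + 1.0
print(is_valid_dm(broken))
```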
from:http://blog.csdn.net/pipisorry/article/details/48814183
ref:scipy-ref-0.14.0-p933