np.percentile
computes percentiles of a set of numbers; the 50th percentile is the median. The function's documentation is shown below:
In [1]: import numpy as np
In [2]: np.percentile?
Signature:
np.percentile(
a,
q,
axis=None,
out=None,
overwrite_input=False,
interpolation='linear',
keepdims=False,
)
Docstring:
Compute the q-th percentile of the data along the specified axis.
Returns the q-th percentile(s) of the array elements.
Parameters
----------
a : array_like
Input array or object that can be converted to an array.
q : array_like of float
Percentile or sequence of percentiles to compute, which must be between
0 and 100 inclusive.
axis : {int, tuple of int, None}, optional
Axis or axes along which the percentiles are computed. The
default is to compute the percentile(s) along a flattened
version of the array.
.. versionchanged:: 1.9.0
A tuple of axes is supported
out : ndarray, optional
Alternative output array in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type (of the output) will be cast if necessary.
overwrite_input : bool, optional
If True, then allow the input array `a` to be modified by intermediate
calculations, to save memory. In this case, the contents of the input
`a` after this function completes is undefined.
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
This optional parameter specifies the interpolation method to
use when the desired percentile lies between two data points
``i < j``:
* 'linear': ``i + (j - i) * fraction``, where ``fraction``
is the fractional part of the index surrounded by ``i``
and ``j``.
* 'lower': ``i``.
* 'higher': ``j``.
* 'nearest': ``i`` or ``j``, whichever is nearest.
* 'midpoint': ``(i + j) / 2``.
.. versionadded:: 1.9.0
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in
the result as dimensions with size one. With this option, the
result will broadcast correctly against the original array `a`.
.. versionadded:: 1.9.0
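We ignore the other parameters and focus on interpolation. The signature shows that np.percentile defaults to 'linear', that is, it computes percentiles by linear interpolation. (NumPy 1.22 renamed this parameter to method and deprecated interpolation; the docstring above comes from an older release.) According to the documentation for interpolation, every method is defined in terms of i and j. Note that i and j are values, not indices. To see how i and j are computed, start from a small array: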
a = [4, 2, 1, 3]
After sorting, a = [1, 2, 3, 4]
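The position of the percentile in the sorted array is
loc = 1 + (n - 1) * p
where n is the length of the array and p is the percentile expressed as a fraction, 0 <= p <= 1 (for example, p = 0.95 is the 95th percentile). loc is the 1-based position of the percentile in the sorted array: loc = 3.0 means the percentile is the 3rd element, whose index is 2. Note that in Python, loc is a float whenever p is a float.
If the fractional part of loc is nonzero, then i = a[int(loc) - 1] and j = a[int(loc)].
If loc is an integer or its fractional part is zero, then i = j = percentile = a[int(loc) - 1].
Code example: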
import numpy as np
a = np.array([5, 3, 1, 7, 9])
a = np.sort(a)  # a = [1, 3, 5, 7, 9]
loc = 1 + (5 - 1) * 0.5  # loc = 3.0; the 50th percentile, i.e. the median
a[int(loc) - 1]  # 5; the fractional part of loc is zero, so the percentile is a[int(loc) - 1] = a[3 - 1] = a[2] = 5
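The rules for loc, i and j can be collected into one small helper. This is only a sketch of the procedure described above, written in plain Python; the function name locate is hypothetical and is not part of NumPy.

```python
import math

def locate(values, p):
    """Return (loc, i, j) for a fraction p in [0, 1].

    Hypothetical helper following the article's 1-based position formula.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    loc = 1 + (n - 1) * p        # 1-based, possibly fractional position
    whole = math.floor(loc)      # integer part of loc
    i = ordered[whole - 1]       # value at or just below the position
    # If loc has no fractional part, i and j are both the percentile itself.
    j = i if loc == whole else ordered[whole]
    return loc, i, j

print(locate([4, 2, 1, 3], 0.5))     # position 2.5 lies between the values 2 and 3
print(locate([5, 3, 1, 7, 9], 0.5))  # position 3.0 lands exactly on the value 5
```

Returning the values i and j rather than their indices mirrors the point made earlier: the interpolation formulas in the docstring operate on values.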
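Computing a percentile by linear interpolation works as follows.
If loc is an integer or its fractional part is zero, a[int(loc) - 1] is the percentile.
If the fractional part of loc is nonzero, say loc = 2.4, the percentile is notionally the "2.4th" element of the array and is computed as
a[int(loc) - 1] + (a[int(loc)] - a[int(loc) - 1]) * frac(loc)
that is, i + (j - i) * frac(loc), where frac(loc) is the fractional part of loc. For loc = 2.4, the position lies between the second and third elements, a[1] and a[2], so the result is
a[1] + (a[2] - a[1]) * 0.4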
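The interpolation formula can be written out as a short function. This is a minimal sketch in plain Python that follows the description above; linear_percentile is a hypothetical name, and the code is not NumPy's actual implementation.

```python
import math

def linear_percentile(values, q):
    """Percentile q (0 to 100) computed as i + (j - i) * fraction.

    Hypothetical helper that mirrors the article's formula.
    """
    ordered = sorted(values)
    loc = 1 + (len(ordered) - 1) * q / 100   # 1-based position in the sorted data
    whole = math.floor(loc)                  # integer part of loc
    fraction = loc - whole                   # fractional part of loc
    i = ordered[whole - 1]
    if fraction == 0:
        return float(i)                      # loc lands exactly on an element
    j = ordered[whole]
    return i + (j - i) * fraction

print(linear_percentile([7, 9, 5, 1, 3], 30))  # approximately 3.4
print(linear_percentile([7, 9, 5, 1, 3], 50))  # the median
```

Up to floating-point rounding, the results should agree with np.percentile called with its default settings on the same data.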
Example 1
In [1]: import numpy as np
In [2]: a=np.array(([7, 9, 5, 1, 3]))
In [3]: np.percentile(a,30)
Out[3]: 3.4000000000000004
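Explanation:
Sorting a gives [1, 3, 5, 7, 9].
loc = 1 + (5 - 1) * 0.3 = 2.2, so the 30th percentile is notionally the "2.2th" element of the array.
The fractional part of 2.2 is nonzero, so the percentile is a[1] + (a[2] - a[1]) * 0.2 = 3 + (5 - 3) * 0.2 = 3.4. The trailing digits in Out[3] come from floating-point rounding.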
Example 2
In [4]: np.percentile(a,50)
Out[4]: 5.0
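Explanation:
Sorting a gives [1, 3, 5, 7, 9].
loc = 1 + (5 - 1) * 0.5 = 3.0, so the 50th percentile is the 3rd element of the array.
The fractional part of 3.0 is zero, so the percentile is a[3 - 1], that is a[2], which is 5.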
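The same i and j also drive the other four options listed in the docstring. The sketch below applies each rule to the array from the examples; percentile_by_mode is a hypothetical helper in plain Python, and its handling of an exact tie for 'nearest' is a simplifying assumption that may differ from NumPy.

```python
import math

def percentile_by_mode(values, q, mode="linear"):
    """Apply one of the docstring's five rules to the neighbours i and j.

    Hypothetical helper for illustration; not NumPy's implementation.
    """
    ordered = sorted(values)
    loc = 1 + (len(ordered) - 1) * q / 100   # 1-based position
    whole = math.floor(loc)
    fraction = loc - whole
    i = ordered[whole - 1]
    j = ordered[whole] if fraction else i    # i == j when loc is a whole number
    if mode == "linear":
        return i + (j - i) * fraction
    if mode == "lower":
        return i
    if mode == "higher":
        return j
    if mode == "midpoint":
        return (i + j) / 2
    if mode == "nearest":
        # Assumption: an exact tie (fraction == 0.5) goes to j here.
        return i if fraction < 0.5 else j
    raise ValueError(f"unknown mode: {mode}")

data = [7, 9, 5, 1, 3]
for mode in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(mode, percentile_by_mode(data, 30, mode))
```

For the 30th percentile of this array, i = 3 and j = 5, so 'lower' and 'nearest' give 3, 'higher' gives 5, 'midpoint' gives 4, and 'linear' gives approximately 3.4.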