Understanding pandas.DataFrame.ewm, Revisited

  • Overview

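    "Understanding pandas.DataFrame.ewm"

    "Understanding exponential weighted || Exponents"

    The two posts above gave me a first understanding of ewm, but after several detours it was still not entirely clear to me, so this post goes through it again.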

  • Source code

    Looking at the source code of DataFrame.ewm():

    @doc(ExponentialMovingWindow)
    def ewm(
        self,
        com=None,
        span=None,
        halflife=None,
        alpha=None,
        min_periods=0,
        adjust=True,
        ignore_na=False,
        axis=0,
        times=None,
    ):
        # Normalize the axis argument, then hand everything to the window class
        axis = self._get_axis_number(axis)
        return ExponentialMovingWindow(
            self,
            com=com,
            span=span,
            halflife=halflife,
            alpha=alpha,
            min_periods=min_periods,
            adjust=adjust,
            ignore_na=ignore_na,
            axis=axis,
            times=times,
        )
    

    It calls the ExponentialMovingWindow class in the pandas/core/window/ewm.py module, which provides exponential weighted (EW) functions. Available EW functions: mean(), var(), std(), corr(), cov(). Exactly one of the parameters com, span, halflife, alpha must be provided.

    class ExponentialMovingWindow(_Rolling): # inherits from class _Rolling
        _attributes = ["com", "min_periods", "adjust", "ignore_na", "axis"]
        def __init__(
            self,
            obj,
            com: Optional[float] = None,
            span: Optional[float] = None,
            halflife: Optional[Union[float, TimedeltaConvertibleTypes]] = None,
            alpha: Optional[float] = None,
            min_periods: int = 0,
            adjust: bool = True,
            ignore_na: bool = False,
            axis: int = 0,
            times: Optional[Union[str, np.ndarray, FrameOrSeries]] = None,
        ):
            self.com: Optional[float]
            self.obj = obj
            self.min_periods = max(int(min_periods), 1)
            self.adjust = adjust
            self.ignore_na = ignore_na
            self.axis = axis
            self.on = None
            if times is not None:
                if isinstance(times, str):
                    times = self._selected_obj[times] 
                if not is_datetime64_ns_dtype(times):
                    raise ValueError("times must be datetime64[ns] dtype.")
                if len(times) != len(obj):
                    raise ValueError("times must be the same length as the object")
                if not isinstance(halflife, (str, datetime.timedelta)):
                    raise ValueError(
                        "halflife must be a string or datetime.timedelta object"
                    )
                self.times = np.asarray(times.astype(np.int64))
                self.halflife = Timedelta(halflife).value
                if common.count_not_none(com, span, alpha) > 0:
                    self.com = get_center_of_mass(com, span, None, alpha)  # one of com, span, alpha determines the center of mass
                else:
                    self.com = None
            else:
                if halflife is not None and isinstance(halflife, (str, datetime.timedelta)):
                    raise ValueError(
                    	"halflife can only be a timedelta convertible argument if times is not None"
                    )
                self.times = None
                self.halflife = None
                self.com = get_center_of_mass(com, span, halflife, alpha)
        def _apply(self, func):
            # Abbreviated in this excerpt: `values` stands for the underlying data
            return np.apply_along_axis(func, self.axis, values)
        @Substitution(name="ewm", func_name="mean")
        @Appender(_doc_template)
        def mean(self, *args, **kwargs):
            nv.validate_window_func("mean", args, kwargs)
            window_func = partial()  # arguments elided in this excerpt
            return self._apply(window_func)
        @Substitution(name="ewm", func_name="cov")
        @Appender(_doc_template)
        def cov(self, other, pairwise, bias, **kwargs):
            pairwise = True if pairwise is None else pairwise
    

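    From the source we can see that class ExponentialMovingWindow(_Rolling), from the pandas/pandas/core/window/ewm.py module, defines the methods mean(), std(), var(), cov(), corr(), and so on. None of these methods computes the numeric result inline; each one builds a window function and passes it to _apply, which in turn relies on lower-level routines operating on the underlying numpy arrays to produce the final result.

    pandas.DataFrame.ewm() returns an ExponentialMovingWindow object.

    pandas.DataFrame.ewm().mean() produces the result by calling the mean() method of that ExponentialMovingWindow object.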
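
    The two-step behaviour described above can be checked directly. This is a minimal sketch; the column name "x" and the value alpha=0.5 are arbitrary choices for illustration, and the printed class name may differ in older pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Step 1: ewm() only builds a window object; no numbers are computed yet.
ewm_obj = df.ewm(alpha=0.5)
print(type(ewm_obj).__name__)

# Step 2: calling an EW function on that object produces the actual values.
result = ewm_obj.mean()
print(result)
```

    The result has the same shape as the input, one smoothed value per row.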

  • Point 1 : adjust = True or False

    The EW functions support two variants of exponential weights.

    The default, adjust = True, uses the weights $w_i=(1-\alpha)^i$, which gives the EW moving average:
    $$y_t=\frac{x_t+(1-\alpha)x_{t-1}+(1-\alpha)^2x_{t-2}+\dots+(1-\alpha)^tx_0}{1+(1-\alpha)+(1-\alpha)^2+\dots+(1-\alpha)^t}$$
    When adjust = False is specified, moving averages are calculated as
    $$y_0=x_0,\qquad y_t=(1-\alpha)y_{t-1}+\alpha x_t$$
    The difference between the two variants arises because we are dealing with series that have a finite history.
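
    To convince myself of the two formulas, here is a small sketch that computes both variants by hand and compares them with pandas. The helper names ewm_adjusted and ewm_recursive are my own, not part of pandas, and the sample values are arbitrary.

```python
import numpy as np
import pandas as pd

alpha = 0.5
x = np.array([1.0, 2.0, 4.0, 8.0])

def ewm_adjusted(values, alpha):
    """adjust=True: weighted average with weights (1 - alpha)**i."""
    out = []
    for t in range(len(values)):
        weights = (1 - alpha) ** np.arange(t + 1)  # w_0, w_1, ..., w_t
        window = values[t::-1]                     # x_t, x_{t-1}, ..., x_0
        out.append(np.sum(weights * window) / np.sum(weights))
    return np.array(out)

def ewm_recursive(values, alpha):
    """adjust=False: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    out = [values[0]]
    for t in range(1, len(values)):
        out.append((1 - alpha) * out[-1] + alpha * values[t])
    return np.array(out)

s = pd.Series(x)
adjusted = s.ewm(alpha=alpha, adjust=True).mean().to_numpy()
recursive = s.ewm(alpha=alpha, adjust=False).mean().to_numpy()

print(np.allclose(ewm_adjusted(x, alpha), adjusted))
print(np.allclose(ewm_recursive(x, alpha), recursive))
```

    Both variants start from the same value y_0 = x_0 and differ afterwards, since the adjusted form renormalizes by the sum of the weights seen so far.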

  • Point 2 : alpha vs. com,span,halflife

    $$\alpha = \begin{cases} \dfrac{1}{1+c} & \text{for com} \\ \dfrac{2}{1+s} & \text{for span} \\ 1-\exp\left(\dfrac{\log 0.5}{h}\right) & \text{for halflife} \end{cases}$$

    Span corresponds to what is commonly called an “N-day EW moving average”.

    Center of mass has a more physical interpretation and can be thought of in terms of span: $c=\frac{s-1}{2}$

    Half-life is the period of time for the exponential weight to reduce to one half.

    Alpha specifies the smoothing factor directly.
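
    Since com, span and halflife are all just different ways of specifying alpha, the conversions above can be verified numerically. A minimal sketch, with span=9 and halflife=3 picked arbitrarily:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

span = 9.0
alpha_from_span = 2.0 / (1.0 + span)   # alpha = 2 / (1 + s)
com = (span - 1.0) / 2.0               # c = (s - 1) / 2
alpha_from_com = 1.0 / (1.0 + com)     # alpha = 1 / (1 + c)

halflife = 3.0
alpha_from_halflife = 1.0 - np.exp(np.log(0.5) / halflife)

# Each parametrization should give the same smoothed series as its alpha.
by_span = s.ewm(span=span).mean()
by_com = s.ewm(com=com).mean()
by_alpha = s.ewm(alpha=alpha_from_span).mean()
by_halflife = s.ewm(halflife=halflife).mean()
by_alpha_h = s.ewm(alpha=alpha_from_halflife).mean()

print(np.allclose(by_span, by_alpha), np.allclose(by_com, by_alpha))
print(np.allclose(by_halflife, by_alpha_h))
```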

  • Point 3 : min_periods

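    min_periods has the same meaning as it does for all the .expanding and .rolling methods: no output values will be set until at least min_periods non-null values are encountered in the window.

    In other words, within a window the calculation is performed only once at least min_periods data points are available; otherwise NaN is returned.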
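
    A quick sketch of the effect, with alpha=0.5 and min_periods=3 chosen only for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default: a value is produced from the first observation onwards.
default = s.ewm(alpha=0.5).mean()

# min_periods=3: the first two positions stay NaN, because fewer than
# three non-null values have been seen at that point.
with_min = s.ewm(alpha=0.5, min_periods=3).mean()

print(pd.DataFrame({"default": default, "min_periods=3": with_min}))
```

    Note that min_periods only masks the early outputs; the values that do appear are the same as in the default case.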

  • Point 4 : ignore_na

    ignore_na = False(default) weights are calculated based on absolute positions, so that intermediate null values affect the result.

    ignore_na = True weights are calculated by ignoring intermediate null values.

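    In either case, this applies within the given window. With False, NaN values are not ignored, and what changes is mainly the exponent of the weights: with False, the weight exponent is still incremented by 1 when a NaN is encountered, whereas with True the exponent is incremented only for non-NaN values.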
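
    The series [x0, NaN, x2] makes the difference concrete. In the sketch below (adjust=True, alpha=0.5 chosen arbitrarily), x0 gets weight (1 - alpha)^2 when the gap is counted and (1 - alpha) when it is skipped; the expected_* names are my own hand calculations.

```python
import numpy as np
import pandas as pd

alpha = 0.5
s = pd.Series([1.0, np.nan, 3.0])

# ignore_na=False: the NaN still advances the exponent, so x0 is two steps old.
keep_gap = s.ewm(alpha=alpha, ignore_na=False).mean()
expected_keep = ((1 - alpha) ** 2 * 1.0 + 3.0) / ((1 - alpha) ** 2 + 1.0)

# ignore_na=True: the NaN is skipped, so x0 is treated as one step old.
skip_gap = s.ewm(alpha=alpha, ignore_na=True).mean()
expected_skip = ((1 - alpha) * 1.0 + 3.0) / ((1 - alpha) + 1.0)

print(keep_gap.iloc[2], expected_keep)
print(skip_gap.iloc[2], expected_skip)
```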

  • Reference

  1. 3.5 Exponentially Weighted Windows
