Forecast Error Measures: Understanding Them Through Experiments

Getting Started

Measurement is the first step that leads to control and eventually improvement.

H. James Harrington

In many business applications, the ability to plan ahead is paramount, and in the majority of such scenarios we use forecasts to help us do so. For example, if I run a retail store, how many boxes of that shampoo should I order today? Look at the forecast. Will I achieve my financial targets by the end of the year? Let's forecast and make adjustments if necessary. If I run a bike rental firm, how many bikes do I need to keep at a metro station tomorrow at 4 pm?

If we are taking actions based on the forecast in all of these scenarios, we should also have an idea of how good those forecasts are. In classical statistics or machine learning, we have a few general loss functions, like the squared error or the absolute error. But because of the way time series forecasting has evolved, there are many more ways to assess performance.

In this blog post, let's explore the different forecast error measures through experiments and understand the drawbacks and advantages of each.

Metrics in Time Series Forecasting

There are a few key points that make the metrics in time series forecasting stand out from the regular metrics in machine learning.

1. Temporal Relevance

As the name suggests, time series forecasting has the temporal aspect built into it, and there are metrics, like Cumulative Forecast Error or Forecast Bias, that take this temporal aspect into account as well.

2. Aggregate Metrics

In most business use cases, we would not be forecasting a single time series but rather a set of time series, related or unrelated. Higher management would not want to look at each of these time series individually, but rather at an aggregate measure that tells them directionally how well we are doing the forecasting job. Even for practitioners, this aggregate measure helps them get an overall sense of the progress they are making in modelling.

3. Over or Under Forecasting

Another key aspect of forecasting is the concept of over- and under-forecasting. We would not want the forecasting model to have structural biases that make it always over- or under-forecast. To combat these, we want metrics that don't favor either over-forecasting or under-forecasting.

4. Interpretability

The final aspect is interpretability. Because these metrics are also used by non-analytics business functions, they need to be interpretable.

Because of these different use cases, there are a lot of metrics used in this space. Here we try to unify them under some structure and also examine them critically.

Taxonomy of Forecast Metrics

We can broadly classify the different forecast metrics into two buckets: intrinsic and extrinsic. Intrinsic measures use only the generated forecast and the ground truth to compute the metric. Extrinsic measures use an external reference forecast in addition to the generated forecast and the ground truth.

Let's stick with the intrinsic measures for now (the extrinsic ones require a whole different take on these metrics). There are four major ways in which we calculate errors: Absolute Error, Squared Error, Percent Error, and Symmetric Error. All the metrics that come under these are just different aggregations of these fundamental errors. So, without loss of generality, we can discuss these broad families, and the discussion applies to all the metrics under these heads as well.

Absolute Error

This group of error measures uses the absolute value of the error as the foundation.

Squared Error

Instead of taking the absolute value, we square the errors to make them positive, and this is the foundation for these metrics.

Percent Error

In this group of error measures, we scale the absolute error by the ground truth to convert it into a percentage term.

Symmetric Error

Symmetric Error was proposed as an alternative to Percent Error, where we take the average of the forecast and the ground truth as the base on which to scale the absolute error.
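The four error families can be written down in a few lines. What follows is a minimal sketch, not the code behind the original figures: the function names are made up for illustration, the symmetric error uses (|A| + |F|) / 2 as its base, and the handling of a zero base (infinity for percent error, zero for symmetric error when both values are zero) is a convention assumed here.

```python
import math


def absolute_error(actual, forecast):
    """|A - F|"""
    return abs(actual - forecast)


def squared_error(actual, forecast):
    """(A - F)^2"""
    return (actual - forecast) ** 2


def percent_error(actual, forecast):
    """|A - F| / |A|; undefined at A == 0, reported as inf here (assumption)."""
    if actual == 0:
        return math.inf
    return abs(actual - forecast) / abs(actual)


def symmetric_error(actual, forecast):
    """|A - F| / ((|A| + |F|) / 2); 0 when both are zero (assumption)."""
    base = (abs(actual) + abs(forecast)) / 2
    if base == 0:
        return 0.0
    return abs(actual - forecast) / base


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


# The familiar aggregate metrics are just means of the point-wise errors.
def mae(actuals, forecasts):
    return _mean(absolute_error(a, f) for a, f in zip(actuals, forecasts))


def rmse(actuals, forecasts):
    return math.sqrt(_mean(squared_error(a, f) for a, f in zip(actuals, forecasts)))


def mape(actuals, forecasts):
    return _mean(percent_error(a, f) for a, f in zip(actuals, forecasts))


def smape(actuals, forecasts):
    return _mean(symmetric_error(a, f) for a, f in zip(actuals, forecasts))


actuals = [2, 4, 8]
forecasts = [3, 2, 8]
for metric in (mae, rmse, mape, smape):
    print(metric.__name__, round(metric(actuals, forecasts), 4))
```

MAPE and sMAPE are returned as fractions here; multiply by 100 to report them as percentages.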

Experiments

Instead of just stating the drawbacks and advantages of each metric, let's design a few experiments and see for ourselves what they are.

Scale Dependency

In this experiment, we try to figure out the impact of the scale of the time series on aggregated measures. For this experiment, we:

  1. Generate 10000 synthetic time series at different scales, but with the same error.
  2. Split these series into 10 histogram bins.
  3. Set the sample size to 5000 and iterate over each bin.
  4. Sample 50% from the current bin and the rest, equally distributed, from the other bins.
  5. Calculate the aggregate measures on this set of time series.
  6. Record them against the lower edge of the bin.
  7. Plot the aggregate measures against the bin edges.
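The steps above can be sketched roughly as follows. This is not the code from the linked repository, and several details are assumptions made only for illustration: "same error" is read as the same relative error, series levels are drawn uniformly from 1 to 1000, each series has 12 points, sampling is done with replacement, and the aggregate is the mean of the per-series metric. Plotting is left out.

```python
import random


def run_scale_experiment(n_series=10_000, n_bins=10, sample_size=5_000,
                         length=12, rel_error=0.1, max_level=1000.0, seed=0):
    rng = random.Random(seed)

    # Step 1: synthetic series at different levels, same relative error (assumption).
    series = []  # one (level, MAE, MAPE) tuple per series
    for _ in range(n_series):
        level = rng.uniform(1.0, max_level)
        abs_errs, pct_errs = [], []
        for _ in range(length):
            actual = level * rng.uniform(0.9, 1.1)
            forecast = actual * (1 + rng.uniform(-rel_error, rel_error))
            abs_errs.append(abs(actual - forecast))
            pct_errs.append(abs(actual - forecast) / actual)
        series.append((level, sum(abs_errs) / length, sum(pct_errs) / length))

    # Step 2: equal-width histogram bins over the series level.
    width = (max_level - 1.0) / n_bins
    bins = [[] for _ in range(n_bins)]
    for item in series:
        idx = min(int((item[0] - 1.0) / width), n_bins - 1)
        bins[idx].append(item)

    # Steps 3-6: half the sample from the current bin, the rest spread
    # equally over the other bins; record aggregates against the lower edge.
    results = []
    n_current = sample_size // 2
    n_other = (sample_size - n_current) // (n_bins - 1)
    for i in range(n_bins):
        sample = rng.choices(bins[i], k=n_current)
        for j in range(n_bins):
            if j != i:
                sample += rng.choices(bins[j], k=n_other)
        lower_edge = 1.0 + i * width
        agg_mae = sum(s[1] for s in sample) / len(sample)
        agg_mape = sum(s[2] for s in sample) / len(sample)
        results.append((lower_edge, agg_mae, agg_mape))
    return results


for lower_edge, agg_mae, agg_mape in run_scale_experiment():
    print(f"bin >= {lower_edge:7.1f}  MAE = {agg_mae:6.2f}  MAPE = {agg_mape:.3f}")
```

Under these assumptions the aggregate MAE climbs as the sample is dominated by higher-level series, while the aggregate MAPE stays roughly flat.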

Symmetricity

The error measure should be symmetric in its inputs, i.e. the forecast and the ground truth. If we interchange the forecast and the actuals, the error metric should ideally return the same value.

To test this, let's make a grid of 0 to 10 for both actuals and forecast and calculate the error metrics on that grid.
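One way to run this check without a heatmap is to compare each metric at (actual, forecast) with the same metric at (forecast, actual) over the whole grid. The sketch below assumes an integer grid and the same zero-base conventions as before (infinity for percent error, zero for symmetric error); the helper names are illustrative.

```python
import math


def abs_err(a, f):
    return abs(a - f)


def sq_err(a, f):
    return (a - f) ** 2


def pct_err(a, f):
    # Undefined when the actual is zero; reported as inf here (assumption).
    return abs(a - f) / a if a != 0 else math.inf


def sym_err(a, f):
    base = (a + f) / 2
    return abs(a - f) / base if base != 0 else 0.0


METRICS = {
    "absolute": abs_err,
    "squared": sq_err,
    "percent": pct_err,
    "symmetric": sym_err,
}


def is_symmetric(metric, grid):
    """True if swapping actual and forecast never changes the metric."""
    return all(metric(a, f) == metric(f, a) for a in grid for f in grid)


grid = range(0, 11)  # 0 to 10 for both actuals and forecast
symmetry = {name: is_symmetric(fn, grid) for name, fn in METRICS.items()}
print(symmetry)
```

Only the percent error fails the swap test, because its base changes when the two inputs trade places.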

Complementary Pairs

In this experiment, we take complementary pairs of ground truths and forecasts that add up to a constant quantity and measure the performance at each point. Specifically, we use the same setup as in the Symmetricity experiment and calculate the points along the cross diagonal, where ground truth + forecast always adds up to 10.
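A minimal sketch of the complementary pairs, assuming integer points on the cross diagonal and the same illustrative helper functions as in the other sketches:

```python
import math


def pct_err(a, f):
    # Undefined when the actual is zero; reported as inf here (assumption).
    return abs(a - f) / a if a != 0 else math.inf


def sym_err(a, f):
    base = (a + f) / 2
    return abs(a - f) / base if base != 0 else 0.0


TOTAL = 10  # ground truth + forecast always adds up to this

# Points along the cross diagonal of the 0-10 grid.
pairs = [(a, TOTAL - a) for a in range(TOTAL + 1)]

absolute = {a: abs(a - f) for a, f in pairs}
percent = {a: pct_err(a, f) for a, f in pairs}
symmetric = {a: sym_err(a, f) for a, f in pairs}

for a, f in pairs:
    print(f"A={a:2d} F={f:2d}  abs={absolute[a]:2d}  "
          f"pct={percent[a]:.2f}  sym={symmetric[a]:.2f}")
```

Mirror-image pairs such as (2, 8) and (8, 2) share the same absolute and symmetric error, but their percent errors differ because the actual, which is the base, differs.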

Loss Curves

Our metrics depend on two entities: the forecast and the ground truth. If we fix one and vary the other using a symmetric range of errors (e.g. -10 to 10), we expect the metric to behave the same way on both sides of that range. In our experiment, we chose to fix the ground truth because, in reality, that is the fixed quantity, and we are measuring the forecast against it.
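As a rough sketch of this setup, we can fix the ground truth and sweep the error from -10 to 10. The fixed value of 10 is an assumption chosen so that the forecast stays non-negative; the post does not state which value was used.

```python
def pct_err(a, f):
    return abs(a - f) / a


def sym_err(a, f):
    return abs(a - f) / ((a + f) / 2)


ACTUAL = 10  # fixed ground truth (assumed value)


def loss_curve(metric, actual=ACTUAL, max_error=10):
    """Metric value for each error e, where forecast = actual + e."""
    return {e: metric(actual, actual + e)
            for e in range(-max_error, max_error + 1)}


pct_curve = loss_curve(pct_err)
sym_curve = loss_curve(sym_err)

for e in (-10, -5, 0, 5, 10):
    print(f"error={e:3d}  percent={pct_curve[e]:.2f}  symmetric={sym_curve[e]:.2f}")
```

With the ground truth held fixed, the percent error is identical for +e and -e, while the symmetric error is larger on the negative (under-forecast) side because its base shrinks along with the forecast.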

Over & Under Forecasting Experiment

In this experiment, we generate four random time series: ground truth, baseline forecast, low forecast, and high forecast. These are just random numbers generated within a range. The ground truth and the baseline forecast are random numbers between 2 and 4, the low forecast is a random number between 0 and 3, and the high forecast is a random number between 3 and 6. In this setup, the baseline forecast acts as a baseline for us, the low forecast is one where we continuously under-forecast, and the high forecast is one where we continuously over-forecast. We then calculate the MAPE for these three forecasts and repeat the experiment 1000 times.

[Figure 1: Over & Under Forecasting experiment]
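A minimal simulation of this setup could look like the following. The ranges and the 1000 repetitions come from the description above; the uniform distribution, the series length of 100, and the seed are assumptions made only for this sketch.

```python
import random


def mape(actuals, forecasts):
    return sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)


def run_experiment(n_runs=1000, length=100, seed=42):
    rng = random.Random(seed)
    results = {"baseline": [], "low": [], "high": []}
    for _ in range(n_runs):
        truth = [rng.uniform(2, 4) for _ in range(length)]
        forecasts = {
            "baseline": [rng.uniform(2, 4) for _ in range(length)],
            "low": [rng.uniform(0, 3) for _ in range(length)],   # under-forecast
            "high": [rng.uniform(3, 6) for _ in range(length)],  # over-forecast
        }
        for name, forecast in forecasts.items():
            results[name].append(mape(truth, forecast))
    return results


results = run_experiment()
for name, values in results.items():
    print(f"{name:8s} mean MAPE = {sum(values) / len(values):.3f}")
```

The same loop can be reused with MAE, RMSE, or sMAPE in place of `mape` to compare how each measure treats the low and high forecasts.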

Outlier Impact

To check the impact of outliers, we set up the experiment below.

We want to check the relative impact of outliers along two axes: the number of outliers and the scale of outliers. So we define a grid of the number of outliers [0%-40%] and the scale of outliers [0 to 2]. Then we pick a synthetic time series at random, iteratively introduce outliers according to the parameters of the grid, and record the error measures.
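The grid can be sketched as below. The post does not say how the outliers were injected, so this sketch assumes one particular scheme: a chosen fraction of forecast points is shifted upward by `scale` times the mean level of the series. The series itself (level 100 with Gaussian noise) and the grid steps are also illustrative assumptions.

```python
import math
import random


def metrics(actuals, forecasts):
    n = len(actuals)
    errs = [abs(a - f) for a, f in zip(actuals, forecasts)]
    return {
        "MAE": sum(errs) / n,
        "RMSE": math.sqrt(sum(e * e for e in errs) / n),
        "MAPE": sum(e / abs(a) for e, a in zip(errs, actuals)) / n,
        "sMAPE": sum(e / ((abs(a) + abs(f)) / 2)
                     for e, a, f in zip(errs, actuals, forecasts)) / n,
    }


def outlier_grid(length=200, level=100.0, seed=7):
    rng = random.Random(seed)
    actuals = [level + rng.gauss(0, 5) for _ in range(length)]
    clean = [a + rng.gauss(0, 5) for a in actuals]  # forecast without outliers
    mean_level = sum(actuals) / length

    grid = {}
    for fraction in (0.0, 0.1, 0.2, 0.3, 0.4):       # share of outlier points
        for scale in (0.0, 0.5, 1.0, 1.5, 2.0):      # size relative to the level
            forecast = list(clean)
            n_outliers = round(fraction * length)
            for i in rng.sample(range(length), n_outliers):
                forecast[i] += scale * mean_level    # assumed injection scheme
            grid[(fraction, scale)] = metrics(actuals, forecast)
    return grid


grid = outlier_grid()
for key in [(0.0, 0.0), (0.1, 2.0), (0.4, 1.0), (0.4, 2.0)]:
    print(key, {name: round(value, 3) for name, value in grid[key].items()})
```

Each cell of the grid holds all four aggregate measures, which is the data a contour plot per measure would be drawn from.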

Results and Discussion

Absolute Error

Symmetricity

[Figure 2: Absolute Error, symmetricity heatmap]

That's a nice symmetric heatmap. We see zero errors along the diagonal, and higher errors spreading away from it in a nice symmetric pattern.

Loss Curves

[Figure 3: Absolute Error, loss curve]

Again symmetric. MAE varies equally on both sides of the curve.

Complementary Pairs

[Figure 4: Absolute Error, complementary pairs]

Again good news. If we vary the forecast while keeping the actuals constant, and vice versa, the variation in the metric is also symmetric.

Over and Under Forecasting

[Figure 5: Absolute Error, over and under forecasting]

As expected, over- or under-forecasting doesn't make much of a difference in MAE. Both are penalized equally.

Scale Dependency

[Figure 6: Absolute Error, scale dependency]

This is the Achilles heel of MAE. Here, as we increase the base level of the time series, we can see that the MAE increases linearly. This means that when we are comparing performance across time series, this is not the measure we want to use. For example, when comparing two time series, one with a level of 5 and another with a level of 100, MAE would always assign a higher error to the time series with level 100. Another example is comparing different sub-sections of a set of time series to see where the error is higher (e.g. different product categories). MAE would always tell you that the sub-section with higher average sales also has a higher MAE, but that doesn't mean the forecasts for that sub-section are any worse.

Squared Error

Symmetricity

[Figure 7: Squared Error, symmetricity heatmap]

Squared Error also shows the symmetry we are looking for. But one additional point we can see here is that the errors are skewed towards higher errors. The distribution of color away from the diagonal is not as uniform as we saw for Absolute Error. This is because the squared error, because of the square term, assigns a higher impact to higher errors than to lower errors. This is also why squared errors are typically more prone to distortion due to outliers.

Side note: Since squared error and absolute error are also used as loss functions in many machine learning algorithms, this also has implications for the training of such algorithms. If we choose the squared error loss, we are less sensitive to smaller errors and more sensitive to larger ones. If we choose the absolute error, we penalize every error in proportion to its size, and therefore a single outlier will not influence the total loss as much.
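A tiny numeric example illustrates the side note. The four errors below are made up: three small errors and one outlier.

```python
# Illustrative errors: three small ones and a single outlier.
errors = [1.0, 1.0, 1.0, 10.0]

absolute_losses = [abs(e) for e in errors]
squared_losses = [e ** 2 for e in errors]

# Share of the total loss that comes from the outlier alone.
abs_share = absolute_losses[-1] / sum(absolute_losses)  # 10 / 13
sq_share = squared_losses[-1] / sum(squared_losses)     # 100 / 103

print(f"outlier share of absolute loss: {abs_share:.0%}")
print(f"outlier share of squared loss:  {sq_share:.0%}")
```

With the squared loss, the single outlier accounts for almost all of the total, so a model trained on it is pulled much harder toward fitting that one point.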

Loss Curves

[Figure 8: Squared Error, loss curve]

We can see the same pattern here as well. It is symmetric around the origin, but because of the quadratic form, higher errors are penalized disproportionately more than lower ones.

Complementary Pairs

[Figure 9: Squared Error, complementary pairs]

Over and Under Forecasting

[Figure 10: Squared Error, over and under forecasting]

Similar to MAE, because of the symmetry, over- and under-forecasting have pretty much the same impact.

Scale Dependency

[Figure 11: Squared Error, scale dependency]

Similar to MAE, RMSE also has the scale dependency problem, which means that all the disadvantages we discussed for MAE apply here as well, but worse. We can see that RMSE scales quadratically when we increase the scale.

Percent Error

Percent Error is the most popular error measure used in the industry. A couple of reasons why it is hugely popular are:

  1. Scale independence: As we saw in the scale dependency plots earlier, the MAPE line stays flat as we increase the scale of the time series.
  2. Interpretability: Since the error is represented as a percentage, a familiar and interpretable term, the error measure also instantly becomes interpretable. If we say the RMSE is 32, it doesn't mean anything in isolation. But if we say the MAPE is 20%, we instantly know how good or bad the forecast is.

Symmetricity

[Figure 12: Percent Error, symmetricity heatmap]

Now that doesn't look right, does it? Percent Error, the most popular of them all, doesn't look symmetric at all. In fact, we can see that the errors peak when the actuals are close to zero and tend to infinity when the actuals are zero (the colorless band at the bottom is where the error is infinite because of division by zero).

We can see two shortcomings of the percent error here:

  1. It is undefined when the ground truth is zero (because of division by zero).
  2. It assigns a higher error when the ground truth value is lower (top right corner).

Let's look at the Loss Curves and Complementary Pairs plots to understand more.

Loss Curves

[Figure 13: Percent Error, loss curve]

Suddenly, the asymmetry we were seeing is gone. If we keep the ground truth fixed, Percent Error is symmetric around the origin.

Complementary Pairs

[Figure 14: Percent Error, complementary pairs]

But when we look at complementary pairs, we see the asymmetry we saw earlier in the heatmap. When the actuals are low, the same error produces a much higher Percent Error than it does when the forecast is low.

All of this is because of the base we use for scaling. Even if we have the same magnitude of error, if the ground truth is low, the percent error will be high, and vice versa. For example, let's review two cases:

  1. F = 8, A = 2 -> Absolute Percent Error = |2 - 8| / 2 = 3
  2. F = 2, A = 8 -> Absolute Percent Error = |8 - 2| / 8 = 0.75

There are countless papers and blogs which claim that the asymmetry of percent error is a deal breaker. The popular claim is that absolute percent error penalizes over-forecasting more than under-forecasting, or in other words, that it incentivizes under-forecasting.

One argument against this point is that the asymmetry is only there because we change the ground truth. An error of 6 for a time series which has an expected value of 2 is much more serious than the same error of 6 for a time series which has an expected value of 8. So according to that intuition, the percent error is doing what it is supposed to do, isn't it?

Over and Under Forecasting

[Figure 15: Percent Error, over and under forecasting]

Not exactly. On some levels, the criticism of percent error is justified. Here we see that the forecast where we were under-forecasting has a consistently lower MAPE than the one where we were over-forecasting. The spread of the low forecast's MAPE is also considerably lower than that of the others. But does that mean that the forecast which always predicts on the lower side is the better forecast as far as the business is concerned? Absolutely not. In a supply chain, that leads to stock-outs, which is not where you want to be if you want to stay competitive in the market.

Symmetric Error

Symmetric Error was proposed as a better alternative to Percent Error. Percent Error has two key disadvantages: it is undefined when the ground truth is zero, and it is asymmetric. Symmetric Error proposes to solve both by using the average of the ground truth and the forecast as the base over which we calculate the percent error.

Symmetricity

[Figure 16: Symmetric Error, symmetricity heatmap]

Right off the bat, we can see that this is symmetric around the diagonal, much like Absolute Error. And the bottom bar, which was empty, now has colors (which means the values are no longer undefined). But a closer look reveals something more. It is not symmetric around the second diagonal. We see that the errors are higher when both actuals and forecast are low.

Loss Curves

[Figure 17: Symmetric Error, loss curve]

This is further evident in the Loss Curves. We can see the asymmetry as we increase errors on both sides of the origin. And contrary to its name, Symmetric Error penalizes under-forecasting more than over-forecasting.
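A small worked example makes this asymmetry concrete. Assume a ground truth of A = 10 (an illustrative value, not one taken from the experiment) and an error of 5 in either direction, with the average of the forecast and the ground truth as the base:

```latex
\begin{align*}
\text{under-forecast } (F = 5):\quad & \frac{|A - F|}{(A + F)/2} = \frac{5}{7.5} \approx 0.67 \\
\text{over-forecast } (F = 15):\quad & \frac{|A - F|}{(A + F)/2} = \frac{5}{12.5} = 0.40
\end{align*}
```

The same absolute error costs more when the forecast is too low, because a low forecast also shrinks the base.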

Complementary Pairs

[Figure 18: Symmetric Error, complementary pairs]

But when we look at complementary pairs, we can see that it is perfectly symmetrical. This is probably because the base is kept constant.

Over and Under Forecasting

[Figure 19: Symmetric Error, over and under forecasting]

We can see the same here as well. The over-forecasting series has a consistently lower error than the under-forecasting series. So in the effort to correct Percent Error's bias towards under-forecasting, Symmetric Error overshot the other way and is biased towards over-forecasting.

Outlier Impact

In addition to the above experiments, we also ran an experiment to check the impact of outliers (single predictions which are wildly off) on the aggregate metrics.

[Figure 20: Outlier impact contour maps]

All four error measures behave similarly when it comes to outliers. The number of outliers has a much higher impact than the scale of outliers.

Among the four, RMSE is impacted the most by outliers. We can see the contour lines are spaced far apart, showing the rate of change is high when we introduce outliers. On the other end of the spectrum, we have sMAPE, which is impacted the least by outliers. It is evident from the flat and closely spaced contour lines. MAE and MAPE behave almost the same, with MAPE probably a tad better.

Summary

[Figure 21: Summary of the error measures]

To close, there is no single metric that satisfies all the desiderata of an error measure, and depending on the use case, we need to pick and choose. Out of the four intrinsic measures (and all their aggregations like MAPE, MAE, etc.), if we are not concerned about interpretability and scale dependency, we should choose Absolute Error measures. When we are looking for scale-independent measures, Percent Error is the best we have (even with all of its shortcomings). Extrinsic error measures like Scaled Error offer a much better alternative in such cases. (Maybe I'll cover those in another blog post.)

Code to recreate Experiments

https://github.com/manujosephv/forecast_metrics/tree/master

Further Reading

  • Shcherbakov et al., 2013, A Survey of Forecast Error Measures
  • Goodwin & Lawton, 1999, On the asymmetry of symmetric MAPE

Edited (29-09-2020): Fixed a mislabeling issue in the contour map

Originally published at http://deep-and-shallow.com on September 26, 2020.

Translated from: https://towardsdatascience.com/forecast-error-measures-understanding-them-through-experiments-da7ddcb0b035