Thursday, April 30, 2020

Periodicity in UK Daily Deaths

When looking at the number of hospital-based deaths reported in the government briefing each night, it's not difficult to see a degree of periodicity. See the chart below (recycled from yesterday). Here we can observe a semi-regular pattern of five relatively higher days followed by two lower ones; these values, of course, correspond to the reporting of weekday and weekend fatalities. The obvious questions, then: can we build this into our forecast, and will it improve accuracy?

[Chart: daily reported UK hospital deaths, showing the weekday/weekend pattern]
The first step, then, is to look more closely at the data to see if a pattern emerges. Here we can again use the centred moving average (CMA) that I covered yesterday to benchmark each day's value and create a relative size index.

So our index for each day is defined as: index(t) = (daily deaths on date t) / (CMA on date t)
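As a minimal sketch of this step in Python/pandas (the series below is illustrative dummy data with a built-in weekend dip, not the actual reported figures):

```python
import numpy as np
import pandas as pd

# Illustrative data only: a flat trend of ~700 deaths/day with a dip at
# weekends, standing in for the real reported series.
dates = pd.date_range("2020-03-30", "2020-04-26")
rng = np.random.default_rng(0)
deaths = pd.Series(
    700 * np.where(dates.dayofweek >= 5, 0.80, 1.08)
    + rng.normal(0, 20, len(dates)),
    index=dates,
)

# 7-day centred moving average: the window covers the day itself plus
# three days either side.
cma = deaths.rolling(window=7, center=True).mean()

# Relative size index: each day's deaths benchmarked against the trend.
rel_index = deaths / cma
```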

Take the values created and place them in a table as below. Each value represents whether a particular day is higher or lower than expected: 1.00 means the value was as expected, 1.10 means 10% higher, 0.90 means 10% lower, and so on.

Then, by looking along each row, we can see whether the numbers reported on a particular day of the week take similar values from week to week, which would suggest an underlying cyclic pattern exists.
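Continuing the sketch above, a table like this can be built as a pivot of the index values (the column and variable names are mine):

```python
# Pivot the index values into a weekday-by-week table so each row can
# be scanned for consistently similar values.
df = rel_index.to_frame("rel_index")
df["weekday"] = df.index.day_name()
df["week"] = df.index.to_period("W-SUN")   # weeks running Monday-Sunday
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
table = (df.pivot_table(values="rel_index", index="weekday", columns="week")
           .reindex(order))
```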

[Table: weekly index values by day of the week, with 5-week means and standard deviations]
At this point the process involves a little judgement rather than blind calculation, since it is often valuable to inspect these values for changes over time or outliers that need to be addressed.

Firstly, the week beginning 16th March looks very different from the later weeks, so I will exclude it from my average calculation. Secondly, March 25th was very low at 0.37 owing to a change in the reporting process, so I have excluded this value, along with March 30th, which was low at 0.54. The remaining values look reasonable and consistent, giving the mean values in the last-but-one column. The final column shows the standard deviation of the weekly values for each day: the lower this number, the more consistent the values. At the moment Friday is the most consistent and Monday the least, but overall I am pleased with the results, which demonstrate a clear periodicity in the data.

The values in the 5-week means column indicate that, on average, the numbers reported on a Monday are only 0.75 times the underlying trend, whereas on a Saturday they are 1.21 times the trend.
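In the sketch, the exclusions and the per-weekday means and standard deviations might look like this (the specific dates mirror the ones described above; on the dummy data the filters are no-ops, but with the real series they would remove the early week and the two anomalous days):

```python
# Exclude the week beginning 16th March and the two outlier dates,
# then compute the mean index and its spread for each weekday.
outliers = [pd.Timestamp("2020-03-25"), pd.Timestamp("2020-03-30")]
keep = (
    (df["week"] != pd.Period("2020-03-16", freq="W-SUN"))
    & ~df.index.isin(outliers)
)
summary = (df[keep].groupby("weekday")["rel_index"]
             .agg(["mean", "std"])
             .reindex(order))
```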

So if we use these multipliers to modify our original prediction for daily deaths, will the results be better? The chart below, showing the actual numbers against the two predictions, looks better, but we should really evaluate this properly.
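Applying the adjustment is a one-liner on top of the sketch above (the flat trend-level forecast here is purely illustrative):

```python
# Scale a trend-level forecast by each weekday's mean index.
trend_forecast = pd.Series(600.0, index=pd.date_range("2020-05-01", periods=14))
factors = trend_forecast.index.day_name().map(summary["mean"])
adjusted_forecast = trend_forecast * factors
```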

[Chart: actual daily deaths against the original and weekday-adjusted predictions]
The temptation at this point might be to pick a single approach, generate a metric and make a decision, but there are several metrics we could choose from, each with its own strengths and weaknesses. For the sake of a few extra minutes, why not run a few different ones and see if a consensus emerges? (A code sketch covering all six follows the descriptions below.)

[Table: the six accuracy metrics for the original and weekday-adjusted forecasts]
The first metric is a simple correlation (function CORREL() in Excel) between actual and prediction, known as r. The correlation value is often more useful than the more commonly used r-squared measure, since it also tells us about the direction of the relationship, not just the strength of the match. The closer to +1 or -1 you get, the better the correlation. On this metric the newer method is more accurate.

R-squared is simply the value of r above, squared (function RSQ() in Excel). This again shows that the new method is better.

MAE is Mean Absolute Error, so it measures the average absolute distance between prediction and actual value. Taking the absolute value stops negative errors offsetting positive ones. The lower the value, the better, and the new method wins again.

MSE is Mean Squared Error. Similar to the method above, but the error (the distance between prediction and actual) is squared. This places a higher weight on big errors, so an error of 10 counts one hundred times more than an error of 1. This is a common approach, and the concept of squaring the error is indirectly included in other metrics such as correlation and r-squared. The lower the value of MSE, the better; the new approach wins again.

MAPE is Mean Absolute Percentage Error. This is useful if the data has a wide range of values (our sample is not too bad in this respect), since measuring each error as a percentage of the actual value stops errors on the largest values from dominating the accuracy measure. Again, lower is better and the new method wins again.

Last but not least is MSPE, Mean Squared Percentage Error: like MAPE, but penalising large percentage errors more heavily than small ones. Low values indicate a better forecast, so again the new method wins.
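For completeness, here is a small self-contained sketch of all six metrics in Python/pandas; the function name and the dictionary of results are my own framing, not from the original spreadsheet:

```python
import pandas as pd

def evaluate(actual: pd.Series, predicted: pd.Series) -> dict:
    """The six accuracy metrics discussed above."""
    err = predicted - actual
    pct_err = err / actual                 # errors as a share of actuals
    r = actual.corr(predicted)             # CORREL() in Excel
    return {
        "r": r,
        "r-squared": r ** 2,               # RSQ() in Excel
        "MAE": err.abs().mean(),           # mean absolute error
        "MSE": (err ** 2).mean(),          # mean squared error
        "MAPE": pct_err.abs().mean(),      # mean absolute percentage error
        "MSPE": (pct_err ** 2).mean(),     # mean squared percentage error
    }
```

Running this once for the original prediction and once for the weekday-adjusted one gives a side-by-side comparison like the table above.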

So, with six out of six metrics showing that adding these weekday weights improves the daily death predictions for the UK, this approach is clearly more accurate. I have therefore implemented it in the UK dashboard and will evaluate it for other markets going forward.

