English COVID Cases to Hospitalizations Throughput & Forecaster

Version 0.11 - June 20th 2021.

Usage Guidance

This page provides guidance for using the English COVID Cases to Hospitalizations Throughput & Forecaster Shiny app.


Main Panel

Main Plot: shows hospitalizations plotted against the cases recorded "My Lag" days earlier. So, if "My Lag" is 7, the plot shows hospitalizations against cases recorded one week earlier. Both cases and hospitalizations are plotted on a log scale, but the axis labels show values on the original (count) scale. The dotted black line is the linear regression fitted to the plotted points. If shown, the red lines trace how the "New cases used for forward prediction" translate, according to this model, into the hospitalizations expected "My Lag" days later.
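The regression behind the main plot can be sketched in Python as below. This is an illustration under stated assumptions, not the app's R code: it assumes natural logs (the base does not change the predictions), that each case count has already been paired with the hospitalization count "My Lag" days later, and the function names fit_log_log and predict_hosp are hypothetical.

```python
import math


def fit_log_log(cases, hosps):
    """Least-squares fit of log(hosp) = a + b * log(case).

    cases[i] is assumed to be the case count recorded "My Lag" days
    before hosps[i]; the pairing is done by the caller.
    """
    if len(cases) != len(hosps) or len(cases) < 2:
        raise ValueError("need at least two paired observations")
    x = [math.log(c) for c in cases]
    y = [math.log(h) for h in hosps]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope on the log scale
    a = my - b * mx        # intercept on the log scale
    return a, b


def predict_hosp(a, b, new_cases):
    """Push a case count through the fitted line and back to the count scale."""
    return math.exp(a + b * math.log(new_cases))
```

With hospitalizations exactly 5% of the cases a lag earlier, for example `fit_log_log([1000, 2000, 4000, 8000], [50, 100, 200, 400])`, the fit has slope 1 and `predict_hosp` maps 3,000 new cases to 150 hospitalizations (up to rounding).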

Text Panel: reports the following information about the current plot and model.

New cases used for forward prediction: the number of cases currently used for forward prediction. This value is controlled by the "Manual Newcases" option; by default, the mean of the most recent seven days of cases is used.
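A minimal sketch of the default "new cases" value, assuming daily case counts ordered oldest to newest with no missing days; seven_day_mean is a hypothetical helper name.

```python
def seven_day_mean(daily_cases):
    """Mean of the most recent seven daily case counts (list is oldest first)."""
    if len(daily_cases) < 7:
        raise ValueError("need at least seven days of data")
    return sum(daily_cases[-7:]) / 7
```

For example, `seven_day_mean([0, 10, 20, 30, 40, 50, 60, 70])` returns 40.0, because only the last seven values enter the mean.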

Predicted hospitalizations: the number of hospitalizations predicted "My Lag" days into the future, obtained by applying the current linear regression to the "new cases" number, together with an approximate 95% confidence interval. Note: in my view this confidence interval is usually too narrow, and other sources of uncertainty, such as model uncertainty, should also be properly taken into account.
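One standard way to form an approximate 95% interval on the log scale and transform it back to counts is sketched below. It is an assumption that the app works exactly this way: the sketch uses a normal quantile from statistics.NormalDist where R's predict.lm would use a t quantile with n - 2 degrees of freedom, and it gives a confidence interval for the fitted line rather than a wider prediction interval for a single future day. log_log_interval is a hypothetical name.

```python
import math
from statistics import NormalDist


def log_log_interval(cases, hosps, new_cases, level=0.95):
    """Return (lower, point, upper) predicted hospitalizations.

    Fits log(hosp) = a + b * log(case) by least squares, builds a
    normal-approximation interval for the fitted mean on the log scale
    at log(new_cases), then exponentiates all three values.
    """
    x = [math.log(c) for c in cases]
    y = [math.log(h) for h in hosps]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three pairs to estimate residual spread")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Residual standard deviation with n - 2 degrees of freedom.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    x0 = math.log(new_cases)
    # Standard error of the fitted mean at x0 (log scale).
    se = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    fit = a + b * x0
    return math.exp(fit - z * se), math.exp(fit), math.exp(fit + z * se)
```

Because the interval is symmetric on the log scale, it becomes asymmetric once transformed back to counts: the upper limit lies further from the point prediction than the lower one.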

Most recent hospitalization date: the most recent date for which hospitalization data are available from the NHS data dashboard.

Last 5 hospitalizations and corresponding cases: useful for understanding what data the app is using.

Number of days data in model: confirms the setting of the "Days of historical data" slider; the number of days is reduced by one for each day of positive lag.

Date range of data included: the dates corresponding to the "Days of historical data" and "Number of days to go back" sliders.

Slope and intercept: current linear regression parameter estimates.

Current Correlation: the cross-correlation between cases and hospitalizations at the chosen lag ("My Lag").

Correlations for different lags: a table. The time between becoming a "case" and being "hospitalized" (for those who are) varies from person to person. This table shows the cross-correlation between cases and hospitalizations (in a statistical sense, over all recorded pairs) at each of a range of lags. The idea is that you can choose the lag with the highest cross-correlation, which corresponds to the shift that best aligns the two series. In our experience this lag is often 7, but it can be 14 and, in principle, any number of days. The buttons at the bottom of the table allow the values to be copied, exported to a CSV or Excel file, saved as PDF, or printed.

Side Panel Options - buttons and sliders

Reset button: sets all parameters back to their starting values.

Days of Historical Data: the number of days of data to include in each regression. The number of days actually used is reduced by the lag (so if the lag is 7, seven fewer days are included).

Number of days to go back: chooses the period under study by setting the last date in the model, expressed as how many days in the past that "last date" lies. Together, the "Number of days to go back" and "Days of historical data" sliders select the time period under study.
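The interplay of the three sliders can be sketched as below. This is a hypothetical reconstruction, assuming the window holds "Days of historical data" consecutive days ending "Number of days to go back" days before the latest available date, and that pairing each hospitalization with cases "My Lag" days earlier leaves that many days minus the lag in the model; model_window is a hypothetical name.

```python
from datetime import date, timedelta


def model_window(latest, go_back, days_history, lag):
    """Return (first date, last date, number of case-hospitalization pairs).

    latest:       most recent date with data (assumed reference point)
    go_back:      "Number of days to go back"
    days_history: "Days of historical data"
    lag:          "My Lag"; each positive day of lag costs one pair
    """
    last = latest - timedelta(days=go_back)
    first = last - timedelta(days=days_history - 1)
    n_pairs = days_history - max(lag, 0)
    return first, last, n_pairs
```

For example, `model_window(date(2021, 6, 18), 0, 28, 7)` returns `(date(2021, 5, 22), date(2021, 6, 18), 21)`.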

My Lag: people who "get" COVID (cases) sometimes go on to develop symptoms and then, unfortunately, some become so ill that they are hospitalized. The time taken to go from being a case to being a hospitalized patient varies. This option sets the time between case and hospitalization to "My Lag" days. Naturally, not all cases adhere to this time frame, but the option is used to pick a likely value. Often, we choose "My Lag" to be the lag that maximises the cross-correlation between cases and hospitalizations; these cross-correlations are shown in the "Correlations for different lags" table.

Manual Newcases: if unticked (the default), the future number of hospitalizations is forward-predicted using the most recent seven-day mean of cases. This value is shown as "New cases used for forward prediction" in the first line of the Text Panel. If ticked, a text entry box appears in which you can enter your own number of cases. This lets you apply the case-hospitalization models to a number of cases of your choosing, to answer "what if" questions. For example, if the NHS Dashboard shows a sudden decrease in cases, this can be entered immediately as "Manual Newcases" to see how it might translate into new hospitalizations ("My Lag" days later).

Model Diagnostics: if ticked, the app displays the usual R lm (linear model) diagnostic plots. In addition, it plots the serial autocorrelation of the residuals and a histogram of the residuals, and shows a panel with two significance tests: the one-sample Wilcoxon signed-rank test of whether the residuals are centred at zero, and the Shapiro-Wilk test for normality of the residuals.
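The residual autocorrelation plot can be reproduced from the residuals with the usual sample autocorrelation (the same normalisation R's acf uses), together with the approximate 1.96/sqrt(n) bands such plots draw. The sketch below assumes that definition; residual_acf and acf_band are hypothetical names. In Python, scipy.stats.shapiro and scipy.stats.wilcoxon provide the two significance tests, but they are left out here to keep the sketch dependency-free.

```python
import math


def residual_acf(residuals, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a residual series.

    r_k = sum_t (e_t - m)(e_{t+k} - m) / sum_t (e_t - m)^2, with m the mean.
    """
    n = len(residuals)
    m = sum(residuals) / n
    dev = [e - m for e in residuals]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def acf_band(n, z=1.96):
    """Approximate 95% band for r_k if the residuals were white noise."""
    return z / math.sqrt(n)
```

Strongly alternating residuals such as `[1, -1, 1, -1, 1, -1]` give a lag-1 autocorrelation of -5/6, well outside the band, which would signal that the iid error assumption is doubtful.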

Multi-Slope Plot: if ticked, the regression is repeated over a collection of days in the past, which shows how the relationship between cases and hospitalizations has changed over time. Ticking it reveals a new tick box, three sliders and a new plot. The plot shows the effect of the changing regression over time (if it has changed), with the most recent time on the right and the longest ago on the left. At each chosen time point, the linear model itself (intercept/slope) is not shown directly. Instead, we show the result of applying the regression model estimated for that time point to a number of cases. If "Manual Newcases" is set, that number of cases is used at every time point. If it is not set (the default), then, as with the main regression, the seven-day case mean that was most recent at that point in the past is used.

The resulting number of hospitalizations is then expressed per 100 cases. So, if the dot sits vertically at 5 and the lag is 7, we can say that the average throughput from cases to hospitalizations a week later, at that time (the dot's horizontal location), is 5%. This way of expressing the throughput is useful for estimating what actually happened, because the cases used reflect the number of cases around at the time.
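Under the log-scale fit, the throughput for a case count C is 100 × exp(a + b log C) / C, where a and b are the intercept and slope estimated for that time point. A minimal sketch, assuming natural logs; throughput_per_100 is a hypothetical name. It also shows why the choice of case count matters: unless the slope is exactly 1, the throughput changes with the case level.

```python
import math


def throughput_per_100(a, b, cases):
    """Predicted hospitalizations per 100 cases under log(H) = a + b * log(C)."""
    predicted = math.exp(a + b * math.log(cases))
    return 100 * predicted / cases
```

With `a = math.log(0.05)` and `b = 1.0` the throughput is 5 (that is, 5%) at any case level; with a slope of 0.9 it is lower at 10,000 cases than at 1,000.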

Options for Multi-Slope Plot

Quadratic Regression: if unticked, a linear regression is drawn as a red dashed line through the case-hospitalization throughput dots, and its slope appears in the legend. If ticked, a quadratic is drawn instead, and the slope and quadratic coefficient appear in the legend. Note: these regressions are simple least-squares fits, intended only to gauge the likely evolution of the throughput points. We make no claims about the statistical distribution of the points and offer no formal inference, such as confidence bands.
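The linear and quadratic trend lines are ordinary least-squares polynomial fits of the throughput dots against time. The dependency-free sketch below solves the normal equations directly, which is adequate for a handful of points (numpy.polyfit would be the usual choice); polyfit_ls is a hypothetical name, and how the app scales its time axis is not stated here.

```python
def polyfit_ls(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] for y ~ sum c_j x^j."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, with X[i][j] = xs[i] ** j.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef
```

Fitting degree 2 to points lying on 1 + 2x + 0.5x² recovers `[1.0, 2.0, 0.5]` up to rounding; degree 1 gives the intercept and slope of the linear trend.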

Go Back Start, End and Interval: choose, respectively, the most recent day in the past, the furthest day in the past, and the interval between the days at which the regressions are calculated. The main regression (above) is recalculated for each of these Go Back values.
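The multi-slope loop might look like the sketch below. Everything here is an assumption about the app's mechanics: daily series ordered oldest first and covering the same dates, a window of days_history days ending go_back days before the latest day, hospitalizations paired with cases lag days earlier inside the window, and the trailing seven-day case mean at the window's last day used unless a manual case count is supplied. multi_slope is a hypothetical name.

```python
import math


def multi_slope(cases, hosps, lag, days_history, start, end, interval,
                manual_cases=None):
    """Return [(go_back, throughput per 100 cases)] for each Go Back value."""
    n = len(cases)
    if days_history - lag < 2 or days_history < 7:
        raise ValueError("window too short for a regression and a 7-day mean")
    out = []
    for go_back in range(start, end + 1, interval):
        last = n - 1 - go_back               # index of the window's last day
        first = last - days_history + 1      # index of the window's first day
        if first < 0:
            break                            # not enough history this far back
        days = range(first + lag, last + 1)  # days_history - lag paired days
        x = [math.log(cases[t - lag]) for t in days]
        y = [math.log(hosps[t]) for t in days]
        k = len(x)
        mx, my = sum(x) / k, sum(y) / k
        b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
        a = my - b * mx
        # Case count pushed through this window's fit.
        if manual_cases is not None:
            c = manual_cases
        else:
            c = sum(cases[last - 6:last + 1]) / 7
        out.append((go_back, 100 * math.exp(a + b * math.log(c)) / c))
    return out
```

Data in which hospitalizations are always 5% of cases a week earlier give a flat line of dots at 5, whatever the Go Back values.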

Caveats

The modelling used here is simple. The key regression is a simple intercept-and-slope model with Gaussian iid errors. Model diagnostic plots and the output of related tests are provided, but the app is not configured to run robust regressions, nor models that might be more suitable when the data do not suggest a linear form. No allowance is made for model uncertainty, and the model works according to frequentist principles.
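Written out, the key regression is as below, assuming, as the log-scale main plot suggests, that both counts enter on the log scale. The symbols are introduced here for illustration: H_t is hospitalizations on day t, C_{t-L} the cases L = "My Lag" days earlier, and C* the "New cases used for forward prediction". The last line is the multi-slope throughput, which depends on C* unless the slope is 1.

```latex
\begin{aligned}
\log H_t &= \alpha + \beta \log C_{t-L} + \varepsilon_t,
  \qquad \varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2), \\
\widehat{H} &= \exp\bigl(\hat\alpha + \hat\beta \log C^{*}\bigr)
  = e^{\hat\alpha}\,(C^{*})^{\hat\beta}, \\
\frac{100\,\widehat{H}}{C^{*}} &= 100\, e^{\hat\alpha}\,(C^{*})^{\hat\beta - 1}.
\end{aligned}
```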

For individuals, the time from being identified as a case to being identified as a hospitalization is likely to vary quite considerably. The "My Lag" quantity is a data-driven concept meant to quantify the "mean" lag. Better models would include the day-to-day variability of the lag and the probabilities associated with it. A better modelling paradigm might be a time series transfer function model.
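One way to represent that variability is a distributed lag, in which each day's hospitalizations draw on cases from several earlier days with different weights; a single "My Lag" is the special case with all the weight on one day. The sketch below only computes expected admissions for given weights. The weights are hypothetical: in a transfer function model they would be estimated from the data, which this sketch does not attempt. expected_admissions is a hypothetical name.

```python
def expected_admissions(cases, lag_weights):
    """Expected hospitalizations H_t = sum_k w_k * C_{t-k}.

    cases:          daily case counts, oldest first
    lag_weights[k]: hypothetical fraction of day t-k cases admitted on day t
    Returns values for days K-1 .. len(cases)-1, where K = len(lag_weights).
    """
    K = len(lag_weights)
    return [sum(w * cases[t - k] for k, w in enumerate(lag_weights))
            for t in range(K - 1, len(cases))]
```

With 100 cases a day and weights `[0, 0.01, 0.02, 0.01]`, each day yields 4 expected admissions; weights `[0] * 7 + [0.05]` reproduce a single seven-day lag with 5% throughput.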

By default, the mean of the last seven days' cases is used for "prediction". A forward prediction can therefore be rather inaccurate, especially when cases are changing rapidly (either up or down).
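The size of the problem is easy to see with steadily changing cases: the trailing seven-day mean then lags the latest daily count. A small illustration under an assumed constant daily growth rate; trailing_mean_ratio is a hypothetical name.

```python
def trailing_mean_ratio(daily_growth, days=7):
    """Trailing `days`-day mean divided by the latest day's count,
    for counts that change by a constant factor (1 + daily_growth) per day."""
    counts = [(1 + daily_growth) ** t for t in range(days)]
    return (sum(counts) / days) / counts[-1]
```

At 10% daily growth the ratio is about 0.765, so the default "new cases" figure understates the latest day's cases by almost a quarter; at a 10% daily decline the ratio is about 1.4, so it overstates them.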

The multi-slope plot just shows the likely effect of the case-hospitalization throughput, using the approximate number of cases at the time. We do not claim that these throughput rates follow any particular distribution, and the regression through them is merely a least-squares fit that gives an idea of the slope and intercept.

Packages used

DT: A Wrapper of the JavaScript Library 'DataTables'

httr: Tools for Working with URLs and HTTP

jsonlite: A simple and robust JSON parser and generator for R

R: What is R?

shiny: Web Application Framework for R

zoo: S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)

Shiny App Code

User Interface

Server Code

Subsidiary Routines

Change Log

xxth Month, version x.x: Placeholder

© Guy Nason 2021
