Socioeconomic and environmental determinants of COVID-19 in England

Individual Coursework Assignment

Date: 2021.02

Brief Description

The COVID-19 cumulative death and positive-case data used in this study were extracted from the UK Government (2021).

Many factors have been conjectured to be connected to the spread of COVID-19 (Bherwani et al. 2020, Hu et al. 2020). Recent literature found pollution to be correlated with COVID cases (Bashir et al. 2020). Other proposed factors include socioeconomic characteristics, health status, age structure, accessibility and transport geography (Cartenì et al. 2021, Ji et al. 2020, Kulu and Dorey 2020). In this study, we widen the selection by including the gender ratio and ethnic group shares from the census dataset. To handle such a large number of candidate variables, regression-based exploratory analysis, such as Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR), is well suited to exploring the relationships between the candidate factors and COVID statistics (Charlton 2020).

Methodology

The first step is to detect the spatial patterns of infection and death. Hot spot analysis (Getis-Ord G statistics) helps us identify clusters of high or low infection and death rates. Anselin local Moran's I is a local statistic that measures the strength of the spatial pattern around each feature and identifies unusual areas whose values are higher or lower than those of their surroundings.
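
As an illustration of the two statistics named above, the following is a minimal Python sketch, not the tool used in this study. The function names, the five-zone toy rates and the binary chain-contiguity weights are assumptions made for the example. The Gi* function follows the commonly used standardised definition, and the local Moran's I is scaled by the second moment of the deviations from the mean.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score per zone (positive = hot spot, negative = cold spot).

    weights[i][j] is the spatial weight between zones i and j. For Gi*, each
    zone counts as its own neighbour, so weights[i][i] should be non-zero.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        numerator = sum(w * v for w, v in zip(weights[i], values)) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


def local_morans_i(values, weights):
    """Return Anselin's local Moran's I per zone (self-weights are ignored)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    result = []
    for i in range(n):
        # Spatial lag: weighted sum of the neighbours' deviations.
        lag = sum(weights[i][j] * z[j] for j in range(n) if j != i)
        result.append(z[i] * lag / m2)
    return result


# Hypothetical rates for five zones on a line (assumption, not study data).
rates = [10.0, 9.0, 5.0, 1.0, 2.0]
# Binary weights: a zone neighbours itself and the zones directly beside it.
chain = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]

gi_star = getis_ord_gi_star(rates, chain)
local_i = local_morans_i(rates, chain)
```

Positive Gi* scores mark clusters of high values and negative scores mark clusters of low values. A positive local Moran's I marks a zone that resembles its neighbours, while a negative value flags a potential outlier.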

Then we conducted regression analysis. Multiple regression modelling is widely used in geographical enquiry. In the global OLS model, a single set of parameters is estimated for all zones:

At zone \(i\), \[y_i = \beta_0+\sum_{j=1}^m \beta_j x_{ij}+\epsilon_i, \quad i=1, 2, \ldots, n\] where:

  • \(y_i\) is the dependent variable, extracted from COVID statistics;
  • \(x_{ij}\) is the value of the \(j\)th independent variable at zone \(i\);
  • \(\beta_0\) is the y-intercept;
  • \(\beta_j\) is the parameter to be estimated for the \(j\)th independent variable;
  • \(\epsilon_i\) is a random error term, assumed to be normally distributed.
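
The global regression above can be illustrated for the single-predictor case (m = 1), which has a closed-form least-squares solution. This is a minimal sketch under that simplifying assumption; the function name `fit_ols` and the toy numbers are hypothetical and do not come from the study data.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared) for a single predictor.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R2 compares the residual variation with the total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical candidate variable and response (assumption, not study data).
candidate = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]
intercept, slope, r_squared = fit_ols(candidate, response)
```

With several predictors the same idea applies, but the parameters are obtained by solving the normal equations rather than from this closed form.
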
In this study, we extended the regression analysis with GWR. GWR allows the parameters to be derived for each location separately based on geographic context:

At zone \(i\), \[y_i = \beta_{i0}+\sum_{j=1}^m \beta_{ij}x_{ij}+\epsilon_i, \quad i=1, 2, \ldots, n\] where:

  • \(y_i\) is the dependent variable, extracted from COVID statistics;
  • \(x_{ij}\) is the value of the \(j\)th independent variable at zone \(i\);
  • \(\beta_{i0}\) is the local y-intercept at zone \(i\);
  • \(\beta_{ij}\) is the local parameter to be estimated for the \(j\)th independent variable at zone \(i\);
  • \(\epsilon_i\) is a random error term, assumed to be normally distributed.
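
To make the idea of location-specific parameters concrete, the sketch below fits a separate weighted regression at every zone, giving nearby zones more weight than distant ones. It is a simplified illustration rather than the software used in this study: it assumes a single predictor, a Gaussian distance kernel and a fixed bandwidth, and the function name, coordinates and values are all hypothetical.

```python
import math


def gwr_single_predictor(coords, x, y, bandwidth):
    """Return a local (intercept, slope) pair for every zone.

    For zone i, each observation k is weighted by a Gaussian kernel of its
    distance to zone i, then a weighted least-squares line is fitted.
    """
    n = len(x)
    local_params = []
    for i in range(n):
        w = [
            math.exp(-0.5 * (math.dist(coords[i], coords[k]) / bandwidth) ** 2)
            for k in range(n)
        ]
        w_sum = sum(w)
        x_bar = sum(wk * xk for wk, xk in zip(w, x)) / w_sum
        y_bar = sum(wk * yk for wk, yk in zip(w, y)) / w_sum
        sxx = sum(wk * (xk - x_bar) ** 2 for wk, xk in zip(w, x))
        sxy = sum(wk * (xk - x_bar) * (yk - y_bar) for wk, xk, yk in zip(w, x, y))
        slope = sxy / sxx
        local_params.append((y_bar - slope * x_bar, slope))
    return local_params


# Two hypothetical clusters of zones: the relationship is y = x in the west
# and y = 3x in the east (assumption, not study data).
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
predictor = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
outcome = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
local_fit = gwr_single_predictor(coords, predictor, outcome, bandwidth=1.0)
```

In this toy case the local slopes are close to 1 in the western cluster and close to 3 in the eastern one, whereas a single global line would blur the two. In practice the bandwidth is usually chosen by a criterion such as AICc or cross-validation rather than fixed by hand.
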
Results and Discussion

The graph suggests that, for both death rate and positive rate, there is a clear high concentration in the northwestern part of England and a low concentration in the southwestern part. For positive rate, however, the southeastern part also shows a clear cluster of high values.

According to fig 4 (a), the unusually low areas of death rate (dark blue areas) are mainly located in the northwestern part of England, while the unusually high areas lie around the middle southern part. For infection rate, the unusually low areas lie in both the northwest and the southeast and cover a broader extent than for death rate, but few other unusual areas appear (see fig 4 (b)).

OLS: Multivariate analysis

We began with univariate OLS analysis of COVID positive and death rates against each candidate variable. The graphical relations between the COVID variables and the candidate independent variables are shown for each theme (see fig 5 for the census; fig 6 for environmental deprivation; fig 7 for air pollution).

Table 1 reports the coefficients and model performance (R2). It shows that the models of COVID death perform poorly and have weak coefficients (R2 < 0.2). Further analysis therefore includes only the qualified models with positive rate as the dependent variable (grey cells in table 1).

The map of standardised residuals shows the spatial pattern of over-prediction (negative residuals, blue tones) and under-prediction (positive residuals, red tones) (see fig 8). This pattern indicates that further spatial autocorrelation analysis is needed. Given that the global Moran's I is 0.319, the z-score is greater than 2.58 and the p-value is less than 0.01, this clustered pattern of residuals is highly unlikely to be the result of random chance.
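
The residual check described above can be sketched as follows. This is a hedged illustration only: it computes global Moran's I directly from its definition and, instead of the analytical z-score reported in the text, uses a random permutation test to judge significance. The function names, the six toy residuals and the chain-contiguity weights are assumptions for the example.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for values observed on zones with weights[i][j]."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(values, weights, permutations=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    return (at_least + 1) / (permutations + 1)


# Hypothetical residuals for six zones on a line: negative residuals sit
# together at one end and positive ones at the other (not study data).
residuals = [-4.0, -3.0, -2.0, 2.0, 3.0, 4.0]
neighbours = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]

i_observed = morans_i(residuals, neighbours)
p_value = permutation_p_value(residuals, neighbours)
```

A clearly positive index together with a small pseudo p-value points to clustered residuals, which is the situation that motivates moving from a global model to a local one.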

GWR: Multivariate analysis

Table 3 shows the global Moran's I indices of the potential variables for the GWR model, used to assess their spatial autocorrelation. All variables were positively spatially autocorrelated and statistically significant (p<0.01).

To retain comparability between the global and local regression models, the same explanatory variables specified for GWR were used in the OLS model. The following diagnostic measures were examined as measures of goodness of fit: AICc, the estimated standard deviation of the residuals (see fig 9 (a)) and the global adjusted R2. The AICc analysis revealed that the GWR model fits the observations better than the OLS model. This was supported by the global R2, which showed that the share of variation in infection rate explained rose to 75.6% under the GWR model, compared with 61% under the OLS model (see table 4).
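
The AICc comparison can be sketched as below. This is an illustrative assumption rather than the exact computation behind table 4: it uses the corrected Akaike form commonly given for GWR with Gaussian errors, in which the penalty depends on the effective number of parameters (the trace of the hat matrix). The function name and all input numbers are hypothetical.

```python
import math


def gaussian_aicc(rss, n, effective_params):
    """Corrected AIC for a Gaussian regression.

    rss is the residual sum of squares, n the number of zones and
    effective_params the effective number of parameters (trace of the
    hat matrix; the plain parameter count for a global OLS model).
    """
    sigma_sq = rss / n  # maximum-likelihood estimate of the error variance
    return (
        n * math.log(sigma_sq)
        + n * math.log(2 * math.pi)
        + n * (n + effective_params) / (n - 2 - effective_params)
    )


# Hypothetical fits on 300 zones (assumption, not the values in table 4):
# the local model has a smaller residual sum of squares but uses many more
# effective parameters.
n_zones = 300
aicc_global = gaussian_aicc(rss=120.0, n=n_zones, effective_params=6.0)
aicc_local = gaussian_aicc(rss=70.0, n=n_zones, effective_params=40.0)
```

The model with the lower AICc is preferred. Because the penalty grows with the effective number of parameters, a local model only wins when its gain in fit outweighs its extra flexibility.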

The Moran's I index of the GWR residuals showed that the GWR model is significantly better specified than the OLS model. After the implementation of GWR, the statistically significant tendency for similar residuals to cluster was largely weakened (see table 4).

Local models provide a different view: in the global model, non-white ethnicity displays a positive correlation with positive rate, while this is reversed in most local authorities in the local GWR model (see fig 9 (c)). The variables of age, household size and health show spatially similar coefficient patterns (see fig 9 (b), (d) and (e)). For air pollution (PM 2.5), clearly positive coefficients cover most parts of England (see fig 9 (f)).

Discussion

In this study, we compiled 31 candidate variables that could explain the spatial pattern of COVID-19 death and infection rates at the local authority level in England. These variables were drawn from three themes, namely census, environmental deprivation and air pollution. Individual variables or combinations of them were used to model the geographic distribution of COVID-19 attributes.

Comparing the spatial results of the OLS and GWR techniques, the latter performs better in explaining local correlation and recognising non-stationary characteristics. The multiple regression models suggest that age structure, household size, health condition and air pollution are positively correlated with infection rate both locally and globally. Economic variables and crime are more strongly correlated with positive rate globally than locally. The non-white ethnicity rate leads to opposite conclusions at the global and local levels. Hence, the government should raise awareness of protection among the white ethnic group, especially among those who live in the middle part of England.

As Bambra et al. (2020) noted, an emerging pandemic may bring more severe socioeconomic disadvantage and inequality. According to our results, this is also likely to bring a higher possibility of infection, leading to worse outcomes from COVID-19 in more disadvantaged areas and groups. If policymakers do not introduce targeted interventions, prevention measures and policies, every area risks being drawn into a vicious circle.

The study has some limitations to be addressed in future work. For instance, one of the main data sources is the 2011 census. As the UK government has started the 2021 census (Office for National Statistics 2021), a more current data source should allow a more precise model. In addition, the spread of COVID is a dynamic process over time. Future studies could investigate how to model the correlation using an extension of the GWR technique, geographical and temporal weighted regression.