Content

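We calculated the mean and standard deviation (sd) in 110 m and 250 m windows for the following variables: canopy_height, dtm_10m, ns_groundwater_summer_utm32_10m and vegetation_density.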

Here is how those measures are correlated with their focal variables:

canopy_height
           cell_10m  mean_110m  mean_250m  sd_110m  sd_250m
cell_10m      +1.00      +0.88      +0.80    +0.40    +0.56
mean_110m                +1.00      +0.95    +0.30    +0.53
mean_250m                           +1.00    +0.28    +0.42
sd_110m                                      +1.00    +0.81
sd_250m                                               +1.00

dtm_10m
           cell_10m  mean_110m  mean_250m  sd_110m  sd_250m
cell_10m      +1.00      +1.00      +1.00    +0.30    +0.32
mean_110m                +1.00      +1.00    +0.30    +0.32
mean_250m                           +1.00    +0.30    +0.33
sd_110m                                      +1.00    +0.92
sd_250m                                               +1.00

ns_groundwater_summer_utm32_10m
           cell_10m  mean_110m  mean_250m  sd_110m  sd_250m
cell_10m      +1.00      +0.96      +0.91    +0.31    +0.35
mean_110m                +1.00      +0.97    +0.33    +0.38
mean_250m                           +1.00    +0.36    +0.41
sd_110m                                      +1.00    +0.87
sd_250m                                               +1.00

vegetation_density
           cell_10m  mean_110m  mean_250m  sd_110m  sd_250m
cell_10m      +1.00      +0.82      +0.71    +0.17    +0.29
mean_110m                +1.00      +0.93    -0.03    +0.16
mean_250m                           +1.00    -0.07    -0.00
sd_110m                                      +1.00    +0.76
sd_250m                                               +1.00
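
The window statistics and their correlations with the cell values can be illustrated with a short sketch. This is a minimal pure-Python example on a random toy raster, not the code used to produce the tables above. The function names (focal_stats, pearson, flatten), the square centred window, the truncation of windows at the raster edge and the use of the population sd are all assumptions. With 10 m cells, a 110 m window is taken to span 11 cells and a 250 m window 25 cells.

```python
import math
import random


def focal_stats(grid, window_cells):
    """Return (mean, sd) grids for a square window centred on each cell.

    Assumptions: the window is truncated at the raster edge, and the sd is
    the population sd (divide by n) of the cells inside the window.
    """
    half = window_cells // 2
    n_rows, n_cols = len(grid), len(grid[0])
    means = [[0.0] * n_cols for _ in range(n_rows)]
    sds = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            values = [
                grid[i][j]
                for i in range(max(0, r - half), min(n_rows, r + half + 1))
                for j in range(max(0, c - half), min(n_cols, c + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            means[r][c] = mean
            sds[r][c] = math.sqrt(variance)
    return means, sds


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flatten(grid):
    """Turn a 2-D grid (list of rows) into one flat list of cell values."""
    return [value for row in grid for value in row]


# Toy 30 x 30 raster of random values standing in for a 10 m layer.
random.seed(1)
cell_10m = [[random.random() for _ in range(30)] for _ in range(30)]

layers = {"cell_10m": cell_10m}
for label, cells in (("110m", 11), ("250m", 25)):  # window width in 10 m cells
    mean_grid, sd_grid = focal_stats(cell_10m, cells)
    layers[f"mean_{label}"] = mean_grid
    layers[f"sd_{label}"] = sd_grid

# Correlation of every layer with the cell values (first row of a table).
for name, grid in layers.items():
    print(f"{name:10s} {pearson(flatten(cell_10m), flatten(grid)):+.2f}")
```

On real rasters a vectorised moving-window filter would be far faster; the nested loops are only meant to make the window definition explicit.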

Variance Inflation Factors

To reduce the number of features systematically, we calculate variance inflation factors (VIFs). A VIF above 5 is commonly taken to indicate that a variable is strongly collinear with the other variables in the dataset. A more conservative rule is to keep only variables with VIFs below 2.5.
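
For reference, the VIF of variable j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing variable j on all other variables. Equivalently, it is the j-th diagonal element of the inverse of the correlation matrix. The sketch below uses that identity in pure Python on toy data; the function names (pearson, invert, vif) and the toy columns are assumptions, and this is not the code used for the analysis here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        list(row) + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


def vif(columns):
    """Return {name: VIF} for a dict mapping variable names to value lists."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Toy data: x and y are correlated (r = 0.8), z is uncorrelated with both.
toy = {
    "x": [1, 2, 3, 4],
    "y": [1, 3, 2, 4],
    "z": [1, -1, -1, 1],
}
for name, value in vif(toy).items():
    print(f"{name} {value:.2f}")
# x and y come out at about 2.78 (= 1 / (1 - 0.8**2)); z at 1.00.
```

A variable that is uncorrelated with all others has a VIF of 1, which is why thresholds such as 2.5 or 5 are read as multiples of the variance that an independent predictor would have.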

Here we carry out a stepwise selection based on the VIFs and the correlation tables above, treating VIFs above 5 as problematic.

1) All variables

Variables                            VIF
canopy_height                       6.66 *
canopy_height_mean_110m            33.43 *
canopy_height_mean_250m            25.65 *
canopy_height_sd_110m               6.12 *
canopy_height_sd_250m               8.93 *
dtm_10m                           618.45 *
dtm_10m_mean_110m                1688.75 *
dtm_10m_mean_250m                 511.75 *
dtm_10m_sd_110m                     8.71 *
dtm_10m_sd_250m                     8.92 *
ns_groundwater_summer_mean_110m    61.96 *
ns_groundwater_summer_mean_250m    28.76 *
ns_groundwater_summer_sd_110m       5.14 *
ns_groundwater_summer_sd_250m       5.27 *
ns_groundwater_summer_utm32_10m    19.16 *
vegetation_density                  4.54
vegetation_density_mean_110m       23.77 *
vegetation_density_mean_250m       19.87 *
vegetation_density_sd_110m          5.33 *
vegetation_density_sd_250m          6.82 *
* VIF exceeds 5

The mean variables introduce the most collinearity: their VIFs range from 19.87 to 1688.75, and they correlate strongly with the 10 m cell values (r = 0.71 to 1.00 in the correlation tables above). We drop them first.

2) Drop mean variables

Variables                            VIF
canopy_height                       2.76
canopy_height_sd_110m               5.84 *
canopy_height_sd_250m               7.42 *
dtm_10m                             1.26
dtm_10m_sd_110m                     7.76 *
dtm_10m_sd_250m                     7.76 *
ns_groundwater_summer_sd_110m       5.09 *
ns_groundwater_summer_sd_250m       5.27 *
ns_groundwater_summer_utm32_10m     1.38
vegetation_density                  1.72
vegetation_density_sd_110m          4.79
vegetation_density_sd_250m          5.06 *
* VIF exceeds 5

The sd variables of the two window sizes are highly correlated with each other (r = 0.76 to 0.92). The correlation tables above show that the 110 m sd values are slightly less correlated with the 10 m cell values than the 250 m sd values, so we drop the 250 m windows next.

3) Drop 250 m variables

Variables                            VIF
canopy_height                       2.14
canopy_height_sd_110m               2.54
dtm_10m                             1.25
dtm_10m_sd_110m                     1.77
ns_groundwater_summer_sd_110m       1.50
ns_groundwater_summer_utm32_10m     1.39
vegetation_density                  1.70
vegetation_density_sd_110m          2.19

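The final set of variables includes only the 10 m cell values and the sd calculated for the 110 m windows. All remaining VIFs are at or below 2.54, well under the threshold of 5 and close to the conservative threshold of 2.5.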
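
The two drop steps amount to filtering the variable names by window statistic and window size. The sketch below reproduces the three variable sets from the names listed in step 1; the helper drop_matching and the substring-based matching are assumptions made for illustration, and in practice the VIFs would be recalculated after each step, as was done above.

```python
# Variable names as listed in the "All variables" table (step 1).
ALL_VARIABLES = [
    "canopy_height",
    "canopy_height_mean_110m",
    "canopy_height_mean_250m",
    "canopy_height_sd_110m",
    "canopy_height_sd_250m",
    "dtm_10m",
    "dtm_10m_mean_110m",
    "dtm_10m_mean_250m",
    "dtm_10m_sd_110m",
    "dtm_10m_sd_250m",
    "ns_groundwater_summer_mean_110m",
    "ns_groundwater_summer_mean_250m",
    "ns_groundwater_summer_sd_110m",
    "ns_groundwater_summer_sd_250m",
    "ns_groundwater_summer_utm32_10m",
    "vegetation_density",
    "vegetation_density_mean_110m",
    "vegetation_density_mean_250m",
    "vegetation_density_sd_110m",
    "vegetation_density_sd_250m",
]


def drop_matching(names, fragment):
    """Return the names that do not contain the given substring."""
    return [name for name in names if fragment not in name]


# Step 2: drop the window means. Step 3: drop the remaining 250 m windows.
step2 = drop_matching(ALL_VARIABLES, "_mean_")
step3 = drop_matching(step2, "_250m")

print(len(ALL_VARIABLES), len(step2), len(step3))
for name in step3:
    print(name)
```

Matching on "_mean_" and "_250m" relies on the naming scheme used here; a different scheme would need different fragments or an explicit list.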