We calculated the mean and standard deviation (sd) in 110 m and 250 m windows for the following variables:
- dtm_10m
- canopy_height
- vegetation_density
- ns_groundwater_summer_utm32_10m
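The window statistics described above can be sketched as follows. This is a minimal illustration and not the code used for the analysis: it assumes square windows on a 10 m grid (11 x 11 cells for 110 m, 25 x 25 cells for 250 m), reflect-padding at the raster edges, and the sample standard deviation. The function name `focal_stats` and the synthetic raster are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def focal_stats(raster, window_m, cell_m=10):
    """Return the focal mean and sd of a 2-D raster in a square moving window.

    Assumption: the window spans window_m / cell_m cells per side and the
    raster is reflect-padded so the output has the same shape as the input.
    """
    n = int(round(window_m / cell_m))
    if n % 2 == 0:
        raise ValueError("window must span an odd number of cells")
    pad = n // 2
    padded = np.pad(np.asarray(raster, dtype=float), pad, mode="reflect")
    windows = sliding_window_view(padded, (n, n))  # rows x cols x n x n
    focal_mean = windows.mean(axis=(-2, -1))
    focal_sd = windows.std(axis=(-2, -1), ddof=1)
    return focal_mean, focal_sd


# Synthetic stand-in for a 10 m raster such as canopy_height.
rng = np.random.default_rng(0)
canopy_height = rng.gamma(2.0, 3.0, size=(60, 60))
mean_110m, sd_110m = focal_stats(canopy_height, 110)
mean_250m, sd_250m = focal_stats(canopy_height, 250)
```

Each output raster has the same shape as the input, so the focal layers can be stacked with the 10 m cell values for the correlation and VIF analysis.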
Here is how those measures are correlated with their focal variables:
| canopy_height | cell_10m | mean_110m | mean_250m | sd_110m | sd_250m |
| --- | --- | --- | --- | --- | --- |
| cell_10m | +1.00 | +0.88 | +0.80 | +0.40 | +0.56 |
| mean_110m | | +1.00 | +0.95 | +0.30 | +0.53 |
| mean_250m | | | +1.00 | +0.28 | +0.42 |
| sd_110m | | | | +1.00 | +0.81 |
| sd_250m | | | | | +1.00 |

| dtm_10m | cell_10m | mean_110m | mean_250m | sd_110m | sd_250m |
| --- | --- | --- | --- | --- | --- |
| cell_10m | +1.00 | +1.00 | +1.00 | +0.30 | +0.32 |
| mean_110m | | +1.00 | +1.00 | +0.30 | +0.32 |
| mean_250m | | | +1.00 | +0.30 | +0.33 |
| sd_110m | | | | +1.00 | +0.92 |
| sd_250m | | | | | +1.00 |
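
| ns_groundwater_summer_mean_110m | cell_10m | ns_groundwater_summer_mean_250m | ns_groundwater_summer_sd_110m | ns_groundwater_summer_sd_250m | ns_groundwater_summer_utm32_10m |
| --- | --- | --- | --- | --- | --- |
| cell_10m | +1.00 | +0.97 | +0.33 | +0.38 | +0.96 |
| ns_groundwater_summer_mean_250m | | +1.00 | +0.36 | +0.41 | +0.91 |
| ns_groundwater_summer_sd_110m | | | +1.00 | +0.87 | +0.31 |
| ns_groundwater_summer_sd_250m | | | | +1.00 | +0.35 |
| ns_groundwater_summer_utm32_10m | | | | | +1.00 |

Note: in this table the row and column labelled cell_10m appear to correspond to ns_groundwater_summer_mean_110m (the table heading), while ns_groundwater_summer_utm32_10m holds the 10 m cell values.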

| vegetation_density | cell_10m | mean_110m | mean_250m | sd_110m | sd_250m |
| --- | --- | --- | --- | --- | --- |
| cell_10m | +1.00 | +0.82 | +0.71 | +0.17 | +0.29 |
| mean_110m | | +1.00 | +0.93 | -0.03 | +0.16 |
| mean_250m | | | +1.00 | -0.07 | -0.00 |
| sd_110m | | | | +1.00 | +0.76 |
| sd_250m | | | | | +1.00 |
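
Upper-triangle correlation tables like the ones above could be produced along these lines. This is a sketch under assumptions: it uses Pearson correlations (the source does not state the coefficient), and the helper `upper_triangle_correlations` and the synthetic columns are hypothetical.

```python
import numpy as np


def upper_triangle_correlations(columns):
    """Pearson correlations between named arrays, formatted as an upper triangle.

    columns: dict mapping a variable name to an array of values.
    Returns the full correlation matrix and a pipe-delimited text table.
    """
    names = list(columns)
    corr = np.corrcoef(np.vstack([np.ravel(columns[k]) for k in names]))
    lines = [" | ".join(["variable"] + names)]
    for i, row_name in enumerate(names):
        # Leave the cells below the diagonal empty, as in the tables above.
        cells = ["" if j < i else f"{corr[i, j]:+.2f}" for j in range(len(names))]
        lines.append(" | ".join([row_name] + cells))
    return corr, "\n".join(lines)


# Synthetic stand-in data, not the values from the analysis.
rng = np.random.default_rng(0)
cell = rng.normal(size=500)
columns = {
    "cell_10m": cell,
    "mean_110m": cell + 0.5 * rng.normal(size=500),
    "sd_110m": rng.normal(size=500),
}
corr, table = upper_triangle_correlations(columns)
print(table)
```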
Variance Inflation Factors
To reduce the number of features systematically, we calculate variance inflation factors (VIFs). A VIF above 5 is commonly taken to indicate that the variable introduces multicollinearity into the dataset. A more conservative rule is to keep only variables with VIFs below 2.5.
Here we carry out a step-wise selection based on the VIFs and the correlation tables above, treating VIFs exceeding 5 as problematic.
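
For reference, the VIF of variable j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing variable j on all other variables. The sketch below implements that definition with ordinary least squares. It is an illustration only: the source does not say which software computed the VIFs, and the function name `vif` and the synthetic data are assumptions.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic example: c is nearly a copy of a, b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = rng.normal(size=2000)
c = a + 0.1 * rng.normal(size=2000)
vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

In this synthetic case the two nearly identical columns get large VIFs while the independent column stays close to 1, which mirrors the pattern of the window means and their 10 m cell values in the tables below.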
1) All variables
| Variables | VIF |
| --- | --- |
| canopy_height | 6.66 |
| canopy_height_mean_110m | 33.43 |
| canopy_height_mean_250m | 25.65 |
| canopy_height_sd_110m | 6.12 |
| canopy_height_sd_250m | 8.93 |
| dtm_10m | 618.45 |
| dtm_10m_mean_110m | 1688.75 |
| dtm_10m_mean_250m | 511.75 |
| dtm_10m_sd_110m | 8.71 |
| dtm_10m_sd_250m | 8.92 |
| ns_groundwater_summer_mean_110m | 61.96 |
| ns_groundwater_summer_mean_250m | 28.76 |
| ns_groundwater_summer_sd_110m | 5.14 |
| ns_groundwater_summer_sd_250m | 5.27 |
| ns_groundwater_summer_utm32_10m | 19.16 |
| vegetation_density | 4.54 |
| vegetation_density_mean_110m | 23.77 |
| vegetation_density_mean_250m | 19.87 |
| vegetation_density_sd_110m | 5.33 |
| vegetation_density_sd_250m | 6.82 |

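The mean variables introduce most of the collinearity: their VIFs are very high (up to 1688.75 for dtm_10m_mean_110m), and the correlation tables above show that they are strongly correlated with the 10 m cell values. We drop them first.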
2) Drop mean variables
| Variables | VIF |
| --- | --- |
| canopy_height | 2.76 |
| canopy_height_sd_110m | 5.84 |
| canopy_height_sd_250m | 7.42 |
| dtm_10m | 1.26 |
| dtm_10m_sd_110m | 7.76 |
| dtm_10m_sd_250m | 7.76 |
| ns_groundwater_summer_sd_110m | 5.09 |
| ns_groundwater_summer_sd_250m | 5.27 |
| ns_groundwater_summer_utm32_10m | 1.38 |
| vegetation_density | 1.72 |
| vegetation_density_sd_110m | 4.79 |
| vegetation_density_sd_250m | 5.06 |

The sd variables of the two window sizes are highly correlated with each other (+0.76 to +0.92 in the correlation tables above). Those tables also suggest that the 110 m sd values are less correlated with the 10 m cell values than the 250 m sd values are, so we drop the 250 m windows next.
3) Drop 250 m variables
| Variables | VIF |
| --- | --- |
| canopy_height | 2.14 |
| canopy_height_sd_110m | 2.54 |
| dtm_10m | 1.25 |
| dtm_10m_sd_110m | 1.77 |
| ns_groundwater_summer_sd_110m | 1.50 |
| ns_groundwater_summer_utm32_10m | 1.39 |
| vegetation_density | 1.70 |
| vegetation_density_sd_110m | 2.19 |

The final set of variables includes only the 10 m cell values and the sd calculated for the 110 m windows. All remaining VIFs are at most 2.54, at or close to the conservative threshold of 2.5.
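
The three steps above (all variables, drop the means, drop the 250 m windows) can be sketched as a small loop that removes variables by name and recomputes the VIFs each time. This is an illustration under assumptions, not the original workflow: the helpers `vif_from_correlation` and `stepwise_drop` are hypothetical, the VIFs are taken from the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R^2)), and the data are synthetic stand-ins that only reuse the variable naming scheme.

```python
import numpy as np


def vif_from_correlation(X):
    """VIFs as the diagonal of the inverse correlation matrix of X's columns."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))


def stepwise_drop(data, steps):
    """Drop variables step by step and recompute the VIFs after each step.

    data:  dict mapping a variable name to a 1-D array
    steps: list of (label, predicate); names for which predicate is True are dropped
    """
    kept = list(data)
    history = []
    for label, should_drop in steps:
        kept = [name for name in kept if not should_drop(name)]
        X = np.column_stack([data[name] for name in kept])
        history.append((label, dict(zip(kept, vif_from_correlation(X)))))
    return history


# Synthetic stand-in data: names follow the tables above, values do not.
rng = np.random.default_rng(1)
n = 5000
cell = rng.normal(size=n)
spread = rng.normal(size=n)
data = {
    "canopy_height": cell,
    "canopy_height_mean_110m": cell + 0.3 * rng.normal(size=n),
    "canopy_height_mean_250m": cell + 0.5 * rng.normal(size=n),
    "canopy_height_sd_110m": spread + 0.4 * rng.normal(size=n),
    "canopy_height_sd_250m": spread + 0.4 * rng.normal(size=n),
}
steps = [
    ("All variables", lambda name: False),
    ("Drop mean variables", lambda name: "_mean_" in name),
    ("Drop 250 m variables", lambda name: name.endswith("_250m")),
]
history = stepwise_drop(data, steps)
for label, vifs in history:
    print(label, {name: round(float(v), 2) for name, v in vifs.items()})
```

Selecting by name pattern keeps the decision about what to drop explicit, which matches the manual, table-guided selection described here rather than a fully automatic threshold rule.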