The NEON Woody plant vegetation structure dataset contains structure measurements, including height, canopy diameter, and stem diameter, as well as mapped position of individual woody plants across the survey area.
This data product contains the quality-controlled, native sampling resolution data from in-situ measurements of live and standing dead woody individuals and shrub groups, from all terrestrial NEON sites with qualifying woody vegetation. The exact measurements collected per individual depend on growth form, and these measurements are focused on enabling biomass and productivity estimation, estimation of shrub volume and biomass, and calibration / validation of multiple NEON airborne remote-sensing data products.
Our analyses focus on the relationship between individual stem height and diameter and how that relationship varies across growth forms.
The data were downloaded from the NEON data portal and processed into a single table using script data-raw/individual.R
First we read in the data and select only the columns we are interested in, i.e stem_diameter
, height
and growth_form
library(dplyr)
library(ggplot2)
individual <- readr::read_csv(here::here("data",
"individual.csv")) %>%
select(stem_diameter, height, growth_form)
## stem_diameter height growth_form
## Min. : 1.00 Min. : 0.30 Length:14961
## 1st Qu.: 11.60 1st Qu.: 5.30 Class :character
## Median : 16.50 Median : 10.60 Mode :character
## Mean : 20.21 Mean : 11.27
## 3rd Qu.: 26.30 3rd Qu.: 16.20
## Max. :373.30 Max. :119.80
## NA's :1346 NA's :1711
To prepare the data we exclude rows for which the value of growth_form
was NA
or liana
.
We also convert growth_form
to a factor and set the levels according to ascending counts of each level in the raw data.
## stem_diameter height growth_form
## Min. : 1.00 Min. : 0.30 small tree : 179
## 1st Qu.: 11.70 1st Qu.: 5.40 sapling : 266
## Median : 16.80 Median : 11.10 single shrub : 983
## Mean : 20.48 Mean : 11.49 small shrub : 986
## 3rd Qu.: 26.70 3rd Qu.: 16.50 multi-bole tree :1531
## Max. :373.30 Max. :119.80 single bole tree:9943
## NA's :1031 NA's :1357
analysis_df %>%
ggplot(aes(y = growth_form, colour = growth_form,
fill = growth_form)) +
geom_bar(alpha = 0.5, show.legend = FALSE)
analysis_df %>%
tidyr::pivot_longer(cols = c(stem_diameter, height),
names_to = "var",
values_to = "value") %>%
ggplot(aes(x = log(value), y = growth_form,
colour = growth_form, fill = growth_form)) +
geom_violin(alpha = 0.5, trim = TRUE, show.legend = FALSE) +
geom_boxplot(alpha = 0.7, show.legend = FALSE) +
facet_grid(~var)
## Warning: Removed 2388 rows containing non-finite values (stat_ydensity).
## Warning: Removed 2388 rows containing non-finite values (stat_boxplot).
stem_diameter
as a function of height
Initially we fit a linear model of form log(stem_diameter)
as a function of log(height)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.679 0.679 0.487 24613. 0 1 -8132. 16271. 16293.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.561 0.0145 38.7 5.36e-308
## 2 log(height) 0.944 0.00602 157. 0
Our model is statistically significant and has modest coverage, indicated by r.squared
of 0.6792272
analysis_df %>%
ggplot(aes(x = log(height), y = log(stem_diameter))) +
geom_point(alpha = 0.2) +
geom_smooth(method = "lm")
## Warning: Removed 2262 rows containing non-finite values (stat_smooth).
## Warning: Removed 2262 rows containing missing values (geom_point).
However, plotting our data reveals sub groups in the data. We can examine whether including growth_form
in our analysis would improve our model fit by capturing variation explained by differing relationships across growth forms
growth_form
We fit another model, this time including an interaction term for variable growth_form
lm_growth <- lm(log(stem_diameter) ~ log(height) * growth_form,
analysis_df)
lm_growth %>%
broom::glance()
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.799 0.799 0.386 4195. 0 11 -5418. 10862. 10957.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## # A tibble: 12 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.187 0.0957 1.96 5.02e- 2
## 2 log(height) 0.952 0.0490 19.4 1.08e-82
## 3 growth_formsapling 0.223 0.0992 2.25 2.46e- 2
## 4 growth_formsingle shrub -0.690 0.109 -6.34 2.44e-10
## 5 growth_formsmall shrub 0.255 0.107 2.38 1.73e- 2
## 6 growth_formmulti-bole tree 1.16 0.103 11.2 4.57e-29
## 7 growth_formsingle bole tree 1.38 0.0978 14.1 1.27e-44
## 8 log(height):growth_formsapling -0.643 0.0778 -8.27 1.45e-16
## 9 log(height):growth_formsingle shrub 0.251 0.0645 3.89 1.00e- 4
## 10 log(height):growth_formsmall shrub -0.561 0.151 -3.73 1.93e- 4
## 11 log(height):growth_formmulti-bole tree -0.316 0.0519 -6.08 1.22e- 9
## 12 log(height):growth_formsingle bole tree -0.376 0.0497 -7.57 3.88e-14
Our model is still significant but this time explains a larger proportion of the variation (0.7989074).
analysis_df %>%
ggplot(aes(x = log(height), y = log(stem_diameter),
colour = growth_form)) +
geom_point(alpha = 0.1) +
geom_smooth(method = "lm")
## Warning: Removed 2262 rows containing non-finite values (stat_smooth).
## Warning: Removed 2262 rows containing missing values (geom_point).
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.3 dplyr_1.0.5 reprex_2.0.0 testthat_3.0.2 devtools_2.4.0
## [6] usethis_2.0.1
##
## loaded via a namespace (and not attached):
## [1] sass_0.3.1 pkgload_1.2.1 tidyr_1.1.3 splines_4.0.4
## [5] jsonlite_1.7.2 here_1.0.1 bslib_0.2.4 shiny_1.6.0
## [9] assertthat_0.2.1 highr_0.9 yaml_2.2.1 remotes_2.3.0
## [13] sessioninfo_1.1.1 lattice_0.20-41 pillar_1.6.0 backports_1.2.1
## [17] glue_1.4.2 digest_0.6.27 promises_1.2.0.1 colorspace_2.0-0
## [21] Matrix_1.3-2 htmltools_0.5.1.1 httpuv_1.5.5 pkgconfig_2.0.3
## [25] broom_0.7.6 purrr_0.3.4 xtable_1.8-4 scales_1.1.1
## [29] processx_3.5.1 later_1.1.0.1 tibble_3.1.1 mgcv_1.8-34
## [33] generics_0.1.0 farver_2.1.0 ellipsis_0.3.1 DT_0.18
## [37] cachem_1.0.4 withr_2.4.2 cli_2.4.0 magrittr_2.0.1
## [41] crayon_1.4.1 mime_0.10 memoise_2.0.0 evaluate_0.14
## [45] ps_1.6.0 fs_1.5.0 fansi_0.4.2 nlme_3.1-152
## [49] pkgbuild_1.2.0 tools_4.0.4 prettyunits_1.1.1 hms_1.0.0
## [53] lifecycle_1.0.0 stringr_1.4.0 munsell_0.5.0 callr_3.7.0
## [57] compiler_4.0.4 jquerylib_0.1.3 rlang_0.4.10 grid_4.0.4
## [61] rstudioapi_0.13 htmlwidgets_1.5.3 crosstalk_1.1.1 labeling_0.4.2
## [65] rmarkdown_2.7 gtable_0.3.0 DBI_1.1.1 R6_2.5.0
## [69] knitr_1.32 fastmap_1.1.0 utf8_1.2.1 rprojroot_2.0.2
## [73] readr_1.4.0 desc_1.3.0 stringi_1.5.3 Rcpp_1.0.6
## [77] vctrs_0.3.7 tidyselect_1.1.0 xfun_0.22