Background

The NEON Woody plant vegetation structure dataset contains structure measurements, including height, canopy diameter, and stem diameter, as well as mapped position of individual woody plants across the survey area.

This data product contains the quality-controlled, native sampling resolution data from in-situ measurements of live and standing dead woody individuals and shrub groups, from all terrestrial NEON sites with qualifying woody vegetation. The exact measurements collected per individual depend on growth form, and these measurements are focused on enabling biomass and productivity estimation, estimation of shrub volume and biomass, and calibration / validation of multiple NEON airborne remote-sensing data products.

Our analyses focus on the relationship between individual stem height and diameter and how that relationship varies across growth forms.

Data

The data were downloaded from the NEON data portal and processed into a single table using script data-raw/individual.R

Read in data and setup analysis

First we read in the data and select only the columns we are interested in, i.e stem_diameter, height and growth_form

library(dplyr)
library(ggplot2)
individual <- readr::read_csv(here::here("data", 
                                         "individual.csv")) %>%
    select(stem_diameter, height, growth_form)
summary(individual)
##  stem_diameter        height       growth_form       
##  Min.   :  1.00   Min.   :  0.30   Length:14961      
##  1st Qu.: 11.60   1st Qu.:  5.30   Class :character  
##  Median : 16.50   Median : 10.60   Mode  :character  
##  Mean   : 20.21   Mean   : 11.27                     
##  3rd Qu.: 26.30   3rd Qu.: 16.20                     
##  Max.   :373.30   Max.   :119.80                     
##  NA's   :1346     NA's   :1711

Prepare data

To prepare the data we exclude rows for which the value of growth_form was NA or liana.

analysis_df <- individual %>%
    filter(!is.na(growth_form), growth_form != "liana")

We also convert growth_form to a factor and set the levels according to ascending counts of each level in the raw data.

gf_levels <- table(analysis_df$growth_form) %>%
    sort() %>%
    names()
analysis_df <- analysis_df %>%
    mutate(growth_form = factor(growth_form, 
                                levels = gf_levels))

Data properties

Statistical summaries of variables

summary(analysis_df)
##  stem_diameter        height                 growth_form  
##  Min.   :  1.00   Min.   :  0.30   small tree      : 179  
##  1st Qu.: 11.70   1st Qu.:  5.40   sapling         : 266  
##  Median : 16.80   Median : 11.10   single shrub    : 983  
##  Mean   : 20.48   Mean   : 11.49   small shrub     : 986  
##  3rd Qu.: 26.70   3rd Qu.: 16.50   multi-bole tree :1531  
##  Max.   :373.30   Max.   :119.80   single bole tree:9943  
##  NA's   :1031     NA's   :1357
analysis_df %>%
    ggplot(aes(y = growth_form, colour = growth_form, 
               fill = growth_form)) +
    geom_bar(alpha = 0.5, show.legend = FALSE)
Figure 1: Counts of growth forms

Figure 1: Counts of growth forms

analysis_df %>%
    tidyr::pivot_longer(cols = c(stem_diameter, height),
                        names_to = "var",
                        values_to = "value") %>%
    ggplot(aes(x = log(value), y = growth_form,
               colour = growth_form, fill = growth_form)) +
    geom_violin(alpha = 0.5, trim = TRUE, show.legend = FALSE) +
    geom_boxplot(alpha = 0.7, show.legend = FALSE) +
    facet_grid(~var)
## Warning: Removed 2388 rows containing non-finite values (stat_ydensity).
## Warning: Removed 2388 rows containing non-finite values (stat_boxplot).
Figure 2: Distribution and statistical summaries of stem_diameter and height across growth_forms

Figure 2: Distribution and statistical summaries of stem_diameter and height across growth_forms

Analysis

Modelling overall stem_diameter as a function of height

Initially we fit a linear model of form log(stem_diameter) as a function of log(height)

lm_overall <- lm(log(stem_diameter) ~ log(height), 
                 analysis_df)
lm_overall %>%
    broom::glance()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.679         0.679 0.487    24613.       0     1 -8132. 16271. 16293.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
lm_overall %>%
    broom::tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)    0.561   0.0145       38.7 5.36e-308
## 2 log(height)    0.944   0.00602     157.  0

Our model is statistically significant and has modest coverage, indicated by r.squared of 0.6792272

analysis_df %>%
    ggplot(aes(x = log(height), y = log(stem_diameter))) +
    geom_point(alpha = 0.2) +
    geom_smooth(method = "lm")
## Warning: Removed 2262 rows containing non-finite values (stat_smooth).
## Warning: Removed 2262 rows containing missing values (geom_point).

However, plotting our data reveals sub groups in the data. We can examine whether including growth_form in our analysis would improve our model fit by capturing variation explained by differing relationships across growth forms

Including an interaction with growth_form

We fit another model, this time including an interaction term for variable growth_form

lm_growth <- lm(log(stem_diameter) ~ log(height) * growth_form,
                analysis_df)
lm_growth %>%
    broom::glance()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.799         0.799 0.386     4195.       0    11 -5418. 10862. 10957.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
lm_growth %>%
    broom::tidy()
## # A tibble: 12 x 5
##    term                                    estimate std.error statistic  p.value
##    <chr>                                      <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)                                0.187    0.0957      1.96 5.02e- 2
##  2 log(height)                                0.952    0.0490     19.4  1.08e-82
##  3 growth_formsapling                         0.223    0.0992      2.25 2.46e- 2
##  4 growth_formsingle shrub                   -0.690    0.109      -6.34 2.44e-10
##  5 growth_formsmall shrub                     0.255    0.107       2.38 1.73e- 2
##  6 growth_formmulti-bole tree                 1.16     0.103      11.2  4.57e-29
##  7 growth_formsingle bole tree                1.38     0.0978     14.1  1.27e-44
##  8 log(height):growth_formsapling            -0.643    0.0778     -8.27 1.45e-16
##  9 log(height):growth_formsingle shrub        0.251    0.0645      3.89 1.00e- 4
## 10 log(height):growth_formsmall shrub        -0.561    0.151      -3.73 1.93e- 4
## 11 log(height):growth_formmulti-bole tree    -0.316    0.0519     -6.08 1.22e- 9
## 12 log(height):growth_formsingle bole tree   -0.376    0.0497     -7.57 3.88e-14

Our model is still significant but this time explains a larger proportion of the variation (0.7989074).

analysis_df %>%
    ggplot(aes(x = log(height), y = log(stem_diameter),
               colour = growth_form)) +
    geom_point(alpha = 0.1) +
    geom_smooth(method = "lm")
## Warning: Removed 2262 rows containing non-finite values (stat_smooth).
## Warning: Removed 2262 rows containing missing values (geom_point).
Figure 4: Log stem diameter as a function of the interaction of log height and growth form

Figure 4: Log stem diameter as a function of the interaction of log height and growth form

Session Info

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.3  dplyr_1.0.5    reprex_2.0.0   testthat_3.0.2 devtools_2.4.0
## [6] usethis_2.0.1 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.3.1        pkgload_1.2.1     tidyr_1.1.3       splines_4.0.4    
##  [5] jsonlite_1.7.2    here_1.0.1        bslib_0.2.4       shiny_1.6.0      
##  [9] assertthat_0.2.1  highr_0.9         yaml_2.2.1        remotes_2.3.0    
## [13] sessioninfo_1.1.1 lattice_0.20-41   pillar_1.6.0      backports_1.2.1  
## [17] glue_1.4.2        digest_0.6.27     promises_1.2.0.1  colorspace_2.0-0 
## [21] Matrix_1.3-2      htmltools_0.5.1.1 httpuv_1.5.5      pkgconfig_2.0.3  
## [25] broom_0.7.6       purrr_0.3.4       xtable_1.8-4      scales_1.1.1     
## [29] processx_3.5.1    later_1.1.0.1     tibble_3.1.1      mgcv_1.8-34      
## [33] generics_0.1.0    farver_2.1.0      ellipsis_0.3.1    DT_0.18          
## [37] cachem_1.0.4      withr_2.4.2       cli_2.4.0         magrittr_2.0.1   
## [41] crayon_1.4.1      mime_0.10         memoise_2.0.0     evaluate_0.14    
## [45] ps_1.6.0          fs_1.5.0          fansi_0.4.2       nlme_3.1-152     
## [49] pkgbuild_1.2.0    tools_4.0.4       prettyunits_1.1.1 hms_1.0.0        
## [53] lifecycle_1.0.0   stringr_1.4.0     munsell_0.5.0     callr_3.7.0      
## [57] compiler_4.0.4    jquerylib_0.1.3   rlang_0.4.10      grid_4.0.4       
## [61] rstudioapi_0.13   htmlwidgets_1.5.3 crosstalk_1.1.1   labeling_0.4.2   
## [65] rmarkdown_2.7     gtable_0.3.0      DBI_1.1.1         R6_2.5.0         
## [69] knitr_1.32        fastmap_1.1.0     utf8_1.2.1        rprojroot_2.0.2  
## [73] readr_1.4.0       desc_1.3.0        stringi_1.5.3     Rcpp_1.0.6       
## [77] vctrs_0.3.7       tidyselect_1.1.0  xfun_0.22