Rescaling and Distribution Fitting
Joe Thorley and Rebecca Fisher
2025-06-13
Source: vignettes/rescaling.Rmd
Introduction
By default ssdtools does not rescale data when fitting distributions, so
the parameter estimates can be used directly to estimate the HCx values.
However, if rescale = TRUE in the ssd_fit_dists() or ssd_fit_burrlioz()
functions, the data are rescaled by dividing by the geometric mean of
the minimum and maximum positive finite values, which may aid model
fitting in some instances. To examine the extent to which model fitting
is improved, we fit the 9 distributions with valid likelihoods currently
implemented in ssdtools to the 729 acute datasets in the envirotox R
data package with and without rescaling.
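The rescaling described above can be illustrated with a minimal base R sketch. The concentrations, the helper `rescale_factor()` and the closed-form log-normal fit in `hc5_lnorm()` are hypothetical and are not the ssdtools internals; they only show how the scaling factor is formed and why an HC5 estimated from rescaled data has to be multiplied by that factor to return to the original units.

```r
# Hypothetical concentrations; zero, Inf and NA are included to show that
# only positive finite values determine the scaling factor.
conc <- c(0, 0.5, 2, 8, 40, 250, Inf, NA)

# Hypothetical helper: geometric mean of the minimum and maximum
# positive finite values.
rescale_factor <- function(x) {
  x <- x[is.finite(x) & x > 0]
  sqrt(min(x) * max(x))
}

s <- rescale_factor(conc)

# Closed-form maximum likelihood fit of a log-normal distribution,
# returning the 5% hazard concentration (HC5).
hc5_lnorm <- function(x) {
  lx <- log(x)
  sdlog <- sqrt(mean((lx - mean(lx))^2))
  qlnorm(0.05, meanlog = mean(lx), sdlog = sdlog)
}

obs <- conc[is.finite(conc) & conc > 0]
hc5_original <- hc5_lnorm(obs)      # fitted on the original scale
hc5_rescaled <- hc5_lnorm(obs / s)  # fitted on the rescaled data

# The rescaled estimate must be multiplied by s to recover the original units.
all.equal(hc5_original, hc5_rescaled * s)
```

For the log-normal, dividing the data by `s` shifts `meanlog` by `-log(s)` and leaves `sdlog` unchanged, so the two estimates differ only by the factor `s`.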
Methods
Consistent with the default settings in ssdtools, a distribution was
considered to have fitted successfully if it converged, irrespective of
whether the standard errors of the likelihood-based estimates were
computable or whether a parameter was at a boundary.
The R code that performs the analysis is as follows.
# All distributions currently implemented in ssdtools.
dists <- ssdtools::ssd_dists_all()

# Fit the distributions to one dataset; wrapped in a list so the result
# can be stored in a list column.
fit_dists <- function(data, d, r) {
  list(ssdtools::ssd_fit_dists(data = data, dists = d, rescale = r,
    computable = FALSE, at_boundary_ok = TRUE, silent = TRUE))
}

# Fit each chemical with and without rescaling and keep only the names
# of the distributions that fitted successfully.
data <- envirotox::envirotox_acute |>
  dplyr::nest_by(Chemical) |>
  dplyr::mutate(ssd_fit_unscale = fit_dists(.data$data, d = dists, r = FALSE),
    ssd_fit_rescale = fit_dists(.data$data, d = dists, r = TRUE),
    dists_unscale = list(names(ssd_fit_unscale)),
    dists_rescale = list(names(ssd_fit_rescale))) |>
  dplyr::select(!c(ssd_fit_unscale, ssd_fit_rescale))

# Percentage of datasets for which each distribution fitted (unscaled).
unscaled <- data |>
  dplyr::select(Chemical, Distribution = dists_unscale) |>
  tidyr::unnest(Distribution) |>
  dplyr::ungroup() |>
  dplyr::count(Distribution) |>
  dplyr::mutate(n = n / nrow(data) * 100) |>
  dplyr::select(Distribution, Unscaled = n)

# Percentage of datasets for which each distribution fitted (rescaled).
rescaled <- data |>
  dplyr::select(Chemical, Distribution = dists_rescale) |>
  tidyr::unnest(Distribution) |>
  dplyr::ungroup() |>
  dplyr::count(Distribution) |>
  dplyr::mutate(n = n / nrow(data) * 100) |>
  dplyr::select(Distribution, Rescaled = n)

# Combine the two summaries by distribution.
results <- unscaled |>
  dplyr::inner_join(rescaled, by = "Distribution")
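The summarising step above can be illustrated on a toy example in base R. The four datasets and the distributions listed as fitted for each are made up for illustration and stand in for the `dists_unscale` list column; the sketch only shows how a list of fitted distribution names per dataset becomes a fitting rate in percent.

```r
# Hypothetical names of the distributions that converged for each of four
# made-up datasets (stand-ins for one row each of the list column).
fitted <- list(
  a = c("lnorm", "gamma", "gompertz"),
  b = c("lnorm", "gamma"),
  c = c("lnorm"),
  d = c("lnorm", "gamma")
)

# Distributions that fitted at least once.
dists <- sort(unique(unlist(fitted)))

# Percentage of datasets in which each distribution appears.
percent <- vapply(dists, function(d) {
  mean(vapply(fitted, function(x) d %in% x, logical(1))) * 100
}, numeric(1))

rate <- data.frame(Distribution = dists, Percent = unname(percent))
rate
```

As with `dplyr::count()`, a distribution that fitted for none of the datasets has nothing to count and so does not appear in the summary at all, rather than appearing with a rate of 0.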
Findings
| Distribution | Unscaled (%) | Rescaled (%) |
|---|---|---|
| burrIII3 | 100.0 | 100.0 |
| gamma | 100.0 | 100.0 |
| lgumbel | 100.0 | 100.0 |
| llogis | 100.0 | 100.0 |
| llogis_llogis | 95.9 | 95.7 |
| lnorm | 100.0 | 100.0 |
| lnorm_lnorm | 97.0 | 96.4 |
| weibull | 100.0 | 100.0 |
The results indicate that, for the 729 acute datasets considered, rescaling has little to no effect on the fitting rate of the currently implemented distributions with valid likelihoods, with one exception. The exception is the gompertz distribution, for which the fitting rate increases from to %. Despite the substantial improvement for the gompertz, its fitting rate is still only ~ %, which is insufficient to warrant reconsideration of its inclusion in the default set.
Recommendations
Rescaling the data has little to no effect on the fitting rate for
the models in the default set. Consequently, we recommend that the
ssd_fit_dists() and ssd_fit_burrlioz() functions continue to use
rescale = FALSE as the default value and that it remain the fixed
option in the ssd_fit_bcanz() function.
Session Info
The results were generated with the following packages.
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.0 (2025-04-11)
#> os Ubuntu 24.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en-US
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz UTC
#> date 2025-06-13
#> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#> quarto NA
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [1] RSPM
#> bslib 0.9.0 2025-01-30 [1] RSPM
#> cachem 1.1.0 2024-05-16 [1] RSPM
#> chk 0.10.0 2025-01-24 [1] RSPM
#> cli 3.6.5 2025-04-23 [1] RSPM
#> codetools 0.2-20 2024-03-31 [3] CRAN (R 4.5.0)
#> desc 1.4.3 2023-12-10 [1] RSPM
#> digest 0.6.37 2024-08-19 [1] RSPM
#> dplyr 1.1.4 2023-11-17 [1] RSPM
#> envirotox 0.0.0.9001 2025-06-13 [1] Github (poissonconsulting/envirotox@c3dabe2)
#> evaluate 1.0.3 2025-01-10 [1] RSPM
#> farver 2.1.2 2024-05-13 [1] RSPM
#> fastmap 1.2.0 2024-05-15 [1] RSPM
#> fs 1.6.6 2025-04-12 [1] RSPM
#> furrr 0.3.1 2022-08-15 [1] RSPM
#> future 1.58.0 2025-06-05 [1] RSPM
#> generics 0.1.4 2025-05-09 [1] RSPM
#> ggplot2 3.5.2 2025-04-09 [1] RSPM
#> globals 0.18.0 2025-05-08 [1] RSPM
#> glue 1.8.0 2024-09-30 [1] RSPM
#> goftest 1.2-3 2021-10-07 [1] RSPM
#> gtable 0.3.6 2024-10-25 [1] RSPM
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM
#> jquerylib 0.1.4 2021-04-26 [1] RSPM
#> jsonlite 2.0.0 2025-03-27 [1] RSPM
#> knitr 1.50 2025-03-16 [1] RSPM
#> lattice 0.22-6 2024-03-20 [3] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM
#> listenv 0.9.1 2024-01-29 [1] RSPM
#> magrittr 2.0.3 2022-03-30 [1] RSPM
#> Matrix 1.7-3 2025-03-11 [3] CRAN (R 4.5.0)
#> parallelly 1.45.0 2025-06-02 [1] RSPM
#> pillar 1.10.2 2025-04-05 [1] RSPM
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM
#> pkgdown 2.1.3 2025-05-25 [1] any (@2.1.3)
#> plyr 1.8.9 2023-10-02 [1] RSPM
#> purrr 1.0.4 2025-02-05 [1] RSPM
#> R6 2.6.1 2025-02-15 [1] RSPM
#> ragg 1.4.0 2025-04-10 [1] RSPM
#> rbibutils 2.3 2024-10-04 [1] RSPM
#> RColorBrewer 1.1-3 2022-04-03 [1] RSPM
#> Rcpp 1.0.14 2025-01-12 [1] RSPM
#> Rdpack 2.6.4 2025-04-09 [1] RSPM
#> rlang 1.1.6 2025-04-11 [1] RSPM
#> rmarkdown 2.29 2024-11-04 [1] RSPM
#> sass 0.4.10 2025-04-11 [1] RSPM
#> scales 1.4.0 2025-04-24 [1] RSPM
#> sessioninfo 1.2.3 2025-02-05 [1] RSPM
#> ssddata 1.0.0 2021-11-05 [1] RSPM
#> ssdtools 2.3.0.9004 2025-06-13 [1] Github (poissonconsulting/ssdtools@17874d3)
#> stringi 1.8.7 2025-03-27 [1] RSPM
#> stringr 1.5.1 2023-11-14 [1] RSPM
#> systemfonts 1.2.3 2025-04-30 [1] RSPM
#> textshaping 1.0.1 2025-05-01 [1] RSPM
#> tibble 3.3.0 2025-06-08 [1] RSPM
#> tidyr 1.3.1 2024-01-24 [1] RSPM
#> tidyselect 1.2.1 2024-03-11 [1] RSPM
#> TMB 1.9.17 2025-03-10 [1] RSPM
#> universals 0.0.5 2022-09-22 [1] RSPM
#> vctrs 0.6.5 2023-12-01 [1] RSPM
#> withr 3.0.2 2024-10-28 [1] RSPM
#> xfun 0.52 2025-04-02 [1] RSPM
#> yaml 2.3.10 2024-07-26 [1] RSPM
#>
#> [1] /home/runner/work/_temp/Library
#> [2] /opt/R/4.5.0/lib/R/site-library
#> [3] /opt/R/4.5.0/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────