Changelog
Source: NEWS.md
ssdtools 2.0.0
CRAN release: 2024-10-09
ssdtools v2.0.0, which now includes David Fox and Rebecca Fisher as co-authors, is the second major release of ssdtools.
Major Changes
The following changes are major in the sense that they could alter previous hazard concentrations or break code.
Model Fitting and Averaging
Modifications
The following arguments were added to ssd_hc() and ssd_hp():
- multi_est = TRUE to calculate model averaged estimates treating the distributions as constituting a single mixture distribution (previously it was effectively FALSE).
- method_ci = "weighted_samples" to specify whether to use the "weighted_samples", "weighted_arithmetic", "multi_free" or "multi_fixed" methods to generate confidence intervals (previously it was effectively "weighted_arithmetic").
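The difference between averaging individual estimates and treating the distributions as one mixture can be sketched in base R. This is an illustration only, not the ssdtools implementation: the two log-normal components and their weights are made-up values, and hc_arithmetic() and hc_mixture() are hypothetical helper names. The arithmetic version averages the per-distribution quantiles, whereas the mixture version finds the concentration at which the weighted sum of the cumulative distribution functions equals the target proportion.

```r
# Hypothetical Akaike weights and log-normal parameters for two fitted distributions.
wt <- c(0.7, 0.3)
meanlog <- c(1.0, 2.0)
sdlog <- c(1.0, 0.5)

# Weighted arithmetic mean of the individual hazard concentrations.
hc_arithmetic <- function(p) {
  sum(wt * qlnorm(p, meanlog, sdlog))
}

# Quantile of the single mixture distribution: solve F_mix(x) = p,
# searching on the log scale so the root finder stays well behaved.
hc_mixture <- function(p) {
  f <- function(log_x) sum(wt * plnorm(exp(log_x), meanlog, sdlog)) - p
  exp(uniroot(f, lower = -20, upper = 20, tol = 1e-10)$root)
}

hc_arithmetic(0.05)
hc_mixture(0.05)
```

With these made-up values the two estimates differ noticeably, which is why the changelog notes that the change could alter previously calculated hazard concentrations.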
In addition, the data frame returned by ssd_hc() and predict() now includes a column proportion with values between 0 and 1 as opposed to a column percentage with values between 0 and 100.
Finally, with censored data, confidence intervals can now only be estimated by non-parametric bootstrapping because the methods for parametrically bootstrapping censored data require review.
Minor Changes
The remaining changes are minor.
Model Fitting
Modifications
The following arguments of ssd_fit_dists() were changed to reduce the chances of the lnorm_lnorm bimodal distribution being dropped from the default set:
- min_pmix = ssd_min_pmix(nrow(data)) so that by default min_pmix is 0.1 or 3/nrow(data) if greater.
- at_boundary_ok = TRUE.
- computable = FALSE.
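The rule for the default minimum mixture proportion, as worded above, can be sketched as follows. The helper name min_pmix_sketch() is hypothetical and follows only the wording in this changelog; the actual ssd_min_pmix() may apply further limits.

```r
# Sketch of the stated rule: 0.1, or 3 / n if that is greater.
min_pmix_sketch <- function(n) {
  max(0.1, 3 / n)
}

min_pmix_sketch(6)    # 3 / 6 is greater than 0.1
min_pmix_sketch(100)  # 3 / 100 is smaller, so 0.1 applies
```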
These changes also allowed the default value of the min_pboot argument to be raised from 0.80 to 0.95 for all bootstrapping functions.
It is worth noting that the last two changes also reduce the chances of the BurrIII distribution being dropped.
In addition, rescale = TRUE now divides by the geometric mean of the minimum and maximum positive finite values, as opposed to dividing by the maximum finite value, to improve the chances of convergence, although ssd_fit_bcanz() no longer rescales by default.
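The two rescaling divisors can be compared with a short base R sketch. The concentration values are made up, and the variable names are illustrative rather than taken from the package.

```r
# Made-up concentrations, including values that rescaling should ignore.
conc <- c(0, 0.5, 2, 8, 200, Inf, NA)

# Positive finite values only (NA and Inf are excluded by is.finite()).
ok <- is.finite(conc) & conc > 0

# Geometric mean of the minimum and maximum positive finite values.
new_scale <- sqrt(min(conc[ok]) * max(conc[ok]))

# Earlier behaviour as described in this changelog: the largest finite value.
old_scale <- max(conc[is.finite(conc)])

rescaled <- conc / new_scale
```

Dividing by the geometric mean centres the rescaled data around 1 on the log scale, rather than placing every value at or below 1.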
Other minor modifications to the model fitting functions include:
- estimates.fitdists() now includes weights in the returned parameters as well as an all_estimates = FALSE argument to allow parameter values for all implemented distributions to be included.
- delta = 9.21 instead of delta = 7 to ensure that the weight of included models is at least 0.01.
- Seeds are now allocated to bootstrap samples as opposed to distributions (which results in a speed gain when there are more cores than distributions).
- lnorm and gompertz initial values are offset from their maximum likelihood estimates to avoid errors in optim().
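The link between an information-criterion difference and a model's weight relative to the best supported model can be sketched as below. This assumes the standard relative-likelihood formula exp(-delta / 2); rel_weight() is a hypothetical helper, not a package function.

```r
# Weight of a model relative to the best model, given its AIC(c) difference.
rel_weight <- function(delta) {
  exp(-delta / 2)
}

# Relative weights at the two delta thresholds mentioned in this changelog.
rel_weight(c(7, 9.21))
```

A threshold of 9.21 corresponds to a relative weight of almost exactly 0.01, since 2 * log(100) is approximately 9.21.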
The following functions and arguments were also added:
- ssd_hp_bcanz() and ssd_hp.fitburrlioz() to get hazard proportions.
- ssd_pmulti(), ssd_qmulti() and ssd_rmulti() for combined mixture distributions.
- ssd_exx() functions to get default parameter estimates for distributions.
- ssd_censor_data() to censor data.
- npars = c(2L, 5L) argument to ssd_dists_bcanz() to specify the number of parameters.
- dists = ssd_dists_bcanz() to ssd_fit_bcanz() to allow other packages to modify.
- samples = FALSE to ssd_hc() and ssd_hp() to include bootstrap samples as a list of numeric vector(s).
- save_to = NULL to ssd_hc() and ssd_hp() to specify a directory in which to save the bootstrap datasets as csv files and parameter estimates as .rds files.
Deprecations
The following functions and arguments were deprecated:
- ssd_wqg_bc() and ssd_wqg_burrlioz() were deprecated.
- percent = 5 in ssd_hc() and predict() was soft-deprecated for proportion = 0.05.
- is_censored() is now defunct.
Plotting
Perhaps the biggest plotting change is that ssd_plot_cdf() now plots the average SSD together with the individual distributions if average = NA.
In addition, the following functions and arguments were added:
- scale_fill_ssd() for a color-blind fill scale.
- ssd_label_comma() for formatting of x-axis labels.
- trans = "log10" and add_x = 0 to ssd_plot() and ssd_plot_data() to control the x-axis scale.
- big.mark = "," for x-axis labels and suffix = "%" for y-axis labels to all plotting functions.
The following functions were deprecated:
- comma_signif() was soft-deprecated.
- is_censored(), plot.fitdists(), ssd_plot_cf(), geom_ssd() and stat_ssd() are now defunct.
ssdtools 1.0.4
CRAN release: 2023-05-17
- Added contributors.
- Now tests table values to 6 significant figures.
- Fixed bug that was not preserving NaN (returning NA_real_) for cumulative distribution and quantile functions.
ssdtools 1.0.3
CRAN release: 2023-04-12
- Replaced size = 0.5 with linewidth = 0.5 in geom_hcintersect() and geom_xribbon().
- Replaced aes_string() with aes() in examples (and internally).
- Removed use of tidyverse package.
- Now tests values to 12 significant digits.
- Fixed description of ssd_hp() to be percent affected rather than percent protected.
ssdtools 1.0.2
CRAN release: 2022-05-14
- Fixed bug that was producing estimates of 0 for lower HCx values for log-normal mixture model with rescaled data spanning many orders of magnitude.
ssdtools 1.0.0
CRAN release: 2022-04-01
ssdtools version 1.0.0 is the first major release of ssdtools with some important improvements and breaking changes.
Fitting
An important change to the functionality of ssd_fit_dists() was to switch from model fitting using fitdistrplus to TMB, which has resulted in improved handling of censored data. Although it was hoped that model fitting would be faster, this is currently not the case.
As a result of the change, the fitdists objects returned by ssd_fit_dists() from previous versions of ssdtools are not compatible with the major release and should be regenerated.
BCANZ
As a result of an international collaboration, British Columbia, Canada, Australia and New Zealand selected a set of recommended distributions for model averaging and settings when generating final guidelines.
The distributions are
> ssd_dists_bcanz()
[1] "gamma" "lgumbel" "llogis" "lnorm" "lnorm_lnorm" "weibull"
The ssd_fit_bcanz() and ssd_hc_bcanz() functions were added to the package to facilitate the fitting of these distributions and estimation of hazard concentrations using the recommended settings.
Convergence
In the previous version of ssdtools a distribution was considered to have converged if the following condition was met:
- stats::optim() returns a code of 0 (indicating successful completion).
In the new version an additional two conditions must also be met:
- Bounded parameters are not at a boundary (this condition can be turned off by setting at_boundary_ok = TRUE or the user can specify different boundary values - see below).
- Standard errors are computable for all the parameter values (this condition can be turned off by setting computable = FALSE).
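The first and third conditions can be illustrated with a base R sketch that fits a log-normal distribution by maximum likelihood. This is not how ssdtools fits models internally; the data are simulated, the variable names are illustrative, and the boundary condition is omitted because neither parameter in this sketch is bounded.

```r
set.seed(1)
x <- rlnorm(20, meanlog = 1, sdlog = 0.5)  # simulated concentrations

# Negative log-likelihood; sdlog is estimated on the log scale to keep it positive.
nll <- function(par) {
  -sum(dlnorm(x, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), nll, hessian = TRUE)

# Condition: optim() returns a code of 0 (successful completion).
code_ok <- fit$convergence == 0

# Condition: standard errors are computable from the inverse Hessian.
se <- tryCatch(
  sqrt(diag(solve(fit$hessian))),
  error = function(e) rep(NA_real_, 2),
  warning = function(w) rep(NA_real_, 2)
)
computable <- all(is.finite(se))

converged <- code_ok && computable
```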
Censored Data
Censoring can now be specified by providing a data set with one or more rows that have
- a finite value for the left column that is smaller than the finite value in the right column (interval censored)
- a zero or missing value for the left column and a finite value for the right column (left censored)
It is currently not possible to fit distributions to data sets that have
- an infinite or missing value for the right column and a finite value for the left column (right censored)
Rows that have a zero or missing value for the left column and an infinite or missing value for the right column (fully censored) are uninformative and will result in an error.
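The censoring rules above can be expressed as a small classifier. This is a base R sketch: classify_censoring() is a hypothetical helper written from the descriptions in this section, not a function exported by ssdtools, and the example values are made up.

```r
# Classify each row from its left and right values.
classify_censoring <- function(left, right) {
  left_missing <- is.na(left) | left == 0
  right_missing <- is.na(right) | is.infinite(right)

  out <- rep("uncensored", length(left))
  out[!left_missing & !right_missing & left < right] <- "interval"
  out[left_missing & !right_missing] <- "left"
  out[!left_missing & right_missing] <- "right"  # cannot currently be fitted
  out[left_missing & right_missing] <- "fully"   # uninformative, an error
  out
}

left <- c(1, 1, NA, 0, 5, NA)
right <- c(1, 3, 2, 2, Inf, NA)
classify_censoring(left, right)
```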
Akaike Weights
For uncensored data, Akaike Weights are calculated using AICc (which corrects for small sample size). In the case of censored data, Akaike Weights are calculated using AIC (as the sample size cannot be estimated) but only if all the distributions have the same number of parameters (to ensure the weights are valid).
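The calculation for uncensored data can be sketched with the standard AICc and Akaike weight formulas. The log-likelihoods, parameter counts and sample size are made-up values, and aicc() and akaike_weights() are hypothetical helpers rather than package functions.

```r
# AICc: AIC plus the small-sample correction term.
aicc <- function(loglik, k, n) {
  aic <- 2 * k - 2 * loglik
  aic + 2 * k * (k + 1) / (n - k - 1)
}

# Akaike weights: relative likelihoods normalised to sum to 1.
akaike_weights <- function(ic) {
  delta <- ic - min(ic)
  rel <- exp(-delta / 2)
  rel / sum(rel)
}

# Made-up log-likelihoods for three distributions fitted to 28 observations.
loglik <- c(gamma = -52.1, lnorm = -51.4, lnorm_lnorm = -49.8)
k <- c(2, 2, 5)  # number of parameters in each distribution

wt <- akaike_weights(aicc(loglik, k, n = 28))
wt
```

In this example the five-parameter mixture has the highest log-likelihood but not the highest weight, because of the penalty for its extra parameters.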
Distributions
Previously the density functions for the available distributions were exported as R functions to make them accessible to fitdistrplus. This meant that ssdtools had to be loaded to fit distributions. The density functions are now defined in C++ as TMB templates and are no longer exported.
The distribution, quantile and random generation functions are more generally useful and are still exported but are now prefixed by ssd_ to prevent clashes with existing functions in other packages. Thus, for example, plnorm(), qlnorm() and rlnorm() have been renamed ssd_plnorm(), ssd_qlnorm() and ssd_rlnorm().
The following distributions were added (or in the case of burrIII3 readded) to the new version:
- burrIII3 - burrIII three parameter distribution
- invpareto - inverse pareto (with bias correction in scale order statistic)
- lnorm_lnorm - log-normal/log-normal mixture distribution
- llogis_llogis - log-logistic/log-logistic mixture distribution
The following arguments were added to ssd_fit_dists():
- rescale (by default FALSE) to specify whether to rescale concentration values by dividing by the largest (finite) value. This alters the parameter estimates, which can help some distributions converge, but not the estimates of the hazard concentrations/protections.
- reweight (by default FALSE) to specify whether to reweight data points by dividing by the largest weight.
- at_boundary_ok (by default FALSE) to specify whether a distribution with one or more parameters at a boundary has converged.
- min_pmix (by default 0) to specify the boundary for the minimum proportion for a mixture distribution.
- range_shape1 (by default c(0.05, 20)) to specify the lower and upper boundaries for the shape1 parameter of the burrIII3 distribution.
- range_shape2 (by default the same as range_shape1) to specify the lower and upper boundaries for the shape2 parameter of the burrIII3 distribution.
- control (by default an empty list) to pass a list of control parameters to stats::optim().
It is also worth noting that the default value of the computable argument was switched from FALSE to TRUE to enforce stricter requirements on convergence (see above).
Subsets of Distributions
The following were added to handle multiple distributions:
- ssd_dists() to specify subsets of the available distributions.
- delta argument (by default 7) to the subset() generic to only keep those distributions within the specified AIC(c) difference of the best supported distribution.
Burrlioz
The function ssd_fit_burrlioz() was added to approximate the behaviour of Burrlioz.
Hazard Concentration/Protection Estimation
Hazard concentration estimation is performed by ssd_hc() (which is wrapped by predict()) and hazard protection estimation by ssd_hp(). By default confidence intervals are estimated by parametric bootstrapping.
To reduce the time required for bootstrapping, parallelization was implemented using the future package.
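The idea behind parametric bootstrapping of a hazard concentration can be sketched in base R for a single log-normal distribution. This is a simplified, sequential illustration under stated assumptions, not the ssdtools algorithm: the data are simulated, fit_lnorm() and hc() are hypothetical helpers, and no parallelization is shown.

```r
set.seed(42)
conc <- rlnorm(25, meanlog = 2, sdlog = 1)  # simulated concentrations

# Closed-form maximum likelihood estimates for the log-normal distribution.
fit_lnorm <- function(x) {
  lx <- log(x)
  c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}

# Hazard concentration: the quantile at proportion p of the fitted distribution.
hc <- function(par, p = 0.05) {
  unname(qlnorm(p, par["meanlog"], par["sdlog"]))
}

est <- fit_lnorm(conc)

# Parametric bootstrap: simulate from the fitted distribution, refit, re-estimate.
nboot <- 1000
boot <- replicate(nboot, {
  x <- rlnorm(length(conc), est["meanlog"], est["sdlog"])
  hc(fit_lnorm(x))
})

# Percentile confidence interval for the 5% hazard concentration.
ci <- quantile(boot, c(0.025, 0.975))
```

Each bootstrap replicate is independent of the others, which is what makes the procedure straightforward to run in parallel.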
The following arguments were added to ssd_hc() and ssd_hp():
- delta (by default 7) to only keep those distributions within the specified AIC difference of the best supported distribution.
- min_pboot (by default 0.90) to specify the minimum proportion of bootstrap samples that must successfully fit.
- parametric (by default TRUE) to allow non-parametric bootstrapping.
- control (by default an empty list) to pass a list of control parameters to stats::optim().
The following columns were added to the output data frame:
- wt to specify the Akaike weight.
- method to indicate whether parametric or non-parametric bootstrap was used.
- nboot to indicate how many bootstrap samples were used.
- pboot to indicate the proportion of bootstrap samples which fitted.
It is also worth noting that the dist column was moved from the last to the first position in the output data frame.
Goodness of Fit
The pvalue argument (by default FALSE) was added to ssd_gof() to specify whether to return p-values for the test statistics as opposed to the test statistics themselves.
Plotting
There have also been some substantive changes to the plotting functionality.
Added the following functions:
- ssd_plot_data() to plot censored and uncensored data by calling geom_ssdpoint() for the left and for the right column (alpha parameter values should be adjusted accordingly).
- geom_ssdsegment() to allow plotting of the range of censored data points using segments.
- scale_colour_ssd() (and scale_color_ssd()) to provide an 8-color color-blind scale.
Made the following changes to ssd_plot():
- Added bounds (by default c(left = 1, right = 1)) argument to specify how many orders of magnitude to extend the plot beyond the minimum and maximum (non-missing) values.
- Added linetype (by default NULL) argument to specify line type.
- Added linecolor (by default NULL) argument to specify line color.
- Changed default value of ylab from “Percent of Species Affected” to “Species Affected”.
Renamed:
- GeomSsd to GeomSsdpoint.
- StatSsd to StatSsdpoint.
Soft-deprecated:
- geom_ssd() for geom_ssdpoint().
- stat_ssd().
- ssd_plot_cf() for fitdistrplus::descdist().
Data
ssddata
The dataset boron_data was renamed ccme_boron and moved to the ssddata R package together with the other CCME datasets.
The ssddata package provides a suite of datasets for testing and comparing species sensitivity distribution fitting software.
Data Handling Functions
Added:
- ssd_data() to return original data for a fitdists object.
- ssd_ecd_data() to get empirical cumulative density for data.
- ssd_sort_data() to sort data by empirical cumulative density.
Miscellaneous
- npars() now orders by distribution name.
- All functions and arguments that were soft-deprecated prior to v0.3.0 now warn unconditionally.
ssdtools 0.3.3
CRAN release: 2021-02-19
- Increased requirement that R >= 3.5 due to VGAM.
- Modified comma_signif() so that it now rounds to 3 significant digits by default and only applies scales::comma() to values >= 1000.
- Soft-deprecated the ... argument to comma_signif().
ssdtools 0.3.0
CRAN release: 2020-07-09
Breaking Changes
- Soft-deprecated ‘burrIII3’ distribution as poorly defined.
- Soft-deprecated ‘pareto’ distribution as poor fit on SSD data.
Major Changes
- Reparameterized ‘llogis’ distribution in terms of locationlog and scalelog.
- Reparameterized ‘burrIII3’ distribution in terms of lshape1, lshape2 and lscale.
- Reparameterized ‘burrIII2’ distribution in terms of locationlog and scalelog.
- Reparameterized ‘lgumbel’ distribution in terms of locationlog and scalelog.
- Reparameterized ‘gompertz’ distribution in terms of llocation and lshape.
- Standardized handling of arguments for d,p,q,r and s functions for distributions.
ssdtools 0.2.0
CRAN release: 2020-04-15
Breaking Changes
- Changed computable (whether standard errors must be computable to be considered to have converged) to FALSE by default.
- Enforces only one of ‘llogis’, ‘llog’ or ‘burrIII2’ in all sets (as identical).
ssdtools 0.1.0
CRAN release: 2020-01-13
Breaking Changes
- Default distributions changed to ‘burrIII2’, ‘gamma’ and ‘lnorm’ from ‘gamma’, ‘gompertz’, ‘lgumbel’, ‘llog’, ‘lnorm’ and ‘weibull’.
- Changed implicit behaviour of ssd_hc() and predict() where ci = TRUE to explicit ssd_hc(ci = FALSE) and predict(ci = FALSE).
- Replaced shape and scale arguments to llog() with lshape and lscale.
- Replaced location and scale arguments to lgumbel() with llocation and lscale.
Major Features
- Added Burr Type-III Two-Parameter Distribution (burrIII2).
- Added ssd_hp() to calculate hazard percent at specific concentrations.
- Added ssd_exposure() to calculate proportion exposed based on distribution of concentrations.
- Optimized predict() and added parallel argument.
- Tidyverse style error and warning messages.
Minor Features
- ssd_fit_dists() now checks if standard errors are computable.
- Added Burr Type-III Three-Parameter Distribution (burrIII3).
- Added sdist(x) functionality to set starting values for distributions.
- Added ssd_plot_cdf() to plot cumulative distribution function (equivalent to autoplot()).
- nobs() for censored data now returns a missing value.
- Default ssd_fit_dists() distributions now ordered alphabetically.
Deprecated
- Deprecated ssd_hc() argument hc = 5L for percent = 5L.
- Deprecated dllog() etc for dllogis().
- Deprecated ssd_cfplot() for ssd_plot_cf().
ssdtools 0.0.3
CRAN release: 2018-11-25
- Added citation
- Added ssdtools-manual vignette
- Changed predict() and ssd_hc() nboot argument from 1001 to 1000
- Added hc5_boron data object
- No longer export ssd_fit_dist() as ssd_fit_dists() renders redundant
- geom_hcintersect() now takes multiple values
- More information in DESCRIPTION
- Added CRAN badge
- Removed dependencies: dplyr, magrittr, plyr, purrr
- Moved from depends to imports: VGAM, fitdistrplus, graphics, ggplot, stats
- Moved from imports to suggests: tibble