Summarise Uploaded Results, In Place (the cloud ssd_summarise())

The cloud counterpart of ssd_summarise(): a generic, dispatched on the upload object's class, that fans a step's uploaded shards into a single lazy duckplyr table read in place (no download).

For an Azure destination it reads the <container>[/<prefix>]/<step>/**/part.parquet Hive glob via DuckDB's azure extension. For the combined summaries it instead reads a single blob: summary.parquet (step = "summary") or summary-samples.parquet (step = "summary_samples", shipped only when the scenario set samples = TRUE). It resolves the same front-end secret as the write path, remaps it (with the account derived from url) into a DuckDB azure secret, and returns the union as a lazy duckplyr tibble (not collected, so the read and projection stay in DuckDB).

By default it projects away the heavy dists/samples list-columns (the analysis-ready summary, mirroring ssd_summarise()); pass drop_samples = FALSE to keep them when the in-flight bootstrap samples are needed. Because the uploaded compact summary physically lacks those columns, step = "summary" with drop_samples = FALSE aborts with an error pointing at step = "summary_samples" rather than silently returning a sample-less table.

The default method (an unknown destination) and the dry-run method both abort.

Usage

ssd_summarise_uploaded(
  upload,
  step = "hc",
  ...,
  drop_samples = TRUE,
  prudence = "stingy"
)

Arguments

upload: An upload destination from ssd_upload_azure() or ssd_upload_dryrun().
step: One of "sample", "fit", "hc" (the step layer to read), "summary" (the uploaded compact summary), or "summary_samples" (the uploaded full summary retaining the dists/samples list-columns, shipped only when the scenario set samples = TRUE).
...: These dots are for future extensions and must be empty.
drop_samples: Flag (default TRUE): project away the heavy dists/samples list-columns for the analysis-ready summary. Pass FALSE to keep them (e.g. when the in-flight bootstrap samples are needed).
prudence: The duckplyr prudence of the returned table (default "stingy"). "stingy" keeps the table lazy and composable, but an implicit materialisation against the remote glob (e.g. nrow() or $) raises an error rather than triggering an unbounded download/scan; "lavish" restores automatic materialisation on first access. dplyr::collect() and duckplyr::compute_parquet() work under either.
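
The effect of prudence can be tried locally without any cloud access. The following is a minimal sketch, assuming duckplyr 1.0 or later is installed; the small duckdb_tibble() is a hypothetical stand-in for the remote table and does not reflect the real schema.

```r
library(duckplyr)

# Hypothetical local stand-in for the lazy table returned by
# ssd_summarise_uploaded(); the column x is illustrative only.
stingy <- duckdb_tibble(x = 1:3, .prudence = "stingy")

# Under "stingy", an implicit materialisation such as nrow() is refused
# with an error instead of silently running the query.
implicit <- tryCatch(nrow(stingy), error = function(e) e)
inherits(implicit, "error")

# An explicit materialisation is still allowed.
rows <- dplyr::collect(stingy)
nrow(rows)
```

The same pattern should apply to the remote table: narrow it with dplyr verbs first, then materialise explicitly with dplyr::collect().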

Value

A lazy duckplyr/DuckDB tibble over the unioned, uploaded step layer (not collected), composable with dplyr verbs. Call dplyr::collect() on it (or write it with duckplyr::compute_parquet()) when you need the rows in R.

Examples

if (FALSE) { # \dontrun{
upload <- ssd_upload_azure("https://acct.blob.core.windows.net", "results")
ssd_summarise_uploaded(upload, "hc")
ssd_summarise_uploaded(upload, "hc", drop_samples = FALSE) # keep samples
} # }
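
Because the returned table is lazy, a common pattern is to narrow it in DuckDB and only then pull rows into R. The sketch below wraps that pattern in preview_uploaded(), a hypothetical helper that is not part of the package; it assumes dplyr is installed and that upload comes from ssd_upload_azure() with working credentials.

```r
# Hypothetical helper (not part of the package): preview the first rows of
# an uploaded step layer without downloading the whole layer.
preview_uploaded <- function(upload, step = "hc", n = 10L) {
  # Lazy table over the remote parquet files; nothing is materialised yet.
  tbl <- ssd_summarise_uploaded(upload, step)
  # head() is applied to the lazy table; collect() is the explicit
  # materialisation, which is permitted under the default "stingy" prudence.
  dplyr::collect(utils::head(tbl, n))
}
```

Calling preview_uploaded(upload, "hc") would then return an ordinary tibble of at most n rows; defining the helper does not contact the remote store.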
