Summarise Uploaded Results, In Place (the cloud ssd_summarise())
Source: R/upload.R
ssd_summarise_uploaded.Rd
The cloud counterpart of ssd_summarise(): a generic, dispatched on the
upload object's class, that fans a step's uploaded shards into a single
lazy duckplyr table read in place (no download).
For an Azure destination it reads the
<container>[/<prefix>]/<step>/**/part.parquet Hive glob via DuckDB's azure
extension. For the combined summaries it instead reads a single blob:
summary.parquet (step = "summary") or summary-samples.parquet
(step = "summary_samples", shipped only when the scenario set
samples = TRUE). To authenticate, it resolves the same front-end secret as
the write path and remaps it (with the account derived from url) into a
DuckDB azure secret. It returns the union as a lazy duckplyr tibble (not
collected, so the read and projection stay in DuckDB).
By default it projects away the heavy dists/samples list-columns (the
analysis-ready summary, mirroring ssd_summarise()); pass
drop_samples = FALSE to keep them when the in-flight bootstrap samples
are needed. Because the uploaded compact summary physically lacks those
columns, step = "summary" with drop_samples = FALSE aborts with a message
pointing at step = "summary_samples" rather than silently returning a
sample-less table.
The default method (an unknown destination) and the dry-run method both
abort.
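The path layout and account derivation described above can be sketched in base R. This is only an illustration: the helper names sketch_blob_path() and sketch_account() are hypothetical and not part of the package, the placement of the single summary blobs directly under <container>[/<prefix>] is an assumption, and the real implementation may differ (for instance in URL scheme or validation).

```r
# Hypothetical helper (not the package's code): build the relative blob
# path or Hive glob for a step under <container>[/<prefix>].
sketch_blob_path <- function(container, step, prefix = NULL) {
  # Combined summaries are single blobs rather than shard globs.
  # ASSUMPTION: they sit directly under <container>[/<prefix>].
  single <- c(
    summary = "summary.parquet",
    summary_samples = "summary-samples.parquet"
  )
  # c() drops a NULL prefix, so the optional segment disappears cleanly.
  root <- paste(c(container, prefix), collapse = "/")
  if (step %in% names(single)) {
    return(paste(root, single[[step]], sep = "/"))
  }
  paste(root, step, "**", "part.parquet", sep = "/")
}

# Hypothetical helper: derive the storage account from an endpoint URL
# by taking the first label of the host name.
sketch_account <- function(url) {
  host <- sub("^https?://", "", url) # drop the scheme
  host <- sub("[/:].*$", "", host)   # drop any port or path
  sub("\\..*$", "", host)            # keep the first host label
}

sketch_blob_path("results", "hc")
sketch_blob_path("results", "summary_samples", prefix = "run1")
sketch_account("https://acct.blob.core.windows.net")
```

Taking the first host label works for the standard <account>.blob.core.windows.net endpoint form shown in the Examples; custom domains would need different handling.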
Arguments
- upload
An upload destination from ssd_upload_azure() or ssd_upload_dryrun().
- step
One of "sample", "fit", "hc" (the step layer to read), "summary" (the uploaded compact summary), or "summary_samples" (the uploaded full summary retaining the dists/samples list-columns, shipped only when the scenario set samples = TRUE).
- ...
These dots are for future extensions and must be empty.
- drop_samples
Flag (default TRUE): project away the heavy dists/samples list-columns for the analysis-ready summary. Pass FALSE to keep them (e.g. when the in-flight bootstrap samples are needed).
- prudence
The duckplyr prudence of the returned table (default "stingy"). "stingy" keeps the table lazy and composable, but an implicit materialisation (e.g. nrow() or $) against the remote glob raises an error rather than triggering an unbounded download/scan. "lavish" restores automatic materialisation on first access. dplyr::collect() and duckplyr::compute_parquet() work under either.
Value
A lazy duckplyr/DuckDB tibble over the unioned, uploaded step
layer (not collected), composable with dplyr verbs. Call dplyr::collect()
on it (or write it with duckplyr::compute_parquet()) when you need the rows
in R.
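The effect of a "stingy" table can be tried locally without any cloud access. The sketch below assumes duckplyr 1.0.0 or later is installed and uses a small in-memory duckplyr tibble as a stand-in for the remote table. The column names id and est are made up for illustration and do not reflect the uploaded schema.

```r
# Local stand-in for the remote table (assumes duckplyr >= 1.0.0).
local_tbl <- duckplyr::duckdb_tibble(
  id = 1:3,
  est = c(1.2, 3.4, 5.6),
  .prudence = "stingy"
)

# dplyr verbs compose lazily; nothing is materialised yet.
lazy <- dplyr::filter(local_tbl, est > 2)

# Implicit materialisation (here via `$`) is refused under "stingy",
# so the error is captured rather than left to stop the script.
implicit <- tryCatch(lazy$est, error = function(e) e)
inherits(implicit, "error")

# Explicit materialisation is always allowed.
rows <- dplyr::collect(lazy)
rows
```

Against the remote glob the same pattern applies: filter and select first so DuckDB does the pruning, then collect only the reduced result.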
Examples
if (FALSE) { # \dontrun{
upload <- ssd_upload_azure("https://acct.blob.core.windows.net", "results")
ssd_summarise_uploaded(upload, "hc")
ssd_summarise_uploaded(upload, "hc", drop_samples = FALSE) # keep samples
} # }