Classifies each value in a time series as reasonable, questionable, or erroneous in the status_id column. The status is NA if the value is missing.
Usage
classify_time_series_data(
data,
...,
date_time = "date_time",
value = "value",
questionable_min = 0,
questionable_max = 30,
erroneous_min = -0.5,
erroneous_max = 40,
questionable_rate = 2,
erroneous_rate = 5,
questionable_buffer = 1,
erroneous_buffer = 1,
gap_range = 5
)
Arguments
- data
A data frame.
- ...
These dots are for future extensions and must be empty.
- date_time
A string indicating the column name of the POSIXct vector.
- value
A string indicating the column name of the value vector.
- questionable_min
A numeric value indicating the lower bound of the questionable range of temperature values.
- questionable_max
A numeric value indicating the upper bound of the questionable range of temperature values.
- erroneous_min
A numeric value indicating the lower bound of the erroneous range of temperature values.
- erroneous_max
A numeric value indicating the upper bound of the erroneous range of temperature values.
- questionable_rate
A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered questionable.
- erroneous_rate
A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered erroneous.
- questionable_buffer
A numeric value indicating the buffer in hours for questionable values.
- erroneous_buffer
A numeric value indicating the buffer in hours for erroneous values.
- gap_range
A numeric value indicating the number of hours between two non-reasonable values within which the values in between will be coded as questionable or erroneous.
Details
The function only works on a single time series.
The function will error if there are missing or duplicated date times.
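The two requirements above can be checked before calling the function. The following is a minimal base R sketch, not part of the package: `has_valid_date_times()` is a hypothetical helper, and it assumes that a "missing" date time means an `NA` entry in the date-time column.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# date-time column has no NA entries and no duplicated entries.
has_valid_date_times <- function(data, date_time = "date_time") {
  x <- data[[date_time]]
  !anyNA(x) && anyDuplicated(x) == 0L
}

ok <- data.frame(
  date_time = as.POSIXct(
    c("2021-05-07 08:00:00", "2021-05-07 09:00:00"),
    tz = "UTC"
  ),
  value = c(4.1, 4.2)
)

# Repeat the last row to create a duplicated date time.
dup <- rbind(ok, ok[2, ])

has_valid_date_times(ok)   # TRUE
has_valid_date_times(dup)  # FALSE
```

If the data contain several time series (for example, one per site), one option is to split the data frame by series and classify each piece separately.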
The data is processed in the following steps:
The time series values are first classified based on their values (questionable_min, questionable_max, erroneous_min, erroneous_max).
The rate of change to each value is then calculated and the values are classified based on the absolute rate of change (questionable_rate, erroneous_rate).
Values adjacent to any questionable/erroneous value are then coded as questionable/erroneous.
Next any value within the time buffer of a questionable/erroneous value is classified as questionable/erroneous (questionable_buffer, erroneous_buffer).
In addition, ignoring the buffer, reasonable values between two questionable/erroneous values are coded as questionable if the duration of the gap in hours is within the gap range (gap_range).
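The first two steps above (classification by value range and by rate of change) can be sketched in base R as follows. This is an illustration only and not the package's implementation. The helper names `classify_by_range()`, `classify_by_rate()` and `combine_status()` are hypothetical, and the sketch assumes that thresholds are exclusive (a value is flagged only when it is strictly outside the range or strictly above the rate) and that the rate of change is assigned to the later value of each pair. The adjacency, buffer and gap steps are not covered.

```r
# Illustrative sketch only; not the package's implementation.

# Step 1 (assumed semantics): values outside the questionable range are
# questionable; values outside the erroneous range are erroneous.
classify_by_range <- function(value,
                              questionable_min = 0, questionable_max = 30,
                              erroneous_min = -0.5, erroneous_max = 40) {
  status <- rep("reasonable", length(value))
  status[which(value < questionable_min | value > questionable_max)] <- "questionable"
  status[which(value < erroneous_min | value > erroneous_max)] <- "erroneous"
  status[is.na(value)] <- NA_character_
  status
}

# Step 2 (assumed semantics): absolute change per hour from the previous
# value; the first value has no previous value and stays reasonable.
classify_by_rate <- function(date_time, value,
                             questionable_rate = 2, erroneous_rate = 5) {
  hours <- as.numeric(diff(date_time), units = "hours")
  rate <- c(NA_real_, abs(diff(value)) / hours)
  status <- rep("reasonable", length(value))
  status[which(rate > questionable_rate)] <- "questionable"
  status[which(rate > erroneous_rate)] <- "erroneous"
  status
}

# Keep the more severe of two classifications for each value.
severity <- c(reasonable = 1L, questionable = 2L, erroneous = 3L)
combine_status <- function(a, b) {
  names(severity)[pmax(unname(severity[a]), unname(severity[b]))]
}

date_time <- as.POSIXct("2021-05-07 08:00:00", tz = "UTC") + 3600 * (0:5)
value <- c(4.1, 4.2, -0.2, 4.3, 4.2, 45)

by_range <- classify_by_range(value)
by_rate <- classify_by_rate(date_time, value)
combined <- combine_status(by_range, by_rate)
combined
```

In this made-up series the third value is below the default questionable_min, the fourth is flagged only by its rate of change (4.5 per hour), and the last value exceeds both erroneous_max and erroneous_rate.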
Examples
data <- data.frame(
date_time =
as.POSIXct(c(
"2021-05-07 08:00:00", "2021-05-07 09:00:00",
"2021-05-07 10:00:00", "2021-05-07 11:00:00", "2021-05-07 12:00:00",
"2021-05-07 13:00:00"
)),
water_temperature = c(4.124, 4.078, 4.102, 4.189, 4.243, 6.578)
)
classify_time_series_data(data, value = "water_temperature")
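To see which default thresholds are relevant for the example data, the hourly rate of change can be computed by hand in base R. This sketch does not call the package. It fixes the time zone to UTC for reproducibility (an assumption; the example above uses the session time zone).

```r
date_time <- as.POSIXct(c(
  "2021-05-07 08:00:00", "2021-05-07 09:00:00",
  "2021-05-07 10:00:00", "2021-05-07 11:00:00",
  "2021-05-07 12:00:00", "2021-05-07 13:00:00"
), tz = "UTC")
water_temperature <- c(4.124, 4.078, 4.102, 4.189, 4.243, 6.578)

# Elapsed time between consecutive observations, in hours.
hours <- as.numeric(diff(date_time), units = "hours")

# Absolute rate of change (temperature per hour) into each later value.
rate <- round(abs(diff(water_temperature)) / hours, 3)
rate
# 0.046 0.024 0.087 0.054 2.335
```

All values lie inside the default questionable range (0 to 30). The only notable feature is the final jump of 2.335 per hour, which is above the default questionable_rate of 2 and below the default erroneous_rate of 5.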