Configurable Variables

Module-level configuration

Path: includes/custom/modules/anomaly_detection/config.json (or .yaml / .js)

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| enabled | boolean | Yes | Enables the module when true |
| version | number | Yes | Config version marker |
| cases | string[] | Yes | Case filenames (without extension) to execute |

Default values are defined in includes/core/modules/anomaly_detection/default_config.json.
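
As a rough illustration, a module-level configuration in its YAML form might look like the sketch below. The case names and the version value are hypothetical placeholders, not values taken from the default config.

```yaml
# includes/custom/modules/anomaly_detection/config.yaml
enabled: true          # turn the module on
version: 1             # hypothetical version marker
cases:                 # hypothetical case filenames, without extension
  - daily_active_users
  - purchase_revenue
```

With this sketch, the module would look for cases/daily_active_users.yaml and cases/purchase_revenue.yaml.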

Case-level configuration

Path: includes/custom/modules/anomaly_detection/cases/{case}.yaml

Data source and series mapping

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| case_data_project | string | Yes | Source project |
| case_data_dataset | string | Yes | Source dataset |
| case_data_table | string | Yes | Source table |
| time_series_timestamp_col | string | Yes | Timestamp/date column used by the model |
| time_series_data_col | string | Yes | Metric expression or column to aggregate |
| time_series_data_agg | string | No | Aggregation method: sum, count, or count_distinct. Defaults to sum when omitted |
| time_series_id_cols | string[] | No | Dimension columns that define a unique time series. Omit or set to [] to model a single global series |
| top_n_time_series | number | No | Limits training to the top N series by total training-window volume. Omit or set to null to include all series |
| metric_cap | number | No | Caps the metric value at this number before model training using LEAST(_y, metric_cap). Prevents extreme outliers from skewing the ARIMA model. Omit or set to null to disable capping |
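
The fields above could be combined as in the following sketch of a case file. The project, dataset, table, and column names are hypothetical, and the numeric values are chosen only for illustration.

```yaml
# cases/daily_active_users.yaml (hypothetical case)
case_data_project: my-gcp-project          # hypothetical project ID
case_data_dataset: analytics_export        # hypothetical dataset
case_data_table: daily_events              # hypothetical table
time_series_timestamp_col: event_date      # hypothetical date column
time_series_data_col: user_pseudo_id       # hypothetical column to aggregate
time_series_data_agg: count_distinct       # one of: sum, count, count_distinct
time_series_id_cols:                       # one series per platform/country pair
  - platform
  - country
top_n_time_series: 50                      # keep the 50 highest-volume series
metric_cap: 100000                         # values above this are capped before training
```

Setting time_series_id_cols to [] and removing top_n_time_series would instead model one global series.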

Training window

Use one complete mode:

| Mode | Required fields |
| --- | --- |
| Explicit dates | training_start_date, training_end_date |
| Rolling offsets | training_end_days_ago, training_window_days |

Explicit dates take precedence when both modes are present. If neither mode is fully defined, the module throws a compile-time error.
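
The two modes could be written as follows. This is a sketch only: the ISO date format and the specific values are assumptions, since the accepted date format is not stated here.

```yaml
# Mode 1: explicit dates (ISO YYYY-MM-DD format is an assumption)
training_start_date: "2024-01-01"
training_end_date: "2024-06-30"

# Mode 2: rolling offsets (use in place of the explicit dates above)
# training_end_days_ago: 2      # training window ends two days ago
# training_window_days: 180     # and spans 180 days
```

Defining only one field of a pair, for example training_start_date without training_end_date, leaves that mode incomplete.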

Model options

| Field | Type | Description |
| --- | --- | --- |
| model_type | string | BQML model type; ARIMA_PLUS is the recommended default |
| data_frequency | string | Time grain of the series; use DAILY for GA4 data |
| decompose_time_series | boolean | Separates trend and seasonality components before fitting |
| clean_spikes_and_dips | boolean | Removes transient spikes and dips from training to improve baseline quality |
| adjust_step_changes | boolean | Handles permanent level shifts in the series |
| auto_arima | boolean | Lets BQML select the best ARIMA order automatically |
| model_version | number or string | Appended to the model name to version the BQML artifact |
| model_cron | string | Cron expression controlling when the time-series preparation and model training actions run. On days that match the expression, the pipeline rebuilds the training table and retrains the model. On non-matching days, those two actions are disabled entirely and the existing model is left in place. Anomaly scoring always depends on the most recently trained model. |
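
A model options block might look like the sketch below. The boolean choices and the version are illustrative, and the cron value assumes standard five-field cron syntax, which this page does not confirm.

```yaml
model_type: ARIMA_PLUS
data_frequency: DAILY
decompose_time_series: true
clean_spikes_and_dips: true
adjust_step_changes: true
auto_arima: true
model_version: 1             # hypothetical; appended to the model name
model_cron: "0 0 * * 1"      # assumed five-field cron: retrain on Mondays
```

Under that assumption, the model would be retrained on Mondays, and scoring on the other days would use the model from the most recent Monday run.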

Series quality thresholds

| Field | Used in | Description |
| --- | --- | --- |
| training_min_series_days | model training | Minimum number of rows per series required to include it in training |
| training_min_series_avg | model training | Minimum average metric value per series required to include it in training |
| detection_min_series_days | scoring flags | Minimum rows in the detection window for a series to be marked as strong |
| detection_min_series_avg | scoring flags | Minimum average metric in the detection window for a series to be marked as strong |
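
For illustration, the thresholds could be set as follows. All four numbers are hypothetical and would need tuning to the volume of the actual data.

```yaml
training_min_series_days: 60     # series needs at least 60 rows to be trained on
training_min_series_avg: 10      # and an average metric of at least 10
detection_min_series_days: 1     # at least 1 row in the detection window to be "strong"
detection_min_series_avg: 10     # and an average metric of at least 10 in that window
```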

Detection window and sensitivity

| Field | Description |
| --- | --- |
| anomaly_detection_end_days_ago | How many days ago detection ends (e.g., 1 targets yesterday) |
| anomaly_detection_window_days | Number of days to score (e.g., 1 scores a single day) |
| anomaly_prob_threshold | Probability threshold passed to ML.DETECT_ANOMALIES; higher values reduce false positives |
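
A daily run that scores only yesterday could be configured along these lines. The threshold value is illustrative rather than a recommendation.

```yaml
anomaly_detection_end_days_ago: 1    # detection window ends yesterday
anomaly_detection_window_days: 1     # score a single day
anomaly_prob_threshold: 0.95         # illustrative; raise to reduce false positives
```

Widening anomaly_detection_window_days to 7 would score the seven days ending yesterday.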

Validation notes

  • time_series_data_agg must be one of: sum, count, count_distinct.
  • time_series_id_cols must be an array when provided; it may be empty.
  • top_n_time_series must be a positive integer when set.
  • model_cron must be a valid cron expression when set.
  • Training dates must resolve to a valid mode (explicit or rolling); when both modes are fully defined, explicit dates take precedence. If neither mode is fully defined, the module raises a compile-time error.