Query Library

This page contains ready-to-use SQL queries for exploring and working with the GA4Dataform anomaly_detection_report table. Copy and adapt these queries to your project and use case.

Exploration queries

See recent anomalies

Start simple. Peek at anomalies detected in the last week:

SELECT
  date,
  case_name,
  time_series_id,
  metric_value,
  lower_bound,
  upper_bound,
  is_strong_series
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
  AND is_anomaly = TRUE
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY date DESC, case_name

Run this daily to browse recent results. Do they look reasonable? Are there patterns worth noting?

Calculate deviation percentage

Find anomalies and quantify how far they deviate from the expected range:

SELECT
  date,
  case_name,
  time_series_id,
  metric_value,
  lower_bound,
  upper_bound,
  is_strong_series,
  CASE
    WHEN metric_value > upper_bound
      THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
    WHEN metric_value < lower_bound
      THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
    ELSE 0
  END AS percent_off_expected
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
  AND is_anomaly = TRUE
  AND is_strong_series = TRUE
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY percent_off_expected DESC

This adds a percent_off_expected column that shows the magnitude of deviation. For example:

30% = "30% fewer than expected"
5% = "Only 5% change"

Use this to determine an appropriate alert threshold. A threshold of 15% or above may be a reasonable starting point; adjust based on your tolerance for noise.

Alert candidate queries

Find persistent anomalies

Alert on anomalies that appear on multiple days (persistence). This removes single-day blips:

WITH recent_anomalies AS (
  SELECT
    case_name,
    time_series_id,
    COUNT(*) as anomaly_days
  FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
  WHERE source = 'anomaly_detection'
    AND is_anomaly = TRUE
    AND is_strong_series = TRUE
    AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
  GROUP BY case_name, time_series_id
)
SELECT *
FROM recent_anomalies
WHERE anomaly_days >= 2

This finds time series that showed anomalies on at least 2 days in the last 3 days. These are worth paying attention to.

Alert based on impact threshold

Find anomalies that are both persistent and significant:

WITH candidate_anomalies AS (
  SELECT
    date,
    case_name,
    time_series_id,
    metric_value,
    lower_bound,
    upper_bound,
    CASE
      WHEN metric_value > upper_bound
        THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
      WHEN metric_value < lower_bound
        THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
      ELSE 0
    END AS percent_off_expected
  FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
  WHERE source = 'anomaly_detection'
    AND is_anomaly = TRUE
    AND is_strong_series = TRUE
    AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
),
persistence_check AS (
  SELECT
    case_name,
    time_series_id,
    COUNT(*) as anomaly_days
  FROM candidate_anomalies
  GROUP BY case_name, time_series_id
)
SELECT
  c.date,
  c.case_name,
  c.time_series_id,
  c.metric_value,
  c.lower_bound,
  c.upper_bound,
  c.percent_off_expected,
  p.anomaly_days
FROM candidate_anomalies c
JOIN persistence_check p USING (case_name, time_series_id)
WHERE p.anomaly_days >= 2
  AND c.percent_off_expected >= 15  -- Adjust threshold as needed
ORDER BY c.percent_off_expected DESC, c.date DESC

Adjust the percent_off_expected >= 15 threshold based on your tolerance for false positives.

Monitoring queries

Anomaly volume by case

See how many anomalies each case is generating:

SELECT
  case_name,
  COUNT(*) as anomaly_count,
  COUNT(DISTINCT time_series_id) as affected_series,
  MIN(date) as first_anomaly,
  MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
  AND is_anomaly = TRUE
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name
ORDER BY anomaly_count DESC

Use this to understand which cases are producing the most alerts and whether patterns are changing over time.

Most affected series

See which time series are generating the most anomalies:

SELECT
  case_name,
  time_series_id,
  COUNT(*) as anomaly_count,
  MIN(date) as first_anomaly,
  MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
  AND is_anomaly = TRUE
  AND is_strong_series = TRUE
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name, time_series_id
ORDER BY anomaly_count DESC
LIMIT 20

This helps you identify chronic problems (series that alert repeatedly) versus one-off blips.

Debugging queries

Check series strength

Verify which series have enough data to be reliable:

SELECT
  case_name,
  time_series_id,
  is_strong_series,
  COUNT(*) as total_observations,
  COUNT(CASE WHEN source = 'training' THEN 1 END) as training_rows,
  COUNT(CASE WHEN source = 'anomaly_detection' THEN 1 END) as detection_rows
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
GROUP BY case_name, time_series_id, is_strong_series
HAVING COUNT(*) < 100
ORDER BY case_name, is_strong_series DESC

This shows series that might not have enough data for reliable detection.

Training data coverage

See which date ranges are covered by training data:

SELECT
  case_name,
  time_series_id,
  MIN(date) as first_date,
  MAX(date) as last_date,
  COUNT(*) as observation_count
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'training'
GROUP BY case_name, time_series_id
ORDER BY last_date DESC, case_name

Use this to confirm the module is training on the expected date range.

Exploration queries​

See recent anomalies​

Calculate deviation percentage​

Alert candidate queries​

Find persistent anomalies​

Alert based on impact threshold​

Monitoring queries​

Anomaly volume by case​

Most affected series​

Debugging queries​

Check series strength​

Training data coverage​