Skip to main content

Query Library

This page contains ready-to-use SQL queries for exploring and working with the GA4Dataform anomaly_detection_report table. Copy and adapt these queries to your project and use case.

Exploration queries

See recent anomalies

Start simple. Peek at anomalies detected in the last week:

SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
is_strong_series
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY date DESC, case_name

Run this daily to browse recent results. Do they look reasonable? Are there patterns worth noting?

Calculate deviation percentage

Find anomalies and quantify how far they deviate from the expected range:

SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
is_strong_series,
CASE
WHEN metric_value > upper_bound
THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
WHEN metric_value < lower_bound
THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
ELSE 0
END AS percent_off_expected
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY percent_off_expected DESC

This adds a percent_off_expected column that shows the magnitude of deviation. For example:

  • 30% = "30% fewer than expected"
  • 5% = "Only 5% change"

Use this to determine an appropriate alert threshold. A threshold of 15% or above may be a reasonable starting point; adjust based on your tolerance for noise.

Alert candidate queries

Find persistent anomalies

Alert on anomalies that appear on multiple days (persistence). This removes single-day blips:

WITH recent_anomalies AS (
SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_days
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
GROUP BY case_name, time_series_id
)
SELECT *
FROM recent_anomalies
WHERE anomaly_days >= 2

This finds time series that showed anomalies on at least 2 days in the last 3 days. These are worth paying attention to.

Alert based on impact threshold

Find anomalies that are both persistent and significant:

WITH candidate_anomalies AS (
SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
CASE
WHEN metric_value > upper_bound
THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
WHEN metric_value < lower_bound
THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
ELSE 0
END AS percent_off_expected
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
),
persistence_check AS (
SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_days
FROM candidate_anomalies
GROUP BY case_name, time_series_id
)
SELECT
c.date,
c.case_name,
c.time_series_id,
c.metric_value,
c.lower_bound,
c.upper_bound,
c.percent_off_expected,
p.anomaly_days
FROM candidate_anomalies c
JOIN persistence_check p USING (case_name, time_series_id)
WHERE p.anomaly_days >= 2
AND c.percent_off_expected >= 15 -- Adjust threshold as needed
ORDER BY c.percent_off_expected DESC, c.date DESC

Adjust the percent_off_expected >= 15 threshold based on your tolerance for false positives.

Monitoring queries

Anomaly volume by case

See how many anomalies each case is generating:

SELECT
case_name,
COUNT(*) as anomaly_count,
COUNT(DISTINCT time_series_id) as affected_series,
MIN(date) as first_anomaly,
MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name
ORDER BY anomaly_count DESC

Use this to understand which cases are producing the most alerts and whether patterns are changing over time.

Most affected series

See which time series are generating the most anomalies:

SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_count,
MIN(date) as first_anomaly,
MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name, time_series_id
ORDER BY anomaly_count DESC
LIMIT 20

This helps you identify chronic problems (series that alert repeatedly) versus one-off blips.

Debugging queries

Check series strength

Verify which series have enough data to be reliable:

SELECT
case_name,
time_series_id,
is_strong_series,
COUNT(*) as total_observations,
COUNT(CASE WHEN source = 'training' THEN 1 END) as training_rows,
COUNT(CASE WHEN source = 'anomaly_detection' THEN 1 END) as detection_rows
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
GROUP BY case_name, time_series_id, is_strong_series
HAVING COUNT(*) < 100
ORDER BY case_name, is_strong_series DESC

This shows series that might not have enough data for reliable detection.

Training data coverage

See which date ranges are covered by training data:

SELECT
case_name,
time_series_id,
MIN(date) as first_date,
MAX(date) as last_date,
COUNT(*) as observation_count
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'training'
GROUP BY case_name, time_series_id
ORDER BY last_date DESC, case_name

Use this to confirm the module is training on the expected date range.