Query Library
This page contains ready-to-use SQL queries for exploring and working with the GA4Dataform anomaly_detection_report table. Copy and adapt these queries to your project and use case.
Exploration queries
See recent anomalies
Start simple. Peek at anomalies detected in the last week:
SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
is_strong_series
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY date DESC, case_name
Run this daily to browse recent results. Do they look reasonable? Are there patterns worth noting?
Calculate deviation percentage
Find anomalies and quantify how far they deviate from the expected range:
SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
is_strong_series,
CASE
WHEN metric_value > upper_bound
THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
WHEN metric_value < lower_bound
THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
ELSE 0
END AS percent_off_expected
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY percent_off_expected DESC
This adds a percent_off_expected column that shows the magnitude of deviation. For example:
- 30% = "30% fewer than expected"
- 5% = "Only 5% change"
Use this to determine an appropriate alert threshold. A threshold of 15% or above may be a reasonable starting point; adjust based on your tolerance for noise.
Alert candidate queries
Find persistent anomalies
Alert on anomalies that appear on multiple days (persistence). This removes single-day blips:
WITH recent_anomalies AS (
SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_days
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
GROUP BY case_name, time_series_id
)
SELECT *
FROM recent_anomalies
WHERE anomaly_days >= 2
This finds time series that showed anomalies on at least 2 days in the last 3 days. These are worth paying attention to.
Alert based on impact threshold
Find anomalies that are both persistent and significant:
WITH candidate_anomalies AS (
SELECT
date,
case_name,
time_series_id,
metric_value,
lower_bound,
upper_bound,
CASE
WHEN metric_value > upper_bound
THEN ROUND(100 * (metric_value - upper_bound) / upper_bound, 1)
WHEN metric_value < lower_bound
THEN ROUND(100 * (lower_bound - metric_value) / lower_bound, 1)
ELSE 0
END AS percent_off_expected
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
),
persistence_check AS (
SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_days
FROM candidate_anomalies
GROUP BY case_name, time_series_id
)
SELECT
c.date,
c.case_name,
c.time_series_id,
c.metric_value,
c.lower_bound,
c.upper_bound,
c.percent_off_expected,
p.anomaly_days
FROM candidate_anomalies c
JOIN persistence_check p USING (case_name, time_series_id)
WHERE p.anomaly_days >= 2
AND c.percent_off_expected >= 15 -- Adjust threshold as needed
ORDER BY c.percent_off_expected DESC, c.date DESC
Adjust the percent_off_expected >= 15 threshold based on your tolerance for false positives.
Monitoring queries
Anomaly volume by case
See how many anomalies each case is generating:
SELECT
case_name,
COUNT(*) as anomaly_count,
COUNT(DISTINCT time_series_id) as affected_series,
MIN(date) as first_anomaly,
MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name
ORDER BY anomaly_count DESC
Use this to understand which cases are producing the most alerts and whether patterns are changing over time.
Most affected series
See which time series are generating the most anomalies:
SELECT
case_name,
time_series_id,
COUNT(*) as anomaly_count,
MIN(date) as first_anomaly,
MAX(date) as latest_anomaly
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'anomaly_detection'
AND is_anomaly = TRUE
AND is_strong_series = TRUE
AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY case_name, time_series_id
ORDER BY anomaly_count DESC
LIMIT 20
This helps you identify chronic problems (series that alert repeatedly) versus one-off blips.
Debugging queries
Check series strength
Verify which series have enough data to be reliable:
SELECT
case_name,
time_series_id,
is_strong_series,
COUNT(*) as total_observations,
COUNT(CASE WHEN source = 'training' THEN 1 END) as training_rows,
COUNT(CASE WHEN source = 'anomaly_detection' THEN 1 END) as detection_rows
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
GROUP BY case_name, time_series_id, is_strong_series
HAVING COUNT(*) < 100
ORDER BY case_name, is_strong_series DESC
This shows series that might not have enough data for reliable detection.
Training data coverage
See which date ranges are covered by training data:
SELECT
case_name,
time_series_id,
MIN(date) as first_date,
MAX(date) as last_date,
COUNT(*) as observation_count
FROM `{PROJECT}.{OUTPUTS_DATASET}.anomaly_detection_report`
WHERE source = 'training'
GROUP BY case_name, time_series_id
ORDER BY last_date DESC, case_name
Use this to confirm the module is training on the expected date range.