Skip to main content

SQL Style Guide

1. SQL Style in This Repo (BigQuery + Dataform)

File & Object Structure

  • One model per file; file names are snake_case.sqlx.
  • Layering and naming:
    • sources declared in JS (01_sources/declarations.js)
    • intermediate models prefixed with int_...
    • outputs without int_ (e.g., ga4_events, ga4_sessions, ga4_transactions)
    • assertions under assertions_*.sqlx
  • Top-of-file block comment reserved for legal text placeholder.

Config Block First, Always

  • config { ... } is always at the top with:
    • type: "incremental" or type: "table" (assertions sometimes type: "operations").
    • schema: pulled from dataform.projectConfig.vars.* (never hard‑coded).
    • tags: always includes dataform.projectConfig.vars.GA4_DATASET plus logical tags like "events", "sessions", "outputs", "intermediate", "assertions", or module tags (e.g., "gcp_cost").
    • description: short, action‑oriented sentence.
    • BigQuery options set inside bigquery: { ... }:
      • partitionBy: a date column (e.g., event_date, session_date, transaction_date).
      • clusterBy: IDs or frequently filtered dimensions (e.g., session_id, event_name).
      • labels: always via require("includes/core/helpers.js").helpers.storageLabels().

Incremental Logic & Pre-operations

  • Incremental models declare and set a date_checkpoint in a pre_operations { ... } block:
    • Uses when(incremental(), ..., ...) to compute:
      • coalesce(max(<date>)+1, date('${config.GA4_START_DATE}')) from self() for incremental runs,
      • or date('${config.GA4_START_DATE}') for full builds.
  • pre_operations also sets @@query_label to ${helpers.executionLabels()}.
  • For incrementals, they DELETE existing rows newer than date_checkpoint from self() before re‑inserting.

Common Query Patterns

  • Extensive use of CTEs (WITH ... AS (...)) for readability and staged logic.
  • Column naming is snake_case.
  • Deduplication with QUALIFY and window functions.
  • Use of COALESCE, IF, SAFE_CAST, and helper‑generated SQL snippets.
  • Unnesting / parameter extraction driven by helper functions.
  • Avoids hard‑coding; relies on config variables.
  • Documentation for columns attached via columns: require("includes/core/documentation/helpers.js").ga4*.

Dependencies & Ordering

  • dependencies: specified where needed.
  • Mix of string names and {name: "table_name"} objects.

Assertions Style

  • Live under definitions/core/assertions/.
  • Generally type: "table" (materialized) with clear tags and description.
  • Use DECLARE variables and set defaults.
  • Log results into a central table via assertion_logs.sqlx.

2. Dataform Code & Repository Conventions

Project Configuration & Variables

  • Centralized in workflow_settings.yaml.
  • All dataset names and labels as UPPER_SNAKE_CASE vars.
  • Models read schema and tags from dataform.projectConfig.vars.

Includes & Modularization

  • Core, custom, documentation, and tests under includes/.
  • Config merging pattern: helpers.getConfig() merges { ...coreConfig, ...customConfig }.

Tags

  • Always present and consistent:
    • First element: dataform.projectConfig.vars.GA4_DATASET.
    • Then domain/layer/feature tags.

JS Blocks inside .sqlx

js {
const { helpers } = require("includes/core/helpers");
const config = helpers.getConfig();
}

Pre-operation Blocks

  • Used to declare variables, set labels, compute date_checkpoint, and run DELETEs for incrementals.
  • Patterned use of when(incremental(), 'SQL for inc', 'SQL for full').

Helpers & Generated SQL

  • All reusable SQL is produced via helper functions (camelCase).
  • Feature toggles read from config.
  • Documentation columns augmented dynamically.

Premium Module Pattern

  • Premium files listed in installer_files_premium.yaml.
  • Models set disabled: based on a feature flag.

BigQuery Specifics

  • bigquery config always in model config {}.
  • Partitioning/clustering consistent and aligned to access patterns.
  • Labels enforced via helpers.

Dependencies & Assertions

  • Assertions depend on assertion_logs.
  • Use explicit dependencies in arrays.

Coding Style

  • JS/TS: camelCase for functions; UPPER_SNAKE_CASE for constants.
  • Variables in YAML: UPPER_SNAKE_CASE keys.
  • SQL: snake_case columns; staged CTEs; explicit SELECT lists.

Documentation Integration

  • Column docs centralized in JSON and attached via config.columns.
  • Custom parameters can augment docs at build time.

Packages & Versions

  • package.json pins @dataform/core.
  • Version labels stored in workflow_settings.yaml.

TL;DR – How to Imitate This Repo Perfectly

  • Start each model with a complete config {}.
  • Include a js {} block to load helpers and merged config.
  • For incrementals, use a pre_operations block to manage date_checkpoint and DELETE overlaps.
  • Write snake_case SQL with staged CTEs, dedupe + QUALIFY, and helper‑generated snippets.
  • Keep assertions as materialized tables with variables, dependencies, and logging.
  • Centralize all “knobs” in includes/custom/config.js; never hard‑code schema/project/labels.