SQL Style Guide
1. SQL Style in This Repo (BigQuery + Dataform)
File & Object Structure
- One model per file; file names are
snake_case.sqlx. - Layering and naming:
- sources declared in JS (
01_sources/declarations.js) - intermediate models prefixed with
int_... - outputs without
int_(e.g.,ga4_events,ga4_sessions,ga4_transactions) - assertions under
assertions_*.sqlx
- sources declared in JS (
- Top-of-file block comment reserved for legal text placeholder.
Config Block First, Always
config { ... }is always at the top with:type: "incremental"ortype: "table"(assertions sometimestype: "operations").schema:pulled fromdataform.projectConfig.vars.*(never hard‑coded).tags:always includesdataform.projectConfig.vars.GA4_DATASETplus logical tags like"events","sessions","outputs","intermediate","assertions", or module tags (e.g.,"gcp_cost").description:short, action‑oriented sentence.- BigQuery options set inside
bigquery: { ... }:partitionBy:a date column (e.g.,event_date,session_date,transaction_date).clusterBy:IDs or frequently filtered dimensions (e.g.,session_id,event_name).labels:always viarequire("includes/core/helpers.js").helpers.storageLabels().
Incremental Logic & Pre-operations
- Incremental models declare and set a
date_checkpointin apre_operations { ... }block:- Uses
when(incremental(), ..., ...)to compute:coalesce(max(<date>)+1, date('${config.GA4_START_DATE}'))fromself()for incremental runs,- or
date('${config.GA4_START_DATE}')for full builds.
- Uses
pre_operationsalso sets@@query_labelto${helpers.executionLabels()}.- For incrementals, they DELETE existing rows newer than
date_checkpointfromself()before re‑inserting.
Common Query Patterns
- Extensive use of CTEs (
WITH ... AS (...)) for readability and staged logic. - Column naming is snake_case.
- Deduplication with
QUALIFYand window functions. - Use of
COALESCE,IF,SAFE_CAST, and helper‑generated SQL snippets. - Unnesting / parameter extraction driven by helper functions.
- Avoids hard‑coding; relies on config variables.
- Documentation for columns attached via
columns: require("includes/core/documentation/helpers.js").ga4*.
Dependencies & Ordering
dependencies:specified where needed.- Mix of string names and
{name: "table_name"}objects.
Assertions Style
- Live under
definitions/core/assertions/. - Generally
type: "table"(materialized) with cleartagsanddescription. - Use
DECLAREvariables and set defaults. - Log results into a central table via
assertion_logs.sqlx.
2. Dataform Code & Repository Conventions
Project Configuration & Variables
- Centralized in
workflow_settings.yaml. - All dataset names and labels as UPPER_SNAKE_CASE vars.
- Models read schema and tags from
dataform.projectConfig.vars.
Includes & Modularization
- Core, custom, documentation, and tests under
includes/. - Config merging pattern:
helpers.getConfig()merges{ ...coreConfig, ...customConfig }.
Tags
- Always present and consistent:
- First element:
dataform.projectConfig.vars.GA4_DATASET. - Then domain/layer/feature tags.
- First element:
JS Blocks inside .sqlx
js {
const { helpers } = require("includes/core/helpers");
const config = helpers.getConfig();
}
Pre-operation Blocks
- Used to declare variables, set labels, compute
date_checkpoint, and run DELETEs for incrementals. - Patterned use of
when(incremental(), 'SQL for inc', 'SQL for full').
Helpers & Generated SQL
- All reusable SQL is produced via helper functions (camelCase).
- Feature toggles read from config.
- Documentation columns augmented dynamically.
Premium Module Pattern
- Premium files listed in
installer_files_premium.yaml. - Models set
disabled:based on a feature flag.
BigQuery Specifics
bigqueryconfig always in modelconfig {}.- Partitioning/clustering consistent and aligned to access patterns.
- Labels enforced via helpers.
Dependencies & Assertions
- Assertions depend on
assertion_logs. - Use explicit dependencies in arrays.
Coding Style
- JS/TS: camelCase for functions; UPPER_SNAKE_CASE for constants.
- Variables in YAML: UPPER_SNAKE_CASE keys.
- SQL: snake_case columns; staged CTEs; explicit SELECT lists.
Documentation Integration
- Column docs centralized in JSON and attached via
config.columns. - Custom parameters can augment docs at build time.
Packages & Versions
package.jsonpins@dataform/core.- Version labels stored in
workflow_settings.yaml.
TL;DR – How to Imitate This Repo Perfectly
- Start each model with a complete
config {}. - Include a
js {}block to load helpers and merged config. - For incrementals, use a
pre_operationsblock to managedate_checkpointand DELETE overlaps. - Write snake_case SQL with staged CTEs, dedupe + QUALIFY, and helper‑generated snippets.
- Keep assertions as materialized tables with variables, dependencies, and logging.
- Centralize all “knobs” in
includes/custom/config.js; never hard‑code schema/project/labels.