Export to Google Cloud Storage
This feature is available exclusively for Premium users.
GA4Dataform can export your output tables to a Google Cloud Storage (GCS) bucket as partitioned Parquet files. This makes the modelled GA4 data available outside BigQuery - load it into Snowflake, Databricks, Microsoft Fabric, BigLake, DuckDB, or any engine that reads Parquet, without copying through a third-party tool.
The export is off by default. When enabled, it runs as a post_operations step appended to each table build: right after a table finishes, its rows are written to GCS.
How It Works
- Partitioned tables (most output tables, partitioned by date) are written as hive-partitioned folders - one file set per date partition, under a partition_column=YYYY-MM-DD/ path. This layout is recognised natively by Snowflake external tables, BigLake, Spark, and DuckDB.
- Non-partitioned tables (dimension/lookup tables) are written as a single Parquet dataset that is fully overwritten on each run.
- Incremental tables export only the current load window (the partitions processed in that run, mirroring the date_checkpoint). Full-refresh tables re-export every partition.
- The GCS path is namespaced by the table's BigQuery dataset, so multiple installs or properties exporting to the same bucket and prefix do not collide.
The export is best-effort: it is wrapped so that a failure (wrong bucket, missing permission, region mismatch) surfaces the error but does not fail the table build or block downstream tables. Failed export jobs remain visible in BigQuery job history (INFORMATION_SCHEMA.JOBS, and the GCP Cost Monitoring tables).
BigQuery EXPORT DATA requires the destination bucket to be in the same region (or multi-region) as the BigQuery datasets being exported. Create the bucket in the matching location before enabling the export.
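As a rough illustration of the behaviour described above, the following sketch generates a best-effort export script for one date partition. The function name buildExportSql and its parameters are hypothetical, and the BEGIN ... EXCEPTION wrapper is only an assumption about one way to keep a failed export from failing the table build - it is not taken from the GA4Dataform source. EXPORT DATA and its uri, format, compression, and overwrite options are standard BigQuery syntax.

```javascript
// Hypothetical helper (not part of GA4Dataform): builds a BigQuery script that
// exports one date partition and catches any export error instead of re-raising it.
function buildExportSql({ table, uriPrefix, partitionColumn, partitionDate, compression = "ZSTD" }) {
  // BigQuery requires a single "*" wildcard in the destination URI.
  const uri = `${uriPrefix}/${partitionColumn}=${partitionDate}/data-*.parquet`;
  return [
    "BEGIN",
    "  EXPORT DATA OPTIONS (",
    `    uri = '${uri}',`,
    "    format = 'PARQUET',",
    `    compression = '${compression}',`,
    "    overwrite = true",
    "  ) AS",
    `  SELECT * FROM \`${table}\``,
    `  WHERE ${partitionColumn} = DATE '${partitionDate}';`,
    "EXCEPTION WHEN ERROR THEN",
    "  -- Assumption: report the failure without failing the surrounding build.",
    "  SELECT @@error.message AS gcs_export_error;",
    "END;",
  ].join("\n");
}

// Example values reuse the bucket, prefix, and table names from this page;
// the project ID is a placeholder.
const sql = buildExportSql({
  table: "my-project.superform_outputs.ga4_sessions",
  uriPrefix: "gs://my-ga4-export/ga4dataform/superform_outputs/ga4_sessions",
  partitionColumn: "session_date",
  partitionDate: "2026-06-01",
});
console.log(sql);
```

Setting overwrite = true on a per-partition URI is what would make re-running a load window replace that date's files rather than append to them.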
Cost Considerations
The export is not free - it adds two ongoing costs on top of your normal pipeline:
- BigQuery processing. Each EXPORT DATA statement is a query that scans the rows it writes out, billed like any other query (on-demand bytes scanned, or slot time on a reservation). Incremental tables export only the current load window, so each run re-scans roughly the partitions that run touched. Full-refresh (non-partitioned) tables re-scan and rewrite the entire table on every run, which is the bigger cost to watch. Partitioned tables also issue one export job per date partition.
- GCS storage. The Parquet files accrue standard Cloud Storage storage cost. ZSTD compression (the default) keeps them compact. Because partitioned tables overwrite per-date and non-partitioned tables overwrite in full, storage tracks the size of the exported tables rather than growing unbounded - but it is a full second copy of every exported table.
To keep these costs down:
- Start with GCS_EXPORT_LAYERS: outputs (the default) so intermediate tables are not exported - all roughly doubles the export volume.
- Use GCS_EXPORT_EXCLUDE, or a per-module GCS_EXPORT.enabled, to export only the tables you actually consume downstream rather than everything.
- The export jobs are labelled and show up in BigQuery job history and the GCP Cost Monitoring module, so you can track exactly what the export adds.
- A GCS lifecycle rule on the bucket can expire old Parquet files if you only need a rolling window downstream.
Prerequisites
- A GCS bucket in the same region as your GA4Dataform datasets.
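- The service account that executes your BigQuery queries (the custom GA4Dataform service account, not the Dataform default service agent) needs write access to that bucket.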
The installer does not provision GCS permissions for you. Before enabling the export, you must grant the query-execution service account write access to the destination bucket yourself - assign roles/storage.objectAdmin (or roles/storage.objectCreator + roles/storage.objectViewer) on the bucket. Without it, every export job fails. Because the export is best-effort, the table builds still succeed, so the missing permission shows up only as failed EXPORT DATA jobs in BigQuery job history rather than as a workflow error.
The EXPORT DATA statements run as the custom GA4Dataform service account used for BigQuery query execution that the installer created in your project (not the Dataform default service agent, which only orchestrates). See Installer Permissions for how to identify it.
Enabling the Export (Global)
Set the global GCS_EXPORT_* variables in workflow_settings.yaml:
vars:
  # --- Export to GCS (off by default) ---
  GCS_EXPORT_ENABLED: "true"          # master switch
  GCS_EXPORT_BUCKET: "my-ga4-export"  # bucket name only - no gs://, no trailing slash
  GCS_EXPORT_PREFIX: ga4dataform      # object path prefix under the bucket
  GCS_EXPORT_FORMAT: PARQUET          # PARQUET is validated; other formats are unsupported here
  GCS_EXPORT_COMPRESSION: ZSTD        # ZSTD (default) | SNAPPY | GZIP | NONE
  GCS_EXPORT_EXCLUDE: ""              # comma-separated table names to skip
  GCS_EXPORT_LAYERS: outputs          # "outputs" (default) or "all" (also export intermediate tables)
With GCS_EXPORT_ENABLED: "true" and a bucket set, every output table is exported on its next run.
Global Settings
| Variable | Default | Description |
|---|---|---|
| GCS_EXPORT_ENABLED | "false" | Master switch. Must be "true" (string) to turn the export on. |
| GCS_EXPORT_BUCKET | "" | Destination bucket name only (no gs://, no trailing slash). Empty = no-op. |
| GCS_EXPORT_PREFIX | ga4dataform | Object path prefix under the bucket. |
| GCS_EXPORT_FORMAT | PARQUET | Export format. Only PARQUET is validated. |
| GCS_EXPORT_COMPRESSION | ZSTD | ZSTD, SNAPPY, GZIP, or NONE. |
| GCS_EXPORT_EXCLUDE | "" | Comma-separated list of table names to skip. |
| GCS_EXPORT_LAYERS | outputs | outputs exports output tables only; all also exports intermediate tables. |
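The constraints in the table can be checked before a run. The sketch below is a hypothetical pre-flight validator, not something GA4Dataform ships: the function name validateGcsExportVars and its return shape are assumptions, and treating a non-PARQUET format as an error is a choice made for illustration.

```javascript
// Hypothetical validator for the GCS_EXPORT_* vars documented above.
const ALLOWED_COMPRESSION = ["ZSTD", "SNAPPY", "GZIP", "NONE"];
const ALLOWED_LAYERS = ["outputs", "all"];

function validateGcsExportVars(vars) {
  const errors = [];
  // The master switch must be the string "true", not a boolean.
  const enabled = vars.GCS_EXPORT_ENABLED === "true";
  const bucket = vars.GCS_EXPORT_BUCKET || "";
  // Switch off or empty bucket: the export is a no-op, nothing to validate.
  if (!enabled || bucket === "") return { active: false, errors };

  if (bucket.startsWith("gs://") || bucket.endsWith("/")) {
    errors.push("GCS_EXPORT_BUCKET must be a bare bucket name (no gs://, no trailing slash)");
  }
  const format = vars.GCS_EXPORT_FORMAT || "PARQUET";
  if (format !== "PARQUET") {
    errors.push(`GCS_EXPORT_FORMAT ${format} is not validated; use PARQUET`);
  }
  const compression = vars.GCS_EXPORT_COMPRESSION || "ZSTD";
  if (!ALLOWED_COMPRESSION.includes(compression)) {
    errors.push(`GCS_EXPORT_COMPRESSION ${compression} is not one of ${ALLOWED_COMPRESSION.join(", ")}`);
  }
  const layers = vars.GCS_EXPORT_LAYERS || "outputs";
  if (!ALLOWED_LAYERS.includes(layers)) {
    errors.push(`GCS_EXPORT_LAYERS ${layers} is not one of ${ALLOWED_LAYERS.join(", ")}`);
  }
  return { active: errors.length === 0, errors };
}

const ok = validateGcsExportVars({
  GCS_EXPORT_ENABLED: "true",
  GCS_EXPORT_BUCKET: "my-ga4-export",
  GCS_EXPORT_COMPRESSION: "ZSTD",
});
const bad = validateGcsExportVars({
  GCS_EXPORT_ENABLED: "true",
  GCS_EXPORT_BUCKET: "gs://my-ga4-export/",
  GCS_EXPORT_COMPRESSION: "LZ4",
});
console.log(ok.active, bad.errors.length); // true 2
```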
Per-Module Override
Each module exposes a GCS_EXPORT object in its config (includes/custom/modules/<module>/config.*). It inherits the global settings; set any field to override the global value for that module only. Empty values inherit the global default.
// includes/custom/modules/ga4/config.js
GCS_EXPORT: {
  enabled: false,   // OR'd with the global toggle - true force-enables this module
  bucket: "",       // empty inherits the global bucket
  prefix: "",       // empty inherits the global prefix
  format: "",       // empty inherits the global format (PARQUET)
  compression: "",  // empty inherits the global compression (ZSTD)
  layers: "",       // empty inherits the global layers (outputs)
  exclude: [],      // tables to skip, added to the global exclude list
}
Two fields are additive rather than overriding:
- enabled is OR'd with the global toggle. Leaving the global switch off and setting enabled: true on a single module exports only that module.
- exclude is merged with the global exclude list (global skips plus per-module skips).
Every other field overrides the global only when set to a truthy value.
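The inheritance rules above can be sketched as a small resolver. This is an illustration only: resolveGcsExport is a hypothetical name, the global settings are assumed to have already been parsed into an object (with GCS_EXPORT_EXCLUDE split into an array), and ga4_events is a made-up table name used for the example.

```javascript
// Hypothetical resolver: combines the global settings with one module's GCS_EXPORT object.
function resolveGcsExport(globalCfg, moduleCfg = {}) {
  const resolved = { ...globalCfg };
  // Overriding fields: a truthy module value wins, an empty one inherits the global.
  for (const key of ["bucket", "prefix", "format", "compression", "layers"]) {
    if (moduleCfg[key]) resolved[key] = moduleCfg[key];
  }
  // Additive fields: enabled is OR'd, exclude lists are merged without duplicates.
  resolved.enabled = Boolean(globalCfg.enabled || moduleCfg.enabled);
  resolved.exclude = [...new Set([...(globalCfg.exclude || []), ...(moduleCfg.exclude || [])])];
  return resolved;
}

const globalCfg = {
  enabled: false, // global switch left off
  bucket: "my-ga4-export",
  prefix: "ga4dataform",
  format: "PARQUET",
  compression: "ZSTD",
  layers: "outputs",
  exclude: ["ga4_events"], // illustrative table name
};

// Force-enable a single module and change only its compression.
const ga4 = resolveGcsExport(globalCfg, {
  enabled: true,
  prefix: "",
  compression: "SNAPPY",
  exclude: ["ga4_sessions"],
});
console.log(ga4.enabled, ga4.prefix, ga4.compression, ga4.exclude);
```

Here the module is exported even though the global switch is off, keeps the global prefix because its own is empty, and skips both excluded tables.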
Output Layout
Given GCS_EXPORT_BUCKET: my-ga4-export and GCS_EXPORT_PREFIX: ga4dataform, a partitioned table ga4_sessions in the outputs dataset superform_outputs is written as:
gs://my-ga4-export/ga4dataform/superform_outputs/ga4_sessions/
  session_date=2026-06-01/data-000000000000.parquet
  session_date=2026-06-02/data-000000000000.parquet
  ...
A non-partitioned table is written as a single overwritten dataset:
gs://my-ga4-export/ga4dataform/superform_outputs/<table_name>/
  data-000000000000.parquet
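If you need to predict where a given run will write, for example to point a downstream loader at the right folders, the layout above can be reproduced with a few lines of JavaScript. The helper names exportFolder and partitionFolders are hypothetical, and the sketch assumes the path is always bucket/prefix/dataset/table followed by an optional partition folder, as shown in the examples on this page.

```javascript
// Hypothetical helper: the GCS folder for a table, or for one of its date partitions.
function exportFolder({ bucket, prefix, dataset, table, partitionColumn, partitionDate }) {
  const base = `gs://${bucket}/${prefix}/${dataset}/${table}/`;
  // Non-partitioned tables are written directly under the table folder.
  if (!partitionColumn) return base;
  return `${base}${partitionColumn}=${partitionDate}/`;
}

// Hypothetical helper: one folder per date in an inclusive load window (YYYY-MM-DD strings).
function partitionFolders(target, startDate, endDate) {
  const folders = [];
  const end = new Date(`${endDate}T00:00:00Z`);
  // Step one UTC day at a time so local time zones cannot shift the date.
  for (let d = new Date(`${startDate}T00:00:00Z`); d <= end; d.setUTCDate(d.getUTCDate() + 1)) {
    folders.push(exportFolder({ ...target, partitionDate: d.toISOString().slice(0, 10) }));
  }
  return folders;
}

const target = {
  bucket: "my-ga4-export",
  prefix: "ga4dataform",
  dataset: "superform_outputs",
  table: "ga4_sessions",
  partitionColumn: "session_date",
};
const folders = partitionFolders(target, "2026-06-01", "2026-06-02");
console.log(folders.join("\n"));
```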
Loading the Data Elsewhere
Because tables are written as standard hive-partitioned Parquet, downstream engines can read them directly. For example, in Snowflake you can create an external table over gs://my-ga4-export/ga4dataform/superform_outputs/ga4_sessions/ with session_date recognised as a partition column from the folder names.
Disabling the Export
Set GCS_EXPORT_ENABLED: "false" (or clear GCS_EXPORT_BUCKET) to turn the export off globally. Already-exported files in GCS are left untouched - remove them from the bucket directly if no longer needed.