Export to Google Cloud Storage

💎Premium Feature

This feature is available exclusively for Premium users.

GA4Dataform can export your output tables to a Google Cloud Storage (GCS) bucket as partitioned Parquet files. This makes the modelled GA4 data available outside BigQuery - load it into Snowflake, Databricks, Microsoft Fabric, BigLake, DuckDB, or any engine that reads Parquet, without copying through a third-party tool.

The export is off by default. When enabled, it runs as a post_operations step appended to each table build: right after a table finishes, its rows are written to GCS.

How It Works

Partitioned tables (most output tables, partitioned by date) are written as hive-partitioned folders - one file set per date partition, under a partition_column=YYYY-MM-DD/ path. This layout is recognised natively by Snowflake external tables, BigLake, Spark, and DuckDB.
Non-partitioned tables (dimension/lookup tables) are written as a single Parquet dataset that is fully overwritten on each run.
Incremental tables export only the current load window (the partitions processed in that run, mirroring the date_checkpoint). Full-refresh tables re-export every partition.
The GCS path is namespaced by the table's BigQuery dataset, so multiple installs or properties exporting to the same bucket and prefix do not collide.

The export is best-effort: it is wrapped so that a failure (wrong bucket, missing permission, region mismatch) surfaces the error but does not fail the table build or block downstream tables. Failed export jobs remain visible in BigQuery job history (INFORMATION_SCHEMA.JOBS, and the GCP Cost Monitoring tables).
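As a rough illustration of the behaviour described above, the sketch below builds the kind of statement a post_operations step could emit for one partition. It is not GA4Dataform's actual implementation: the function name buildExportStatement, its parameters, and the BEGIN ... EXCEPTION wrapper are assumptions chosen to show how an EXPORT DATA call can be made best-effort with BigQuery scripting.

```javascript
// Hypothetical helper (assumption, not the package's real code): builds a
// best-effort EXPORT DATA statement for one table, optionally one partition.
function buildExportStatement({
  bucket,
  prefix,
  dataset,
  table,
  partitionColumn = null, // e.g. "session_date"; null for non-partitioned tables
  partitionDate = null,   // e.g. "2026-06-01"
  compression = "ZSTD",
}) {
  const base = `gs://${bucket}/${prefix}/${dataset}/${table}`;
  // Partitioned tables get a hive-style folder: partition_column=YYYY-MM-DD/
  const folder = partitionColumn
    ? `${base}/${partitionColumn}=${partitionDate}`
    : base;
  const where = partitionColumn
    ? ` WHERE ${partitionColumn} = DATE '${partitionDate}'`
    : "";

  return [
    "BEGIN",
    "  EXPORT DATA OPTIONS (",
    `    uri = '${folder}/data-*.parquet',`,
    "    format = 'PARQUET',",
    `    compression = '${compression}',`,
    "    overwrite = true",
    "  ) AS",
    `  SELECT * FROM \`${dataset}.${table}\`${where};`,
    "EXCEPTION WHEN ERROR THEN",
    "  -- Report the error instead of failing the surrounding table build.",
    "  SELECT @@error.message AS gcs_export_error;",
    "END;",
  ].join("\n");
}

console.log(
  buildExportStatement({
    bucket: "my-ga4-export",
    prefix: "ga4dataform",
    dataset: "superform_outputs",
    table: "ga4_sessions",
    partitionColumn: "session_date",
    partitionDate: "2026-06-01",
  })
);
```

Omitting partitionColumn yields a single statement without a WHERE clause, which corresponds to the full overwrite used for non-partitioned tables.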

Bucket must be co-located with your datasets

BigQuery EXPORT DATA requires the destination bucket to be in the same region (or multi-region) as the BigQuery datasets being exported. Create the bucket in the matching location before enabling the export.

Cost Considerations

The export is not free - it adds two ongoing costs on top of your normal pipeline:

BigQuery processing. Each EXPORT DATA statement is a query that scans the rows it writes out, billed like any other query (on-demand bytes scanned, or slot time on a reservation). Incremental tables export only the current load window, so each run re-scans roughly the partitions that run touched. Full-refresh tables and non-partitioned tables are re-scanned and rewritten in full on every run, which is the bigger cost to watch. Partitioned tables also issue one export job per date partition.
GCS storage. The Parquet files accrue standard Cloud Storage storage cost. ZSTD compression (the default) keeps them compact. Because partitioned tables overwrite per-date and non-partitioned tables overwrite in full, storage tracks the size of the exported tables rather than growing unbounded - but it is a full second copy of every exported table.
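To get a feel for how many export jobs a run adds, the sketch below enumerates the date partitions in a load window and counts one job per partition for each partitioned table. The function names, the table list shape, and the assumption that a non-partitioned table costs exactly one job are illustrative only.

```javascript
// Lists every date (YYYY-MM-DD) in an inclusive window, using UTC to avoid
// daylight-saving surprises.
function partitionDatesInWindow(startDate, endDate) {
  const dates = [];
  const cursor = new Date(`${startDate}T00:00:00Z`);
  const end = new Date(`${endDate}T00:00:00Z`);
  while (cursor <= end) {
    dates.push(cursor.toISOString().slice(0, 10));
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return dates;
}

// Hypothetical estimate: one job per date partition for partitioned tables,
// and (assumed) one job for each non-partitioned table.
function countExportJobs(tables, startDate, endDate) {
  const partitions = partitionDatesInWindow(startDate, endDate).length;
  return tables.reduce(
    (total, table) => total + (table.partitioned ? partitions : 1),
    0
  );
}

const tables = [
  { name: "ga4_sessions", partitioned: true },
  { name: "some_dimension_table", partitioned: false }, // hypothetical name
];
console.log(countExportJobs(tables, "2026-06-01", "2026-06-03"));
```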

Keep the cost in check

Start with GCS_EXPORT_LAYERS: outputs (the default) so intermediate tables are not exported - setting it to all roughly doubles the export volume.
Use GCS_EXPORT_EXCLUDE, or a per-module GCS_EXPORT.enabled, to export only the tables you actually consume downstream rather than everything.
The export jobs are labelled and show up in BigQuery job history and the GCP Cost Monitoring module, so you can track exactly what the export adds.
A GCS lifecycle rule on the bucket can expire old Parquet files if you only need a rolling window downstream.

Prerequisites

A GCS bucket in the same region as your GA4Dataform datasets.
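The service account that executes the BigQuery queries (the custom GA4Dataform service account, not the Dataform default service agent) needs write access to that bucket.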

Grant the storage role manually

The installer does not provision GCS permissions for you. Before enabling the export, you must grant the query-execution service account write access to the destination bucket yourself - assign roles/storage.objectAdmin (or roles/storage.objectCreator + roles/storage.objectViewer) on the bucket. Without it, every export job fails. Because the export is best-effort, the table builds still succeed, so the missing permission shows up only as failed EXPORT DATA jobs in BigQuery job history rather than as a workflow error.

The EXPORT DATA statements run as the custom GA4Dataform service account that the installer created in your project for BigQuery query execution (not the Dataform default service agent, which only orchestrates). See Installer Permissions for how to identify it.

Enabling the Export (Global)

Set the global GCS_EXPORT_* variables in workflow_settings.yaml:

vars:
  # --- Export to GCS (off by default) ---
  GCS_EXPORT_ENABLED: "true"          # master switch
  GCS_EXPORT_BUCKET: "my-ga4-export"  # bucket name only - no gs://, no trailing slash
  GCS_EXPORT_PREFIX: ga4dataform      # object path prefix under the bucket
  GCS_EXPORT_FORMAT: PARQUET          # PARQUET is validated; other formats are unsupported here
  GCS_EXPORT_COMPRESSION: ZSTD        # ZSTD (default) | SNAPPY | GZIP | NONE
  GCS_EXPORT_EXCLUDE: ""              # comma-separated table names to skip
  GCS_EXPORT_LAYERS: outputs          # "outputs" (default) or "all" (also export intermediate tables)

With GCS_EXPORT_ENABLED: "true" and a bucket set, every output table is exported on its next run.

Global Settings

Variable	Default	Description
`GCS_EXPORT_ENABLED`	`"false"`	Master switch. Must be `"true"` (string) to turn the export on.
`GCS_EXPORT_BUCKET`	`""`	Destination bucket name only (no `gs://`, no trailing slash). Empty = no-op.
`GCS_EXPORT_PREFIX`	`ga4dataform`	Object path prefix under the bucket.
`GCS_EXPORT_FORMAT`	`PARQUET`	Export format. Only `PARQUET` is validated.
`GCS_EXPORT_COMPRESSION`	`ZSTD`	`ZSTD`, `SNAPPY`, `GZIP`, or `NONE`.
`GCS_EXPORT_EXCLUDE`	`""`	Comma-separated list of table names to skip.
`GCS_EXPORT_LAYERS`	`outputs`	`outputs` exports output tables only; `all` also exports intermediate tables.
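The table above can be read as a small normalisation step. The sketch below shows one way the string variables might be turned into a settings object, including the strict "true" string check and the empty-bucket no-op. The function names readGlobalExportSettings and isExportActive are hypothetical, and the package's real parsing may differ.

```javascript
// Defaults copied from the Global Settings table.
const DEFAULTS = {
  GCS_EXPORT_ENABLED: "false",
  GCS_EXPORT_BUCKET: "",
  GCS_EXPORT_PREFIX: "ga4dataform",
  GCS_EXPORT_FORMAT: "PARQUET",
  GCS_EXPORT_COMPRESSION: "ZSTD",
  GCS_EXPORT_EXCLUDE: "",
  GCS_EXPORT_LAYERS: "outputs",
};

// Hypothetical helper: turns the raw vars into a typed settings object.
function readGlobalExportSettings(vars = {}) {
  const v = { ...DEFAULTS, ...vars };
  return {
    // Only the exact string "true" switches the export on.
    enabled: v.GCS_EXPORT_ENABLED === "true",
    bucket: v.GCS_EXPORT_BUCKET,
    prefix: v.GCS_EXPORT_PREFIX,
    format: v.GCS_EXPORT_FORMAT,
    compression: v.GCS_EXPORT_COMPRESSION,
    layers: v.GCS_EXPORT_LAYERS,
    // Comma-separated names become a clean array; blanks are dropped.
    exclude: v.GCS_EXPORT_EXCLUDE.split(",")
      .map((name) => name.trim())
      .filter(Boolean),
  };
}

// The export is a no-op unless it is enabled AND a bucket is set.
function isExportActive(settings) {
  return settings.enabled && settings.bucket !== "";
}

const settings = readGlobalExportSettings({
  GCS_EXPORT_ENABLED: "true",
  GCS_EXPORT_BUCKET: "my-ga4-export",
});
console.log(isExportActive(settings), settings.prefix);
```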

Per-Module Override

Each module exposes a GCS_EXPORT object in its config (includes/custom/modules/<module>/config.*). It inherits the global settings; set any field to override the global value for that module only. Empty values inherit the global default.

// includes/custom/modules/ga4/config.js
GCS_EXPORT: {
  enabled: false,   // OR'd with the global toggle - true force-enables this module
  bucket: "",       // empty inherits the global bucket
  prefix: "",       // empty inherits the global prefix
  format: "",       // empty inherits the global format (PARQUET)
  compression: "",  // empty inherits the global compression (ZSTD)
  layers: "",       // empty inherits the global layers (outputs)
  exclude: [],      // tables to skip, added to the global exclude list
}

Two fields are additive rather than overriding:

enabled is OR'd with the global toggle. Leaving the global switch off and setting enabled: true on a single module exports only that module.
exclude is merged with the global exclude list (global skips plus per-module skips).

Every other field overrides the global only when set to a truthy value.
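The merge rules above can be sketched as a single function. This is an illustration under stated assumptions, not the package's code: resolveModuleExport is a hypothetical name, and globalSettings is assumed to be an already-normalised object with the same field names as the per-module GCS_EXPORT block.

```javascript
// Hypothetical resolver: combines global settings with a module's GCS_EXPORT block.
function resolveModuleExport(globalSettings, moduleOverride = {}) {
  // Overriding fields: a truthy module value wins, otherwise inherit the global.
  const pick = (key) => moduleOverride[key] || globalSettings[key];

  return {
    // Additive: enabled is OR'd with the global toggle.
    enabled: Boolean(globalSettings.enabled || moduleOverride.enabled),
    bucket: pick("bucket"),
    prefix: pick("prefix"),
    format: pick("format"),
    compression: pick("compression"),
    layers: pick("layers"),
    // Additive: global skips plus per-module skips, without duplicates.
    exclude: [
      ...new Set([
        ...(globalSettings.exclude || []),
        ...(moduleOverride.exclude || []),
      ]),
    ],
  };
}

const globalSettings = {
  enabled: false,
  bucket: "my-ga4-export",
  prefix: "ga4dataform",
  format: "PARQUET",
  compression: "ZSTD",
  layers: "outputs",
  exclude: ["table_a"], // hypothetical table name
};

// Global switch off, one module force-enabled with its own compression.
console.log(
  resolveModuleExport(globalSettings, {
    enabled: true,
    compression: "SNAPPY",
    exclude: ["table_b"], // hypothetical table name
  })
);
```

Because empty strings are falsy, a module block left at its defaults resolves to exactly the global settings.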

Output Layout

Given GCS_EXPORT_BUCKET: my-ga4-export and GCS_EXPORT_PREFIX: ga4dataform, a partitioned table ga4_sessions in the outputs dataset superform_outputs is written as:

gs://my-ga4-export/ga4dataform/superform_outputs/ga4_sessions/
  session_date=2026-06-01/data-000000000000.parquet
  session_date=2026-06-02/data-000000000000.parquet
  ...

A non-partitioned table is written as a single overwritten dataset:

gs://my-ga4-export/ga4dataform/superform_outputs/<table_name>/
  data-000000000000.parquet
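For downstream tooling that lists objects in the bucket, the layout above can be parsed back into its parts. The sketch below is one way to do that. parseExportObjectPath is a hypothetical helper, and it assumes object names follow the prefix/dataset/table/[partition=value/]file pattern shown here.

```javascript
// Hypothetical helper: splits an exported object name into dataset, table,
// hive partition values, and file name. Returns null if it does not match.
function parseExportObjectPath(objectName, prefix) {
  if (!objectName.startsWith(`${prefix}/`)) return null;

  const parts = objectName.slice(prefix.length + 1).split("/");
  if (parts.length < 3) return null; // need at least dataset/table/file

  const [dataset, table, ...tail] = parts;
  const file = tail.pop();

  // Remaining folders are hive partitions of the form column=value.
  const partition = {};
  for (const segment of tail) {
    const eq = segment.indexOf("=");
    if (eq > 0) partition[segment.slice(0, eq)] = segment.slice(eq + 1);
  }
  return { dataset, table, partition, file };
}

console.log(
  parseExportObjectPath(
    "ga4dataform/superform_outputs/ga4_sessions/session_date=2026-06-01/data-000000000000.parquet",
    "ga4dataform"
  )
);
```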

Loading the Data Elsewhere

Because tables are written as standard hive-partitioned Parquet, downstream engines can read them directly. For example, in Snowflake you can create an external table over gs://my-ga4-export/ga4dataform/superform_outputs/ga4_sessions/ with session_date recognised as a partition column from the folder names.
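As a second illustration, the sketch below builds a DuckDB query over the same folder. DuckDB's read_parquet function accepts a hive_partitioning option that exposes folder names such as session_date as columns. Reading gs:// paths also requires DuckDB's httpfs extension and GCS credentials, which are not shown. The helper name buildDuckDbQuery and its parameters are hypothetical.

```javascript
// Hypothetical helper: returns a DuckDB SQL string that reads one exported,
// hive-partitioned table and optionally filters to a single partition.
function buildDuckDbQuery({
  bucket,
  prefix,
  dataset,
  table,
  partitionColumn = null,
  partitionDate = null,
}) {
  // One folder level per partition, then the Parquet files inside it.
  const glob = `gs://${bucket}/${prefix}/${dataset}/${table}/*/*.parquet`;
  const filter =
    partitionColumn && partitionDate
      ? `\nWHERE ${partitionColumn} = DATE '${partitionDate}'`
      : "";
  return (
    `SELECT *\nFROM read_parquet('${glob}', hive_partitioning = true)` + filter
  );
}

console.log(
  buildDuckDbQuery({
    bucket: "my-ga4-export",
    prefix: "ga4dataform",
    dataset: "superform_outputs",
    table: "ga4_sessions",
    partitionColumn: "session_date",
    partitionDate: "2026-06-01",
  })
);
```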

Disabling the Export

Set GCS_EXPORT_ENABLED: "false" (or clear GCS_EXPORT_BUCKET) to turn the export off globally. Already-exported files in GCS are left untouched - remove them from the bucket directly if no longer needed.
