Labeling
Labels in Google Cloud Platform are key-value pairs that you assign to resources to help manage costs at scale. Labels must be attached to the resources you use, and can then be used for filtering. As described in Google's documentation: "Information about labels is forwarded to the billing system that lets you break down your billed charges by label. With built-in billing reports, you can filter and group costs by resource labels. You can also use labels to query billing data exports."
We've added five labels: operation, brand, license type, package version, and tool.
They are defined as compilation variables in the workflow_settings.yaml file:
LABEL_EXECUTION_OPERATION: daily_workflow
LABEL_GENERIC_BRAND: superformlabs
LABEL_GENERIC_LICENSE_TYPE: premium
LABEL_GENERIC_PACKAGE_VERSION: 2-0-0
LABEL_GENERIC_TOOL: ga4dataform
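BigQuery only accepts label keys and values made of lowercase letters, digits, underscores and dashes, up to 63 characters each, and keys must start with a lowercase letter. Dots are not allowed, which is presumably why the package version is written as 2-0-0 rather than 2.0.0. The sketch below checks a labels object against these rules before it is used; the function names are hypothetical, and unlike BigQuery (which also allows international characters) it only accepts ASCII.

```javascript
// Minimal sketch: validate labels against BigQuery's label requirements.
// Assumption: ASCII only; BigQuery also accepts international characters,
// which this sketch deliberately rejects.
const LABEL_KEY_PATTERN = /^[a-z][a-z0-9_-]{0,62}$/; // 1-63 chars, starts with a letter
const LABEL_VALUE_PATTERN = /^[a-z0-9_-]{0,63}$/;    // 0-63 chars, may be empty

function isValidLabelKey(key) {
  return LABEL_KEY_PATTERN.test(key);
}

function isValidLabelValue(value) {
  return LABEL_VALUE_PATTERN.test(value);
}

// Returns a list of problems found in a { key: value } labels object
// (an empty list means every label is acceptable).
function validateLabels(labels) {
  const problems = [];
  for (const [key, value] of Object.entries(labels)) {
    if (!isValidLabelKey(key)) problems.push(`invalid key: ${key}`);
    if (!isValidLabelValue(value)) problems.push(`invalid value for ${key}: ${value}`);
  }
  return problems;
}

// Values taken from workflow_settings.yaml above.
console.log(validateLabels({
  operation: "daily_workflow",
  brand: "superformlabs",
  license_type: "premium",
  package_version: "2-0-0",
  tool: "ga4dataform",
})); // []

// A dotted version number is rejected: one problem is reported for package_version.
console.log(validateLabels({ package_version: "2.0.0" }));
```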
- Labels are passed to tables through Dataform's built-in labels option, set in the bigquery section of the config block at the beginning of each model:
config {
  type: "incremental",
  description: "Intermediate incremental sessions table with modeling helpers implemented. 1 row per unique session_id. Contains only valid sessions.",
  schema: dataform.projectConfig.vars.TRANSFORMATIONS_DATASET,
  tags: [dataform.projectConfig.vars.GA4_DATASET, "sessions", "intermediate"],
  dependencies: ["source_categories"],
  bigquery: {
    partitionBy: "session_date",
    clusterBy: ["session_id"],
    labels: {
      brand: "superformlabs",
      license_type: "premium",
      package_version: "2-0-0",
      tool: "ga4dataform"
    }
  },
}
- Labels are passed to job executions by setting @@query_label in the pre_operations block:
pre_operations {
  declare date_checkpoint DATE;
  set @@query_label = "operation:daily_workflow, brand:superformlabs, license_type:premium, package_version:2-0-0, tool:ga4dataform";
  set date_checkpoint = (
    ${when(incremental(),
      `select
         coalesce(max(session_date)+1, date('${config.GA4_START_DATE}'))
       from ${self()}
       where is_final = true`,
      `select date('${config.GA4_START_DATE}')`)} /* the default, when it's not incremental */
  );
  -- delete some older data, since this may be updated later by GA4
  ${
    when(incremental(),
      `delete from ${self()} where session_date >= date_checkpoint`
    )
  }
}
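Both snippets above hard-code values that workflow_settings.yaml already defines. As a sketch of how they could be kept in one place, the hypothetical include below (the file name includes/labels.js and the functions labelsFromVars, tableLabels and queryLabel are assumptions, not part of the package) derives both the table labels object and the @@query_label string from the LABEL_* variables. It assumes each label key is the variable name with its LABEL_EXECUTION_ or LABEL_GENERIC_ prefix removed, in lowercase, and it joins pairs with ", " to reproduce the string used in pre_operations.

```javascript
// Minimal sketch of a hypothetical Dataform include (e.g. includes/labels.js).
// Assumption: variables are named LABEL_EXECUTION_<KEY> or LABEL_GENERIC_<KEY>,
// and the label key is <KEY> lowercased (LABEL_GENERIC_LICENSE_TYPE -> license_type).
const EXECUTION_PREFIX = "LABEL_EXECUTION_";
const GENERIC_PREFIX = "LABEL_GENERIC_";

// Collects labels from variables starting with one of the given prefixes,
// keeping the order in which the variables are declared.
function labelsFromVars(vars, prefixes) {
  const labels = {};
  for (const [name, value] of Object.entries(vars)) {
    const prefix = prefixes.find((p) => name.startsWith(p));
    if (prefix) labels[name.slice(prefix.length).toLowerCase()] = String(value);
  }
  return labels;
}

// Table labels: only the generic ones, as in the config block above.
function tableLabels(vars) {
  return labelsFromVars(vars, [GENERIC_PREFIX]);
}

// Job label string for set @@query_label: execution labels plus generic labels,
// written as key:value pairs separated by ", ".
function queryLabel(vars) {
  const labels = labelsFromVars(vars, [EXECUTION_PREFIX, GENERIC_PREFIX]);
  return Object.entries(labels)
    .map(([key, value]) => `${key}:${value}`)
    .join(", ");
}

// Same values as in workflow_settings.yaml.
const exampleVars = {
  LABEL_EXECUTION_OPERATION: "daily_workflow",
  LABEL_GENERIC_BRAND: "superformlabs",
  LABEL_GENERIC_LICENSE_TYPE: "premium",
  LABEL_GENERIC_PACKAGE_VERSION: "2-0-0",
  LABEL_GENERIC_TOOL: "ga4dataform",
};

console.log(tableLabels(exampleVars));
console.log(queryLabel(exampleVars));
// operation:daily_workflow, brand:superformlabs, license_type:premium, package_version:2-0-0, tool:ga4dataform
```

If the functions were exported from the include with module.exports, a model could, under these assumptions, use labels: labels.tableLabels(dataform.projectConfig.vars) in its bigquery block and set @@query_label = "${labels.queryLabel(dataform.projectConfig.vars)}"; in pre_operations, so a version bump only needs to touch workflow_settings.yaml.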
Add labels to existing tables
To add labels to existing tables, use the ALTER TABLE ... SET OPTIONS DDL statement:
ALTER TABLE `superform_outputs_31337.ga4_events`
SET OPTIONS (
  labels = [('brand', 'superformlabs'), ('tool', 'ga4dataform'), ('license_type', 'core'), ('package_version', 'v1-17-0')]);
To make this operation easier, we've added a dedicated file, definitions/core/utility/update_labels.sqlx. This file is disabled by default, so you need to enable it before executing it. You can run it from your Dataform workspace, or copy the compiled query and run it directly in BigQuery.
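For one-off updates outside Dataform, a statement of the same shape as the ALTER TABLE example above can be generated from a table name and a labels object. The sketch below is hypothetical (alterTableLabelsSql is not part of the package, and the contents of update_labels.sqlx may differ); it rejects anything outside BigQuery's ASCII label character set so that no value can break out of the quoted SQL literals.

```javascript
// Minimal sketch: build an ALTER TABLE ... SET OPTIONS (labels = [...]) statement.
// Assumption: ASCII-only labels; the function name is hypothetical.
const KEY_PATTERN = /^[a-z][a-z0-9_-]{0,62}$/;  // BigQuery key rules (ASCII subset)
const VALUE_PATTERN = /^[a-z0-9_-]{0,63}$/;     // BigQuery value rules (ASCII subset)

function alterTableLabelsSql(table, labels) {
  const pairs = Object.entries(labels).map(([key, value]) => {
    // Refuse anything that is not a valid label, which also rules out quotes.
    if (!KEY_PATTERN.test(key) || !VALUE_PATTERN.test(value)) {
      throw new Error(`invalid label ${key}:${value}`);
    }
    return `('${key}', '${value}')`;
  });
  return `ALTER TABLE \`${table}\`\nSET OPTIONS (\n  labels = [${pairs.join(", ")}]);`;
}

console.log(alterTableLabelsSql("superform_outputs_31337.ga4_events", {
  brand: "superformlabs",
  tool: "ga4dataform",
  license_type: "core",
  package_version: "v1-17-0",
}));
```

Generating the statement from the same labels object used elsewhere keeps label keys consistent across tables, which matters because billing reports group costs by exact key.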