Dataform Basics
This guide introduces the essential elements of the Dataform UI, tailored specifically for GA4Dataform users. If you are new to Dataform, we recommend starting with the overview section. If you’re already familiar with the GA4Dataform interface, feel free to skip ahead to the sections that interest you.
If you’ve installed GA4Dataform, authorized data processing, and scheduled your workflow using the optional next steps, you can benefit from GA4Dataform without ever opening the Dataform interface (theoretically!). However, if you’re looking to customize GA4Dataform or are simply curious to learn more, this guide is here to help you.
Overview
Dataform is integrated into Google Cloud Platform (GCP), which provides a user-friendly interface to manage your Dataform projects, including your GA4Dataform project. Here’s how you can access Dataform from the GCP console:
- Use the search bar: Type "Dataform" into the search bar at the top of the console.
- Navigate via the menu: Open the top-left burger menu, go to BigQuery under "Products," and click on Dataform.
When you open Dataform, you’ll see a list of repositories, which are essentially your projects. If you’ve just installed GA4Dataform, you should find a repository named something like superform_analytics_*.
Click on the repository you want to work on. Once inside, you’ll find 4 main sections:
- Development Workspaces: Write your models and scripts, define data sources, set up lineage, configure table materialization, and more.
- Workflow Execution Logs: Track and review the execution history of your project.
- Release and Scheduling: Schedule when your BigQuery tables are refreshed and manage project compilation settings.
- Settings: Configure project variables and connect your repository to GitHub.
This guide will take you through these sections to perform specific tasks and make the most of GA4Dataform.
Check workflow execution
In Dataform, a workflow defines the set of SQL queries that make up a data model. These queries are executed in a specific order to build tables in BigQuery. The complete execution of all these queries is referred to as a "workflow."
If you're not seeing the expected tables in BigQuery, or if tables appear incomplete, have missing dates, or other issues, the first thing to check is whether your workflows have been executed correctly and without errors.
The Workflow Execution Logs section lists the workflows executed over time. Successful executions are marked in green and failed ones in red. The Source column shows, for each row, whether a single model, an entire workspace, or a full workflow was executed. You can drill down into each execution to review the detailed execution of every model in the workflow you have selected.
Lastly, by clicking on a specific query, you can access detailed information about its processing and, if it failed, the reason for the failure. This is particularly useful for troubleshooting when a model hasn’t run as expected.
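To make the idea of queries being "executed in a specific order" more concrete, here is a minimal Python sketch of dependency ordering. It is only an illustration: the model names and the dependency map are hypothetical, and Dataform derives the real order from the references declared in your models, not from code like this.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each key maps to the set of models it depends on.
dependencies = {
    "ga4_events": set(),
    "ga4_sessions": {"ga4_events"},
    "ga4_transactions": {"ga4_events"},
    "daily_report": {"ga4_sessions", "ga4_transactions"},
}


def execution_order(deps):
    """Return one valid order in which every model runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In this sketch, a failure in an upstream model such as the hypothetical ga4_events would prevent everything after it from being built correctly, which is why the execution logs are the first place to look when tables are missing.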
Check workflow execution after running GA4 installer
This section is specifically for users who have authorized data processing for their historical GA4 data via the optional steps after using the GA4 installer. If you’re new to Dataform, we recommend reviewing the previous Overview and Check workflow execution sections first.
If you’re allowing GA4Dataform to process a relatively large GA4 dataset, running the Superform workflow may take several minutes, dozens of minutes, or even hours. If you can't see the table outputs in BigQuery, follow these steps to check your workflow execution:
- Access Dataform.
- Open the superform_analytics_* repository.
- Navigate to the Workflow Execution Logs section.
If the workflow status is not green yet, you can click the Refresh button. Feel free to click it multiple times until the workflow completes successfully and turns green.
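Clicking Refresh until the status changes is a form of polling. The sketch below shows the same idea in Python, under clearly stated assumptions: get_state is a hypothetical callable that you would supply yourself, and the state names are assumptions made for this example, not values read from your Dataform project.

```python
import time

# Assumed state names for this sketch; adapt them to whatever your status source returns.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_workflow(get_state, interval_seconds=30.0, max_checks=120, sleep=time.sleep):
    """Call get_state() repeatedly until it returns a terminal state.

    get_state is a hypothetical callable standing in for however you read
    the workflow status. The last state seen is returned, which may still
    be non-terminal if max_checks is reached first.
    """
    state = get_state()
    checks = 1
    while state not in TERMINAL_STATES and checks < max_checks:
        sleep(interval_seconds)
        state = get_state()
        checks += 1
    return state


# Usage with a fake status source that succeeds on the third check.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_workflow(lambda: next(fake_states), interval_seconds=0)
print(final_state)
```

The max_checks limit is a design choice for the sketch: a long historical backfill can take hours, so an unbounded loop is best avoided.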
Delete Dataform repository
Before you can delete a repository, you must delete its Workspaces, Release configurations, and Workflow configurations:
Delete Release configurations:
Delete Workflow configurations:
Delete Workspaces:
Delete Repository:
Finally, return to the Dataform main page, where you can delete the repository using the three dots under Action on the right.
Rebuild tables
One of the core features of GA4Dataform is to build tables incrementally. This means that every day, new rows of GA4 data are appended to existing tables instead of rebuilding everything from scratch. However, there are situations where you may need to reprocess some tables due to bug fixes, metric definition changes, or updates in attribution logic that should be retroactively applied to your dataset. Rebuilding tables is typically a one-time action to address a specific need.
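The difference between an incremental run and a full rebuild can be sketched in a few lines of Python. This is a deliberately simplified illustration: the row structure, the run_model function, and the "append only newer dates" rule are assumptions made for the example, and the actual incremental logic lives in the GA4Dataform SQL models and may differ.

```python
def run_model(source_rows, existing_table, full_refresh=False):
    """Simulate one run of an incremental model.

    Rows are dicts with an 'event_date' key holding ISO date strings,
    so plain string comparison matches chronological order.
    """
    if full_refresh or not existing_table:
        # Full refresh: discard what is stored and reprocess every source row.
        return sorted(source_rows, key=lambda row: row["event_date"])
    # Incremental run: keep stored rows and append only newer dates.
    latest = max(row["event_date"] for row in existing_table)
    new_rows = [row for row in source_rows if row["event_date"] > latest]
    return existing_table + sorted(new_rows, key=lambda row: row["event_date"])


# First run: the table is built from two days of data.
source = [
    {"event_date": "2024-01-01", "sessions": 10},
    {"event_date": "2024-01-02", "sessions": 7},
]
table = run_model(source, [])

# Later: a new day arrives and a logic fix changes the value for 2024-01-01.
source = [
    {"event_date": "2024-01-01", "sessions": 12},
    {"event_date": "2024-01-02", "sessions": 7},
    {"event_date": "2024-01-03", "sessions": 5},
]
incremental = run_model(source, table)
rebuilt = run_model(source, table, full_refresh=True)
print([row["sessions"] for row in incremental])
print([row["sessions"] for row in rebuilt])
```

In this sketch the incremental run keeps the old value for 2024-01-01, while the full refresh picks up the corrected one. That is the situation in which a rebuild is needed.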
Outside of advanced use cases, you may occasionally need to rebuild specific tables after a GA4Dataform update. If this is necessary, we will mention it in our update communications. If you're unsure, feel free to reach out to our support team at support@ga4dataform.com.
There are three ways to run a full rebuild (or full refresh) from the Dataform UI:
- Within a workspace, using the Execute side panel.
- From the Release configurations section, using the Execute manual workflow side panel.
- By creating a new workflow with the Create workflow configuration side panel.
Overview
In the Dataform UI, a full rebuild always follows the same two steps:
Model(s) selection: Select the model(s) you want to rebuild. Choose individual models with Selection of Actions, models carrying specific tag(s) with Selection of tags, or every model with All actions.
Full refresh execution: Ensure that Run with full refresh is selected, regardless of where you are managing the rebuild.
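The three selection modes described above can be pictured as simple filters over the list of models. The Python sketch below is only an analogy: the Model class, the select_models function, and the model names and tags are all hypothetical and are not part of Dataform or GA4Dataform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a Dataform model and its tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def select_models(models, actions=None, tags=None):
    """Mimic the three selection modes: by name, by tag, or everything."""
    if actions is not None and tags is not None:
        raise ValueError("choose either actions or tags, not both")
    if actions is not None:
        # "Selection of Actions": keep only the explicitly named models.
        wanted = set(actions)
        return [m.name for m in models if m.name in wanted]
    if tags is not None:
        # "Selection of tags": keep models carrying at least one requested tag.
        wanted = set(tags)
        return [m.name for m in models if m.tags & wanted]
    # "All actions": no filter at all.
    return [m.name for m in models]


models = [
    Model("ga4_events", frozenset({"core"})),
    Model("ga4_sessions", frozenset({"core", "sessions"})),
    Model("custom_report", frozenset({"custom"})),
]

print(select_models(models, actions=["ga4_sessions"]))
print(select_models(models, tags=["core"]))
print(select_models(models))
```

Whichever mode you use, the selection only decides which models run. Whether they are rebuilt from scratch still depends on Run with full refresh.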
Detailed Review of Side Panels
Rebuild tables from development workspaces
You can open the Execute side panel from a development Workspace by clicking the Start execution drop-down menu.
Any selection from this menu will open the same Execute side panel.
Once the side panel is open, proceed as described in the overview:
- Select the model(s) you want to rebuild using one of the three options: Selection of Actions, Selection of tags, or All actions.
- Ensure that Run with full refresh is checked.
Run the workflow by clicking Start execution. The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.
Rebuild tables from Release configurations section
From the Release and Scheduling section, click Start execution:
This opens the Execute manual workflow side panel:
Follow these steps:
- Click the Release configuration drop-down menu and choose production.
- Select the model(s) you want to rebuild using one of the three options: Selection of Actions, Selection of tags, or All actions.
- Ensure that Run with full refresh is checked.
Run the workflow by clicking Start execution. The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.
Rebuild tables from Workflow configurations
In the Release and Scheduling section, under the Workflow configurations sub-category, click Create:
This opens the Edit workflow configuration panel:
Be careful not to edit an existing workflow configuration. The GA4Dataform installer automatically creates a default workflow configuration called 'daily.' Do not modify this workflow unless you are certain of what you're doing, as changes may disrupt the daily data refresh.
The configuration process is similar to the Execute and Execute manual workflow side panels but includes additional options.
Similarities:
- A Release configuration drop-down menu, where you must select production, as in the Execute manual workflow side panel.
- The same way of selecting the model(s) you want to rebuild: Selection of Actions, Selection of tags, or All actions.
- Run with full refresh must be checked if you want to execute a rebuild.
Differences:
Schedule frequency:
You can choose to repeat the workflow automatically on a specific schedule or run it "On-demand." For most rebuild scenarios, as explained in the introduction of this chapter, select "On-demand."
Configuration ID:
The configuration ID serves as the name of the workflow. For example, if you're creating a workflow to rebuild session tables on demand, you could name it rebuild-sessions-on-demand.
Running an On-Demand Workflow
After saving your workflow configuration, since it is not scheduled, you will need to run it manually. Click the three dots on the right of the workflow configuration and select Run now.
The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.
Which one should I choose?
Now that you understand how to manage table rebuilds in Dataform, here is a summary of when to use each method:
- Execute Side Panel (Workspace Development): Best for advanced users modifying models. If you are not working in custom folders, you likely won’t need to use this option frequently.
- Execute Manual Workflow (Release Configurations): Ideal for one-time operations. However, configurations are not saved, meaning you must repeat the steps each time you need to rebuild tables.
- Create Workflow Configuration: Similar to Execute manual workflow but allows you to save configurations. This is useful for users with custom queries or advanced GA4Dataform usage who need to perform rebuilds frequently.