Dataform Basics

This guide introduces the essential elements of the Dataform UI, tailored specifically for GA4Dataform users. If you are new to Dataform, we recommend starting with the overview section. If you’re already familiar with the GA4Dataform interface, feel free to skip ahead to the sections that interest you.

note

If you’ve installed GA4Dataform, authorized data processing, and scheduled your workflow using the optional next steps, you can (theoretically!) benefit from GA4Dataform without ever opening the Dataform interface. However, if you’re looking to customize GA4Dataform or are simply curious to learn more, this guide is here to help you.

Overview

Dataform is integrated into Google Cloud Platform (GCP), which provides a user-friendly interface for managing your Dataform projects (including your GA4Dataform project). Here’s how you can access Dataform from the GCP console:

  • Use the search bar: Type "Dataform" into the search bar at the top of the console.
  • Navigate via the menu: Open the top-left burger menu, go to BigQuery under "Products," and click on Dataform.

screenshot

When you open Dataform, you’ll see a list of repositories—these are essentially your projects. If you’ve just installed GA4Dataform, you should find a repository named something like superform_analytics_*.

screenshot

Click on the repository you want to work on. Once inside, you’ll find 4 main sections:

screenshot

  • Development Workspaces: Write your models and scripts, define data sources, set up lineage, configure table materialization and more...

  • Workflow Execution Logs: Track and review the execution history of your project.

  • Release and Scheduling: Schedule when your BigQuery tables are refreshed and manage project compilation settings.

  • Settings: Configure project variables and connect your repository to GitHub.

This guide will take you through these sections to perform specific tasks and make the most of GA4Dataform.

Check workflow execution

In Dataform, a workflow defines the set of SQL queries that make up a data model. These queries are executed in a specific order to build tables in BigQuery. The complete execution of all these queries is referred to as a "workflow."
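To make the idea of "a specific order" concrete, here is a minimal Python sketch of how an execution order can be derived from model dependencies. The model names and the dependency map are hypothetical and are not the actual GA4Dataform models; Dataform works this out for you, so this only illustrates the principle.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key depends on the models listed in its set.
# These names are illustrative only, not real GA4Dataform model names.
DEPENDENCIES = {
    "stg_events": set(),
    "int_sessions": {"stg_events"},
    "fct_sessions": {"int_sessions"},
    "fct_transactions": {"stg_events"},
}


def execution_order(dependencies):
    """Return an order in which every model runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

Whatever exact order is printed, a model always appears after the models it depends on, which is why a failure early in the chain can leave downstream tables missing or incomplete.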

If you're not seeing the expected tables in BigQuery, or if tables appear incomplete, have missing dates, or other issues, the first thing to check is whether your workflows have been executed correctly and without errors.

screenshot

Above is an example of workflows executed over time. Executions marked in green succeeded, while those marked in red ended with errors. The Source column tells you, for each row, whether a single model, an entire workspace, or a full workflow was executed. You can drill down into any execution to review how each model in the selected workflow ran.

screenshot

Lastly, by clicking on a specific query, you can access detailed information about how it was processed and, if it failed, the reason for the failure. This is particularly useful for troubleshooting when a model hasn’t run as expected.

Check workflow execution after running GA4 installer

note

This section is specifically for users who have authorized data processing for their historical GA4 data via the optional steps after using the GA4 installer. If you’re new to Dataform, we recommend reviewing the Overview and Check workflow execution sections above first.

If you’re allowing GA4Dataform to process a relatively large GA4 dataset, running the Superform workflow may take several minutes, dozens of minutes, or even hours. If you can't see the table outputs in BigQuery, follow these steps to check your workflow execution:

  1. Access Dataform.
  2. Open the superform_analytics_* repository.
  3. Navigate to the Workflow Execution Logs section.

screenshot

If the workflow status is not green yet, you can click the Refresh button. Feel free to click it multiple times until the workflow completes successfully and turns green.
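Clicking Refresh repeatedly is a manual form of polling. The sketch below shows the same logic in Python under stated assumptions: `get_state` is a hypothetical callable standing in for whatever reports the workflow status, and the state names are assumed for illustration. It does not call any Dataform API.

```python
import time

# State names are assumed for illustration purposes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_workflow(get_state, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state.

    Returns the terminal state, or "TIMED_OUT" after max_polls checks.
    """
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before "clicking Refresh" again
    return "TIMED_OUT"


# Usage with a fake status source standing in for the execution log.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
result = wait_for_workflow(lambda: next(fake_states), poll_seconds=0)
print(result)
```

Note that the loop also stops on a failed or cancelled run: if the workflow turns red rather than green, refreshing further will not help, and you should drill into the execution to find the failing query.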

Delete Dataform repository

Before you can delete a repository, you must first delete its Release configurations, Workflow configurations, and Workspaces:

Delete Release configurations: screenshot

Delete Workflow configurations: screenshot

Delete Workspaces: screenshot

Delete Repository: Finally, return to the Dataform main page, where you can delete the repository using the three dots under Actions on the right: screenshot

Rebuild tables

One of the core features of GA4Dataform is to build tables incrementally. This means that every day, new rows of GA4 data are appended to existing tables instead of rebuilding everything from scratch. However, there are situations where you may need to reprocess some tables due to bug fixes, metric definition changes, or updates in attribution logic that should be retroactively applied to your dataset. Rebuilding tables is typically a one-time action to address a specific need.
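The difference between an incremental run and a full rebuild can be illustrated with a deliberately simplified Python sketch. The row structure, the `old_logic` and `new_logic` functions, and the date-based filter are all assumptions made for illustration; this is not how GA4Dataform implements incrementality internally.

```python
def incremental_run(table, source_rows, transform):
    """Append only source rows newer than the latest date already in the table."""
    latest = max((row["date"] for row in table), default="")
    new_rows = [transform(row) for row in source_rows if row["date"] > latest]
    return table + new_rows


def full_refresh(source_rows, transform):
    """Rebuild the table from scratch, applying the current logic to every row."""
    return [transform(row) for row in source_rows]


def old_logic(row):
    return {"date": row["date"], "metric": row["events"]}


def new_logic(row):
    # Hypothetical changed metric definition.
    return {"date": row["date"], "metric": row["events"] * 2}


source = [
    {"date": "2024-01-01", "events": 10},
    {"date": "2024-01-02", "events": 20},
]

# The first two days are built with the old logic.
table = incremental_run([], source, old_logic)

# The logic changes, then a new day of data arrives.
source.append({"date": "2024-01-03", "events": 30})
table = incremental_run(table, source, new_logic)

# A full refresh applies the new logic retroactively to all days.
rebuilt = full_refresh(source, new_logic)

print([row["metric"] for row in table])
print([row["metric"] for row in rebuilt])
```

After the incremental run, only the newest day reflects the new definition while older rows keep their previous values. The full refresh is what brings historical rows in line with the updated logic.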

note

Outside of advanced use cases, you may occasionally need to rebuild specific tables after a GA4Dataform update. If this is necessary, we will mention it in our update communications. If you're unsure, feel free to reach out to our support team at support@ga4dataform.com.

There are three ways to manage a full rebuild (or full refresh) from the Dataform UI:

  • Within a workspace, using the Execute side panel.
  • From the Release configurations section, via the Execute manual workflow side panel.
  • By creating a new workflow with the Create workflow configuration side panel.

Overview

In the Dataform UI, managing a full rebuild follows the same logic whichever method you use:

Model(s) selection

Select the model(s) you want to rebuild: choose individual models with Selection of Actions, choose models with specific tag(s) via Selection of tags, or select all models with All actions.
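The three selection options can be thought of as three filters over the same list of models. The Python sketch below illustrates this; the model names and tags are hypothetical, and it assumes that selecting by tags picks every model carrying at least one of the chosen tags.

```python
# Hypothetical models and tags, for illustration only.
MODELS = {
    "ga4_events": {"tags": {"ga4", "core"}},
    "ga4_sessions": {"tags": {"ga4", "sessions"}},
    "ga4_transactions": {"tags": {"ga4", "ecommerce"}},
}


def select_models(models, actions=None, tags=None):
    """Return the sorted model names matched by one selection mode.

    actions: explicit model names (like a selection of actions)
    tags:    tag names; a model matches if it has any of them (assumed)
    neither: every model (like selecting all actions)
    """
    if actions is not None and tags is not None:
        raise ValueError("choose either actions or tags, not both")
    if actions is not None:
        unknown = set(actions) - set(models)
        if unknown:
            raise KeyError(f"unknown models: {sorted(unknown)}")
        return sorted(set(actions))
    if tags is not None:
        wanted = set(tags)
        return sorted(name for name, meta in models.items() if meta["tags"] & wanted)
    return sorted(models)


print(select_models(MODELS, actions=["ga4_sessions"]))
print(select_models(MODELS, tags=["ecommerce", "sessions"]))
print(select_models(MODELS))
```

Selecting by tag is convenient when a group of related tables must be rebuilt together, while an explicit selection keeps a rebuild as small as possible.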

screenshot

Full refresh execution: Ensure that Run with full refresh is selected, regardless of where you are managing the rebuild.

screenshot

Detailed Review of Side Panels

Rebuild tables from development workspaces

You can open the Execute side panel from a development Workspace by clicking the Start execution drop-down menu.

screenshot

Any selection from this menu will open the same Execute side panel.

screenshot

Once the side panel is open, proceed as described in the overview:

  • Select the model(s) you want to rebuild using one of the three options: Selection of Actions, Selection of tags, or All actions.
  • Ensure that Run with full refresh is checked.

Run the workflow by clicking Start execution. The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.

Rebuild tables from Release configurations section

From the Release and Scheduling section, click Start execution:

screenshot

This opens the Execute manual workflow side panel:

screenshot

Follow these steps:

  • Click the Release configuration drop-down menu and choose production.
  • Select the model(s) you want to rebuild using one of the three options: Selection of Actions, Selection of tags, or All actions.
  • Ensure that Run with full refresh is checked.

Run the workflow by clicking Start execution. The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.

Rebuild tables from Workflow configurations

In the Release and Scheduling section, under the Workflow configurations sub-category, click Create:

screenshot

This opens the Edit workflow configuration panel:

screenshot

warning

Be careful not to edit an existing workflow configuration. The GA4Dataform installer automatically creates a default workflow configuration called 'daily.' Do not modify this workflow unless you are certain of what you're doing, as changes may disrupt the daily data refresh.

The configuration process is similar to the Execute and Execute manual workflow side panels but includes additional options.

Similarities:

  • A Release configuration drop-down menu, where you select production as in the Execute manual workflow side panel.
  • The same way of selecting the model(s) you want to rebuild: Selection of Actions, Selection of tags, or All actions.
  • The Run with full refresh checkbox, which you check if you want to execute a rebuild.

Differences:

Schedule frequency:

screenshot

You can choose to repeat the workflow automatically on a specific schedule or run it "On-demand." For most rebuild scenarios, as explained in the introduction to this chapter, select "On-demand."

Configuration ID:

screenshot

The configuration ID serves as the name of the workflow. For example, if you're creating a workflow to rebuild session tables on demand, you could name it rebuild-sessions-on-demand.

Running an On-Demand Workflow

After saving your workflow configuration, since it is not scheduled, you will need to run it manually. Click the three dots on the right of the workflow configuration and select Run now.

screenshot

The selected model(s) will be rebuilt. Check workflow execution to track progress and verify completion.

Which one should I choose?

Now that you understand how to manage table rebuilds in Dataform, here is a summary of when to use each method:

  • Execute Side Panel (Workspace Development): Best for advanced users modifying models. If you are not working in custom folders, you likely won’t need to use this option frequently.

  • Execute Manual Workflow (Release Configurations): Ideal for one-time operations. However, configurations are not saved, meaning you must repeat the steps each time you need to rebuild tables.

  • Create Workflow Configuration: Similar to Execute manual workflow but allows you to save configurations. This is useful for users with custom queries or advanced GA4Dataform usage who need to perform rebuilds frequently.
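The summary above can be condensed into a small decision helper. This is only a sketch of the guidance in this section: the two yes/no questions are a simplification, and the function name and parameters are hypothetical.

```python
def recommend_rebuild_method(modifying_models: bool, rebuild_frequently: bool) -> str:
    """Suggest a rebuild method based on the guidance in this section."""
    if modifying_models:
        # Advanced users editing models in a development workspace.
        return "Execute side panel (development workspace)"
    if rebuild_frequently:
        # The configuration is saved and can be re-run with Run now.
        return "Create workflow configuration"
    # One-time operation; nothing is saved for later.
    return "Execute manual workflow (release configurations)"


print(recommend_rebuild_method(modifying_models=False, rebuild_frequently=False))
print(recommend_rebuild_method(modifying_models=False, rebuild_frequently=True))
print(recommend_rebuild_method(modifying_models=True, rebuild_frequently=False))
```

For a one-off rebuild after a GA4Dataform update, the helper points to the Execute manual workflow side panel, which matches the one-time nature of most rebuilds.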