Skip to content

Automation

BigQuery Native

The simplest way to automatically synchronise the output EVENTS date-partitioned table is to use the native flow runner (RUN_FLOW) function, which is deployed with your core resources by default.

Automation Logic

The RUN_FLOW function takes two arguments: start_date DATE and end_date DATE, and the behaviour of the function varies depending on the combination of values for each:

Case start_date end_date Action Description
Case 1 NULL NULL Rebuild Full Table
Case 2 [VALID DATE] [VALID DATE] Replace Date Partition Range
Case 3 NULL [VALID DATE] Replace All Date Partitions to end_date
Case 4 [VALID_DATE] NULL Incremental Refresh from start_date

When the deployment function is initially run, a new EVENTS table is created as a full rebuild (Case 1), which will contain modelled event data corresponding to all inbound event_* data (i.e. the row count will be identical).

Case 1: Rebuild

A full table rebuild is executed of the destination EVENTS table from the GA4_EVENTS date-bounded table function, between the first observed inbound date shard and the CURRENT_DATE. Any schema changes require this rebuild to be run. However note that this will query all source data and should therefore be used sparingly to manage costs.

RUN_FLOW: Rebuild Full Table

CALL [dataset_id].RUN_FLOW (NULL, NULL);
Case 2: Range

The destination EVENTS table date partitions for a specific date range are replaced with the date partitions returned from the GA4_EVENTS date-bounded table function for the same date range. This provides an efficient mechanism for updating and testing logic changes which do not impact the schema.

RUN_FLOW: Replace Date Partition Range (specified range)

CALL [dataset_id].RUN_FLOW ('2024-01-01', '2024-01-15');

RUN_FLOW: Replace Date Partition Range (specified start_date to CURRENT_DATE)

CALL [dataset_id].RUN_FLOW ('2024-01-01', CURRENT_DATE);

RUN_FLOW: Replace Date Partition Range (14 days to CURRENT_DATE)

CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 14, CURRENT_DATE);
Case 3: Start Range

The destination EVENTS table date partitions between the first observed inbound date shard and the specified end_date are replaced with the partitions returned from the GA4_EVENTS date-bounded table function for the same date range. This provides a quick mechanism for replacing a start partition range.

RUN_FLOW: Replace All Date Partitions to end_date

CALL [dataset_id].RUN_FLOW (NULL, '2020-12-31');

RUN_FLOW: Replace All Date Partitions to CURRENT_DATE

CALL [dataset_id].RUN_FLOW (NULL, CURRENT_DATE);
Case 4: Incremental

The incremental refresh queries source and destination metadata to identify whether any new date partitions have been received, and inserts new date partitions into the destination EVENTS table from the GA4_EVENTS date-bounded table function. If no new date partitions are identified the function does not execute any queries and therefore does not incur costs beyond metadata queries, which are extremely inexpensive.

Incremental mode is used to control for unpredictable inbound data timing, as they can be run periodically (e.g. every hour or 30 mins) and will execute the incremental refresh only when new source data is identified.

The start_date is used to define the observation window in which the new date partitions will be identified.

RUN_FLOW: Incremental, 7 day observation window

CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);

RUN_FLOW: Incremental, 14 day observation window

CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 14, NULL);

Automation Deployment

Native automation is achieved using BigQuery Scheduled Queries, which needs to be enabled for the project. Incremental refresh should be used to minimise costs and control for unpredictable inbound data timing, and it is recommended to schedule the incremental refresh every hour (and not more frequently than every 30 minutes).

It is also recommended to use the query label scheduled_query_id to support integration with job-level cost data, enabling granular cost tracking by GA4 property.

RUN_FLOW: Incremental, 7 day observation window with query_label

SET @@query_label = "scheduled_query_id:ga4_dataset_id";

CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);