Note that this site is now deprecated and will be removed shortly. The latest information and documentation on the
flowfunctions library and framework can be found at the flowfunctions.io site.
Flow configuration takes place inside the main orchestrator function for simplicity, although it could technically reside in a BigQuery table, external table (e.g. Google Sheet) or anywhere else the data can be accessed from BigQuery (e.g. JSON files in a GCS bucket or even an external database).
There are two configurations required:
The flow input configuration (flow_input_config) defines each source table (source_table_ref), the source table partitioning column (source_table_partitioning) and the view ref (input_view_ref) of the entry point of the source data subset into the transformation flow. It is represented by an ARRAY of STRUCTS with the following schema:
ARRAY<STRUCT<source_table_ref STRING, source_table_partitioning STRING, input_view_ref STRING>>
DECLARE flow_input_config ARRAY<STRUCT<source_table_ref STRING, source_table_partitioning STRING, input_view_ref STRING>> DEFAULT [
  ('project_id.firebase_dataset.events_*', '_TABLE_SUFFIX', 'project_id.events_flow.001_inbound_events')
];
Note that as the example is a Firebase sharded table, in the configuration declaration we set the partitioning column to _TABLE_SUFFIX. For partitioned source tables we would simply use the partition column name, and for non-partitioned tables we pass an empty string ('').
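As a sketch of the other two cases, the input configuration for a date-partitioned source table and a non-partitioned source table might look as follows (the project, dataset, table and column names here are illustrative, not part of the library):

```sql
DECLARE flow_input_config ARRAY<STRUCT<source_table_ref STRING, source_table_partitioning STRING, input_view_ref STRING>> DEFAULT [
  -- Date-partitioned source table: pass the partition column name
  ('project_id.sales_dataset.transactions', 'transaction_date', 'project_id.sales_flow.001_inbound_transactions'),
  -- Non-partitioned source table: pass an empty string
  ('project_id.sales_dataset.products', '', 'project_id.sales_flow.002_inbound_products')
];
```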
The flow output configuration (flow_output_config) defines the target output tables (output_table_ref), their partitioning (output_table_partitioning) and the source table or view from which to feed each output table (output_source_ref). It is represented by an ARRAY of STRUCTS with the following schema:
ARRAY<STRUCT<output_table_ref STRING, output_table_partitioning STRING, output_source_ref STRING>>
DECLARE flow_output_config ARRAY<STRUCT<output_table_ref STRING, output_table_partitioning STRING, output_source_ref STRING>> DEFAULT [
  ('project_id.events_flow._bi_events_with_flags', 'event_date', 'project_id.events_flow.402_events_with_flags'),
  ('project_id.events_flow._bi_summary_by_week', '', 'project_id.events_flow.601_summary_by_week')
];
This output configuration will feed two tables which the downstream dashboards can use as data sources: one partitioned by event_date and one unpartitioned summary. Note that in this case all output tables are prefixed with _bi, a naming convention chosen to identify tables to which downstream Business Intelligence tools should connect.
Also note that to leave output_table_partitioning undefined, you need to pass an empty string ('') as the parameter value.
Flow Orchestrator Function
In order to keep management and operation as simple and consistent as possible, a single function is used to run the flow. It takes one integer parameter specifying the number of days, in addition to today, which are to be processed:
CALL `project_id.firebase_dataset.run_transformation_flow`(last_n_days_to_replace INT64)
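For example, assuming the function is deployed at the path shown above, the following call would process today plus the previous three days of data:

```sql
-- Reprocess today and the three preceding days
CALL `project_id.firebase_dataset.run_transformation_flow`(3);
```

Passing 0 would therefore process today's data only.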
Flow Orchestrator Logic
The flow orchestrator function run_transformation_flow executes the flow according to the following logical sequence: