Profiling
DEPRECATED
Note that this site is now deprecated and will be removed shortly. The latest information and documentation on the flowfunctions
library and framework can be found at the flowfunctions.io site.
Although the INFORMATION_SCHEMA provides us with some useful mechanisms for querying resource metadata, the following functions address more complex use-cases which go beyond the basic foundations.
Row Counts
Computation of the row counts in tables and views is a simple high-level metadata analysis approach which enables potential issues to be located, and which can be extremely helpful in data QA.
All Dataset Tables and Views
The metadata query SELECT * FROM project_id.dataset_name.__TABLES__
returns row counts for BASE TABLES
, but it returns row counts of zero for VIEWS
or EXTERNAL TABLES
. This function iterates around all of the tables and views one by one and computes the row count of each. When it has finished computing, the result can be viewd by clicking on the final VIEW RESULTS
action in the results pane.
CALL flowfunctions.profiling.get_dataset_row_counts (
dataset_ref -- STRING
);
CALL flowfunctionseu.profiling.get_dataset_row_counts (
dataset_ref -- STRING
);
If you want to use these row counts as part of a subsequent calulation or action, you have to use the function variation with an _out
suffix, as well as declare the variable which will hold the function response:
DECLARE row_counts ARRAY<STRUCT<table_ref STRING, records INT64>>;
CALL flowfunctions.profiling.get_dataset_row_counts (
dataset_ref, -- STRING
row_counts -- ARRAY<STRUCT<table_ref STRING, records INT64>>
);
DECLARE row_counts ARRAY<STRUCT<table_ref STRING, records INT64>>;
CALL flowfunctionseu.profiling.get_dataset_row_counts (
dataset_ref, -- STRING
row_counts -- ARRAY<STRUCT<table_ref STRING, records INT64>>
);
After function execution, the row_counts
variable will be populated with the response data and can be UNNESTED
from the array structure with the query syntax SELECT * FROM UNNEST(row_counts)
.