dataform

Protocol Documentation

configs.proto

ActionConfig

ActionConfig defines a single action entry in an actions.yaml configuration file. Exactly which of the fields below is set determines the kind of action.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| table | ActionConfig.TableConfig | | |
| view | ActionConfig.ViewConfig | | |
| incremental_table | ActionConfig.IncrementalTableConfig | | |
| assertion | ActionConfig.AssertionConfig | | |
| operation | ActionConfig.OperationConfig | | |
| declaration | ActionConfig.DeclarationConfig | | |
| notebook | ActionConfig.NotebookConfig | | |

ActionConfig.AssertionConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the assertion. |
| dataset | string | | The dataset (schema) of the assertion. |
| project | string | | The Google Cloud project (database) of the assertion. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| description | string | | Description of the assertion. |
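
As a rough sketch, an assertion entry in actions.yaml might look like the following. The keys are assumed to be the camelCase forms of the field names above (as the document's own inline examples such as additionalOptions suggest), and every name and path is hypothetical.

```yaml
actions:
  - assertion:
      # Hypothetical names and paths, for illustration only.
      name: orders_have_ids
      dataset: analytics_assertions
      filename: definitions/assertions/orders_have_ids.sql
      description: Checks that no order row is missing an ID.
      tags:
        - daily
      dependencyTargets:
        - dataset: analytics
          name: orders
      disabled: false
```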

ActionConfig.ColumnDescriptor

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| path | string | repeated | The identifier for the column, using multiple parts for nested records. |
| description | string | | A text description of the column. |
| bigquery_policy_tags | string | repeated | A list of BigQuery policy tags that will be applied to the column. |
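
To illustrate how a multi-part path addresses a nested record, here is a hedged fragment of a columns list. The camelCase key bigqueryPolicyTags is an assumption derived from the field name, and the column names and policy tag resource name are placeholders.

```yaml
columns:
  # Top-level column: a single path part.
  - path: [order_id]
    description: Unique identifier of the order.
  # Nested record field customer.address.city: one path part per level.
  - path: [customer, address, city]
    description: City of the customer's billing address.
    bigqueryPolicyTags:
      # Placeholder policy tag resource name.
      - projects/my-project/locations/us/taxonomies/123/policyTags/456
```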

ActionConfig.DeclarationConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the declaration. |
| dataset | string | | The dataset (schema) of the declaration. |
| project | string | | The Google Cloud project (database) of the declaration. |
| description | string | | Description of the declaration. |
| columns | ActionConfig.ColumnDescriptor | repeated | Descriptions of columns within the declaration. |
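
A declaration entry could be sketched as below, assuming the YAML keys mirror the fields in the table. The project, dataset, and table names are made up for the example.

```yaml
actions:
  - declaration:
      # Hypothetical source table.
      project: my-source-project
      dataset: raw
      name: events
      description: Raw events source table.
      columns:
        - path: [event_id]
          description: Unique identifier of the event.
```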

ActionConfig.IncrementalTableConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the incremental table. |
| dataset | string | | The dataset (schema) of the incremental table. |
| project | string | | The Google Cloud project (database) of the incremental table. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| pre_operations | string | repeated | Queries to run before the query. This can be useful for granting permissions. |
| post_operations | string | repeated | Queries to run after the query. |
| protected | bool | | If true, prevents the dataset from being rebuilt from scratch. |
| unique_key | string | repeated | If set, the names of the columns that together act as the unique key. To enforce this, when updating the incremental table, Dataform merges rows with matching uniqueKey values instead of appending them. |
| description | string | | Description of the incremental table. |
| columns | ActionConfig.ColumnDescriptor | repeated | Descriptions of columns within the table. |
| partition_by | string | | The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters. |
| partition_expiration_days | int32 | | The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time. |
| require_partition_filter | bool | | Declares whether the partitioned table requires a WHERE clause predicate that filters on the partitioning column. |
| update_partition_filter | string | | SQL-based filter for when incremental updates are applied. |
| cluster_by | string | repeated | The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters. |
| labels | ActionConfig.IncrementalTableConfig.LabelsEntry | repeated | Key-value pairs for BigQuery labels. If the label name contains special characters, e.g. hyphens, then quote its name, e.g. labels: { "label-name": "value" }. |
| additional_options | ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry | repeated | Key-value pairs of additional options to pass to the BigQuery API. Some options, for example partitionExpirationDays, have dedicated fields whose type and validity are checked; for such options, use the dedicated fields. String values must be encapsulated in double quotes, for example: additionalOptions: {numeric_option: "5", string_option: '"string-value"'}. If the option name contains special characters, encapsulate the name in quotes, for example: additionalOptions: { "option-name": "value" }. |
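
The sketch below combines the merge, partitioning, and option fields in one incremental table entry. It assumes camelCase keys (incrementalTable, uniqueKey, and so on), and the table, column, and label names are hypothetical; friendly_name is used only as an example of a string-valued BigQuery option.

```yaml
actions:
  - incrementalTable:
      name: daily_events
      dataset: analytics
      filename: definitions/daily_events.sql
      # Rows with the same event_id are merged rather than appended.
      uniqueKey: [event_id]
      # Guards the table against being rebuilt from scratch.
      protected: true
      partitionBy: event_date
      partitionExpirationDays: 90
      requirePartitionFilter: true
      updatePartitionFilter: "event_date >= date_sub(current_date(), interval 3 day)"
      clusterBy: [customer_id]
      labels:
        # Label names with hyphens are quoted.
        "cost-center": "analytics"
      additionalOptions:
        # String option values carry their own embedded double quotes.
        friendly_name: '"Daily events"'
```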

ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfig.IncrementalTableConfig.LabelsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfig.NotebookConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the notebook. |
| location | string | | The Google Cloud location of the notebook. |
| project | string | | The Google Cloud project (database) of the notebook. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| description | string | | Description of the notebook. |
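
For comparison with the SQL-backed actions, a notebook entry might be written as follows. Note that it takes a location rather than a dataset. Key casing is assumed, and the notebook name, location, and path are illustrative only.

```yaml
actions:
  - notebook:
      # Hypothetical notebook; the file is assumed to be an .ipynb source.
      name: churn_report
      location: us-central1
      filename: definitions/churn_report.ipynb
      description: Weekly churn analysis notebook.
      tags: [weekly]
```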

ActionConfig.OperationConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the operation. |
| dataset | string | | The dataset (schema) of the operation. |
| project | string | | The Google Cloud project (database) of the operation. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| has_output | bool | | Declares that this action creates a dataset which should be referenceable as a dependency target, for example by using the ref function. |
| description | string | | Description of the operation. |
| columns | ActionConfig.ColumnDescriptor | repeated | Descriptions of columns within the operation. Can only be set if hasOutput is true. |
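
The relationship between hasOutput and columns can be seen in a small sketch like this one, where columns is set only because hasOutput is true. Keys are assumed to be camelCase and all names are hypothetical.

```yaml
actions:
  - operation:
      name: snapshot_orders
      dataset: analytics
      filename: definitions/snapshot_orders.sql
      # The operation creates a dataset that other actions can reference.
      hasOutput: true
      description: Writes a dated snapshot of the orders table.
      # Allowed only because hasOutput is true.
      columns:
        - path: [snapshot_date]
          description: Date on which the snapshot was taken.
```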

ActionConfig.TableConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the table. |
| dataset | string | | The dataset (schema) of the table. |
| project | string | | The Google Cloud project (database) of the table. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| pre_operations | string | repeated | Queries to run before the query. This can be useful for granting permissions. |
| post_operations | string | repeated | Queries to run after the query. |
| description | string | | Description of the table. |
| columns | ActionConfig.ColumnDescriptor | repeated | Descriptions of columns within the table. |
| partition_by | string | | The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters. |
| partition_expiration_days | int32 | | The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time. |
| require_partition_filter | bool | | Declares whether the partitioned table requires a WHERE clause predicate that filters on the partitioning column. |
| cluster_by | string | repeated | The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters. |
| labels | ActionConfig.TableConfig.LabelsEntry | repeated | Key-value pairs for BigQuery labels. If the label name contains special characters, e.g. hyphens, then quote its name, e.g. labels: { "label-name": "value" }. |
| additional_options | ActionConfig.TableConfig.AdditionalOptionsEntry | repeated | Key-value pairs of additional options to pass to the BigQuery API. Some options, for example partitionExpirationDays, have dedicated fields whose type and validity are checked; for such options, use the dedicated fields. String values must be encapsulated in double quotes, for example: additionalOptions: {numeric_option: "5", string_option: '"string-value"'}. If the option name contains special characters, encapsulate the name in quotes, for example: additionalOptions: { "option-name": "value" }. |
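
A table entry with a post-operation that grants access might look roughly like this. It is a sketch under the assumption that keys are camelCase; the table, group, and label names are invented, and the GRANT statement is only one possible permission query.

```yaml
actions:
  - table:
      name: customers
      dataset: analytics
      filename: definitions/customers.sql
      description: One row per customer.
      partitionBy: signup_date
      clusterBy: [country]
      # Runs after the table's query; here used to grant read access.
      postOperations:
        - |
          GRANT `roles/bigquery.dataViewer`
          ON TABLE `my-project.analytics.customers`
          TO "group:analysts@example.com"
      labels:
        "team-name": "growth"
```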

ActionConfig.TableConfig.AdditionalOptionsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfig.TableConfig.LabelsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfig.Target

Target represents a unique action identifier.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| project | string | | The Google Cloud project (database) of the action. |
| dataset | string | | The dataset (schema) of the action. For notebooks, this is the location. |
| name | string | | The name of the action. |
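
As an illustration, a dependencyTargets list (key casing assumed) could reference one table and one notebook. All identifiers are hypothetical; the second entry shows the dataset field carrying a notebook's location, as described above.

```yaml
dependencyTargets:
  # A table or view target, identified by project, dataset, and name.
  - project: my-project
    dataset: analytics
    name: orders
  # A notebook target: dataset holds the notebook's location.
  - project: my-project
    dataset: us-central1
    name: churn_report
```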

ActionConfig.ViewConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | | The name of the view. |
| dataset | string | | The dataset (schema) of the view. |
| project | string | | The Google Cloud project (database) of the view. |
| dependency_targets | ActionConfig.Target | repeated | Targets of the actions that this action depends on. |
| filename | string | | Path to the source file from which the contents of the action are loaded. |
| tags | string | repeated | A list of user-defined tags with which the action should be labeled. |
| disabled | bool | | If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions. |
| pre_operations | string | repeated | Queries to run before the query. This can be useful for granting permissions. |
| post_operations | string | repeated | Queries to run after the query. |
| materialized | bool | | Applies the materialized view optimization. See https://cloud.google.com/bigquery/docs/materialized-views-intro. |
| description | string | | Description of the view. |
| columns | ActionConfig.ColumnDescriptor | repeated | Descriptions of columns within the view. |
| labels | ActionConfig.ViewConfig.LabelsEntry | repeated | Key-value pairs for BigQuery labels. If the label name contains special characters, e.g. hyphens, then quote its name, e.g. labels: { "label-name": "value" }. |
| additional_options | ActionConfig.ViewConfig.AdditionalOptionsEntry | repeated | Key-value pairs of additional options to pass to the BigQuery API. Some options, for example partitionExpirationDays, have dedicated fields whose type and validity are checked; for such options, use the dedicated fields. String values must be encapsulated in double quotes, for example: additionalOptions: {numeric_option: "5", string_option: '"string-value"'}. If the option name contains special characters, encapsulate the name in quotes, for example: additionalOptions: { "option-name": "value" }. |
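
A minimal materialized view entry, assuming camelCase keys and using made-up names, might read:

```yaml
actions:
  - view:
      name: active_customers
      dataset: analytics
      filename: definitions/active_customers.sql
      # Requests BigQuery's materialized view optimization.
      materialized: true
      description: Customers with at least one order in the last 30 days.
      columns:
        - path: [customer_id]
          description: Unique identifier of the customer.
```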

ActionConfig.ViewConfig.AdditionalOptionsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfig.ViewConfig.LabelsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

ActionConfigs

ActionConfigs defines the contents of actions.yaml configuration files. TODO(ekrekr): consolidate these configuration options in the JS API.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| actions | ActionConfig | repeated | |
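
Putting the pieces together, a complete actions.yaml could be sketched as a single top-level actions list in which each item sets one action kind. The file layout, key casing, and every name here are assumptions made for illustration.

```yaml
# actions.yaml (hypothetical contents)
actions:
  - declaration:
      dataset: raw
      name: orders_source
  - table:
      name: orders
      dataset: analytics
      filename: definitions/orders.sql
      dependencyTargets:
        - dataset: raw
          name: orders_source
  - view:
      name: recent_orders
      dataset: analytics
      filename: definitions/recent_orders.sql
      dependencyTargets:
        - dataset: analytics
          name: orders
```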

NotebookRuntimeOptionsConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| output_bucket | string | | Storage bucket to output notebooks to after their execution. |

WorkflowSettings

WorkflowSettings defines the contents of the workflow_settings.yaml configuration file.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| dataform_core_version | string | | The desired Dataform core version to compile against. |
| default_project | string | | Required. The default Google Cloud project (database). |
| default_dataset | string | | Required. The default dataset (schema). |
| default_location | string | | Required. The default BigQuery location to use. For more information on BigQuery locations, see https://cloud.google.com/bigquery/docs/locations. |
| default_assertion_dataset | string | | Required. The default dataset (schema) for assertions. |
| vars | WorkflowSettings.VarsEntry | repeated | Optional. User-defined variables that are made available to project code during compilation. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }. |
| project_suffix | string | | Optional. The suffix to append to all Google Cloud project references. |
| dataset_suffix | string | | Optional. The suffix to append to all dataset references. |
| name_prefix | string | | Optional. The prefix to prepend to all action names. |
| default_notebook_runtime_options | NotebookRuntimeOptionsConfig | | Optional. Default runtime options for Notebook actions. |
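
A workflow_settings.yaml covering the required fields and a few optional ones might look like the sketch below. Keys are assumed to be the camelCase forms of the field names; the version number, project, datasets, and bucket are placeholders, and the exact format expected for outputBucket is an assumption.

```yaml
# workflow_settings.yaml (hypothetical values)
dataformCoreVersion: 3.0.0
defaultProject: my-project
defaultDataset: analytics
defaultLocation: US
defaultAssertionDataset: analytics_assertions
# Optional settings.
vars:
  # Values are strings, including numeric-looking ones.
  environment: "staging"
  lookback_days: "3"
datasetSuffix: dev
namePrefix: tmp
defaultNotebookRuntimeOptions:
  # Placeholder bucket; the gs:// form is assumed.
  outputBucket: gs://my-notebook-output
```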

WorkflowSettings.VarsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| double | | double | double | float | float64 | double | float | Float |
| float | | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |