dataform

Protocol Documentation

configs.proto

ActionConfig

An ActionConfig defines a single action within an actions.yaml configuration file.

Field Type Label Description
table ActionConfig.TableConfig    
view ActionConfig.ViewConfig    
incrementalTable ActionConfig.IncrementalTableConfig    
assertion ActionConfig.AssertionConfig    
operation ActionConfig.OperationConfig    
declaration ActionConfig.DeclarationConfig    
notebook ActionConfig.NotebookConfig    
dataPreparation ActionConfig.DataPreparationConfig    

ActionConfig.AssertionConfig

Field Type Label Description
name string   The name of the assertion.
dataset string   The dataset (schema) of the assertion.
project string   The Google Cloud project (database) of the assertion.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string   Description of the assertion.
hermetic bool   If true, this indicates that the action only depends on data from explicitly-declared dependencies. Otherwise if false, it indicates that the action depends on data from a source which has not been declared as a dependency.
dependOnDependencyAssertions bool   If true, assertions dependent upon any of the dependencies are added as dependencies as well.
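
As a rough illustration, an assertion entry in actions.yaml might look like the sketch below. It assumes the YAML keys mirror the field names in the table above and that each entry sits under the top-level actions list; every name and path is a hypothetical placeholder.

```yaml
# Hypothetical assertion entry; all names and paths are placeholders.
actions:
  - assertion:
      name: orders_no_negative_totals
      dataset: analytics_assertions
      filename: definitions/assertions/orders_no_negative_totals.sql
      description: Fails if any order has a negative total.
      tags:
        - daily
      # The assertion depends on the (hypothetical) orders table.
      dependencyTargets:
        - dataset: analytics
          name: orders
```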

ActionConfig.ColumnDescriptor

Field Type Label Description
path string repeated The identifier for the column, using multiple parts for nested records.
description string   A text description of the column.
bigqueryPolicyTags string repeated A list of BigQuery policy tags that will be applied to the column.
tags string repeated A list of tags for this column which will be applied.
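
A column list using these fields might be sketched as follows. The columns key is the one used by the table, view, declaration and operation configs; the column names are hypothetical, and the policy tag is a placeholder written in the usual BigQuery resource-name shape rather than a real tag.

```yaml
# Hypothetical column descriptors; column names and the policy tag are placeholders.
columns:
  # A top-level column has a single path element.
  - path: [customer_id]
    description: Stable identifier for the customer.
  # A field inside a nested record uses one path element per level.
  - path: [address, postal_code]
    description: Postal code within the nested address record.
    bigqueryPolicyTags:
      - projects/PROJECT_ID/locations/LOCATION/taxonomies/TAXONOMY_ID/policyTags/POLICY_TAG_ID
```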

ActionConfig.DataPreparationConfig

Field Type Label Description
name string   The name of the data preparation.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string   Description of the data preparation.

ActionConfig.DeclarationConfig

Field Type Label Description
name string   The name of the declaration.
dataset string   The dataset (schema) of the declaration.
project string   The Google Cloud project (database) of the declaration.
description string   Description of the declaration.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the declaration.
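
A declaration entry could be sketched like this, assuming the YAML keys mirror the fields above. The project, dataset and table names are hypothetical.

```yaml
# Hypothetical declaration of an externally managed source table.
actions:
  - declaration:
      name: raw_events
      dataset: landing
      project: my-source-project
      description: Raw event stream loaded by an external pipeline.
      columns:
        - path: [event_id]
          description: Unique identifier of the event.
```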

ActionConfig.IncrementalTableConfig

Field Type Label Description
name string   The name of the incremental table.
dataset string   The dataset (schema) of the incremental table.
project string   The Google Cloud project (database) of the incremental table.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before query. This can be useful for granting permissions.
postOperations string repeated Queries to run after query.
protected bool   If true, prevents the dataset from being rebuilt from scratch.
uniqueKey string repeated If set, uniqueKey is the set of column names that together act as the unique key. To enforce uniqueness, Dataform merges rows with matching uniqueKey values when updating the incremental table instead of appending them.
description string   Description of the incremental table.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the table.
partitionBy string   The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters.
partitionExpirationDays int32   The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time.
requirePartitionFilter bool   Declares whether the partitioned table requires a WHERE clause predicate filter that filters the partitioning column.
updatePartitionFilter string   SQL-based filter for when incremental updates are applied.
clusterBy string repeated The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters.
labels ActionConfig.IncrementalTableConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool   When set to true, assertions dependent upon any dependency will be added as dependencies of this action.
assertions ActionConfig.TableAssertionsConfig   Assertions to be run on the dataset. If configured, relevant assertions will automatically be created and run as a dependency of this dataset.
hermetic bool   If true, this indicates that the action only depends on data from explicitly-declared dependencies. Otherwise if false, it indicates that the action depends on data from a source which has not been declared as a dependency.
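
The merge-related fields are easiest to read together. The sketch below assumes the YAML keys mirror the field names above; the table, column names and filter expression are hypothetical.

```yaml
# Hypothetical incremental table; names, columns and SQL are placeholders.
actions:
  - incrementalTable:
      name: events_incremental
      dataset: analytics
      filename: definitions/events_incremental.sql
      # Prevents the table from being rebuilt from scratch.
      protected: true
      # Rows matching on these columns are merged rather than appended.
      uniqueKey: [event_id]
      partitionBy: event_date
      # Restricts which partitions are considered when applying incremental updates.
      updatePartitionFilter: "event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
      clusterBy: [customer_id]
```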

ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry

Field Type Label Description
key string    
value string    

ActionConfig.IncrementalTableConfig.LabelsEntry

Field Type Label Description
key string    
value string    

ActionConfig.NotebookConfig

Field Type Label Description
name string   The name of the notebook.
location string   The Google Cloud location of the notebook.
project string   The Google Cloud project (database) of the notebook.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string   Description of the notebook.
dependOnDependencyAssertions bool   When set to true, assertions dependent upon any dependency will be added as dependencies of this action.

ActionConfig.OperationConfig

Field Type Label Description
name string   The name of the operation.
dataset string   The dataset (schema) of the operation.
project string   The Google Cloud project (database) of the operation.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
hasOutput bool   Declares that this action creates a dataset which should be referenceable as a dependency target, for example by using the ref function.
description string   Description of the operation.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the operation. Can only be set if hasOutput is true.
dependOnDependencyAssertions bool   When set to true, assertions dependent upon any dependency will be added as dependencies of this action.
hermetic bool   If true, this indicates that the action only depends on data from explicitly-declared dependencies. Otherwise if false, it indicates that the action depends on data from a source which has not been declared as a dependency.
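
Because columns can only be set when hasOutput is true, the two fields usually appear together. A minimal sketch, assuming the YAML keys mirror the fields above and using hypothetical names:

```yaml
# Hypothetical operation that creates a referenceable dataset.
actions:
  - operation:
      name: build_customer_snapshot
      dataset: analytics
      filename: definitions/build_customer_snapshot.sql
      # Marks the output as a valid dependency target for other actions.
      hasOutput: true
      # Column descriptions are only allowed because hasOutput is true.
      columns:
        - path: [customer_id]
          description: Identifier of the customer in the snapshot.
```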

ActionConfig.TableAssertionsConfig

Shorthand options for specifying assertions, usable with some table-based action types.

Field Type Label Description
uniqueKey string repeated Column(s) which constitute the dataset's unique key index. If set, the resulting assertion will fail if there is more than one row in the dataset with the same values for all of these column(s).
uniqueKeys ActionConfig.TableAssertionsConfig.UniqueKey repeated  
nonNull string repeated Column(s) which may never be NULL. If set, the resulting assertion will fail if any row contains NULL values for these column(s).
rowConditions string repeated General condition(s) which should hold true for all rows in the dataset. If set, the resulting assertion will fail if any row violates any of these condition(s).

ActionConfig.TableAssertionsConfig.UniqueKey

Combinations of column(s), each of which should constitute a unique key index for the dataset. If set, the resulting assertion(s) will fail if there is more than one row in the dataset with the same values for all of the column(s) in the unique key(s).

Field Type Label Description
uniqueKey string repeated  
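
The nesting of uniqueKeys is easy to get wrong, since each list item is itself a UniqueKey message holding a uniqueKey list. The sketch below shows an assertions block as it might appear inside a table, view or incremental table entry; the column names and condition are hypothetical.

```yaml
# Hypothetical assertions block; column names and conditions are placeholders.
assertions:
  # Each item is one unique key; a key may span several columns.
  uniqueKeys:
    - uniqueKey: [order_id]
    - uniqueKey: [customer_id, order_date]
  # Fails if any of these columns contains NULL in any row.
  nonNull: [order_id, customer_id]
  # Fails if any row violates any condition.
  rowConditions:
    - "total >= 0"
```

When only one unique key is needed, the flat uniqueKey field on TableAssertionsConfig expresses the same check without the extra nesting.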

ActionConfig.TableConfig

Field Type Label Description
name string   The name of the table.
dataset string   The dataset (schema) of the table.
project string   The Google Cloud project (database) of the table.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before query. This can be useful for granting permissions.
postOperations string repeated Queries to run after query.
description string   Description of the table.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the table.
partitionBy string   The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters.
partitionExpirationDays int32   The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time.
requirePartitionFilter bool   Declares whether the partitioned table requires a WHERE clause predicate filter that filters the partitioning column.
clusterBy string repeated The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters.
labels ActionConfig.TableConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.TableConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool   When set to true, assertions dependent upon any dependency will be added as dependencies of this action.
assertions ActionConfig.TableAssertionsConfig   Assertions to be run on the dataset. If configured, relevant assertions will automatically be created and run as a dependency of this dataset.
hermetic bool   If true, this indicates that the action only depends on data from explicitly-declared dependencies. Otherwise if false, it indicates that the action depends on data from a source which has not been declared as a dependency.
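
A table entry that uses the partitioning, clustering and label fields might be sketched as follows. The keys are assumed to mirror the field names above, and all values are hypothetical.

```yaml
# Hypothetical partitioned and clustered table; all values are placeholders.
actions:
  - table:
      name: orders
      dataset: analytics
      filename: definitions/orders.sql
      description: One row per customer order.
      partitionBy: order_date
      # Each partition is kept for 90 days, counted from its own partition time.
      partitionExpirationDays: 90
      # Queries must filter on the partitioning column.
      requirePartitionFilter: true
      clusterBy: [customer_id]
      # Map fields are written as plain key-value pairs.
      labels:
        team: data-platform
        cost_center: analytics
```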

ActionConfig.TableConfig.AdditionalOptionsEntry

Field Type Label Description
key string    
value string    

ActionConfig.TableConfig.LabelsEntry

Field Type Label Description
key string    
value string    

ActionConfig.Target

Target represents a unique action identifier.

Field Type Label Description
project string   The Google Cloud project (database) of the action.
dataset string   The dataset (schema) of the action. For notebooks, this is the location.
name string   The name of the action.
includeDependentAssertions bool   If true, the assertions of this dependency are also added to dependencyTargets.
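
Targets appear as items of a dependencyTargets list. A minimal sketch with hypothetical project, dataset and action names:

```yaml
# Hypothetical dependency list inside an action entry.
dependencyTargets:
  # Fully qualified target.
  - project: my-gcp-project
    dataset: analytics
    name: orders
  # Also depend on the assertions attached to this dependency.
  - dataset: analytics
    name: customers
    includeDependentAssertions: true
```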

ActionConfig.ViewConfig

Field Type Label Description
name string   The name of the view.
dataset string   The dataset (schema) of the view.
project string   The Google Cloud project (database) of the view.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string   Path to the source file from which the contents of the action are loaded.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool   If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before query. This can be useful for granting permissions.
postOperations string repeated Queries to run after query.
materialized bool   Applies the materialized view optimization. See https://cloud.google.com/bigquery/docs/materialized-views-intro.
description string   Description of the view.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the view.
labels ActionConfig.ViewConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.ViewConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool   When set to true, assertions dependent upon any dependency will be added as dependencies of this action.
hermetic bool   If true, this indicates that the action only depends on data from explicitly-declared dependencies. Otherwise if false, it indicates that the action depends on data from a source which has not been declared as a dependency.
assertions ActionConfig.TableAssertionsConfig   Assertions to be run on the dataset. If configured, relevant assertions will automatically be created and run as a dependency of this dataset.
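
A materialized view entry could be sketched like this, assuming the YAML keys mirror the fields above. The names, path and label are hypothetical.

```yaml
# Hypothetical materialized view; all values are placeholders.
actions:
  - view:
      name: daily_order_totals
      dataset: analytics
      filename: definitions/daily_order_totals.sql
      description: Order totals aggregated per day.
      materialized: true
      dependencyTargets:
        - dataset: analytics
          name: orders
      labels:
        team: data-platform
```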

ActionConfig.ViewConfig.AdditionalOptionsEntry

Field Type Label Description
key string    
value string    

ActionConfig.ViewConfig.LabelsEntry

Field Type Label Description
key string    
value string    

ActionConfigs

ActionConfigs defines the contents of actions.yaml configuration files.

Field Type Label Description
actions ActionConfig repeated  
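
Putting the pieces together, a complete actions.yaml might look like the sketch below: one top-level actions list whose items each set a single action type. This layout is inferred from the message structure in this reference, and every name, path and location is a hypothetical placeholder.

```yaml
# Hypothetical actions.yaml with three actions of different types.
actions:
  - table:
      name: orders
      dataset: analytics
      filename: definitions/orders.sql
  - view:
      name: recent_orders
      dataset: analytics
      filename: definitions/recent_orders.sql
      dependencyTargets:
        - dataset: analytics
          name: orders
  - notebook:
      name: orders_report
      location: us-central1
      filename: definitions/orders_report.ipynb
      dependencyTargets:
        - dataset: analytics
          name: recent_orders
```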

NotebookRuntimeOptionsConfig

Field Type Label Description
outputBucket string   Storage bucket to output notebooks to after their execution.

WorkflowSettings

Workflow Settings defines the contents of the workflow_settings.yaml configuration file.

Field Type Label Description
dataformCoreVersion string   The desired Dataform core version to compile against.
defaultProject string   Required. The default Google Cloud project (database).
defaultDataset string   Required. The default dataset (schema).
defaultLocation string   Required. The default BigQuery location to use. For more information on BigQuery locations, see https://cloud.google.com/bigquery/docs/locations.
defaultAssertionDataset string   Required. The default dataset (schema) for assertions.
vars WorkflowSettings.VarsEntry repeated Optional. User-defined variables that are made available to project code during compilation. An object containing a list of "key": value pairs.
projectSuffix string   Optional. The suffix to append to all Google Cloud project references.
datasetSuffix string   Optional. The suffix to append to all dataset references.
namePrefix string   Optional. The prefix to prepend to all action names.
defaultNotebookRuntimeOptions NotebookRuntimeOptionsConfig   Optional. Default runtime options for Notebook actions.
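
A workflow_settings.yaml that sets the required fields and a few optional ones might be sketched as follows. The version number, project, dataset and bucket names are hypothetical placeholders, and the exact way suffixes and prefixes are joined to names is not specified in this reference.

```yaml
# Hypothetical workflow_settings.yaml; all values are placeholders.
dataformCoreVersion: 3.0.0

# Required defaults.
defaultProject: my-gcp-project
defaultDataset: analytics
defaultLocation: US
defaultAssertionDataset: analytics_assertions

# Optional: variables available to project code during compilation.
vars:
  environment: dev
  lookback_days: "7"

# Optional: keep development output separate from production.
datasetSuffix: dev
namePrefix: tmp

# Optional: where executed notebooks are written.
defaultNotebookRuntimeOptions:
  outputBucket: gs://my-notebook-output-bucket
```

The vars values are quoted where they would otherwise parse as numbers, since VarsEntry declares both key and value as strings.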

WorkflowSettings.VarsEntry

Field Type Label Description
key string    
value string    

Scalar Value Types

.proto Type Notes C++ Java Python Go C# PHP Ruby
double   double double float float64 double float Float
float   float float float float32 float float Float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 int integer Bignum or Fixnum (as required)
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long int64 long integer/string Bignum
uint32 Uses variable-length encoding. uint32 int int/long uint32 uint integer Bignum or Fixnum (as required)
uint64 Uses variable-length encoding. uint64 long int/long uint64 ulong integer/string Bignum or Fixnum (as required)
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int int32 int integer Bignum or Fixnum (as required)
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long int64 long integer/string Bignum
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int uint32 uint integer Bignum or Fixnum (as required)
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long uint64 ulong integer/string Bignum
sfixed32 Always four bytes. int32 int int int32 int integer Bignum or Fixnum (as required)
sfixed64 Always eight bytes. int64 long int/long int64 long integer/string Bignum
bool   bool boolean boolean bool bool boolean TrueClass/FalseClass
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode string string string String (UTF-8)
bytes May contain any arbitrary sequence of bytes. string ByteString str []byte ByteString string String (ASCII-8BIT)