<html><body>
<style>
body, h1, h2, h3, div, span, p, pre, a {
margin: 0;
padding: 0;
border: 0;
font-weight: inherit;
font-style: inherit;
font-size: 100%;
font-family: inherit;
vertical-align: baseline;
}
body {
font-size: 13px;
padding: 1em;
}
h1 {
font-size: 26px;
margin-bottom: 1em;
}
h2 {
font-size: 24px;
margin-bottom: 1em;
}
h3 {
font-size: 20px;
margin-bottom: 1em;
margin-top: 1em;
}
pre, code {
line-height: 1.5;
font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}
pre {
margin-top: 0.5em;
}
h1, h2, h3, p {
font-family: Arial, sans-serif;
}
h1, h2, h3 {
border-bottom: solid #CCC 1px;
}
.toc_element {
margin-top: 0.5em;
}
.firstline {
margin-left: 2em;
}
.method {
margin-top: 1em;
border: solid 1px #CCC;
padding: 1em;
background: #EEE;
}
.details {
font-weight: bold;
font-size: 14px;
}
</style>
<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a> . <a href="dataflow_v1b3.projects.jobs.workItems.html">workItems</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
<code><a href="#lease">lease(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Leases a dataflow WorkItem to run.</p>
<p class="toc_element">
<code><a href="#reportStatus">reportStatus(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Reports the status of dataflow WorkItems leased by a worker.</p>
<h3>Method Details</h3>
<div class="method">
<code class="details" id="lease">lease(projectId, jobId, body=None, x__xgafv=None)</code>
<pre>Leases a dataflow WorkItem to run.
Args:
projectId: string, Identifies the project this worker belongs to. (required)
jobId: string, Identifies the workflow job this worker belongs to. (required)
body: object, The request body.
The object takes the form of:
{ # Request to lease WorkItems.
&quot;unifiedWorkerRequest&quot;: { # Untranslated bag-of-bytes WorkRequest from UnifiedWorker.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
&quot;currentWorkerTime&quot;: &quot;A String&quot;, # The current timestamp at the worker.
&quot;workerCapabilities&quot;: [ # Worker capabilities. WorkItems might be limited to workers with specific
# capabilities.
&quot;A String&quot;,
],
&quot;requestedLeaseDuration&quot;: &quot;A String&quot;, # The initial lease period.
&quot;workerId&quot;: &quot;A String&quot;, # Identifies the worker leasing work -- typically the ID of the
# virtual machine running the worker.
&quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains the WorkItem&#x27;s job.
&quot;workItemTypes&quot;: [ # Filter for WorkItem type.
&quot;A String&quot;,
],
}
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # Response to a request to lease WorkItems.
&quot;workItems&quot;: [ # A list of the leased WorkItems.
{ # WorkItem represents basic information about a WorkItem to be executed
# in the cloud.
&quot;projectId&quot;: &quot;A String&quot;, # Identifies the cloud project this WorkItem belongs to.
&quot;shellTask&quot;: { # A task which consists of a shell command for the worker to execute. # Additional information for ShellTask WorkItems.
&quot;exitCode&quot;: 42, # Exit code for the task.
&quot;command&quot;: &quot;A String&quot;, # The shell command to run.
},
&quot;sourceOperationTask&quot;: { # A work item that represents the different operations that can be # Additional information for source operation WorkItems.
# performed on a user-defined Source specification.
&quot;originalName&quot;: &quot;A String&quot;, # System-defined name for the Read instruction for this source
# in the original workflow graph.
&quot;split&quot;: { # Represents the operation to split a high-level Source specification # Information about a request to split a source.
# into bundles (parts for parallel processing).
#
# At a high level, splitting of a source into bundles happens as follows:
# SourceSplitRequest is applied to the source. If it returns
# SOURCE_SPLIT_OUTCOME_USE_CURRENT, no further splitting happens and the source
# is used &quot;as is&quot;. Otherwise, splitting is applied recursively to each
# produced DerivedSource.
#
# As an optimization, for any Source, if its does_not_need_splitting is
# true, the framework assumes that splitting this source would return
# SOURCE_SPLIT_OUTCOME_USE_CURRENT, and doesn&#x27;t initiate a SourceSplitRequest.
# This applies both to the initial source being split and to bundles
# produced from it.
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source to be split.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
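# Illustrative note (not part of the generated schema): the effective source
# parameters described above can be thought of as a plain dict merge in which
# later values win, roughly:
#
#   effective_spec = {}
#   for base in source.get(&quot;baseSpecs&quot;, []):    # earlier items first, later items win
#       effective_spec.update(base)
#   effective_spec.update(source[&quot;spec&quot;])        # spec overrides anything in base_specs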
},
&quot;options&quot;: { # Hints for splitting a Source into bundles (parts for parallel # Hints for tuning the splitting process.
# processing) using SourceSplitRequest.
&quot;desiredShardSizeBytes&quot;: &quot;A String&quot;, # DEPRECATED in favor of desired_bundle_size_bytes.
&quot;desiredBundleSizeBytes&quot;: &quot;A String&quot;, # The source should be split into a set of bundles where the estimated size
# of each is approximately this many bytes.
},
},
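# Illustrative sketch (not part of the generated schema) of the recursive
# splitting process described above; apply_split() is a hypothetical stand-in
# for issuing a SourceSplitRequest:
#
#   def split_into_bundles(source, options):
#       if source.get(&quot;doesNotNeedSplitting&quot;):
#           return [source]                           # assume USE_CURRENT
#       response = apply_split(source, options)       # hypothetical helper
#       if response[&quot;outcome&quot;] == &quot;SOURCE_SPLIT_OUTCOME_USE_CURRENT&quot;:
#           return [source]
#       bundles = []
#       for derived in response[&quot;bundles&quot;]:           # produced DerivedSources
#           bundles.extend(split_into_bundles(derived[&quot;source&quot;], options))
#       return bundles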
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of the Read instruction for this source.
# Unique across the workflow.
&quot;stageName&quot;: &quot;A String&quot;, # System-defined name of the stage containing the source operation.
# Unique across the workflow.
&quot;name&quot;: &quot;A String&quot;, # User-provided name of the Read instruction for this source.
&quot;getMetadata&quot;: { # A request to compute the SourceMetadata of a Source. # Information about a request to get metadata about a source.
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source whose metadata should be computed.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
},
&quot;initialReportIndex&quot;: &quot;A String&quot;, # The initial index to use when reporting the status of the WorkItem.
&quot;id&quot;: &quot;A String&quot;, # Identifies this WorkItem.
&quot;packages&quot;: [ # Any required packages that need to be fetched in order to execute
# this WorkItem.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user&#x27;s code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
&quot;name&quot;: &quot;A String&quot;, # The name of the package.
&quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
},
],
&quot;configuration&quot;: &quot;A String&quot;, # Work item-specific configuration as an opaque blob.
&quot;seqMapTask&quot;: { # Describes a particular function to invoke. # Additional information for SeqMapTask WorkItems.
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of the SeqDo operation.
# Unique across the workflow.
&quot;inputs&quot;: [ # Information about each of the inputs.
{ # Information about a side input of a DoFn or an input of a SeqDoFn.
&quot;kind&quot;: { # How to interpret the source element(s) as a side input value.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;sources&quot;: [ # The source(s) to read element(s) from to get the value of this side input.
# If more than one source, then the elements are taken from the
# sources, in the specified order if order matters.
# At least one source is required.
{ # A source that records can be read and decoded from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
],
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will access this side input by;
# this should correspond to the tag of some MultiOutputInfo.
},
],
&quot;userFn&quot;: { # The user function to invoke.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;stageName&quot;: &quot;A String&quot;, # System-defined name of the stage containing the SeqDo operation.
# Unique across the workflow.
&quot;name&quot;: &quot;A String&quot;, # The user-provided name of the SeqDo operation.
&quot;outputInfos&quot;: [ # Information about each of the outputs.
{ # Information about an output of a SeqMapTask.
&quot;tag&quot;: &quot;A String&quot;, # The id of the TupleTag the user code will tag the output value by.
&quot;sink&quot;: { # A sink that records can be encoded and written to. # The sink to write the output value to.
&quot;codec&quot;: { # The codec to use to encode data written to the sink.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;spec&quot;: { # The sink to write to, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
},
},
],
},
&quot;streamingSetupTask&quot;: { # A task which initializes part of a streaming Dataflow job. # Additional information for StreamingSetupTask WorkItems.
&quot;snapshotConfig&quot;: { # Streaming appliance snapshot configuration. # Configures streaming appliance snapshot.
&quot;importStateEndpoint&quot;: &quot;A String&quot;, # Indicates which endpoint is used to import appliance state.
&quot;snapshotId&quot;: &quot;A String&quot;, # If set, indicates the snapshot id for the snapshot being performed.
},
&quot;streamingComputationTopology&quot;: { # Global topology of the streaming Dataflow job, including all # The global topology of the streaming Dataflow job.
# computations and their sharded locations.
&quot;persistentStateVersion&quot;: 42, # Version number for persistent state.
&quot;dataDiskAssignments&quot;: [ # The disks assigned to a streaming Dataflow job.
{ # Data disk assignment for a given VM instance.
&quot;vmInstance&quot;: &quot;A String&quot;, # VM instance name the data disks mounted to, for example
# &quot;myproject-1014-104817-4c2-harness-0&quot;.
&quot;dataDisks&quot;: [ # Mounted data disks. The order is important: a data disk&#x27;s 0-based index in
# this list defines which persistent directory the disk is mounted to, for
# example the list of { &quot;myproject-1014-104817-4c2-harness-0-disk-0&quot; },
# { &quot;myproject-1014-104817-4c2-harness-0-disk-1&quot; }.
&quot;A String&quot;,
],
},
],
&quot;forwardingKeyBits&quot;: 42, # The size (in bits) of keys that will be assigned to source messages.
&quot;computations&quot;: [ # The computations associated with a streaming Dataflow job.
{ # All configuration data for a particular Computation.
&quot;keyRanges&quot;: [ # The key ranges processed by the computation.
{ # Location information for a specific key-range of a sharded computation.
# Currently we only support UTF-8 character splits to simplify encoding into
# JSON.
&quot;deliveryEndpoint&quot;: &quot;A String&quot;, # The physical location of this range assignment to be used for
# streaming computation cross-worker message delivery.
&quot;end&quot;: &quot;A String&quot;, # The end (exclusive) of the key range.
&quot;deprecatedPersistentDirectory&quot;: &quot;A String&quot;, # DEPRECATED. The location of the persistent state for this range, as a
# persistent directory in the worker local filesystem.
&quot;start&quot;: &quot;A String&quot;, # The start (inclusive) of the key range.
&quot;dataDisk&quot;: &quot;A String&quot;, # The name of the data disk where data for this range is stored.
# This name is local to the Google Cloud Platform project and uniquely
# identifies the disk within that project, for example
# &quot;myproject-1014-104817-4c2-harness-0-disk-1&quot;.
},
],
&quot;systemStageName&quot;: &quot;A String&quot;, # The system stage name.
&quot;stateFamilies&quot;: [ # The state family values.
{ # State family configuration.
&quot;stateFamily&quot;: &quot;A String&quot;, # The state family value.
&quot;isRead&quot;: True or False, # If true, this family corresponds to a read operation.
},
],
&quot;outputs&quot;: [ # The outputs from the computation.
{ # Describes a stream of data, either as input to be processed or as
# output of a streaming Dataflow job.
&quot;streamingStageLocation&quot;: { # Identifies the location of a streaming computation stage, for # The stream is part of another computation within the current
# streaming Dataflow job.
# stage-to-stage communication.
&quot;streamId&quot;: &quot;A String&quot;, # Identifies the particular stream within the streaming Dataflow
# job.
},
&quot;pubsubLocation&quot;: { # Identifies a pubsub location to use for transferring data into or # The stream is a pubsub stream.
# out of a streaming Dataflow job.
&quot;topic&quot;: &quot;A String&quot;, # A pubsub topic, in the form of
# &quot;pubsub.googleapis.com/topics/&lt;project-id&gt;/&lt;topic-name&gt;&quot;
&quot;subscription&quot;: &quot;A String&quot;, # A pubsub subscription, in the form of
# &quot;pubsub.googleapis.com/subscriptions/&lt;project-id&gt;/&lt;subscription-name&gt;&quot;
&quot;trackingSubscription&quot;: &quot;A String&quot;, # If set, specifies the pubsub subscription that will be used for tracking
# custom time timestamps for watermark estimation.
&quot;dropLateData&quot;: True or False, # Indicates whether the pipeline allows late-arriving data.
&quot;timestampLabel&quot;: &quot;A String&quot;, # If set, contains a pubsub label from which to extract record timestamps.
# If left empty, record timestamps will be generated upon arrival.
&quot;withAttributes&quot;: True or False, # If true, then the client has requested to get pubsub attributes.
&quot;idLabel&quot;: &quot;A String&quot;, # If set, contains a pubsub label from which to extract record ids.
# If left empty, record deduplication will be strictly best effort.
},
&quot;customSourceLocation&quot;: { # Identifies the location of a custom source. # The stream is a custom source.
&quot;stateful&quot;: True or False, # Whether this source is stateful.
},
&quot;sideInputLocation&quot;: { # Identifies the location of a streaming side input. # The stream is a streaming side input.
&quot;tag&quot;: &quot;A String&quot;, # Identifies the particular side input within the streaming Dataflow job.
&quot;stateFamily&quot;: &quot;A String&quot;, # Identifies the state family where this side input is stored.
},
},
],
&quot;computationId&quot;: &quot;A String&quot;, # The ID of the computation.
&quot;inputs&quot;: [ # The inputs to the computation.
{ # Describes a stream of data, either as input to be processed or as
# output of a streaming Dataflow job.
&quot;streamingStageLocation&quot;: { # Identifies the location of a streaming computation stage, for # The stream is part of another computation within the current
# streaming Dataflow job.
# stage-to-stage communication.
&quot;streamId&quot;: &quot;A String&quot;, # Identifies the particular stream within the streaming Dataflow
# job.
},
&quot;pubsubLocation&quot;: { # Identifies a pubsub location to use for transferring data into or # The stream is a pubsub stream.
# out of a streaming Dataflow job.
&quot;topic&quot;: &quot;A String&quot;, # A pubsub topic, in the form of
# &quot;pubsub.googleapis.com/topics/&lt;project-id&gt;/&lt;topic-name&gt;&quot;
&quot;subscription&quot;: &quot;A String&quot;, # A pubsub subscription, in the form of
# &quot;pubsub.googleapis.com/subscriptions/&lt;project-id&gt;/&lt;subscription-name&gt;&quot;
&quot;trackingSubscription&quot;: &quot;A String&quot;, # If set, specifies the pubsub subscription that will be used for tracking
# custom time timestamps for watermark estimation.
&quot;dropLateData&quot;: True or False, # Indicates whether the pipeline allows late-arriving data.
&quot;timestampLabel&quot;: &quot;A String&quot;, # If set, contains a pubsub label from which to extract record timestamps.
# If left empty, record timestamps will be generated upon arrival.
&quot;withAttributes&quot;: True or False, # If true, then the client has requested to get pubsub attributes.
&quot;idLabel&quot;: &quot;A String&quot;, # If set, contains a pubsub label from which to extract record ids.
# If left empty, record deduplication will be strictly best effort.
},
&quot;customSourceLocation&quot;: { # Identifies the location of a custom source. # The stream is a custom source.
&quot;stateful&quot;: True or False, # Whether this source is stateful.
},
&quot;sideInputLocation&quot;: { # Identifies the location of a streaming side input. # The stream is a streaming side input.
&quot;tag&quot;: &quot;A String&quot;, # Identifies the particular side input within the streaming Dataflow job.
&quot;stateFamily&quot;: &quot;A String&quot;, # Identifies the state family where this side input is stored.
},
},
],
},
],
&quot;userStageToComputationNameMap&quot;: { # Maps user stage names to stable computation names.
&quot;a_key&quot;: &quot;A String&quot;,
},
},
&quot;drain&quot;: True or False, # The user has requested drain.
&quot;workerHarnessPort&quot;: 42, # The TCP port used by the worker to communicate with the Dataflow
# worker harness.
&quot;receiveWorkPort&quot;: 42, # The TCP port on which the worker should listen for messages from
# other streaming computation workers.
},
&quot;leaseExpireTime&quot;: &quot;A String&quot;, # Time when the lease on this Work will expire.
&quot;mapTask&quot;: { # MapTask consists of an ordered set of instructions, each of which # Additional information for MapTask WorkItems.
# describes one particular low-level operation for the worker to
# perform in order to accomplish the MapTask&#x27;s WorkItem.
#
# Each instruction must appear in the list before any instructions which
# depends on its output.
&quot;instructions&quot;: [ # The instructions in the MapTask.
{ # Describes a particular operation comprising a MapTask.
&quot;parDo&quot;: { # An instruction that does a ParDo operation. # Additional information for ParDo instructions.
# Takes one main input and zero or more side inputs, and produces
# zero or more outputs.
# Runs user code.
&quot;numOutputs&quot;: 42, # The number of outputs.
&quot;sideInputs&quot;: [ # Zero or more side inputs.
{ # Information about a side input of a DoFn or an input of a SeqDoFn.
&quot;kind&quot;: { # How to interpret the source element(s) as a side input value.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;sources&quot;: [ # The source(s) to read element(s) from to get the value of this side input.
# If more than one source, then the elements are taken from the
# sources, in the specified order if order matters.
# At least one source is required.
{ # A source that records can be read and decoded from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
],
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will access this side input by;
# this should correspond to the tag of some MultiOutputInfo.
},
],
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # The input.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
&quot;userFn&quot;: { # The user function to invoke.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;multiOutputInfos&quot;: [ # Information about each of the outputs, if user_fn is a MultiDoFn.
{ # Information about an output of a multi-output DoFn.
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will emit to this output by; this
# should correspond to the tag of some SideInputInfo.
},
],
},
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of this operation.
# Unique across the workflow.
&quot;flatten&quot;: { # An instruction that copies its inputs (zero or more) to its (single) output. # Additional information for Flatten instructions.
&quot;inputs&quot;: [ # Describes the inputs to the flatten instruction.
{ # An input of an instruction, as a reference to an output of a
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
],
},
&quot;read&quot;: { # An instruction that reads records. # Additional information for Read instructions.
# Takes no inputs, produces one output.
&quot;source&quot;: { # A source that records can be read and decoded from. # The source to read from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;originalName&quot;: &quot;A String&quot;, # System-defined name for the operation in the original workflow graph.
&quot;partialGroupByKey&quot;: { # An instruction that does a partial group-by-key. # Additional information for PartialGroupByKey instructions.
# One input and one output.
&quot;originalCombineValuesInputStoreName&quot;: &quot;A String&quot;, # If this instruction includes a combining function this is the name of the
# intermediate store between the GBK and the CombineValues.
&quot;valueCombiningFn&quot;: { # The value combining function to invoke.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;inputElementCodec&quot;: { # The codec to use for interpreting an element in the input PTable.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;originalCombineValuesStepName&quot;: &quot;A String&quot;, # If this instruction includes a combining function, this is the name of the
# CombineValues instruction lifted into this instruction.
&quot;sideInputs&quot;: [ # Zero or more side inputs.
{ # Information about a side input of a DoFn or an input of a SeqDoFn.
&quot;kind&quot;: { # How to interpret the source element(s) as a side input value.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;sources&quot;: [ # The source(s) to read element(s) from to get the value of this side input.
# If more than one source, then the elements are taken from the
# sources, in the specified order if order matters.
# At least one source is required.
{ # A source that records can be read and decoded from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
],
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will access this side input by;
# this should correspond to the tag of some MultiOutputInfo.
},
],
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # Describes the input to the partial group-by-key instruction.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
},
&quot;write&quot;: { # An instruction that writes records. # Additional information for Write instructions.
# Takes one input, produces no outputs.
&quot;sink&quot;: { # A sink that records can be encoded and written to. # The sink to write to.
&quot;codec&quot;: { # The codec to use to encode data written to the sink.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;spec&quot;: { # The sink to write to, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
},
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # The input.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
},
&quot;name&quot;: &quot;A String&quot;, # User-provided name of this operation.
&quot;outputs&quot;: [ # Describes the outputs of the instruction.
{ # An output of an instruction.
&quot;codec&quot;: { # The codec to use to encode data being written via this output.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of this output.
# Unique across the workflow.
&quot;originalName&quot;: &quot;A String&quot;, # System-defined name for this output in the original workflow graph.
# Outputs that do not contribute to an original instruction do not set this.
&quot;name&quot;: &quot;A String&quot;, # The user-provided name of this output.
&quot;onlyCountKeyBytes&quot;: True or False, # For system-generated byte and mean byte metrics, certain instructions
# should only report the key size.
&quot;onlyCountValueBytes&quot;: True or False, # For system-generated byte and mean byte metrics, certain instructions
# should only report the value size.
},
],
},
],
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of this MapTask.
# Unique across the workflow.
&quot;counterPrefix&quot;: &quot;A String&quot;, # Counter prefix that can be used to prefix counters. Not currently used in
# Dataflow.
&quot;stageName&quot;: &quot;A String&quot;, # System-defined name of the stage containing this MapTask.
# Unique across the workflow.
},
&quot;streamingConfigTask&quot;: { # A task that carries configuration information for streaming computations. # Additional information for StreamingConfigTask WorkItems.
&quot;streamingComputationConfigs&quot;: [ # Set of computation configuration information.
{ # Configuration information for a single streaming computation.
&quot;stageName&quot;: &quot;A String&quot;, # Stage name of this computation.
&quot;instructions&quot;: [ # Instructions that comprise the computation.
{ # Describes a particular operation comprising a MapTask.
&quot;parDo&quot;: { # An instruction that does a ParDo operation. # Additional information for ParDo instructions.
# Takes one main input and zero or more side inputs, and produces
# zero or more outputs.
# Runs user code.
&quot;numOutputs&quot;: 42, # The number of outputs.
&quot;sideInputs&quot;: [ # Zero or more side inputs.
{ # Information about a side input of a DoFn or an input of a SeqDoFn.
&quot;kind&quot;: { # How to interpret the source element(s) as a side input value.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;sources&quot;: [ # The source(s) to read element(s) from to get the value of this side input.
# If more than one source, then the elements are taken from the
# sources, in the specified order if order matters.
# At least one source is required.
{ # A source that records can be read and decoded from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
],
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will access this side input by;
# this should correspond to the tag of some MultiOutputInfo.
},
],
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # The input.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
&quot;userFn&quot;: { # The user function to invoke.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;multiOutputInfos&quot;: [ # Information about each of the outputs, if user_fn is a MultiDoFn.
{ # Information about an output of a multi-output DoFn.
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will emit to this output by; this
# should correspond to the tag of some SideInputInfo.
},
],
},
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of this operation.
# Unique across the workflow.
&quot;flatten&quot;: { # An instruction that copies its inputs (zero or more) to its (single) output. # Additional information for Flatten instructions.
&quot;inputs&quot;: [ # Describes the inputs to the flatten instruction.
{ # An input of an instruction, as a reference to an output of a
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
],
},
&quot;read&quot;: { # An instruction that reads records. # Additional information for Read instructions.
# Takes no inputs, produces one output.
&quot;source&quot;: { # A source that records can be read and decoded from. # The source to read from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;originalName&quot;: &quot;A String&quot;, # System-defined name for the operation in the original workflow graph.
&quot;partialGroupByKey&quot;: { # An instruction that does a partial group-by-key. # Additional information for PartialGroupByKey instructions.
# One input and one output.
&quot;originalCombineValuesInputStoreName&quot;: &quot;A String&quot;, # If this instruction includes a combining function this is the name of the
# intermediate store between the GBK and the CombineValues.
&quot;valueCombiningFn&quot;: { # The value combining function to invoke.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;inputElementCodec&quot;: { # The codec to use for interpreting an element in the input PTable.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;originalCombineValuesStepName&quot;: &quot;A String&quot;, # If this instruction includes a combining function, this is the name of the
# CombineValues instruction lifted into this instruction.
&quot;sideInputs&quot;: [ # Zero or more side inputs.
{ # Information about a side input of a DoFn or an input of a SeqDoFn.
&quot;kind&quot;: { # How to interpret the source element(s) as a side input value.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;sources&quot;: [ # The source(s) to read element(s) from to get the value of this side input.
# If more than one source, then the elements are taken from the
# sources, in the specified order if order matters.
# At least one source is required.
{ # A source that records can be read and decoded from.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
],
&quot;tag&quot;: &quot;A String&quot;, # The id of the tag the user code will access this side input by;
# this should correspond to the tag of some MultiOutputInfo.
},
],
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # Describes the input to the partial group-by-key instruction.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
},
&quot;write&quot;: { # An instruction that writes records. # Additional information for Write instructions.
# Takes one input, produces no outputs.
&quot;sink&quot;: { # A sink that records can be encoded and written to. # The sink to write to.
&quot;codec&quot;: { # The codec to use to encode data written to the sink.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;spec&quot;: { # The sink to write to, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
},
&quot;input&quot;: { # An input of an instruction, as a reference to an output of a # The input.
# producer instruction.
&quot;outputNum&quot;: 42, # The output index (origin zero) within the producer.
&quot;producerInstructionIndex&quot;: 42, # The index (origin zero) of the parallel instruction that produces
# the output to be consumed by this input. This index is relative
# to the list of instructions in this input&#x27;s instruction&#x27;s
# containing MapTask.
},
},
&quot;name&quot;: &quot;A String&quot;, # User-provided name of this operation.
&quot;outputs&quot;: [ # Describes the outputs of the instruction.
{ # An output of an instruction.
&quot;codec&quot;: { # The codec to use to encode data being written via this output.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of this output.
# Unique across the workflow.
&quot;originalName&quot;: &quot;A String&quot;, # System-defined name for this output in the original workflow graph.
# Outputs that do not contribute to an original instruction do not set this.
&quot;name&quot;: &quot;A String&quot;, # The user-provided name of this output.
&quot;onlyCountKeyBytes&quot;: True or False, # For system-generated byte and mean byte metrics, certain instructions
# should only report the key size.
&quot;onlyCountValueBytes&quot;: True or False, # For system-generated byte and mean byte metrics, certain instructions
# should only report the value size.
},
],
},
],
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name for this computation.
&quot;transformUserNameToStateFamily&quot;: { # Map from user name of stateful transforms in this stage to their state
# family.
&quot;a_key&quot;: &quot;A String&quot;,
},
&quot;computationId&quot;: &quot;A String&quot;, # Unique identifier for this computation.
},
],
&quot;userStepToStateFamilyNameMap&quot;: { # Map from user step names to state families.
&quot;a_key&quot;: &quot;A String&quot;,
},
&quot;maxWorkItemCommitBytes&quot;: &quot;A String&quot;, # Maximum size for a work item commit supported by the windmill storage layer.
&quot;windmillServicePort&quot;: &quot;A String&quot;, # If present, the worker must use this port to communicate with Windmill
# Service dispatchers. Only applicable when windmill_service_endpoint is
# specified.
&quot;getDataStreamChunkSizeBytes&quot;: &quot;A String&quot;, # Chunk size for GetData streams from the harness to windmill.
&quot;commitStreamChunkSizeBytes&quot;: &quot;A String&quot;, # Chunk size for commit streams from the harness to windmill.
&quot;windmillServiceEndpoint&quot;: &quot;A String&quot;, # If present, the worker must use this endpoint to communicate with Windmill
# Service dispatchers, otherwise the worker must continue to use whatever
# endpoint it had been using.
},
&quot;jobId&quot;: &quot;A String&quot;, # Identifies the workflow job this WorkItem belongs to.
&quot;reportStatusInterval&quot;: &quot;A String&quot;, # Recommended reporting interval.
&quot;streamingComputationTask&quot;: { # A task which describes what action should be performed for the specified # Additional information for StreamingComputationTask WorkItems.
# streaming computation ranges.
&quot;computationRanges&quot;: [ # Contains ranges of a streaming computation this task should apply to.
{ # Describes full or partial data disk assignment information of the computation
# ranges.
&quot;rangeAssignments&quot;: [ # Data disk assignments for ranges from this computation.
{ # Data disk assignment information for a specific key-range of a sharded
# computation.
# Currently we only support UTF-8 character splits to simplify encoding into
# JSON.
&quot;start&quot;: &quot;A String&quot;, # The start (inclusive) of the key range.
&quot;dataDisk&quot;: &quot;A String&quot;, # The name of the data disk where data for this range is stored.
# This name is local to the Google Cloud Platform project and uniquely
# identifies the disk within that project, for example
# &quot;myproject-1014-104817-4c2-harness-0-disk-1&quot;.
&quot;end&quot;: &quot;A String&quot;, # The end (exclusive) of the key range.
},
],
&quot;computationId&quot;: &quot;A String&quot;, # The ID of the computation.
},
],
&quot;dataDisks&quot;: [ # Describes the set of data disks this task should apply to.
{ # Describes mounted data disk.
&quot;dataDisk&quot;: &quot;A String&quot;, # The name of the data disk.
# This name is local to the Google Cloud Platform project and uniquely
# identifies the disk within that project, for example
# &quot;myproject-1014-104817-4c2-harness-0-disk-1&quot;.
},
],
&quot;taskType&quot;: &quot;A String&quot;, # A type of streaming computation task.
},
},
],
&quot;unifiedWorkerResponse&quot;: { # Untranslated bag-of-bytes WorkResponse for UnifiedWorker.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
}</pre>
</div>
<div class="method">
<code class="details" id="reportStatus">reportStatus(projectId, jobId, body=None, x__xgafv=None)</code>
<pre>Reports the status of dataflow WorkItems leased by a worker.
Args:
projectId: string, The project which owns the WorkItem&#x27;s job. (required)
jobId: string, The job which the WorkItem is part of. (required)
body: object, The request body.
The object takes the form of:
{ # Request to report the status of WorkItems.
&quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker reporting the WorkItem status. If this
# does not match the ID of the worker which the Dataflow service
# believes currently has the lease on the WorkItem, the report
# will be dropped (with an error response).
&quot;unifiedWorkerRequest&quot;: { # Untranslated bag-of-bytes WorkProgressUpdateRequest from UnifiedWorker.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
&quot;currentWorkerTime&quot;: &quot;A String&quot;, # The current timestamp at the worker.
&quot;workItemStatuses&quot;: [ # The order is unimportant, except that the order of the
# WorkItemServiceState messages in the ReportWorkItemStatusResponse
# corresponds to the order of WorkItemStatus messages here.
{ # Conveys a worker&#x27;s progress through the work described by a WorkItem.
&quot;completed&quot;: True or False, # True if the WorkItem was completed (successfully or unsuccessfully).
&quot;requestedLeaseDuration&quot;: &quot;A String&quot;, # Amount of time the worker requests for its lease.
&quot;reportedProgress&quot;: { # A progress measurement of a WorkItem by a worker. # The worker&#x27;s progress through this WorkItem.
&quot;remainingParallelism&quot;: { # Represents the level of parallelism in a WorkItem&#x27;s input, # Total amount of parallelism in the input of this task that remains,
# (i.e. can be delegated to this task and any new tasks via dynamic
# splitting). Always at least 1 for non-finished work items and 0 for
# finished.
#
# &quot;Amount of parallelism&quot; refers to how many non-empty parts of the input
# can be read in parallel. This does not necessarily equal number
# of records. An input that can be read in parallel down to the
# individual records is called &quot;perfectly splittable&quot;.
# An example of non-perfectly parallelizable input is a block-compressed
# file format where a block of records has to be read as a whole,
# but different blocks can be read in parallel.
#
# Examples:
# * If we are processing record #30 (starting at 1) out of 50 in a perfectly
# splittable 50-record input, this value should be 21 (20 remaining + 1
# current).
# * If we are reading through block 3 in a block-compressed file consisting
# of 5 blocks, this value should be 3 (since blocks 4 and 5 can be
# processed in parallel by new tasks via dynamic splitting and the current
# task remains processing block 3).
# * If we are reading through the last block in a block-compressed file,
# or reading or processing the last record in a perfectly splittable
# input, this value should be 1, because apart from the current task, no
# additional remainder can be split off.
# reported by the worker.
&quot;isInfinite&quot;: True or False, # Specifies whether the parallelism is infinite. If true, &quot;value&quot; is
# ignored.
# Infinite parallelism means the service will assume that the work item
# can always be split into more non-empty work items by dynamic splitting.
# This is a work-around for lack of support for infinity by the current
# JSON-based Java RPC stack.
&quot;value&quot;: 3.14, # Specifies the level of parallelism in case it is finite.
},
&quot;position&quot;: { # Position defines a position within a collection of data. The value # A Position within the work to represent a progress.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
&quot;consumedParallelism&quot;: { # Represents the level of parallelism in a WorkItem&#x27;s input, # Total amount of parallelism in the portion of input of this task that has
# already been consumed and is no longer active. In the first two examples
# above (see remaining_parallelism), the value should be 29 or 2
# respectively. The sum of remaining_parallelism and consumed_parallelism
# should equal the total amount of parallelism in this work item. If
# specified, must be finite.
# reported by the worker.
&quot;isInfinite&quot;: True or False, # Specifies whether the parallelism is infinite. If true, &quot;value&quot; is
# ignored.
# Infinite parallelism means the service will assume that the work item
# can always be split into more non-empty work items by dynamic splitting.
# This is a work-around for lack of support for infinity by the current
# JSON-based Java RPC stack.
&quot;value&quot;: 3.14, # Specifies the level of parallelism in case it is finite.
},
&quot;fractionConsumed&quot;: 3.14, # Completion as fraction of the input consumed, from 0.0 (beginning, nothing
# consumed), to 1.0 (end of the input, entire input consumed).
},
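        # E.g. (a sketch, not part of the schema): while processing record #30 of the
        # perfectly splittable 50-record input described above, a worker might report
        #   &quot;remainingParallelism&quot;: {&quot;isInfinite&quot;: False, &quot;value&quot;: 21.0},
        #   &quot;consumedParallelism&quot;: {&quot;isInfinite&quot;: False, &quot;value&quot;: 29.0},
        #   &quot;fractionConsumed&quot;: 0.58,
        # so that consumed (29) plus remaining (21) equals the total parallelism (50).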
&quot;metricUpdates&quot;: [ # DEPRECATED in favor of counter_updates.
{ # Describes the state of a metric.
&quot;distribution&quot;: &quot;&quot;, # A struct value describing properties of a distribution of numeric values.
&quot;kind&quot;: &quot;A String&quot;, # Metric aggregation kind. The possible metric aggregation kinds are
# &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;, &quot;Mean&quot;, &quot;Set&quot;, &quot;And&quot;, &quot;Or&quot;, and &quot;Distribution&quot;.
# The specified aggregation kind is case-insensitive.
#
# If omitted, this is not an aggregated value but instead
# a single metric sample value.
&quot;gauge&quot;: &quot;&quot;, # A struct value describing properties of a Gauge.
# Metrics of gauge type show the value of a metric across time, and are
# aggregated based on the newest value.
&quot;updateTime&quot;: &quot;A String&quot;, # Timestamp associated with the metric value. Optional when workers are
# reporting work progress; it will be filled in responses from the
# metrics API.
&quot;scalar&quot;: &quot;&quot;, # Worker-computed aggregate value for aggregation kinds &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;,
# &quot;And&quot;, and &quot;Or&quot;. The possible value types are Long, Double, and Boolean.
&quot;cumulative&quot;: True or False, # True if this metric is reported as the total cumulative aggregate
# value accumulated since the worker started working on this WorkItem.
# By default this is false, indicating that this metric is reported
# as a delta that is not associated with any WorkItem.
&quot;name&quot;: { # Identifies a metric, by describing the source which generated the # Name of the metric.
# metric.
&quot;context&quot;: { # Zero or more labeled fields which identify the part of the job this
# metric is associated with, such as the name of a step or collection.
#
# For example, built-in counters associated with steps will have
# context[&#x27;step&#x27;] = &lt;step-name&gt;. Counters associated with PCollections
# in the SDK will have context[&#x27;pcollection&#x27;] = &lt;pcollection-name&gt;.
&quot;a_key&quot;: &quot;A String&quot;,
},
&quot;name&quot;: &quot;A String&quot;, # Worker-defined metric name.
&quot;origin&quot;: &quot;A String&quot;, # Origin (namespace) of metric name. May be blank for user-defined metrics;
# will be &quot;dataflow&quot; for metrics defined by the Dataflow service or SDK.
},
&quot;meanCount&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
# This holds the count of the aggregated values and is used in combination
# with mean_sum above to obtain the actual mean aggregate value.
# The only possible value type is Long.
&quot;meanSum&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
# This holds the sum of the aggregated values and is used in combination
# with mean_count below to obtain the actual mean aggregate value.
# The only possible value types are Long and Double.
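            # E.g. (a sketch, not part of the schema): for the aggregated values 2, 4
            # and 9, mean_sum is 15, mean_count is 3, and the resulting mean aggregate
            # value is 5.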
&quot;set&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Set&quot; aggregation kind. The only
# possible value type is a list of Values whose type can be Long, Double,
# or String, according to the metric&#x27;s type. All Values in the list must
# be of the same type.
&quot;internal&quot;: &quot;&quot;, # Worker-computed aggregate value for internal use by the Dataflow
# service.
},
],
&quot;errors&quot;: [ # Specifies errors which occurred during processing. If errors are
# provided, and completed = true, then the WorkItem is considered
# to have failed.
{ # The `Status` type defines a logical error model that is suitable for
# different programming environments, including REST APIs and RPC APIs. It is
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
# three pieces of data: error code, error message, and error details.
#
# You can find out more about this error model and how to work with it in the
# [API Design Guide](https://cloud.google.com/apis/design/errors).
&quot;details&quot;: [ # A list of messages that carry the error details. There is a common set of
# message types for APIs to use.
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
],
&quot;code&quot;: 42, # The status code, which should be an enum value of google.rpc.Code.
&quot;message&quot;: &quot;A String&quot;, # A developer-facing error message, which should be in English. Any
# user-facing error message should be localized and sent in the
# google.rpc.Status.details field, or localized by the client.
},
],
&quot;stopPosition&quot;: { # Position defines a position within a collection of data. The value # A worker may split an active map task in two parts, &quot;primary&quot; and
# &quot;residual&quot;, continuing to process the primary part and returning the
# residual part into the pool of available work.
# This event is called a &quot;dynamic split&quot; and is critical to the dynamic
# work rebalancing feature. The two obtained sub-tasks are called
# &quot;parts&quot; of the split.
# The parts, if concatenated, must represent the same input as would
# be read by the current task if the split did not happen.
# The exact way in which the original task is decomposed into the two
# parts is specified either as a position demarcating them
# (stop_position), or explicitly as two DerivedSources, if this
# task consumes a user-defined source type (dynamic_source_split).
#
# The &quot;current&quot; task is adjusted as a result of the split: after a task
# with range [A, B) sends a stop_position update at C, its range is
# considered to be [A, C), e.g.:
# * Progress should be interpreted relative to the new range, e.g.
# &quot;75% completed&quot; means &quot;75% of [A, C) completed&quot;
# * The worker should interpret proposed_stop_position relative to the
# new range, e.g. &quot;split at 68%&quot; should be interpreted as
# &quot;split at 68% of [A, C)&quot;.
# * If the worker chooses to split again using stop_position, only
# stop_positions in [A, C) will be accepted.
# * Etc.
# dynamic_source_split has similar semantics: e.g., if a task with
# source S splits using dynamic_source_split into {P, R}
# (where P and R must be together equivalent to S), then subsequent
# progress and proposed_stop_position should be interpreted relative
# to P, and in a potential subsequent dynamic_source_split into {P&#x27;, R&#x27;},
# P&#x27; and R&#x27; must be together equivalent to P, etc.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
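        # E.g. (a sketch, not part of the schema): a task reading the byte range
        # [0, 1000) that splits at byte offset 600 keeps [0, 600) as its primary part
        # and reports
        #   &quot;stopPosition&quot;: {&quot;byteOffset&quot;: &quot;600&quot;}
        # From then on, progress such as &quot;75% completed&quot; refers to [0, 600), and any
        # further stop_positions must lie within [0, 600).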
&quot;progress&quot;: { # Obsolete in favor of ApproximateReportedProgress and ApproximateSplitRequest. # DEPRECATED in favor of reported_progress.
&quot;position&quot;: { # Position defines a position within a collection of data. The value # Obsolete.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
&quot;percentComplete&quot;: 3.14, # Obsolete.
&quot;remainingTime&quot;: &quot;A String&quot;, # Obsolete.
},
&quot;sourceFork&quot;: { # DEPRECATED in favor of DynamicSourceSplit. # DEPRECATED in favor of dynamic_source_split.
&quot;residual&quot;: { # DEPRECATED in favor of DerivedSource. # DEPRECATED
&quot;derivationMode&quot;: &quot;A String&quot;, # DEPRECATED
&quot;source&quot;: { # A source that records can be read and decoded from. # DEPRECATED
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;residualSource&quot;: { # Specification of one of the bundles produced as a result of splitting # DEPRECATED
# a Source (e.g. when executing a SourceSplitRequest, or when
# splitting an active task using WorkItemStatus.dynamic_source_split),
# relative to the source being split.
&quot;derivationMode&quot;: &quot;A String&quot;, # What source to base the produced source on (if any).
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;primary&quot;: { # DEPRECATED in favor of DerivedSource. # DEPRECATED
&quot;derivationMode&quot;: &quot;A String&quot;, # DEPRECATED
&quot;source&quot;: { # A source that records can be read and decoded from. # DEPRECATED
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;primarySource&quot;: { # Specification of one of the bundles produced as a result of splitting # DEPRECATED
# a Source (e.g. when executing a SourceSplitRequest, or when
# splitting an active task using WorkItemStatus.dynamic_source_split),
# relative to the source being split.
&quot;derivationMode&quot;: &quot;A String&quot;, # What source to base the produced source on (if any).
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
},
&quot;counterUpdates&quot;: [ # Worker output counters for this WorkItem.
{ # An update to a Counter sent from a worker.
&quot;floatingPointList&quot;: { # A metric value representing a list of floating point numbers. # List of floating point numbers, for Set.
&quot;elements&quot;: [ # Elements of the list.
3.14,
],
},
&quot;structuredNameAndMetadata&quot;: { # A single message which encapsulates structured name and metadata for a given # Counter structured name and metadata.
# counter.
&quot;metadata&quot;: { # CounterMetadata includes all static non-name non-value counter attributes. # Metadata associated with a counter
&quot;description&quot;: &quot;A String&quot;, # Human-readable description of the counter semantics.
&quot;otherUnits&quot;: &quot;A String&quot;, # A string referring to the unit type.
&quot;standardUnits&quot;: &quot;A String&quot;, # System-defined units; see the enum above.
&quot;kind&quot;: &quot;A String&quot;, # Counter aggregation kind.
},
&quot;name&quot;: { # Identifies a counter within a per-job namespace. Counters whose structured # Structured name of the counter.
# names are the same get merged into a single value for the job.
&quot;executionStepName&quot;: &quot;A String&quot;, # Name of the stage. An execution step contains multiple component steps.
&quot;workerId&quot;: &quot;A String&quot;, # ID of a particular worker.
&quot;portion&quot;: &quot;A String&quot;, # Portion of this counter, either key or value.
&quot;name&quot;: &quot;A String&quot;, # Counter name. Not necessarily globally-unique, but unique within the
# context of the other fields.
# Required.
&quot;componentStepName&quot;: &quot;A String&quot;, # Name of the optimized step being executed by the workers.
&quot;originNamespace&quot;: &quot;A String&quot;, # A string containing a more specific namespace of the counter&#x27;s origin.
&quot;originalRequestingStepName&quot;: &quot;A String&quot;, # The step name requesting an operation, such as GBK.
# I.e. the ParDo causing a read/write from shuffle to occur, or a
# read from side inputs.
&quot;inputIndex&quot;: 42, # Index of an input collection that&#x27;s being read from/written to as a side
# input.
# The index identifies a step&#x27;s side inputs starting from 1 (e.g. the first
# side input has input_index 1, the third has input_index 3).
# Side inputs are identified by a pair of (original_step_name, input_index).
# This field helps uniquely identify them.
&quot;origin&quot;: &quot;A String&quot;, # One of the standard Origins defined above.
&quot;originalStepName&quot;: &quot;A String&quot;, # System generated name of the original step in the user&#x27;s graph, before
# optimization.
},
},
&quot;internal&quot;: &quot;&quot;, # Value for internally-defined counters used by the Dataflow service.
&quot;integerMean&quot;: { # A representation of an integer mean metric contribution. # Integer mean aggregation value for Mean.
&quot;count&quot;: { # A representation of an int64, n, that is immune to precision loss when # The number of values being aggregated.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
&quot;sum&quot;: { # A representation of an int64, n, that is immune to precision loss when # The sum of all values being aggregated.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
},
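            # A minimal Python sketch (not part of the schema) of the highBits/lowBits
            # encoding used for int64 values throughout this API, assuming
            # arbitrary-precision ints:
            #   def split_int64(n):
            #       return {&quot;highBits&quot;: n &gt;&gt; 32, &quot;lowBits&quot;: n &amp; 0xffffffff}
            #   def join_int64(s):
            #       return (s[&quot;highBits&quot;] &lt;&lt; 32) | s[&quot;lowBits&quot;]
            #   join_int64(split_int64(-5)) == -5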
&quot;nameAndKind&quot;: { # Basic metadata about a counter. # Counter name and aggregation type.
&quot;kind&quot;: &quot;A String&quot;, # Counter aggregation kind.
&quot;name&quot;: &quot;A String&quot;, # Name of the counter.
},
&quot;integerList&quot;: { # A metric value representing a list of integers. # List of integers, for Set.
&quot;elements&quot;: [ # Elements of the list.
{ # A representation of an int64, n, that is immune to precision loss when
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
],
},
&quot;floatingPointMean&quot;: { # A representation of a floating point mean metric contribution. # Floating point mean aggregation value for Mean.
&quot;sum&quot;: 3.14, # The sum of all values being aggregated.
&quot;count&quot;: { # A representation of an int64, n, that is immune to precision loss when # The number of values being aggregated.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
},
&quot;shortId&quot;: &quot;A String&quot;, # The service-generated short identifier for this counter.
# The short_id -&gt; (name, metadata) mapping is constant for the lifetime of
# a job.
&quot;cumulative&quot;: True or False, # True if this counter is reported as the total cumulative aggregate
# value accumulated since the worker started working on this WorkItem.
# By default this is false, indicating that this counter is reported
# as a delta.
&quot;integerGauge&quot;: { # A metric value representing temporal values of a variable. # Gauge data
&quot;timestamp&quot;: &quot;A String&quot;, # The time at which this value was measured. Measured as msecs from epoch.
&quot;value&quot;: { # A representation of an int64, n, that is immune to precision loss when # The value of the variable represented by this gauge.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
},
&quot;stringList&quot;: { # A metric value representing a list of strings. # List of strings, for Set.
&quot;elements&quot;: [ # Elements of the list.
&quot;A String&quot;,
],
},
&quot;boolean&quot;: True or False, # Boolean value for And, Or.
&quot;floatingPoint&quot;: 3.14, # Floating point value for Sum, Max, Min.
&quot;distribution&quot;: { # A metric value representing a distribution. # Distribution data
&quot;sumOfSquares&quot;: 3.14, # Use a double since the sum of squares is likely to overflow int64.
&quot;max&quot;: { # A representation of an int64, n, that is immune to precision loss when # The maximum value present in the distribution.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
&quot;sum&quot;: { # A representation of an int64, n, that is immune to precision loss when # Use an int64 since we&#x27;d prefer the added precision. If overflow is a common
# problem we can detect it and use an additional int64 or a double.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
&quot;histogram&quot;: { # Histogram of value counts for a distribution. # (Optional) Histogram of value counts for the distribution.
#
# Buckets have an inclusive lower bound and exclusive upper bound and use
# &quot;1,2,5 bucketing&quot;: The first bucket range is from [0,1) and all subsequent
# bucket boundaries are powers of ten multiplied by 1, 2, or 5. Thus, bucket
# boundaries are 0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, ...
# Negative values are not supported.
&quot;firstBucketOffset&quot;: 42, # Starting index of first stored bucket. The non-inclusive upper-bound of
# the ith bucket is given by:
# pow(10,(i-first_bucket_offset)/3) * (1,2,5)[(i-first_bucket_offset)%3]
&quot;bucketCounts&quot;: [ # Counts of values in each bucket. For efficiency, prefix and trailing
# buckets with count = 0 are elided. Buckets can store the full range of
# values of an unsigned long, with ULLONG_MAX falling into the 59th bucket
# with range [1e19, 2e19).
&quot;A String&quot;,
],
},
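            # A minimal sketch (not part of the schema, assuming firstBucketOffset = 0)
            # of the &quot;1,2,5&quot; bucket boundaries described above:
            #   def bucket_upper_bound(i):
            #       return 10 ** (i // 3) * (1, 2, 5)[i % 3]
            #   # bucket 0 covers [0, 1), bucket 1 covers [1, 2), bucket 2 covers
            #   # [2, 5), bucket 3 covers [5, 10), bucket 4 covers [10, 20), ...
            # A nonzero firstBucketOffset shifts which bucket is stored first in
            # bucketCounts.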
&quot;count&quot;: { # A representation of an int64, n, that is immune to precision loss when # The count of the number of elements present in the distribution.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
&quot;min&quot;: { # A representation of an int64, n, that is immune to precision loss when # The minimum value present in the distribution.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
},
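          # E.g. (a sketch, not part of the schema): for the values 1, 2 and 5, the
          # distribution carries count = 3, sum = 8, min = 1, max = 5 and
          # sumOfSquares = 1 + 4 + 25 = 30, with each integer encoded as a
          # highBits/lowBits pair as above.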
&quot;integer&quot;: { # A representation of an int64, n, that is immune to precision loss when # Integer value for Sum, Max, Min.
# encoded in JSON.
&quot;highBits&quot;: 42, # The high order bits, including the sign: n &gt;&gt; 32.
&quot;lowBits&quot;: 42, # The low order bits: n &amp; 0xffffffff.
},
},
],
&quot;totalThrottlerWaitTimeSeconds&quot;: 3.14, # Total time the worker spent being throttled by external systems.
&quot;workItemId&quot;: &quot;A String&quot;, # Identifies the WorkItem.
&quot;dynamicSourceSplit&quot;: { # When a task splits using WorkItemStatus.dynamic_source_split, this # See documentation of stop_position.
# message describes the two parts of the split relative to the
# description of the current task&#x27;s input.
&quot;residual&quot;: { # Specification of one of the bundles produced as a result of splitting # Residual part (returned to the pool of work).
# Specified relative to the previously-current source.
# a Source (e.g. when executing a SourceSplitRequest, or when
# splitting an active task using WorkItemStatus.dynamic_source_split),
# relative to the source being split.
&quot;derivationMode&quot;: &quot;A String&quot;, # What source to base the produced source on (if any).
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
&quot;primary&quot;: { # Specification of one of the bundles produced as a result of splitting # Primary part (continued to be processed by worker).
# Specified relative to the previously-current source.
# Becomes current.
# a Source (e.g. when executing a SourceSplitRequest, or when
# splitting an active task using WorkItemStatus.dynamic_source_split),
# relative to the source being split.
&quot;derivationMode&quot;: &quot;A String&quot;, # What source to base the produced source on (if any).
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
},
&quot;sourceOperationResponse&quot;: { # The result of a SourceOperationRequest, specified in # If the work item represented a SourceOperationRequest, and the work
# is completed, contains the result of the operation.
# ReportWorkItemStatusRequest.source_operation when the work item
# is completed.
&quot;split&quot;: { # The response to a SourceSplitRequest. # A response to a request to split a source.
&quot;outcome&quot;: &quot;A String&quot;, # Indicates whether splitting happened and produced a list of bundles.
# If this is USE_CURRENT_SOURCE_AS_IS, the current source should
# be processed &quot;as is&quot; without splitting. &quot;bundles&quot; is ignored in this case.
# If this is SPLITTING_HAPPENED, then &quot;bundles&quot; contains a list of
# bundles into which the source was split.
&quot;shards&quot;: [ # DEPRECATED in favor of bundles.
{ # DEPRECATED in favor of DerivedSource.
&quot;derivationMode&quot;: &quot;A String&quot;, # DEPRECATED
&quot;source&quot;: { # A source that records can be read and decoded from. # DEPRECATED
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
],
&quot;bundles&quot;: [ # If outcome is SPLITTING_HAPPENED, then this is a list of bundles
# into which the source was split. Otherwise this field is ignored.
# This list can be empty, which means the source represents an empty input.
{ # Specification of one of the bundles produced as a result of splitting
# a Source (e.g. when executing a SourceSplitRequest, or when
# splitting an active task using WorkItemStatus.dynamic_source_split),
# relative to the source being split.
&quot;derivationMode&quot;: &quot;A String&quot;, # What source to base the produced source on (if any).
&quot;source&quot;: { # A source that records can be read and decoded from. # Specification of the source.
&quot;spec&quot;: { # The source to read from, plus its parameters.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;codec&quot;: { # The codec to use to decode data read from the source.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # Optionally, metadata for this source can be supplied right away,
# avoiding a SourceGetMetadataOperation roundtrip
# (see SourceOperationRequest).
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
&quot;doesNotNeedSplitting&quot;: True or False, # Setting this value to true hints to the framework that the source
# doesn&#x27;t need splitting, and using SourceSplitRequest on it would
# yield SOURCE_SPLIT_OUTCOME_USE_CURRENT.
#
# E.g. a file splitter may set this to true when splitting a single file
# into a set of byte ranges of appropriate size, and set this
# to false when splitting a filepattern into individual files.
# However, for efficiency, a file splitter may decide to produce
# file subranges directly from the filepattern to avoid a splitting
# round-trip.
#
# See SourceSplitRequest for an overview of the splitting process.
#
# This field is meaningful only in the Source objects populated
# by the user (e.g. when filling in a DerivedSource).
# Source objects supplied by the framework to the user don&#x27;t have
# this field populated.
&quot;baseSpecs&quot;: [ # While splitting, sources may specify the produced bundles
# as differences against another source, in order to save backend-side
# memory and allow bigger jobs. For details, see SourceSplitRequest.
# To support this use case, the full set of parameters of the source
# is logically obtained by taking the latest explicitly specified value
# of each parameter in the order:
# base_specs (later items win), spec (overrides anything in base_specs).
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
],
},
},
],
},
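        # E.g. (a sketch, not part of the schema): a split that produced two bundles
        # would be reported as
        #   {&quot;outcome&quot;: &quot;SPLITTING_HAPPENED&quot;, &quot;bundles&quot;: [first_bundle, second_bundle]}
        # while {&quot;outcome&quot;: &quot;USE_CURRENT_SOURCE_AS_IS&quot;} asks the service to process the
        # current source without splitting (&quot;bundles&quot; is then ignored).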
&quot;getMetadata&quot;: { # The result of a SourceGetMetadataOperation. # A response to a request to get metadata about a source.
&quot;metadata&quot;: { # Metadata about a Source useful for automatically optimizing # The computed metadata.
# and tuning the pipeline, etc.
&quot;estimatedSizeBytes&quot;: &quot;A String&quot;, # An estimate of the total size (in bytes) of the data that would be
# read from this source. This estimate is in terms of external storage
# size, before any decompression or other processing done by the reader.
&quot;producesSortedKeys&quot;: True or False, # Whether this source is known to produce key/value pairs with
# the (encoded) keys in lexicographically sorted order.
&quot;infinite&quot;: True or False, # Specifies that the size of this source is known to be infinite
# (this is a streaming source).
},
},
},
&quot;reportIndex&quot;: &quot;A String&quot;, # The report index. When a WorkItem is leased, the lease will
# contain an initial report index. When a WorkItem&#x27;s status is
# reported to the system, the report should be sent with
# that report index, and the response will contain the index the
# worker should use for the next report. Reports received with
# unexpected index values will be rejected by the service.
#
# In order to preserve idempotency, the worker should not alter the
# contents of a report, even if the worker must submit the same
# report multiple times before getting back a response. The worker
# should not submit a subsequent report until the response for the
# previous report has been received from the service.
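# (A minimal report-loop sketch appears after this method&#x27;s schema, below.)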
},
],
&quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains the WorkItem&#x27;s job.
}
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # Response from a request to report the status of WorkItems.
&quot;unifiedWorkerResponse&quot;: { # Untranslated bag-of-bytes WorkProgressUpdateResponse for UnifiedWorker.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
&quot;workItemServiceStates&quot;: [ # A set of messages indicating the service-side state for each
# WorkItem whose status was reported, in the same order as the
# WorkItemStatus messages in the ReportWorkItemStatusRequest which
# resulted in this response.
{ # The Dataflow service&#x27;s idea of the current state of a WorkItem
# being processed by a worker.
&quot;harnessData&quot;: { # Other data returned by the service, specific to the particular
# worker harness.
&quot;a_key&quot;: &quot;&quot;, # Properties of the object.
},
&quot;completeWorkStatus&quot;: { # The `Status` type defines a logical error model that is suitable for # If set, a request to complete the work item with the given status. This
# will not be set to OK, unless supported by the specific kind of WorkItem.
# It can be used by the backend to indicate that a WorkItem must terminate, e.g.,
# for aborting work.
# different programming environments, including REST APIs and RPC APIs. It is
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
# three pieces of data: error code, error message, and error details.
#
# You can find out more about this error model and how to work with it in the
# [API Design Guide](https://cloud.google.com/apis/design/errors).
&quot;details&quot;: [ # A list of messages that carry the error details. There is a common set of
# message types for APIs to use.
{
&quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
},
],
&quot;code&quot;: 42, # The status code, which should be an enum value of google.rpc.Code.
&quot;message&quot;: &quot;A String&quot;, # A developer-facing error message, which should be in English. Any
# user-facing error message should be localized and sent in the
# google.rpc.Status.details field, or localized by the client.
},
&quot;suggestedStopPoint&quot;: { # Obsolete in favor of ApproximateReportedProgress and ApproximateSplitRequest. # DEPRECATED in favor of split_request.
&quot;position&quot;: { # Position defines a position within a collection of data. The value # Obsolete.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
&quot;percentComplete&quot;: 3.14, # Obsolete.
&quot;remainingTime&quot;: &quot;A String&quot;, # Obsolete.
},
&quot;nextReportIndex&quot;: &quot;A String&quot;, # The index value to use for the next report sent by the worker.
# Note: If the report call fails for whatever reason, the worker should
# reuse this index for subsequent report attempts.
&quot;suggestedStopPosition&quot;: { # Position defines a position within a collection of data. The value # Obsolete, always empty.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
&quot;hotKeyDetection&quot;: { # Proto describing a hot key detected on a given WorkItem. # A hot key is a symptom of poor data distribution in which there are enough
# elements mapped to a single key to impact pipeline performance. When
# present, this field includes metadata associated with any hot key.
&quot;userStepName&quot;: &quot;A String&quot;, # User-provided name of the step that contains this hot key.
&quot;hotKeyAge&quot;: &quot;A String&quot;, # The age of the hot key measured from when it was first detected.
&quot;systemName&quot;: &quot;A String&quot;, # System-defined name of the step containing this hot key.
# Unique across the workflow.
},
&quot;metricShortId&quot;: [ # The short ids that workers should use in subsequent metric updates.
# Workers should strive to use short ids whenever possible, but it is ok
# to request the short_id again if a worker lost track of it
# (e.g. if the worker is recovering from a crash).
# NOTE: it is possible that the response may have short ids for a subset
# of the metrics.
{ # The metric short id is returned to the user alongside an offset into
# ReportWorkItemStatusRequest
&quot;shortId&quot;: &quot;A String&quot;, # The service-generated short identifier for the metric.
&quot;metricIndex&quot;: 42, # The index of the corresponding metric in
# the ReportWorkItemStatusRequest. Required.
},
],
&quot;splitRequest&quot;: { # A suggestion by the service to the worker to dynamically split the WorkItem. # The progress point in the WorkItem where the Dataflow service
# suggests that the worker truncate the task.
&quot;fractionConsumed&quot;: 3.14, # A fraction at which to split the work item, from 0.0 (beginning of the
# input) to 1.0 (end of the input).
&quot;fractionOfRemainder&quot;: 3.14, # The fraction of the remainder of work to split the work item at, from 0.0
# (split at the current position) to 1.0 (end of the input).
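# (A sketch converting this fraction to an absolute input fraction appears
# after this method&#x27;s schema, below.)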
&quot;position&quot;: { # Position defines a position within a collection of data. The value # A Position at which to split the work item.
# can be either the end position, a key (used with ordered
# collections), a byte offset, or a record index.
&quot;recordIndex&quot;: &quot;A String&quot;, # Position is a record index.
&quot;byteOffset&quot;: &quot;A String&quot;, # Position is a byte offset.
&quot;shufflePosition&quot;: &quot;A String&quot;, # CloudPosition is a base64 encoded BatchShufflePosition (with FIXED
# sharding).
&quot;concatPosition&quot;: { # A position that encapsulates an inner position and an index for the inner # CloudPosition is a concat position.
# position. A ConcatPosition can be used by a reader of a source that
# encapsulates a set of other sources.
&quot;index&quot;: 42, # Index of the inner source.
&quot;position&quot;: # Object with schema name: Position # Position within the inner source.
},
&quot;end&quot;: True or False, # Position is past all other positions. Also useful for the end
# position of an unbounded range.
&quot;key&quot;: &quot;A String&quot;, # Position is a string key, ordered lexicographically.
},
},
&quot;leaseExpireTime&quot;: &quot;A String&quot;, # Time at which the current lease will expire.
&quot;reportStatusInterval&quot;: &quot;A String&quot;, # New recommended reporting interval.
},
],
}</pre>
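<p>The <code>baseSpecs</code> layering described in the request schema above resolves to a simple ordered merge: apply each base spec in order (later items win), then apply the Source&#x27;s <code>spec</code> on top. The following is an illustrative sketch only; <code>effective_source_spec</code> is a name invented for this example and is not part of the API.</p>
<pre>
def effective_source_spec(source):
  # source is a Source dict as shown in the schema above.
  merged = {}
  for base in source.get(&#x27;baseSpecs&#x27;, []):
    merged.update(base)                   # later base specs win
  merged.update(source.get(&#x27;spec&#x27;, {}))   # spec overrides anything in baseSpecs
  return merged
</pre>
<p>The <code>reportIndex</code> / <code>nextReportIndex</code> handshake described above is easiest to see as a loop: each report is sent with the index the service last handed out, and the worker advances to <code>nextReportIndex</code> only once a response has been received; on failure the identical report is resent. The following is a minimal sketch under stated assumptions, not a complete worker: <code>make_status</code> is a hypothetical helper that builds one WorkItemStatus dict for the given report index, <code>initial_report_index</code> is the value delivered with the lease, and application default credentials are assumed to be available.</p>
<pre>
from googleapiclient.discovery import build

def report_loop(project_id, job_id, make_status, initial_report_index):
  service = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)
  report_index = initial_report_index
  while True:
    status = make_status(report_index)    # must not change if it has to be resent
    response = service.projects().jobs().workItems().reportStatus(
        projectId=project_id,
        jobId=job_id,
        body={&#x27;workItemStatuses&#x27;: [status]},
    ).execute()
    state = response[&#x27;workItemServiceStates&#x27;][0]
    report_index = state[&#x27;nextReportIndex&#x27;]   # index to use for the next report
    if state.get(&#x27;completeWorkStatus&#x27;):
      break                                   # service asked this WorkItem to terminate
</pre>
<p>For <code>splitRequest</code>, <code>fractionOfRemainder</code> is expressed relative to the work the reader has not yet consumed, whereas <code>fractionConsumed</code> is expressed over the whole input. One way to picture the relationship, assuming the reader tracks the fraction of input it has already consumed (<code>to_absolute_fraction</code> is a name invented for this example):</p>
<pre>
def to_absolute_fraction(consumed_fraction, fraction_of_remainder):
  # 0.0 means &#x27;split at the current position&#x27;, 1.0 means &#x27;end of the input&#x27;.
  return consumed_fraction + fraction_of_remainder * (1.0 - consumed_fraction)

# A reader that has consumed half of its input and receives
# fractionOfRemainder = 0.5 would split three quarters of the way in:
assert to_absolute_fraction(0.5, 0.5) == 0.75
</pre>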
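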
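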
</div>
</body></html>