Building Pipelines
Dependencies
A Disdat PipeTask consists of two functions: pipe_requires and pipe_run. If you know Luigi, these are analogous to requires and run. Let's first describe what they do, and then we can describe how they differ from those analogs.
pipe_requires: Here you declare the tasks that must run before this task. To do so, you write statements like: self.add_dependency("input_gzs", GetDataGz, {}). This says that the current task needs a GetDataGz instance, run with no parameters. It also says that Disdat should set up the output of that task as a named parameter to pipe_run called 'input_gzs'. Users may optionally name the output bundle in this function with set_bundle_name(<your name>).
pipe_run: Here you perform the task's main work. This function receives a set of named parameters:
pipeline_input: This contains the input bundle data.
One named parameter for each upstream dependency, prepared by Disdat. Each variable has the same type as the value the upstream task returned.