Building Pipelines
Dependencies
A Disdat PipeTask consists of two functions: pipe_requires and pipe_run. If you know Luigi, these are analogous to requires and run. Let's first describe what they do, and then we can describe how they differ from those analogs.
pipe_requires: Here you declare the tasks that must run before this task. To do so, you write statements like: self.add_dependency("input_gzs", GetDataGz, {}). This says that the current task needs a GetDataGz instance to run with no parameters. It also says that Disdat should set up the output of that task as a named parameter to pipe_run called 'input_gzs'. Users may optionally name the output bundle in this function with set_bundle_name(<your name>).
pipe_run: Here you perform the task's main work. This function gets a set of named parameters:
pipeline_input: This contains the input bundle data.
One named parameter per upstream dependency (for example, input_gzs above): Disdat prepares named parameters for each upstream dependency. Each variable has the same type that the upstream task returned.
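The flow described above can be sketched in a few lines of Python. To keep the example runnable without Disdat installed, it does not import Disdat at all: MiniPipeTask and run_task are hypothetical stand-ins written for this illustration, and the Unzip task, the file names, and the bundle name are made up. The method names and the add_dependency call mirror the text, but the real signatures and behavior in Disdat may differ.

```python
# Stand-in sketch, NOT Disdat itself. MiniPipeTask and run_task are
# hypothetical helpers that imitate the pattern described in the text.


class MiniPipeTask:
    """Hypothetical stand-in for a PipeTask base class."""

    def __init__(self):
        self.dependencies = {}  # parameter name -> (task class, params)
        self.bundle_name = None

    def add_dependency(self, name, task_class, params):
        # Record that task_class must run first, and that its output
        # should reach pipe_run as the keyword argument `name`.
        self.dependencies[name] = (task_class, params)

    def set_bundle_name(self, name):
        # Optionally name this task's output bundle.
        self.bundle_name = name

    def pipe_requires(self, pipeline_input=None):
        # Default: no upstream dependencies.
        pass

    def pipe_run(self, pipeline_input=None, **kwargs):
        raise NotImplementedError


def run_task(task_class, params=None, pipeline_input=None):
    """Hypothetical driver: run upstream tasks, then the task itself."""
    task = task_class()
    task.params = params or {}
    task.pipe_requires(pipeline_input=pipeline_input)
    # Each upstream result is handed to pipe_run under its declared name.
    upstream = {
        name: run_task(cls, dep_params, pipeline_input)
        for name, (cls, dep_params) in task.dependencies.items()
    }
    return task.pipe_run(pipeline_input=pipeline_input, **upstream)


class GetDataGz(MiniPipeTask):
    def pipe_run(self, pipeline_input=None):
        # Made-up file names standing in for downloaded archives.
        return ["part1.csv.gz", "part2.csv.gz"]


class Unzip(MiniPipeTask):
    def pipe_requires(self, pipeline_input=None):
        # GetDataGz runs first, with no parameters; its output arrives
        # in pipe_run as the named parameter `input_gzs`.
        self.add_dependency("input_gzs", GetDataGz, {})
        self.set_bundle_name("unzipped_files")

    def pipe_run(self, pipeline_input=None, input_gzs=None):
        # input_gzs has the same type GetDataGz returned: a list of str.
        return [name[: -len(".gz")] for name in input_gzs]


result = run_task(Unzip)
print(result)
```

The point of the sketch is the naming contract: the string passed to add_dependency in pipe_requires is the same name that appears as a keyword parameter of pipe_run.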