
Building Pipelines

Dependencies

A Disdat PipeTask consists of two functions: pipe_requires and pipe_run. If you know Luigi, these are analogous to requires and run. Let's first describe what they do, and then how they differ from their Luigi counterparts. A short sketch of both functions follows the list below.

  • pipe_requires: Here you declare the tasks that must run before this task. To do so, you write statements like self.add_dependency("input_gzs", GetDataGz, {}). This says that the current task needs a GetDataGz instance, run with no parameters, and that Disdat should pass that task's output to pipe_run as a named parameter called 'input_gzs'.

    Users may optionally name the output bundle in this function with set_bundle_name(<your name>).

  • pipe_run: Here you perform the task's main work. This function gets a set of named parameters.

    • pipeline_input: This contains the input bundle data.

    • <dependency name>: Disdat prepares one named parameter for each upstream dependency declared in pipe_requires. Each variable has the same type as the value the upstream task returned.
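
Putting these pieces together, here is a minimal sketch of two chained tasks. It is illustrative only: the disdat.pipe import path matches the tutorial examples, but the GetDataGz and Train task bodies, bundle names, and return values are assumptions, not code from this documentation.

# Minimal two-task Disdat pipeline sketch (illustrative; see note above).
from disdat.pipe import PipeTask


class GetDataGz(PipeTask):
    """Upstream task: produces data for downstream consumers."""

    def pipe_requires(self):
        # No upstream dependencies; just name the output bundle.
        self.set_bundle_name("raw_gzs")

    def pipe_run(self):
        # Whatever is returned becomes the bundle's data and is handed
        # to downstream tasks that declared this task as a dependency.
        return ["s3://example-bucket/train.gz", "s3://example-bucket/test.gz"]


class Train(PipeTask):
    """Downstream task: consumes the output of GetDataGz."""

    def pipe_requires(self):
        self.set_bundle_name("trained_model")
        # Require GetDataGz (with no parameters) and bind its output to
        # the named argument 'input_gzs' of pipe_run below.
        self.add_dependency("input_gzs", GetDataGz, {})

    def pipe_run(self, input_gzs=None):
        # input_gzs has the same type GetDataGz returned (a list here).
        print("Training with inputs: {}".format(input_gzs))
        return {"accuracy": [0.9]}

With a pipeline like this, you would typically execute the terminal task (Train here) with dsdt apply, as covered in the Run the Pipeline tutorial.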
