Building Pipelines

Dependencies

A Disdat PipeTask consists of two functions: pipe_requires and pipe_run. If you know Luigi, these are analogous to requires and run. Let's first describe what they do, and then describe how they differ from those analogs.

  • pipe_requires: Here you declare the tasks that must run before this task. To do so, you write statements like: self.add_dependency("input_gzs", GetDataGz, {}). This says that the current task needs an instance of GetDataGz, created with no parameters, to run first. It also says that Disdat should set up the output of that task as a named parameter to pipe_run called 'input_gzs'.

    Users may optionally name the output bundle in this function with set_bundle_name(<your name>).

  • pipe_run: Here you perform the task's main work. This function gets a set of named parameters.

    • pipeline_input: This contains the input bundle data.

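    • One named parameter per upstream dependency (for example, input_gzs): Disdat prepares a named parameter for each upstream dependency, using the name given in add_dependency. The variable will be the same type as the value the upstream task returned.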
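The relationship between the name given to add_dependency and the named parameter received by pipe_run can be sketched as follows. This is a minimal illustration, not Disdat's implementation: ToyPipeTask and run_task are hypothetical stand-ins written so the example runs without Disdat installed, CountGz and the file names are made up, and the exact pipe_requires and pipe_run signatures in your Disdat version may differ. Only add_dependency, set_bundle_name, pipe_requires, pipe_run, pipeline_input, and GetDataGz come from the text above.

```python
class ToyPipeTask:
    """Hypothetical stand-in for Disdat's PipeTask (assumption, not the real class)."""

    def __init__(self):
        self.dependencies = {}   # parameter name -> (task class, params dict)
        self.bundle_name = None

    def add_dependency(self, name, task_class, params):
        # Record that `task_class` must run first and that its output
        # should arrive in pipe_run as the keyword argument `name`.
        self.dependencies[name] = (task_class, params)

    def set_bundle_name(self, name):
        # Optionally name the output bundle.
        self.bundle_name = name

    def pipe_requires(self):
        pass  # tasks without upstream dependencies declare nothing

    def pipe_run(self, pipeline_input=None, **kwargs):
        raise NotImplementedError


def run_task(task_class, params=None, pipeline_input=None):
    """Hypothetical driver: run upstream tasks, then call pipe_run with their outputs."""
    task = task_class()
    for key, value in (params or {}).items():
        setattr(task, key, value)
    task.pipe_requires()
    upstream = {
        name: run_task(dep_class, dep_params, pipeline_input)
        for name, (dep_class, dep_params) in task.dependencies.items()
    }
    return task.pipe_run(pipeline_input=pipeline_input, **upstream)


class GetDataGz(ToyPipeTask):
    def pipe_run(self, pipeline_input=None):
        return ["a.gz", "b.gz"]  # made-up file names


class CountGz(ToyPipeTask):  # hypothetical downstream task
    def pipe_requires(self):
        self.set_bundle_name("gz_count")
        self.add_dependency("input_gzs", GetDataGz, {})

    def pipe_run(self, pipeline_input=None, input_gzs=None):
        # input_gzs has the same type GetDataGz.pipe_run returned (a list).
        return len(input_gzs)


result = run_task(CountGz)
print(result)
```

In real Disdat code you would subclass PipeTask and let Disdat do the scheduling; the toy driver only makes visible how "input_gzs" travels from pipe_requires to pipe_run.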
