Simple Pipeline
Using examples from https://github.com/seanr15/disdat-examples
Writing simple pipelines using PipeTask classes and the pipe_requires() and pipe_run() methods.
How to programmatically set the bundle's human name in pipe_requires()
How to specify dependencies (both normal and external)
Disdat-supported task return types, or how bundles present to pipeline tasks
Clone our examples GitHub repo (https://github.com/seanr15/disdat-examples)
We'll assume you've cloned it into $CODE
Change directories into your project: cd $CODE
Assuming you are in your virtual environment, install the example project: pip install -e .
We will look at a simple two-step pipeline (the file is on GitHub). Here task B runs after task A; that is, task B requires task A.
Disdat defines tasks as subclasses of PipeTask.
Each task declares the upstream tasks that must complete before it runs. It does so by adding dependencies in `pipe_requires()`.
In pipe_requires() one can:
Add dependencies via:
self.add_dependency(<arg name>, PipeTask class, params dictionary)
Semantic: If the output bundle from A does not exist, re-run task A
self.add_external_dependency(<arg name>, PipeTask class, params dictionary)
Set the bundle's human name via: self.set_bundle_name('b')
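As a rough illustration of what pipe_requires() declares, the sketch below uses a small stand-in PipeTask class written only for this example. It is not Disdat's implementation; it simply records the calls. The method names and argument order follow the list above, while the attribute names bundle_name and dependencies are hypothetical.

```python
# Stand-in for illustration only: this is NOT Disdat's PipeTask.
# It records what pipe_requires() declares so the calls can be inspected.
class PipeTask:
    def __init__(self):
        self.bundle_name = None   # hypothetical attribute
        self.dependencies = {}    # hypothetical: arg name -> (task class, params, is_external)

    def add_dependency(self, arg_name, task_class, params):
        self.dependencies[arg_name] = (task_class, params, False)

    def add_external_dependency(self, arg_name, task_class, params):
        self.dependencies[arg_name] = (task_class, params, True)

    def set_bundle_name(self, name):
        self.bundle_name = name


class A(PipeTask):
    def pipe_requires(self):
        # Task A has no upstream tasks; it only names its output bundle.
        self.set_bundle_name('a')


class B(PipeTask):
    def pipe_requires(self):
        # Task B names its bundle 'b' and requires task A under the arg name 'a'.
        self.set_bundle_name('b')
        self.add_dependency('a', A, {})


task_b = B()
task_b.pipe_requires()
print(task_b.bundle_name)                    # b
print(task_b.dependencies['a'][0].__name__)  # A
```

In a real pipeline, A and B would derive from Disdat's own PipeTask class rather than this stand-in, and Disdat would call pipe_requires() itself.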
In pipe_run() one can:
Do whatever you want
Return values from which Disdat will create a bundle
Note: the return type of a task is called the output bundle's presentation.
Here all the bundles present as integer literals.
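To make the idea of presentation concrete, here is a minimal sketch in which each task returns an integer and the downstream task receives that integer. It assumes the upstream value is passed to pipe_run() as a keyword argument named after the dependency's arg name. The PipeTask class and the run() helper are toy stand-ins invented for this example, not Disdat's API, and the arithmetic in the tasks is arbitrary.

```python
class PipeTask:
    """Toy stand-in for illustration; NOT Disdat's PipeTask."""

    def __init__(self):
        self.deps = {}  # hypothetical: arg name -> (task class, params)

    def add_dependency(self, arg_name, task_class, params):
        self.deps[arg_name] = (task_class, params)

    def pipe_requires(self):
        pass  # tasks without upstream dependencies need not override this


def run(task_class, bundles):
    """Hypothetical runner: run upstream tasks first, then call pipe_run()."""
    task = task_class()
    task.pipe_requires()
    # Each upstream return value is handed over under its declared arg name.
    inputs = {arg: run(dep_class, bundles)
              for arg, (dep_class, _params) in task.deps.items()}
    result = task.pipe_run(**inputs)
    bundles[task_class.__name__] = result  # the "bundle" is just the return value here
    return result


class A(PipeTask):
    def pipe_run(self):
        return 2 + 3  # presents as an integer literal


class B(PipeTask):
    def pipe_requires(self):
        self.add_dependency('a', A, {})

    def pipe_run(self, a=None):
        return 2 ** a  # 'a' is the integer returned by task A


bundles = {}
answer = run(B, bundles)
print(answer)   # 32
print(bundles)  # {'A': 5, 'B': 32}
```

Because both tasks return plain integers, B can use A's output directly in arithmetic without any unpacking step.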
Semantic of add_external_dependency(): only look for this upstream bundle, and fail if it does not exist.
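The difference between the two dependency semantics can be sketched with a plain dictionary standing in for the place where bundles are stored. The resolve() function, its parameters, and the choice of LookupError are all hypothetical and exist only for this illustration; Disdat's actual lookup logic may consider more than the bundle name.

```python
def resolve(bundle_name, store, producer, external=False):
    """Return the value stored under bundle_name.

    store    -- dict standing in for wherever bundles live (assumption)
    producer -- callable standing in for running the upstream task
    external -- True mimics an external dependency: look up only, never run
    """
    if bundle_name in store:
        return store[bundle_name]       # bundle exists: reuse it
    if external:
        # External dependency: the bundle must already exist.
        raise LookupError(f"bundle {bundle_name!r} does not exist")
    store[bundle_name] = producer()     # normal dependency: run the upstream task
    return store[bundle_name]


calls = []

def run_a():
    calls.append('A')
    return 5

store = {}
first = resolve('a', store, run_a)    # bundle missing, so task A runs
second = resolve('a', store, run_a)   # bundle present, so task A is not run again
print(first, second, calls)           # 5 5 ['A']

try:
    resolve('missing', store, run_a, external=True)
except LookupError as err:
    print('external dependency failed:', err)
```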