dsdt apply
Run a pipeline (not containerized)
Description
Assuming you have defined a Disdat DAG of one or more tasks where:
package
: The Python package (a directory with an __init__.py file) containing your module
module
: The module containing the Disdat class you wish to run
class
: The name of the class that defines your Disdat task
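The relationship between the three names can be sketched in a few lines of Python. This is only an illustration of how a dotted path such as 'pipelines.mnist.Train' breaks down into package, module, and class; the helpers `split_pipe_cls` and `load_pipe_cls` are hypothetical names, not part of Disdat, and Disdat's own resolution logic may differ.

```python
import importlib


def split_pipe_cls(pipe_cls):
    """Split 'package.module.Class' into (package, module, class) names.

    Hypothetical helper for illustration; not part of the Disdat API.
    """
    module_path, _, class_name = pipe_cls.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"expected 'module.PipeClass', got {pipe_cls!r}")
    # The package is everything before the last dot of the module path;
    # it is empty when the module sits at the top level.
    package, _, module = module_path.rpartition(".")
    return package, module, class_name


def load_pipe_cls(pipe_cls):
    """Import the module and return the named class (illustrative only)."""
    package, module, class_name = split_pipe_cls(pipe_cls)
    module_path = f"{package}.{module}" if package else module
    return getattr(importlib.import_module(module_path), class_name)


print(split_pipe_cls("pipelines.mnist.Train"))
# → ('pipelines', 'mnist', 'Train')
```

For the import step to succeed, the package directory has to be importable from the directory where the command is run (or otherwise be on the Python path).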
Usage
usage: dsdt apply [-h] [-cs] [-w WORKERS] [-it INPUT_TAG] [-ot OUTPUT_TAG]
                  [-o OUTPUT_BUNDLE] [-f] [--force-all] [--incremental-push]
                  [--incremental-pull]
                  pipe_cls ...
Options
positional arguments:
  pipe_cls              User-defined transform, e.g., 'module.PipeClass'
  params                Optional set of parameters for this pipe '--parameter
                        value'

optional arguments:
  -h, --help            show this help message and exit
  -cs, --central-scheduler
                        Use a central Luigi scheduler (defaults to local
                        scheduler)
  -w WORKERS, --workers WORKERS
                        Number of Luigi workers on this node
  -it INPUT_TAG, --input-tag INPUT_TAG
                        Input bundle tags: '-it authoritative:True -it
                        version:0.7.1'
  -ot OUTPUT_TAG, --output-tag OUTPUT_TAG
                        Output bundle tags: '-ot authoritative:True -ot
                        version:0.7.1'
  -o OUTPUT_BUNDLE, --output-bundle OUTPUT_BUNDLE
                        Name output bundle: '-o my.output.bundle'. Default
                        name is '<TaskName>_<param_hash>'
  -f, --force           Force re-computation of only this task.
  --force-all           Force re-computation of ALL upstream tasks.
  --incremental-push    Commit and push each task's bundle as it is produced
                        to the remote.
  --incremental-pull    Localize bundles as they are needed by downstream
                        tasks from the remote.
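The tag options take 'key:value' pairs and may be repeated. A minimal sketch of how repeated tag flags could be collected into a dictionary is shown below. It uses Python's standard argparse module; `parse_tags` is a hypothetical helper, and splitting on the first colon and keeping values as strings are assumptions, since the help text does not say how Disdat itself interprets the values.

```python
import argparse


def parse_tags(tag_args):
    """Turn ['key:value', ...] into a dict, splitting on the first colon.

    Hypothetical helper; values are kept as strings (an assumption).
    """
    tags = {}
    for item in tag_args:
        key, sep, value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"expected 'key:value', got {item!r}")
        tags[key] = value
    return tags


# A stand-in parser with only the two tag options, for illustration.
parser = argparse.ArgumentParser(prog="dsdt apply")
parser.add_argument("-it", "--input-tag", action="append", default=[])
parser.add_argument("-ot", "--output-tag", action="append", default=[])

args = parser.parse_args(["-it", "authoritative:True", "-it", "version:0.7.1"])
print(parse_tags(args.input_tag))
# → {'authoritative': 'True', 'version': '0.7.1'}
```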
Examples
Tutorial: running the pipeline
$ dsdt apply pipelines.mnist.Train
Forcing the last task to execute:
$ dsdt apply -f pipelines.mnist.Train
Naming the output bundle and adding tags:
$ dsdt apply -o mnist.trained -ot org:data_science pipelines.mnist.Train
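When the command is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below rebuilds the last example using only flags listed under Options; `build_apply_command` is a hypothetical helper, not a Disdat function, and it only constructs and prints the command without running it.

```python
import shlex


def build_apply_command(pipe_cls, output_bundle=None, output_tags=None, force=False):
    """Assemble a 'dsdt apply' argument list (hypothetical helper)."""
    cmd = ["dsdt", "apply"]
    if force:
        cmd.append("-f")  # force re-computation of only this task
    if output_bundle:
        cmd += ["-o", output_bundle]
    # Each output tag becomes its own '-ot key:value' pair.
    for key, value in (output_tags or {}).items():
        cmd += ["-ot", f"{key}:{value}"]
    cmd.append(pipe_cls)
    return cmd


cmd = build_apply_command(
    "pipelines.mnist.Train",
    output_bundle="mnist.trained",
    output_tags={"org": "data_science"},
)
print(shlex.join(cmd))
# → dsdt apply -o mnist.trained -ot org:data_science pipelines.mnist.Train
```

Assuming `dsdt` is installed and on the PATH, the resulting list could then be passed to `subprocess.run(cmd, check=True)`.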