dsdt apply

Run a pipeline (not containerized)

Description

This command assumes you have defined a Disdat DAG of one or more tasks, where:

  • package: The Python package (directory with an __init__.py file) containing your module

  • module: The module containing the Disdat class you wish to run

  • class: The name of the class that defines your Disdat task
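
The examples on this page pass these parts as a single dotted name, such as pipelines.mnist.Train. As a rough illustration of how such a name splits into an importable module path and a class name, here is a minimal sketch using only the Python standard library. It is not Disdat's own resolution code, and resolve_pipe_cls is a hypothetical helper introduced for this example.

```python
import importlib


def resolve_pipe_cls(dotted_name):
    """Split 'package.module.Class' and return the class object.

    Hypothetical helper for illustration; not part of Disdat.
    """
    # Everything before the last dot is the module path; the rest is the class.
    module_path, _, class_name = dotted_name.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.Class', got {dotted_name!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere.
cls = resolve_pipe_cls("collections.OrderedDict")
print(cls.__name__)
```

For the import to succeed with your own pipeline, the top-level package must be importable from the directory where the command is run.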

Usage

usage: dsdt apply [-h] [-cs] [-w WORKERS] [-it INPUT_TAG] [-ot OUTPUT_TAG]
                  [-o OUTPUT_BUNDLE] [-f] [--force-all] [--incremental-push]
                  [--incremental-pull]
                  pipe_cls ...

Options

positional arguments:
  pipe_cls              User-defined transform, e.g., 'module.PipeClass'
  params                Optional set of parameters for this pipe '--parameter
                        value'

optional arguments:
  -h, --help            show this help message and exit
  -cs, --central-scheduler
                        Use a central Luigi scheduler (defaults to local
                        scheduler)
  -w WORKERS, --workers WORKERS
                        Number of Luigi workers on this node
  -it INPUT_TAG, --input-tag INPUT_TAG
                        Input bundle tags: '-it authoritative:True -it
                        version:0.7.1'
  -ot OUTPUT_TAG, --output-tag OUTPUT_TAG
                        Output bundle tags: '-ot authoritative:True -ot
                        version:0.7.1'
  -o OUTPUT_BUNDLE, --output-bundle OUTPUT_BUNDLE
                        Name output bundle: '-o my.output.bundle'. Default
                        name is '<TaskName>_<param_hash>'
  -f, --force           Force re-computation of only this task.
  --force-all           Force re-computation of ALL upstream tasks.
  --incremental-push    Commit and push each task's bundle as it is produced
                        to the remote.
  --incremental-pull    Localize bundles as they are needed by downstream
                        tasks from the remote.

Examples

$dsdt apply pipelines.mnist.Train

Forcing the last task to execute:

$dsdt apply -f pipelines.mnist.Train

Naming the output bundle and adding tags:

$dsdt apply -o mnist.trained -ot org:data_science pipelines.mnist.Train
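
When the command is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper, not a Disdat API. It uses only flags listed under Options and returns the argument list without running anything.

```python
def build_apply_argv(pipe_cls, output_bundle=None, output_tags=None,
                     force=False, workers=None, params=None):
    """Assemble an argv list for `dsdt apply`.

    Hypothetical helper for illustration; not part of Disdat.
    """
    argv = ["dsdt", "apply"]
    if workers is not None:
        argv += ["-w", str(workers)]
    if output_bundle is not None:
        argv += ["-o", output_bundle]
    # Each tag is passed as its own '-ot key:value' pair, as in the help text.
    for key, value in (output_tags or {}).items():
        argv += ["-ot", f"{key}:{value}"]
    if force:
        argv.append("-f")
    argv.append(pipe_cls)
    # Pipe parameters follow the class as '--parameter value' pairs.
    for name, value in (params or {}).items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_apply_argv(
    "pipelines.mnist.Train",
    output_bundle="mnist.trained",
    output_tags={"org": "data_science"},
)
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run(argv, check=True), assuming dsdt is installed and on the PATH.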
