
Building Pipelines

Dependencies

A Disdat PipeTask consists of two functions: pipe_requires and pipe_run. If you know Luigi, these are analogous to requires and run. Let's first describe what they do, and then how they differ from their Luigi counterparts. A short sketch of both functions follows the list below.

  • pipe_requires: Here you declare the tasks that must run before this task. To do so, you write statements like self.add_dependency("input_gzs", GetDataGz, {}). This says that the current task needs a GetDataGz instance, run with no parameters, and that Disdat should pass that task's output to pipe_run as a named parameter called 'input_gzs'.

    Users may optionally name the output bundle in this function with set_bundle_name(<your name>).

  • pipe_run: Here you perform the task's main work. This function gets a set of named parameters.

    • pipeline_input: This contains the input bundle data.

    • <dependency name>: Disdat prepares one named parameter for each upstream dependency declared in pipe_requires. Each variable has the same type as the value the upstream task returned.
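
Putting these pieces together, here is a minimal sketch of two chained tasks. It is illustrative only: the disdat.pipe import path matches the tutorial examples, but the GetDataGz and Train task bodies, bundle names, and return values are assumptions, not code from this documentation.

# Minimal two-task Disdat pipeline sketch (illustrative; see note above).
from disdat.pipe import PipeTask


class GetDataGz(PipeTask):
    """Upstream task: produces data for downstream consumers."""

    def pipe_requires(self):
        # No upstream dependencies; just name the output bundle.
        self.set_bundle_name("raw_gzs")

    def pipe_run(self):
        # Whatever is returned becomes the bundle's data and is handed
        # to downstream tasks that declared this task as a dependency.
        return ["s3://example-bucket/train.gz", "s3://example-bucket/test.gz"]


class Train(PipeTask):
    """Downstream task: consumes the output of GetDataGz."""

    def pipe_requires(self):
        self.set_bundle_name("trained_model")
        # Require GetDataGz (with no parameters) and bind its output to
        # the named argument 'input_gzs' of pipe_run below.
        self.add_dependency("input_gzs", GetDataGz, {})

    def pipe_run(self, input_gzs=None):
        # input_gzs has the same type GetDataGz returned (a list here).
        print("Training with inputs: {}".format(input_gzs))
        return {"accuracy": [0.9]}

With a pipeline like this, you would typically execute the terminal task (Train here) with dsdt apply, as covered in the Run the Pipeline tutorial.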
