Lineage (or Bundle Metadata)

Backward tracing bundle history


Last updated 5 years ago


Bundle Metadata

Beyond its human, processing, and UUID names, each bundle also tracks:

  • Creation date

  • Git repo, branch, and latest commit hash

  • Task start and complete timestamps

  • Bundle dependency set: the bundles consumed as inputs by the task that produced it

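That last item is what makes lineage tracing possible: by following a bundle's input bundles, then their inputs in turn, you can trace its history back to its sources.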
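To make the idea concrete, here is a minimal Python sketch of backward tracing over dependency sets. BundleMeta and trace_lineage are hypothetical illustrations, not part of the dsdt API; the record keeps only a UUID, a processing name, and the dependency set, and the example store reuses the three bundles from the dsdt lineage example on this page. Inputs are visited depth-first, so depth 0 is the starting bundle and depth 1 its direct inputs, mirroring the depth numbers the CLI prints.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class BundleMeta:
    """Hypothetical, simplified bundle metadata record (not the dsdt schema)."""
    uuid: str
    processing_name: str
    # The dependency set: UUIDs of the input bundles this bundle was built from.
    inputs: FrozenSet[str] = frozenset()


def trace_lineage(uuid: str, store: Dict[str, BundleMeta]) -> List[Tuple[int, BundleMeta]]:
    """Follow dependency sets backward from `uuid`.

    Returns (depth, bundle) pairs in depth-first order: depth 0 is the
    requested bundle, depth 1 its direct inputs, and so on.
    """
    lineage: List[Tuple[int, BundleMeta]] = []
    visited = set()

    def visit(current: str, depth: int) -> None:
        # Skip ancestors already reported (shared inputs) and unknown UUIDs.
        if current in visited or current not in store:
            return
        visited.add(current)
        bundle = store[current]
        lineage.append((depth, bundle))
        for parent in sorted(bundle.inputs):  # sorted for deterministic order
            visit(parent, depth + 1)

    visit(uuid, 0)
    return lineage


# Example store built from the bundles in the dsdt lineage output on this page.
store = {
    "564cfb28-14ac-412f-a09b-2a5bdb835f48": BundleMeta(
        "564cfb28-14ac-412f-a09b-2a5bdb835f48",
        "GetFeatures_2019_09_10____2016_12_01_8159b5bded",
        frozenset({"e57dd790-0d5b-434f-b4f8-65bbda2dfa7d",
                   "a359e040-f326-4df2-9c6e-7ab3769d36dd"}),
    ),
    "e57dd790-0d5b-434f-b4f8-65bbda2dfa7d": BundleMeta(
        "e57dd790-0d5b-434f-b4f8-65bbda2dfa7d",
        "GoogTrendTurbotaxLive_2019_09_10____2016_12_01_2fdd4e0539",
    ),
    "a359e040-f326-4df2-9c6e-7ab3769d36dd": BundleMeta(
        "a359e040-f326-4df2-9c6e-7ab3769d36dd",
        "GoogTrendTurboTaxLive_2019_09_10____2016_12_01_3239f44071",
    ),
}

for depth, bundle in trace_lineage("564cfb28-14ac-412f-a09b-2a5bdb835f48", store):
    print("\t" * depth + f"depth {depth}: {bundle.processing_name} ({bundle.uuid})")
```

The visited set means an ancestor shared by two inputs is reported only once; whether dsdt itself deduplicates shared ancestors is not shown on this page.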

Accessing lineage

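Given the UUID of an output bundle, you can print its lineage with the dsdt lineage command. Each "Lineage @ depth N" entry describes one bundle: depth 0 is the requested bundle, and each increase in depth steps one level back through the input bundles.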

dsdt lineage 564cfb28-14ac-412f-a09b-2a5bdb835f48
------ Lineage @ depth 0 -----
processing name: GetFeatures_2019_09_10____2016_12_01_8159b5bded
uuid: 564cfb28-14ac-412f-a09b-2a5bdb835f48
creation date: 2019-09-10 09:01:39.106816
code repo: https://github.intuit.com/data-science/iep-tt-pipeline
code hash: 2b07f8dd02bdea160481673fde6e5abb111a6ae5
code method:
git commit URL: https://github.intuit.com/data-science/iep-tt-pipeline/commit/2b07f8dd02bdea160481673fde6e5abb111a6ae5
code branch: HEAD
Start 1568131299.0983887 Stop 1568131299.100352 Duration 0.001963376998901367

	------ Lineage @ depth 1 -----
	processing name: GoogTrendTurbotaxLive_2019_09_10____2016_12_01_2fdd4e0539
	uuid: e57dd790-0d5b-434f-b4f8-65bbda2dfa7d
	creation date: 2019-09-10 09:01:15.001553
	code repo: https://github.intuit.com/data-science/iep-tt-pipeline
	code hash: 2b07f8dd02bdea160481673fde6e5abb111a6ae5
	code method:
	git commit URL: https://github.intuit.com/data-science/iep-tt-pipeline/commit/2b07f8dd02bdea160481673fde6e5abb111a6ae5
	code branch: HEAD
	Start 1568131273.2171564 Stop 1568131275.0006177 Duration 1.783461332321167

	------ Lineage @ depth 1 -----
	processing name: GoogTrendTurboTaxLive_2019_09_10____2016_12_01_3239f44071
	uuid: a359e040-f326-4df2-9c6e-7ab3769d36dd
	creation date: 2019-09-10 09:01:14.987869
	code repo: https://github.intuit.com/data-science/iep-tt-pipeline
	code hash: 2b07f8dd02bdea160481673fde6e5abb111a6ae5
	code method:
	git commit URL: https://github.intuit.com/data-science/iep-tt-pipeline/commit/2b07f8dd02bdea160481673fde6e5abb111a6ae5
	code branch: HEAD
	Start 1568131273.2121537 Stop 1568131274.986895 Duration 1.7747414112091064
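The Start and Stop fields appear to be Unix epoch timestamps in seconds, and Duration matches Stop minus Start (1568131299.100352 - 1568131299.0983887 ≈ 0.00196 s for the depth-0 bundle). The Python sketch below parses this text output into one record per bundle and converts the timestamps. It assumes the layout of the sample above (a depth header, "key: value" lines, and a Start/Stop/Duration line); parse_lineage is a hypothetical helper, not part of dsdt, and SAMPLE is trimmed from the output above.

```python
import re
from datetime import datetime, timezone
from typing import Dict, List

# Patterns assume the layout of the sample dsdt lineage output.
HEADER = re.compile(r"-+\s*Lineage @ depth (\d+)\s*-+")
TIMING = re.compile(r"Start (\S+) Stop (\S+) Duration (\S+)")


def parse_lineage(text: str) -> List[Dict[str, object]]:
    """Split dsdt lineage text output into one dict per bundle entry."""
    entries: List[Dict[str, object]] = []
    for raw in text.splitlines():
        line = raw.strip()  # entries at depth > 0 are tab-indented
        header = HEADER.fullmatch(line)
        if header:
            entries.append({"depth": int(header.group(1))})
            continue
        if not entries or not line:
            continue  # skip the command line and blank lines
        timing = TIMING.fullmatch(line)
        if timing:
            start, stop, _ = map(float, timing.groups())
            entries[-1]["start"] = start
            entries[-1]["stop"] = stop
        elif ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            entries[-1][key.strip()] = value.strip()
    return entries


SAMPLE = """\
dsdt lineage 564cfb28-14ac-412f-a09b-2a5bdb835f48
------ Lineage @ depth 0 -----
processing name: GetFeatures_2019_09_10____2016_12_01_8159b5bded
uuid: 564cfb28-14ac-412f-a09b-2a5bdb835f48
creation date: 2019-09-10 09:01:39.106816
Start 1568131299.0983887 Stop 1568131299.100352 Duration 0.001963376998901367

\t------ Lineage @ depth 1 -----
\tprocessing name: GoogTrendTurbotaxLive_2019_09_10____2016_12_01_2fdd4e0539
\tuuid: e57dd790-0d5b-434f-b4f8-65bbda2dfa7d
\tStart 1568131273.2171564 Stop 1568131275.0006177 Duration 1.783461332321167
"""

for entry in parse_lineage(SAMPLE):
    started = datetime.fromtimestamp(entry["start"], tz=timezone.utc)
    elapsed = entry["stop"] - entry["start"]
    print(f"depth {entry['depth']} {entry['uuid']}: "
          f"started {started.isoformat()}, ran {elapsed:.3f}s")
```

Converting the depth-0 Start value gives 2019-09-10 16:01:39 UTC, seven hours ahead of the printed creation date of 09:01:39, which suggests the creation date is shown in the local time zone of the machine that ran the task (UTC-7 in this example).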