Tutorial

From bundles, to remote S3, to pipelines, to execution

This first step of the tutorial will teach you how to:

  • Create a local data context to store bundles

  • Use the CLI to create a simple bundle that contains a single file

  • Make multiple versions of the bundle and inspect the bundle using the CLI

Create a bundle using the CLI

Here we create a bundle and inspect it. I assume you have followed the instructions in the overview and have Disdat installed and initialized!

  1. Create a new local data context: $ dsdt context examples

  2. Switch into that local data context: $ dsdt switch examples

The commands dsdt context and dsdt switch are roughly analogous to git branch and git checkout. However, we use different terms because contexts don't behave like code repositories. The last command below, dsdt context with no arguments, lists all the local contexts on your machine.

$ dsdt context examples
Disdat created data context None/examples at object dir /Users/kyocum/.disdat/context/examples/objects.
$ dsdt switch examples
Switched to context examples
$ dsdt context
*	examples	[None@None]

Now let's add some data. Disdat wraps up collections of literals and files into a bundle. Using the CLI, you can make bundles from files or directories. We'll use README.md, but you can choose any file you wish.

  1. Create a bundle called my.bundle that will contain the file README.md and add it to the local context.

  2. List out all the bundles in our local context.

  3. cat the bundle to show its contents.

$ dsdt add my.bundle README.md
$ dsdt ls -v
NAME                	PROC_NAME           	OWNER   	DATE              	COMMITTED	UUID                                    	TAGS
my.bundle           	BundleWrapperTask_my_bundle____e1908ea6e8	kyocum  	01-12-20 21:30:45 	False   	386f13cf-5b51-4237-b649-8549eff30004
$ dsdt cat my.bundle   
/Users/kyocum/.disdat/context/examples/objects/386f13cf-5b51-4237-b649-8549eff30004/README.md
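
The path printed by dsdt cat can be read as three pieces: the context's object dir (the one reported when the context was created), a directory named after the bundle's UUID, and the file itself. The short Python sketch below only splits the example path from the output above. It is an observation about this one listing, not a documented guarantee about how Disdat lays out storage, and the variable names are purely illustrative.

```python
from pathlib import PurePosixPath

# Path copied verbatim from the `dsdt cat my.bundle` output above.
cat_output = PurePosixPath(
    "/Users/kyocum/.disdat/context/examples/objects/"
    "386f13cf-5b51-4237-b649-8549eff30004/README.md"
)

# Assumption drawn from this example only: <object dir>/<UUID>/<file>.
file_name = cat_output.name            # the file stored in the bundle
bundle_uuid = cat_output.parent.name   # directory named after the bundle UUID
object_dir = cat_output.parent.parent  # the context's object dir

print(file_name)
print(bundle_uuid)
print(object_dir)
```

Here bundle_uuid matches the UUID column shown by dsdt ls -v for the same bundle.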

Great! You've created a bundle that contains just one file, README.md. Now let's make another version with the same name:

$ dsdt add my.bundle README.md
$ dsdt ls -v
NAME                	PROC_NAME           	OWNER   	DATE              	COMMITTED	UUID                                    	TAGS
my.bundle           	BundleWrapperTask_my_bundle____e1908ea6e8	kyocum  	01-12-20 21:36:42 	False   	c1f9085f-8bb5-4417-8b65-804a1ae7e451
my.bundle           	BundleWrapperTask_my_bundle____e1908ea6e8	kyocum  	01-12-20 21:30:45 	False   	386f13cf-5b51-4237-b649-8549eff30004

Now you have two versions of the same data. They share the same NAME, so any time you ask Disdat for my.bundle you will get the most recent version, unless you also ask for it by PROC_NAME or UUID, for example dsdt cat -u c1f9085f-8bb5-4417-8b65-804a1ae7e451.
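
To make the "most recent wins unless you pin a UUID" rule concrete, here is a small Python sketch that models it over the two rows from the listing above. It is a toy model of the behavior described in this tutorial, not Disdat's implementation or API. The function resolve, the ROWS list, and the DATE_FORMAT string are hypothetical names, and the date format (month-day-year) is an assumption; both rows share the same date, so the ordering does not depend on it.

```python
from datetime import datetime

# Rows copied from the `dsdt ls -v` listing above: (NAME, DATE, UUID).
ROWS = [
    ("my.bundle", "01-12-20 21:36:42", "c1f9085f-8bb5-4417-8b65-804a1ae7e451"),
    ("my.bundle", "01-12-20 21:30:45", "386f13cf-5b51-4237-b649-8549eff30004"),
]

# Assumption: the DATE column is month-day-year; the listing does not say.
DATE_FORMAT = "%m-%d-%y %H:%M:%S"


def resolve(rows, name, uuid=None):
    """Return the UUID that a lookup selects in this toy model.

    Without `uuid`, the newest row with the given name wins.
    With `uuid`, only that exact version is eligible.
    """
    candidates = [row for row in rows if row[0] == name]
    if uuid is not None:
        candidates = [row for row in candidates if row[2] == uuid]
    if not candidates:
        raise LookupError(f"no bundle matches name={name!r}, uuid={uuid!r}")
    newest = max(candidates, key=lambda row: datetime.strptime(row[1], DATE_FORMAT))
    return newest[2]


latest = resolve(ROWS, "my.bundle")
pinned = resolve(ROWS, "my.bundle", uuid="386f13cf-5b51-4237-b649-8549eff30004")
print(latest)
print(pinned)
```

Asking by name alone selects the version created at 21:36:42, while pinning the UUID selects the earlier version.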

Congrats! You've created your first data context and bundle. In the rest of the tutorial we'll look at how you can push and pull your bundles to and from AWS S3 to share data with colleagues, and how to use bundles as the inputs and outputs of pipelines.
