Creating Bundles with the Python API
Install examples, use API to version your data
This step of the tutorial will teach you about:
How to use the Disdat Python API to create and clear a local context
How to use the API to create simple bundles that store scalars, lists, or dictionaries
How to use Disdat to create managed output paths so you can store output files in your bundles.
Set up our example Python project
Clone the examples GitHub repo (https://github.com/seanr15/disdat-examples).
We'll assume you've cloned it into $CODE.
Change directories into your project:
cd $CODE
Assuming you are in your virtual environment, install the example project:
pip install -e .
Bundle basics
Bundles collect a set of literals and files as a versioned unit
The Python Bundle API can present your bundle's data as basic Python types.
Bundles can store any number of user tags
Bundles can also track parameters, git information, timing, and lineage
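The bullets above describe a bundle as a versioned unit that carries data and tags. As a rough mental model only, here is a hypothetical pure-Python toy. ToyBundle and ToyContext are illustrative names, not Disdat classes. In this toy, committing under an existing name appends a new version with its own UUID and creation time instead of overwriting the old one. Disdat's real bundles also track parameters, git information, timing, and lineage, which this sketch leaves out.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List
from uuid import uuid4


@dataclass
class ToyBundle:
    """Hypothetical stand-in for a Disdat bundle; not the real Bundle class."""
    name: str
    data: Any = None
    tags: Dict[str, str] = field(default_factory=dict)
    uuid: str = field(default_factory=lambda: str(uuid4()))
    creation_date: float = field(default_factory=time.time)


class ToyContext:
    """Hypothetical store mapping each bundle name to its list of versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ToyBundle]] = {}

    def commit(self, bundle: ToyBundle) -> None:
        # Never overwrite: each commit appends a new version under the name.
        self._versions.setdefault(bundle.name, []).append(bundle)

    def versions(self, name: str) -> List[ToyBundle]:
        # Return a new list so callers cannot reorder or drop stored versions.
        return list(self._versions.get(name, []))


ctx = ToyContext()
ctx.commit(ToyBundle("example_data", {"jack": [6, 8, 10]}, {"info": "storing a dict"}))
ctx.commit(ToyBundle("example_data", ["file_1.txt"], {"info": "adding two files"}))
for v in ctx.versions("example_data"):
    print(v.uuid, v.tags)
```

Because old versions are never replaced, a later query can list every version stored under a name.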
Creating bundles
Create example data and a context
import disdat.api as api
from disdat.api import Bundle
data_context = 'example-context'
bundle_name = "example_data"
api.context(data_context)
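The search output at the end of this page shows where a local context keeps its data. Its paths have the form ~/.disdat/context/example-context/objects/<uuid>/file_1.txt. The sketch below is a hypothetical helper that recreates that directory layout under an arbitrary root. ensure_context_dir is not a Disdat function, and the sketch assumes only what those printed paths reveal.

```python
import tempfile
from pathlib import Path


def ensure_context_dir(root: Path, context_name: str) -> Path:
    """Hypothetical helper (not Disdat code): create <root>/context/<name>/objects.

    The layout mirrors the paths printed by this tutorial's search output,
    e.g. ~/.disdat/context/example-context/objects/<uuid>/file_1.txt.
    """
    objects_dir = root / "context" / context_name / "objects"
    # parents=True builds intermediate dirs; exist_ok=True makes reruns harmless.
    objects_dir.mkdir(parents=True, exist_ok=True)
    return objects_dir


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_context_dir(Path(tmp), "example-context")
    second = ensure_context_dir(Path(tmp), "example-context")  # no error on rerun
    print(first.relative_to(tmp), first == second)
```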
Create a bundle to hold our dictionary
with Bundle(data_context, name=bundle_name) as b:
    b.add_data({'jumping': [3.0, 4.8], 'jack': [6, 8, 10]})
    b.add_tags({'info': 'storing a dict'})
Create a bundle that holds existing files
Bundles can also hold links to files, which appear as strings that look like file paths. To add an existing external file, pass its path to add_data. Note that Disdat makes a copy of the file, so the bundle captures the file's contents as they are at this moment.
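import tempfile

# Create a temporary file to stand in for an existing external file
local_fp = tempfile.NamedTemporaryFile()
local_fp.write(b'an external local file')
local_fp.flush()

with Bundle(data_context, name=bundle_name) as b:
    b.add_data(local_fp.name)  # Disdat copies the file into the bundle
    b.add_tags({'info': 'added a local file'})

local_fp.close()  # removing the temp file does not affect the bundle's copy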
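To see why copying matters, here is a hypothetical sketch of copy-on-add semantics. snapshot_file is an illustrative name. The <objects>/<uuid>/<file> layout is assumed from this page's search output and is not taken from Disdat's source. The sketch copies the file into a fresh per-bundle directory, so later edits to the original do not change the versioned copy.

```python
import shutil
import tempfile
from pathlib import Path
from uuid import uuid4


def snapshot_file(src: str, objects_dir: Path) -> Path:
    """Hypothetical helper: copy an external file into a fresh per-bundle dir.

    Illustrates copy-on-add semantics only; it is not Disdat's code.
    """
    bundle_dir = objects_dir / str(uuid4())
    bundle_dir.mkdir(parents=True)
    dest = bundle_dir / Path(src).name
    shutil.copy2(src, dest)  # copies contents plus metadata such as mtime
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "external.txt"
    src.write_text("an external local file")
    versioned = snapshot_file(str(src), Path(tmp) / "objects")
    src.write_text("edited after versioning")  # change the original afterwards
    print(versioned.read_text())  # → an external local file
```

The cost of this approach is one full copy per added file, which is what managed paths avoid.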
Use managed paths to version files with zero copies
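Alternatively, Disdat can create managed output paths: you provide only the name of each output file, not where it is stored. Because your code writes directly into the bundle's managed location, Disdat does not need to copy the file afterwards. Here we version two output files in a single bundle.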
with Bundle(data_context, name=bundle_name) as b:
    f1 = b.get_file("file_1.txt")
    f2 = b.get_file("file_2.txt")
    with f1.open(mode='w') as f:
        f.write("This is our first file!")
    with f2.open(mode='w') as f:
        f.write("This is our second file!")
    b.add_data([f1, f2])
    b.add_tags({'info': 'adding two files'})
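The zero-copy idea can be sketched the same way. In this hypothetical example, the caller supplies only a file name and gets back a path inside the bundle's own directory. Writing the file is therefore the versioning step, and nothing needs to be copied afterwards. managed_output_path is not a Disdat function, and the directory layout is again assumed from the search output on this page.

```python
import tempfile
from pathlib import Path
from uuid import uuid4


def managed_output_path(objects_dir: Path, bundle_uuid: str, filename: str) -> Path:
    """Hypothetical helper: return where a named output should live.

    The caller supplies only the file name; the directory is chosen for it,
    following the <objects>/<bundle uuid>/<file> layout. Not Disdat's code.
    """
    bundle_dir = objects_dir / bundle_uuid
    bundle_dir.mkdir(parents=True, exist_ok=True)
    return bundle_dir / filename


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    bundle_uuid = str(uuid4())
    f1 = managed_output_path(objects, bundle_uuid, "file_1.txt")
    f2 = managed_output_path(objects, bundle_uuid, "file_2.txt")
    # Write straight into the bundle's directory: nothing to copy afterwards.
    f1.write_text("This is our first file!")
    f2.write_text("This is our second file!")
    print(sorted(p.name for p in (objects / bundle_uuid).iterdir()))
    # → ['file_1.txt', 'file_2.txt']
```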
Search for versioned data!
Here we use the API's search method to find all versions of the bundle "example_data". For each version, we print the bundle name, its creation date, and its Bundle.data field.
from datetime import datetime

for b in api.search(data_context, bundle_name):
    print('{}\t{}'.format(b.name, datetime.utcfromtimestamp(b.creation_date)))
    print('\tdata: {}'.format(b.data))
    print()
example_data 2020-05-16 01:08:31.208431
data: ['/Users/kyocum/.disdat/context/example-context/objects/aaf3d71c-51a2-4a45-94a1-301ad6465a87/file_1.txt'
'/Users/kyocum/.disdat/context/example-context/objects/aaf3d71c-51a2-4a45-94a1-301ad6465a87/file_2.txt']
example_data 2020-05-16 01:08:23.983418
data: {'jumping': array([3. , 4.8]), 'jack': array([ 6, 8, 10])}
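In the output above, the newest version is listed first. The following is a hypothetical pure-Python model of that query that assumes this ordering. Version and search_versions are illustrative names. The timestamps correspond to the dates printed above, and the "other_bundle" entry is made up to show filtering. The sketch filters by name and sorts by creation_date in descending order. It formats dates with datetime.fromtimestamp(..., tz=timezone.utc), the timezone-aware replacement for datetime.utcfromtimestamp, which is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone
from typing import Any, List, NamedTuple


class Version(NamedTuple):
    """Hypothetical stand-in for a search result; not Disdat's Bundle."""
    name: str
    creation_date: float  # seconds since the Unix epoch, UTC
    data: Any


def search_versions(versions: List[Version], name: str) -> List[Version]:
    """Return every version of `name`, newest first."""
    matches = [v for v in versions if v.name == name]
    return sorted(matches, key=lambda v: v.creation_date, reverse=True)


store = [
    Version("example_data", 1589591303.983418, {"jack": [6, 8, 10]}),
    Version("other_bundle", 1589591305.0, 42),  # hypothetical, filtered out
    Version("example_data", 1589591311.208431, ["file_1.txt", "file_2.txt"]),
]

for v in search_versions(store, "example_data"):
    # Timezone-aware equivalent of datetime.utcfromtimestamp()
    when = datetime.fromtimestamp(v.creation_date, tz=timezone.utc)
    print("{}\t{}".format(v.name, when))
    print("\tdata: {}".format(v.data))
```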
Check out another notebook ("5_store_models_using_bundle_api") that shows how to use the API to store pickled scikit-learn models, retrieve them, and use them for prediction.