# Creating Bundles with the Python API

## In this step of the tutorial you will learn:

* How to use the Disdat Python API to create and clear a local context
* How to use the API to create simple bundles that store scalars, lists, or dictionaries
* How to use Disdat to create *managed output paths* so you can store output files in your bundles

## Set up our example Python project

1. Clone the [examples GitHub repo](https://github.com/seanr15/disdat-examples).
2. We'll assume you've cloned it into `$CODE`.
3. Change directories into your project: `cd $CODE`
4. Assuming you are in your virtual environment, install the example project: `pip install -e .`

## Bundle basics

* Bundles collect a set of literals and files as a versioned unit
* The Python Bundle API can present your bundle as basic Python types (see [here](https://disdat.gitbook.io/disdat-documentation/basic-concepts/bundles/untitled))
* Bundles can store any number of user tags
* Bundles can also track parameters, git information, timing, and lineage
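
To make the list above concrete, here is a minimal conceptual sketch of what "a versioned unit" means. `ToyBundle` and its field names are hypothetical and are not part of Disdat; the sketch only illustrates that two bundles can share a name while remaining distinct versions, each with its own data, tags, and creation time.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ToyBundle:
    """Hypothetical stand-in for a bundle; NOT the Disdat implementation."""
    name: str                                   # human-readable bundle name
    data: Any                                   # a scalar, list, dict, or file paths
    tags: Dict[str, str] = field(default_factory=dict)
    # Assumed here: each version gets a unique id and a creation timestamp.
    bundle_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    creation_date: float = field(default_factory=time.time)


# Two versions that share a name but are separate, immutable records.
v1 = ToyBundle("example_data", {"jack": [6, 8, 10]}, {"info": "storing a dict"})
v2 = ToyBundle("example_data", ["file_1.txt"], {"info": "adding a file"})

print(v1.name == v2.name)            # same logical name
print(v1.bundle_id != v2.bundle_id)  # different versions
```

The real `Bundle` class tracks more than this (parameters, git information, timing, and lineage), so treat the sketch as a mental model only.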

## Creating bundles

{% hint style="info" %}
This [notebook](https://github.com/seanr15/disdat-examples/blob/master/notebooks/0_simple_bundle_api.ipynb) (from the git repo above) has more examples than just the ones below. Follow along and explore those as well!
{% endhint %}

### Create example data and a context

```python
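import disdat.api as api
from disdat.api import Bundle

# Names used throughout the rest of this page
data_context = 'example-context'
bundle_name = 'example_data'

# Create the local context that will hold our bundles
api.context(data_context)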
```

### Create a bundle to hold our dictionary

```python
with Bundle(data_context, name=bundle_name) as b:
    # Store a dictionary as the bundle's data and attach a descriptive tag
    b.add_data({'jumping': [3.0, 4.8], 'jack': [6, 8, 10]})
    b.add_tags({'info': 'storing a dict'})
```

### Create a bundle that holds existing files

Bundles can also hold links to files. These are strings that look like file paths. You can add an external file by simply adding its path to the bundle. Note that this makes a copy of the file, so you are versioning this particular output.

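```python
import tempfile

# Create a temporary file that lives outside of Disdat
local_fp = tempfile.NamedTemporaryFile()
local_fp.write(b'an external local file')
local_fp.flush()  # make sure the bytes are on disk before the bundle reads them

with Bundle(data_context, name=bundle_name) as b:
    b.add_data(local_fp.name)  # add the file by its path
    b.add_tags({'info': 'added a local file'})

# Closing a NamedTemporaryFile deletes the original file
local_fp.close()
```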
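
The copy semantics described above can be illustrated without Disdat at all. The following is a sketch using only the standard library; the `bundle_copy` directory and the file name `report.txt` are hypothetical and do not reflect how Disdat lays out its storage. It shows why a copy pins "this particular output": later edits to the original do not change the versioned copy.

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# An "external" file that lives outside any bundle.
original = workdir / "report.txt"
original.write_text("an external local file")

# Hypothetical stand-in for a bundle's storage directory.
versioned_dir = workdir / "bundle_copy"
versioned_dir.mkdir()
versioned = Path(shutil.copy2(original, versioned_dir / original.name))

# Editing the original afterwards leaves the versioned copy untouched.
original.write_text("edited after versioning")

print(versioned.read_text())  # an external local file
print(original.read_text())   # edited after versioning
```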

### Use managed paths to version files with zero copies

Disdat can also create *managed output paths*: you only need to provide the name of the output file, not where it's stored. Because you write the file directly to the managed path, no copy is needed. Here we version two output files in a bundle.

```python
with Bundle(data_context, name=bundle_name) as b:
    # Ask for managed output paths; we supply only the file names
    f1 = b.get_file("file_1.txt")
    f2 = b.get_file("file_2.txt")
    with f1.open(mode='w') as f:
        f.write("This is our first file!")
    with f2.open(mode='w') as f:
        f.write("This is our second file!")
    # The bundle's data is the list of the two files
    b.add_data([f1, f2])
    b.add_tags({'info': 'adding two files'})
```
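
As a rough mental model of a managed output path, the sketch below builds a path of the form `<context>/objects/<bundle id>/<file name>`, which mirrors the layout visible in the sample search output on this page. The helper `managed_path` is hypothetical and is not a Disdat function; the temporary directory stands in for the real context location.

```python
import tempfile
import uuid
from pathlib import Path


def managed_path(context_root: Path, bundle_id: str, filename: str) -> Path:
    """Hypothetical helper: return <context_root>/objects/<bundle_id>/<filename>."""
    bundle_dir = context_root / "objects" / bundle_id
    bundle_dir.mkdir(parents=True, exist_ok=True)
    return bundle_dir / filename


# A temporary directory stands in for the local context.
context_root = Path(tempfile.mkdtemp())
bundle_id = str(uuid.uuid4())

# The caller supplies only the file name; the location is decided for them.
out = managed_path(context_root, bundle_id, "file_1.txt")
out.write_text("This is our first file!")

print(out.relative_to(context_root))
```

Because the file is written in its final location from the start, nothing has to be copied when the bundle is saved.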

{% hint style="info" %}
Disdat can also manage S3 files. If you [bind your local context to a remote](https://disdat.gitbook.io/disdat-documentation/reference/dsdt-the-cli/dsdt-remote#dsdt-remote), you can place S3 files into a bundle in the same way as local files. You can either add an S3 path directly, or call `Bundle.get_remote_file` to get a path on S3 at which to store your file. You can also ask for and add local and remote directories by using `Bundle.get_directory` and `Bundle.get_remote_directory`.
{% endhint %}

## Search for versioned data!

Here we use the API's `search` method to find all versions of the bundle `example_data`. For each version we print the creation date as well as the `Bundle.data` field.

```python
from datetime import datetime

# Iterate over every version of the bundle found in the context
for b in api.search(data_context, bundle_name):
    print('{}\t{}'.format(b.name, datetime.utcfromtimestamp(b.creation_date)))
    print('\tdata: {}'.format(b.data))
    print()
```

```
example_data	2020-05-16 01:08:31.208431
	data: ['/Users/kyocum/.disdat/context/example-context/objects/aaf3d71c-51a2-4a45-94a1-301ad6465a87/file_1.txt'
 '/Users/kyocum/.disdat/context/example-context/objects/aaf3d71c-51a2-4a45-94a1-301ad6465a87/file_2.txt']

example_data	2020-05-16 01:08:23.983418
	data: {'jumping': array([3. , 4.8]), 'jack': array([ 6,  8, 10])}
```
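
The loop above passes `creation_date` to `datetime.utcfromtimestamp`, which implies the value is seconds since the Unix epoch. That function is deprecated as of Python 3.12, so on newer interpreters a timezone-aware conversion may be preferable. The sketch below uses a hard-coded timestamp as a stand-in for `b.creation_date`; the helper name `format_creation_date` is hypothetical.

```python
from datetime import datetime, timezone


def format_creation_date(epoch_seconds: float) -> str:
    """Convert seconds since the Unix epoch to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# Stand-in value; in the search loop you would pass b.creation_date instead.
example_timestamp = 1589591311

print(format_creation_date(example_timestamp))  # 2020-05-16T01:08:31+00:00
```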

{% hint style="success" %}
Check out another [notebook](https://github.com/seanr15/disdat-examples/blob/master/notebooks/5_store_models_using_bundle_api.ipynb) ("5\_store\_models\_using\_bundle\_api") that shows how to use the API to store pickled scikit-learn models, retrieve them, and use them for prediction.
{% endhint %}

