# dsdt dockerize

### Description

Disdat can package your pipeline into a Docker container, wrapping up code dependencies so that it runs the same almost anywhere. For data science, this can mean packaging feature creation, model training, and prediction into one or more containers and running those containers on a compute cluster (such as AWS Batch). You can run your pipeline in the container through the CLI with `dsdt run` or through the API with `disdat.api.run`.

In addition to standard Python support via setuptools, the dockerizer supports Disdat pipelines that use:

* Custom Python packages (not available via pip)
* Linux packages via either a list of `.deb` packages in a `deb.txt` file or custom `yourpkg.deb` files.
* R packages

#### Create a setup.py file:

In Disdat, dockerization relies on a `setup.py` file that can be used to build a source distribution (an `sdist`) of your code. Here is the `setup.py` file from our [disdat-examples git repo](https://github.com/seanr15/disdat-examples).

```python
from setuptools import setup, find_packages


setup(
    name='disdat-examples',
    version='0.0.1rc0',

    packages=find_packages(),
    include_package_data=True,

    install_requires=[
        'disdat>=0.9.0',
        'pandas==0.25.3',
        'jupyter',
        'spacy',
        'tensorflow==1.14.0'
    ]
)

```

#### Optional MANIFEST.in

{% hint style="info" %}
Sometimes you need to add auxiliary data to your source distribution. You can do so using a `MANIFEST.in` file. The disdat-examples git repo has a `MANIFEST.in` file that shows how to include a spaCy English-language model.
{% endhint %}
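
After building the sdist, it can be worth checking that the auxiliary files named in `MANIFEST.in` actually made it into the archive. The following is a minimal sketch using only the Python standard library; the helper names `sdist_members` and `missing_from_sdist` are hypothetical and are not part of Disdat or setuptools.

```python
import tarfile


def sdist_members(sdist_path):
    """Return the sorted member names stored in a .tar.gz source distribution."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(archive.getnames())


def missing_from_sdist(sdist_path, expected_suffixes):
    """Return the expected path suffixes that no archive member ends with."""
    names = sdist_members(sdist_path)
    return [
        suffix
        for suffix in expected_suffixes
        if not any(name.endswith(suffix) for name in names)
    ]
```

For example, `missing_from_sdist(path_to_sdist, ["MANIFEST.in"])` returns an empty list when the archive contains that file. The archive path depends on your package name and setuptools version, so it is left as a placeholder here.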

### Optional Installs via the Config Directory

You can add R packages, Linux packages, and additional Python source packages (sdists) to your image by creating a configuration directory. The directory can have any name (here we call it `config`). Place the relevant directives under `$PROJECT_HOME/config/`.

#### Optional R installs

Create a file `$PROJECT_HOME/config/python-3.6.8-slim/r.txt`. For example, this file could list `MASS quantreg forecast dplyr tidyr data.table timetk lubridate` as packages to install.

#### Optional Linux installs

List your Debian packages in a `deb.txt` file, such as `$PROJECT_HOME/config/python-3.6.8-slim/deb.txt`. Alternatively, place `.deb` package files directly in `$PROJECT_HOME/config/python-3.6.8-slim/`.

#### Optional source Python packages

Place sdists under `$PROJECT_HOME/config/python-sdist/` so that they are included in the Docker image's virtual environment.
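
The directory layout described in this section can be scaffolded with a short script. This is a sketch under stated assumptions: the helper `scaffold_config` is hypothetical, the package `libgomp1` is only an illustration, and writing one package name per line is an assumption about the `r.txt` and `deb.txt` formats that this page does not confirm.

```python
import tempfile
from pathlib import Path

# OS-specific subdirectory name used in the paths above.
OS_DIR = "python-3.6.8-slim"


def scaffold_config(project_home, r_packages=(), deb_packages=()):
    """Create config/<OS_DIR>/ and config/python-sdist/ under project_home."""
    config = Path(project_home) / "config"
    os_dir = config / OS_DIR
    os_dir.mkdir(parents=True, exist_ok=True)
    (config / "python-sdist").mkdir(exist_ok=True)
    # ASSUMPTION: one package name per line; check the format Disdat expects.
    if r_packages:
        (os_dir / "r.txt").write_text("\n".join(r_packages) + "\n")
    if deb_packages:
        (os_dir / "deb.txt").write_text("\n".join(deb_packages) + "\n")
    return config


if __name__ == "__main__":
    # Use a throwaway directory so the demo does not touch a real project.
    with tempfile.TemporaryDirectory() as home:
        config = scaffold_config(
            home, r_packages=["MASS", "quantreg"], deb_packages=["libgomp1"]
        )
        for path in sorted(config.rglob("*")):
            print(path.relative_to(home))
```

Pass the resulting directory to the dockerizer with `--config-dir`.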

### Usage

```
usage: dsdt dockerize [-h] [--config-dir CONFIG_DIR] [--os-type OS_TYPE]
                      [--os-version OS_VERSION] [--push] [--get-id]
                      [--sagemaker] [--no-build]
                      pipeline_root
```

### Options

```
positional arguments:
  pipeline_root         Root of the Python source tree containing the user-
                        defined transform; must have a setuptools-style
                        setup.py file

optional arguments:
  -h, --help            show this help message and exit
  --config-dir CONFIG_DIR
                        A directory containing configuration files for the
                        operating system within the Docker image
  --os-type OS_TYPE     The base operating system type for the Docker image
  --os-version OS_VERSION
                        The base operating system version for the Docker image
  --push                Push the image to a remote Docker registry (default is
                        to not push; must set 'docker_registry' in Disdat
                        config)
  --get-id              Do not build, only return latest container image ID
  --sagemaker           Create a Docker image executable as a SageMaker
                        container.
  --no-build            Do not build an image (only copy files into the Docker
                        build context)
```
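
When the dockerizer is driven from a build script, the flags above can be assembled programmatically. The sketch below covers only a subset of the documented options; `dockerize_command` is a hypothetical helper, not part of Disdat.

```python
import shlex


def dockerize_command(pipeline_root, config_dir=None, push=False, no_build=False):
    """Build the argument list for a `dsdt dockerize` invocation."""
    cmd = ["dsdt", "dockerize", pipeline_root]
    if config_dir is not None:
        cmd += ["--config-dir", config_dir]
    if push:
        cmd.append("--push")
    if no_build:
        cmd.append("--no-build")
    return cmd


if __name__ == "__main__":
    # Prints: dsdt dockerize . --config-dir config
    print(shlex.join(dockerize_command(".", config_dir="config")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where `dsdt` is installed.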

### Examples

#### Create a container without a config directory

```bash
dsdt dockerize .
```

#### Create a container with optional installs specified from a config directory

```bash
dsdt dockerize . --config-dir config
```

#### Push a container to AWS ECR without building it

Users can push Disdat Docker images to AWS Elastic Container Registry (ECR) using the `--push` option to the `dockerize` command:

```bash
dsdt dockerize . --push --no-build
```

To push images to a Docker registry, the user needs to specify the registry prefix in the Disdat configuration file. We assume that the user has first followed the instructions in the AWS ECR documentation for setting up the AWS credentials and profiles required to access AWS services, because the dockerizer uses the current AWS profile to determine the ECR URL and obtain Docker server authentication tokens. The user then sets the following configuration options in the `[docker]` stanza of the Disdat configuration file:

* `registry`: Set to `*ECR*`.
* `repository_prefix`: An optional prefix of the form `a/b/[...]` that `dockerize` will prepend to the image name. Given a `name` in the `setup.py` file like `disdat-examples` and a prefix `a/b`, `dockerize` will push an image named `a/b/disdat-examples:latest`.
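
The naming rule above can be illustrated with a small sketch that reads a `[docker]` stanza and derives the image name. This is not the dockerizer's own implementation: `image_name` is a hypothetical helper, the configuration text is an illustrative fragment, and the behavior when no prefix is set is an assumption.

```python
import configparser

# Illustrative [docker] stanza using the two options described above.
EXAMPLE_CFG = """
[docker]
registry = *ECR*
repository_prefix = a/b
"""


def image_name(cfg_text, package_name, tag="latest"):
    """Prepend the optional repository prefix to the setup.py package name."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    prefix = parser.get("docker", "repository_prefix", fallback="").strip("/")
    # ASSUMPTION: with no prefix configured, the bare package name is used.
    name = f"{prefix}/{package_name}" if prefix else package_name
    return f"{name}:{tag}"


if __name__ == "__main__":
    # Prints: a/b/disdat-examples:latest
    print(image_name(EXAMPLE_CFG, "disdat-examples"))
```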
