dsdt dockerize
Build the container that holds your pipeline
Disdat can package your pipeline into a Docker container, wrapping up its code dependencies so that it runs the same way almost anywhere. For data science, this can mean packaging feature creation, model training, and prediction into one or more containers and running those containers on a compute cluster (such as AWS Batch). You can then run your pipeline inside the container from the CLI with `dsdt run` or from the API with `disdat.api.run`.
In addition to standard Python support via setuptools, the dockerizer supports Disdat pipelines that use:
Custom Python packages (not available via pip)
Linux packages, via either a list of .deb packages in a `deb.txt` file or custom `yourpkg.deb` files
R packages
In Disdat, dockerization is based on having a `setup.py` file that can be used to build a source distribution (an sdist) of your code. Here is the `setup.py` file from our .
You can add R packages, Linux packages, and additional Python source packages (sdists) to your image by creating a configuration directory. The directory can have any name (here we call it `config`). Place the relevant files under `$PROJECT_HOME/config/`, as follows.
To install R packages, create the file `$PROJECT_HOME/config/python-3.6.8-slim/r.txt`. For example, this file could list `MASS quantreg forecast dplyr tidyr data.table timetk lubridate` as packages to install.
List your Debian packages in a `deb.txt` file, such as `$PROJECT_HOME/config/python-3.6.8-slim/deb.txt`.
Alternatively, place `.deb` package files directly in `$PROJECT_HOME/config/python-3.6.8-slim/`.
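As an illustration, the layout for R and Debian packages can be generated with a short script. This is a minimal sketch using a hypothetical helper, `write_image_config`; it assumes one package name per line in `r.txt` and `deb.txt` (the exact format the dockerizer parses is not specified here), and the Debian package names in the demo are placeholders.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def write_image_config(
    project_home: Path,
    r_packages: Sequence[str] = (),
    deb_packages: Sequence[str] = (),
    deb_files: Iterable[Path] = (),
    base: str = "python-3.6.8-slim",
) -> Path:
    """Populate <project_home>/config/<base>/ (hypothetical helper).

    ASSUMPTION: r.txt and deb.txt hold one package name per line.
    """
    target = project_home / "config" / base
    target.mkdir(parents=True, exist_ok=True)
    if r_packages:
        # R packages for the image, written to r.txt
        (target / "r.txt").write_text("\n".join(r_packages) + "\n")
    if deb_packages:
        # Debian packages listed by name, written to deb.txt
        (target / "deb.txt").write_text("\n".join(deb_packages) + "\n")
    for deb in deb_files:
        # Custom .deb files are copied into the same directory
        shutil.copy2(deb, target / Path(deb).name)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = write_image_config(
            Path(tmp),
            r_packages=["MASS", "quantreg", "forecast", "dplyr"],
            deb_packages=["gcc", "libgomp1"],  # placeholder Debian packages
        )
        print(sorted(p.name for p in target.iterdir()))  # ['deb.txt', 'r.txt']
```

Passing `deb_files=[Path("yourpkg.deb")]` covers the second option, copying a custom `.deb` next to `deb.txt`.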
Place sdists under `$PROJECT_HOME/config/python-sdist/` so that they are installed into the Docker image's virtual environment.
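If the extra Python package is itself a setuptools project, running `python setup.py sdist` in that project writes a source archive into its `dist/` directory. The sketch below copies such archives into `config/python-sdist/`; the helper `stage_sdists` and the package name `mylib` are hypothetical, and wheels are skipped on the assumption that only sdists belong in this directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def stage_sdists(dist_dir: Path, project_home: Path) -> List[Path]:
    """Copy source archives from dist_dir into <project_home>/config/python-sdist/.

    Hypothetical helper: only .tar.gz and .zip files are copied; wheels and
    other files are skipped because this directory is meant for sdists.
    """
    target = project_home / "config" / "python-sdist"
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for archive in sorted(dist_dir.iterdir()):
        if archive.name.endswith((".tar.gz", ".zip")):
            dest = target / archive.name
            shutil.copy2(archive, dest)
            staged.append(dest)
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dist = Path(tmp) / "dist"
        dist.mkdir()
        (dist / "mylib-1.0.tar.gz").write_bytes(b"")  # stand-in for a real sdist
        staged = stage_sdists(dist, Path(tmp) / "project")
        print([p.name for p in staged])  # ['mylib-1.0.tar.gz']
```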
You can push Disdat Docker images to AWS Elastic Container Registry (ECR) with the `--push` option of the `dockerize` command.
To push images to ECR, you must first specify the registry in the Disdat configuration file. Start by following the instructions in the AWS ECR documentation for setting up the AWS credentials and profiles required to access AWS services: the dockerizer uses the current AWS profile to determine the ECR URL and to obtain Docker authentication tokens. Then set the following options in the `[docker]` stanza of the Disdat configuration file:
`registry`: Set to `*ECR*`.
`repository_prefix`: An optional prefix of the form `a/b/[...]` that `dockerize` prepends to the image name. Given the name `disdat-examples` in the `setup.py` file and the prefix `a/b`, `dockerize` pushes an image named `a/b/disdat-examples:latest`.
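The two options can also be written programmatically, and the resulting image name worked out ahead of time. This is a minimal sketch with hypothetical helpers, not part of Disdat: `set_docker_options` writes the `[docker]` stanza with Python's `configparser` (you pass the path of your Disdat configuration file), `pushed_image_name` reproduces the `repository_prefix` naming rule, and `ecr_image_uri` shows how that name combines with the standard ECR registry host `<account>.dkr.ecr.<region>.amazonaws.com`. The account ID and region are placeholders; Disdat itself derives the registry URL from your AWS profile.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def set_docker_options(cfg_path: Path, repository_prefix: Optional[str] = None) -> None:
    """Write the [docker] stanza that enables pushing to ECR (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # keeps existing stanzas; a missing file is ignored
    if not parser.has_section("docker"):
        parser.add_section("docker")
    parser.set("docker", "registry", "*ECR*")
    if repository_prefix:
        parser.set("docker", "repository_prefix", repository_prefix)
    with open(cfg_path, "w") as fh:
        parser.write(fh)


def pushed_image_name(setup_name: str, repository_prefix: Optional[str] = None,
                      tag: str = "latest") -> str:
    """Return <prefix>/<name from setup.py>:<tag>, or <name>:<tag> with no prefix."""
    prefix = repository_prefix.strip("/") if repository_prefix else ""
    base = f"{prefix}/{setup_name}" if prefix else setup_name
    return f"{base}:{tag}"


def ecr_image_uri(account_id: str, region: str, image: str) -> str:
    """Prefix an image name with the standard ECR registry host."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "disdat.cfg"  # stand-in for your real config file
        set_docker_options(cfg, repository_prefix="a/b")
        print(cfg.read_text())
    print(pushed_image_name("disdat-examples", "a/b"))  # a/b/disdat-examples:latest
```

Because `configparser` rewrites the whole file, any comments in an existing configuration file are lost; editing the `[docker]` stanza by hand avoids that.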