Dockerize a Pipeline

Dockerfiles are amazing, and you shouldn't have to write them.

This section will teach you:

  • How to build Docker containers for your pipeline

  • How to specify Python package dependencies (just use setup.py!)

  • How to send the container to AWS ECR (if you've set up your AWS credentials)

The Disdat dockerizer builds a container based on your project's setup.py and installs any Python dependencies it finds in that file. If your project can create a source distribution via python setup.py sdist, then you can use Disdat to dockerize your pipeline.
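
Before dockerizing, it can help to confirm that the project really does produce a source distribution. The following is a minimal sketch, not part of Disdat: the helper name can_build_sdist is hypothetical, and it assumes setuptools is available to the interpreter running it. It runs the sdist command into a throwaway directory and reports whether an archive came out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def can_build_sdist(project_dir):
    """Return True if setup.py's sdist command produces an archive.

    Hypothetical helper for illustration; Disdat does not ship it.
    """
    project = Path(project_dir)
    if not (project / "setup.py").is_file():
        return False
    # Build into a temporary directory so no dist/ folder is left behind.
    with tempfile.TemporaryDirectory() as dist_dir:
        result = subprocess.run(
            [sys.executable, "setup.py", "sdist", "--dist-dir", dist_dir],
            cwd=project,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Success means a zero exit code and at least one file written.
        return result.returncode == 0 and any(Path(dist_dir).iterdir())


if __name__ == "__main__":
    print(can_build_sdist("."))
```

If this prints False for your project directory, fix setup.py before trying to dockerize.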

Pro Tip: Does your project depend on packages in your organization's own PyPI server? If so, you'll want to create a pip.conf file and then refer to it in the Disdat configuration file.
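
For readers who have not written a pip.conf before, here is a rough sketch of generating one. The function name render_pip_conf and the URL https://pypi.example.com/simple are placeholders, not real values. The [global] section and the extra-index-url key are standard pip settings. How the Disdat configuration file points at the resulting pip.conf is not covered by this sketch.

```python
import configparser
import io


def render_pip_conf(extra_index_url):
    """Return pip.conf text that adds one extra package index.

    Hypothetical helper; the URL passed in is up to your organization.
    """
    # Disable interpolation so '%' characters in URLs are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"extra-index-url": extra_index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder URL: replace with your private index.
    print(render_pip_conf("https://pypi.example.com/simple"))
```

Save the printed text as pip.conf (the build log in the next section shows one being picked up from ~/.pip/pip.conf).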

Build the container

  1. Have you installed Docker on your dev box? Do that first!

  2. Change into your project's directory.

  3. Run dsdt dockerize <your project's directory>

$ dsdt dockerize .
Copying dot file /Users/kyocum/.pip/pip.conf into /var/folders/qv/3rgd_4_569s_8x9xftx96m40zcg1xn/T/tmpx8j7e2hxdockerize
---------- Building base operating system environment
docker build \
		--build-arg KICKSTART_ROOT=/opt/kickstart \
		--build-arg CONDA_VERSION=NO_CONDA \
		--build-arg VIRTUAL_ENV=/opt/python-virtualenv \
		--file /var/folders/qv/3rgd_4_569s_8x9xftx96m40zcg1xn/T/tmpx8j7e2hxdockerize/Dockerfiles/00-disdat-python-3.6.8-slim.dockerfile \
		--tag disdat-python-3.6.8-slim \
		/var/folders/qv/3rgd_4_569s_8x9xftx96m40zcg1xn/T/tmpx8j7e2hxdockerize
Sending build context to Docker daemon  11.94MB
Step 1/11 : FROM python:3.6.8-slim

[ . . . a lot of other output . . . ]

Step 27/28 : ENTRYPOINT [ "/opt/bin/entrypoint.py" ]
 ---> Running in ca2eb595db0f
Removing intermediate container ca2eb595db0f
 ---> 28ffbf8232dd
Step 28/28 : CMD [ "--help" ]
 ---> Running in 3d268a339452
Removing intermediate container 3d268a339452
 ---> 64a7eb830096
Successfully built 64a7eb830096
Successfully tagged disdat-examples:latest
----- Built Docker image for the disdat-examples pipeline on python-3.6.8-slim
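
The three steps above can also be scripted. This is a sketch under stated assumptions: preflight_problems and dockerize are hypothetical wrapper names, and the only Disdat command used is the dsdt dockerize call shown above. It checks that Docker and dsdt are on the PATH and that the directory has a setup.py before starting the build.

```python
import shutil
import subprocess
from pathlib import Path


def preflight_problems(project_dir):
    """List reasons why `dsdt dockerize` is likely to fail for project_dir."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker is not on PATH; install Docker first")
    if shutil.which("dsdt") is None:
        problems.append("dsdt is not on PATH; install Disdat first")
    if not (Path(project_dir) / "setup.py").is_file():
        problems.append("no setup.py found in " + str(project_dir))
    return problems


def dockerize(project_dir="."):
    """Run `dsdt dockerize <project_dir>` after the preflight checks pass."""
    problems = preflight_problems(project_dir)
    if problems:
        raise RuntimeError("; ".join(problems))
    # check=True raises CalledProcessError if the build itself fails.
    subprocess.run(["dsdt", "dockerize", str(project_dir)], check=True)


if __name__ == "__main__":
    dockerize(".")
```

Failing early on a missing prerequisite is cheaper than discovering it partway through a long image build.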

Disdat builds one container per git repository. Thus one container can be used to run all the pipelines you define in that repository.

Check that your container is now registered with Docker:

$ docker images
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
disdat-examples                   latest              64a7eb830096        2 minutes ago       763MB
disdat-python-3.6.8-slim-python   latest              79e5b4a09c12        3 minutes ago       687MB
disdat-python-3.6.8-slim          latest              810c82af94c8        3 minutes ago       412MB
python                            3.6.8-slim          73ba0dc9fc6c        7 months ago        138MB
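
If you would rather check for the image from a script than read the table by eye, something like the following could work. It is a sketch only: parse_docker_images and image_is_registered are hypothetical names, and the parser assumes the default column layout of docker images shown above, with the repository and tag in the first two columns.

```python
import subprocess


def parse_docker_images(listing):
    """Return the set of (repository, tag) pairs in `docker images` output."""
    lines = listing.strip().splitlines()[1:]  # skip the header row
    rows = [line.split() for line in lines]
    return {(fields[0], fields[1]) for fields in rows if len(fields) >= 2}


def image_is_registered(repository, tag="latest"):
    """Ask the local Docker daemon whether repository:tag exists."""
    listing = subprocess.run(
        ["docker", "images"], capture_output=True, text=True, check=True
    ).stdout
    return (repository, tag) in parse_docker_images(listing)


if __name__ == "__main__":
    print(image_is_registered("disdat-examples"))
```

Keeping the parsing separate from the Docker call means the parser can be exercised on saved output without a running daemon.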

Note: Disdat names your container based on the name field in your setup.py file.

from setuptools import setup, find_packages


setup(
    name='disdat-examples',
    version='0.0.1rc0',

    packages=find_packages(),
    include_package_data=True,

    install_requires=[
        'disdat>=0.8.16',
        'jupyter',
        'pandas'
    ]
)

Pro Tip: By default, we build containers based on Python 3.6.8-slim. If you're in desperate need of Python 2.7, something has gone wrong with your dev process. If you're in desperate need of a Python version newer than 3.6.8, you'll need to make a PR to the project that adds a Dockerfile for another slim version, modeled on the existing one (the build log above uses 00-disdat-python-3.6.8-slim.dockerfile).
