Setup and Configuration
Most Disdat configuration concerns AWS resources: S3 for storing data contexts and bundles, and AWS Batch for running Disdat container pipelines. Disdat also has its own configuration file, which is covered below.
Setting up for AWS
Disdat uses AWS S3 as its "backing store" for each data context (and its bundles). If you want to create remotes for your local contexts, and then push and pull bundles to and from S3, you'll need to set up AWS. First you need an AWS account; then set up your AWS credentials:
1. Install the AWS CLI in your Python virtual environment via pip install awscli (optional, but useful for setting up credentials in step 2).
2. Place your AWS credentials in your ~/.aws/credentials file (AWS instructions).
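A minimal sketch of these two steps, assuming you are working inside an activated virtual environment: the AWS CLI's aws configure command prompts for your keys and writes the credentials file for you.

```bash
# Step 1: install the AWS CLI into the active virtual environment
pip install awscli

# Step 2: write ~/.aws/credentials interactively
# (prompts for access key ID, secret access key, default region, and output format)
aws configure
```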
Setting up Docker
Disdat can take your pipeline and create a Docker container in which it runs. To do so (and to be able to run dsdt dockerize), you need to install Docker on your system:
Mac: Install Docker Desktop
Unix (Ubuntu): Install via apt (instructions)
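Once Docker is installed, a quick sanity check looks like the following; the dsdt dockerize invocation mirrors the usage mentioned above and assumes your pipeline lives in the current directory:

```bash
# Confirm Docker is installed and the daemon is running
docker info

# Build a container for the Disdat pipeline in the current directory
dsdt dockerize .
```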
Disdat Configuration File
Disdat stores its configuration in ~/.config/disdat/disdat.cfg. Running dsdt init creates this configuration file for you.
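For example, a fresh setup is just:

```bash
# Set up Disdat and create ~/.config/disdat/disdat.cfg with default settings
dsdt init
```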
You don't need to touch the configuration file unless you're going to Dockerize your pipeline and then submit jobs to AWS Batch or SageMaker. If you do, you should:
Set a repository_prefix. This will be a prefix for your ECR Docker images and AWS Batch job definitions.
Does your pipeline need custom packages? If so, point it at a pip.conf.
Set aws_batch_queue to the name of your AWS Batch queue (a sketch of these settings follows below).
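For illustration only, here is a sketch of what those entries might look like; the section names below are assumptions, so match them against the layout of the file that dsdt init actually generates:

```
; ~/.config/disdat/disdat.cfg (sketch; section names are illustrative)
[docker]
repository_prefix = mycompany/disdat

[run]
aws_batch_queue = my-disdat-batch-queue
```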
Developer Install:
Note: in this case we clone the repo and assume you have already activated your virtual environment. We then do an *editable* install of the Disdat repo. Finally, we run a bash script that creates some files that are necessary if you wish to use the Disdat Dockerizer (see the sketch below).
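A sketch of that sequence; the repository URL and the script name are assumptions, so verify both against the project's README:

```bash
# Clone the Disdat repository (URL assumed; check the project README)
git clone https://github.com/kyocum/disdat.git
cd disdat

# Editable install into your already-activated virtual environment
pip install -e .

# Run the repo's bash script that generates files needed by the Dockerizer
# (script name is hypothetical; use the one documented in the repository)
./build-dist.sh
```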