Run the Pipeline Container (AWS)

This section will teach you:

  • How to configure Disdat with an ECR prefix and the name of your AWS Batch job queue

  • How to execute your container on Batch

  • How to look at your resulting bundles

Have you set up your AWS credentials? You will also need to stand up AWS Batch. That means creating 1) a compute environment and 2) a Batch job submission queue.

You then need to add your AWS Batch job queue name to the Disdat configuration file.
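For reference, here is a sketch of what those entries can look like in the Disdat configuration file (typically ~/.config/disdat/disdat.cfg). The section and key names below are assumptions drawn from a typical install, so verify them against the file Disdat generated for you; the registry, repository prefix, and queue values are taken from the example output later in this section.

[docker]
# ECR registry and repository prefix used when pushing the container
registry = 41235552.dkr.ecr.us-west-2.amazonaws.com
repository_prefix = kyocum/test

[run]
# AWS Batch job queue that `dsdt run --backend AWSBatch` submits to
aws_batch_queue = disdat-batch-queue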

We're still assuming you're running in (switched into) your examples context that has a remote attached:

$dsdt context
*	examples	[examples@s3://disdat-prod/context]

Push the container to ECR

Let's push our container up to AWS ECR. We use the same dockerize command, passing --no-build because you already built the container in the prior step.

$dsdt dockerize --no-build --push .

You should see a series of transfer status updates as Docker pushes the image layers to ECR.
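If you want to confirm the image actually landed in ECR, the AWS CLI can list it. The repository name below is taken from the example job definition output later in this section; substitute your own.

$aws ecr describe-images --repository-name kyocum/test/disdat-examples --query 'imageDetails[*].imageTags'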

Submit the job to Batch

$dsdt run --backend AWSBatch . pipelines.dependent_tasks.B
Re-using prior AWS Batch run job definition : {'jobDefinitionName': 'kyocum-disdat-examples-job-definition', 'jobDefinitionArn': 'arn:aws:batch:us-west-2:48135127292:job-definition/kyocum-disdat-examples-job-definition:1', 'revision': 1, 'status': 'ACTIVE', 'type': 'container', 'parameters': {}, 'containerProperties': {'image': '41235552.dkr.ecr.us-west-2.amazonaws.com/kyocum/test/disdat-examples', 'vcpus': 2, 'memory': 4000, 'command': [], 'volumes': [], 'environment': [], 'mountPoints': [], 'ulimits': [], 'resourceRequirements': []}}
Job disdat-examples-1579040780 (ID 08c213a6-9fb2-468c-99c1-a72f52534dcd) with definition kyocum-disdat-examples-job-definition:1 submitted to AWS Batch queue disdat-batch-queue

If you log in to your AWS account, you should see the submitted job listed in the AWS Batch console.

Once the job moves to the SUCCEEDED state, the container has run successfully. If it didn't, you can click on the job in the console and follow the links through to its CloudWatch logs.
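If you prefer to watch from the command line instead of the console, the AWS CLI reports the same information. The job ID below is the one from the submission output above; /aws/batch/job is the default log group Batch writes to, and aws logs tail requires AWS CLI v2.

$aws batch describe-jobs --jobs 08c213a6-9fb2-468c-99c1-a72f52534dcd --query 'jobs[*].status'
$aws logs tail /aws/batch/job --follow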

Let's grab our results from the remote context

$dsdt pull
Fast Pull synchronizing with remote context test@s3://disdat-prod-111461292-us-west-2/beta/context
Fast pull fetching 4 objects...
Fast pull complete -- thread pool closed and joined.
$dsdt ls -v
NAME                	PROC_NAME           	OWNER   	DATE              	COMMITTED	UUID                                    	TAGS
b                   	B__99914b932b       	root    	01-14-20 14:33:29 	True    	a0ae2be4-d496-4b0f-94ac-6a53d612c0ad
a                   	A__99914b932b       	root    	01-14-20 14:33:29 	True    	4d5a1f57-30fa-4c1a-b2fe-8b23fceb7103
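To peek at what a pulled bundle actually contains (not just its metadata), you can print it. This assumes your Disdat release includes the cat subcommand; if not, dsdt ls -v b at least confirms the bundle is present locally.

$dsdt cat b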

Yay! We pushed a container, ran it up on AWS, and then grabbed all of our results!

Continuing on!

There's a lot more to know about how Disdat manages bundles when it runs pipelines remotely, and about how to control the size of the container instance for your job.

  • I need to know more details about building pipelines (using dependencies, return types, etc.)

  • I need more information on running in AWS

  • I need to build containers with unix*, R, or other dependencies
