
Overview

In this tutorial, we explore how Pegasus simplifies docking several ligands against a given protein. Once a Pegasus workflow is established for a sample case, we can reuse it for docking calculations with different ligand libraries and proteins. In the long run, the workflow approach saves a lot of time when adding ligands to a library or setting up calculations for a new set of receptors and ligands.

Docking with library of ligands

We want to screen a library of ligands using AutoDock Vina and the Pegasus workflow manager. The files needed to run the example are available through the tutorial command.

 

 

Let us take a look at the files inside the directory "tutorial-pegasus-vina". The following are the Vina input files.

The ligands are stored in the "input_ligands" directory.
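To make the idea of screening a ligand library concrete, the sketch below builds one AutoDock Vina command line per ligand file found in a directory. It is only an illustration: the file names "receptor.pdbqt" and "conf.txt" and the "_out.pdbqt" naming pattern are assumptions, not necessarily the names used in the tutorial directory. The flags --config, --receptor, --ligand and --out are standard Vina command-line options.

```python
from pathlib import Path


def build_vina_commands(ligand_dir, receptor="receptor.pdbqt", config="conf.txt"):
    """Return one Vina command (as an argument list) per ligand in ligand_dir.

    The receptor and config file names are hypothetical defaults.
    """
    commands = []
    # Sort so that the order of the jobs is reproducible from run to run.
    for ligand in sorted(Path(ligand_dir).glob("*.pdbqt")):
        out_name = f"{ligand.stem}_out.pdbqt"  # assumed output naming pattern
        commands.append([
            "vina",
            "--config", config,
            "--receptor", receptor,
            "--ligand", str(ligand),
            "--out", out_name,
        ])
    return commands
```

Calling build_vina_commands("input_ligands") returns a list of argument lists. Each one corresponds to a single docking run, which is the unit of work that the workflow manager schedules.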

 

The following files are related to Pegasus workflow management:

 

The file "pegasusrc" contains the Pegasus configuration. We can simply keep this file in the current working directory without worrying much about its contents (if you would like to know the details, please visit the Pegasus home page). The files dax.xml and sites.xml contain the information about the workflow and data management.

Submit script

Let us look at a few parts of the "submit" script to understand how the workflow is submitted. Open the file "submit" and take a look.

sites-generator  

The purpose of the sites-generator script is to generate the sites.xml file. Of the several lines declared in the script, the ones we need to understand are those defining the scratch and output directories.

The files "submit.bash" and "sites-generator.bash" change very little for a new workflow. We need to edit these two files when we change the name of the dax-generator or the paths of the outputs, scratch and workflows directories.
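As a rough picture of what a sites generator produces, the sketch below writes a minimal site catalog with one scratch directory and one output directory. It is written in Python only for illustration (the tutorial's generator is a bash script), and it is a simplified assumption of the layout: the element names follow the Pegasus 4.x site catalog format, but the real sites.xml carries additional namespace attributes and profiles that are omitted here. The function name make_sites_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_sites_xml(scratch_dir, output_dir):
    """Return a simplified site catalog as an XML string.

    Illustrative only: real catalogs include schema attributes and profiles.
    """
    root = ET.Element("sitecatalog", version="4.0")
    site = ET.SubElement(root, "site", handle="local")
    # One directory entry for scratch space, one for the final outputs.
    for dir_type, path in (("shared-scratch", scratch_dir),
                           ("local-storage", output_dir)):
        directory = ET.SubElement(site, "directory", type=dir_type, path=path)
        ET.SubElement(directory, "file-server",
                      operation="all", url="file://" + path)
    return ET.tostring(root, encoding="unicode")
```

Changing the two path arguments is the counterpart of editing the scratch and output lines in the sites-generator script.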

DAX generator 

The file dax.xml contains the workflow information, including the description of the jobs and the required input files. We could write dax.xml by hand, but the XML format is tedious for a human to read and edit. Here, dax.xml is generated by the Python script "dax-generator-singleJob.py". Take a look at the script; it is largely self-explanatory and heavily commented. If you have difficulty understanding it, please feel free to send us an email. Here is a brief description of the dax-generator Python script.
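To show the kind of XML such a generator emits, here is a minimal sketch that writes one job entry per ligand using only the Python standard library. It is not the tutorial's "dax-generator-singleJob.py" and does not use the Pegasus DAX API; the element names (adag, job, argument, uses) are loosely modeled on the DAX format, and the function name, file names and job layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def make_dax(ligands, receptor="receptor.pdbqt", config="conf.txt"):
    """Return a simplified DAX-like XML string with one Vina job per ligand.

    Illustrative only: a real DAX has schema attributes and file elements
    inside <argument> that are left out here.
    """
    adag = ET.Element("adag", name="vina-docking")
    for index, ligand in enumerate(ligands, start=1):
        job = ET.SubElement(adag, "job", id=f"ID{index:07d}", name="vina")
        out_name = ligand.rsplit(".", 1)[0] + "_out.pdbqt"
        argument = ET.SubElement(job, "argument")
        argument.text = (f"--config {config} --receptor {receptor} "
                         f"--ligand {ligand} --out {out_name}")
        # Declare which files the job reads and which file it writes.
        for name in (config, receptor, ligand):
            ET.SubElement(job, "uses", name=name, link="input")
        ET.SubElement(job, "uses", name=out_name, link="output")
    return ET.tostring(adag, encoding="unicode")
```

The point of the sketch is the structure: each job names its executable, its arguments, and the input and output files, which is the information the workflow manager needs to stage data and run the job.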

Job submission and status

To submit the job

To check the status of the submitted job

Pegasus creates the following directories

The paths of the scratch, workflows and outputs directories are declared in the "submit" script at lines 19, 20, 25, 26 and 47.

Key points

  • Pegasus requires the dax.xml, sites.xml and pegasusrc files. These files describe the executables, the input and output files, and the relations between them when the jobs are executed.
  • It is convenient to generate the XML files via scripts. In our example, dax.xml is generated by a Python script and sites.xml is generated by a bash script.
  • To implement a new workflow, edit the existing dax-generator, sites-generator and submit scripts. In the above examples, we modified the workflow for the single NAMD job to implement the workflows of N-sequential and M-parallel, N-sequential jobs.

References

  1. Pegasus documentation

  2. OSG Connect QuickStart

 

For further assistance or questions, please email connect-support@opensciencegrid.org.
