In this tutorial, we will explore how Pegasus simplifies the workflow of docking with a library of ligands. Once a Pegasus workflow is established for a sample case, we can reuse it for docking calculations with different ligand libraries and proteins. In the long run, the Pegasus workflow approach saves a lot of time when adding ligands to the library or setting up calculations for sets of receptors and ligands.
Docking with a library of ligands
We want to screen a library of ligands using AutoDock Vina with a Pegasus workflow. The files needed to run the example are available via the tutorial command.
Let us take a look at the files inside the directory "tutorial-pegasus-vina". The following are the Vina input files.
The ligands are stored in the "input_ligands" directory.
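As a minimal sketch of how a workflow script might collect the ligand library, the Python function below gathers every file in "input_ligands" that matches a pattern. The ".pdbqt" extension and the function name list_ligands are assumptions for illustration, not part of the tutorial files.

```python
from pathlib import Path


def list_ligands(ligand_dir="input_ligands", pattern="*.pdbqt"):
    """Return the ligand files in ligand_dir, sorted by name.

    Assumes (hypothetically) that each ligand is stored as its own PDBQT
    file; change `pattern` if your library uses another extension.
    A missing directory simply yields an empty list.
    """
    return sorted(p for p in Path(ligand_dir).glob(pattern) if p.is_file())
```

Because the list is rebuilt from the directory contents each time, adding a ligand to the library only requires copying its file into "input_ligands" before regenerating the workflow.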
The following files are related to Pegasus workflow management:
The file "pegasusrc" contains the Pegasus configuration information. We can simply keep this file in the current working directory without worrying much about the details (if you would like to know the details, please visit the Pegasus home page). The files dax.xml and sites.xml contain the information about the workflow and data management.
Let us look at a few parts of the "submit" script to understand how the workflow is submitted. Open the file "submit" and take a look:
The purpose of the sites-generator script is to generate the sites.xml file. The script declares several settings; the ones we need to understand are the lines defining the scratch and output directories.
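To make the role of the scratch and output directories concrete, here is a Python sketch of the kind of file a sites generator produces. The tutorial's own generator is a bash script; this version, the function name make_sites_xml, and the site attributes (handle "local", arch "x86_64", os "LINUX") are assumptions modeled on the Pegasus 4.x site catalog format. Compare with the sites.xml generated in the tutorial directory for the exact contents.

```python
import xml.etree.ElementTree as ET

# Site catalog namespace used by Pegasus 4.x (assumption: confirm the
# schema version that your Pegasus installation expects).
SC_NS = "http://pegasus.isi.edu/schema/sitecatalog"


def make_sites_xml(scratch_dir, output_dir, handle="local"):
    """Return a minimal, hypothetical site catalog as an XML string.

    The single site declares a shared-scratch directory, where Pegasus
    stages the workflow's working files, and a local-storage directory,
    where the final outputs are placed. Both paths should be absolute.
    """
    catalog = ET.Element("sitecatalog", {"xmlns": SC_NS, "version": "4.0"})
    site = ET.SubElement(
        catalog, "site", {"handle": handle, "arch": "x86_64", "os": "LINUX"}
    )
    for dir_type, path in (("shared-scratch", scratch_dir),
                           ("local-storage", output_dir)):
        directory = ET.SubElement(site, "directory", {"type": dir_type, "path": path})
        # A file:// URL points the file server at the local filesystem.
        ET.SubElement(directory, "file-server",
                      {"operation": "all", "url": "file://" + path})
    return ET.tostring(catalog, encoding="unicode")
```

Generating the file from two path arguments mirrors why only the scratch and output lines need attention: everything else in the catalog stays the same from one workflow to the next.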
The files "submit.bash" and "sites-generator.bash" will not change much for a new workflow. We need to edit these two files only when we change the name of the dax-generator and/or the paths of the outputs, scratch and workflows directories.
The file dax.xml contains the workflow information, including the description of the jobs and the required input files. We could write dax.xml by hand, but XML is tedious to read and write. Here, dax.xml is generated by the Python script "dax-generator-singleJob.py". Take a look at the script; it is self-explanatory, with lots of comments. If you have difficulty understanding the script, please feel free to send us an email. Here is a brief description of the dax-generator Python script.
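The core idea of such a generator is a fan-out: one independent Vina job per ligand, all sharing the same receptor and configuration file. The plain-Python sketch below models only that logic; it does not use the Pegasus API, and the receptor and config file names, the VinaJob class, and the job-ID format are hypothetical. The flags are the standard AutoDock Vina 1.1.x command-line options.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import List


@dataclass
class VinaJob:
    """One independent docking job: the receptor docked against one ligand."""
    job_id: str
    arguments: List[str]
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def build_jobs(ligands, receptor="receptor.pdbqt", config="vina_config.txt"):
    """Create one VinaJob per ligand (receptor/config names are placeholders)."""
    jobs = []
    for i, ligand in enumerate(ligands):
        name = PurePath(ligand).name   # file name without the directory part
        stem = PurePath(ligand).stem   # file name without the extension
        out_pose = f"{stem}_out.pdbqt"
        log = f"{stem}.log"
        args = ["--receptor", receptor, "--ligand", name, "--config", config,
                "--out", out_pose, "--log", log]
        jobs.append(VinaJob(job_id=f"ID{i + 1:07d}", arguments=args,
                            inputs=[receptor, config, name],
                            outputs=[out_pose, log]))
    return jobs
```

Because the jobs do not depend on one another, Pegasus is free to run them in parallel; in the real generator, each job would be added to the abstract workflow together with its declared input and output files.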
Job submission and status
To submit the job:
To check the status of the submitted job:
Pegasus creates the following directories:
The paths of the scratch, workflows and outputs directories are declared in the "submit" scripts at lines 19, 20, 25, 26 and 47.
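Once the workflow finishes, the results collected in the outputs directory can be summarized. The sketch below ranks ligands by the affinity of their top-scoring binding mode. It assumes that each job leaves a Vina 1.1.x log named after its ligand (for example "lig1.log") in the outputs directory; the file naming and the function names are assumptions, not part of the tutorial.

```python
import re
from pathlib import Path

# A result row in a Vina 1.1.x score table (log file or stdout) looks like:
#    1         -7.2      0.000      0.000
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$")


def best_affinity(log_text):
    """Return the affinity (kcal/mol) of binding mode 1, or None if absent."""
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match and match.group(1) == "1":
            return float(match.group(2))
    return None


def rank_ligands(output_dir, pattern="*.log"):
    """Return (affinity, ligand) pairs, most negative (strongest) first."""
    scores = []
    for log in Path(output_dir).glob(pattern):
        score = best_affinity(log.read_text())
        if score is not None:   # skip logs without a score table
            scores.append((score, log.stem))
    return sorted(scores)
```

Sorting ascending puts the most negative, and therefore most favorable, predicted affinities at the top of the list.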
- Pegasus requires the dax.xml, sites.xml and pegasusrc files. Together, these files describe the executables, the input and output files, and the relations between them when the jobs are executed.
- It is convenient to generate the XML files via scripts. In our example, dax.xml is generated by a Python script and sites.xml by a bash script.
- To implement a new workflow, edit the existing dax-generator, sites-generator and submit scripts. In the above examples, we modified the workflow for the single NAMD job to implement workflows of N sequential jobs and of M parallel sets of N sequential jobs.
Pegasus Documentation. The Pegasus documentation page.
OSG QuickStart. Getting started with the Open Science Grid (OSG).
Condor Manual. Manual for HTCondor, the high-throughput computing software that schedules the jobs on OSG.
For further assistance or questions, please email email@example.com.