There are situations where transferring input data to the worker node via HTCondor's built-in file transfer is not practical - for example, when the required input datasets are larger than the local scratch space available on the remote worker node. One way around this is to create a tarball (a .tar.gz file) and instruct your job to pull it remotely as a secondary payload. This tutorial will show you how to do this using a helper framework named SkeletonKey in conjunction with Parrot and Chirp.
- In the examples used on this page, text in red is a placeholder and will need to be replaced with user-specific information (e.g. username)
- Names of servers are denoted using blue text (e.g. login.osgconnect.net)
- Directory or file names are denoted using green text (e.g. ~/my_file)
Before going through the examples, log in to login.osgconnect.net and set up a work area:
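The login and setup steps might look like the following; the work-area name matches the tutorial folder used throughout this page:

```shell
# Log in to the OSG Connect login node first, e.g.:
#   ssh username@login.osgconnect.net
# Then create and enter a work area for this tutorial:
mkdir -p "$HOME/tutorial-stash-chirp"
cd "$HOME/tutorial-stash-chirp"
```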
All of the files that we ask you to type below are present in the tutorial folder, ~/tutorial-stash-chirp. You may edit them in place instead of typing, or you can type them fresh to reinforce the experience.
Remote data access
This example will guide you through creating a job that will read and write from a filesystem exported by Chirp. Chirp securely exposes a local filesystem path over the network so that remote jobs can access local data. SkeletonKey is a helper for setting up the secure access.
Create the application tarball
A tarball is a single-file archive of one or more files and folders that can be unpacked into its original form, much like a zip file. Tools for working with tarballs are universally available on UNIX/Linux systems, whereas zip/unzip are somewhat less common.
First, create a new folder to contain your payload. You will then use this folder to create your tarball.
Create a shell script, ~/tutorial-stash-chirp/data_app/data_access.sh, with the following lines:
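The original script body is not reproduced here; a minimal sketch that writes through the Chirp mount could look like this (the output filename job_output is an assumption, not part of the original tutorial):

```shell
# Create the payload directory and the script inside it.
mkdir -p "$HOME/tutorial-stash-chirp/data_app"
cat > "$HOME/tutorial-stash-chirp/data_app/data_access.sh" <<'EOF'
#!/bin/bash
# Append a timestamped line to a file in the Chirp-exported directory.
# SkeletonKey defines $CHIRP_MOUNT as the local mount point of that export.
echo "Job ran at $(date) on $(hostname)" >> "$CHIRP_MOUNT/job_output"
cat "$CHIRP_MOUNT/job_output"
EOF
```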
Notice the use of the $CHIRP_MOUNT environment variable in this script. The SkeletonKey helper defines $CHIRP_MOUNT as the local path to the directory being exported from the Chirp server.
Next, make sure the data_access.sh script is executable and create a tarball:
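For example (the first two lines only recreate a stub data_app/data_access.sh so the snippet is self-contained; on the login node the script from the previous step already exists):

```shell
mkdir -p "$HOME/tutorial-stash-chirp/data_app"
cd "$HOME/tutorial-stash-chirp" && touch data_app/data_access.sh  # stub

chmod +x data_app/data_access.sh      # make the script executable
tar czf data_app.tar.gz data_app/     # pack the directory into a tarball
tar tzf data_app.tar.gz               # list the archive to verify its contents
```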
Then copy the tarball to your public directory in Stash. Ensure that it can be read by anyone:
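On the login node this is a copy plus a permission change, with ~/stash/public being the web-exported directory (the mkdir and stub tarball below exist only to make the snippet self-contained; both are already present after the previous steps):

```shell
mkdir -p "$HOME/stash/public" "$HOME/tutorial-stash-chirp"  # already exist on OSG Connect
cd "$HOME/tutorial-stash-chirp"
touch data_app.tar.gz                             # stub; built for real in the previous step

cp data_app.tar.gz "$HOME/stash/public/"
chmod a+r "$HOME/stash/public/data_app.tar.gz"    # world-readable so the web server can serve it
```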
Note that this makes data_app.tar.gz available via HTTP at http://stash.osgconnect.net/+username/data_app.tar.gz. This illustrates the integration of file access between OSG Connect and Stash, and SkeletonKey will make use of it.
Create a job wrapper
Open a file called ~/tutorial-stash-chirp/data_access.ini and add the following lines, replacing username with the appropriate value. This file is a SkeletonKey configuration profile.
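SkeletonKey's exact configuration keys are defined by its own documentation; purely to illustrate the general shape of such an INI profile, a sketch might look like the following (all section and key names here are hypothetical placeholders, not verified SkeletonKey syntax):

```ini
; data_access.ini -- illustrative sketch only; consult the SkeletonKey
; documentation for the real section and key names.
[Application]
; where the job should fetch the application tarball from
location = http://stash.osgconnect.net/+username/data_app.tar.gz
; the script to run once the tarball is unpacked
script = ./data_app/data_access.sh

[Chirp]
; the local directory to export for remote read/write access
export_dir = /stash/user/username
```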
Run SkeletonKey on ~/tutorial-stash-chirp/data_access.ini. This creates a job wrapper named run_job.py, an executable that you will submit to HTCondor as a job; it performs setup and then invokes your real application.
The job wrapper will virtually mount your ~/stash (/stash/user/username) folder through Parrot and Chirp, and data_access.sh will deposit output there. Even though the job runs locally and is very short, it may take surprisingly long because of the "remote" access setup. In real-world jobs the setup time is negligible compared to the job run time itself.
Verify that the file was written correctly:
The output should match the example above, with the exception of the date and time. Once the output is verified, delete the output file:
Submitting jobs to OSG Connect
Create a file called ~/tutorial-stash-chirp/osg-connect.submit with the following contents, replacing PROJECT_NAME with an appropriate value:
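A minimal submit file along these lines might look as follows (a sketch: the +ProjectName attribute is the usual OSG Connect accounting hook, and any site-specific requirements lines are omitted):

```
# osg-connect.submit -- illustrative sketch; adjust to your site's conventions
universe   = vanilla
executable = run_job.py
output     = job.out.$(Cluster).$(Process)
error      = job.err.$(Cluster).$(Process)
log        = job.log.$(Cluster)
+ProjectName = "PROJECT_NAME"
Queue 100
```

You would then submit it with `condor_submit osg-connect.submit` and watch its progress with `condor_q username`.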
Submit the job to HTCondor. This places 100 instances of the job onto the grid because of the "Queue 100" statement in the submit file:
Verify that the jobs ran successfully: