It is important to know how to transfer data between login.osgconnect.net and the remote worker nodes, because a job's input and output files are transferred to and from the worker machine where the job runs. File transfers to the remote machine are accomplished via HTTP, HTCondor file transfer, or Skeleton Key.
HTCondor has a built-in mechanism to transfer binaries and files to and from compute nodes. If users have relatively small amounts of data and binaries to transfer (<100MB) or need to do ad-hoc job submissions, this mechanism can be effective.
Before getting started, users should log in to login01.osgconnect.net and get a copy of the tutorial files:
Word Distribution Example
This example uses the HTCondor transfer mechanism to send a binary (distribution) and a file containing a list of words (random_words) to the compute nodes that run the jobs. The HTCondor submit file that will be used is shown below:
The key part of the submit file is the transfer_input_files parameter, which gives a comma-separated list of paths to the files that will be transferred. In addition, ShouldTransferFiles needs to be set to YES and when_to_transfer_output needs to be set to ON_EXIT to make sure that HTCondor returns the output.
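As a rough sketch of how these settings fit together, the shell snippet below writes a minimal submit file. Only the binary name, the input file name, and the three transfer settings come from the text above. The file name word_distribution.submit, the output, error, and log paths, and the +ProjectName line are assumptions made for illustration and may differ from the tutorial's actual submit file.

```shell
#!/bin/bash
# Sketch only: writes a hypothetical submit file that illustrates the
# transfer settings described above. The file name, the output/error/log
# paths, and the +ProjectName line are assumptions, not the tutorial's file.
SUBMIT_FILE="${SUBMIT_FILE:-word_distribution.submit}"

# The quoted 'EOF' keeps $(Cluster) and $(Process) literal so that HTCondor,
# not the shell, expands them at submit time.
cat > "$SUBMIT_FILE" <<'EOF'
universe = vanilla

# HTCondor transfers the executable to the compute node by default.
executable = distribution

# Comma-separated list of additional input files to transfer.
transfer_input_files = random_words

# Enable file transfer and return output files when the job exits.
ShouldTransferFiles = YES
when_to_transfer_output = ON_EXIT

output = job.$(Cluster).$(Process).out
error = job.$(Cluster).$(Process).err
log = job.$(Cluster).log

# Placeholder to be replaced with a real project name before submitting.
+ProjectName = "PROJECT_NAME"

queue 1
EOF
```

How the distribution binary reads random_words (as an argument or on standard input) is not covered here, so the sketch leaves arguments and input unset.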
Finally, edit the submit file, replacing PROJECT_NAME with the appropriate value, before submitting it:
When the jobs are completed, verify the output:
Before getting started, users should log in to login01.osgconnect.net and get a copy of the tutorial files:
Making data accessible over HTTP
All user accounts on OSG-Connect have a directory that is automatically web accessible, located at ~/data/public. To make a file or directory accessible, copy it to this directory or one of its subdirectories, then give files permissions of 644 and directories permissions of 755. For example:
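A minimal sketch of these steps, assuming a bash shell: the helper name publish is hypothetical, and the destination defaults to the ~/data/public directory named above. It copies a file or directory into place, then sets 755 on directories and 644 on regular files.

```shell
#!/bin/bash
# publish SRC [PUBLIC_DIR]
# Hypothetical helper: copy SRC (a file or directory) into the web-accessible
# directory and apply the permissions described above.
publish() {
    local src="$1"
    local public_dir="${2:-$HOME/data/public}"
    local dest
    dest="$public_dir/$(basename "$src")"

    # Copy the file or directory tree into the public directory.
    cp -r "$src" "$public_dir/" || return 1

    # Directories must be 755 so the web server can list and traverse them.
    find "$dest" -type d -exec chmod 755 {} +
    # Regular files must be 644 so the web server can read them.
    find "$dest" -type f -exec chmod 644 {} +
}

# Example (run from the directory that holds the data file):
#   publish random_words
```

Using find for the two chmod passes keeps files and directories separate, so a recursive chmod never marks data files as executable.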
Accessing data from stash over HTTP within jobs
This example shows how jobs running on OSG can access data in stash over HTTP. The primary component of this example is the shell script that runs on the compute node: it downloads the random_words data file and then generates a histogram of the most common words found in the file. Before running this example, edit app_script.sh to replace username with the user's username:
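The download-then-histogram pattern described above might look roughly like the sketch below. This is not the tutorial's app_script.sh: the function names fetch_words and word_histogram are hypothetical, the URL is left for the caller to supply, and standard text utilities stand in for whatever histogram tool the tutorial script actually uses.

```shell
#!/bin/bash
# Hypothetical sketch of a job script that fetches a data file over HTTP and
# summarizes it. It is not the tutorial's app_script.sh.

# fetch_words URL OUTFILE
# Download the data file from the user's web-accessible stash area.
# The URL is not specified here and must be supplied by the caller.
fetch_words() {
    wget -q -O "$2" "$1"
}

# word_histogram FILE [N]
# Print the N most common words in FILE (default 10), one "count word" pair
# per line, most frequent first.
word_histogram() {
    local file="$1"
    local top="${2:-10}"
    tr -s '[:space:]' '\n' < "$file" \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2 \
        | head -n "$top"
}

# Example, with DATA_URL set to the HTTP location of random_words:
#   fetch_words "$DATA_URL" random_words && word_histogram random_words 10
```

Separating the download from the counting lets the counting step be checked on the login node against a local copy of the file before any jobs are submitted.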
Next, edit the application/application.submit file and replace PROJECT_NAME with the appropriate project name:
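The edit can be made in any text editor. For those who prefer to script it, here is a small sketch, assuming the placeholder appears literally as PROJECT_NAME in the file. The helper name set_project and the project name MyProject in the usage comment are hypothetical.

```shell
#!/bin/bash
# set_project FILE PROJECT
# Hypothetical helper: replace every literal PROJECT_NAME in FILE with PROJECT.
# Assumes PROJECT contains no '|' or '&' characters, which sed would interpret.
set_project() {
    local file="$1"
    local project="$2"
    local tmp
    tmp="$(mktemp)" || return 1

    # Write the edited copy to a temporary file, then copy it back over the
    # original so the original file's permissions are preserved.
    if sed "s|PROJECT_NAME|$project|g" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Example:
#   set_project application/application.submit MyProject
```

Running grep PROJECT_NAME application/application.submit afterwards should print nothing if the replacement succeeded.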
Once that change has been made, submit the file:
Once the jobs are completed, users can look at the output in the logs directory and verify that the job ran correctly: