Overview

Suppose you have access to a cluster at your home institution. You normally log in to that cluster and submit jobs to its local queue with your local identity. You can connect this cluster and local identity to OSG Connect. Once you have done that, you can submit and monitor jobs in the normal way, using condor_submit and condor_q from login.osgconnect.net.

To make this possible, OSG Connect uses the OSG BOSCO technology. Remote clusters connected this way are called BOSCO resources, and login.osgconnect.net is your BOSCO submit node. When you connect a BOSCO resource, only you will be able to send jobs to it, and the jobs will run there under your identity, exactly as if you had logged in with ssh and submitted them there.

Requirements

The BOSCO resource you want to connect must satisfy the following requirements:

  • You must be able to ssh to it (using username/password or ssh keys)
  • The resource must run a BOSCO-supported platform (currently RHEL5, RHEL6, Debian 6, or any of their derivatives, 64-bit versions only)
  • The resource must run one of the following queue managers:
    • PBS flavors (Torque and PBSPro)
    • HTCondor (7.6 or later)
    • SGE (Sun Grid Engine)
    • LSF
    • SLURM (with Torque/PBS command wrappers installed)

Connecting your resource

If the requirements above are satisfied you can go ahead and connect the resource.

You need to know the following and replace them in the commands below:

  • USER_NAME, your user name (login name) on the BOSCO resource
  • HOSTNAME.DOMAIN, the fully qualified host name of the BOSCO resource
  • QUEUE_MGR, the queue manager used on the BOSCO resource. This is the program used to submit jobs on the BOSCO resource. If you don't know what it is, ask the administrator of that cluster. For HTCondor use condor, for LSF use lsf, for SGE (or Open Grid or other Grid Engine versions) use sge, and for PBS (Torque and PBSPro) or SLURM use pbs.

Type:

bosco_cluster -a USER_NAME@HOSTNAME.DOMAIN QUEUE_MGR

BOSCO may then ask you to add the host key to the known hosts (answer yes) and will prompt you for the password that you normally enter when you log in to HOSTNAME.DOMAIN. This setup process adds some files to your home directory on login.osgconnect.net and creates a bosco folder in your home directory on the BOSCO resource. Once the setup completes, BOSCO prints two lines that you have to add to your submit files to send jobs to this resource (see "Submitting jobs to your resource"). Take note of these two lines.
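
The choice of QUEUE_MGR described above can be sketched as a small shell helper. This is only an illustration: the function names queue_mgr_for and bosco_add_command, and the user and host values, are placeholders invented here and are not part of BOSCO. The helper prints the bosco_cluster command rather than running it, so you can check it first.

```shell
# Map a scheduler name to the QUEUE_MGR keyword listed in the section above.
# (Hypothetical helper, not a BOSCO command.)
queue_mgr_for() {
    local name
    # Lower-case the argument so "SLURM", "Slurm" and "slurm" all match.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        htcondor|condor)         echo condor ;;
        lsf)                     echo lsf ;;
        sge)                     echo sge ;;
        pbs|torque|pbspro|slurm) echo pbs ;;
        *) echo "unsupported queue manager: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the bosco_cluster command for a user, host and scheduler.
bosco_add_command() {
    local user="$1" host="$2" mgr
    mgr=$(queue_mgr_for "$3") || return 1
    echo "bosco_cluster -a ${user}@${host} ${mgr}"
}

# Example with placeholder values; replace them with your own.
bosco_add_command jdoe login2.mydomain.org SLURM
```

With the placeholder values shown, the last line prints a command that ends in pbs, because SLURM resources are added through the PBS wrappers.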


You must be able to log in to the remote cluster. If password authentication is allowed, the script will ask you for your password. If only key-based login is allowed, you must load your key into the ssh-agent (for example with ssh-add) before running bosco_cluster, and it is worth testing the login first.
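
A minimal sketch of loading a key and testing the login, assuming the standard OpenSSH tools (ssh-agent, ssh-add, ssh) are installed. The function name load_key_and_test_login and the key path in the usage line are placeholders chosen for this example.

```shell
# Start an ssh-agent for this shell, load a key, and test a key-only login.
# Arguments: path to the private key, USER_NAME@HOSTNAME.DOMAIN.
load_key_and_test_login() {
    local key_file="$1" target="$2"
    # ssh-agent -s prints shell commands that export the agent's socket.
    eval "$(ssh-agent -s)" > /dev/null
    # ssh-add asks for the key passphrase if the key has one.
    ssh-add "$key_file" || return 1
    # BatchMode=yes disables password prompts, so this succeeds only
    # if the key loaded above is accepted by the remote host.
    ssh -o BatchMode=yes "$target" hostname -f
}

# Usage (placeholders): load_key_and_test_login ~/.ssh/id_rsa USER_NAME@HOSTNAME.DOMAIN
```

If the last command prints the remote host name without asking for a password, the key is loaded and bosco_cluster should be able to use it from the same shell.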


Some clusters have multiple login nodes behind a round-robin DNS alias. You can recognize them because, when you log in to the node (e.g. ssh login.mydomain.org), it reports a name different from the one you used to connect (e.g. hostname -f returns login2.mydomain.org). If this happens, add the BOSCO resource using the name of a specific host, not the DNS alias (e.g. bosco_cluster --add login2.mydomain.org). This is because these login nodes sometimes do not share all directories, and BOSCO may be unable to find its files if different connections land on different hosts.
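
The check described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that hostname -f on the login node returns a name you can connect to directly. It prints the bosco_cluster command instead of running it.

```shell
# Ask the node that answers behind a login alias for its own host name.
real_login_host() {
    local user="$1" alias_name="$2"
    ssh "${user}@${alias_name}" hostname -f
}

# Print the bosco_cluster command using the real host rather than the alias.
bosco_add_by_real_host() {
    local user="$1" alias_name="$2" queue_mgr="$3" real
    real=$(real_login_host "$user" "$alias_name") || return 1
    if [ "$real" != "$alias_name" ]; then
        # The alias is probably a round-robin entry; report what was found.
        echo "note: ${alias_name} is served by ${real}" >&2
    fi
    echo "bosco_cluster -a ${user}@${real} ${queue_mgr}"
}

# Usage (placeholders): bosco_add_by_real_host USER_NAME login.mydomain.org QUEUE_MGR
```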


For further options check the BOSCO manual.

You can connect multiple resources to your account. You can list the BOSCO resources currently connected to your account with bosco_cluster -l, and you can remove a BOSCO resource with bosco_cluster -r USER_NAME@HOSTNAME.DOMAIN.

Submitting jobs to your resource

Jobs will not go to your BOSCO resources automatically; you have to send them there explicitly. To submit jobs to a connected BOSCO resource, replace the universe = vanilla line in the submit file with the two lines printed at the end of the connection setup (see "Connecting your resource"). You can also add an optional third line to specify the name of the queue. The three lines look like this:

 

universe = grid
grid_resource = batch QUEUE_MGR USER_NAME@HOSTNAME.DOMAIN
batch_queue = QUEUE_NAME

A submit file for a BOSCO resource is very similar to the one in the quickstart example; the main difference is the first two lines. As in the quickstart example, you need short.sh and the log directory, which you can get by setting up the quickstart tutorial: $ tutorial quickstart; $ cd ~/osg-quickstart. Then edit tutorial01, replacing its universe line with the lines shown above.


Remember to replace QUEUE_MGR, USER_NAME and HOSTNAME.DOMAIN with the values for the resource you just connected!

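
As an illustration, the shell function below writes such a submit file. It is a sketch under stated assumptions: the function name write_bosco_submit is made up for this example, and the output, error and log file names under log/ are hypothetical and may differ from those in the quickstart's tutorial01. Only short.sh, the log directory and the first three lines come from the text above.

```shell
# Write a submit file that sends a job to a BOSCO resource.
# Arguments: output file, QUEUE_MGR, USER_NAME@HOSTNAME.DOMAIN, optional queue name.
write_bosco_submit() {
    local out="$1" mgr="$2" target="$3" queue="$4"
    {
        echo "universe = grid"
        echo "grid_resource = batch ${mgr} ${target}"
        # batch_queue is optional; write it only when a queue name is given.
        if [ -n "$queue" ]; then
            echo "batch_queue = ${queue}"
        fi
        # The file names under log/ are assumptions made for this sketch.
        cat <<'EOF'
executable = short.sh
output = log/job.out
error = log/job.err
log = log/job.log
queue 1
EOF
    } > "$out"
}

# Usage (placeholders): write_bosco_submit tutorial01 QUEUE_MGR USER_NAME@HOSTNAME.DOMAIN QUEUE_NAME
```

After writing the file, you would submit it as usual with condor_submit tutorial01.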

Most options and techniques used in ordinary submit files can also be used in these jobs. One exception is that you cannot use the requirements attribute to select resources, because you send the job explicitly to the BOSCO resource. See the BOSCO manual for ways to pass custom submit properties and to change the maximum number of jobs submitted to a resource.

When you type condor_q these jobs will appear together with all your other jobs running on OSG Connect resources.

 
