
Overview

In this module we demonstrate job submission to the OSG-Connect environment from local campus resources with BOSCO. Specifically, this example uses Midway, a cluster operated by the Research Computing Center (RCC), a centrally managed organization providing research computing services and resources to University of Chicago researchers. You can follow this tutorial on any local cluster as long as it runs a platform supported by BOSCO. This lets you manage jobs running in both environments (Local Cluster and OSG-Connect) from one host.

Here is a diagram showing how the different resources are connected when you install BOSCO on your Local Cluster:

 


Contact your local campus administrators to get access to a local cluster. In the examples below, substitute cnetid with your actual user ID and substitute midway-login1.rcc.uchicago.edu with the fully qualified host name (including the domain) used to log in to your cluster.

If you are at the University of Chicago you can request a Midway account; see the RCC new user guide. In the examples below substitute cnetid with your actual CNetID.

You must be able to log in to your cluster to do this tutorial. Stop here and request an account if you don't have one.

 

 

Log in to your Local Cluster (Midway) account:

ssh cnetid@midway.rcc.uchicago.edu

For the remainder of this tutorial we assume that you are running on one of the Midway login nodes.

Install and configure BOSCO

The following example is from Midway (midway-login1.rcc.uchicago.edu). Any Midway login node works the same way.

  1. Download the BOSCO installer package
    [cnetid@midway-login1 ~]$ curl -o bosco_quickstart.tar.gz ftp://ftp.cs.wisc.edu/condor/bosco/1.2/bosco_quickstart.tar.gz
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  2551  100  2551    0     0   4398      0 --:--:-- --:--:-- --:--:-- 10585
    • NOTE: If curl is not available you can use wget instead: wget -O ./bosco_quickstart.tar.gz ftp://ftp.cs.wisc.edu/condor/bosco/1.2/bosco_quickstart.tar.gz

  2. Untar the bosco_quickstart script: 
    [cnetid@midway-login1 ~]$ tar xvzf ./bosco_quickstart.tar.gz 

     

  3. Run the quickstart script.
    [cnetid@midway-login1 ~]$ ./bosco_quickstart
    
    • When prompted "Do you want to install Bosco? Select y/n and press [ENTER]:" press "y" and ENTER.
    • When prompted "Type the cluster name and press [ENTER]:" type login01.osgconnect.net and press ENTER.
    • When prompted "Type your name at login01.osgconnect.net (default YOUR_USER) and press [ENTER]:" enter your username on OSG-Connect and press ENTER.
    • When prompted "Type the queue manager for login01.osgconnect.net (pbs, condor, lsf, sge, slurm) and press [ENTER]:" enter condor and press ENTER.
    • Then when prompted "user@login01.osgconnect.net's password:" enter your OSG-Connect password. 

       

  4. Set up the environment
    [cnetid@midway-login1 ~]$ source ~/bosco/bosco_setenv 

     

  5. BOSCO has been started for you but in the future you may need to restart it with:
    [cnetid@midway-login1 ~]$ bosco_start
    BOSCO Started


    At this point, submission to login01.osgconnect.net, which provides access to the full OSG-Connect environment, is ready. The BOSCO services will keep running even if you log out, unless you explicitly shut them down.

Set up the BOSCO environment each time

Each time you log in or start a new shell, set up the environment and invoke bosco_start (bosco_start is a no-op if the services are already running):

$ source ~/bosco/bosco_setenv
$ bosco_start
BOSCO Started
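
If you don't want to type the source line by hand in every new shell, you could append it to your shell startup file (a convenience sketch, assuming you use bash; adapt for your shell):

$ echo 'source ~/bosco/bosco_setenv' >> ~/.bashrc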

Create a tutorial directory

Create a new directory to run this tutorial and the log directory for the jobs:

$ mkdir -p tutorial-bosco/log
$ cd tutorial-bosco

Submit a job to OSG-Connect

Now run a simple job, like Job 1 of the Quickstart tutorial. The workload is the same, but the submit description file will be slightly different.

Create a workload

Inside the tutorial directory that you created previously, create a test script to execute as your job (remember to make the script executable!):

$ vi short.sh
$ chmod +x short.sh

Here is the content of short.sh:

 


#!/bin/bash
# short.sh: a short discovery job

printf "Start time: "; /bin/date
printf "Job is running on node: "; /bin/hostname
printf "Job running as user: "; /usr/bin/id

echo "Environment:"
/bin/env | /bin/sort

echo "Dramatic pause..."
sleep ${1-15}    # Sleep 15 seconds, or however much we're told to sleep
echo "Et voila!"


 

Create a condor submit file:

The next step is to create a submission file for the job.

$ vi bosco01.sub

Here is the bosco01.sub content, configured to use a special project name on login01.osgconnect.net. This is a general-purpose project name; you are encouraged to use one of the projects that you are a member of instead. You can list the projects you belong to with the osgconnect_show_projects command. This is very nearly the minimal content of a submit description file.
Note that, differently from the previous examples, the Universe of the job is now grid. This tells BOSCO to run the job on the resource added during the setup.

file: bosco01.sub

 

########################
# Submit description file for short test program
########################
Universe       = grid
Executable     = short.sh
Error   = log/job.err.$(Cluster)-$(Process)
Output  = log/job.out.$(Cluster)-$(Process)
Log     = log/job.log.$(Cluster)
+ProjectName="OSG-Connect"
Queue 1
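
With no grid_resource line, grid-universe jobs go to the default resource added during setup. If you prefer to be explicit, you could spell that out yourself; this is a sketch assuming your OSG-Connect username is cnetid:

grid_resource = batch condor cnetid@login01.osgconnect.net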

 

Remember: Replace 'cnetid' with your own!

 


 

Submit the job using condor_submit.

$ condor_submit bosco01.sub
Submitting job(s).
1 job(s) submitted to cluster 2.

Icon

Note the "submitted to cluster 2": if you did a fresh installation of BOSCO the ID of the job group you've created is 2 (1 was an automatic test job). You'll use this id for monitoring the status of your jobs.

Check job status

The condor_q command reports the status of the jobs currently in the queue:

$ condor_q

-- Submitter: midway-login1 : <128.135.158.243:9618?sock=9147_a86e_2> : midway-login1
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
2.0      cnetid           9/4  10:21   0+00:00:00 I  0   0.0  short.sh

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
 

Note that condor_q lists only your jobs, even without specifying your CNetID. Only you can submit to your BOSCO; it is your personal HTCondor installation.
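
If you need to cancel a job, the standard condor_rm command works from BOSCO as well; for example, to remove a single job or the whole cluster from the example above:

$ condor_rm 2.0
$ condor_rm 2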

 

Submit a job to the Local Cluster (Midway)

BOSCO by default submits to the user@login01.osgconnect.net resource added in the setup above (grid_resource = batch condor user@login01.osgconnect.net). To submit to the local queue (e.g. on Midway), you must add the line grid_resource = batch QUEUE_TYPE to the submit file. QUEUE_TYPE can be: pbs (for PBS and SLURM), sge (for Grid Engine) or condor (for HTCondor). On Midway we use grid_resource = batch pbs, because Midway uses SLURM with PBS emulation.
Now edit bosco02.sub:

$ vi bosco02.sub

Here is the bosco02.sub content; it is very similar to bosco01.sub. The differences are that there is no Project Name line (it would be ignored on Midway or your Local Cluster) and there is the line grid_resource = batch pbs to submit to the local queue:

bosco02.sub

########################
# Submit description file for short test program running on the Local Cluster (Midway)
########################
Universe       = grid
grid_resource = batch pbs
Executable     = short.sh
Error   = log/job.err.$(Cluster)-$(Process)
Output  = log/job.out.$(Cluster)-$(Process)
Log     = log/job.log.$(Cluster)
Queue 1

Note that BOSCO relies on the PBS emulation provided for SLURM, Midway's job manager. To submit and inspect the job, repeat the steps in the previous section using bosco02.sub in place of bosco01.sub:

$ condor_submit bosco02.sub
Submitting job(s).
1 job(s) submitted to cluster 3.
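
Once a job finishes, its standard output lands in the log directory, named with the cluster and process IDs from the Output line of the submit file (here cluster 3, process 0):

$ cat log/job.out.3-0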

Submit more jobs to OSG-Connect and Midway

This example submits 20 jobs each to OSG-Connect and Midway. To make the jobs easier to observe, we'll increase the sleep time to 40 seconds.

  1. Edit both submit files, bosco01.sub and bosco02.sub: add the line Arguments = 40 and change the last line to Queue 20:
    $ vi bosco01.sub
    $ vi bosco02.sub 

    bosco01.sub:

    ########################
    # Submit description file for short test program running on OSG-Connect
    ########################
    Universe       = grid
    Executable     = short.sh
    Arguments = 40
    Error   = log/job.err.$(Cluster)-$(Process)
    Output  = log/job.out.$(Cluster)-$(Process)
    Log     = log/job.log.$(Cluster)
    +TutorialJob=true
    +AccountingGroup = "group_friends"
    Queue 20
    

    bosco02.sub:

    ########################
    # Submit description file for short test program running on the Local Cluster (Midway)
    ########################
    Universe       = grid
    grid_resource = batch pbs
    Executable     = short.sh
    Arguments = 40
    Error   = log/job.err.$(Cluster)-$(Process)
    Output  = log/job.out.$(Cluster)-$(Process)
    Log     = log/job.log.$(Cluster)
    +TutorialJob=true
    +AccountingGroup = "group_friends"
    Queue 20
  2. Submit both sets of 20 jobs:
    [cnetid@midway-login1 tutorial-bosco]$ condor_submit bosco01.sub
    Submitting job(s)....................
    20 job(s) submitted to cluster 5.
    [cnetid@midway-login1 tutorial-bosco]$ condor_submit bosco02.sub
    Submitting job(s)....................
    20 job(s) submitted to cluster 6.
    
  3. Watch the jobs go through the queue by using watch -n2 condor_q -grid. The -grid option changes the format of condor_q and provides more information about where the jobs run.
    $ condor_q -grid
    
    -- Submitter: midway-login1 : <128.135.112.71:11002?sock=21373_27a9_3> : midway-login1
     ID      OWNER             STATUS     GRID->MANAGER    HOST       GRID_JOB_ID
       5.0   cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86277//
       5.1   cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86273//
       5.2   cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.3   cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.4   cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86275//
       5.5   cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.6   cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.7   cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86272//
       5.8   cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86278//
       5.9   cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.10  cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86271//
       5.11  cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86276//
       5.12  cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.13  cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86269//
       5.14  cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86274//
       5.15  cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.16  cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.17  cnetid            IDLE       batch->mmb@uc3-sub.uchicago /86270//
       5.18  cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       5.19  cnetid            IDLE       batch->mmb@uc3-sub.uchicago midway-login1_11
       6.0   cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.1   cnetid            IDLE       batch->[?] pbs              /20130522/425874
       6.2   cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.3   cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.4   cnetid            IDLE       batch->[?] pbs              /20130522/425874
       6.5   cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.6   cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.7   cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.8   cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.9   cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.10  cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.11  cnetid            IDLE       batch->[?] pbs              /20130522/425874
       6.12  cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.13  cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.14  cnetid            IDLE       batch->[?] pbs              /20130522/425874
       6.15  cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.16  cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.17  cnetid            IDLE       batch->[?] pbs              /20130522/425873
       6.18  cnetid            IDLE       batch->[?] pbs              midway-login1_11
       6.19  cnetid            IDLE       batch->[?] pbs              midway-login1_11
    

 


Note that condor_q on your BOSCO installation will list only your jobs. There may be other jobs queued on OSG-Connect, but to see them you'll have to log in to login01.osgconnect.net and run condor_q there. Similarly, on Midway qstat will show the jobs from all users, while condor_q shows only the ones submitted through your BOSCO service.
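
For example, on Midway you could cross-check the locally routed jobs with the batch system's own tools, limited to your user (a sketch; squeue is the native SLURM equivalent of qstat):

$ squeue -u cnetid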

 

Other BOSCO commands

You can check the resources connected to BOSCO:

$ bosco_cluster --list
cnetid@login01.osgconnect.net/condor
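
If you later want to attach another cluster, bosco_cluster can add it as well (a sketch; other-cluster.example.edu is a placeholder host name, and the last argument is that cluster's queue manager, e.g. pbs, condor, lsf, sge or slurm):

$ bosco_cluster --add cnetid@other-cluster.example.edu slurm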

You can stop and uninstall BOSCO:

$ source ~/bosco/bosco_setenv
$ bosco_stop
Sending off command to condor_master.
Sent "Kill-Daemon" command for "master" to local master
Stopped HTCondor
BOSCO is now off.
$ bosco_uninstall
Ensuring Condor is stopped...
BOSCO is now off.
Removing BOSCO installation under /home/mmb/bosco
Done

All the HTCondor commands work from BOSCO. This document contains a detailed description of all the installation options and all the BOSCO commands.
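
For instance, after jobs have left the queue you could review them with the standard condor_history command, optionally restricted to a single cluster ID:

$ condor_history
$ condor_history 5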
