Introduction

This is a quick introduction to using SWIFT on OSG Connect. SWIFT is a parallel scripting language that lets you combine different applications into a workflow expressed as a single, simple script file.  The SWIFT runtime takes this script and runs as much of the workflow in parallel as possible. Once you have worked through the exercises on this page, you will be able to create workflows in SWIFT.

Conventions

The following conventions are used throughout this document: 

  • All of the following exercises must be done using the BASH shell
  • The setup.sh script in the tutorial directory must be sourced before running any of the exercises:

    Cleanup example
  • Each part of the exercises below is located in a separate directory (e.g. part01, part02, ...)
  • To clean up the directory and remove all outputs after running SWIFT, run the following in the exercise directory:

    Cleanup example

Introductory Exercises

Scripts

The introductory exercises use two mock "science applications" that act as simple stand-ins for real science applications.

simulate.sh

The simulate.sh script is a simple substitute for a scientific simulation application. It generates and prints a set of one or more random integers in the range 0-29,999 as controlled by its optional arguments.

See arguments

    $ ./app/simulate.sh --help
    ./app/simulate.sh: usage:
        -b|--bias       offset bias: add this integer to all results [0]
        -B|--biasfile   file of integer biases to add to results [none]
        -l|--log        generate a log in stderr if not null [y]
        -n|--nvalues    print this many values per simulation [1]
        -r|--range      range (limit) of generated results [100]
        -s|--seed       use this integer [0..32767] as a seed [none]
        -S|--seedfile   use this file (containing integer seeds [0..32767]) one per line [none]
        -t|--timesteps  number of simulated "timesteps" in seconds (determines runtime) [1]
        -x|--scale      scale the results by this integer [1]
        -h|-?|?|--help  print this help

    Argument          Description
    -b, --bias        offset bias: add this integer to all results [0]
    -B, --biasfile    file of integer biases to add to results [none]
    -l, --log         generate a log in stderr if not null [y]
    -n, --nvalues     print this many values per simulation [1]
    -r, --range       range (limit) of generated results [100]
    -s, --seed        use this integer [0..32767] as a seed [none]
    -S, --seedfile    use this file (containing integer seeds [0..32767]) one per line [none]
    -t, --timesteps   number of simulated "timesteps" in seconds (determines runtime) [1]
    -x, --scale       scale the results by this integer [1]
    -h, -?, ?, --help print this help

With no arguments, simulate.sh prints one number in the range 1-100. Otherwise it generates n numbers of the form R * scale + bias, where R is a random integer. By default it logs information about its execution environment to stderr. Here are some examples of its usage:

Running simulate application
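
The value formula described above (R * scale + bias) can be illustrated with a short shell function. This is only a sketch and not the real simulate.sh: the function name sketch_simulate and its positional arguments are made up for illustration, R is drawn from [0, range) as an assumption, and the logging, seeding and timestep options are left out.

```shell
# Hypothetical sketch of the value formula; NOT the real simulate.sh.
# Usage: sketch_simulate NVALUES RANGE SCALE BIAS
sketch_simulate() {
    awk -v n="${1:-1}" -v range="${2:-100}" -v scale="${3:-1}" -v bias="${4:-0}" '
        BEGIN {
            srand()                          # seed from the clock
            for (i = 0; i < n; i++) {
                r = int(rand() * range)      # R: random integer in [0, range)
                print r * scale + bias       # one value per line
            }
        }'
}

# Three values; with range=100, scale=10, bias=5 each lies in {5, 15, ..., 995}.
sketch_simulate 3 100 10 5
```

Because the seed comes from the clock, two calls made within the same second may print the same values.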

stats.sh 

The stats.sh script serves as a trivial model of an "analysis" program. It reads N files, each containing M integers, and prints the average of all those numbers to stdout. Like simulate.sh, it logs environment information to stderr.

Running stats script
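
The averaging behaviour described above can be sketched in a few lines of shell. This is not the real stats.sh: the function name sketch_stats and the temporary file names are hypothetical, and the stderr logging is omitted.

```shell
# Hypothetical sketch; NOT the real stats.sh.
# Prints the mean of every integer found in the files given as arguments.
sketch_stats() {
    awk '{ for (i = 1; i <= NF; i++) { sum += $i; count++ } }
         END { if (count > 0) print sum / count }' "$@"
}

# Two small input files standing in for simulation outputs.
dir=$(mktemp -d)
printf '10\n20\n' > "$dir/a.out"
printf '30\n' > "$dir/b.out"

sketch_stats "$dir/a.out" "$dir/b.out"    # prints 20
```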

Part 1 - Run an application under Swift

The first Swift script, p1.swift, runs simulate.sh to generate a single random number and writes that number to a file.  The control flow for the Swift script is shown below.

In the p1.swift file below, the app construct tells SWIFT how to use the simulate.sh script and what the script expects for its inputs and outputs. The simulate application is translated to simulate.sh using the mapping in the apps file. The file construct indicates the name of the file used to hold the output from the simulate script.

p1.swift

To run this script:

p1.swift

Now, view the output:

Output from simulate

Part 2 - Running an ensemble of many apps in parallel with "foreach" loops

The p2.swift script introduces the foreach loop to run multiple instances of the simulate script in parallel. Output files are named using the file mapper so each instance writes to output/sim_N.out.

The control flow and source for the script are shown below:

p2.swift
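
For intuition only, the fan-out that a foreach loop expresses is similar to starting several background jobs in a shell and waiting for all of them. The sketch below is plain shell, not Swift code and not the tutorial's p2.swift. The fake_sim function is a hypothetical stand-in for the simulation, and the output directory is a temporary one created for the example.

```shell
# Shell analogy for a parallel fan-out; NOT Swift and NOT p2.swift.
# fake_sim is a hypothetical stand-in that "computes" N squared for run N.
fake_sim() { echo $(( $1 * $1 )); }

outdir=$(mktemp -d)/output
mkdir -p "$outdir"

for n in 0 1 2 3; do
    fake_sim "$n" > "$outdir/sim_$n.out" &   # each run starts in the background
done
wait                                         # block until every run has finished

ls "$outdir"                                 # sim_0.out ... sim_3.out
```

Unlike this sketch, Swift decides by itself which invocations can run concurrently, so the script does not need explicit background jobs or a wait.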

To run:

 

Running p2.swift

Output files will be named output/sim_N.out:

 

Output of p2.swift

 

Part 3 - Merging/reducing the results of a parallel foreach loop

The p3.swift script introduces a postprocessing step. After all the parallel simulations have completed, the files created by simulate.sh are averaged by stats.sh.  The @filenames function is used to pass the names of the output files from the simulate script to the stats script. The control flow and source for the SWIFT script follow:

p3.swift
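
The simulate-then-average pattern can be pictured in plain shell as a fan-out followed by a single reduce step that receives the list of output file names. This is an analogy, not the tutorial's p3.swift. The fake_sim function, the file names and the temporary directory are hypothetical.

```shell
# Shell analogy for "simulate in parallel, then average"; NOT p3.swift.
# fake_sim is a hypothetical stand-in: run N "produces" (N + 1) * 10.
fake_sim() { echo $(( ($1 + 1) * 10 )); }

workdir=$(mktemp -d)
set --                                        # start with an empty file list
for n in 0 1 2; do
    fake_sim "$n" > "$workdir/sim_$n.out" &
    set -- "$@" "$workdir/sim_$n.out"         # collect the name for the reduce step
done
wait                                          # the reduce step must not start early

# Reduce: average every number in the collected files (10, 20, 30).
average=$(awk '{ sum += $1; count++ } END { print sum / count }' "$@")
echo "$average"                               # prints 20
```

In Swift the ordering comes from data dependencies: the averaging app only runs once all of its input files exist.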
To run:

 

Running p3.swift

The output will be named output/average.out.

 

Output of p3.swift

Part 4 - Running a parallel ensemble on OSG Connect resources

This is the first script that submits jobs to OSG through OSG Connect. It is similar to the earlier scripts, with a few minor differences. To generalize the script for other types of remote execution (e.g., when no shared filesystem is available to the compute nodes), the application simulate.sh is transferred to the worker node by Swift in the same manner as any other input data file. The control flow and source for the SWIFT script follow:

 

p4.swift
To run:

 

Running p4.swift

The output will be named output/sim_N.out.

Output of p4.swift
SWIFT reads the sites.xml file to determine the parameters to use when submitting jobs to HTCondor.  The key values to note are the pool element, which sets the name of the pool that SWIFT submits to, and the condor.+ProjectName value, which must be set to the project you are submitting under.

 

sites.xml

The other file that SWIFT uses is the apps file.  This file lays out the mappings between the applications named in the SWIFT script files and the actual binaries for each pool.  For example:

apps

 

Part 5 - Linking applications together on OSG Connect

The p5.swift script introduces a postprocessing step. After all the parallel simulations have completed, the files created by simulate.sh are averaged by stats.sh. This is similar to p3.swift, but all app invocations run on remote nodes, with Swift managing the file transfers.  The workflow and source follow:

p5.swift

 

To run:

 

Running p5.swift

The output will be named output/stats.out.

Output of p5.swift

Part 6 - Specifying more complex workflow patterns

The p6.swift script builds on p5.swift, but adds new apps for generating a random seed and a random bias value.  The script's control flow and source are shown below:


p6.swift
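
To see why a workflow might generate a seed and a bias before simulating, the sketch below shows a seeded, biased random draw in plain shell. It is not the tutorial's p6.swift and not its seed or bias apps. The seeded_sim function and the literal seed and bias values are hypothetical, and it assumes that awk's srand(seed) gives a repeatable sequence on the machine where it runs.

```shell
# Hypothetical sketch of a seeded, biased simulation; NOT p6.swift.
# Usage: seeded_sim SEED BIAS
seeded_sim() {
    awk -v seed="$1" -v bias="$2" \
        'BEGIN { srand(seed); print int(rand() * 100) + bias }'
}

seed=12345    # in a workflow this would come from a seed-generating step
bias=1000     # and this from a bias-generating step

first=$(seeded_sim "$seed" "$bias")
second=$(seeded_sim "$seed" "$bias")
echo "$first $second"    # both values match because the seed is the same
```

Fixing the seed makes a run repeatable, while the bias shifts every result by a constant, so here each value falls between 1000 and 1099.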

To run:

 

Running p6.swift

The output will be named output/stats.out.

Verifying output

 

Further information and references
