This is a quick introduction to using SWIFT on OSG Connect. SWIFT is a parallel scripting language that lets you combine different applications into a workflow expressed as a single, simple script file. The SWIFT runtime takes this script and runs as much of the workflow in parallel as possible. Once you finish the exercises on this page, you will be able to create workflows in SWIFT.
The following conventions are used throughout this document:
- All of the following exercises must be done using the BASH shell
- The setup.sh script in the tutorial directory must be sourced before running any of the exercises.
- Each part of the exercises below is located in a separate directory (e.g. part01, part02, ...)
To clean up the directory and remove all outputs after running SWIFT, run the following in the exercise directory:
The introductory exercises use two different mock "science applications" that act as simple stand-ins for real science applications.
The simulate.sh script is a simple substitute for a scientific simulation application. It generates and prints a set of one or more random integers in the range 0-29,999, as controlled by its optional arguments.
    $ ./app/simulate.sh --help
    ./app/simulate.sh: usage:
        -b|--bias       offset bias: add this integer to all results
        -B|--biasfile   file of integer biases to add to results [none]
        -l|--log        generate a log in stderr if not null [y]
        -n|--nvalues    print this many values per simulation
        -r|--range      range (limit) of generated results
        -s|--seed       use this integer [0..32767] as a seed [none]
        -S|--seedfile   use this file (containing integer seeds [0..32767]) one per line [none]
        -t|--timesteps  number of simulated "timesteps" in seconds (determines runtime)
        -x|--scale      scale the results by this integer
        -h|-?|?|--help  print this help
With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form R * scale + bias, where R is a random integer. By default it logs information about its execution environment to stderr. Here are some examples of its usage:
The stats.sh script serves as a trivial model of an "analysis" program. It reads N files, each containing M integers, and simply prints the average of all those numbers to stdout. Like simulate.sh, it logs information about its environment to stderr.
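The formula R * scale + bias can be illustrated with a small bash function. This is only a sketch of the arithmetic, not the tutorial's simulate.sh: the function name mock_simulate, its positional arguments, and the choice of 0 as the lower bound for R are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the value formula only; NOT the real simulate.sh.
# Usage: mock_simulate NVALUES RANGE SCALE BIAS
mock_simulate() {
  local nvalues=${1:-1} range=${2:-100} scale=${3:-1} bias=${4:-0}
  local i r
  for ((i = 0; i < nvalues; i++)); do
    # R: random integer in [0, range). bash's RANDOM yields 0..32767,
    # so this sketch only makes sense for range <= 32768.
    r=$(( RANDOM % range ))
    echo $(( r * scale + bias ))   # R * scale + bias
  done
}

mock_simulate 3 100 10 5   # three values, each of the form R*10 + 5
```

Each printed value minus the bias is a multiple of the scale, which is a quick way to check that both arguments were applied.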
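The averaging step described above can be sketched with awk. This is an assumption-laden illustration rather than the tutorial's stats.sh: the function name mock_stats and the temporary files are hypothetical, and the stderr logging that the real script performs is left out.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the averaging step only; NOT the real stats.sh.
# Usage: mock_stats FILE...   prints the mean of every integer in the files
mock_stats() {
  awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
       END { if (n > 0) print sum / n; else print 0 }' "$@"
}

tmpdir=$(mktemp -d)
printf '10\n20\n' > "$tmpdir/a.out"
printf '30\n40\n' > "$tmpdir/b.out"
mock_stats "$tmpdir/a.out" "$tmpdir/b.out"   # prints 25, the mean of 10, 20, 30, 40
rm -rf "$tmpdir"
```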
Part 1 - Run an application under Swift
The first Swift script, p1.swift, runs simulate.sh to generate a single random number and writes that number to a file. The control flow for the Swift script is shown below.
In the p1.swift file below, the app construct is used to tell SWIFT how to use the simulate.sh script and what the script expects for its inputs and outputs. The simulate application gets translated to simulate.sh using the apps file for the mapping. The file construct is used to indicate the name of the file that holds the output from the simulate script.
To run this script:
Now, view the output:
Part 2 - Running an ensemble of many apps in parallel with "foreach" loops
The p2.swift script introduces the foreach loop to run multiple instances of the simulate script in parallel. Output files are named using the file mapper so each instance writes to output/sim_N.out.
The control flow and source for the script are shown below:
Output files will be named output/sim_N.out.
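As a rough mental model of what the foreach loop in this part does, the fan-out can be mimicked in plain bash with background jobs. This is a conceptual analogue only, not p2.swift: the ensemble size nsim, the scratch directory, and the use of RANDOM in place of simulate.sh are all assumptions. SWIFT itself handles scheduling and file mapping, which this sketch does not attempt to reproduce.

```shell
#!/usr/bin/env bash
# Conceptual bash analogue of a parallel foreach; NOT p2.swift itself.
# Each background job stands in for one simulate invocation.
outdir=$(mktemp -d)/output      # scratch directory so nothing real is overwritten
mkdir -p "$outdir"

nsim=5                          # hypothetical ensemble size
for ((i = 0; i < nsim; i++)); do
  # One independent task per index, each writing its own sim_N.out file.
  ( echo $(( RANDOM % 100 )) > "$outdir/sim_$i.out" ) &
done
wait                            # block until every background task has finished

ls "$outdir"
```

Because no task depends on another, they can all run at once; the only synchronization point is the final wait.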
Part 3 - merging/reducing the results of a parallel foreach loop
The p3.swift script introduces a postprocessing step. After all the parallel simulations have completed, the files created by simulate.sh will be averaged by stats.sh. The @filenames function is used to pass the names of the output files from the simulate script to the stats script. The control flow and source for the SWIFT script follow:
The output will be named output/average.out.
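The merge pattern in this part (wait for every simulation, expand the set of output files into a list of names, then run one reducing step) can be sketched in bash. This is a conceptual analogue, not p3.swift: the fixed sample values, the scratch directory, and the inline awk average are assumptions chosen so the result is predictable.

```shell
#!/usr/bin/env bash
# Conceptual analogue of "collect all outputs, then reduce"; NOT p3.swift.
workdir=$(mktemp -d)
mkdir -p "$workdir/output"

# Stand-ins for simulation outputs, with fixed values so the result is predictable.
values=(4 8 15 16 23 42)
for i in "${!values[@]}"; do
  echo "${values[$i]}" > "$workdir/output/sim_$i.out" &
done
wait   # the reduce step must not start until all inputs exist

# The glob expands to the full list of filenames, which is the role
# the text above gives to @filenames.
sim_files=("$workdir"/output/sim_*.out)
awk '{ sum += $1; n++ } END { print sum / n }' "${sim_files[@]}" \
  > "$workdir/output/average.out"

cat "$workdir/output/average.out"   # prints 18
```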
Part 4 - Running a parallel ensemble on OSG Connect resources
The output will be named output/sim_N.out.
The other file that SWIFT uses is the apps file. This file lays out the mappings between the applications used in the SWIFT script files and the actual binaries for each pool. For example:
Part 5 - Linking applications together on OSG-Connect
The p5.swift script introduces a postprocessing step. After all the parallel simulations have completed, the files created by simulate.sh will be averaged by stats.sh. This is similar to p3, but all app invocations are done on remote nodes, with Swift managing the file transfers. The workflow and source follow:
The output will be named output/stats.out.
Part 6 - Specifying more complex workflow patterns
The p6.swift script builds on p5.swift, but adds new apps for generating a random seed and a random bias value. The script's control flow and source are shown below:
The output will be named output/stats.out.
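The dependency pattern in this part, where generator steps produce a seed and a bias that a later simulation consumes, can be sketched in bash. This is an illustration under assumptions, not p6.swift: the file names seed.dat and bias.dat, the function mock_sim_with_inputs, and the bias range are hypothetical. The only bash behavior relied on is that assigning to RANDOM reseeds the generator.

```shell
#!/usr/bin/env bash
# Conceptual sketch of the dependency pattern only; NOT p6.swift or the tutorial's apps.
workdir=$(mktemp -d)

# Step 1: two independent generator steps, each producing a small file.
echo "$RANDOM" > "$workdir/seed.dat" &              # seed in [0..32767]
echo $(( RANDOM % 1000 )) > "$workdir/bias.dat" &   # hypothetical bias range
wait

# Step 2: a mock simulation that consumes both files.
mock_sim_with_inputs() {
  local seed bias
  seed=$(<"$1")
  bias=$(<"$2")
  RANDOM=$seed                       # assigning to RANDOM reseeds bash's generator
  echo $(( RANDOM % 100 + bias ))
}

mock_sim_with_inputs "$workdir/seed.dat" "$workdir/bias.dat"
```

Running the mock simulation twice against the same seed and bias files gives the same value, which is the practical reason for routing the seed through a file.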