Launchpad
Access
Access to the cluster is restricted. By default, new user accounts do not have access. If you cannot log in, request access through the batch-users mailing list (batch-users [at] nmr.mgh.harvard.edu).
Help
If you need help with launchpad usage, read all of the documentation below, especially the Troubleshooting section. If you still require assistance, send email to batch-users [at] nmr.mgh.harvard.edu. You must be subscribed to the batch-users mailing list in order to post to it.
Background
The Martinos Center High-Performance Compute Cluster is called launchpad. It consists of 127 nodes (around 100 still working). Each node has 8 processors (CPUs) and 56GB of shared virtual memory. By default, when a job is submitted to the cluster, it is allotted 1 CPU and 7GB of vmem. Eight jobs can normally run on each node at once, and a maximum of 1,016 on the cluster at a time. Read through the tutorial, Troubleshooting, the FAQ and advanced info sections. If you still have a question (request access, usage, etc.), send a message to the mailing list batch-users [at] nmr.mgh.harvard.edu.
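As a quick check of the figures quoted above, the cluster-wide job limit and the per-node memory follow directly from the node count and the per-job defaults. The variable names in this sketch are illustrative only.

```shell
# Capacity figures quoted above; variable names are illustrative.
nodes=127
cpus_per_node=8
vmem_per_job_gb=7

# Total job slots if every node were up: 127 * 8
total_slots=$((nodes * cpus_per_node))
# Shared virtual memory per node: 8 jobs * 7 GB each
vmem_per_node_gb=$((cpus_per_node * vmem_per_job_gb))

echo "total slots: $total_slots"
echo "vmem per node: ${vmem_per_node_gb}GB"
```

With only around 100 nodes working, the same arithmetic gives roughly 800 usable slots.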
Quick Start Guide
This tutorial is a guide to help users use the launchpad cluster with a step-by-step example using actual data. This example shows a standard FreeSurfer reconstruction on the sample subject, bert.
Log in to launchpad
Source the current FreeSurfer stable environment and set the SUBJECTS_DIR environment variable, just as you would to run it locally.
$ source /usr/local/freesurfer/nmr-stable51-env
$ setenv SUBJECTS_DIR /autofs/cluster/freesurfer/subjects/
$ cd $SUBJECTS_DIR
Change the SUBJECTS_DIR path in the example to the path where your actual data is.
In order to run your command, it must be submitted to the scheduler on the batch cluster and queued for execution. The wrapper script, pbsubmit, will do this for you. Here is an example command.
$ pbsubmit -m <username> -c "recon-all -subjid bert -all"
> Opening pbsjob_2
> qsub -V -S /bin/sh -m abe -M <username> -l nodes=1:ppn=1,vmem=7gb -r n /pbs/<username>/pbsjob_2
> 3440112.launchpad.nmr.mgh.harvard.edu
This will submit your command [recon-all] to the batch cluster and send an email to the user when the job begins execution and when it is completed. Change '<username>' in this example, and all subsequent examples, to your actual Martinos username.
As seen in the output above:
- pbsjob_2 is the Job Name
- qsub [...] is the actual command used to send the job to the cluster
- 3440112.launchpad.nmr.mgh.harvard.edu is the Job ID
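If you want to reuse the Job ID in a script (for example with jobinfo or qdel), it can be pulled out of the submission output. The following is a minimal sketch. It assumes that pbsubmit writes the three lines shown above to standard output and that the `> ` markers are only this page's way of showing output. The function name `extract_job_id` is made up for illustration.

```shell
# Print the Job ID (the last line that looks like <number>.<host>) from
# pbsubmit output read on standard input. A leading "> " is stripped in
# case the text was copied from this page.
extract_job_id() {
    sed 's/^> *//' | grep -E '^[0-9]+\.' | tail -n 1
}

# Sample output copied from the example above.
sample='Opening pbsjob_2
qsub -V -S /bin/sh -m abe -M <username> -l nodes=1:ppn=1,vmem=7gb -r n /pbs/<username>/pbsjob_2
3440112.launchpad.nmr.mgh.harvard.edu'

printf '%s\n' "$sample" | extract_job_id
```

In a real session the function would sit at the end of a pipeline, e.g. `jobid=$(pbsubmit -c "..." | extract_job_id)`.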
The job has been sent to the scheduler. If there are any nodes available, the job will begin execution, and you should get an email to inform you. If there are no nodes available, your job will be queued for execution and wait till a node becomes available.
To check on the status of your job:
$ qstat | grep -w <username>
> 3440112.launchpad pbsjob_2 <username> 00:03:05 R default
This display shows all your jobs on the cluster. The columns specify:
- Job ID
- Job Name
- Username
- CPU Time
- Status: (R)unning, (Q)ueued, or (H)eld
- Queue (see "How do I select a different job queue to use?" in the FAQ for other queues and priorities)
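When you have many jobs, a per-state count is easier to read than the raw listing. The sketch below assumes the six-column layout described above, with the username in column 3 and the status in column 5. The function name, the usernames "alice" and "bob", and the extra sample job IDs are made up for illustration.

```shell
# Summarise qstat lines for one user by state (column 5 in the layout above).
# Usage on the cluster: qstat | count_by_state <username>
count_by_state() {
    awk -v user="$1" '
        $3 == user { n[$5]++ }
        END { printf "running=%d queued=%d held=%d\n", n["R"], n["Q"], n["H"] }
    '
}

# Sample lines in the same layout; usernames and the last two IDs are invented.
sample='3440112.launchpad pbsjob_2 alice 00:03:05 R default
3440113.launchpad pbsjob_3 alice 0 Q default
3440114.launchpad pbsjob_4 bob 00:10:00 R default'

printf '%s\n' "$sample" | count_by_state alice
```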
When the job is done, you will get another email to confirm that the job has completed execution. Make sure that the job completed as expected and with an exit status of 0. To do this, either look in the 'status' file, or run jobinfo:
$ cat /pbs/<username>/pbsjob_2.status
> ...
> compute-0-62.local
> running Mon May 14 16:23:22 EDT 2012
> done Tue May 15 10:36:25 EDT 2012 status 0
The job completed on Tuesday May 15 2012 at 10:36am with an exit status of 0. That means the job ran, and completed properly.
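The same check can be scripted. This is a minimal sketch that assumes the status file ends with a line of the form `done <date> status <N>`, as in the sample above, without the `> ` markers. The function name `job_exit_status` is hypothetical.

```shell
# Print the exit status recorded on the last "done ... status N" line of a
# status file. Prints nothing and returns 1 if no such line exists yet.
job_exit_status() {
    awk '$1 == "done" && $(NF-1) == "status" { s = $NF; found = 1 }
         END { if (found) print s; else exit 1 }' "$1"
}

# Demonstration on a temporary copy of the sample shown above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
compute-0-62.local
running Mon May 14 16:23:22 EDT 2012
done Tue May 15 10:36:25 EDT 2012 status 0
EOF

if status=$(job_exit_status "$tmp") && [ "$status" -eq 0 ]; then
    echo "job finished cleanly"
else
    echo "job failed or is still running"
fi
rm -f "$tmp"
```

On the cluster the argument would be the real file, e.g. `/pbs/<username>/pbsjob_2.status`.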
Or you can check on the exit status by running jobinfo:
$ jobinfo 3440112
> JOB INFO FOR 3440112:
> Queued on 05/14/2012 16:23:21
> Started on 05/14/2012 16:23:22
> Ended on 05/15/2012 10:36:25
> Run on host compute-0-62
> User is <username>
> Cputime: 17:27:07
> Walltime: 18:12:56
> Resident Memory: 1515836kb
> Virtual Memory: 2429044kb
> Exit status: 0
Running jobinfo has two advantages. First, it lists the amount of virtual memory the job used during execution. If you need to explicitly specify the amount of vmem, read the questions concerning memory usage in the FAQ section. Second, it lets you compare Cputime and Walltime, which helps determine whether a job was bogged down by I/O or used multithreading. Read the questions on this topic in the FAQ section.
If your job exited with a non-zero exit status, then something went wrong. See the Troubleshooting and FAQ sections below for help.
Summary Guidelines
- Never run commands on the launchpad master node. Always submit them as jobs.
- Never ssh directly onto one of the nodes to run a job. However, you may login to a node to check on the job progress and state.
- Any jobs that do large amounts of I/O, such as format conversions, should not be run on the cluster unless the data is under the /cluster/ filespace. Even then, limit yourself to 20 such jobs (use the highio queue).
  - If the I/O occurs at the start of the job, space out consecutive job submissions.
- Use /scratch/ for temp files.
  - Writing files to /scratch/ instead of /cluster/ or /space/ can help mitigate I/O issues.
  - Don't use /tmp/. It can fill up the root partition and crash the node.
- All jobs requiring a MATLAB license should be submitted using the matlab queue.
  - Each user is allowed to use one instance of any MATLAB toolbox. The user can choose whether to use it at their own workstation or for a job on the cluster.
  - Compile any MATLAB code using deploytool. Compiled jobs aren't bound by any of the above restrictions. See the Matlab questions in the FAQ for detailed instructions.
- By default jobs are allotted 1 CPU and 7GB of virtual memory, unless expressly set.
- By default jobs have 96 hours walltime (active running time), unless expressly set.
Troubleshooting
Use the 'jobinfo' command shown above. If your job finished with a non-zero exit status, check the log files. After termination of a job, the standard output and standard error from the job are sent to log files in your pbs directory. Read through those files, and search for any helpful exit messages.
$ less /pbs/<username>/pbsjob_2.o3440112
and
$ less /pbs/<username>/pbsjob_2.e3440112
If you are having trouble diagnosing the problem, try running the command locally. Additionally, you can ssh into 'oct' to test-run a command or compile software before using launchpad. The problem may not be related to launchpad.
If this section does not address or solve your problems, read the "FAQ" section below.
FAQ
How many jobs can I submit at once, and how?
Good question. We generally recommend running a maximum of 100 standard jobs at once (1 CPU and 7GB, which is the default amount of resources allocated). If you require more cores or virtual memory than the default, you can only run 100 jobs' worth of computational resources. See "What if my job requires multiple processors or more memory?" below for a more detailed explanation. During busy times, consider running only 50 jobs at once.
If you plan to submit dozens or hundreds of jobs that run a command you've never used before, only run a couple at once and wait for them to finish to get a good idea of the computation resources required. Use the 'jobinfo' command to inspect the memory required, and the I/O performed.
If you've run a couple of test cases and are ready to submit hundreds of jobs, use one of the max queues: max10, max20, max50, max75, max100 (see "How do I select a different job queue to use?" below for usage). These allow for self-imposed limits on the number of jobs running at once.
Lastly, if you plan to run more than 10 jobs that each request more than 2 CPUs and/or 14GB of vmem, please post to the batch-users [at] nmr.mgh.harvard.edu mailing list to discuss proper cluster-sharing etiquette.
Why is my job automatically cancelled after running for 96 hours?
We restrict jobs to run for 96 hours. After that, they are automatically deleted. If you submitted a job and are worried that it will not complete before the time limit, email batch-users [at] nmr.mgh.harvard.edu and one of the admins can extend the job for you. If you know before you submit the job that you will need more time, use the 'extended' queue.
$ pbsubmit -q extended .... or $ qsub -q extended ...
This queue imposes a time limit of 192 hours.
I just submitted my job but it is not running, it is in a queued state. Why?
If the cluster is filled up with other jobs and there are no available resources (CPUs or Mem) to execute your job, it will be queued. As soon as another job finishes, opening up a slot, the scheduler will automatically begin execution on the next job in the queue, according to priority.
Someone just submitted 1,000 jobs and the cluster is now full. All I want is to run 1 job. Will I have to wait for all their jobs to finish?
Short Answer: No, you won't. The scheduling software uses an algorithm, based on accrued priority, to distribute computing resources to all users of the compute cluster.
Long Answer: No, you won't. Understanding the scheduling algorithm and how priority works will make this clear.
When the compute cluster is full, how does priority work in determining what job is next to execute?
The key to understanding the scheduling algorithm is to use the command 'showq'. All jobs are listed and each is assigned to one of three Scheduler-Queues (S-Queues): Running, Idle or Blocked -- use 'showq -r', 'showq -i', 'showq -b' to see all jobs in each S-Queue, respectively. Note: These are different from batch queues used to submit a job (max100, extended, etc.) and also different from, but correlated to, PBS job states (Running, Queued, Held).
A job will first go to the Blocked S-Queue when submitted. The job will then move to the Idle S-Queue if the following things are true:
- A node that has the resources required for the job exists (even if it is busy)
- There are no user or system Holds on the job (such as interjob dependencies)
- The limits for the user and batch queue used for the job have not been reached (such as max 20 jobs on highio queue)
- The user does not already have four jobs in the Idle S-Queue waiting to be run
Once a job is in the Idle S-Queue it will be given a starting priority value according to the batch queue it was submitted in (e.g. 10000 for default). It then accrues 10 priority points for each minute it remains in the Idle S-Queue.
See the Queues answer for info on the starting priority value for each queue.
Only the job with the largest priority value in the Idle S-Queue is eligible to run.
When the requested node resources become available for that job it will be assigned to execute on that node.
This is why we often see a job at the top of the Idle S-Queue that asks for all 8 CPU slots of a node keeping single-CPU jobs from running on nodes that have 1-7 free CPU slots.
In this way resources are distributed for all users and jobs, each with different execution needs.
When the compute cluster is full and there is a wait for jobs to execute, run 'showq -i' to observe the status of the Idle S-Queue.
What is the I/O problem?
There are two different components to this issue. The first deals with file access from the launchpad compute cluster to disk space outside the storage cluster (i.e., /space/). An example of this is data on a local CNY machine or user workstation. Each request from a cluster node to a CNY workstation funnels through a single 1GB network connection on the launchpad master node. This can affect users in a couple of ways. Interactive activity on launchpad appears slower for all users, and the workstation is affected because it has to process each I/O request (usually several at once). We don't want any usage of this kind.
The second form of the I/O problem is large amounts of I/O from the compute cluster to data on the cluster storage (i.e., /cluster/). When a series of jobs all try to access data on /cluster, the storage servers can be overloaded. This slows down every user at the center who tries to access data and file paths on /cluster. Through experience and trial and error, we found that 5 to 20 jobs of this type can run at once before they negatively impact other users, depending on the storage location and type of job. For this reason, any jobs that fall under this category must be submitted to the 'highio' queue, which sets a limit of 20 concurrently running jobs on launchpad.
The one exception we allow is a job that does all of its I/O at the very beginning or very end of execution. In this case, if you can determine where, when and, most importantly, how long the I/O lasts, you can space out the submission of each job by that time interval. Then, when one job's I/O ends, the next job begins its own I/O, so only one job has an I/O requirement at a time.
After my job finished, how can I tell if my job used too much I/O or was multithreaded?
After your job finished, run the jobinfo command. Compare the two values for Walltime and CPUtime. Walltime measures from the beginning till the end of execution, in real world time. CPUtime is the amount of time the processor was active during the execution of the command.
If the CPUtime is larger than the Walltime, then the job used multiple processors during execution. These extra CPUs were either requested, or the program used multithreading.
If the Walltime is longer than the CPUtime, then the job was held back for some reason. Usually this is because the job is I/O intensive. If this is the case, we try to limit the total amount of these jobs on the cluster at one time to only 20. Submit the jobs using the 'highio' queue. Too many jobs that require large amounts of I/O overload the cluster and make it slow and unusable for other users. In this case we will delete the jobs that are the most egregious until the load is assuaged.
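The comparison is easier to judge as a percentage. The sketch below converts the two jobinfo values to seconds and reports CPU time as a share of wall time. The function names are hypothetical, and the 50% cut-off used to flag a possibly I/O-bound job is an arbitrary assumption, not a cluster policy. It also assumes a non-zero wall time.

```shell
# Convert HH:MM:SS to seconds (awk reads "07" as decimal 7, not octal).
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print CPU time as a whole-number percentage of wall time.
cpu_percent() {
    cpu=$(hms_to_seconds "$1")
    wall=$(hms_to_seconds "$2")
    echo $((cpu * 100 / wall))
}

# Values from the jobinfo example in the Quick Start Guide.
pct=$(cpu_percent 17:27:07 18:12:56)
echo "CPU time is ${pct}% of wall time"
if [ "$pct" -gt 100 ]; then
    echo "more than one CPU was busy (extra CPUs or multithreading)"
elif [ "$pct" -lt 50 ]; then
    echo "mostly waiting, possibly I/O bound"
fi
```

For the example job the result is 95%, so the job kept one CPU busy almost the whole time and was neither multithreaded nor held back by I/O.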
If my job is running, how can I tell if my job is using too much I/O?
To see the activity during execution, ssh directly onto the node your job is running on, and run 'top'. The most important columns are %CPU and, just to the left of it, 'S' (Status). They give a general idea of what the job is currently doing. If the job is using less than 100% of a CPU, it is not reaching maximal efficiency on the cluster. The most likely culprit is that it is wasting time in I/O, transferring data back and forth between the cluster and external data storage. The other column, 'S', lists the status of your job. Your job should be actively running, i.e. in the 'R' state. If you see your job in the 'D' state (i.e., uninterruptible sleep), it is using large amounts of I/O.
If the job is using significantly less than 100% of a CPU and is stuck in the 'D' state, it is using too much I/O. We only want a total of 20 of these jobs running on the cluster at once (from all users). Use the 'highio' queue for these jobs. If we notice the cluster performing slowly due to a large load, we won't hesitate to delete any users' jobs that are taxing the cluster too much.
How can I keep my jobs from using too much I/O?
There are several tips and tricks to avoid using too much I/O.
The first is to make sure all your data is read from or written to /cluster/ storage. This is the best way to prevent I/O bottlenecks from forming.
Use the compute cluster 'tensor'. It is located here in CNY, and isn't subject to the same I/O bottleneck limitations. Usage is the exact same as on launchpad. The nodes are slower and have less memory, but the jobs may actually complete faster because they won't have the same restrictions put on them.
Additionally, you may use shared 'scratch' space. If temporary or intermediate data files need to be read or written throughout execution, use the /cluster/scratch/ or /scratch/ directories. Upon completion you can move/copy any files back to any network mounted storage.
Also space out the submissions by several seconds or minutes, so the jobs aren't all reading and/or writing at the same time. In the case of a FreeSurfer recon, even if the data is on /cluster/ storage, one of the first steps in the pipeline uses heavy I/O. The nu-correct stage reads large amounts of data from a /space/ directory, causing a bottleneck. By simply spacing out jobs by a couple of minutes, not all the jobs enter that same stage at the same time, thereby mitigating the I/O load on the cluster.
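Spacing out submissions can be done with a simple loop. The sketch below is one possible way to do it. `DELAY` and `DRY_RUN` are illustrative variables of this script (not pbsubmit options), the subject names are invented, and the 120-second default is only a guess at "a couple of minutes". With `DRY_RUN=1` the commands are printed instead of submitted, so the loop can be checked off the cluster.

```shell
# Submit one recon-all job per subject, pausing between submissions so the
# jobs do not all reach the I/O-heavy stage together.
# DELAY (seconds) and DRY_RUN are knobs of this sketch, not pbsubmit features.
DELAY=${DELAY:-120}
DRY_RUN=${DRY_RUN:-1}

submit_spaced() {
    first=1
    for subj in "$@"; do
        # Sleep before every submission except the first.
        if [ "$first" -eq 0 ]; then sleep "$DELAY"; fi
        first=0
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "pbsubmit -c \"recon-all -subjid $subj -all\""
        else
            pbsubmit -c "recon-all -subjid $subj -all"
        fi
    done
}

# Demo: dry run with no delay, using made-up subject names.
DELAY=0
submit_spaced subj01 subj02 subj03
```

For a real run, set `DRY_RUN=0` and choose `DELAY` to match how long the I/O-heavy stage lasts for your data.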
Advanced users can track execution stats including I/O levels with timeplots and gracefully handle instances if the job is killed due to high network loads. Please read this post from our friends at the Neuroinformatics Research Group at Harvard for instructions.
How can I tell if my job requires more processors or memory?
You may not know till you run it. Run the command once as a test case. When it is completed, use jobinfo to get a summary of the required resources. If it failed due to insufficient memory, increase your memory request by 7GB at a time until it executes successfully. See the next question on how to do this. To track it in real time, you may ssh directly onto the node the job is running on and run 'top'. There should be a column called 'VIRT'. Never ssh directly onto a node to do anything other than 'top'. More advanced users can run the command locally, or on oct, with /usr/bin/time. If you explicitly select the CPUs and vmem, only use what you need. Taking extra CPUs or vmem wastes resources and blocks other users' jobs from running. Not requesting enough CPUs forces sharing on the node and makes your job, and all the other jobs on that node, run slower. Know thyself.
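Once a test job has finished, the "Virtual Memory" figure from jobinfo can be rounded up to the next 7GB step to pick a vmem request. This is a rough sketch. It assumes 1GB equals 1024 * 1024 kb, and the function name is hypothetical.

```shell
# Round a memory figure in kb (as printed by jobinfo) up to the next
# multiple of 7 GB, the default per-job allotment.
# Assumes 1 GB = 1024 * 1024 kb.
vmem_request_gb() {
    kb=${1%kb}                       # strip a trailing "kb" if present
    slot_kb=$((7 * 1024 * 1024))     # 7 GB expressed in kb
    slots=$(( (kb + slot_kb - 1) / slot_kb ))
    if [ "$slots" -lt 1 ]; then slots=1; fi
    echo $((slots * 7))
}

vmem_request_gb 2429044kb    # value from the jobinfo example: prints 7
vmem_request_gb 9000000kb    # a larger, made-up value: prints 14
```

The example job from the Quick Start Guide used about 2.3GB of virtual memory, so the default 7GB request was sufficient.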
What if my job requires multiple processors or more memory?
By default a job is allotted 1 CPU with 7GB of virtual memory. If you are absolutely 100% positive that you need more of either or both you can request more.
$ pbsubmit -n 3 ... or $ qsub -l nodes=1:ppn=3,vmem=21gb ...
Will ask for 3 CPUs with a total of 21GB of virtual memory for your job.
Be aware that if you ask for more than 7GB of memory, you are using up a limited resource on a node. If you submit a job that asks for 1 CPU but 8-14GB of memory, you are actually using up two-eighths (one quarter) of the node's 56GB vmem resource. This one job actually counts as 2 jobs. Keep this in mind. If you plan to run more than 10 jobs that require more than 2 CPUs and/or 14GB of virtual memory, please email the batch-users mailing list to discuss the proper protocol.
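To budget against the recommended 100 jobs' worth of resources, it can help to express a request as a number of default-sized jobs. The rule used in this sketch (the larger of the CPU count and the vmem rounded up to 7GB units) is an interpretation of the paragraph above, not an official scheduler formula, and the function name is made up.

```shell
# Number of default-sized jobs (1 CPU, 7 GB) that a request is equivalent to:
# the larger of the CPU count and the vmem rounded up to 7 GB units.
# This rule is an assumption based on the text above.
slot_equivalents() {
    cpus=$1
    vmem_gb=$2
    mem_slots=$(( (vmem_gb + 6) / 7 ))
    if [ "$mem_slots" -gt "$cpus" ]; then
        echo "$mem_slots"
    else
        echo "$cpus"
    fi
}

slot_equivalents 1 14   # 1 CPU with 14 GB: prints 2
slot_equivalents 3 21   # the "pbsubmit -n 3" example: prints 3
```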
How do I select a different job queue to use?
Typically, you should only submit your jobs using the default batch queue (which is used if no queue is specified on submission). However, for special circumstances there are other ones available.
- Routing Queues - Use these "routing" queues if you will submit more than 500 jobs at a time. There is a limit of 500 jobs submitted to an execution queue imposed upon each user, so these "routing" queues were created to accommodate more than 500 submitted jobs. Submit all jobs to one of these routing queues. These queues will manage all jobs and automatically transfer (route) jobs into the execution queues when there is space under the limit.
Routing Queue behavior |
Routing Queue | Route | Execution Queue |
r_default | --> | default |
r_max100 | --> | max100 |
r_max200 | --> | max200 |
r_max500 | --> | max500 |
- Execution Queues - The table below summarizes the initial starting priority value and max CPU slots allowed to be running at one time per user for each batch queue.
Queue Name | Starting Priority | Max CPU Slots |
default | 10000 | 150 |
max500 | 8800 | 500 |
max200 | 9400 | 200 |
max100 | 10000 | 100 |
max75 | 10000 | 75 |
max50 | 10000 | 50 |
max20 | 10000 | 20 |
max10 | 10000 | 10 |
p5 | 8800 | 150 |
p10 | 9400 | 150 |
p20 | 10300 | 100 |
p30 | 10600 | 75 |
p40 | 10900 | 50 |
p50 | 11200 | 20 |
p60 | 11500 | 10 |
Jobs accrue 10 priority points per minute in the Idle S-Queue. For example, it takes 30 minutes for a default queue job to reach the starting priority of a p20 job. Therefore, after 30 minutes in the Idle S-Queue any jobs subsequently submitted in the p20 queue cannot execute ahead of the default queued job. However, a job submitted to the p30 queue that enters the Idle S-Queue at this time, will still have a higher priority value and execute before the aforementioned default queued job.
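The priority arithmetic in this example can be reproduced with a short calculation. The sketch assumes priority grows linearly at 10 points per minute from the starting values in the table, and the function names are illustrative.

```shell
# Priority after waiting: starting priority plus 10 points per idle minute.
priority_after() {
    echo $(( $1 + 10 * $2 ))
}

# Minutes a job must wait before its priority reaches a target value
# (rounded up to a whole minute).
minutes_to_reach() {
    echo $(( ($2 - $1 + 9) / 10 ))
}

priority_after 10000 30        # default-queue job after 30 minutes: 10300
minutes_to_reach 10000 10300   # default job catching a fresh p20 job: 30
minutes_to_reach 10000 10600   # default job catching a fresh p30 job: 60
```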
The max500 queue is restricted. For permission to use it you need to email the batch users list to request access and describe why you need it.
- I/O Queue - This queue (highio) is new and was created for any user who runs jobs that require high I/O. We used to limit it to 20 jobs per user, but we now limit it to 20 jobs total running on the cluster. Note that these are jobs, not CPU slots. Any user who knows their jobs use a lot of I/O, or whom we have told that their jobs use too much I/O, must submit their jobs to this queue.
- Matlab Queue - Use this queue (matlab) to run any Matlab jobs. Since there are a finite number of Matlab licenses for the Center, this queue limits the total number of jobs to 60 and each user to only 10 at once. Read this post on how to run a Matlab script without using Matlab licenses, in which case you don't need to use the Matlab Queue.
- Extended Queue - This queue is for any jobs that require more than 96 hours to complete. The time limit for this queue is 192 hours. Each user is limited to 100 CPU slots running in this queue. Also, no more than 250 CPU slots total over all users can be running in this queue. This queue has a starting priority of 9000.
- GPU Queue - This queue no longer exists. All 16 GPUs have been retired.
For example, to use the max100 queue:
$ pbsubmit -q max100 ... or $ qsub -q max100 ...
How do I run a series of jobs together that each depend on the previous job completing?
The easiest way to do it is to write a parent script that runs a series of programs. Then submit the parent script for execution on the cluster. In C-Shell, the parent script should look like this:
#!/bin/csh -e
Program 1
Program 2
Program 3
The '-e' flag in the script above specifies that it should exit if any of the individual processes exits with an error (a non-zero exit status). Make the script executable and then submit it for execution.
If this is not sufficient, there are other ways to accomplish this. Slightly more complicated is passing '-W depend=afterok:JobID' in the qsub call. If you have two jobs and the second should only start after the first one has completed execution, you can follow this example:
$ pbsubmit -c "Program1"
> Opening pbsjob_5
> qsub -V -S /bin/sh -l nodes=1:ppn=1,vmem=7gb -r n /pbs/<username>/pbsjob_5
> 70593.launchpad.nmr.mgh.harvard.edu
$ pbsubmit -o "-W depend=afterok:70593.launchpad.nmr.mgh.harvard.edu" -c "Program2"
While the first job is running, the second job remains in a 'Hold' state until the first job properly executes and completes. After that, the dependency for the second job is lifted, and it is free to run.
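For longer chains, the two steps above can be wrapped in a loop so that each job waits on the one before it. This is a minimal sketch, not part of pbsubmit. It assumes the Job ID is the last line pbsubmit prints to standard output, and the names `SUBMIT` and `submit_chain` are made up. `SUBMIT` can be pointed at a stand-in command to try the logic off the cluster.

```shell
# Submit the given commands as a chain. Each job starts only after the
# previous one has exited with status 0 (afterok).
# SUBMIT defaults to pbsubmit and can be replaced for testing.
# Assumes the Job ID is the last line the submit command prints.
SUBMIT=${SUBMIT:-pbsubmit}

submit_chain() {
    prev=""
    for cmd in "$@"; do
        if [ -z "$prev" ]; then
            out=$("$SUBMIT" -c "$cmd") || return 1
        else
            out=$("$SUBMIT" -o "-W depend=afterok:$prev" -c "$cmd") || return 1
        fi
        prev=$(printf '%s\n' "$out" | tail -n 1)
        echo "submitted '$cmd' as $prev"
    done
}

# Example call on the cluster:
#   submit_chain Program1 Program2 Program3
```

Because 'afterok' only releases a job when its predecessor succeeds, a failure in one step leaves the later jobs held rather than running on incomplete results.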
How can I run a Matlab job?
Create your Matlab script (e.g., analyze_data.m) with 'exit' as the last line of the script. Make sure it is in your current directory (the one you run pbsubmit from) or on your Matlab path.
$ pbsubmit -q matlab -m <username> -c "matlab.new -nodisplay -nodesktop -nojvm -r analyze_data"
This will run your Matlab script on the cluster using the matlab queue (required for any job that uses a Matlab license) and mail you when the job begins and ends. Notice you leave off the ".m" extension in the script name.
We recommend deploying your Matlab code by "compiling" it. This is more efficient for all the Matlab users at the Center. See the question below for instructions on how to do this.
My Matlab job is exiting with license failures. Why?
There are a limited number of Matlab licenses for use throughout the Martinos Center as well as on the cluster. Because of this restriction, please exit any idle Matlab instances and toolboxes to free up resources for everyone to share. Users are limited to running no more than 10 jobs on the cluster that use a Matlab license and can only run 1 job that uses any toolbox. Use the matlab queue to enforce the 10 job limit.
To see a list of current Matlab licenses in use:
We highly recommend deploying your Matlab code into a stand-alone program. The main advantage of this method is that the program can be exported and run without even needing Matlab. This preserves the finite number of Matlab licenses to users running individual instances of Matlab. Read these step-by-step instructions, written by Jean-Phillipe Coutu. Thanks.
My job is running, but I need to cancel it. Can you help?
You can help yourself. Use the qdel command with the Job ID to cancel that specific job.
$ qdel 3440112.launchpad.nmr.mgh.harvard.edu
or to delete all your jobs:
$ qselect -u <username> | xargs qdel
How do I run an interactive job?
Use qsub to specify an interactive job and allow X11 forwarding:
@launchpad] $ qsub -V -I -X -m b -M <username>
> qsub: waiting for job <JobID>.launchpad.nmr.mgh.harvard.edu to start
> qsub: job <JobID>.launchpad.nmr.mgh.harvard.edu ready
@<node>] $
-:Run Interactive Job Here:-
@<node>] $ exit
> logout
> qsub: job <JobID>.launchpad.nmr.mgh.harvard.edu completed
@launchpad] $
If there is a waiting list on the cluster, append '-q p60' so that you will be given a high priority. Do not submit an interactive job with the default queue if there is a long waiting list. When a slot opens up, you must be able to actively attend to it. When your interactive session begins execution, you will be ssh'ed into the node. Run your interactive job directly in that window. Once the job is done, exit out of the node. Your job will still be considered "running" and thereby occupying a processor until you exit out of the node. So don't submit a job, or leave the job idle after execution, if you can't actively attend to it.
If your problem/question is not addressed in the Troubleshooting or FAQ sections, ask on the batch-users [at] nmr.mgh.harvard.edu mailing list. Provide as many details as possible so that someone can help diagnose the problem.
Advanced options
Run 'pbsubmit -h' for help.
-l resource_list | : | passes resource list options to qsub |
-q queue | : | select a priority queue |
-o qsub_options | : | passes additional options to the qsub command call |
-O file | : | redirect standard output to <file> |
-E file | : | redirect standard error to <file> |
-T | : | write the script for submission, but doesn't submit job |
Run 'man qsub' for help.
-a datetime | : | specify time for submission of job. datetime in the format [[[[CC]YY]MM]DD]hhmm[.SS] |
-e path | : | set path where standard error file is written |
-h | : | put a hold on the job |
-l resource_list | : | specify the resource requirements of the job. In the form of resource_name=value,resource_name=value,... |
-m options | : | specify the condition(s) where mail is sent to the user. <options> is 'n' for none, or any combination of 'a' for when the job is aborted, 'b' when the job begins execution, and 'e' when the job terminates. |
-M users | : | list of users to send the email updates. |
-N jobname | : | explicitly set the Job Name. Standard output and error files are automatically assigned <jobname>.o<jobid> and <jobname>.e<jobid>. |
-o path | : | set path where standard output file is written |
-q queue | : | set the priority queue for the job |
-r y/n | : | yes or no, whether the job is rerunnable. default is y. |
-S shell | : | specify the shell to interpret the script being submitted. defaults to the user's login shell. |
-v variables | : | pass extra environmental variables to the job. in the form variable=value,variable=value,var... |
-V | : | all environmental variables are passed to the batch job. |
-W attribute_list | : | can assign different attributes to the job, including dependencies and synchronizing with other jobids, and set the group. Dependencies in the form of depend=attribute:value,attribute:value,attr.... Group in the form of group_list=value. |
Acknowledgments
If you used launchpad to conduct your published research please include these grant numbers in the article: "Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043."