<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
# Content

- [Job scheduling with slurm](#job-scheduling-with-slurm)
  - [Slurm configuration](#slurm-configuration)
  - [Slurm partitions](#slurm-partitions)
  - [Interactive access to the nodes](#interactive-access-to-the-nodes)
  - [Basic slurm script with an MPI application](#basic-slurm-script-with-an-mpi-application)
  - [Other use-cases for slurm:](#other-use-cases-for-slurm)
  - [Run a program in the background](#run-a-program-in-the-background)
    - [Nohup](#nohup)
    - [Screen (recommended)](#screen-recommended)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

# Job scheduling with slurm

## Slurm configuration

By default, for each Slurm job, a directory named `job.<job_id>` is created in the `/scratch` directory of each node. You can use it to store additional data. For now, this directory is not deleted at the end of the job, but you should nevertheless plan to copy any data you want to keep to the `/data` directory.

## Slurm partitions

There are two partitions on which you can submit jobs on irma-atlas:

* public: This partition gives you access to all 4 nodes. It is the default partition and notably allows you to run MPI jobs;
* K80: This partition gives you access to the node on which the K80 GPGPU cards are installed.

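As a quick check before submitting, you can list the partitions and their nodes from the login node. This is a sketch: `sinfo` and the `-p` option of `sbatch` are standard Slurm commands, but `myjob.sh` is a hypothetical script name.

```
sinfo                   # list partitions, their nodes and their states
sbatch -p K80 myjob.sh  # submit a script explicitly to the K80 partition
```
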
## Interactive access to the nodes

You can access the nodes with ssh, as long as you have a job running on that node.

If you use `sbatch` and then ssh, you will be disconnected from the node when the job ends.

If you want to keep access to a node for a certain period of time, you can allocate a job and then connect to the node. To do so, use the `salloc` command, e.g.:

```
# Here you allocate a job with the following constraints:
# -t "02:00:00": the job will remain active for 2 hours
# -p K80: it will be submitted to the K80 partition
# -w irma-atlas4: the job will target the irma-atlas4 machine only
# --exclusive: you will have exclusive access to the node
salloc -t "02:00:00" -p K80 -w irma-atlas4 --exclusive
```

> **IMPORTANT NOTE:** Please be reasonable with your use of `--exclusive` and `-t "XX:YY:ZZ"`, as they can prevent other users from accessing the node. You can cancel a job with `scancel`.

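If an interactive allocation is no longer needed, you can find and cancel it yourself. A sketch: `squeue` and `scancel` are standard Slurm commands, and `1234` is a hypothetical job id.

```
squeue -u $USER   # list your jobs and their job ids
scancel 1234      # cancel the job with id 1234
```
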
## Basic slurm script with an MPI application

First, read the slurm quickstart: [(external link)](https://computing.llnl.gov/linux/slurm/quickstart.html)

Here is a basic slurm script to get you started:

```
#!/bin/bash

# Lines starting with #SBATCH are directives for sbatch;
# lines starting with ##SBATCH are commented out and ignored

#SBATCH -p public
# number of cores
#SBATCH -n 96
# Hyperthreading is enabled on irma-atlas; if you do not want to use it,
# you must specify the following option
#SBATCH --ntasks-per-core 1
# min-max number of nodes
##SBATCH -N 3-6
# max execution time (the job will be killed afterwards)
##SBATCH -t 12:00:00
# number of tasks per node
##SBATCH --tasks-per-node 1
# specify execution constraints
##SBATCH --constraint "intel"
# min memory size
##SBATCH --mem=16684
# display info about cpu binding
##SBATCH --cpu_bind=verbose
# send a mail at the end of the execution
#SBATCH --mail-type=END
#SBATCH --mail-user=login@server.com

# If you want to have access to Feel++ logs,
# export the FEELPP_SCRATCHDIR variable to an NFS mounted directory
export FEELPP_SCRATCHDIR=/scratch/job.$SLURM_JOB_ID

#################### OPTIONAL:
# In case you want to use modules,
# you first have to activate the module command
source /etc/profile.d/modules.sh

# Source the configuration for Feel++ or your custom configuration
PREVPATH=$(pwd)
cd /data/software/config/etc
source feelpprc.sh
cd ${PREVPATH}

# Load modules here
# This is an example of a module to load
module load gcc490.profile
#################### END OPTIONAL

# Finally launch the job
# mpirun from OpenMPI is natively interfaced with Slurm:
# no need to specify the number of processors to use
cd <appdir>
mpirun --bind-to-core -x LD_LIBRARY_PATH <appname> --config-file <appcfg.cfg>

# Copy the results from scratch to your /data directory
mkdir -p /data/<login>/slurm
cp -r /scratch/job.$SLURM_JOB_ID /data/<login>/slurm
```

See [sbatch](https://computing.llnl.gov/linux/slurm/sbatch.html) for additional options.

Then you can launch the application with `sbatch <name_of_the_script>`.

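After submission you can monitor the job with standard Slurm commands. A sketch: `<jobid>` is the id printed by `sbatch`, and `sacct` requires job accounting to be enabled on the cluster.

```
sbatch <name_of_the_script>   # prints the assigned job id
squeue -u $USER               # state of your pending/running jobs
sacct -j <jobid>              # accounting information once the job has finished
```
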
## Other use-cases for slurm:

* Slurm with R:
  * Samples from the University of Michigan: [(external link)](http://sph.umich.edu/biostat/computing/cluster/slurm.html)

## Run a program in the background

### Nohup

One way is to launch the scripts from outside an ssh session, using a command like this:

```
ssh user@host "nohup script1 > /dev/null 2>&1 & nohup script2 > /dev/null 2>&1 & ..."
```

Note: be careful, you can't simply use nohup from within your ssh session. Why? When you exit your ssh session, a SIGTERM signal is sent which will close all processes with your user UID (even background ones).

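As a minimal local illustration of how `nohup` detaches a command from the terminal (the `sleep 1` stand-in and the messages are arbitrary):

```
# Launch a short-lived command immune to hangup, discarding its output
nohup sleep 1 > /dev/null 2>&1 &
echo "background PID: $!"   # $! holds the PID of the background job
wait                        # wait for the background job to finish
echo "done"
```
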
### Screen (recommended)

Screen is a window manager that lets you create several virtual terminals inside a single session. (See the manual for full details.)

Run a program in the background:

```
screen          # open a shell in a virtual window and launch your program there
<ctrl+a> <d>    # detach from the virtual terminal
```

You can now exit from your ssh session.

To recover your virtual terminal, use

```
screen -r
```

To list all available sessions, type

```
screen -ls
```

There are many shortcuts you can use. Some of them are summed up here:

```
<ctrl+a> <?> : Display command info
<ctrl+a> <:> : Enter the command prompt of screen
<ctrl+a> <"> : Window list
<ctrl+a> <0> : Open window 0
<ctrl+a> <c> : Create a new window
<ctrl+a> <S> : Split current region into two regions
<ctrl+a> <tab> : Switch the input focus to the next region
<ctrl+a> <ctrl+a> : Toggle between current and previous region
<ctrl+a> <Esc> : Enter copy mode
<ctrl+a> <Q> : Close all regions but the current one
<ctrl+a> <X> : Close the current region
<ctrl+a> <d> : Detach from the current screen session
```

If you want to make your screen more user-friendly, you can customize it so that the bottom status line displays all the terminals opened in screen and highlights the currently active one. There are some configuration examples at the following link: [.screenrc examples](https://bbs.archlinux.org/viewtopic.php?id=55618)

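As a minimal sketch of such a customization (assumed `~/.screenrc` lines, adapt to taste): `hardstatus` draws a bottom status line with the host name and the window list, highlighting the current window.

```
# ~/.screenrc: always show a bottom status line with the window list
hardstatus alwayslastline
hardstatus string '%{= kw}[%H] %-w%{= wk}%n %t%{-}%+w'
```
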
See the manual for other features.