- [Description](description)
- [Environment modules](modules)
- [Job scheduling with slurm](slurm)
- [Troubleshooting](troubleshooting)
- [Feel++](feelpp)
## Switching between configurations: Modules

The `module` command is installed on the cluster. It allows you to use specific versions of libraries that are not provided by the packaging system. It works by modifying environment variables such as `LD_LIBRARY_PATH` or `PATH`.

#### Usage
* List the available modules with:

```
module avail
```

* List the currently loaded modules with:

```
module list
```

* Load a new module:

```
module load <modulename>
```

* Unload a module:

```
module unload <modulename>
```
When using `module`, two kinds of modules are available:

* Single modules: modules that set up the environment for one specific library
* Profile modules: meta-modules that load a consistent set of other modules

Pay close attention to which modules you load, because of the dependencies between them (e.g. PETSc is compiled against a specific version of OpenMPI and will not work with other MPI versions). For this reason, we only provide support for profile modules.

In case you encounter problems, please ask the persons in charge, indicated when you log in on atlas.
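For example, a typical session first loads a profile module and then checks what it pulled in (a minimal sketch; `gcc490.profile` is one of the profiles used later on this page, pick whichever profile you need):

```
# load a profile (meta-module) rather than individual modules
module load gcc490.profile
# check which modules the profile pulled in
module list
```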
## Feel++ as a library

Feel++ is installed as a library and made available through a module. This is the recommended solution if you do not need to modify the library itself.

* Once the module command is configured in your shell config file, add:

```sh
module load latest.testing.profile
module load science/feelpp/nightly
export CC=/usr/bin/clang-3.7
export CXX=/usr/bin/clang++-3.7
```
* The first line loads the meta-module that contains all the Feel++ dependencies.
* The second loads the library itself.
* The last two make you use the same compiler as the one the library was built with.
* Your root `CMakeLists.txt` should look like the following, assuming you are trying to compile `yourCode.cpp`:
```cmake
cmake_minimum_required(VERSION 2.8)

find_package(Feel++
  PATHS $ENV{FEELPP_DIR}/share/feel/cmake/modules
        /usr/share/feel/cmake/modules
        /usr/local/share/feel/cmake/modules
        /opt/share/feel/cmake/modules
)
if(NOT FEELPP_FOUND)
  message(FATAL_ERROR "Feel++ was not found on your system. Make sure to install it and specify FEELPP_DIR to reference the installation directory.")
endif()

feelpp_add_application(yourApplication SRCS yourCode.cpp)
```
* The last step is to run `cmake /where/are/your/sources/files` and then `make` in your build directory, typically `/ssd/YOUR_NAME/build` (see the sketch below).
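Put together, a minimal out-of-source build looks like this (same placeholder paths as above; the `-j` value is only an example):

```
cd /ssd/YOUR_NAME/build
cmake /where/are/your/sources/files
make -j 4
```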
## Switching network configuration

By default, OpenMPI uses the best network available, i.e. InfiniBand on the cluster (see [OpenMPI tcp](http://icl.cs.utk.edu/open-mpi/faq/?category=tcp#tcp-auto-disable)).

However, if you want to use TCP, please refer to [tcp params](http://icl.cs.utk.edu/open-mpi/faq/?category=tcp#tcp-params).

If you want to use Ethernet instead of InfiniBand, just add the following option to the mpirun command:

```
mpirun -mca btl tcp,self -np X ...
```
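When using TCP, you may also want to restrict OpenMPI to a specific interface with the `btl_tcp_if_include` MCA parameter described in the FAQ linked above; a minimal sketch, assuming the Ethernet interface is named `eth0`:

```
mpirun -mca btl tcp,self -mca btl_tcp_if_include eth0 -np X ...
```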
## Slurm configuration

By default, for each Slurm job, a directory named `job.<job_id>` is created in the `/scratch` directory of each node. You can use it to store additional data. For now, this directory is not deleted at the end of the job, but you should still plan to copy the data you want to keep to the `/data` directory.
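For instance, the end of a job script can copy the scratch directory back (a minimal sketch; replace `<login>` with your own login, as in the full script below):

```
mkdir -p /data/<login>/slurm
cp -r /scratch/job.$SLURM_JOB_ID /data/<login>/slurm
```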
## Slurm partitions

There are two partitions on which you can submit jobs on irma-atlas (see the example after this list):

* public: This partition gives you access to all 4 nodes. It is the default partition and notably allows you to run MPI jobs;
* K80: This partition gives you access to the node on which the K80 GPGPU cards are installed.
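To target a partition explicitly, pass `-p`/`--partition` to `sbatch`, `srun` or `salloc` (a minimal sketch; the script name is hypothetical):

```
sbatch -p K80 my_gpu_job.sl
```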
## Interactive access to the nodes

You can access the nodes with ssh, as long as you have a job running on that node.
If you use `sbatch` and then ssh, you will be disconnected from the node when the job ends.
If you want to keep access to a node for a certain period of time, you can allocate a job and then connect to the node. To do so, use the `salloc` command, e.g.:
```
# Here you allocate a job with the following constraints:
# -t "02:00:00": the job will remain active for 2 hours
# -p K80: it will be submitted to the K80 partition
# -w irma-atlas4: the job will target the irma-atlas4 machine only
# --exclusive: you will have exclusive access to the node
salloc -t "02:00:00" -p K80 -w irma-atlas4 --exclusive
```
> **IMPORTANT NOTE:** Please be reasonable with your use of `--exclusive` and `-t "XX:YY:ZZ"`, as they can prevent other users from accessing the node. You can cancel a job with `scancel`.
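Once the allocation is granted, you can find out which node it landed on and connect to it (a minimal sketch; the node name is only an example):

```
squeue -u $USER     # locate your job and the node it runs on
ssh irma-atlas4     # connect to the allocated node
scancel <job_id>    # release the allocation when you are done
```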
## Basic slurm script with an MPI application

First, read the slurm quickstart: [(external link)](https://computing.llnl.gov/linux/slurm/quickstart.html)

Here is a basic slurm script to get you started:
```
#!/bin/bash

# Lines starting with ##SBATCH are commented out;
# lines starting with #SBATCH are actual directives for sbatch

#SBATCH -p public
# number of cores
#SBATCH -n 96
# Hyperthreading is enabled on irma-atlas; if you do not want to use it,
# you must specify the following option
#SBATCH --ntasks-per-core 1
# min-max number of nodes
##SBATCH -N 3-6
# max time of exec (the job will be killed afterwards)
##SBATCH -t 12:00:00
# number of tasks per node
##SBATCH --tasks-per-node 1
# specify execution constraints
##SBATCH --constraint "intel"
# min mem size
##SBATCH --mem=16684
# display info about cpu binding
##SBATCH --cpu_bind=verbose
# send a mail at the end of the exec
#SBATCH --mail-type=END
#SBATCH --mail-user=login@server.com

# If you want to have access to Feel++ logs,
# export the FEELPP_SCRATCHDIR variable to an NFS-mounted directory
export FEELPP_SCRATCHDIR=/scratch/job.$SLURM_JOB_ID

#################### OPTIONAL:
# In case you want to use modules,
# you first have to activate the module command
source /etc/profile.d/modules.sh

# Source the configuration for Feel++ or your custom configuration
PREVPATH=`pwd`
cd /data/software/config/etc
source feelpprc.sh
cd ${PREVPATH}

# Load modules here
# This is an example of a module to load
module load gcc490.profile
#################### END OPTIONAL

# Finally launch the job
# mpirun from OpenMPI is natively interfaced with Slurm:
# no need to specify the number of processes to use
cd <appdir>
mpirun --bind-to-core -x LD_LIBRARY_PATH <appname> --config-file <appcfg.cfg>

# copy the scratch data back to /data
mkdir -p /data/<login>/slurm
cp -r /scratch/job.$SLURM_JOB_ID /data/<login>/slurm
```
See [sbatch](https://computing.llnl.gov/linux/slurm/sbatch.html) for additional options.

Then you can launch the application with `sbatch <name_of_the_script>`.
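Once submitted, you can monitor the job with the standard Slurm commands; a minimal sketch (the script name is hypothetical):

```
sbatch my_job.sl       # submit the script
squeue -u $USER        # check the state of your jobs
sacct -j <job_id>      # accounting information once the job has finished
```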
## Other use cases for slurm

* Slurm with R (see the sketch after this list):
  * Samples from the University of Michigan: [(external link)](http://sph.umich.edu/biostat/computing/cluster/slurm.html)
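As a minimal sketch of that use case (script and file names are hypothetical), an R job is submitted much like the MPI example above:

```
#!/bin/bash
#SBATCH -p public
#SBATCH -n 1
#SBATCH -t 01:00:00
Rscript analysis.R
```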
## Run a program in the background

### Nohup

One way is to launch the scripts outside of your ssh session, using a command such as:

    ssh user@host "nohup script1 > /dev/null 2>&1 & nohup script2; ..."

Note: be careful, you cannot simply rely on nohup from within your ssh session: when you exit the session, a termination signal is sent that kills all processes owned by your UID (even background ones).
### Screen (recommended)

Screen is a window manager that lets you create several virtual terminals within a single session (see the manual for full details).

Run a program in the background:

```
screen      # open a shell in a virtual window (BEFORE using chroot!)
screen -d   # detach your virtual terminal
```

You can now exit your ssh session.
To recover your virtual terminal, use

```
screen -r
```

To list all available windows, type

```
screen -ls
```
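When you run several jobs, it can also help to give each session an explicit name (a small sketch; the session name is hypothetical):

```
screen -S feelpp_run    # start a named session
screen -r feelpp_run    # reattach to it later
```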
There are many shortcuts you can use. Some of them are summarized here:

```
<ctrl+a> <?>      : Display command info
<ctrl+a> <:>      : Enter the screen command prompt
<ctrl+a> <">      : Window list
<ctrl+a> <0>      : Open window 0
<ctrl+a> <c>      : Create a new window
<ctrl+a> <S>      : Split the current region into two regions
<ctrl+a> <tab>    : Switch the input focus to the next region
<ctrl+a> <ctrl+a> : Toggle between the current and previous region
<ctrl+a> <Esc>    : Enter copy mode
<ctrl+a> <Q>      : Close all regions but the current one
<ctrl+a> <X>      : Close the current region
<ctrl+a> <d>      : Detach from the current screen session
```
If you want to make your screen setup more user-friendly, you can customize it so that the bottom status line displays all the terminals opened in screen and the one currently in focus. There are some configuration examples at the following link: [.screenrc examples](https://bbs.archlinux.org/viewtopic.php?id=55618)

See the manual for other features.
## Troubleshooting

### My code runs slower on a computing server than on my laptop. What is the problem?

You might experience drops in performance when moving to a larger computer, for example from a laptop to the cluster. The most common fix is to export the following variable:

```
export OMP_NUM_THREADS=1
```

This happens because an underlying library uses the maximum number of available threads to execute some code, which degrades performance here. With this variable set, each process is allowed only one thread, restoring the expected performance.
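When launching through mpirun, you can forward the variable to every rank in the same way the script above forwards `LD_LIBRARY_PATH`; a minimal sketch:

```
export OMP_NUM_THREADS=1
mpirun -x OMP_NUM_THREADS -x LD_LIBRARY_PATH <appname> ...
```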
Other factors that might harm performance:

* Slower hard drives
* Slower CPUs / hyperthreading

### CMake

* You end up with errors similar to this one at the end of the cmake step:
```
CMake Warning at feel/CMakeLists.txt:59 (add_library):
  Cannot generate a safe runtime search path for target feelpp because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libmpi_cxx.so.1] in /usr/lib may be hidden by files in:
      /data/software/install/openmpi-1.8.5/gcc-4.9.0/lib
    runtime library [libmpi.so.1] in /usr/lib may be hidden by files in:
      /data/software/install/openmpi-1.8.5/gcc-4.9.0/lib

  Some of these libraries may not be found correctly.
```
This error is most likely linked to loaded modules built against different versions of a library, e.g. one module built with OpenMPI 1.8.5 and another built with OpenMPI 1.6.4, so CMake has trouble deciding which library to use.
If you used a profile to load your modules, report this issue to an administrator and ask them to update the modules.
If you did not use a profile, then you probably have a conflict between the modules you loaded.
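In the latter case, a quick way to get back to a clean state is to unload everything and reload a single profile (a minimal sketch; pick the profile you actually need):

```
module purge               # unload every currently loaded module
module load gcc490.profile
module list                # check that only the profile's modules remain
```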