# Irma atlas cluster

## Description

### Node configuration

The cluster consists of a front-end node, named irma-atlas, and 4 compute nodes.

Since the 25th of November 2015, one of the nodes has been equipped with 2 NVIDIA K80 GPGPU cards.

Everything is interconnected with both 10 Gb Ethernet and 40 Gb InfiniBand cards.

The workload manager is [Slurm](https://computing.llnl.gov/linux/slurm/).

### Storage

On the front-end node irma-atlas, you have access to several storage areas:

* The /data/<username> directory. If this directory does not exist, you must create it, so that your files are not mixed with those of other users. This partition has a size of 50 TB, so you can store large data such as simulation results, compilation-related files, libraries, etc.

* The /ssd/<username> directory. If this directory does not exist, you must create it, so that your files are not mixed with those of other users. This partition has a size of 2 TB and resides on SSDs for faster access. You can use it to store medium-sized data. A short example of creating these directories follows this list.

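A minimal sketch of creating these per-user directories from the front-end node, assuming your username is available in `$USER`:

```
# create your personal directories on the shared storage (only needed once)
mkdir -p /data/$USER
mkdir -p /ssd/$USER
```
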
## Switching between configurations: Modules

The `module` command is installed on the cluster. It allows you to use specific versions of libraries that are not present in the packaging system. This is done by modifying your environment variables, such as `LD_LIBRARY_PATH` or `PATH`.

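For example, a typical session might look like the following sketch (the module name shown is a placeholder; run `module avail` to see what is actually installed):

```
# list all modules installed on the cluster
module avail
# load one of the available modules (the name below is hypothetical)
module load profile/feelpp
# show the modules currently loaded in your environment
module list
# unload everything to return to a clean environment
module purge
```
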
When using the modules, you really need to pay attention to which modules you load, specifically because of the dependencies that exist between modules (e.g. PETSc is compiled for a specific version of OpenMPI and thus will not work with other MPI versions). For this reason, we will only provide support for Profile modules.

In case you encounter problems, please ask the persons in charge, who are indicated when you log in on atlas.

## Feel++ as a library

Feel++ is installed as a library and made available through the modules. It is recommended to use this solution if you do not need to modify the library itself.

* In your `CMakeLists.txt`, declare your application with `feelpp_add_application(yourApplication SRCS yourCode.cpp)`.

* The last step is to run the `cmake /where/are/your/sources/files` and `make` commands in your build directory, typically `/ssd/YOUR_NAME/build` (a sketch of this workflow follows this list).

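A minimal sketch of that last step, assuming your sources live under `/data/YOUR_NAME/myapp` (both paths are placeholders):

```
# build out of source in your ssd space
mkdir -p /ssd/YOUR_NAME/build
cd /ssd/YOUR_NAME/build
# point cmake at the directory containing your CMakeLists.txt, then compile
cmake /data/YOUR_NAME/myapp
make
```
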
## Switching network configuration

By default, OpenMPI will use the best network available, i.e. InfiniBand on the cluster (see [OpenMPI tcp](http://icl.cs.utk.edu/open-mpi/faq/?category=tcp#tcp-auto-disable)).

If you want to use Ethernet instead of InfiniBand, just add the following options to your `mpirun` command:

```
mpirun -mca btl tcp,self -np X ...
```

## Slurm configuration

By default and for each Slurm job, a directory named `job.<job_id>` is created in the `/scratch` directory of each node. You can use it to store additional data. For now, this directory is not deleted at the end of the job, but you need to plan a copy of that data to the /data directory yourself.

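A minimal sketch of such a copy at the end of a job script (the `results` subdirectory is a placeholder):

```
# copy results from the node-local scratch directory back to the shared
# /data storage before the job terminates
cp -r /scratch/job.$SLURM_JOB_ID/results /data/$USER/
```
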
## Slurm partitions

There are two partitions on which you can submit jobs on irma-atlas:

* public: This partition gives you access to all 4 nodes. This is the default partition, which notably allows you to run MPI jobs;

* K80: This partition gives you access to the node on which the K80 GPGPU cards are installed. A submission example follows this list.

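A minimal sketch of selecting a partition at submission time (the script name is a placeholder):

```
# submit to the default public partition
sbatch myjob.sh
# submit to the node with the K80 GPGPU cards instead
sbatch -p K80 myjob.sh
```
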
## Interactive access to the nodes

You can access the nodes with ssh, as long as you have a job running on that node.

If you use `sbatch` and then ssh, you will be disconnected from the node when the job ends.

Alternatively, you can request an interactive allocation with `salloc`, for instance:

```
salloc -t "02:00:00" -p K80 -w irma-atlas4 --exclusive
```

> **IMPORTANT NOTE:** Please be reasonable with your use of `--exclusive` and `-t "XX:YY:ZZ"`, as they could prevent other users from accessing the node. You can cancel a job with `scancel`.

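For example (the job id is a placeholder):

```
# list your running jobs and their ids
squeue -u $USER
# cancel a job you no longer need
scancel 12345
```
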
## Basic Slurm script with an MPI application

First, read the Slurm quickstart: [(external link)](https://computing.llnl.gov/linux/slurm/quickstart.html)

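Below is a minimal sketch of such a submission script; the executable name, module name and resource values are placeholders, not the cluster's actual recommendations:

```
#!/bin/bash
#SBATCH --job-name=my_mpi_job        # name shown in the queue
#SBATCH --partition=public           # default partition with the 4 nodes
#SBATCH --ntasks=16                  # total number of MPI processes
#SBATCH --time=01:00:00              # wall-clock limit for the job
#SBATCH --output=my_mpi_job.%j.out   # output file (%j expands to the job id)

# load the environment needed by the application (module name is hypothetical)
module load profile/feelpp

# run the application with MPI
mpirun -np $SLURM_NTASKS /data/$USER/my_mpi_app
```
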
See [sbatch](https://computing.llnl.gov/linux/slurm/sbatch.html) for additional options.

Then you can launch the application with `sbatch <name_of_the_script>`.

## Other use-cases for Slurm

* Slurm with R:

    * Samples from the University of Michigan: [(external link)](http://sph.umich.edu/biostat/computing/cluster/slurm.html)

## Run a program in the background

### Nohup

One way is to detach the script from the ssh session using `nohup`, so that it keeps running after you log out.

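A minimal sketch, assuming your script is called `myscript.sh` (the name is a placeholder):

```
# start the script immune to hangups, with output redirected to a log file;
# the trailing & puts it in the background so you can close the ssh session
nohup ./myscript.sh > myscript.log 2>&1 &
```
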