Danish Center for Climate Computing (DC3)
The ÆGIR (AEGIR), BYLGJA and HRONN clusters are equipped with 1040 CPU cores in total: 17 nodes with 16 cores per node, 12 nodes with 32 cores per node and 8 nodes with 48 cores per node. Together they provide 4.16 TB of RAM and high-speed Infiniband or RoCE internal networks.
16-core nodes (17 nodes):
- 2 CPUs per node: Intel Xeon E5-2667v3 3.2GHz (8 cores per CPU)
- RAM per node: 64GB DDR4
- Interconnection: Mellanox QDR Infiniband
32-core nodes (12 nodes):
- 2 CPUs per node: Intel Xeon E5-2683v4 2.1GHz (16 cores per CPU)
- RAM per node: 128GB DDR4
- Interconnection: Mellanox QDR Infiniband
48-core nodes (8 nodes):
- 2 CPUs per node: Intel Xeon Gold 6248R 3.0GHz (24 cores per CPU)
- RAM per node: 192GB DDR4
- Interconnection: RoCE v2
Storage:
- SSD per node: 120 GB
- Mass storage: 134 TB
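The totals quoted in the summary can be cross-checked with a little shell arithmetic. The sketch below is only a sanity check built from the per-node figures listed above; the variable names are illustrative.

```shell
#!/bin/sh
# Per-node figures taken from the hardware summary above.
# Variable names are illustrative only.
total_cores=$(( 17*16 + 12*32 + 8*48 ))      # nodes * cores per node
total_ram_gb=$(( 17*64 + 12*128 + 8*192 ))   # nodes * GB of RAM per node

echo "Total cores: $total_cores"
echo "Total RAM:   $total_ram_gb GB"
```

The script reports 1040 cores and 4160 GB of RAM, i.e. the 4.16 TB stated above.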
To get access to the DC3 systems you need to be either an HPC grant holder or a member of a group holding a current HPC grant.
To get an account, please go to the following web page: https://hpc.ku.dk
Connecting to DC3
To log in to the DC3 computational system, you must use the SSH protocol. This is provided by the "ssh" command on Unix-like systems (including Mac OS X) or by an SSH-compatible application (e.g. PuTTY on Microsoft Windows). We recommend that you "forward" X11 connections when initiating an SSH session to DC3. For example, when using the ssh command on Unix-based systems, provide the "-Y" option:
ssh -Y firstname.lastname@example.org
To download data from or upload data to DC3, use the following command:
scp -pr user@host1:from_path_file1 user@host2:to_path_file2
For more information, use the man/info commands (man scp).
There are 5 frontend nodes available at the moment: fend01.hpc.ku.dk - fend05.hpc.ku.dk
DC3 provides a rich set of HPC utilities, applications, compilers and programming libraries. If something you need is missing, send an email to email@example.com with your request and we will evaluate it for appropriateness, cost, effort, and benefit to the community. See more information about available software and how to use it in the Available Software section below.
Customizing Your Environment
The way you interact with the DC3 computer can be controlled via certain startup scripts that run when you log in and at other times. You can customize some of these scripts, which are called "dot files", by setting environment variables and aliases in them. There are several "standard" dot files, such as .bash_profile, .bashrc, .zshrc, .cshrc, .kshrc, .login, .profile, .tcshrc, or .zprofile. Which of those you modify depends on your choice of shell, although note that DC3 recommends bash. The table below contains examples of basic customizations. Note that when making changes such as these, it's always a good idea to have two terminal sessions active on the machine so that you can back out changes if needed!
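As an illustration of the kind of customization described above, here is a minimal sketch of lines one might add to ~/.bashrc. All names and values are hypothetical examples, not DC3 defaults.

```shell
# Hypothetical additions to ~/.bashrc (names and values are examples only).

# Environment variables
export EDITOR=vi                    # default editor for tools that ask for one
export PATH="$HOME/bin:$PATH"       # look for personal scripts first

# Aliases
alias ll='ls -l'                    # long directory listing
alias myjobs='squeue -u "$USER"'    # show only your own jobs in the queue
```

Keeping a second terminal session open while testing such changes means a broken dot file can still be repaired from the other session.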
Easy access to software is controlled by the LMOD module utility. With modules, you can easily manipulate your computing environment to choose applications and programming libraries. To get access to the software, execute the following line on the command line and/or add it to $HOME/.bash_profile:
If you want to change your software environment, you "load", "rm", and "swap" modules. The small set of module commands below can do most of what you'll want to do.
The first command of interest is "ml list", which will show you your currently loaded modules. When you first log in, you have a number of modules loaded for you.
Let's say you want to use a different compiler. The "ml avail" command will list all available modules. You can use the module's name stem to do a useful search.
Suppose you want to use the Intel compilers instead of GCC. Here's how to make the change:
ml swap gcc/8.3.0 intel/17.1.0
Now you are using the Intel compilers (C, C++, Fortran) version 17.1.0. Note that the module utility doesn't give you any feedback about whether the swap command did what you wanted it to do, so always double-check your environment using the "ml list" command.
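Because the swap gives no feedback, a script can also verify the result itself. The sketch below assumes that the module system exports the colon-separated LOADEDMODULES variable, as Lmod normally does; is_loaded is a hypothetical helper name, not part of the module utility.

```shell
# is_loaded NAME/VERSION: succeed if the module appears in $LOADEDMODULES.
# Assumes LOADEDMODULES is a colon-separated list such as
# "intel/17.1.0:openblas/0.3.6".
is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if is_loaded intel/17.1.0; then
    echo "intel/17.1.0 is loaded"
else
    echo "intel/17.1.0 is NOT loaded; check 'ml list'"
fi
```

Such a check is handy at the top of a build or job script, where a silently missing compiler module would otherwise only show up as a confusing error later on.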
There is plenty of software that is not loaded by default. You can use the "ml avail" command to see what modules are available.
For example, suppose you want to use the OpenBLAS linear algebra library. Try "ml avail openblas". The default version is 0.3.6, but say you'd rather use some features available only in version 0.2.6. In that case, just load that module. If you want to use the default version, you can type either "ml load openblas" or "ml load openblas/0.3.6"; either will work.
Software Available via Module Utility
ANACONDA is a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing.
VEROS (Versatile Ocean Simulation in Pure Python) aims to be the Swiss army knife of ocean modeling. It is a full-fledged GCM that supports anything between highly idealized configurations and realistic set-ups, targeting students and seasoned researchers alike. Thanks to its seamless interplay with Bohrium, Veros runs efficiently on your laptop, gaming PC (with experimental GPU support through OpenCL), and small cluster.
C++ Boost library provides free peer-reviewed portable C++ source libraries to speed up software development.
NCO is netCDF Operator toolkit, which manipulates and analyzes data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5. It exploits the geophysical expressivity of many CF (Climate & Forecast) metadata conventions, the flexible description of physical dimensions translated by UDUnits, the network transparency of OPeNDAP, the storage features (e.g., compression, chunking, groups) of HDF (the Hierarchical Data Format), and many powerful mathematical and statistical algorithms of GSL (the GNU Scientific Library). NCO is fast, powerful, and free.
CDO (Climate Data Operators) is a collection of command line Operators to manipulate and analyze Climate and NWP model Data. Supported data formats are GRIB 1/2, netCDF 3/4, SERVICE, EXTRA and IEG. There are more than 600 operators available.
CESM is the NCAR/UCAR Community Earth System Model.
FFTw3 (serial & parallel, single & double precision) is a C/FORTRAN subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
GCC, the GNU Compiler Collection, includes front ends for C, C++ and Fortran, as well as libraries for these languages (libstdc++,...). GCC was originally written as the compiler for the GNU operating system.
Intel Parallel Studio XE is a comprehensive suite of development tools for building modern code with the latest techniques in vectorization, multithreading, multinode parallelisation, and memory optimisation. It includes C, C++ and Fortran compilers, the Math Kernel Library (MKL) and an MPI library.
ecCodes (ECMWF) is an application programming interface accessible from C, FORTRAN and Python programs, developed for encoding and decoding WMO FM-92 GRIB edition 1 and edition 2 messages. A useful set of command line tools is also provided to give quick access to GRIB messages.
GSL (GNU Scientific Library) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.
HDF5/HDF5-parallel (Hierarchical Data Format) is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data-types, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
OpenBLAS is an optimized BLAS library that includes LAPACK (Linear Algebra PACKage). It is written in C & Fortran and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.
MPICH2 is a Message Passing Interface-3 implementation.
OpenMPI is a Message Passing Interface-3 implementation.
NetCDF4 (Network Common Data Form) is a set of interfaces for array-oriented data access and a freely distributed collection of data access libraries for C, Fortran, C++ languages. The netCDF libraries support a machine-independent format for representing scientific data. Together, the interfaces, libraries, and format support the creation, access, and sharing of scientific data.
NetCDF-Parallel (PnetCDF) is a library providing high-performance parallel I/O while still maintaining file-format compatibility with Unidata's NetCDF, specifically the CDF-1 and CDF-2 formats. Although NetCDF supports parallel I/O starting from version 4, the files must be in HDF5 format. PnetCDF is currently the only choice for carrying out parallel I/O on files that are in the classic formats (CDF-1 and CDF-2). In addition, PnetCDF supports the CDF-5 file format, an extension of CDF-2 that supports more data types and allows users to define large dimensions, attributes, and variables (>2B elements). NetCDF gives scientific programmers a self-describing and portable means for storing data; however, prior to version 4, it does so in a serial manner. By making some small changes to the netCDF APIs, PnetCDF can use MPI-IO to achieve high-performance parallel I/O.
PETSc (real, complex), pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It supports MPI, shared memory pthreads, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory pthreads or MPI-GPU parallelism. The suite has interfaces for the Metis, ParMetis, Scotch, Hypre, SuperLU and MUMPS libraries.
PISM is an open source, parallel, high-resolution ice sheet model.
ELMER is an open source multiphysical simulation software mainly developed by CSC - IT Center for Science (CSC).
Source Code Compilation
Let's assume that we're compiling source code that will run as a parallel application using MPI for internode communication, and that the code is written in Fortran, C, or C++. In this case it's easy, because you will use standard compiler wrapper scripts that bring in all the include files and library paths and set the linker options that you'll need. Use the following wrappers: mpif90, mpicc, or mpic++ for Fortran, C, and C++, respectively.
To compile on DC3, execute on the command line:
mpif90 -o hello.x hello.f90
If the compilation needs an extra library such as HDF5, you must load it through the module utility. Even with the module loaded, the compiler doesn't know where to find the files related to the HDF5 library. One way to figure it out for yourself is to look under the covers of the HDF5 module.
The ml show hdf5-parallel command reveals (most of) what the module actually does when you load it. You can see that it defines some environment variables, for example HDF5_INCLUDE, which you can use in your build script or Makefile. Look at the definition of the HDF5_XXX environment variables. They contain all the include and link options.
Therefore, we can use
mpicc -o hd_copy.x hd_copy.c $HDF5_INCLUDE $HDF5_LIB
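Since the compile line depends entirely on variables set by the module, a build script can check that they exist before invoking the compiler. This is a minimal sketch: require_vars is a hypothetical helper, while HDF5_INCLUDE, HDF5_LIB and the hdf5-parallel module name are the ones mentioned above.

```shell
# require_vars NAME...: succeed only if every named variable is set and non-empty.
# Hypothetical helper for build scripts; not part of the module utility.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

if require_vars HDF5_INCLUDE HDF5_LIB; then
    # Left unquoted on purpose: each variable may hold several options.
    mpicc -o hd_copy.x hd_copy.c $HDF5_INCLUDE $HDF5_LIB
else
    echo "HDF5 variables not set; try 'ml load hdf5-parallel' first"
fi
```

Failing early with a clear message is usually easier to diagnose than a compiler error about a missing hdf5.h header.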
These are some common compiler optimizations and the types of code that they work best with.
Vectorization
The registers and arithmetic units on DC3 are capable of performing the same operation on several double precision operands simultaneously in a SIMD (Single Instruction Multiple Data) fashion. This is often referred to as vectorization because of its similarities to the much larger vector registers and processing units of the Cray systems of the pre-MPP era. Vector optimization is most useful for large loops in which each successive operation has no dependencies on the results of the previous operations. Loops can be vectorized by the compiler or by compiler directives in the source code.
Interprocedural Optimization
Here the compiler optimizes across subroutine, function, or other procedural boundaries. This can have many levels, ranging from inlining (the replacement of a function call with the corresponding source code at compile time) up to treating the entire program as one routine for the purpose of optimization. It is most suitable when there are function calls embedded within large loops. It can be the most compute-intensive of all optimizations at compile time, particularly for large applications: it can increase the compile time by an order of magnitude or more without any significant speedup, and can even cause a compile to crash. For this reason, none of the DC3 recommended compiler optimization options include any significant inter-procedural optimizations.
Relaxation of IEEE Floating-point Precision
Full implementation of IEEE Floating-point precision is often very expensive. There are many floating-point optimization techniques that significantly speed up a code's performance by relaxing some of these requirements. Since most codes do not require an exact implementation of these rules, all of the DC3 recommended optimizations include relaxed floating-point techniques.
This table shows how to invoke these optimizations for each compiler. Some of the options have numeric levels: the higher the number, the more extensive the optimization, and a level of 0 turns the optimization off. For more information about these optimizations, see the compiler on-line man pages.
|IEEE FP relaxation||
The Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
The entities managed by the SLURM daemons include nodes (the compute resource in SLURM), partitions (which group nodes into logical, possibly overlapping, sets), jobs (allocations of resources assigned to a user for a specified amount of time), and job steps (sets of possibly parallel tasks within a job). The partitions can be considered job queues, each of which has an assortment of constraints such as job size limit, job time limit, users permitted to use it, etc.
These are the SLURM commands frequently used on DC3:
sinfo -p aegir is used to show the state of partitions and nodes managed by SLURM:
PARTITION AVAIL TIMELIMIT  NODES STATE NODELIST
aegir     up    1-00:00:00 28    alloc node[164-172,174-180,441-452]
aegir     up    1-00:00:00 1     idle  node173
This shows that there are 29 nodes available (up) in the aegir partition: 28 of them are occupied (alloc) and 1 is free (idle). The maximum runtime per job (TIMELIMIT) is 24 hours. Nodes 164-180 have 16 CPU cores, nodes 441-452 have 32 CPU cores and nodes 453-460 have 48 cores per node.
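The node counts can also be extracted automatically. The sketch below feeds the sample output shown above through awk; on the cluster one would presumably pipe the live output of sinfo -p aegir instead. The variable names are illustrative.

```shell
# Sample text copied from the sinfo output above; on a live system,
# replace the printf with: sinfo -p aegir
sinfo_sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
aegir up 1-00:00:00 28 alloc node[164-172,174-180,441-452]
aegir up 1-00:00:00 1 idle node173'

# Column 4 is the node count, column 5 the state; skip the header line.
summary=$(printf '%s\n' "$sinfo_sample" |
    awk 'NR > 1 { n[$5] += $4; total += $4 }
         END { printf "alloc=%d idle=%d total=%d\n", n["alloc"], n["idle"], total }')

echo "$summary"
```

For the sample above this gives 28 allocated nodes, 1 idle node and 29 nodes in total.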
To see the detailed settings of a partition, use:
scontrol show partition aegir
PartitionName=aegir
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[164-180,441-452]
   PriorityJobFactor=320 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1312 TotalNodes=29 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
This output shows us in detail that:
Anyone can submit a job to the aegir partition (AllowGroups=ALL). The walltime limit on the aegir partition is 1 day (MaxTime=1-00:00:00). It is important to understand that TotalCPUs=1312 counts logical CPUs, that is, physical cores * threads (2 per core), over the 29 nodes of the aegir partition.
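The TotalCPUs figure can be reproduced from the node list of the partition. The following is a small worked check, assuming the per-node core counts given earlier (16 cores on node[164-180], 32 cores on node[441-452]); the variable names are illustrative.

```shell
cores_16=$(( 17 * 16 ))   # node[164-180]: 17 nodes with 16 cores each
cores_32=$(( 12 * 32 ))   # node[441-452]: 12 nodes with 32 cores each

physical_cores=$(( cores_16 + cores_32 ))
logical_cpus=$(( physical_cores * 2 ))   # 2 hardware threads per core

echo "physical cores: $physical_cores, logical CPUs: $logical_cpus"
```

This yields 656 physical cores and 1312 logical CPUs, matching TotalCPUs in the output above.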
scontrol show Node=node164 shows information about node164:
NodeName=node164 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=32 CPUTot=32 CPULoad=11.90
   AvailableFeatures=v1
   ActiveFeatures=v1
   Gres=(null)
   NodeAddr=node164 NodeHostName=node164 Version=18.08
   OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019
   RealMemory=64136 AllocMem=16384 FreeMem=57967 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=aegir,bylgja
   BootTime=2019-07-29T21:52:54 SlurmdStartTime=2020-01-22T18:09:02
   CfgTRES=cpu=32,mem=64136M,billing=32
   AllocTRES=cpu=32,mem=16G
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
This output shows us the following:
Node 164 has a job running (CPUAlloc=32). The output also shows that there are 2 threads per core (ThreadsPerCore), 32 logical CPUs available (CPUTot = physical cores * ThreadsPerCore), the amount of memory on the node (RealMemory, in MB), the size of the temporary disk space (TmpDisk), etc.
squeue -p aegir command is used to show jobs in the queueing system. The command gives an output similar to this:
JOBID    PARTITION NAME     USER     ST TIME     NODES NODELIST(REASON)
16310118 aegir     cesmi6ga guido    PD 0:00     2     (Resources)
16306731 aegir     s40flT0  jlohmann R  1:36:53  1     node172
16309537 aegir     i6gat31i nutrik   R  2:18:33  5     node[441,443-446]
16307317 aegir     Ctrl2dFl jlohmann R  4:56:44  1     node442
16306131 aegir     RTIPw1   jlohmann R  5:35:45  1     node164
16305493 aegir     S2d360   jlohmann R  7:29:17  1     node452
16303418 aegir     cesmi6ga guido    R  9:53:55  4     node[166,177-179]
16301838 aegir     pit31drb nutrik   R  11:33:27 5     node[447-451]
16299026 aegir     cesmi6ga guido    R  15:32:06 2     node[170-171]
16298749 aegir     cesmi6ga guido    R  16:11:26 2     node[165,176]
16297229 aegir     cesmi6ga guido    R  18:33:01 2     node[168-169]
This partial output shows us that:
User nutrik is running two different jobs on the aegir partition, on nodes [441,443-446] and [447-451], which have been running for 2 hrs 18 min and 11 hrs 33 min, respectively. guido's job cesmi6ga is queued on the aegir partition (PD), waiting for free nodes.
More generally, the output shows us the following:
The first column is the JOBID, which is used for termination or modification of a job. The second column is the partition the job is running on. The third column is the job's name. The fourth column is the user name of the person who queued the job. The fifth column is the state of the job. Some of the possible job states are as follows: PD (pending), R (running), CA (cancelled), CF (configuring), CG (completing), CD (completed), F (failed), TO (timeout), NF (node failure) and SE (special exit state). The sixth column is the job's runtime. The seventh and eighth columns are the number of allocated nodes and the list of nodes the job is running on.
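The column layout described above makes the queue listing easy to summarize with awk. The sketch below works on the sample output from this section (on the cluster, the live output of squeue -p aegir could presumably be used instead) and counts the nodes held by one user's running jobs as well as the number of pending jobs. The variable names are illustrative.

```shell
# Sample text copied from the squeue output above.
squeue_sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16310118 aegir cesmi6ga guido PD 0:00 2 (Resources)
16306731 aegir s40flT0 jlohmann R 1:36:53 1 node172
16309537 aegir i6gat31i nutrik R 2:18:33 5 node[441,443-446]
16307317 aegir Ctrl2dFl jlohmann R 4:56:44 1 node442
16306131 aegir RTIPw1 jlohmann R 5:35:45 1 node164
16305493 aegir S2d360 jlohmann R 7:29:17 1 node452
16303418 aegir cesmi6ga guido R 9:53:55 4 node[166,177-179]
16301838 aegir pit31drb nutrik R 11:33:27 5 node[447-451]
16299026 aegir cesmi6ga guido R 15:32:06 2 node[170-171]
16298749 aegir cesmi6ga guido R 16:11:26 2 node[165,176]
16297229 aegir cesmi6ga guido R 18:33:01 2 node[168-169]'

# Column 4 = user, column 5 = state, column 7 = number of nodes.
nutrik_nodes=$(printf '%s\n' "$squeue_sample" |
    awk -v user=nutrik 'NR > 1 && $4 == user && $5 == "R" { s += $7 } END { print s + 0 }')

pending_jobs=$(printf '%s\n' "$squeue_sample" |
    awk 'NR > 1 && $5 == "PD" { c++ } END { print c + 0 }')

echo "nodes used by nutrik: $nutrik_nodes"
echo "pending jobs:         $pending_jobs"
```

For the sample listing, nutrik's two running jobs occupy 10 nodes and one job is pending.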
sbatch $BATCH_FILE is used to submit a job script for execution. The script contains one or more srun commands to launch parallel tasks.
scancel $JOBID is used to cancel a pending or running job or a job step. It can also be used to send an arbitrary signal to all processes associated with a running job or a job step.
SLURM example batch script:
#!/bin/sh
#
#SBATCH -p aegir
#SBATCH -A ocean
#SBATCH --job-name=myjob
#SBATCH --time=00:30:00
#SBATCH --constraint=v1
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --mail-type=ALL
#SBATCH --mail-user=firstname.lastname@example.org
#SBATCH --output=slurm.out
srun --mpi=pmi2 --kill-on-bad-exit my_program.exe
Then submit the script with sbatch $BATCH_FILE.
In this example we use the aegir partition to run my_program.exe, set the job name, request 30 minutes of runtime on nodes with 16 cores (--constraint=v1), ask for 2 nodes and 32 tasks (one task per core), disable sharing of node resources, enable e-mail notifications and define the file name for the standard job output.
One can also request a node with 32 cores; in this case the SLURM batch script looks like:
#!/bin/sh
#
#SBATCH -p aegir
#SBATCH -A ocean
#SBATCH --job-name=myjob
#SBATCH --time=00:30:00
#SBATCH --constraint=v2
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@example.com
#SBATCH --output=slurm.out
srun --mpi=pmi2 --kill-on-bad-exit my_program.exe
Here #SBATCH --nodes=1 and #SBATCH --constraint=v2 are changed to correspond to the nodes with 32 cores.
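The relation between --ntasks, the node type and --nodes in the two scripts is a ceiling division. The sketch below assumes one task per core and that v1 and v2 select the 16-core and 32-core nodes, as in the examples above; nodes_needed is a hypothetical helper name.

```shell
# nodes_needed NTASKS CORES_PER_NODE: smallest number of nodes that can hold
# NTASKS tasks when one task runs per core (integer ceiling division).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

echo "32 tasks on v1 nodes (16 cores): --nodes=$(nodes_needed 32 16)"
echo "32 tasks on v2 nodes (32 cores): --nodes=$(nodes_needed 32 32)"
```

This reproduces --nodes=2 for the v1 script and --nodes=1 for the v2 script.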
The batch script examples above are given for source codes compiled with the GNU C, C++ and FORTRAN compilers and linked against the MPICH or MVAPICH2 libraries. You can omit --mpi=pmi2 if your source code was built with Intel Parallel Studio.
There is no individual user quota; instead, a group quota with a limited amount of space is enforced by the file system. If this limit is exceeded, the whole group will not be able to write new data.
You can check the current use with:
lfs quota -h /lustre/hpc
More information on mass storage and workload manager can be found here: https://hpc.ku.dk
|Student / Researcher||Project||Supervisor / PI|
|Xaver Lange (PostDoc)||Regional Ocean Modelling (Villum Experiment)||Markus Jochum|
|Dion Häfner (PhD)||Rogue Waves Prediction (DHRTC)||Markus Jochum|
|Bettina Meyer (PostDoc)||Quantifying convective precipitation extremes under changing climate (Villum Foundation)||Jan Olaf Härter|
|Romain Frédéric Sébastien Fiévet||Quantifying convective precipitation extremes under changing climate (Villum Foundation)||Jan Olaf Härter|
|Gorm Gruner Jensen||Quantifying convective precipitation extremes under changing climate (Villum Foundation)||Jan Olaf Härter|
|Ann-Sofie Priergaard Zinck (MSc)||Thesis||Christine Hvidberg|
|Iben Koldtoft (PhD)||Ice2Ice||Christine Hvidberg, Jens Hesselbjerg Christensen|
|Johannes Lohmann (PostDoc)||Villum Experiment||Peter Ditlevsen|
|Guido Vettoretti||EU H2020 TiPES||Peter Ditlevsen|
|Kasper Tølløse (MSc)||Thesis||Eigil Kaas|
- Haerter, J.O., Meyer, B., Nissen, S.B. (2020). Diurnal self-aggregation. npj Clim Atmos Sci 3, 30. https://doi.org/10.1038/s41612-020-00132-z
- Keisling, B.A., Nielsen L.T., Hvidberg C.S., Nuterman R., DeConto R.M. (2020). Pliocene–Pleistocene megafloods as a mechanism for Greenlandic megacanyon formation. Geology, 48. https://doi.org/10.1130/G47253.1
- Nielsen, S. B., Jochum, M., Pedro, J. B., Eden, C., Nuterman, R. (2019). Two-time scale carbon cycle response to an AMOC collapse. Paleoceanography and Paleoclimatology, 34. https://doi.org/10.1029/2018PA003481
- Moseley, C., Henneberg, O., Haerter, J. (2019). A statistical model for isolated convective precipitation events. Journal of Advances in Modeling Earth Systems, 11, 360–375. https://doi.org/10.1029/2018MS001383
- Zunino, A., Mosegaard, K. (2019). An efficient method to solve large linearizable inverse problems under Gaussian and separability assumptions. Computers & Geosciences, 122, 77-86. https://doi.org/10.1016/j.cageo.2018.09.005
- Häfner D., Jacobse R. L., Eden C., Kristensen M. R. B., Jochum M., Nuterman R., Vinter B. (2018). Veros v0.1 - a fast and versatile ocean simulator in pure Python. Geoscientific Model Development, 11(8), 3299-3312. https://doi.org/10.5194/gmd-11-3299-2018
- Nielsen L., Adalgeirsdottir G., Gkinis V., Nuterman R., Hvidberg C. (2018). The effect of a Holocene climatic optimum on the evolution of the Greenland ice sheet during the last 10 kyr. Journal of Glaciology, 64(245), 477-488. https://doi.org/10.1017/jog.2018.40
- Nielsen, S. B., Jochum, M., Eden, C., Nuterman, R. (2018). An energetically consistent vertical mixing parameterization in CCSM4. Ocean Modelling, 127, 46-54. https://doi.org/10.1016/j.ocemod.2018.03.002
- Poulsen, M. B., Jochum, M., Nuterman, R. (2018). Parameterized and resolved Southern Ocean eddy compensation. Ocean Modelling, 124, 1-15. https://doi.org/10.1016/j.ocemod.2018.01.008
How to Acknowledge Granted DC3 Resources
The author(s) is(are) grateful for computing resources and technical assistance provided by the Danish Center for Climate Computing, a facility built with support of the Danish e-Infrastructure Corporation, Danish Hydrocarbon Research and Technology Centre, VILLUM Foundation, and the Niels Bohr Institute.