The compute servers are all multi-processor machines running Linux. See the current resource usage page for the current status. All machines run the same operating system (64-bit CentOS 7 Linux) with the same configuration and software.
CPUs and memory
|Machine||Features||CPUs||Cores||Memory|
|100plus||ht, avx, avx2||64||32||768 GB|
|hopper||ht, avx, avx2||64||32||512 GB|
|maxwell||ht, avx, avx2||56||28||256 GB|
|parzen||ht, avx, avx2||56||28||256 GB|
|viterbi||ht, avx, avx2||56||28||256 GB|
|watson||ht, avx||32||16||256 GB|
|gauss||ht, avx||32||16||192 GB|
|markov||ht, avx||32||16||192 GB|
|neumann||ht, avx||32||16||192 GB|
The nodes in the INSY cluster are heterogeneous, i.e. they have different types of hardware (processors, memory, GPUs), different functionality (some more advanced than others) and different performance characteristics. If a program requires specific features, you need to specifically request those for that job.
- ht: Hyper-threading processors (two CPUs per core, allocated in pairs)
- avx: Advanced Vector Extensions (AVX) support
- avx2: Advanced Vector Extensions 2 (AVX2) support
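A quick way to see which of these features a machine reports is to parse the "flags" line of /proc/cpuinfo (Linux-specific). This is a rough sketch, not an official cluster tool:

```python
import os

def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the first processor."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if os.path.exists("/proc/cpuinfo"):
    flags = cpu_flags()
    for feature in ("ht", "avx", "avx2"):
        print(feature, "yes" if feature in flags else "no")
```

Note that hyper-threading shows up as the `ht` flag, while AVX and AVX2 appear as `avx` and `avx2` in the flags line.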
- All machines have multiple central processing units (CPUs) that perform the computations. Each CPU can process one thread (i.e. an independent sequence of instructions) at a time. A computer program consists of one or more threads, and thus needs one or more CPUs simultaneously to do its computations.
Most programs use a fixed number of threads. Giving a program access to more CPUs than it has threads will not make it any faster, because it simply cannot use the extra CPUs. When a program has fewer CPUs available than threads, the threads have to time-share the available CPUs (i.e. each thread only gets part-time use of a CPU), and the program runs slower; the overhead of switching between threads slows it down even further. It is therefore always necessary to match the number of CPUs to the number of threads, or the other way around.
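One way to match threads to CPUs is to size the thread pool from the CPUs actually available to the process, rather than hard-coding a thread count. A minimal sketch, assuming Python's standard library:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# os.sched_getaffinity reports the CPUs this process may run on (which can
# be fewer than the machine total, e.g. when a scheduler restricts the job).
try:
    n_cpus = len(os.sched_getaffinity(0))  # Linux
except AttributeError:
    n_cpus = os.cpu_count() or 1           # fallback on other platforms

def work(x):
    return x * x

# One worker per available CPU: no idle CPUs, no time-sharing overhead.
with ThreadPoolExecutor(max_workers=n_cpus) as pool:
    results = list(pool.map(work, range(8)))
print(n_cpus, results)
```

The same idea applies to any threaded program: take the thread count from the environment instead of a constant.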
The number of threads running simultaneously determines the load of a server. If the number of running threads equals the number of available CPUs, the server is loaded 100% (or 1.00). When the number of threads that want to run exceeds the number of available CPUs, the load rises above 100%.
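The load figure is simply runnable threads divided by CPUs. A small illustration (the thread counts below are made-up examples; `os.getloadavg()` is the standard Unix run-queue average):

```python
import os

def load_fraction(runnable_threads, n_cpus):
    """Server load as a fraction: 1.0 means every CPU has one runnable thread."""
    return runnable_threads / n_cpus

print(load_fraction(64, 64))   # fully loaded: 1.0 (100%)
print(load_fraction(96, 64))   # oversubscribed: 1.5 (150%)

# On Linux, os.getloadavg() gives the 1-, 5- and 15-minute averages of the
# run queue; dividing by the CPU count normalizes to the same scale.
n_cpus = os.cpu_count() or 1
one_min, _, _ = os.getloadavg()
print(f"current load: {one_min / n_cpus:.0%} of {n_cpus} CPUs")
```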
- The CPU functionality is provided by the cores in the machines' processor chips. Traditionally, one physical core contained one logical CPU, so the CPUs operated completely independently. Most current chips feature hyper-threading: one core contains two logical CPUs. These CPUs share parts of the core and therefore have some interdependencies, which is why the job scheduler always allocates them in pairs.
- All machines have large main memories for performing computations on big data sets. All programs (and users) share this memory, and together they must make sure not to use more than the available amount.
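Before starting a job it helps to estimate its memory footprint. A back-of-the-envelope sketch (the matrix size and node capacity below are illustrative, not measured):

```python
def array_bytes(n_elements, bytes_per_element=8):
    """Memory needed for a dense array (8 bytes per double-precision value)."""
    return n_elements * bytes_per_element

# A 100,000 x 100,000 double-precision matrix:
need = array_bytes(100_000 * 100_000)
total = 192 * 10**9  # e.g. one of the 192 GB nodes
print(f"matrix needs {need / 10**9:.0f} GB of {total / 10**9:.0f} GB")
print("fits" if need <= total else "does not fit")
```

Remember that this budget is shared: two such jobs on the same 192 GB node would already use 160 GB between them.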
Note that 32-bit programs can only address (use) up to 3 GB (gigabytes) of memory.
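The limit follows from pointer width: a 32-bit address can name at most 2^32 distinct bytes (4 GiB), and on Linux part of that address space is reserved for the kernel, leaving roughly 3 GB for the program itself:

```python
# A 32-bit pointer spans 2**32 bytes in total.
address_space = 2**32
print(address_space, "bytes =", address_space / 2**30, "GiB")
```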
|GPU||Architecture||Compute Capability||Cores||Memory|
|GTX 680||Kepler||3.0||1536||2 GB|
|Quadro K2200||Maxwell||5.0||640||4 GB|
Some nodes also have additional Graphics Processing Units (GPUs) which support the CUDA platform for General-Purpose computing on GPUs (GPGPU). Two different types of GPUs are available, so use the one that best matches the requirements of your program. The NVIDIA GeForce GTX 680 offers the best raw performance, while the NVIDIA Quadro K2200 has more advanced functionality.
- The architecture defines the hardware functionality and performance characteristics of the GPU.
- There are several versions of CUDA, each higher version supporting more advanced functionality. The CUDA Compute Capability specifies the highest CUDA version supported by the GPU.
- The cores perform the computations. The more cores, the higher the potential parallelization of the algorithm.
- The GPUs provide their own internal (fixed-size) memory for storing data for GPU computations. All required data needs to fit in the internal memory or your computations will suffer a big performance penalty.
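A simple pre-flight check avoids that penalty: compare the dataset size against the card's memory before launching a kernel. A sketch, assuming a 2 GB GTX 680 and a 4 GB Quadro K2200 (verify the sizes on the actual nodes):

```python
# Assumed per-card memory; confirm against the cluster's own hardware pages.
GPU_MEMORY_BYTES = {
    "GTX 680": 2 * 2**30,      # 2 GB
    "Quadro K2200": 4 * 2**30, # 4 GB
}

def fits_on(gpu, n_floats, bytes_per_float=4):
    """True if n single-precision values fit in the GPU's internal memory."""
    return n_floats * bytes_per_float <= GPU_MEMORY_BYTES[gpu]

# 400 million single-precision values need ~1.6 GB; 600 million need ~2.4 GB.
print(fits_on("GTX 680", 400_000_000))  # fits in 2 GB
print(fits_on("GTX 680", 600_000_000))  # does not fit; would need the K2200
```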