Durham hosts a range of compute platforms that are free to use for Durham researchers and consequently underpin much of the research delivered by IDAS members. If you use these machines, please consult their individual acknowledgment requirements.
This machine is hosted in Durham on behalf of the N8 Centre of Excellence in Computationally Intensive Research (N8CIR). It is a multi-GPU cluster managed by ARC’s platforms team and provides multiple nodes connected through a high-speed network, where each node hosts four GPUs with shared memory, i.e. 4x40 GB per node. The machine can be used for all types of research work, yet it is particularly well-suited for machine learning training that requires up to 160 GB of GPU memory. Access for Durham researchers is lightweight through a web form in which you specify your GPU requirements. Non-N8 members can apply through UKRI calls.
The machine now also hosts a few Grace Hopper supernodes.
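The software environment on Bede is not described here, but as a rough illustration of how a training job might use all four GPUs of a node, below is a minimal PyTorch DistributedDataParallel sketch. The model, data, and the `torchrun` launch are illustrative assumptions rather than Bede-specific instructions.

```python
# Minimal sketch: data-parallel training across the four GPUs of one node.
# Model, data, and hyper-parameters are placeholders; PyTorch availability is assumed.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process (one per GPU).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimiser = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(100):                                 # placeholder training loop
        x = torch.randn(64, 1024, device="cuda")
        loss = model(x).square().mean()
        optimiser.zero_grad()
        loss.backward()          # gradients are averaged across the four GPUs
        optimiser.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Such a script would be launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 train.py`.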
This machine is Durham’s internal CPU workhorse with 100+ nodes, each equipped with a 128-core AMD EPYC CPU and connected through a high-speed network. The machine is predominantly used for research that needs many CPU nodes for a single run (e.g. communicating through MPI) or for studies that need many independent single-node CPU jobs. Access for student research projects can be granted, too. Finally, this machine is meant to facilitate the development and profiling of larger CPU-based applications. Access is free for Durham users.
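How jobs are scheduled on Hamilton is not covered here; purely to illustrate the MPI-style parallelism mentioned above, the following is a minimal mpi4py sketch in which each rank computes a partial result that is combined by a reduction. The use of mpi4py and the `mpirun` launch line are assumptions about the environment, not Hamilton-specific instructions.

```python
# Minimal sketch of MPI-style parallelism: each rank computes a partial sum,
# which is then combined with a reduction. Launch with e.g. `mpirun -n 128 python sum.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id
size = comm.Get_size()          # total number of MPI processes

# Split a simple global workload (summing 0..9_999_999) across the ranks.
n = 10_000_000
local = sum(range(rank, n, size))

total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"total = {total} computed by {size} ranks")
```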
This GPU cluster is hosted at the Department of Computer Science and features a multitude of NVIDIA GPUs. The system is used for teaching and research, predominantly ML development and single-node training. Access can be granted to non-Computer Science staff and research students upon request.
Durham hosts its own HPC testbeds. The cluster is embedded into Cosma (but independent of it) and available to all Durham researchers who want to experiment with the latest or exotic hardware. Example nodes include the latest Intel, NVIDIA and AMD GPUs, as well as interconnection networks which are not found in other production-type systems in the UK. Access is free through a lightweight sign-up mechanism. The system is predominantly designed for development and performance assessment studies, even though shorter production runs, i.e. compute and training runs, are possible.
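As an example of the kind of performance assessment the testbeds target, the sketch below times a dense matrix multiplication on whichever GPU is visible. The matrix size, iteration count, and use of PyTorch are illustrative assumptions, not a testbed-specific recipe.

```python
# Minimal sketch: time a dense matrix multiplication on the local GPU.
# Sizes and iteration counts are arbitrary; PyTorch availability is assumed.
import torch

assert torch.cuda.is_available(), "no GPU visible to this process"
device = torch.device("cuda")

n = 8192
a = torch.randn(n, n, device=device)
b = torch.randn(n, n, device=device)

# Warm-up so that one-off initialisation cost is not measured.
for _ in range(3):
    torch.matmul(a, b)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 10
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / iters
tflops = 2 * n**3 / (ms / 1e3) / 1e12   # ~2*n^3 FLOPs per matmul
print(f"{ms:.2f} ms per matmul, ~{tflops:.1f} TFLOP/s")
```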
COSMA is the Distributed Research utilising Advanced Computing (DiRAC) Memory Intensive system, i.e. part of the DiRAC national facility that supports research in cosmology, astrophysics, particle physics and nuclear physics. It is hosted in Durham, but access is usually granted through STFC. It hosts a large number of CPUs with large main memory, connected through high-speed interconnects, and is predominantly designed for long-running, large-scale production runs.
Below are some recommendations for particular use cases:
… run a large-scale machine learning model over a long time and/or with massive GPU memory requirements: use Bede
… develop a massively parallel CPU-based application: use Hamilton
… run a large number of single node CPU jobs: use Hamilton
… develop and tune some codes for the latest GPU architectures (both for AI and simulation): use a node from the testbeds
… train a machine learning model, but a single GPU is sufficient as I don’t need more than 80 GB main memory: preferably use the Grace Hopper nodes in either Bede or the testbeds
… train a machine learning model as a student (non-PhD): ask if you can get access to NCC
… develop a machine learning model as a student (non-PhD): ask if you can get access to NCC
… develop a machine learning model as a CS member of staff: you might want to use NCC if the testbeds are not available