Radio astronomy, like many other computational fields, produces huge multi-dimensional data cubes that require complex computational workflows on High Performance Computing (HPC) clusters. These workflows are typically I/O bound, so any further speed-up requires optimizing I/O. Even assuming all the data could be loaded into cluster memory, depending on the access patterns an enormous amount of intra- and inter-node communication would still be required to move the data from one process to the next. HPC clusters are inherently inhomogeneous in terms of I/O performance: within a single node due to NUMA architectures, more so across nodes due to the network topology of the cluster, and worse still when accessing data on the shared, multi-level file system. Access patterns in typical radio astronomy algorithms, on the other hand, depend strongly on the algorithms applied to the data and thus ultimately on the science goals. They can range from one extreme of being perfectly aligned with the native order of the data cube to the other extreme of being orthogonal to it. In some cases a workflow requires access in one direction first and then in another, making a layout that is optimal for both essentially impossible. Such a conflict already has a severe impact on overall performance on a single machine, but on a cluster it can lead to a situation where every worker of an N-way distributed algorithm accesses pieces of data residing on many, if not all, nodes of the cluster, causing extreme data movement across the whole cluster.

In this project we will investigate the effect of such conflicting access patterns and try to address the issue by distributing the multi-dimensional data using so-called space-filling curves such as Hilbert curves. In practice the distribution also depends strongly on the actual layout and properties of the target HPC cluster, so it is necessary to identify at least the minimal set of parameters required to adjust the data distribution to different target HPC systems.
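To make the idea of a space-filling-curve distribution concrete, the following Python sketch shows how chunks of a two-dimensional grid could be assigned to workers by ordering them along a Hilbert curve, so that contiguous runs of the curve keep spatially neighbouring chunks on the same worker. The grid size, chunk layout and number of workers are purely illustrative, and the Hilbert index is computed with the standard iterative coordinate-to-distance algorithm rather than any particular library.

```python
def hilbert_index(n, x, y):
    """Distance of grid cell (x, y) along the Hilbert curve over an n x n grid.

    n must be a power of two. This is the standard iterative xy-to-distance
    algorithm; cells that are close in the returned index tend to be close
    in (x, y) as well.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is traversed in the
        # correct orientation at the next level of refinement.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def assign_chunks(n, num_workers):
    """Partition the n x n chunk grid into num_workers contiguous runs of
    the Hilbert order, so every worker holds a spatially compact set of chunks.
    """
    chunks = sorted(
        ((x, y) for x in range(n) for y in range(n)),
        key=lambda c: hilbert_index(n, *c),
    )
    # Split the Hilbert-ordered chunk list into nearly equal contiguous runs.
    bounds = [round(i * len(chunks) / num_workers) for i in range(num_workers + 1)]
    return {w: chunks[bounds[w]:bounds[w + 1]] for w in range(num_workers)}


if __name__ == "__main__":
    # Illustrative only: an 8 x 8 grid of chunks distributed over 4 workers.
    for worker, owned in assign_chunks(8, 4).items():
        print(f"worker {worker}: {owned}")
```

Real radio astronomy cubes are of course three- or four-dimensional, and a production scheme would also have to fold in the cluster parameters mentioned above (NUMA layout, network topology, file system hierarchy); the sketch only illustrates the locality property that makes Hilbert ordering attractive when access patterns conflict with the native data order.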
We are interested in hearing from potential candidates with a computer science or software engineering background and a firm interest in applying this expertise to scientific exploration and knowledge extraction. Candidates with a background in other sciences, but with solid knowledge of software development practices and tools, would be equally suitable. The successful candidate would join an active multi-disciplinary research and development group with many scientific and commercial cross-fertilisation possibilities as well as excellent international collaborations. The work will be carried out in collaboration with Oak Ridge National Laboratory in the United States, one of the leading institutions in high-performance computing.