Advisory Committee Chair
Purushotham Bangalore
Advisory Committee Members
Anthony Skjellum
Chengcui Zhang
Jay Lofstead
Kathryn Mohror
Peter Pirkelbauer
Document Type
Dissertation
Date of Award
2016
Degree Name by School
Doctor of Philosophy (PhD) College of Arts and Sciences
Abstract
Scientific applications running on extreme scale high performance computing (HPC) machines require high performance I/O and storage systems to carry out operations including checkpointing, analysis, and visualization of simulation outputs. However, there is a significant imbalance between the I/O and compute capabilities of HPC systems, and as a result, applications often spend a large amount of time completing I/O. This imbalance is projected to grow in future exascale systems. The I/O problem is exacerbated by I/O interference, which occurs when processes of one application or of multiple applications simultaneously access a shared storage system, such as a parallel file system. I/O interference degrades the performance achieved by applications and makes it more variable. Interference can also be seen during network access for data transfer. Researchers have identified that adding a faster storage tier, such as solid state drives, between the compute cluster and the parallel file system of HPC systems can potentially reduce the impact of the gap between compute and I/O capabilities. However, such an extension of the storage stack adds to the I/O challenges. The faster storage tiers, namely memory and the added intermediate tier, will have limited space and bandwidth and can therefore create congestion. In addition, when used as a shared resource across applications, these tiers are also prone to I/O interference, similar to the parallel file system. Data should therefore be moved across the storage stack with the possible I/O interference across the hierarchy taken into account. This dissertation investigates the problem of managing data flow through the HPC storage stack, working towards an understanding of the challenges and towards the design of scheduling and coordination techniques to manage that data flow.
It presents a hierarchical coordination framework to manage access to the different storage tiers, such as a parallel file system and a burst buffer, and to manage data flow across multiple tiers. The framework combines a global I/O coordination strategy, which manages system-wide accesses to the shared storage tiers, with decentralized data traffic control techniques, which control data traffic locally on the storage servers in a scalable manner. It also provides quality of service (QoS) mechanisms that capture and address the heterogeneous performance requirements of the data traffic, which result from diverse application characteristics and expanding use cases for the evolving HPC storage stack. Empirical experiments on existing supercomputers, and simulations configured with the parameters of future machines, demonstrate the effectiveness and performance benefits of the proposed management framework. This dissertation contributes towards a more efficient HPC storage stack that can effectively support large scale, I/O intensive scientific applications on present and future extreme scale HPC systems.
Recommended Citation
Thapaliya, Sagar, "Managing Data Flow Through The Storage Hierarchy Of Extreme Scale HPC Systems" (2016). All ETDs from UAB. 3123.
https://digitalcommons.library.uab.edu/etd-collection/3123