Doctor of Philosophy (PhD), College of Arts and Sciences
Scientific applications running on extreme-scale high performance computing (HPC) machines require high-performance I/O and storage systems to carry out operations including checkpointing and the analysis and visualization of simulation outputs. However, there is a significant imbalance between the I/O and compute capabilities of HPC systems, and as a result applications often spend a large amount of time completing I/O. This imbalance is projected to grow in future exascale systems. The I/O problem is exacerbated by I/O interference, which occurs when the processes of one application, or of multiple applications, simultaneously access a shared storage system such as a parallel file system. I/O interference degrades, and adds variability to, the performance achieved by applications; similar interference also arises during network access for data transfer. Researchers have identified that adding a faster storage tier, such as solid state drives, between the compute cluster and the parallel file system can reduce the impact of the gap between compute and I/O capabilities. However, extending the storage stack in this way adds to the I/O challenges. The faster storage tiers, memory and the added intermediate tier, have limited capacity and bandwidth, and can therefore become congested. In addition, when shared across applications, these tiers are prone to I/O interference, just like the parallel file system. Data should therefore be moved across the storage stack with the possible I/O interference throughout the hierarchy in mind. This dissertation investigates the problem of managing data flow through the HPC storage stack, leading toward an understanding of the challenges and toward the design of scheduling and coordination techniques for managing that flow.
It presents a hierarchical coordination framework that manages access to the individual storage tiers, such as a parallel file system and a burst buffer, and manages data flow across multiple tiers. The framework combines a global I/O coordination strategy, which manages system-wide accesses to the shared storage tiers, with decentralized data traffic control techniques that scalably control data traffic locally on the storage servers. It also provides quality of service (QoS) mechanisms that capture and address the heterogeneous performance requirements of the data traffic, which result from diverse application characteristics and the expanding use cases of the evolving HPC storage stack. Empirical experiments on existing supercomputers, together with simulations configured with the parameters of future machines, demonstrate the effectiveness and performance benefits of the proposed management framework. This dissertation contributes toward a more efficient HPC storage stack that can effectively support large-scale, I/O-intensive scientific applications on present and future extreme-scale HPC systems.
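To make the idea of global I/O coordination concrete, the sketch below shows one minimal way such a coordinator could work: applications request exclusive access to a shared storage tier (e.g., a burst buffer) and are granted it one at a time, serializing bursts so that concurrent accesses do not interfere. This is an illustrative simplification, not the dissertation's actual design; the class and method names (`GlobalIOCoordinator`, `request`, `release`) are hypothetical.

```python
from collections import deque


class GlobalIOCoordinator:
    """Illustrative sketch: grant applications exclusive access to a
    shared storage tier, queueing waiters in FIFO order so that
    concurrent bursts are serialized rather than interfering.
    (Hypothetical names; not the dissertation's actual interface.)"""

    def __init__(self):
        self._queue = deque()   # application ids waiting for the tier
        self._holder = None     # application currently granted access

    def request(self, app_id):
        """Ask for the tier; return True if access is granted now,
        False if the application was queued behind the current holder."""
        if self._holder is None:
            self._holder = app_id
            return True
        self._queue.append(app_id)
        return False

    def release(self, app_id):
        """Release the tier and hand it to the next waiter, if any.
        Returns the new holder (or None if the tier is now idle)."""
        assert self._holder == app_id, "only the holder may release"
        self._holder = self._queue.popleft() if self._queue else None
        return self._holder

    def holder(self):
        """Return the application currently holding the tier."""
        return self._holder
```

For example, if application A requests and receives the tier, a request from B is queued rather than allowed to contend; when A releases, B is granted access. A real system-wide coordinator would additionally weigh QoS requirements and tier capacity rather than using plain FIFO order.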
Thapaliya, Sagar, "Managing Data Flow Through the Storage Hierarchy of Extreme Scale HPC Systems" (2016). All ETDs from UAB. 3123.