Advisory Committee Chair
Purushotham V Bangalore
Advisory Committee Members
Matthew L Curry
Robert Hyatt
Sidharth Kumar
Anthony Skjellum
Document Type
Dissertation
Date of Award
2019
Degree Name by School
Doctor of Philosophy (PhD) College of Arts and Sciences
Abstract
Parallel file systems that exploit Redundant Arrays of Inexpensive Devices (RAID) as the mechanism for greater resilience are primarily intended to provide high bandwidth and low latency. Quantifying and studying the trade-offs among reduced run time (bandwidth and latency), resilience (availability and integrity), and cost (energy and capital) is important. For instance, distributing the checksums of RAID systems appears to conflict with the canonical parallel access patterns in high performance computing, such as long sequential reads, random access, and checkpoint operations. Choices consequently have to be made among performance, concurrency, latency, energy, capital, integrity, and availability of the data, both during normal operation and during recovery of a failed device. New strategies are emerging for exascale storage that create additional layers in the storage hierarchy. These strategies are primarily designed to take advantage of the economics of cloud storage technologies and especially the benefits of erasure coding. The Los Alamos National Laboratory has implemented a “Campaign” layer placed below the traditional Parallel File System, where longer latencies and lower bandwidth can be traded for lower cost and higher capacities. In addition, Burst Buffers are now being used on top of the traditional Parallel File System to provide higher bandwidth and lower latency for petascale and beyond. These new layers are specializations of the Parallel File System based on trade-offs between cost and performance. In this dissertation, we analyze the requirements of the HPC storage space and identify special problems in the archive layers. We leverage the GPGPU to provide erasure coding on large stripe sizes to increase performance and availability. We also show that data confidentiality can be provided along with erasure coding on the GPGPU, reducing the overall cost of data protection for nearline disk archive storage.
Recommended Citation
Haddock, Hampton Walker, "A Scalable Nearline Disk Archive Storage Architecture for Extreme Scale High Performance Computing" (2019). All ETDs from UAB. 1834.
https://digitalcommons.library.uab.edu/etd-collection/1834