All ETDs from UAB

Advisory Committee Chair

Da Yan

Advisory Committee Members

Purushotham Bangalore

Sidharth Kumar

Carmeliza Navasca

Chengcui Zhang

Yang Zhou

Document Type

Dissertation

Date of Award

2022

Degree Name by School

Doctor of Philosophy (PhD), College of Arts and Sciences

Abstract

Finding the subgraphs of a big graph that satisfy certain conditions is useful in many applications, such as community detection and subgraph matching. These problems have a high time complexity, but existing systems for scaling them are all IO-bound in execution. We propose the first truly CPU-bound distributed framework, called G-thinker, which adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms. To utilize all CPU cores of a cluster, G-thinker features (1) a highly concurrent vertex cache for parallel task access and (2) a lightweight task scheduling approach that ensures high task throughput. These designs effectively overlap communication with computation to minimize CPU idle time, and they help G-thinker achieve orders-of-magnitude speedup compared with the existing subgraph-centric system. However, the old G-thinker design does not sufficiently balance the workloads of different subgraph-mining tasks, leading to the straggler problem when mining expensive pseudo-clique structures such as quasi-cliques and k-plexes. Recently, we proposed a system-algorithm codesign solution that addresses this challenge by redesigning G-thinker's execution engine to prioritize long-running tasks for mining, and by using a novel time-delayed divide-and-conquer strategy to effectively decompose the workloads of long-running tasks and improve load balancing. Moreover, since cliques are defined over undirected graphs, existing pseudo-clique definitions also work only on undirected graphs, limiting their application in the many real networks that are directed. We generalized the concept of quasi-cliques to directed graphs and proposed an efficient recursive algorithm that integrates many effective pruning rules, which are validated by ablation studies. We also study finding the top-k large quasi-cliques directly by bootstrapping the search from more compact quasi-cliques, to scale the mining to larger networks.
Inspired by this parallel paradigm, I also propose a novel programming framework, called T-thinkerQ, for answering online subgraph queries in parallel following the TLAT paradigm. T-thinkerQ uses a novel active task-queue list to ensure fairness, so that queries are answered in the order in which they are received. To track query progress so that users are notified promptly when a query completes, T-thinkerQ also adopts a novel lineage-based design that keeps track of how subtasks are generated by straggler tasks for divide-and-conquer processing. We use four kinds of subgraph queries to demonstrate the programming friendliness of T-thinkerQ as well as its excellent CPU scalability.
