Da Yan

Doctor of Philosophy (PhD) College of Arts and Sciences


Quasi-cliques and k-plexes are dense structures with established significance in graph mining, offering flexibility and resilience to data anomalies. However, mining these structures is computationally challenging because the underlying problems are NP-hard. In this study, we address these challenges and propose solutions for efficient mining, along with visualization tools for tuning and analysis. For quasi-clique mining, we adapted the distributed solution G-thinker [1] to a shared-memory multi-core environment, making it accessible to average users. Our solution is two orders of magnitude faster than the recent solution of [2] and scales almost ideally with the number of CPU cores. To facilitate user interaction and parameter tuning, we created QCQ-Viewer, which enables fine-tuning of mining parameters and intuitive examination of the resulting quasi-cliques. In the context of frequent subgraph pattern mining, we introduced T-FSM [3], a system for efficiently mining frequent patterns in big graphs. T-FSM ensures high concurrency, limited memory usage, and effective load balancing, and it incorporates the more accurate Fraction-Score frequentness measure. To enhance usability, we developed FSM-Viewer, a graphical user interface that lets users mine frequent subgraph patterns and explore matched instances batch by batch. This thesis also addresses the k-plex mining problem from two angles: the maximum k-plex problem and the maximal k-plex problem. For the maximum k-plex problem, we summarize recent works and introduce a visualization tool for network analysis. For the maximal k-plex problem, we design new algorithms together with new pruning and branching rules to find maximal k-plexes efficiently. This study contributes efficient mining techniques for quasi-cliques and k-plexes in large graphs, and it provides visualization tools that facilitate the exploration of significant structures and insights in various applications.
As a result, this research enables the discovery of meaningful patterns and enhances our understanding of complex networks.



