Advisory Committee Chair
Alan Sprague
Advisory Committee Members
Michael Bailey
Tyler Moore
Nitesh Saxena
Chengcui Zhang
Document Type
Dissertation
Date of Award
2016
Degree Name by School
Doctor of Philosophy (PhD) College of Arts and Sciences
Abstract
Phishing has been a problem since before the early 2000s and has only become more prevalent and diverse since then. Phishing countermeasures have been developed and used to prevent or mitigate phishing attacks. However, each countermeasure has pros and cons, and no single countermeasure is effective in every situation. Choosing the best-suited countermeasure, or combination of countermeasures, and tracking their effectiveness requires phish grouping: grouping phish based on the common characteristics and tactics they use. To be effective, phish grouping needs to produce dependable groupings, produce them quickly, and analyze large volumes of phish.

This dissertation develops the Simple Set Comparison (SSC) tool. The SSC tool enables existing phish grouping processes to run faster. It also decreases the maximum amount of memory required, allowing a larger number of phish to be grouped. The SSC tool uses a multi-step approach that makes use of parallel processing to improve runtime and reduce the maximum amount of memory required.

This dissertation evaluates the efficiency and quality of using the SSC tool with the SLINK-style phish grouping algorithm used by Malcovery Security. The SLINK-style algorithm with the SSC tool is compared to the SLINK-style algorithm without the SSC tool on three criteria: the ability to produce a clustering, the quality of the clustering produced, and the runtime to produce a clustering. Four experiments are run using three different implementations of the SLINK-style clustering algorithm over large phishing data sets. The SSC tool improved the runtime of the SLINK-style algorithm in each experiment: with the tool, the algorithm produced results 37 times faster than without it in the first experiment, 404 times faster in the second, 6 times faster in the third, and 10.8 times faster in the fourth. The tool produces results faster while maintaining equivalent quality.
The SSC tool improves the SLINK-style algorithm's runtime and reduces the maximum amount of memory required to produce a clustering, allowing larger volumes of phish to be grouped, while producing clusterings similar to those of the SLINK-style algorithm without the tool.
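The abstract does not describe the SSC tool's internals, so the following is only a minimal sketch of the general idea, not the dissertation's implementation. It assumes each phish is modeled as a set of features (hypothetically, file names collected from a phishing site), uses Jaccard similarity with a 0.5 threshold (both assumptions), and treats "SLINK-style" clustering as a single-linkage cut at that threshold, i.e. the connected components of the "similar enough" graph. `slink_style_baseline` compares all pairs. The hypothetical `slink_style_chunked` illustrates a multi-step, parallel alternative: blocks of pairwise comparisons run in a worker pool, and only the linked pairs are folded into a union-find structure rather than being stored as a full similarity matrix.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set similarity in [0, 1]. Assumption: the SSC tool's real metric may differ."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class UnionFind:
    """Disjoint-set forest: linking two phish merges their whole clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def groups(self) -> list[list[int]]:
        out: dict[int, list[int]] = {}
        for i in range(len(self.parent)):
            out.setdefault(self.find(i), []).append(i)
        return sorted(out.values())


def slink_style_baseline(phish: list[frozenset], threshold: float) -> list[list[int]]:
    """Single-linkage cut at `threshold`: link every pair at or above it."""
    uf = UnionFind(len(phish))
    for i, j in combinations(range(len(phish)), 2):
        if jaccard(phish[i], phish[j]) >= threshold:
            uf.union(i, j)
    return uf.groups()


def _compare_block(args: tuple) -> list[tuple[int, int]]:
    """Step 1 (hypothetical): compare rows [start, stop) against all later phish."""
    phish, start, stop, threshold = args
    return [(i, j)
            for i in range(start, stop)
            for j in range(i + 1, len(phish))
            if jaccard(phish[i], phish[j]) >= threshold]


def slink_style_chunked(phish: list[frozenset], threshold: float, chunk_size: int = 2,
                        workers: int = 4, executor_cls=ThreadPoolExecutor) -> list[list[int]]:
    """Hypothetical multi-step variant: parallel block comparisons, then merge."""
    n = len(phish)
    tasks = [(phish, s, min(s + chunk_size, n), threshold)
             for s in range(0, n, chunk_size)]
    uf = UnionFind(n)
    with executor_cls(max_workers=workers) as pool:
        # Step 2: fold each block's linked pairs into the union-find; only
        # pairs at or above the threshold are kept, never an n x n matrix.
        for edges in pool.map(_compare_block, tasks):
            for i, j in edges:
                uf.union(i, j)
    return uf.groups()


if __name__ == "__main__":
    phish = [
        frozenset({"index.html", "login.php", "logo.png"}),
        frozenset({"index.html", "login.php", "logo.png", "style.css"}),
        frozenset({"verify.php", "bank.css"}),
        frozenset({"verify.php", "bank.css", "bg.jpg"}),
        frozenset({"mail.php"}),
    ]
    print(slink_style_baseline(phish, 0.5))  # [[0, 1], [2, 3], [4]]
    print(slink_style_chunked(phish, 0.5) == slink_style_baseline(phish, 0.5))  # True
```

In this sketch the chunked variant returns exactly the same clusters as the baseline, because connected components do not depend on the order in which linked pairs are merged; the dissertation's own quality comparison for the real tool is not reproduced here. A thread pool keeps the example portable, but CPU-bound comparisons in CPython would need `ProcessPoolExecutor` (passed as `executor_cls`) for a real parallel speedup, which also requires the `__main__` guard on platforms that spawn worker processes.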
Recommended Citation
Britt, Jason Robert, "Clustering Phish Using The Simple Set Comparison Tool" (2016). All ETDs from UAB. 1262.
https://digitalcommons.library.uab.edu/etd-collection/1262