All ETDs from UAB

Advisory Committee Chair

Leon Jololian

Advisory Committee Members

Karthikeyan Lingasubramanian

Tanik M Murat

Document Type

Thesis

Date of Award

2017

Degree Name by School

Master of Science in Electrical Engineering (MSEE) School of Engineering

Abstract

This thesis develops a Big Data framework for performing sentiment analysis on social networking sites. Over the last decade, social media, with a user base of over two billion, has become a popular means of sharing thoughts and feelings. Social networking sites such as Twitter, Facebook, and Instagram are becoming huge repositories of thoughts and opinions on a wide variety of topics. Public and private organizations, such as governments and companies, are attempting to exploit the preferences, opinions, and attitudes expressed there regarding politics, commercial products, and other matters of personal importance to gain a competitive edge. One efficient way to obtain this information is to perform sentiment analysis on these electronic repositories. Because the data are so abundant, the bottlenecks are the processing speed, storage capacity, and processing time of traditional storage systems. Big Data frameworks offer specialized tools and techniques for processing such massive amounts of data. Much of the current work on sentiment analysis is performed on online customer reviews, blogs, forums, etc., applying lexical and machine learning methods in batch mode. In this thesis, we offer an innovative approach for developing a generalized Big Data framework for social network sentiment analysis. The framework collects streamed data in real time and then applies machine learning methods to perform sentiment analysis. The data source is the live microblogging website Twitter, which generates tweets at a rate of 6000 per second, each restricted to no more than 140 characters.
To meet the processing speed and storage requirements, the large-scale cluster-computing framework Apache Spark was used because of its capacity to handle large amounts of data at high speed. The Naive Bayes algorithm was used for sentiment analysis of the collected tweets. The final generalized framework was tested on two case studies with different problem scenarios, with considerable success.
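
As a rough illustration of the classification step named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing. It is not the thesis's implementation: it uses plain Python rather than Apache Spark, and the function names (`tokenize`, `train`, `predict`), the whitespace tokenizer, and the toy labeled tweets are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Real tweets would need more cleanup (URLs, mentions, hashtags).
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label.

    examples: iterable of (text, label) pairs.
    Returns (label_counts, word_counts, vocab).
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy labeled tweets, invented for this sketch only.
TOY_TWEETS = [
    ("i love this phone", "positive"),
    ("great service and great food", "positive"),
    ("i hate this phone", "negative"),
    ("terrible service and awful food", "negative"),
]

model = train(TOY_TWEETS)
print(predict(model, "love the great food"))
print(predict(model, "awful terrible phone"))
```

In a streaming setting such as the one the abstract describes, the same per-label counts could in principle be maintained incrementally as tweets arrive, but how the thesis distributes this work across a Spark cluster is not stated in the abstract.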

Included in

Engineering Commons
