An Architecture for Big Data Analysis in the Context of Social Media: The Case of Twitter
Keywords:
Big Data analytics, Big Data analytics frameworks, unstructured data analysis, Big Data architecture.
Abstract
Big Data analytics puts considerable pressure on ICT providers to develop new tools and technologies for managing complex data. The challenges include storing and processing huge volumes of unstructured data, handling high-velocity data streams, cleansing noise and abnormalities in the data, analyzing the data, and extracting value or meaningful results. Current tools and technologies are unable to store, process, and analyze such large amounts of diverse data. In this research, we propose an architecture that enables effective storage and analysis of unstructured data, and we develop a prototype to test and evaluate it. The solution is built on Hadoop, a MapReduce platform, and Hive, a data warehouse infrastructure on top of Hadoop that enables the analysis of unstructured data. The architecture is validated through a prototype that analyzes unstructured data using Hadoop MapReduce, HDFS (Hadoop Distributed File System), and Hive. We also evaluate whether the architecture achieves its goals and objectives. The evaluation streams Twitter data, stores and processes it, and finally fetches it and performs sentiment analysis. Twitter, one of the largest social media sites, serves as the data source in our experiment. The study thus provides an architecture that helps address problems related to Big Data analysis; its novelty lies in the way it combines MapReduce and Hive for Twitter data analysis. Enterprises can apply the results in sentiment analysis to understand how their customers feel about a particular product or service, to track how those opinions change over time, and to obtain information about the relative performance of their competitors.
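The map, shuffle, and reduce stages that such a pipeline relies on can be illustrated with a minimal, single-process Python sketch. It is not the prototype described here: the lexicon-based scoring, the tiny word lists, the `topic` and `text` record fields, and all function names are assumptions made for illustration, and a real deployment would run the equivalent logic on Hadoop over tweets stored in HDFS and queried through Hive.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (assumption); a real run would load a full opinion word list.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "poor", "hate", "slow"}


def map_tweet(tweet):
    """Map phase: emit one (topic, score) pair for a tweet record."""
    words = re.findall(r"[a-z']+", tweet["text"].lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    yield tweet["topic"], score


def shuffle(pairs):
    """Shuffle phase: group all emitted scores by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_scores(topic, scores):
    """Reduce phase: aggregate the scores of one topic into a summary."""
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return topic, {"tweets": len(scores), "net_score": total, "label": label}


def run_job(tweets):
    """Run map, shuffle, and reduce in sequence over an iterable of tweets."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return dict(reduce_scores(topic, scores) for topic, scores in shuffle(pairs).items())
```

Calling `run_job` on a list of dictionaries with `topic` and `text` keys returns one summary per topic. Keeping the three phases as separate functions mirrors how the work would be split across mapper and reducer tasks on a cluster.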
