BIG DATA ANALYSIS
People have used data analysis techniques for over a century to support their decision-making processes. During the past twenty years, the speed and volume at which data is generated have increased tremendously. In 2013, the total data in the world amounted to 4.4 zettabytes (Sagiroglu, S. & Sinanc, D. 2013, 42-47). With this much data being generated daily, analyzing all of it is extremely difficult even with today’s advanced technology. Over the last decade, Big Data grew out of traditional data analysis because of the growing need to process these large and unstructured data sets. The background of Big Data and analytics can be divided into the following phases:
• Big Data phase one
Big Data, data analytics, and data analysis originate from the data management domain. They rely heavily on the extraction, optimization, and storage techniques used in relational database management systems (RDBMS). Data warehousing and data management are the main components of the first phase of Big Data (Mannoppa, A. 2020). This phase provides the foundation of modern data analysis, with techniques and tools such as standard reporting, database queries, and online analytical processing (OLAP).
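As a minimal sketch of the kind of database query and standard report this phase relies on, the following Python example uses the standard-library sqlite3 module with an in-memory database. The sales table, its columns, and its rows are hypothetical and serve only as an illustration; they do not come from any of the cited sources.

```python
import sqlite3


def sales_report(rows):
    """Return total sales per region, largest first (a simple standard report)."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical schema: one row per sale.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Aggregate query of the kind used in reporting and OLAP-style summaries.
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data used only to exercise the query.
SAMPLE_ROWS = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]

if __name__ == "__main__":
    for region, total in sales_report(SAMPLE_ROWS):
        print(region, total)
```

The query groups structured rows by a known column, which works well for relational data but assumes a fixed schema.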
• Big Data phase two
In the 2000s, the web and the internet began to offer new opportunities for data collection and analysis. The expansion of online stores and the increase in web traffic drove companies such as Amazon, eBay, and Yahoo to study their customers’ behavior by analyzing IP addresses, click rates, and search logs. This opened up a new range of analytical possibilities. From the point of view of Big Data, data analytics, and data analysis, HTTP-based web traffic introduced a large increase in unstructured and semi-structured data. As a result, companies needed new storage solutions and new approaches to handle and analyze this data efficiently. The invention and growth of social media further increased the need for analysis techniques, tools, and technologies able to extract useful information from unstructured data (Erl, T. et al. 2016).
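The analysis of IP addresses and click activity described above can be sketched as follows. This is a minimal Python illustration, assuming the web server writes lines in a simplified Common Log Format. The regular expression, the function name, and the sample log lines are assumptions made for this example, not details taken from the cited sources.

```python
import re
from collections import Counter

# Simplified pattern for one Common Log Format line:
# client IP, two ignored fields, [timestamp], "METHOD path protocol", status code.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def requests_per_ip(lines):
    """Count requests per client IP, skipping lines that do not match the pattern."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Hypothetical log lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /cart HTTP/1.0" 200 512',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /search?q=books HTTP/1.0" 200 1042',
    'malformed line that the parser skips',
]

if __name__ == "__main__":
    for ip, count in requests_per_ip(SAMPLE_LOG).most_common():
        print(ip, count)
```

Unlike a relational table, the log has no enforced schema, so the parser has to tolerate lines that do not fit the expected shape. This is one reason semi-structured data needs different handling.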
• Big Data phase three
Although web-based unstructured content is still the main focus of most organizations working with Big Data, data analysis, and data analytics, mobile devices are creating new possibilities for data retrieval. Mobile phones not only make behavioral data (such as searches and queries) available for analysis, but also make it possible to store and analyze location-based data, that is, GPS data. With this advancement in mobile devices, it has become possible to track movement and to analyze data related to health and physical activity (for instance, how many steps a user takes per day) (Mannoppa, A. 2020). The rise of internet-enabled, sensor-based devices is increasing the rate of data generation further. Internet of Things (IoT) devices such as thermostats and millions of smart TVs now generate vast amounts of data every day. This makes it hard to extract useful information and valuable data from these sources.
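To make the idea of analyzing location-based data concrete, here is a minimal Python sketch that estimates the distance covered by a sequence of GPS fixes using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_length_km(points):
    """Sum the distances between consecutive (lat, lon) fixes of a track."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Hypothetical track: three fixes along the equator, one degree of longitude apart.
SAMPLE_TRACK = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]

if __name__ == "__main__":
    print(round(track_length_km(SAMPLE_TRACK), 2), "km")
```

A real tracking pipeline would also have to filter noisy fixes and handle gaps in the signal, which this sketch leaves out.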
The biggest challenges of Big Data and its analysis arise from the three Vs, that is, volume, velocity, and variety, which are the key dimensions of Big Data. Espinosa et al. (2019) argue that veracity should be added to the Vs of Big Data because of the challenges of cleansing the data. The following challenges are faced when dealing with Big Data, ranked from the most challenging to the least (Espinosa, A. et al. 2019, 11).
Challenge: % count
Analytical challenges: 35.5%
Data transport and storage: 29%
Data management: 26.9%
Analytical tools: 25.8%
Data growth rate: 24.7%
Validation of data: 18.3%
Input and output processes: 16.1%
Security and compliance: 15.1%
Unstructured and structured data: 12.9%
Expansion and growth of data: 10.8%
Data hybrid: 8.6%
Ownership of data: 7.5%
Time to information: 5.4%
Value mining: 2.2%
Information mining: 1.1%
World prediction: 0%
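The ranking above can be checked with a short script. The following Python sketch stores the listed figures, verifies that they are in descending order, and returns the top entries. The variable and function names are chosen for this illustration only.

```python
# Challenge percentages as listed in the table above.
CHALLENGES = [
    ("Analytical challenges", 35.5),
    ("Data transport and storage", 29.0),
    ("Data management", 26.9),
    ("Analytical tools", 25.8),
    ("Data growth rate", 24.7),
    ("Validation of data", 18.3),
    ("Input and output processes", 16.1),
    ("Security and compliance", 15.1),
    ("Unstructured and structured data", 12.9),
    ("Expansion and growth of data", 10.8),
    ("Data hybrid", 8.6),
    ("Ownership of data", 7.5),
    ("Time to information", 5.4),
    ("Value mining", 2.2),
    ("Information mining", 1.1),
    ("World prediction", 0.0),
]


def is_ranked_descending(rows):
    """Return True if the percentages never increase from one row to the next."""
    values = [pct for _, pct in rows]
    return all(a >= b for a, b in zip(values, values[1:]))


def top_challenges(rows, n):
    """Return the n rows with the highest percentages."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    print("Ranked descending:", is_ranked_descending(CHALLENGES))
    for name, pct in top_challenges(CHALLENGES, 3):
        print(f"{name}: {pct}%")
    print("Sum of percentages:", round(sum(pct for _, pct in CHALLENGES), 1))
```

The percentages add up to 239.9, which is more than 100. This suggests, although the table does not state it, that a single response could name more than one challenge.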
The tremendous increase in the amount and complexity of the data being collected remains the biggest challenge of Big Data, data analysis, and data analytics. This growth makes it harder to design appropriate systems for analyzing the data and extracting useful information for decision making. Besides growth, other challenges are the noise, relational nature, and opacity of the data, which make it difficult to analyze, move, and store the data without corrupting or losing any of it. From a technical point of view, the very features that define Big Data are what make it hard to handle (Espinosa, A. et al. 2019). Big Data also poses a major privacy challenge: analyzing, moving, and storing the data all raise privacy issues, which the growing number of data breaches has made more pressing. Management challenges include access schemes, governance, and the distribution of data. The main issues are how the data is distributed geographically, who has access to it, and who owns which parts of it.
In conclusion, Big Data involves large amounts of data characterized by volume, velocity, and variety (the three Vs, or basic dimensions, of Big Data). The amount of data is increasing at a very high speed, which raises concerns and challenges. Analyzing the bulk data to extract useful information is difficult, and moving and storing the data are also major challenges.
REFERENCES
Erl, T., Buhler, P. & Khattak, W., 2016. Big Data Adoption and Planning Considerations. Big Data Analytics Cycle, p. 11. Available at: http://www.informit.com/articles/article.aspx?p=2473128&seqNum=11
Espinosa, A., Kaisler, S., Armour, F. & Money, W., 2019. Big Data Redux: New Issues and Challenges Moving Forward. Proceedings of the 52nd Hawaii International Conference on System Sciences.
Mannoppa, A., 2020. Data Science vs. Big Data vs. Data Analytics. Big Data and Analytics. Available at: https://www.simpllearn.com/data-science-vs-big-data-vs-data-analytics-article
Sagiroglu, S. & Sinanc, D., 2013. Big data: A review. s.l.: IEEE, pp. 42-47.