What is Big Data?

Think of the following:


  • Every second, on average, around 6,000 tweets are posted on Twitter.
  • Every minute, nearly 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded on Facebook.
  • Every hour, Walmart, a global discount department store chain, handles more than 1 million customer transactions.
  • Every day, consumers make around 11.5 million payments using PayPal.
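To put these rates on a common scale, the short Python sketch below converts each figure into an approximate daily total. It assumes every rate stays constant around the clock, which real traffic does not, so the results are only order-of-magnitude estimates; the Walmart figure is a lower bound because the source says "more than" 1 million per hour.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (count, length of the period it was observed over, in seconds), taken from the list above
rates = {
    "Twitter tweets": (6_000, 1),
    "Facebook comments": (510_000, 60),
    "Facebook status updates": (293_000, 60),
    "Facebook photo uploads": (136_000, 60),
    "Walmart transactions": (1_000_000, 3_600),
    "PayPal payments": (11_500_000, SECONDS_PER_DAY),
}


def per_day(count: int, period_seconds: int) -> int:
    """Scale a count observed over `period_seconds` up to a full day."""
    return count * SECONDS_PER_DAY // period_seconds


daily = {name: per_day(count, period) for name, (count, period) in rates.items()}

for name, total in daily.items():
    print(f"{name:25s} ~{total:,} per day")
```

Even under these rough assumptions, Twitter alone comes to hundreds of millions of items per day, which is the kind of velocity traditional systems struggle with.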


We live in a digital world where data is growing rapidly because of the ever-increasing use of the Internet, sensors, and heavy machinery. The term 'Big Data' signifies the sheer volume, variety, velocity, and veracity of such data. Big Data can be structured, unstructured, semi-structured, or heterogeneous in nature. Because of the immense speed and volume at which it is generated, Big Data is difficult for computing systems to manage. Traditional data management, warehousing, and analysis systems fail to analyze this type of data. Due to its complexity, Big Data is stored in distributed file systems.

Apache Hadoop is widely used for storing and managing Big Data. Analyzing Big Data is challenging because it involves a large distributed file system, which must be fault tolerant, flexible, and scalable.
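The idea behind a fault-tolerant distributed file system can be illustrated with a toy Python model: a file is cut into fixed-size blocks, and each block is copied to several machines, so the file can still be read when some machines fail. This is a minimal sketch, not Hadoop's API. The function names are hypothetical, the 4-byte block size is chosen only to keep the example small (HDFS defaults to 128 MB blocks), and the round-robin placement is a simplification of HDFS's rack-aware placement. The replication factor of 3 matches HDFS's default.

```python
# Toy model of a fault-tolerant distributed file system (not Hadoop's actual API).
BLOCK_SIZE = 4    # bytes per block; a toy value (HDFS uses 128 MB by default)
REPLICATION = 3   # copies kept of every block (HDFS's default is also 3)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: list[str], replication: int = REPLICATION):
    """Place each block on `replication` distinct nodes using round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks = split_into_blocks(data)
    storage: dict[str, dict[int, bytes]] = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = block
    return storage, len(blocks)


def read(storage, num_blocks: int, failed=frozenset()) -> bytes:
    """Reassemble the file, fetching each block from any node that is still up."""
    parts = []
    for block_id in range(num_blocks):
        for node, blocks in storage.items():
            if node not in failed and block_id in blocks:
                parts.append(blocks[block_id])
                break
        else:  # no surviving node holds this block
            raise OSError(f"block {block_id} is lost")
    return b"".join(parts)


nodes = ["node1", "node2", "node3", "node4"]
storage, num_blocks = store(b"big data is distributed", nodes)
print(read(storage, num_blocks, failed={"node2", "node4"}))
```

With four nodes and three copies of every block, any two nodes can fail and the file is still readable; adding nodes to the list is how this toy model scales out.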

Data is everywhere, in every industry, in the form of numbers, images, videos, and text. As data continues to grow, so does the need to organize it. Collecting such huge amounts of data would be a waste of time, effort, and storage space if it could not be put to any logical use. The need to sort, organize, analyze, and present this critical data in a systematic manner has given rise to the much-discussed term Big Data.
The process of capturing or collecting Big Data is known as 'datafication'. Big Data is 'datafied' so that it can be used productively. However, Big Data cannot be made useful simply by organizing it; its usefulness lies in determining what we can do with it.
Features of Big Data:

Structuring of Big Data 

Structuring data, in simple terms, means arranging the available data so that it becomes easy to study, analyze, and draw conclusions from. This raises the question: why is structuring needed?
In daily life, you may have come across questions like:
  • How do I use the vast amount of data and information I come across to my advantage?
  • Which of the thousands of news articles I come across should I read?
  • How do I choose a book from the millions available on my favorite sites or in stores?
  • How do I keep myself updated about new events, sports, inventions, and discoveries taking place across the globe?
Today, information processing systems can answer such questions. These systems can analyze and structure large amounts of data specifically for you, based on what you searched for, what you looked at, and how long you stayed on a particular page or website. In other words, structuring data helps in understanding user behavior, requirements, and preferences in order to make personalized recommendations for every individual.
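A minimal Python sketch of this idea: raw activity events are structured into a per-user profile (total time spent on each category), and the profile is then used to suggest unread articles. The activity log, catalogue, and function names are hypothetical, and "recommend from the category with the most reading time" is just one simple rule chosen for illustration, not how any particular site works.

```python
from collections import defaultdict

# Hypothetical raw activity log: (user, article_id, category, seconds_on_page)
events = [
    ("asha", "a1", "sports", 120),
    ("asha", "a2", "science", 15),
    ("asha", "a3", "sports", 200),
    ("ben", "a2", "science", 300),
    ("ben", "a4", "politics", 40),
]

# Hypothetical catalogue of articles that could be recommended: article_id -> category
catalogue = {
    "a1": "sports", "a2": "science", "a3": "sports",
    "a4": "politics", "a5": "sports", "a6": "science",
}


def build_profiles(events):
    """Structure raw events into user -> {category: total seconds spent}."""
    profiles = defaultdict(lambda: defaultdict(int))
    for user, _article, category, seconds in events:
        profiles[user][category] += seconds
    return profiles


def recommend(user, events, catalogue):
    """Suggest unread articles from the category the user spent the most time on."""
    profiles = build_profiles(events)
    if user not in profiles:
        return []  # no history yet, so nothing to personalize
    favorite = max(profiles[user], key=profiles[user].get)
    seen = {article for u, article, _c, _s in events if u == user}
    return sorted(a for a, cat in catalogue.items() if cat == favorite and a not in seen)


print(recommend("asha", events, catalogue))
print(recommend("ben", events, catalogue))
```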


Types of Data

Data that comes from multiple sources, such as databases, weblogs, chat histories, and GPS maps, varies in format. These different formats need to be made consistent and clear before the data can be used for analysis. Data is obtained primarily from the following types of sources:
  • Internal sources, such as organizational or enterprise data
  • External sources, such as social data
Based on the data received from these sources, Big Data comprises:
  • Structured Data
  • Unstructured Data
  • Semi-structured Data
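Making different formats consistent can be sketched in Python. The example assumes three hypothetical feedback sources: a CSV export from an internal database (structured), a JSON feed from an external service (semi-structured), and a plain-text email (unstructured). The field names and the functions from_csv, from_json, and from_free_text are made up for illustration; each maps its input onto the same record layout.

```python
import csv
import io
import json


def from_csv(text):
    """Structured: every row has the same columns, named once in the header."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"customer": row["customer"], "rating": int(row["rating"]), "comment": row["comment"]}
        for row in reader
    ]


def from_json(text):
    """Semi-structured: tags name the fields, but some may be nested or missing."""
    return [
        {"customer": doc["user"]["name"], "rating": doc.get("stars"), "comment": doc.get("review", "")}
        for doc in json.loads(text)
    ]


def from_free_text(customer, text):
    """Unstructured: there are no fields, so only the raw text can be kept."""
    return [{"customer": customer, "rating": None, "comment": text.strip()}]


csv_data = "customer,rating,comment\nasha,5,Fast delivery\nben,3,Okay\n"
json_data = '[{"user": {"name": "chen"}, "stars": 4, "review": "Good value"}, {"user": {"name": "dev"}}]'
email = "Hi, the parcel arrived late but the product works.\n"

records = from_csv(csv_data) + from_json(json_data) + from_free_text("eve", email)
for record in records:
    print(record)
```

Once every source produces the same layout, the records can be analyzed together; the unstructured email, however, contributes no rating until something extracts one from its text.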

Structured Data

Structured data is data that has been organized into a formatted repository, typically a database, so that its elements can be made addressable for more effective processing and analysis. Processing structured data is much easier and faster than processing data without any specific patterns.

Sources of structured data are:

  1. Relational databases
  2. Flat files in the form of records
  3. Multidimensional databases
  4. Legacy databases
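As a small illustration of structured data in a relational database, the Python sketch below uses the standard-library sqlite3 module with a throwaway in-memory database. The transactions table and its rows are hypothetical. Because every row has the same named, typed columns, each element is addressable and questions reduce to short queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, discarded when closed
conn.execute("""
    CREATE TABLE transactions (
        id       INTEGER PRIMARY KEY,
        customer TEXT    NOT NULL,
        store    TEXT    NOT NULL,
        amount   REAL    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO transactions (customer, store, amount) VALUES (?, ?, ?)",
    [("asha", "north", 25.0), ("ben", "north", 40.0), ("asha", "south", 10.0)],
)

# Address a single element directly by its key and column name.
second_amount = conn.execute(
    "SELECT amount FROM transactions WHERE id = ?", (2,)
).fetchone()[0]

# Aggregate across rows: total spending per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer ORDER BY customer"
).fetchall()

print(second_amount, totals)
conn.close()
```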

Unstructured Data

Unstructured data is a set of data that may or may not have any logical or repeating patterns.

Sources of unstructured data are:
  1. Text internal and external to an organization: documents, logs, survey results, feedback, and emails from both within and across the organization.
  2. Social media: data obtained from social networking platforms, including YouTube, Facebook, Twitter, LinkedIn, and Flickr.
  3. Mobile data: data such as text messages and location information.
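A common first step with unstructured text is to impose a simple structure on it, such as a table of term frequencies. The Python sketch below does this for a few hypothetical customer feedback messages; the tokenizing regular expression and the small stop-word list are illustrative assumptions, not a standard.

```python
import re
from collections import Counter

# Hypothetical customer feedback: free text with no fixed fields.
feedback = [
    "Delivery was late, but support was helpful.",
    "Late delivery again! Product quality is good.",
    "Great product, fast delivery this time.",
]

# Illustrative stop words: common words that carry little meaning on their own.
STOP_WORDS = {"was", "but", "is", "again", "this", "time", "the", "a"}


def term_counts(texts):
    """Turn unstructured text into structured data: term -> frequency."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer: runs of letters
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


counts = term_counts(feedback)
print(counts.most_common(3))
```

Even this crude count surfaces that delivery, and late delivery in particular, is a recurring theme, something no single message states on its own.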

Semi-Structured Data

Semi-structured data, also described as schema-less or self-describing, is a form of structured data that does not follow a fixed schema but contains tags or markup elements that separate elements and create hierarchies of records and fields within the data.

Sources of semi-structured data are:
  1. File systems, such as web data in the form of cookies
  2. Data exchange formats, such as JavaScript Object Notation (JSON) data
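The self-describing nature of JSON can be seen with Python's standard json module. The document below is hypothetical: its keys act as tags naming each value, orders nest inside the user record, and the second order carries a field the first one lacks, which a fixed relational schema would not allow.

```python
import json

# A hypothetical JSON document: keys name each value, and records nest.
raw = """
{
  "user": "asha",
  "orders": [
    {"id": 1, "items": ["book", "pen"], "total": 12.5},
    {"id": 2, "items": ["lamp"], "total": 30.0, "coupon": "SAVE5"}
  ]
}
"""

doc = json.loads(raw)

# No fixed schema: each order describes its own fields.
for order in doc["orders"]:
    print(order["id"], sorted(order.keys()), order.get("coupon", "no coupon"))

# Walking the hierarchy yields flat, structured rows: (user, order id, item).
rows = [(doc["user"], order["id"], item) for order in doc["orders"] for item in order["items"]]
print(rows)
```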
