Big Data 4V's

In an earlier post, we discussed what Big Data is and noted that there is no single definition for it. In this post, I will discuss the four V's of Big Data.

THE 4 V’S OF BIG DATA



The general consensus today is that specific attributes define big data. In most discussions of big data, these are called the four V’s: volume, variety, velocity, and veracity. (You might consider a fifth V, value.)

VOLUME

The main characteristic that makes data “big” is the sheer volume. It makes no sense to focus on minimum storage units, because the total amount of information is growing exponentially every year.
Volume is the amount of data generated by organizations or individuals. Today, the volume of data in most organizations is approaching exabytes. Some experts predict that it will grow further still in the coming years.
EMC, a hardware company that makes data storage devices, has estimated the total at close to 900 exabytes, growing by 50 percent every year. No one really knows how much new data is being generated, but the amount of information being collected is huge.
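To get a feel for what 50 percent yearly growth means, here is a minimal Python sketch that compounds the 900 exabyte estimate over a few years. The function name projected_volume and the five-year horizon are my own choices for illustration, and the steady growth rate is an assumption rather than a forecast.

```python
def projected_volume(start_exabytes: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume by a fixed yearly growth rate."""
    return start_exabytes * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Assumed inputs: 900 exabytes to start, growing 50 percent per year.
    for year in range(6):
        volume = projected_volume(900, 0.50, year)
        print(f"year {year}: {volume:,.0f} exabytes")
```

Under these assumptions the total more than doubles in two years (900 becomes 2,025 exabytes), which is why fixed size thresholds go out of date so quickly.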

VARIETY

Variety is one of the most interesting developments in technology as more and more information is digitized. Traditional data types (structured data) include things on a bank statement like date, amount, and time. These are things that fit neatly in a relational database.
Structured data is augmented by unstructured data, which covers things like Twitter feeds, audio files, MRI images, web pages, and weblogs. It is anything that can be captured and stored but doesn’t have a meta-model that neatly defines it. (A meta-model is a set of rules to frame a concept or idea: it defines a class of information and how to express it.)
Unstructured data is a fundamental concept in big data. The best way to understand unstructured data is by comparing it to structured data. Think of structured data as data that is well defined by a set of rules. For example, money will always be a number with at least two decimal places; names are expressed as text; dates follow a specific pattern.
With unstructured data, on the other hand, there are no rules. A picture, a voice recording, a tweet: they can all be different, yet each expresses ideas and thoughts based on human understanding. One of the goals of big data is to use technology to take this unstructured data and make sense of it.
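The rules for structured data mentioned above (money as a number with decimal places, names as text, dates in a fixed pattern) can be sketched as a small check in Python. This is only an illustration: the field names name, amount, and date, the YYYY-MM-DD date format, and the function is_structured are all assumptions of mine, not part of any standard.

```python
import re
from datetime import datetime

# Assumed rule: an amount is digits, a dot, then at least two decimal digits.
AMOUNT_PATTERN = re.compile(r"^-?\d+\.\d{2,}$")


def is_structured(record: dict) -> bool:
    """Return True if the record follows the assumed rules for all three fields."""
    amount = record.get("amount")
    amount_ok = isinstance(amount, str) and AMOUNT_PATTERN.match(amount) is not None

    name = record.get("name")
    name_ok = isinstance(name, str) and name.strip() != ""

    try:
        # Assumed date pattern: YYYY-MM-DD.
        datetime.strptime(record.get("date", ""), "%Y-%m-%d")
        date_ok = True
    except (ValueError, TypeError):
        date_ok = False

    return amount_ok and name_ok and date_ok


if __name__ == "__main__":
    bank_row = {"name": "Alice", "amount": "19.99", "date": "2024-01-15"}
    tweet = {"text": "Just landed, what a view!"}
    print(is_structured(bank_row))  # follows the rules
    print(is_structured(tweet))     # no fixed fields to check
```

A bank statement row passes because every field obeys a rule. A tweet fails, not because it is bad data, but because there is no rule set to test it against, which is the point of calling it unstructured.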

VELOCITY

Velocity is the frequency of incoming data that needs to be processed. Think about how many SMS messages are sent on a particular telecom carrier, or how many Facebook status updates and credit card swipes happen, every minute of every day, and you’ll have a good appreciation of velocity. Amazon Kinesis, a streaming service from Amazon Web Services, is an example of a tool built to handle the velocity of data.
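One simple way to put a number on velocity is to count how many events arrive in each minute. The Python sketch below does that for a list of arrival times given in seconds. It does not use Kinesis or any streaming API; the function name events_per_minute and the sample timestamps are made up for illustration.

```python
from collections import Counter


def events_per_minute(timestamps_in_seconds):
    """Group event arrival times (in seconds) into one-minute buckets and count them."""
    counts = Counter(int(ts // 60) for ts in timestamps_in_seconds)
    # Sort by minute so the output reads in time order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical arrival times of six messages, in seconds since the start.
    arrivals = [0, 10, 59, 60, 61, 185]
    for minute, count in events_per_minute(arrivals).items():
        print(f"minute {minute}: {count} events")
```

A real streaming system keeps counts like these up to date as data arrives instead of working from a finished list, but the measurement, events per unit of time, is the same idea.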

VERACITY

Veracity refers to the trustworthiness of the data. Can the manager rely on the fact that the data is representative? Every good manager knows that there are inherent discrepancies in all the data collected.
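A first, very rough way to look for such discrepancies is to count records with missing values and records that repeat. The Python sketch below assumes records are dictionaries with simple values such as strings or numbers. The function name quality_report and the field names are hypothetical, and a real veracity check would go much further than this.

```python
def quality_report(records, required_fields):
    """Count incomplete and duplicate records, judged on the required fields only."""
    seen = set()
    incomplete = 0
    duplicates = 0
    for record in records:
        # A record is incomplete if any required field is absent or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            incomplete += 1
        # Two records are duplicates if all their required fields match.
        key = tuple(record.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"total": len(records), "incomplete": incomplete, "duplicates": duplicates}


if __name__ == "__main__":
    rows = [
        {"id": 1, "amount": "5.00"},
        {"id": 1, "amount": "5.00"},  # repeated row
        {"id": 2, "amount": ""},      # empty amount
        {"id": 3},                    # amount absent
    ]
    print(quality_report(rows, ["id", "amount"]))
```

Counts like these do not prove the data is representative, but a high share of incomplete or duplicate rows is an early warning that it may not be.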

VALUE


It may seem painfully obvious to some, but a real objective is critical to this mashup of the four V’s. Will the insights you gather from analysis create a new product line, a cross-sell opportunity, or a cost-cutting measure? Or will your data analysis lead to the discovery of a critical causal effect that results in a cure for a disease?
