How Do Big MNCs Manage Big Data?

Yash Hirulkar
Sep 28, 2021

What is Big Data?

Big data is a combination of structured, semi-structured, and unstructured data collected by organizations. It can be mined for information and used in machine learning projects, predictive modeling, and other advanced analytics applications.

The three V's of Big Data are:

Volume:

The amount of data matters. With big data, you'll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or readings from sensor-enabled equipment.

Variety:

Variety refers to the many types of data that are available.

Velocity:

Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk.

Now let's talk about how big MNCs like Google, Facebook, etc. handle Big Data.

GOOGLE:

Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.

What is Mesa?

Mesa is a highly scalable analytic data warehousing system that handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails.

In Mesa, data is stored in tables, and each update batch is tagged with a version.

Mesa pre-aggregates versioned data and stores it as deltas. Each delta has a set of rows (with no repeated keys) and a delta version, which identifies the update versions that the delta covers.
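To make the idea of deltas more concrete, here is a minimal sketch of how two adjacent deltas could be merged into one pre-aggregated delta. This is not Mesa's actual code: the `Delta` class, the `merge_deltas` function, the use of SUM as the aggregation, and the representation of a delta version as a `[v_start, v_end]` range are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, ...]


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta: unique keys plus the update versions it covers."""
    v_start: int
    v_end: int
    rows: Dict[Key, int]  # key -> aggregated value (SUM is assumed here)


def merge_deltas(older: Delta, newer: Delta) -> Delta:
    """Combine two adjacent deltas by summing the values of matching keys."""
    if older.v_end + 1 != newer.v_start:
        raise ValueError("deltas must cover adjacent version ranges")
    merged = dict(older.rows)
    for key, value in newer.rows.items():
        merged[key] = merged.get(key, 0) + value
    # The result is a new delta; the inputs are left untouched.
    return Delta(older.v_start, newer.v_end, merged)


d0 = Delta(0, 0, {("ad1", "US"): 10, ("ad2", "IN"): 5})
d1 = Delta(1, 1, {("ad1", "US"): 3, ("ad3", "UK"): 7})
combined = merge_deltas(d0, d1)
```

Because a dictionary is used for the rows, the merged delta still has no repeated keys, and a query over versions 0 to 1 can read `combined` instead of reading both original deltas.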

Physical Data and Index Formats

Mesa deltas are created and deleted based on the delta compaction policy. Once a delta is created, it is immutable, and therefore there is no need for its physical format to efficiently support incremental modification. The rows in a delta are stored in sorted order in data files of bounded size (to optimize for filesystem file size constraints).
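The layout described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the file size is bounded by row count rather than bytes to keep things simple, and the small index of first keys is an assumed design, not Mesa's real index format.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[str, int]  # (key, value); keys are unique within a delta


def write_delta_files(rows: List[Row], max_rows_per_file: int) -> List[List[Row]]:
    """Sort the rows by key and split them into files of bounded size."""
    ordered = sorted(rows)
    return [ordered[i:i + max_rows_per_file]
            for i in range(0, len(ordered), max_rows_per_file)]


def build_index(files: List[List[Row]]) -> List[str]:
    """Record the first key of every file so lookups can skip whole files."""
    return [f[0][0] for f in files]


def lookup(files: List[List[Row]], index: List[str], key: str) -> Optional[int]:
    """Find the one file that may hold the key, then binary-search inside it."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first file
    chunk = files[pos]
    keys = [k for k, _ in chunk]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return chunk[i][1]
    return None


rows = [("d", 4), ("a", 1), ("c", 3), ("b", 2), ("e", 5)]
files = write_delta_files(rows, max_rows_per_file=2)
index = build_index(files)
value = lookup(files, index, "d")
```

Since a delta never changes after it is written, the files and the index are built once and never need to support in-place updates.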

Mesa controller/worker framework

Mesa typically uses the MapReduce framework for parallelizing the execution of different types of workers. One of the challenges here is to partition the work across multiple mappers and reducers in the MapReduce operation.
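One common way to approach this partitioning problem is to sample the row keys and pick range boundaries from the sample, so that each worker gets a roughly equal share of the key space. The sketch below assumes that approach; the function names are hypothetical and this is not Mesa's real implementation.

```python
import bisect
from typing import List


def choose_boundaries(sampled_keys: List[str], num_partitions: int) -> List[str]:
    """Pick num_partitions - 1 split keys from a sample of row keys."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sampled_keys)
    if not ordered:
        return []  # nothing sampled: everything goes to partition 0
    step = len(ordered) / num_partitions
    return [ordered[int(i * step)] for i in range(1, num_partitions)]


def partition_for(key: str, boundaries: List[str]) -> int:
    """Return the index of the worker responsible for this key."""
    return bisect.bisect_right(boundaries, key)


sample = ["a", "b", "c", "d", "e", "f", "g", "h"]
boundaries = choose_boundaries(sample, num_partitions=4)
assignments = [partition_for(k, boundaries) for k in ["a", "c", "f", "z"]]
```

Each partition is a contiguous key range, so the sorted order of the rows is preserved when the outputs of the workers are placed one after another.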

Mesa System Architecture

Mesa is built using common Google infrastructure and services, including BigTable (for metadata) and Colossus (data files). Mesa runs in multiple datacenters, each of which runs a single instance.

Each Mesa datacenter instance is composed of two subsystems: update/maintenance and querying. These subsystems are decoupled, which allows them to scale independently. All persistent metadata is stored in BigTable and all data files are stored in Colossus.
