CAP Theorem

Eric Brewer, systems professor at the University of California, Berkeley, brought the different trade-offs together in a keynote address to the Principles of Distributed Computing (PODC) conference in 2000.He presented the CAP theorem.

The CAP Theorem is a fundamental theorem in distributed systems that states any distributed system can have at most two of the following three properties.

Consistency
Availability
Partition tolerance
Consistency in distributed system:

Let’s consider a very simple distributed system. Our system is composed of two servers, G1 and G2. Both servers are keeping track of the same variable, V, whose value is initially V1. G1 and G2 (nodes of the distributed systems) can communicate with each other and can also communicate with external clients.

Here’s how Gilbert and Lynch describe consistency:

“any read operation that begins after a write operation completes must return that value, or the result of a later write operation”

For example, when a client writes V1 to node G1, before acknowledges to client, node G1 replicates the copy of V1 to node G2 and once G2 acknowledges to G1, then only G1 sends success confirmation to the client. So, when client reads the value from the node G2 it will always returns the same value i.e. V1. If any distributed systems act like this then it is in Consistent state.

Availability:

Here’s how Gilbert and Lynch describe availability.

“Every request received by a non-failing node in the system must result in a response”

In an available system, if our client sends a request to a server and the server has not crashed, then the server must eventually respond to the client. The server is not allowed to ignore the client’s requests.

Network Partition tolerant:

Partition tolerance in CAP means tolerance to a network partition. An example of a network partition is when two nodes can’t talk to each other, but there are clients able to talk to either one or both of those nodes.

A CA system guarantees strong consistency, at the cost of not being able to process requests unless all nodes are able to talk to each other.

An AP system is able to function during the network split, while being able to provide various forms of eventual consistency.

R + W > N à where R: reads, W:writes and N total number of replicas.

If The number of nodes the client writes (W) to + The number of nodes the client reads (R) from > Number of replicas, then system is Consistent state

Ex: if I write into one node and read from one node and number of replicas is 3 then that distributed system is known as Eventually consistent

If I write 2 and read 2 and number of replicas are 2 then the distributed system is Strongly consistent.

ACID VS Consistency in CAP theory:

When it comes to Traditional relational database systems, we all know that ACID is a benchmark that it’s a must property. Without ACID properties we can’t call a database as RDBMS.

Luckily for the world of distributed systems – like Google’s BigTable; Amazon’s Dynamo and Facebook’s Cassandra deal with a loss of consistency and still maintains system reliability with the help of BASE – stands for Basically Available Soft state, Eventual consistency.

Basically Available: This constraint states that the system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.

Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’

Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one.

Conclusion: Here is my two cents on Applying Consistency on every single transaction (when transactions are in trillions) is costly affair and it needs lot of complex code needs to be developed by the developers without compromising the performance. Fortunately companies like Google, Yahoo, Twitter, Amazon etc are able to interact with the customers across the globe continuously, with the necessary availability and partition tolerance, while keeping their costs down, their systems up, and their customers happy.

Having said that it’s ok for the distributed systems not to maintain ACID.

Which applications/companies are using distributed systems which has Eventual Consistency?

Amazon’s S3 is one popular database where it supports the Eventual Consistency. More can be found at https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html (Links to an external site.)

The companies which uses Amazon S3 database includes but not limited to are:

Smart Vision Labs, Inc.
Cane River Pecan Company
Deep Blue Communications, Inc.
Sources:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: