Distributed System

A distributed system is a collection of independent computers that work together to achieve a common goal by sharing resources, data, and communication across a network. In a distributed system, the computers or nodes are connected by a communication network, and they coordinate their actions by exchanging messages and sharing data.

Distributed systems can vary in size and complexity, from small-scale systems consisting of only a few computers to large-scale systems comprising thousands or even millions of nodes. They are used in many applications and domains, including cloud computing, peer-to-peer networks, social networks, online gaming, and scientific simulations.

The main advantage of a distributed system is that it can provide a high level of performance, scalability, and fault tolerance by allowing the workload to be spread across multiple nodes. In addition, a distributed system can enable the sharing of data and resources, leading to more efficient use of computing resources and reducing the need for data duplication.

However, designing and implementing a distributed system can be challenging due to the complexity of distributed algorithms and the need to handle issues such as concurrency, consistency, and fault tolerance. Moreover, distributed systems are often subject to network delays, hardware failures, and other issues that can impact performance and reliability.

Overall, distributed systems are an important tool in modern computing, enabling the creation of large-scale, highly available, and scalable applications and services.

Communication Mechanisms in Distributed Systems

Distributed systems rely on various communication patterns and design techniques:

Message Passing
Remote Procedure Calls (RPC)
Event Bus Pattern
Publish-Subscribe Models

Examples of distributed systems

Domain Name System (DNS)
- Distributed lookup table of hostname to IP address
Content Delivery Network (CDN)
Facebook and Google
- Massive scale
- Fast
- Very reliable
Email servers (SMTP)
TOR network
Phone networks
- Land line and cellular
Cars network electronic components via CANbus
Traffic light controllers
Train control networks
Airplanes
- Avionics use RS232
- Air traffic control uses verbal communication

It is more collaborative
More resilient
User integrity
Hardware sharing economy

Failures in Distributed Systems

Network faillures

Protocols like TCP/IP and SSH are usefull to secure and manage the network

Network not fast enough

The biggest risk is network partition

When two nodes write to same data item in different subgraphs

Shared state diverges

Loss of connectivity

Node faillures

Fail stop

Byzantine Fault Tolerance

Goal : turn it into fail stop

Link to original

Byzantine Fault Tolerance

What is BFT

Byzantine Fault Tolerance(BFT) is the feature of a distributed network to reach consensus(agreement on the same value) even when some of the nodes in the network fail to respond or respond with incorrect information. The objective of a BFT mechanism is to safeguard against the system failures by employing collective decision making(both – correct and faulty nodes) which aims to reduce to influence of the faulty nodes.

Fault tolerance is the system’s ability to continue functioning correctly even when some of its components, including nodes and communication channels, fail or behave maliciously.

A BFT is a faillure that is not a Fail Stop

It mean there is bad actors in the network : Traitor nodes

Flaky node(s)

Malicious nodes(s)

Example of extreme fault tolerance

Bitcoin

Boeing 777 & 787

Assumptions we make

Can all nodes see all message? Some? None?

Do nodes fail? How about the network?

Finite computation?

Static or dynamic adversary?

Bounded communication time?

Fully connected network?

Randomized algorithms?

Quantum or binary computers?

The Two Generals Problem

Two armies, each led by a different general, are preparing to attack a fortified city. The armies are encamped near the city, each in its own valley. A third valley separates the two hills, and the only way for the two generals to communicate is by sending messengers through the valley. Unfortunately, the valley is occupied by the city’s defenders and there’s a chance that any given messenger sent through the valley will be captured.

While the two generals have agreed that they will attack, they haven’t agreed upon a time for an attack. It is required that the two generals have their armies attack the city simultaneously to succeed, lest the lone attacker army die trying. They must thus communicate with each other to decide on a time to attack and to agree to attack at that time, and each general must know that the other general knows that they have agreed to the attack plan. Because acknowledgement of message receipt can be lost as easily as the original message, a potentially infinite series of messages is required to come to consensus.

The thought experiment involves considering how they might go about coming to a consensus. In its simplest form, one general is known to be the leader, decides on the time of the attack, and must communicate this time to the other general. The problem is to come up with algorithms that the generals can use, including sending messages and processing received messages, that can allow them to correctly conclude:

Yes, we will both attack at the agreed-upon time.

Allowing that it is quite simple for the generals to come to an agreement on the time to attack (i.e. one successful message with a successful acknowledgement), the subtlety of the Two Generals’ Problem is in the impossibility of designing algorithms for the generals to use to safely agree to the above statement.

The Byzantine General Problem
Imagine that several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. 

The generals can communicate with one another only by messenger. After observing the enemy, they must decide upon a common plan of action. 

However, some of the generals may be traitors, trying to prevent the loyal generals from reaching an agreement. 

The generals must decide on when to attack the city, but they need a strong majority of their army to attack at the same time. 

The generals must have an algorithm to guarantee that : 
- (a) all loyal generals decide upon the same plan of action, and 
- (b) a small number of traitors cannot cause the loyal generals to adopt a bad plan. 

The loyal generals will all do what the algorithm says they should, but the traitors may do anything they wish. 

The algorithm must guarantee condition (a) regardless of what the traitors do. The loyal generals should not only reach agreement, but should agree upon a reasonable plan.
Goal of the solution : Reaching Consensus

At least two generals must agree for there to be a consensus.

If 1/3 of the generals are traitors, it’s impossible to solve the problem

3m+1 generals

At least 4 generals is required

How to solve the Byzantine General Problem

Assuming

Less than 1/3 of generals are traitors

Oral messages (p2p)

No crypto

Related pages

Distributed System

Consensus

Asymmetric encryption

Cryptography

Link to original

🌿 Alternef Digital Garden

Distributed System

Communication Mechanisms in Distributed Systems

Examples of distributed systems

Failures in Distributed Systems

Byzantine Fault Tolerance

What is BFT

The Two Generals Problem

The Byzantine General Problem

How to solve the Byzantine General Problem

Peer-to-Peer Networking

swarm-computing

tor-network

Utopia

automerge

bittorrent

conflict-free-replicated-data-types

consensus

content-delivery-network

domain-name-system

fail-stop

how-systems-fail

Backlinks

Graph View

Table of Contents

Distributed System

Communication Mechanisms in Distributed Systems

Examples of distributed systems

Social benifices of distributed system

Failures in Distributed Systems

Byzantine Fault Tolerance

What is BFT

The Two Generals Problem

The Byzantine General Problem

How to solve the Byzantine General Problem

Related pages

Backlinks

Graph View

Table of Contents