Distributed systems

A distributed system is a set of software applications that implement architectural patterns to communicate via network and coordinate their actions to achieve a common goal.

Core concepts of distributed systems

Idempotence

An operation is idempotent when only the first received call can cause any change to the system. For example, when a file is saved with no changes the operation is idempotent if the save operation does nothing but it is not idempotent if the "modified" date is changed.

Idempotence is important in distributed systems because retries may be necessary if system calls over the network fail. For example, if a "create" operation succeeds for a record in a database but the success message never reaches the client, the client has no way of knowing about the success and may therefore retry. If the "create" operation is not idempotent, this would undesirably result in the creation of duplicate objects.

sequenceDiagram loop until client receives success message Client ->> API: Save Record API ->> DB: `INSERT INTO ...` DB ->> API: "Success" API -x Client: (network failure) end

Immutability

Data is immutable when it can be created, but not destroyed (meaning modified or deleted). Immutability is valuable for parallel processing, as well as for business needs such as the need to generate detailed audit logs.

Pat Helland's paper, Immutability Changes Everything, goes into detail on the types of problems solved with immutability.

Location independence

An application is location independent when its behavior does not rely on its location, meaning the same application can be deployed to multiple locations and when sent the same message will produce the same behavior as any other instance in any other location.

An example of location dependence is the use of auto-increment IDs in databases because the increment is only valid for the database that generated the ID and could identify a completely different record in another database. However, a GUID, natural key, or content-addressed storage-based identifier (using a hash) is location independent because it uniquely identifies the record, even across multiple database instances. Natural keys and content-addressed storage can be especially useful because they can be derived consistently regardless of location, whereas a GUID is randomly derived and therefore not consistently derivable.

Data and API versioning

In the context of distributed systems, versioning is the practice of maintaining the contract between otherwise independent components of the system. In a distributed system, different versions of applications will be deployed to different locations at different times yet still need to interact successfully. A client application, for example, be routed to a newer version of an API that returns data in a newer format that what the client was originally built to process. Versioning helps maintain compatibility.

One simple versioning strategy is to use additive structure. Additive structure means that new elements are preferred over modifications to existing elements. In an API, this means preferring to add support for new structures to extend existing request/response objects. In a database, this means preferring to add new tables following the snapshot pattern to extend existing entities.

The eight fallacies of distributed computing

When it comes to distributed systems, developers often unknowingly make bad assumptions about the underlying risks and limitations to that system. Peter Deutsch famously listed seven of these assumptions in 1994 while at Sun Microsystems and James Gosling added another to the list in 1997. Collectively, these assumptions are now known as the eight fallacies of distributed computing.

See also: Episode 470L L. Peter Deutsch on the Fallacies of Distributed Computing from Software Engineering Radio.

Those fallacies are:

  1. The network is reliable
  2. Bandwidth is infinite
  3. The network is secure
  4. Topology doesn't change
  5. There is one administrator
  6. Transport cost is zero
  7. The network is homogeneous

Deeper Knowledge on Distributed Systems

Apache Flink

Apache Flink

A distributed processing framework and engine for stateful data computation

Publish/Subscribe Pattern (Pub-sub)

Publish/Subscribe Pattern (Pub-sub)

A software engineering design pattern to separate responsibility between commands and queries

Content-Addressed Storage (CAS)

Content-Addressed Storage (CAS)

A method to store information so that it can be retrieved based on content rather than location

Insert-only Databases

Insert-only Databases

A database design approach that requires deletes and updates to be performed as inserts

Big Data

Big Data

Challenges related to data variety, velocity, and volume

Microservices

Microservices

A software architecture in which applications are made up of loosely coupled services

Broader Topics Related to Distributed Systems

Software Architecture

Software Architecture

The practice of organizing software components in a complex system

Computer Science

Computer Science

The study of algorithms, data structures, information, and computation

Distributed Systems Knowledge Graph