Distributed Computing Models

Overview

Design and implementation of a distributed system requires consideration of the following elements:

Placement of components
Placement of data
Functional roles
Communication patterns

Fortunately, most distributed systems employ one of a small set of common models.

Software Layers

First, consider the software architecture of the components of a distributed system.

The lower two layers, the computer and network hardware and the operating system, comprise the platform (for example, Intel x86/Windows or PowerPC/Mac OS X), which provides OS-level services to the upper layers.

The middleware sits between the platform and the application. Its purpose is to mask heterogeneity and provide a consistent programming model for application developers. Some of the abstractions provided by middleware include the following:

Remote method invocation
Group communication
Event notification
Object replication
Real-time data transmission

Examples of middleware include the following:

Java RMI
CORBA
DCOM

Atop the middleware layer sits the application layer. The application layer provides application-specific functionality. Depending on the application, it may or may not make sense to take advantage of existing middleware.

System Architectures

The application layer defines the functional role of each component in a distributed system, and each component may have a different functional role. There are several common architectures employed by distributed systems. The choice of architecture can impact the design considerations described below:

Responsiveness - how quickly does the system respond to requests?
Throughput - how many requests can the system handle (per second, for example)?
Load Distribution - are requests distributed evenly among components of the system?
Fault Tolerance - can the system continue to handle requests in the face of a failed component?
Security - does the system ensure that sensitive resources are guarded against attack?

Common architectures for distributed systems are as follows:

Client-Server

The client-server model is probably the most popular paradigm. The server is responsible for accepting, processing, and replying to requests. It is the producer. The client is purely the consumer. It requests the services of the server and accepts the results.

The basic web follows the client-server model. Your browser is the client. It requests web pages from a server (e.g., google.com), waits for results, and displays them for the user.

In some cases, a web server may also act as a client. For example, it may act as a client of DNS or may request other web pages.
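
The request/reply exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "processing" is a stand-in (upper-casing the request), and socket.socketpair() is used so that the two roles can talk without opening a real network port.

```python
import socket
import threading


def serve_one_request(conn: socket.socket) -> None:
    """Server role: accept a request, process it, and send a reply."""
    with conn:
        request = conn.recv(1024).decode("utf-8")
        reply = request.upper()  # stand-in for real processing (assumption)
        conn.sendall(reply.encode("utf-8"))


def send_request(conn: socket.socket, request: str) -> str:
    """Client role: send a request and block until the reply arrives."""
    with conn:
        conn.sendall(request.encode("utf-8"))
        return conn.recv(1024).decode("utf-8")


def demo() -> str:
    # socketpair() returns two already-connected endpoints.
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end,))
    server.start()
    reply = send_request(client_end, "get /index.html")
    server.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

A real web server would listen on a port and handle many clients concurrently; the sketch keeps only the producer/consumer relationship.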

Multiple Servers

In reality, a web site is rarely supported with only one server. Such an implementation would not be scalable or reliable. Instead, web sites such as Google or CNN are hosted on many (many many) machines. Services are either replicated, which means that each machine can perform the same task, or partitioned, which means that some machines perform one set of tasks and some machines perform another set of tasks. For example, a site like CNN might serve images from one set of machines and HTML from another set of machines.
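
The difference between replication and partitioning can be sketched as two request routers. The class names, the machine names, and the rule for recognizing an image request (a file suffix check) are assumptions made for illustration, not a description of how any particular site works.

```python
from itertools import cycle


class ReplicatedService:
    """Every machine can perform the same task, so rotate through them."""

    def __init__(self, machines):
        self._next_machine = cycle(machines)

    def route(self, path):
        # Round-robin: the path does not influence the choice.
        return next(self._next_machine)


class PartitionedService:
    """Different sets of machines perform different sets of tasks."""

    IMAGE_SUFFIXES = (".png", ".jpg", ".gif")  # hypothetical rule

    def __init__(self, image_machines, html_machines):
        # Each partition is itself replicated across its own machines.
        self._images = ReplicatedService(image_machines)
        self._html = ReplicatedService(html_machines)

    def route(self, path):
        if path.endswith(self.IMAGE_SUFFIXES):
            return self._images.route(path)
        return self._html.route(path)


if __name__ == "__main__":
    site = PartitionedService(["img1", "img2"], ["web1"])
    for path in ["/index.html", "/logo.png", "/photo.jpg"]:
        print(path, "->", site.route(path))
```

Note that the partitioned router reuses the replicated one internally, which reflects the common practice of combining both techniques.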

Proxies and Caches

To reduce latency, load on the origin server, and bandwidth usage, proxies and caches are also used to deliver content. An end host (your browser) may cache content. In this case, when you first request content, your browser stores a copy on your local machine. Subsequent requests for the same content can be fulfilled by using the cache rather than requesting the content from the origin server.

An organization, like USF, may also deploy a proxy server that can cache content and deliver it to any client within the organization. Again, this reduces latency, and it also reduces bandwidth usage. Suppose that several hundred USF students download the same YouTube video. If a proxy server caches the video after the first student's request, subsequent requests can be satisfied by using the cached content, thereby reducing the number of external requests by several hundred.

CDNs, like Akamai, also fall into this category. However, CDNs work a bit differently than traditional proxy servers. CDNs actively replicate content throughout the network in a push-based fashion. When a customer (e.g., CNN) updates its content, the new content is replicated throughout the network. In contrast, a proxy server will cache new content when it is requested by the first client.
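
The pull-based behavior of a proxy and the push-based behavior of a CDN can be contrasted in a short sketch. All class names here are hypothetical, and the sketch assumes content never expires or changes once cached, which a real cache could not assume.

```python
class OriginServer:
    """Hypothetical origin that counts how many requests reach it."""

    def __init__(self, content):
        self.content = dict(content)
        self.requests_served = 0

    def get(self, url):
        self.requests_served += 1
        return self.content[url]


class CachingProxy:
    """Pull-based: content is cached when the first client requests it."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # Cache miss: only now does a request go to the origin.
            self.cache[url] = self.origin.get(url)
        return self.cache[url]


class CdnEdge:
    """Push-based: the customer replicates content before any request."""

    def __init__(self):
        self.store = {}

    def push(self, url, body):
        self.store[url] = body

    def get(self, url):
        return self.store[url]


if __name__ == "__main__":
    origin = OriginServer({"/video": b"video bytes"})
    proxy = CachingProxy(origin)
    for _ in range(300):  # several hundred students, one video
        proxy.get("/video")
    print("requests that reached the origin:", origin.requests_served)
```

In the proxy case the origin sees exactly one request for the video; in the CDN case it sees none from clients, because the content was pushed to the edge ahead of time.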

P2P

The peer-to-peer model assumes that each entity in the network has equivalent functionality. In essence, it can play the role of a client or a server. Ideally, this reduces bottlenecks and enables each entity to contribute resources to the system. Unfortunately, it doesn't always work that way. One of the early papers on peer-to-peer systems was Free Riding on Gnutella, a paper that demonstrated that peers often free ride by taking resources (downloading files, in this case) and never contributing resources (uploading files).

In addition, enabling communication in such a system is challenging. First, peers must locate other peers in order to participate in the system. This is rarely done in a truly distributed or peer-to-peer fashion. For example, Napster, often cited (controversially) as the first real example of peer-to-peer computing, used a centralized mechanism for joining the network and searching for content. Searching for content or other resources is the second big challenge in implementing peer-to-peer systems. It can be very inefficient to locate resources in a peer-to-peer system, so a hybrid, or partially centralized, solution is often employed.
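
A hybrid design of the kind just described can be sketched as a central directory that records which peer holds which file, while the files themselves stay on the peers. The class and method names are hypothetical and are not taken from Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: it knows who has what,
    but never stores or transfers the files themselves."""

    def __init__(self):
        self._holders = defaultdict(set)  # filename -> set of peer ids

    def register(self, peer_id, filenames):
        """Called when a peer joins and announces its shared files."""
        for name in filenames:
            self._holders[name].add(peer_id)

    def unregister(self, peer_id):
        """Called when a peer leaves the network."""
        for holders in self._holders.values():
            holders.discard(peer_id)

    def search(self, filename):
        """Return the peers to contact directly for this file."""
        return sorted(self._holders.get(filename, ()))


if __name__ == "__main__":
    index = CentralIndex()
    index.register("alice", ["song.mp3", "talk.mp3"])
    index.register("bob", ["song.mp3"])
    print(index.search("song.mp3"))
```

Search is a single lookup, which is the efficiency gain; the cost is that the index is a central point of failure and a bottleneck.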

Hierarchical or superpeer systems, like Skype, are also widely used. In these systems, peers are organized in a tree-like structure. Typically, more capable peers are elected to become superpeers (or supernodes). Superpeers act on behalf of downstream peers and can reduce communication overhead.

Other

Mobile Code/Agents

The previous models assume that the client/server/peer entities exchange data. The mobile code model assumes that components may exchange code. An example of this is Java applets. When your browser downloads an applet, it downloads some Java code that it then runs locally. The big issue with this model is that it introduces security risks. No less a security threat are mobile agents -- processes that can move from machine to machine.

Network Computers/Thin Clients

The network computer model assumes that the end user machine is a low-end computer that maintains a minimal OS. When it boots, it retrieves the OS and files/applications from a central server and runs applications locally. The thin client model is similar, though it assumes that the process runs remotely and the client machine simply displays results (e.g., the X Window System and VNC).

This model has been around for quite some time, but has recently received much attention. Google and Amazon are both promoting "cloud computing". Sun's Sun Ray technology also makes for an interesting demonstration. Though this model has yet to see success, it is beginning to look more promising.

Mobile Devices

There is an increasing need to develop distributed systems that can run atop devices such as cell phones, cameras, and MP3 players. Unlike traditional distributed computing entities, which communicate over the Internet or standard local area networks, these devices often communicate via wireless technologies such as Bluetooth or other low bandwidth and/or short range mechanisms. As a result, the geographic location of the devices impacts system design. Moreover, mobile systems must take care to consider the battery constraints of the participating devices. System design for mobile ad hoc networks (MANETs), sensor networks, and delay/disruption tolerant networks (DTNs) is a very active area of research.

Fundamental Models

Or, understanding the characteristics that impact distributed system performance and operation.

Interaction

Fundamentally, distributed systems are composed of entities that communicate and coordinate by passing messages. The following characteristics of communication channels impact the performance of the system:

Latency - the time between the sending of a message at the source and the receipt of the message at the destination.
Bandwidth - the total amount of information that can be transmitted over a given time period (e.g., Mbits/second).
Jitter - "the variation in the time taken to deliver a series of messages." (Coulouris et al.)
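
A small worked example may help relate these three quantities. The formula below is a common first-order estimate (delivery time = latency + size / bandwidth), and measuring jitter as the population standard deviation of delivery times is one possible choice; both are assumptions of this sketch rather than definitions from the text.

```python
from statistics import pstdev


def transfer_time(size_bits, latency_s, bandwidth_bps):
    """First-order estimate of the time to deliver one message."""
    return latency_s + size_bits / bandwidth_bps


def jitter(delivery_times_s):
    """Variation in delivery time, measured here as a standard deviation."""
    return pstdev(delivery_times_s)


if __name__ == "__main__":
    # A 1 MB (8,000,000 bit) message over a 10 Mbit/s link with 50 ms latency.
    print(transfer_time(8_000_000, 0.05, 10_000_000))
    # Delivery times, in seconds, observed for a series of messages.
    print(jitter([0.10, 0.12, 0.09, 0.15]))
```

For small messages the latency term dominates; for large messages the bandwidth term dominates.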

Additionally, coordination of the actions of entities in a distributed system is impacted by the fact that each entity will have a different clock drift rate. Synchronous distributed systems that rely on certain actions happening at the same time can only be built if you can guarantee bounds on system resources and clock drift rates. Most of the systems that we will discuss are asynchronous; there are no guarantees about the time at which actions will occur.

Generally, it is sufficient to know the order in which events occur. A logical clock is a counter that allows a system to keep track of when events occur in relation to other events.
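
One well-known way to build such a counter is Lamport's logical clock, sketched below. The text does not name a specific scheme, so treat the choice of Lamport's rules, and the class and method names, as assumptions of this example.

```python
class LamportClock:
    """A counter that orders events without relying on physical time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the counter for a local event or just before a send."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, message_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                 # local event at process a
    sent = a.tick()          # a sends a message stamped with its clock
    print(b.receive(sent))   # b's clock jumps past the sender's timestamp
```

The receive rule guarantees that a message's receipt is always timestamped later than its send, even if the two processes' physical clocks disagree.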

Failure

It is important to understand the kinds of failures that may occur in a system.

Failstop: A process halts and remains halted. Other processes can detect that the process has failed.
Crash: A process halts and remains halted. Other processes may not be able to detect this state.
Omission: A message inserted in an outgoing message buffer never arrives at the other end's incoming message buffer.
Send-omission: A process completes a send, but the message is not put in its outgoing message buffer.
Receive-omission: A message is put in a process's incoming message buffer, but that process does not receive it.
Arbitrary (Byzantine): Process/channel exhibits arbitrary behavior: it may send/transmit arbitrary messages at arbitrary times, commit omissions; a process may stop or take an incorrect step.
Timing failure: Clock drift exceeds allowable bounds.
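
The distinction between failstop and crash failures comes down to whether other processes can detect the halt. One common approach, which is an assumption of this sketch and not something the list above prescribes, is to suspect a process when it has not sent a heartbeat within a timeout. The class name and the use of explicit timestamps (instead of reading a real clock) are illustrative choices.

```python
class HeartbeatDetector:
    """Suspects a process once its heartbeats stop for too long."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}  # process id -> time of last heartbeat

    def heartbeat(self, process_id, now_s):
        """Record that a heartbeat from process_id arrived at now_s."""
        self._last_seen[process_id] = now_s

    def suspected(self, now_s):
        """Processes whose last heartbeat is older than the timeout."""
        return sorted(
            pid
            for pid, seen in self._last_seen.items()
            if now_s - seen > self.timeout_s
        )


if __name__ == "__main__":
    detector = HeartbeatDetector(timeout_s=5)
    detector.heartbeat("p1", now_s=0)
    detector.heartbeat("p2", now_s=0)
    detector.heartbeat("p1", now_s=4)
    print(detector.suspected(now_s=6))   # p2 has been silent too long
```

In an asynchronous system a suspicion is not proof: a slow process or a delayed message looks exactly like a crash, and a later heartbeat clears the suspicion.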

Security

There are several potential threats a system designer needs to be aware of:

Threats to processes - An attacker may send a request or response using a false identity (spoofing).
Threats to communication channels - An attacker may eavesdrop (listen to messages) or inject new messages into a communication channel. An attacker can also save messages and replay them later.
Denial of service - An attacker may overload a server by making excessive requests.

Cryptography and authentication are often used to provide security. Communicating entities can use a shared secret (key) to ensure that they are communicating with one another and to encrypt their messages so that they cannot be read by attackers.
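
As a rough illustration of the shared-secret idea, the sketch below uses Python's standard hmac module to let a receiver check that a message came from someone holding the key, and a set of previously seen nonces to reject replays. The function and class names, the fixed 16-byte nonce, and the choice of SHA-256 are assumptions of this example. It covers authentication only; encrypting the message would require a cipher, which the Python standard library does not provide.

```python
import hashlib
import hmac

NONCE_LEN = 16  # fixed length so nonce + message splits unambiguously


def sign(key: bytes, data: bytes) -> bytes:
    """Compute an authentication tag that only key holders can produce."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(key, data), tag)


class Receiver:
    """Accepts a message only if it is authentic and not a replay."""

    def __init__(self, key: bytes):
        self.key = key
        self._seen_nonces = set()

    def accept(self, nonce: bytes, message: bytes, tag: bytes) -> bool:
        if len(nonce) != NONCE_LEN:
            return False
        if not verify(self.key, nonce + message, tag):
            return False  # spoofed sender or tampered message
        if nonce in self._seen_nonces:
            return False  # a saved message replayed later
        self._seen_nonces.add(nonce)
        return True


if __name__ == "__main__":
    key = b"shared secret"
    nonce = b"0" * NONCE_LEN
    message = b"transfer 10 credits"
    tag = sign(key, nonce + message)
    receiver = Receiver(key)
    print(receiver.accept(nonce, message, tag))  # first delivery
    print(receiver.accept(nonce, message, tag))  # replayed copy
```

An attacker without the key cannot produce a valid tag, which addresses spoofing and injection; the nonce check addresses replay. Denial of service is not addressed by this technique.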

Sami Rollins

Date: 2007-12-18