There are two formal models of distributed systems: synchronous and asynchronous.

Synchronous distributed systems have the following characteristics:

- the time to execute each step of a process has known lower and upper bounds;
- each message transmitted over a channel is received within a known bounded time;
- each process has a local clock whose drift rate from real time has a known bound.

Asynchronous distributed systems, in contrast, guarantee no bounds on process execution speeds, message transmission delays, or clock drift rates. Most distributed systems we discuss, including the Internet, are asynchronous systems.

Generally, timing is a challenging and important issue in building distributed systems. Consider a few examples:

- Suppose we want to build a distributed system to track the battery usage of a bunch of laptop computers and we'd like to record the percentage of the battery each has remaining at exactly 2pm.
- Suppose we want to build a distributed, real-time auction and we want to know which of two bidders submitted their bid first.
- Suppose we want to debug a distributed system and we want to know whether variable x_{1} in process p_{1} ever differs by more than 50 from variable x_{2} in process p_{2}.

In the first example, we would really like to synchronize the clocks of all participating computers and take a measurement of absolute time. In the second and third examples, knowing the absolute time is not as crucial as knowing the order in which events occurred.

Every computer has a physical clock that counts oscillations of a crystal.
This hardware clock is used by the computer's software clock to track the
current time. However, the hardware clock is subject to *drift* -- the
clock's frequency varies and the time becomes inaccurate. As a result, any
two clocks are likely to be slightly different at any given time. The
difference between two clocks is called their *skew*.

There are several methods for synchronizing physical clocks. *External
synchronization* means that all computers in the system are synchronized
with an external source of time (e.g., a UTC signal). *Internal
synchronization* means that all computers in the system are synchronized
with one another, but the time is not necessarily accurate with respect to
UTC.

In a synchronous system, synchronization is straightforward since upper
and lower bounds on the transmission time for a message are known. One
process sends a message to another process indicating its current time,
*t*. The second process sets its clock to *t + (max+min)/2*
where max and min are the upper and lower bounds for the message transmission
time respectively. This guarantees that the skew is at most
*(max-min)/2*.
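As a rough illustration of the rule above, the following sketch computes the clock value the receiver adopts and the resulting bound on the skew. The function names and the sample bounds (in milliseconds) are assumptions chosen for this example, not part of the original notes.

```python
def synchronous_clock_setting(t, min_delay, max_delay):
    """Clock value the receiver adopts after getting timestamp t."""
    return t + (max_delay + min_delay) / 2


def worst_case_skew(min_delay, max_delay):
    """Upper bound on the skew after one synchronization message."""
    return (max_delay - min_delay) / 2


# Hypothetical numbers: the sender's clock reads 1000 ms and the
# transmission time is known to lie between 2 ms and 10 ms.
new_time = synchronous_clock_setting(1000, 2, 10)   # 1006.0
skew_bound = worst_case_skew(2, 10)                 # 4.0
```

Whatever the actual delay d in [2, 10] was, the sender's clock reads 1000 + d when the message arrives, so the receiver's setting of 1006 is off by at most 4 ms.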

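Cristian's method for synchronization in asynchronous systems is similar,
but does not rely on a predetermined max and min transmission time. Instead,
a process *p_{1}* requests the current time from another
process *p_{2}* and measures the round-trip time *T_{round}* of the exchange. When *p_{1}* receives the time *t* from *p_{2}*, it sets its clock to *t + T_{round}/2*.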
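The round-trip estimate at the heart of Cristian's method can be sketched as follows. The sketch assumes the usual formulation, in which the requester adds half of the measured round-trip time to the time reported by the server. `cristian_estimate` and its parameters are hypothetical names, and the server is simulated by a plain function rather than a real network request.

```python
import time


def cristian_estimate(request_server_time, local_clock=time.monotonic):
    """Estimate the server's current time from one request/reply exchange.

    request_server_time: callable that performs the exchange and returns
        the server's timestamp (a stand-in for a real network request).
    local_clock: callable returning the requester's local time in seconds;
        it is only used to measure the length of the exchange.
    """
    sent_at = local_clock()
    server_time = request_server_time()
    received_at = local_clock()
    round_trip = received_at - sent_at
    # Assumption: the reply spent about half of the round trip in transit.
    return server_time + round_trip / 2, round_trip


# Simulated exchange: the local clock reads 10.0 s when the request is sent
# and 10.5 s when the reply arrives; the server reports 500.0 s.
ticks = iter([10.0, 10.5])
estimate, rtt = cristian_estimate(lambda: 500.0, local_clock=lambda: next(ticks))
```

In this simulated run the round trip is 0.5 s, so the estimate is 500.25 s. The accuracy of the estimate depends on how symmetric the two message delays actually are.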

The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an internal synchronization mechanism that works by electing a master to coordinate the synchronization. The master polls the other computers (called slaves) for their times, computes an average, and tells each computer by how much it should adjust its clock.
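The averaging step can be sketched as follows. This is a simplified illustration: it assumes the master already holds every reading, including its own, and it ignores message delays and faulty clocks, which a practical implementation would have to handle. The function name and the sample readings are hypothetical.

```python
def berkeley_adjustments(clock_readings):
    """Return the adjustment each computer should apply to its clock.

    clock_readings maps a computer name to its reported time in seconds.
    """
    average = sum(clock_readings.values()) / len(clock_readings)
    # Each computer is told an offset rather than an absolute time.
    return {name: average - reading for name, reading in clock_readings.items()}


# Hypothetical readings, in seconds past some common reference point.
readings = {"master": 180.0, "a": 185.0, "b": 175.0, "c": 200.0}
adjustments = berkeley_adjustments(readings)
```

Here the average is 185.0, so the master advances by 5 s, `a` is unchanged, `b` advances by 10 s, and `c` moves back by 15 s. Sending offsets rather than absolute times means the adjustment is not distorted by the delay of the reply message.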

The Network Time Protocol (NTP) is yet another method for synchronizing clocks. It uses a hierarchical architecture in which the top level of the hierarchy (stratum 1) consists of servers connected to a UTC time source.

Physical time cannot be perfectly synchronized. Logical time provides a
mechanism to define the *causal order* in which events occur at
different processes. The ordering is based on the following:

- Two events occurring at the same process happen in the order in which they are observed by the process.
- If a message is sent from one process to another, the sending of the message happened before the receiving of the message.
- If e occurred before e' and e' occurred before e" then e occurred before e".

"Lamport called the partial ordering obtained by generalizing these two
relationships the *happened-before* relation." ($\to $)

In the figure, $a\to b$ and $c\to d$. Also, $b\to c$ and $d\to f$, which means that $a\to f$. However, we cannot say that $a\to e$ or vice versa; we say that they are *concurrent* (*a || e*).

A Lamport logical clock is a monotonically increasing software counter, whose value need bear no particular relationship to any physical clock. Each process p_{i} keeps its own logical clock, L_{i}, which it uses to apply so-called *Lamport timestamps* to events.

Lamport clocks work as follows:

- LC1: L_{i} is incremented before each event is issued at p_{i}.
- LC2:
  - When a process p_{i} sends a message m, it piggybacks on m the value t = L_{i}.
  - On receiving (m, t), a process p_{j} computes L_{j} := max(L_{j}, t) and then applies LC1 before timestamping the event receive(m).
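Rules LC1 and LC2 can be sketched as a small class. The class and method names are assumptions made for this illustration; only the update rules themselves come from the text.

```python
class LamportClock:
    """Logical clock for a single process, following rules LC1 and LC2."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """LC1: increment before each event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """LC2 (send): sending is an event; its timestamp travels with m."""
        return self.tick()

    def receive(self, t):
        """LC2 (receive): take the max with t, then apply LC1."""
        self.time = max(self.time, t)
        return self.tick()


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()        # local event at p1
b = p1.send()        # p1 sends m, carrying t = b
c = p2.receive(b)    # p2 receives (m, t)
d = p2.tick()        # later local event at p2
```

In this run the four events receive the timestamps 1, 2, 3 and 4, so the timestamps respect the order a, b, c, d implied by the send and receive of m.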

An example is shown below:

If $e\to e'$ then L(e) < L(e'), but the converse is not true. Vector clocks address this problem. "A vector clock for a system of N processes is an array of N integers." Vector clocks are updated as follows:

VC1: Initially, V_{i}[j] = 0 for i, j = 1, 2, ..., N.

VC2: Just before p_{i} timestamps an event, it sets V_{i}[i] := V_{i}[i] + 1.

VC3: p_{i} includes the value t = V_{i} in every message it sends.

VC4: When p_{i} receives a timestamp t in a message, it sets V_{i}[j] := max(V_{i}[j], t[j]), for j = 1, 2, ..., N. Taking the componentwise maximum of two vector timestamps in this way is known as a merge operation.
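Rules VC1 through VC4 can be sketched in the same style. The class and method names are hypothetical, and processes are numbered from 0 here rather than from 1 as in the text, to match Python list indexing.

```python
class VectorClock:
    """Vector clock for process i (0-based) in a system of n processes."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n                      # VC1: all entries start at 0

    def tick(self):
        """VC2: increment this process's own entry before an event."""
        self.v[self.i] += 1
        return list(self.v)                   # copy used as the timestamp

    def send(self):
        """VC3: the timestamp of the send event travels with the message."""
        return self.tick()

    def receive(self, t):
        """VC4: merge componentwise, then timestamp the receive event (VC2)."""
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        return self.tick()


p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p0.tick()        # [1, 0, 0]
b = p0.send()        # [2, 0, 0]
c = p1.receive(b)    # [2, 1, 0]
e = p2.tick()        # [0, 0, 1]
```

The timestamp of c records that p1 has seen two events from p0, while the timestamp of e carries no knowledge of either other process.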

An example is shown below:

Vector timestamps are compared as follows:

V = V' iff V[j] = V'[j] for j = 1, 2, ..., N

V <= V' iff V[j] <= V'[j] for j = 1, 2, ..., N

V < V' iff V <= V' and V != V'

If $e\to e'$ then V(e) < V(e'), and if V(e) < V(e') then $e\to e'$.
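The comparison rules translate directly into code. In the sketch below, timestamps are plain Python lists of equal length; the function names and the sample timestamps are assumptions made for illustration.

```python
def vc_leq(v, w):
    """V <= V': every component of v is at most the matching one in w."""
    return all(a <= b for a, b in zip(v, w))


def vc_less(v, w):
    """V < V': V <= V' and the two timestamps differ."""
    return vc_leq(v, w) and v != w


def concurrent(v, w):
    """Two distinct timestamps are concurrent if neither is less than the other."""
    return v != w and not vc_less(v, w) and not vc_less(w, v)


# Hypothetical timestamps from a three-process system.
b, c, e = [2, 0, 0], [2, 1, 0], [0, 0, 1]
b_before_c = vc_less(b, c)          # True: b happened before c
c_and_e = concurrent(c, e)          # True: neither happened before the other
```

Because the implication holds in both directions for vector clocks, `concurrent` detects concurrency exactly, which Lamport timestamps alone cannot do.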

It is often desirable to determine whether a particular property is true of a distributed system as it executes. We'd like to use logical time to construct a global view of the system state and determine whether a particular property is true. A few examples are as follows:

- Distributed garbage collection: Are there references to an object anywhere in the system? References may exist at the local process, at another process, or in the communication channel.
- Distributed deadlock detection: Is there a cycle in the graph of the "waits for" relationship between processes?
- Distributed termination detection: Has a distributed algorithm terminated?
- Distributed debugging: Example: given two processes p_{1} and p_{2} with variables x_{1} and x_{2} respectively, can we determine whether the condition |x_{1}-x_{2}| > δ is ever true?

In general, this problem is referred to as *Global Predicate
Evaluation*. "A global state predicate is a function that maps from the
set of global states of processes in the system ρ to {True, False}."

- Safety - a predicate that describes an undesirable property (e.g., deadlock) evaluates to false in every state the system can reach; that is, the property never occurs.
- Liveness - a predicate that describes a desirable property (e.g., termination) eventually evaluates to true; that is, the property eventually occurs.

Because physical time cannot be perfectly synchronized in a distributed system, it is not possible to gather the global state of the system at a particular time. Cuts provide the ability to "assemble a meaningful global state from local states recorded at different times".

Definitions:

- ρ is a system of N processes p_{i} (i = 1, 2, ..., N)
- history(p_{i}) = h_{i} = <${e}_{i}^{0}$, ${e}_{i}^{1}$, ...>
- ${h}_{i}^{k}$ = <${e}_{i}^{0}$, ${e}_{i}^{1}$, ..., ${e}_{i}^{k}$> - a finite prefix of the process's history
- ${s}_{i}^{k}$ is the state of the process p_{i} immediately before the kth event occurs
- All processes record sending and receiving of messages. If a process p_{i} records the sending of message m to process p_{j} and p_{j} has not recorded receipt of the message, then m is part of the state of the channel between p_{i} and p_{j}.
- A *global history* of ρ is the union of the individual process histories: H = h_{1} ∪ h_{2} ∪ ... ∪ h_{N}
- A *global state* can be formed by taking the set of states of the individual processes: S = (s_{1}, s_{2}, ..., s_{N})
- A *cut* of the system's execution is a subset of its global history that is a union of prefixes of process histories (see figure below).
- The *frontier* of the cut is the last state in each process.
- A cut C is *consistent* if, for all events *e* and *e'*: ($e\in C$ and $e'\to e$) $\Rightarrow e'\in C$
- A *consistent global state* is one that corresponds to a consistent cut.
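The consistency condition can be checked mechanically. The sketch below relies on the observation that a cut is already closed under the order of events within each process, because it is a union of prefixes, so it is enough to check that no message is received inside the cut but sent outside it. The representation (prefix lengths and message tuples) and the function name are assumptions made for this example, and processes and events are numbered from 0.

```python
def is_consistent_cut(cut, messages):
    """Check whether a cut is consistent.

    cut[i] is the number of events of process i included in the cut.
    messages is a list of (sender, send_idx, receiver, recv_idx) tuples,
    where the indices are 0-based positions in each process's history.
    """
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx < cut[receiver]
        sent_in_cut = send_idx < cut[sender]
        if received_in_cut and not sent_in_cut:
            return False    # an effect is in the cut without its cause
    return True


# Hypothetical run: event 1 of process 0 sends a message that is
# received as event 0 of process 1.
messages = [(0, 1, 1, 0)]
consistent = is_consistent_cut([2, 1], messages)      # send and receive included
inconsistent = is_consistent_cut([1, 1], messages)    # receive without send
```

A cut such as `[2, 0]`, which includes the send but not the receive, is also consistent; the message is then part of the channel state.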

To further examine how you might produce consistent cuts, we'll use the
distributed debugging example. Recall that we have several processes, each
with a variable x_{i}. "The safety condition required in this example
is |x_{i}-x_{j}| <= δ (i, j = 1, 2, ..., N)."

The algorithm we'll discuss is a centralized algorithm that determines
post hoc whether the safety condition was ever violated. The processes in the
system, p_{1}, p_{2}, ..., p_{N}, send their states
to a passive monitoring process, p_{0}. p_{0} is not part of
the system. Based on the states collected, p_{0} can evaluate the
safety condition.

Collecting the state: The processes send their initial state to a
monitoring process and send updates whenever relevant state changes, in this
case the variable x_{i}. In addition, the processes need only send
the value of x_{i} and a vector timestamp. The monitoring process
maintains an ordered queue (by the vector timestamps) for each process
where it stores the state messages. It can then create consistent global
states which it uses to evaluate the safety condition.

Let S = (s_{1}, s_{2}, ..., s_{N}) be a global state drawn from the state messages that the monitor process has received. Let V(s_{i}) be the vector timestamp of the state s_{i} received from p_{i}. Then it can be shown that S is a consistent global state if and only if:

V(s_{i})[i] >= V(s_{j})[i] for i, j = 1, 2, ..., N
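A monitor could apply this condition roughly as follows. The sketch enumerates every combination of recorded states by brute force, which is exponential in the number of processes and is meant only to illustrate the test, not to suggest how a practical monitor would be built. The function names, the queue layout, and the sample data are hypothetical, and processes are numbered from 0.

```python
from itertools import product


def is_consistent_state(timestamps):
    """timestamps[i] is V(s_i), the vector timestamp of the state of process i."""
    n = len(timestamps)
    return all(timestamps[i][i] >= timestamps[j][i]
               for i in range(n) for j in range(n))


def safety_violations(queues, delta):
    """Yield the x values of consistent global states with |x_i - x_j| > delta.

    queues[i] is the list of (x_i, V) state messages received from process i.
    """
    for combo in product(*queues):
        values = [x for x, _ in combo]
        stamps = [v for _, v in combo]
        # Some pair differs by more than delta iff max - min exceeds delta.
        if is_consistent_state(stamps) and max(values) - min(values) > delta:
            yield values


# Hypothetical state messages from two processes.
queues = [
    [(1, [1, 0]), (100, [2, 0])],     # process 0
    [(95, [0, 1]), (90, [2, 2])],     # process 1
]
violations = list(safety_violations(queues, 50))
```

With these sample queues, the only consistent global state that violates the condition for δ = 50 is the one with values (1, 95). The pairing of x = 1 with x = 90 is rejected as inconsistent, because the second state of process 1 already reflects two events of process 0.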

Sami Rollins

Date: 2008-01-15