Time Synchronization


Papers referenced:
"Timing-sync Protocol for Sensor Networks" by Ganeriwal, Kumar, and Srivastava
"Fine-Grained Network Time Synchronization using Reference Broadcasts" by Elson, Girod, and Estrin

Overview

Timing is a challenging and important issue in building distributed systems. Consider a couple of examples:

In some cases, it may be sufficient to know the order in which events occur.  This is referred to as logical time.  In other cases, it is necessary that two computing devices be synchronized with respect to physical time.

Clock Synchronization

Every computer has a physical clock that counts oscillations of a crystal. The computer's software clock uses this hardware clock to track the current time. However, hardware clocks are subject to drift: the crystal's frequency varies, so the time a clock reports gradually becomes inaccurate. As a result, any two clocks are likely to differ slightly at any given time. Ganeriwal reports that motes can lose up to 40 microseconds every second. The difference between the readings of two clocks at a given instant is called their skew.
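To get a feel for how quickly skew builds up, the arithmetic below is a minimal sketch that takes Ganeriwal's 40 microseconds per second as the drift rate and assumes (hypothetically) a worst case in which two clocks drift at that rate in opposite directions. The function names are illustrative only.

```python
# Drift figure for motes reported by Ganeriwal: up to 40 microseconds lost
# per second of elapsed time (40 parts per million).
DRIFT_US_PER_S = 40


def worst_case_skew_us(elapsed_s: float, drift_us_per_s: float = DRIFT_US_PER_S) -> float:
    """Skew after elapsed_s seconds between two clocks that started in
    agreement, assuming one runs fast and the other slow at the same rate."""
    return 2 * drift_us_per_s * elapsed_s


def resync_interval_s(max_skew_us: float, drift_us_per_s: float = DRIFT_US_PER_S) -> float:
    """Longest time the two clocks can run unsynchronized before their
    worst-case skew exceeds max_skew_us."""
    return max_skew_us / (2 * drift_us_per_s)


print(worst_case_skew_us(60))    # 4800 microseconds (4.8 ms) after one minute
print(resync_interval_s(1000))   # 12.5 seconds to stay within 1 ms of each other
```

Even a single clock drifting at 40 microseconds per second is off by 0.144 s after an hour, which is why synchronization has to be repeated periodically rather than performed once.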

There are several methods for synchronizing physical clocks. External synchronization means that all computers in the system are synchronized with an external source of time (e.g., a UTC signal). Internal synchronization means that all computers in the system are synchronized with one another, but the time is not necessarily accurate with respect to UTC.  

In a system that provides guaranteed bounds on message transmission time, synchronization is straightforward because upper and lower bounds on the time to transmit a message, max and min, are known. One process sends another a message containing its current time, t. The receiver sets its clock to t + (max+min)/2. Since the actual transmission time lies somewhere between min and max, this guarantees that the skew between the two clocks is at most (max-min)/2.
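The rule above fits in a small function. This is a minimal sketch, assuming times are plain numbers in one common unit (seconds here); `bounded_delay_sync` is a hypothetical name. The loop checks the (max-min)/2 bound for several possible actual delays.

```python
def bounded_delay_sync(t: float, min_delay: float, max_delay: float) -> tuple:
    """Receiver's new clock value and the guaranteed worst-case skew, given
    the sender's timestamp t and known bounds on transmission time."""
    new_time = t + (max_delay + min_delay) / 2
    worst_skew = (max_delay - min_delay) / 2
    return new_time, worst_skew


# The sender's clock read t when the message left. The true delay d is unknown
# but lies in [min_delay, max_delay], so on arrival the sender's clock reads t + d.
t, lo, hi = 100.0, 0.002, 0.010
new_time, bound = bounded_delay_sync(t, lo, hi)
for d in (lo, 0.004, 0.006, hi):
    actual = t + d
    assert abs(actual - new_time) <= bound + 1e-12   # skew never exceeds (max-min)/2
print(new_time, bound)
```

Choosing the midpoint of the interval is what makes the bound symmetric: any other choice would leave the receiver further from the worst case on one side.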

Cristian's method, designed for systems that do not provide such guarantees, is similar but does not rely on predetermined bounds on transmission time. Instead, a process p1 requests the current time from another process p2 and measures the round-trip time (Tround) of the request/reply exchange. When p1 receives the time t from p2, it sets its clock to t + Tround/2.
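A minimal sketch of Cristian's rule. The two callables are hypothetical stand-ins, one for the request to p2 and one for reading p1's local clock, so that the example can be run with simulated clocks instead of a network.

```python
from typing import Callable, Tuple


def cristian_sync(ask_server: Callable[[], float],
                  local_clock: Callable[[], float]) -> Tuple[float, float]:
    """Return (new local time, measured round-trip time) using Cristian's rule."""
    t0 = local_clock()          # p1's clock when the request leaves
    t = ask_server()            # p2's time, carried back in the reply
    t1 = local_clock()          # p1's clock when the reply arrives
    t_round = t1 - t0
    # Assume the reply took half the round trip, so p2's clock has advanced
    # by roughly t_round / 2 since it read t.
    return t + t_round / 2, t_round


# Simulated run: p1's clock reads 10.0 s when it asks and 10.4 s when the
# reply arrives; p2 reports 200.0 s.
readings = iter([10.0, 10.4])
new_time, t_round = cristian_sync(lambda: 200.0, lambda: next(readings))
print(new_time, t_round)
```

Because nothing is known about how the round trip splits between the two directions, the estimate can be off by as much as Tround/2, so the method works best when Tround is small.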

The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an internal synchronization mechanism that works by electing a master to coordinate the synchronization. The master polls the other computers (called slaves) for their times, computes an average, and tells each computer by how much it should adjust its clock.
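The master's computation can be sketched as follows. This assumes the master has already estimated each slave's clock reading with the message delay taken into account (for example, reading + RTT/2 as in Cristian's method); the function and machine names are hypothetical.

```python
from typing import Dict


def berkeley_adjustments(master_time: float, slave_times: Dict[str, float]) -> Dict[str, float]:
    """Return how much each machine (master included) should adjust its clock.

    slave_times holds the master's delay-corrected estimate of each slave's
    clock reading."""
    readings = {"master": master_time, **slave_times}
    average = sum(readings.values()) / len(readings)
    # Send an adjustment rather than the average itself: the amount to add
    # does not go stale while the reply is in transit.
    return {name: average - reading for name, reading in readings.items()}


# Clock readings in minutes past midnight: master 3:00, slaves 3:25 and 2:50.
print(berkeley_adjustments(180, {"s1": 205, "s2": 170}))
# {'master': 5.0, 's1': -20.0, 's2': 15.0}
```

Adjustments can be negative, so each machine must be prepared to move its clock in either direction.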

The Network Time Protocol (NTP) is another method for synchronizing clocks. It uses a hierarchical architecture in which the top level of the hierarchy (stratum 1) consists of servers connected to a UTC time source such as a GPS unit. The TPSN protocol is very similar to NTP.


Synchronization in WSN

Some classifications of synchronization protocols:
Performance metrics:
Decomposition of packet delay:
[Figure: decomposition of packet delay]

To reduce the uncertainty that the delays at the sender (send time + access time + transmission time) and at the receiver (reception time + receive time) introduce into timing measurements, most time synchronization protocols propose timestamping messages at the MAC layer.
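A toy simulation can show why the layer at which timestamps are taken matters. Everything in the model is a hypothetical assumption: the delay distributions, and which components fall between the sender's and receiver's timestamps in each case. The point it illustrates is that the nondeterministic components (send, access, receive) drop out of the measured interval when timestamps are taken at the MAC layer.

```python
import random
import statistics
from typing import Dict, List, Sequence


def sample_delays(rng: random.Random) -> Dict[str, float]:
    """One message's delay components in microseconds. The distributions are
    hypothetical, chosen only to illustrate which parts are nondeterministic."""
    return {
        "send": rng.uniform(0, 1000),      # building the message in software
        "access": rng.uniform(0, 5000),    # waiting for the channel to be free
        "transmission": 400.0,             # fixed packet length, fixed bit rate
        "propagation": 1.0,                # short radio range: effectively constant
        "reception": 400.0,                # same packet length at the receiver
        "receive": rng.uniform(0, 1000),   # handing the packet up to software
    }


# Components assumed to fall between the sender's and the receiver's timestamps.
APP_LAYER = ("send", "access", "transmission", "propagation", "reception", "receive")
MAC_LAYER = ("transmission", "propagation", "reception")


def measured_latencies(components: Sequence[str], trials: int = 1000, seed: int = 42) -> List[float]:
    rng = random.Random(seed)
    latencies = []
    for _ in range(trials):
        delays = sample_delays(rng)
        latencies.append(sum(delays[c] for c in components))
    return latencies


app = measured_latencies(APP_LAYER)
mac = measured_latencies(MAC_LAYER)
print(statistics.stdev(app))   # large: dominated by send, access, and receive jitter
print(statistics.stdev(mac))   # zero in this model: only fixed components remain
```

Real MAC-layer timestamps still carry some residual jitter that this model ignores; the sketch only shows which terms the technique removes.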

TPSN Overview

TPSN is a sender-receiver synchronization algorithm that works similarly to NTP.
  1. Level Discovery - Create a hierarchical topology in the network, similar to NTP.  The root node at level 0 may be synchronized with UTC time, for example using a GPS receiver.
  2. Synchronization Phase - nodes at level i synchronize to a node at level i-1.

Level Discovery
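Step 1 of the TPSN overview can be approximated as a breadth-first flood from the root. This sketch is a simplification, assuming every broadcast is received reliably and in hop order, whereas the real protocol floods over a lossy radio channel. Each node takes a level one greater than that of the first neighbor it hears from, rebroadcasts once, and ignores later level-discovery packets. Recording that first neighbor as the node's parent is an assumption of the sketch, and the topology and function name are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable


def level_discovery(neighbors: Dict[str, Iterable[str]], root: str):
    """Return (level, parent) maps for every node reachable from root."""
    level = {root: 0}
    parent = {root: None}
    pending = deque([root])          # nodes whose broadcast is still to be heard
    while pending:
        sender = pending.popleft()
        for node in neighbors.get(sender, ()):
            if node in level:        # already has a level: ignore the packet
                continue
            level[node] = level[sender] + 1
            parent[node] = sender    # a level i-1 node to synchronize with later
            pending.append(node)
    return level, parent


topology = {
    "root": ["A", "B"],
    "A": ["root", "C"],
    "B": ["root", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
levels, parents = level_discovery(topology, "root")
print(levels)    # {'root': 0, 'A': 1, 'B': 1, 'C': 2, 'D': 3}
print(parents)   # {'root': None, 'A': 'root', 'B': 'root', 'C': 'A', 'D': 'C'}
```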

Synchronization Phase
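The synchronization phase is a two-way message exchange between a node at level i and a node at level i-1. The sketch below assumes the usual four-timestamp exchange: the child sends a synchronization pulse at T1 (its own clock), the parent receives it at T2 and replies at T3 (the parent's clock), and the child receives the acknowledgement at T4. It also assumes the propagation delay is the same in both directions and that neither clock drifts during the exchange. NTP's offset formula has the same form. The function name and simulated values are hypothetical.

```python
def two_way_offset(t1: float, t2: float, t3: float, t4: float):
    """Return (offset, delay) from one pulse/acknowledgement exchange.

    offset: parent's clock minus child's clock.
    delay:  one-way propagation delay, assumed equal in both directions."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Simulated exchange in arbitrary time units. Hidden ground truth: the child's
# clock is 7 ahead of real time, the parent's is 3 behind, the one-way delay
# is 2, and the parent takes 5 to reply.
child_off, parent_off, d, processing = 7, -3, 2, 5
send = 1000                                  # real time the pulse leaves
t1 = send + child_off                        # 1007 on the child's clock
t2 = send + d + parent_off                   # 999 on the parent's clock
t3 = t2 + processing                         # 1004 on the parent's clock
t4 = send + d + processing + d + child_off   # 1016 on the child's clock

offset, delay = two_way_offset(t1, t2, t3, t4)
print(offset, delay)   # -10.0 2.0
print(t4 + offset)     # 1006.0: the child's corrected clock matches the parent's
```

The parent's processing time cancels out because it appears in both T2 and T3, which is why the exchange does not need to know how long the parent took to reply.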

Special Provisions

Reference-Broadcast Synchronization (RBS)

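RBS is a receiver-receiver synchronization algorithm and an alternative to TPSN. In RBS, a node broadcasts reference beacons that are heard by the other nodes within broadcast range. The beacons carry no timestamp; instead, each receiver records the time at which it received a beacon according to its own clock, and the receivers exchange these arrival times to compute the offsets between their clocks.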
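A minimal sketch of the receiver-receiver idea: two receivers that heard the same sequence of beacons estimate their relative offset by averaging the differences between their recorded arrival times. It assumes both lists cover the same beacons in the same order and ignores drift during the measurement window; the RBS paper also fits a line to these differences to estimate relative skew, which is not shown. The function name and values are hypothetical.

```python
from typing import Sequence


def rbs_offset(arrivals_i: Sequence[float], arrivals_j: Sequence[float]) -> float:
    """Estimated offset of receiver j's clock relative to receiver i's,
    from the local times at which each recorded the same beacons."""
    if len(arrivals_i) != len(arrivals_j) or not arrivals_i:
        raise ValueError("need arrival times for the same, non-empty set of beacons")
    # The sender's clock never appears: only receiver timestamps are compared,
    # so send time and access time at the sender drop out.
    diffs = [tj - ti for ti, tj in zip(arrivals_i, arrivals_j)]
    return sum(diffs) / len(diffs)


# Three beacons as timestamped by receivers i and j (hypothetical values, ms).
offset = rbs_offset([10.0, 20.0, 30.0], [15.1, 24.9, 35.0])
print(offset)   # about 5.0: j's clock reads roughly 5 ms ahead of i's
```

Averaging over several beacons smooths out the receive-side jitter that remains in each individual timestamp.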

Logical Time

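Physical time cannot be perfectly synchronized. Logical time provides a mechanism to define the causal order in which events occur at different processes. The ordering is based on the following:

  1. If two events occurred at the same process, they occurred in the order in which that process observes them.
  2. Whenever a message is sent between processes, the event of sending the message occurred before the event of receiving it.

"Lamport called the partial ordering obtained by generalizing these two relationships the happened-before relation." We write e → e' when event e happened before event e'.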

[Figure: happened-before example]

In the figure, a → b and c → d. Also, b → c and d → f, which means that a → f. However, we cannot say that a → e or vice versa; we say that they are concurrent (a || e).

A Lamport logical clock is a monotonically increasing software counter, whose value need bear no particular relationship to any physical clock. Each process pi keeps its own logical clock, Li, which it uses to apply so-called Lamport timestamps to events.

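Lamport clocks work as follows:

LC1: Li is incremented before each event is issued at process pi: Li := Li + 1.

LC2: (a) When a process pi sends a message m, it piggybacks on m the value t = Li. (b) On receiving (m, t), a process pj computes Lj := max(Lj, t) and then applies LC1 before timestamping the event receive(m).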

An example is shown below:

[Figure: Lamport timestamps example]
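A minimal sketch of a Lamport clock, assuming the usual rules: increment the counter before each event, piggyback the counter on every message, and on receipt take the maximum of the local and received values before incrementing. The class name and the three-process run are hypothetical.

```python
class LamportClock:
    """Logical clock for one process (hypothetical helper, not a library API)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Timestamp a local event: increment before the event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp a send event; the returned value travels with the message."""
        return self.tick()

    def receive(self, t: int) -> int:
        """Merge the piggybacked timestamp t, then timestamp the receive event."""
        self.time = max(self.time, t)
        return self.tick()


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.tick()        # 1
b = p1.send()        # 2, message m1 carries 2
c = p2.receive(b)    # max(0, 2) + 1 = 3
d = p2.send()        # 4, message m2 carries 4
e = p3.tick()        # 1
f = p3.receive(d)    # max(1, 4) + 1 = 5
print(a, b, c, d, e, f)   # 1 2 3 4 1 5
```

The run also shows why the converse fails: L(e) = 1 < L(b) = 2, yet e and b are concurrent, since no chain of process order and messages links them.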

If e → e' then L(e) < L(e'), but the converse is not true: L(e) < L(e') does not imply e → e'. Vector clocks address this problem. "A vector clock for a system of N processes is an array of N integers." Vector clocks are updated as follows:

VC1: Initially, Vi[j] = 0 for i, j = 1, 2, ..., N

VC2: Just before pi timestamps an event, it sets Vi[i]:=Vi[i]+1.

VC3: pi includes the value t = Vi in every message it sends.

VC4: When pi receives a timestamp t in a message, it sets Vi[j]:=max(Vi[j], t[j]), for j = 1, 2, ..., N. Taking the componentwise maximum of two vector timestamps in this way is known as a merge operation.

An example is shown below:

[Figure: vector timestamps example]
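Rules VC1 to VC4 can be sketched as a small class. This is a minimal illustration, assuming 0-based process indices (the text counts from 1) and treating a receive as an event, so VC2 is applied after the merge; the class name and the three-process run are hypothetical.

```python
from typing import List, Tuple


class VectorClock:
    """Vector clock for process i of n (hypothetical helper; indices are 0-based)."""

    def __init__(self, n: int, i: int) -> None:
        self.v: List[int] = [0] * n     # VC1: all entries start at zero
        self.i = i

    def tick(self) -> Tuple[int, ...]:
        """VC2: increment own entry just before timestamping an event."""
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self) -> Tuple[int, ...]:
        """VC3: timestamp the send event; the copy returned travels with the message."""
        return self.tick()

    def receive(self, t: Tuple[int, ...]) -> Tuple[int, ...]:
        """VC4: merge (componentwise max) with t, then timestamp the receive event."""
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, t)]
        return self.tick()


p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
a = p1.tick()        # (1, 0, 0)
b = p1.send()        # (2, 0, 0), sent to p2
c = p2.receive(b)    # (2, 1, 0)
d = p2.send()        # (2, 2, 0), sent to p3
e = p3.tick()        # (0, 0, 1)
f = p3.receive(d)    # (2, 2, 2)
print(a, b, c, d, e, f)
```

Returning a tuple rather than the internal list gives each message its own snapshot, so later events at the sender cannot alter a timestamp already sent.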

Vector timestamps are compared as follows:

V=V' iff V[j] = V'[j] for j = 1, 2, ..., N

V <= V' iff V[j] <=V'[j] for j = 1, 2, ..., N

V < V' iff V <= V' and V != V'

If e → e' then V(e) < V(e'), and if V(e) < V(e') then e → e'.
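The comparison rules translate directly into code. A minimal sketch, assuming both timestamps come from the same system of N processes (equal lengths); the function names and the timestamps are hypothetical.

```python
from typing import Sequence


def vc_leq(v: Sequence[int], w: Sequence[int]) -> bool:
    """V <= V' iff every component of V is <= the matching component of V'."""
    return all(a <= b for a, b in zip(v, w))


def vc_less(v: Sequence[int], w: Sequence[int]) -> bool:
    """V < V' iff V <= V' and V != V'."""
    return vc_leq(v, w) and tuple(v) != tuple(w)


def concurrent(v: Sequence[int], w: Sequence[int]) -> bool:
    """Neither timestamp is <= the other, so neither event happened before the other."""
    return not vc_leq(v, w) and not vc_leq(w, v)


a, e, f = (1, 0, 0), (0, 0, 1), (2, 2, 2)
print(vc_less(a, f))      # True: a happened before f
print(concurrent(a, e))   # True: a || e
```

Unlike Lamport timestamps, a failed comparison here is informative: if neither V(e) < V(e') nor V(e') < V(e), the events are concurrent.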


Global States

It is often desirable to determine whether a particular property is true of a distributed system as it executes. We'd like to use logical time to construct a global view of the system state and determine whether a particular property is true. A few examples are as follows:

In general, this problem is referred to as Global Predicate Evaluation. "A global state predicate is a function that maps from the set of global states of processes in the system ρ to {True, False}."

Cuts

Because physical time cannot be perfectly synchronized in a distributed system, it is not possible to gather the global state of the system at a particular time. Cuts provide the ability to "assemble a meaningful global state from local states recorded at different times".

Definitions:

[Figure: cuts]
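A minimal sketch, assuming the standard definitions: a cut keeps a prefix of each process's history, and it is consistent if, for every event it contains, it also contains every event that happened before it. For prefix cuts this reduces to one check: no message may be received inside the cut but sent outside it. The representation (prefix lengths and 1-based event indices) and the names are hypothetical.

```python
from typing import Dict, List, Tuple

# An event is (process, index), with indices counted from 1 along each process.
Event = Tuple[str, int]


def is_consistent_cut(cut: Dict[str, int], messages: List[Tuple[Event, Event]]) -> bool:
    """cut maps each process to how many of its events the cut includes;
    messages lists (send event, receive event) pairs."""
    for (sender, send_idx), (receiver, recv_idx) in messages:
        received_inside = recv_idx <= cut[receiver]
        sent_inside = send_idx <= cut[sender]
        if received_inside and not sent_inside:
            return False   # an effect is in the cut but its cause is not
    return True


# p1's 2nd event sends a message that is received as p2's 1st event.
msgs = [(("p1", 2), ("p2", 1))]
print(is_consistent_cut({"p1": 1, "p2": 1}, msgs))  # False
print(is_consistent_cut({"p1": 2, "p2": 1}, msgs))  # True
print(is_consistent_cut({"p1": 1, "p2": 0}, msgs))  # True: message still in transit
```

The opposite situation, a message sent inside the cut but received outside it, is allowed: it simply represents a message still in transit when the states were recorded.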

Distributed Debugging

To further examine how you might produce consistent cuts, we'll use the distributed debugging example. Recall that we have several processes, each with a variable xi. "The safety condition required in this example is |xi-xj| <= δ (i, j = 1, 2, ..., N)."

The algorithm we'll discuss is a centralized algorithm that determines post hoc whether the safety condition was ever violated. The processes in the system, p1, p2, ..., pN, send their states to a passive monitoring process, p0. p0 is not part of the system. Based on the states collected, p0 can evaluate the safety condition.

Collecting the state: The processes send their initial state to the monitoring process and send updates whenever relevant state changes, in this case the variable xi. The processes need only send the value of xi and a vector timestamp. The monitoring process maintains an ordered queue (ordered by vector timestamp) for each process, in which it stores the state messages. It can then construct consistent global states, which it uses to evaluate the safety condition.

Let S = (s1, s2, ..., sN) be a global state drawn from the state messages that the monitor process has received. Let V(si) be the vector timestamp of the state si received from pi. Then it can be shown that S is a consistent global state if and only if:

V(si)[i] >= V(sj)[i] for i, j = 1, 2, ..., N

[Figure: consistent global states]
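The consistency condition and the safety check can be combined into a brute-force monitor sketch. It assumes p0 keeps, for each process, the list of (xi, vector timestamp) messages it received, and it enumerates every combination of one state per process. Process indices are 0-based in the code, whereas the text uses 1..N. Enumerating all combinations is exponential in N, so this only illustrates the test and is not meant as an efficient monitor; the names and data are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, Tuple[int, ...]]   # (value of x_i, vector timestamp V(s_i))


def is_consistent(states: Sequence[State]) -> bool:
    """S = (s_1, ..., s_N) is consistent iff V(s_i)[i] >= V(s_j)[i] for all i, j."""
    n = len(states)
    return all(states[i][1][i] >= states[j][1][i] for i in range(n) for j in range(n))


def find_violation(queues: List[List[State]], delta: float) -> Optional[Tuple[State, ...]]:
    """Return a consistent global state in which |x_i - x_j| > delta for some
    i, j, or None if every consistent state satisfies the safety condition."""
    for combo in product(*queues):
        if not is_consistent(combo):
            continue
        xs = [x for x, _ in combo]
        if max(xs) - min(xs) > delta:   # some pair differs by more than delta
            return combo
    return None


queues = [
    [(10, (1, 0)), (20, (2, 0))],   # states reported by p1
    [(10, (0, 1)), (25, (2, 2))],   # states reported by p2; the second follows
                                    # a message sent from p1's second state
]
print(find_violation(queues, delta=8))    # ((20, (2, 0)), (10, (0, 1)))
print(find_violation(queues, delta=12))   # None
```

The delta = 12 run shows why consistency matters: pairing p1's first state with p2's second would give a difference of 15, but p2's second state already reflects p1's second event, so that pairing never existed and is skipped.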


Sami Rollins

Date: 2008-01-15