Introduction


Overview

Name some of your favorite applications.

Can you name a few non-distributed applications that you use on a regular basis?

According to Coulouris et al., "a distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages." Most of your favorite applications leverage a distributed system in some way. Peer-to-peer file sharing is a good example: peers pass messages to other peers in order to locate and download desired content.

Characteristics of distributed systems:

Generally, the components of a distributed system are fairly loosely coupled. In contrast, Prof. Pacheco's class will largely focus on parallel computing in environments where components are tightly coupled. Tightly coupled components likely share a fast communication link, share a clock, are under a single administrative domain, and are homogeneous.

What makes distributed systems complex?


Case Study: DNS

DNS stands for Domain Name System, and it is a great example of a distributed system. It maps hostnames such as www.cs.usfca.edu to IP addresses such as 138.202.170.2. Every message sent across the Internet must carry, in its IP header, the IP addresses of both its destination and its source.

Clearly, you'd hate to have to remember that my web page was http://138.202.170.2/~srollins/ (since I am sure that all of you frequently surf my web page). Fortunately, you can use the more friendly URL http://www.cs.usfca.edu/~srollins/ and your browser will translate the www.cs.usfca.edu part into an IP address before sending the HTTP request to the CS web server. Your browser uses DNS for this purpose.

Your browser performs the following operations:

  1. The hostname is extracted from the URL.
  2. The browser sends a query to the DNS server. (How does it know where to find the DNS server?)
  3. The server replies with the IP address.
  4. The browser opens a TCP connection and sends the HTTP request.
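
The four steps above can be sketched in a few lines of Python. This is only a rough illustration, not how any real browser is written: the function names (extract_hostname, build_request, fetch) are made up for this example, and the request is the bare minimum a web server will accept. Note that the code never names a DNS server; it hands the hostname to the operating system's resolver, which is one answer to the question posed in step 2.

```python
import socket
from urllib.parse import urlsplit


def extract_hostname(url: str) -> str:
    """Step 1: pull the hostname out of a URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


def build_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the URL's path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch(url: str, port: int = 80) -> bytes:
    """Steps 2-4: resolve the name, open a TCP connection, send the request."""
    host = extract_hostname(url)
    # Steps 2-3: the OS resolver asks its configured DNS server for the address.
    ip_address = socket.gethostbyname(host)
    # Step 4: connect to the resolved address and send the HTTP request.
    with socket.create_connection((ip_address, port), timeout=5) as conn:
        conn.sendall(build_request(url))
        return conn.recv(4096)  # first chunk of the response only
```

Calling fetch("http://www.cs.usfca.edu/~srollins/") requires a network connection; extract_hostname and build_request can be tried offline.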

How do you suppose DNS is implemented? One option would be to set up a single, high-speed computer and store all of the mappings in a database on that computer. Your browser could then query that computer for the appropriate IP address.

Why wouldn't this work?

Instead, DNS is implemented as a distributed, hierarchical database. There are three classes of DNS servers: root servers, top-level domain (TLD) servers, and authoritative servers.

How it works:

  1. Your computer asks its local DNS server for a mapping.
  2. Your local DNS server (how is this found?!) contacts a root DNS server to ask for the mapping.
  3. The root DNS server responds with the IP address of the TLD DNS server for the relevant domain.
  4. Your local DNS server contacts the TLD server and the TLD server responds with the address of the authoritative server for the domain in question.
  5. Your local DNS server contacts the authoritative server and (finally!) gets the correct IP address.
  6. Your local DNS server returns the address to your computer.
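
To make the sequence of referrals concrete, here is a toy simulation of steps 2 through 5. Everything in it is an assumption made for illustration: the three dictionaries stand in for real servers, the server names (tld-edu, ns-usfca) are invented, and the code assumes the domain is always the last two labels of the hostname, which is not true in general. Only the hostname and IP address come from the example above.

```python
# Hypothetical, hard-coded tables standing in for real DNS servers.
ROOT = {"edu": "tld-edu"}                              # TLD -> TLD server
TLD_SERVERS = {"tld-edu": {"usfca.edu": "ns-usfca"}}   # domain -> authoritative server
AUTHORITATIVE = {"ns-usfca": {"www.cs.usfca.edu": "138.202.170.2"}}


def resolve_iteratively(hostname):
    """Play the local DNS server: return (ip, list of servers contacted)."""
    contacted = []
    labels = hostname.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])  # simplifying assumption: two-label domain

    # Steps 2-3: ask a root server, which refers us to the TLD server.
    contacted.append("root")
    tld_server = ROOT[tld]

    # Step 4: ask the TLD server, which refers us to the authoritative server.
    contacted.append(tld_server)
    authoritative_server = TLD_SERVERS[tld_server][domain]

    # Step 5: ask the authoritative server for the actual mapping.
    contacted.append(authoritative_server)
    ip = AUTHORITATIVE[authoritative_server][hostname]
    return ip, contacted
```

The point to notice is that the local DNS server does all of the asking; the root and TLD servers only hand back referrals.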

Other miscellaneous facts:

  1. It is possible to configure DNS to use recursive queries. In this case, the root server would contact the TLD server directly, and the TLD server would contact the authoritative server. The result would then propagate back through the chain.
  2. Each server may cache results and serve the cached results later instead of resubmitting the query.
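
The caching idea in the second fact can be sketched as a small wrapper around any lookup function. This is a minimal sketch under stated assumptions: the class name CachingResolver, the upstream_queries counter, and the fixed 60-second lifetime are all invented for this example. Real DNS records carry their own time-to-live values, which are not modeled here.

```python
import time


class CachingResolver:
    """Wrap a lookup function and remember its answers for ttl seconds."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        self._lookup = lookup      # function: hostname -> IP address
        self._ttl = ttl            # assumed fixed lifetime for every entry
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # hostname -> (ip, expiry time)
        self.upstream_queries = 0  # how often we had to ask upstream

    def resolve(self, hostname):
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]        # fresh cached answer, no query sent
        # Missing or expired: resubmit the query and remember the result.
        self.upstream_queries += 1
        ip = self._lookup(hostname)
        self._cache[hostname] = (ip, now + self._ttl)
        return ip
```

A cache like this trades freshness for speed: if a mapping changes, clients may keep seeing the old address until the cached entry expires.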

How does DNS address the complexities of distributed computing?


Case Study: CDNs

The goal of a content distribution (or content delivery) network, a CDN, is to reduce the latency of delivering content to an end user by caching it throughout the Internet. When you visit a web page like MySpace, you will notice that lots of images are displayed. To reduce the time it takes for the end user to load those images, companies like MySpace distribute their content using CDNs provided by companies like Akamai. By caching content close to the user, both the load on the origin server and the network latency are reduced. In addition, crowded or faulty network paths can be avoided.

The process works as follows:

  1. CDNs maintain lots of servers in data centers around the world.
  2. A content provider (typically one that delivers lots of multimedia content) contracts with a CDN.
  3. When the content provider creates a new piece of content, for example a new video, it sends that content to the CDN.
  4. The CDN replicates that content on all of its servers.
  5. Typically, the content provider serves basic content (e.g., HTML pages) from its origin servers. Links to other content (e.g., videos) list the CDN as the server.
  6. When the client does a DNS lookup on the CDN hostname, the CDN DNS server gets the IP address of the client and uses a proprietary algorithm to determine the IP address of the best server for that particular piece of content.
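
The selection algorithm in step 6 is proprietary, so the following is purely a guess at the general shape of the problem, not a description of what Akamai or any other CDN actually does. The region table, the server list, the load numbers, and the rule itself (least-loaded server in the client's region, otherwise least-loaded overall) are all hypothetical.

```python
import ipaddress

# Hypothetical data: which region an address block belongs to.
REGIONS = {
    ipaddress.ip_network("138.202.0.0/16"): "us-west",
    ipaddress.ip_network("192.0.2.0/24"): "europe",
}

# Hypothetical CDN servers: (ip, region, current load between 0 and 1).
SERVERS = [
    ("203.0.113.10", "us-west", 0.80),
    ("203.0.113.11", "us-west", 0.35),
    ("198.51.100.20", "europe", 0.10),
]


def region_of(client_ip):
    """Map a client address to a region, or None if it is unknown."""
    address = ipaddress.ip_address(client_ip)
    for network, region in REGIONS.items():
        if address in network:
            return region
    return None


def pick_server(client_ip):
    """Return the IP of the least-loaded server near the client."""
    region = region_of(client_ip)
    nearby = [s for s in SERVERS if s[1] == region]
    candidates = nearby or SERVERS  # fall back to every server
    return min(candidates, key=lambda s: s[2])[0]
```

With this data, a client at 138.202.170.2 is sent to 203.0.113.11: it is in the client's region and less loaded than 203.0.113.10, even though the European server is less loaded still.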

How does a CDN address the complexities of distributed computing?

A proprietary CDN like Akamai is not necessarily heterogeneous or open. As a result, security is much easier to address.


Sami Rollins
Wednesday, 07-Jan-2009 15:13:20 PST