Content Delivery Networks (CDNs) are based on a large-scale distributed
network of servers located close to the edges of the Internet for
efficient delivery of digital content, including various forms of
multimedia content. The main goal of a CDN architecture is to
minimize the network impact in the critical path of content delivery
and to overcome the server overload problem, which is a serious
threat for busy sites serving popular content.
For typical web documents (e.g., HTML pages and images) served via a CDN,
there is no need for active replication of the original content at the
edge servers. The CDN's edge servers act as caching servers: if
the requested content is not yet in the cache, the document is
retrieved from the original server, using the so-called pull
model. The performance penalty associated with the initial document
retrieval from the original server, namely the higher latency observed by
the client and the additional load experienced by the original server,
is not significant for small to medium size web documents.
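The pull model described above can be sketched in a few lines: content is fetched from the original server only on a cache miss, and subsequent requests are served from the edge. This is a minimal illustrative sketch; the names (EdgeCache, fetch_from_origin) are ours, not from the paper.

```python
# Minimal sketch of the pull model at a CDN edge server (illustrative
# names, not from the paper): fetch from the original server only on a
# cache miss; serve later requests from the local cache.

class EdgeCache:
    def __init__(self, fetch_from_origin):
        self._cache = {}                   # url -> cached document body
        self._fetch = fetch_from_origin    # callback to the original server

    def get(self, url):
        if url not in self._cache:
            # Cache miss: pull the document, paying the one-time penalty
            # (higher client latency, extra load on the original server).
            self._cache[url] = self._fetch(url)
        return self._cache[url]


origin_requests = []

def fetch_from_origin(url):
    origin_requests.append(url)            # record load on the original server
    return f"<body of {url}>"

edge = EdgeCache(fetch_from_origin)
first = edge.get("/index.html")            # miss: retrieved from the origin
second = edge.get("/index.html")           # hit: served from the edge cache
```

After the second request, the original server has still been contacted only once, which is why the pull model's penalty is acceptable for small documents.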
For large documents, software download packages, and media files, a
different operational mode is preferred: it is desirable to replicate
these files at the edge servers in advance, using the so-called push
model. For large files this is a challenging, resource-intensive
problem; media files in particular can require significant bandwidth and
download time due to their large sizes: a 20-minute media file encoded at
1 Mbit/s results in a file of 150 MBytes.
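The figure above follows directly from the encoding parameters; the arithmetic can be checked as:

```python
# Back-of-the-envelope check of the file size quoted in the text:
# a 20-minute media file encoded at 1 Mbit/s.
duration_s = 20 * 60                     # 1200 seconds of media
bitrate_bps = 1_000_000                  # 1 Mbit/s encoding rate
size_bits = duration_s * bitrate_bps     # total encoded bits
size_mbytes = size_bits / 8 / 1_000_000  # bits -> bytes -> MBytes
print(size_mbytes)                       # 150.0 MBytes
```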
While transferring a large file with individual point-to-point
connections from the original server can be a viable solution in the case
of a limited number of mirror servers (tens of servers), this method
does not scale when the content needs to be replicated across a CDN
with thousands of geographically distributed machines.
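To see why point-to-point transfers do not scale, note that the original server must push one full copy of the file per replica, so the total data (and hence the transfer time over a shared uplink) grows linearly with the number of servers. The sketch below illustrates this with hypothetical numbers (150-MByte file, 100 Mbit/s origin uplink); it is a lower bound that ignores per-connection overheads.

```python
# Illustrative scaling argument (hypothetical numbers, not from the
# paper): with individual point-to-point transfers, the original server
# must push n full copies of the file over its uplink.

def origin_transfer_time_s(file_mbytes, n_servers, uplink_mbit_s):
    """Lower bound on time for the origin to push one copy of the file
    to each of n_servers over a shared uplink."""
    total_mbit = file_mbytes * 8 * n_servers   # total data to send, in Mbit
    return total_mbit / uplink_mbit_s

# A 150-MByte file over a 100 Mbit/s origin uplink:
t_tens = origin_transfer_time_s(150, 30, 100)        # tens of mirrors
t_thousands = origin_transfer_time_s(150, 3000, 100)  # CDN scale
```

Going from 30 to 3000 servers multiplies the origin's transfer time by 100 (here, from minutes to about ten hours), which is the scaling wall that motivates a different replication strategy.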
In our work, we consider a geographically distributed network of
servers and the problem of content distribution across it. Our focus is
on distributing large files such as software packages or stored
streaming media files (also called on-demand streaming media). We
propose a novel algorithm, called FastReplica, for efficient and
reliable replication of large files. There are a few basic ideas
exploited in FastReplica.
Related Papers and Reports