APGen Documentation: System Architecture
Replication

A replication system is commonly needed when using APGen in web farms: generated content must be replicated to all web servers so that they all serve the same content.  This topic reviews some of the replication options for your system.  In addition, caching and content distribution are covered briefly.

Replication is defined as "the act or process of duplicating or reproducing something".  In this context, we mean file replication.  There are a number of systems that can be used to replicate files between web servers.  Most are "multi-master" replication systems, meaning changed files on any web server are replicated to all the other servers.

Caching and Content Distribution

Caching and content distribution are related to replication.  In fact, content distribution can be defined as replication to remote caches.  Caching is different from replication, in that caches are updated when the content passes through the cache - this is a "pull" model of duplicating content.  Both replication and content distribution operate on a "push" model, in that content is updated as soon as changes are detected on the origin server.  Replication and content distribution systems install agents on the origin servers, which monitor for changed content.  One important difference between content distribution and replication is that content distribution is normally single-master, whereas file replication is normally multi-master.
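The difference between the two models can be illustrated with a toy sketch. It is not tied to any vendor's product: the class names (Origin, Replica, PullCache) and the in-memory dictionaries are assumptions made for illustration only.

```python
"""Toy model of pull (caching) versus push (replication) duplication."""


class Origin:
    """Origin server: holds the authoritative content."""

    def __init__(self):
        self.files = {}
        self.replicas = []  # targets that receive pushed updates

    def write(self, path, content):
        self.files[path] = content
        # Push model: every change is forwarded as soon as it is made.
        for replica in self.replicas:
            replica.files[path] = content


class Replica:
    """Push target: receives content without ever being asked for it."""

    def __init__(self, origin):
        self.files = {}
        origin.replicas.append(self)


class PullCache:
    """Pull model: content is copied only when a request passes through."""

    def __init__(self, origin):
        self.origin = origin
        self.files = {}

    def get(self, path):
        if path not in self.files:  # cache miss: fetch from the origin
            self.files[path] = self.origin.files[path]
        return self.files[path]


origin = Origin()
replica = Replica(origin)
cache = PullCache(origin)

origin.write("/index.html", "v1")
replica_has_it = "/index.html" in replica.files  # pushed at write time
cache_has_it = "/index.html" in cache.files      # nobody has requested it yet
served = cache.get("/index.html")                # first request fills the cache
```

In this sketch the replica holds the page as soon as it is written, while the cache holds it only after the first request. A real cache would also expire or revalidate entries, which the sketch leaves out.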

Some vendors use the terms content delivery or content management in place of content distribution, but the meaning is similar.  Content management may imply more control, or nicer management tools, than content distribution, but the end result is the same: Content is replicated to distributed caches.

Most vendors package content distribution as a feature of their caching system, though in Akamai's case caching is a feature of their content distribution service.  Vendors of caching and content distribution systems include Akamai, Inktomi, Network Appliance, Cisco, Novell, epicRealm, and others.

APGen complements all caching and content distribution systems.  All of the major caching and content distribution systems cache static files to improve end-user performance, and reduce load on the origin servers.  In practice, most web pages are not static, so caching can only be used on image files, multimedia files, and the occasional static text file.  When APGen is used to generate static web pages, much more of the web site content becomes cacheable.  This gives companies a significantly better return on their caching investment.

Replication Systems

This is a brief summary of some of the replication systems that are commonly used on Windows NT and Windows 2000.  There are a number of other replication systems not discussed here.  All replication systems work well with APGen - APGen only requires simple file replication.

Windows 2000 DFS Replication

Windows 2000 DFS Replication is probably the most accessible replication system, as long as all web servers run Windows 2000.  To set up replication, add a DFS share using the Distributed File System MMC snap-in.  Replicas can be added on every web server in the farm.  These replicas are network shares that are kept synchronized with one another.  Remember to enable automatic replication for each replica.  Once this is done, all content on the shares will be replicated to all replicas.

To write APG script output to a DFS share, set the APGen.OutputDir property or the Output.Dir property as follows:

oAPGen.OutputDir = "\\domain\dfs_share\subdir\" ' DFS share

The advantages and disadvantages of DFS replication are:

Advantages:
  1. Cheap: It comes with Windows 2000.
  2. It is easy to set up and administer.
  3. Automatic multi-master synchronized replication.
  4. Efficient for incremental updates.
Disadvantages:
  1. There is no way to programmatically initiate a replication.
  2. Configuration options are limited.
  3. Throughput is not as good as other options (e.g. robocopy or xcopy).
  4. Does not support deployment over slow WAN links.
Requirements:
  1. Windows 2000 Server.
  2. Web servers must be in a Windows 2000 domain. In order to use DFS replication, a domain DFS share must be set up in which all the servers participate.
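
Because DFS replication cannot be started on demand, a publishing script may want to confirm that a replica has caught up before relying on it. The following is a minimal sketch of one way to do that, by comparing content hashes of two directory trees. It is not part of APGen or DFS; the function names are hypothetical, and the roots you pass in would be the local paths or UNC paths of your own replicas.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare_replicas(reference_root, replica_root):
    """Return (missing, extra, different) relative paths for one replica."""
    ref = tree_digests(reference_root)
    rep = tree_digests(replica_root)
    missing = sorted(set(ref) - set(rep))   # in the reference only
    extra = sorted(set(rep) - set(ref))     # in the replica only
    different = sorted(p for p in set(ref) & set(rep) if ref[p] != rep[p])
    return missing, extra, different
```

Three empty lists mean the replica matches the reference tree. Reading every file is slow on large sites, so a production check might compare sizes and timestamps first and hash only the files that differ.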

Robocopy

Robocopy (short for "Robust File Copy") is part of the NT 4.0 Resource Kit.  It is a command line tool rather than a service, so replication must be initiated by running it, either on a schedule or from another program.  For more information, see the NT 4.0 Resource Kit, or Microsoft Knowledge Base Article Q160513.

The advantages and disadvantages of Robocopy are:

Advantages:
  1. Comes with the NT 4.0 Resource Kit (inexpensive).
  2. It is efficient; it only copies changed files and folders.
  3. Deletes destination files and directories that no longer exist in the source.
  4. It is robust; it can automatically recover from failed copies.
  5. Outputs a log of all file and folder transactions that can be piped to a file.
Disadvantages:
  1. Does not provide automatic replication; it must be scheduled or executed by another program.
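
Since Robocopy has to be started by something else, one option is to call it from the script that publishes the content. The sketch below shows one way this might look. The paths and server name are hypothetical, and the switches (/MIR, /R, /W, /LOG) and the exit code convention are taken from current Robocopy releases, so check them with robocopy /? against the version you have installed.

```python
import subprocess


def build_robocopy_command(source, destination, log_file=None,
                           retries=3, wait_seconds=10):
    """Build the argument list for a mirror-style Robocopy run."""
    cmd = [
        "robocopy", source, destination,
        "/MIR",                # mirror the tree, including deletions
        f"/R:{retries}",       # number of retries for a failed copy
        f"/W:{wait_seconds}",  # seconds to wait between retries
    ]
    if log_file:
        cmd.append(f"/LOG:{log_file}")  # write the transaction log to a file
    return cmd


def replicate(source, destination, **options):
    """Run Robocopy; exit codes below 8 are assumed to mean no copy failed."""
    result = subprocess.run(build_robocopy_command(source, destination, **options))
    return result.returncode < 8


# Hypothetical paths: a local output directory mirrored to one web server.
command = build_robocopy_command(
    r"D:\apgen\output", r"\\web02\wwwroot", log_file=r"D:\logs\replicate.log"
)
```

Calling replicate once per target server, after content generation finishes, would push the generated files to the whole farm. Keeping the command construction in its own function makes it easy to log or inspect the exact command line before anything is copied.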

Site Server 3.0 Content Deployment Service (SSCD)

The Site Server 3.0 Content Deployment Service ships with Microsoft Site Server 3.0 and Microsoft Site Server Commerce Edition 3.0.  The official acronym is SSCD, but CDS (Content Deployment Service) and CRS (Content Replication Service, the name used in Site Server 2.0) are also used.  If you are using Site Server or Site Server Commerce Edition on your site, you already have SSCD.  SSCD provides flexible and programmable replication.  For more information, see the Microsoft Site Server 3.0 web site, or see the Site Server documentation.

The advantages and disadvantages of SSCD are:

Advantages:
  1. Flexible and programmable deployment and administration architecture. Using the SSCD COM object model you can manage replication projects and routes. SSCD can log events to a database where you can easily monitor project activity and server load for your entire site.
  2. Supports deployment over slow or unreliable WAN connections.
  3. Supports transactional replication. You can replicate content without applying it or making it available until a specified time. You can also roll back to a previous deployment.
  4. You can replicate files automatically (on file change or on a schedule), manually, or programmatically on a project or file level.
Disadvantages:
  1. The cost of purchasing a Site Server 3.0 license for each server.
  2. When a failure occurs during SSCD's automatic replication (for example, a dropped connection), the automatic replication is not resumed. It must be explicitly resumed administratively or programmatically. SSCD automatic replication therefore requires more monitoring than DFS replication does.
Requirements:
  1. Site Server 3.0 or Site Server Commerce Edition 3.0.

Microsoft Application Center 2000

Microsoft Application Center 2000 (AppCenter) provides distributed application deployment and management.  Replication is one feature of AppCenter - AppCenter automatically keeps application content and configuration settings consistent across all computers in the cluster.  If you are using AppCenter for application management, you can also use it for your replication needs.  For more information, see Microsoft's Application Center web site.