Dolly — A program to clone disks / partitions

Version 0.57
8 May 2003
Felix Rauch <rauch@inf.ethz.ch>

This document describes the program "dolly", its purpose and the format of the required config-file.

Purpose

Dolly is used to clone the installation of one machine to (possibly many) other machines. It can distribute image-files (even gnu-zipped), partitions or whole hard disk drives to other partitions or hard disk drives. As it forms a "virtual TCP ring" to distribute data, it works best with fast switched networks (we were able to clone a 2 GB Windows NT partition to 15 machines in our cluster over Gigabit Ethernet in less than 4 minutes).

As dolly clones whole partitions block-wise, it works for most filesystems. We have used it to clone partitions of the following types: Linux, Windows NT, Oberon, Solaris (most of our machines have multi-boot setups). To do the cloning, we have a small (additional) Linux installation on all of our machines or use a small one-floppy-disk Linux (e.g. muLinux). On newer machines we use PXE to boot a small system in a RAM disk. From that system we then clone the hard disks in the machines.

How it works

Setting up or upgrading a cluster of PCs typically leads to the problem that many machines need the exact same files. There are different approaches to distribute the setup of one "master" machine to all the other machines in the cluster. Our approach is not sophisticated, but simple and fast (at least for fast switched networks). We send the data around in a "virtual TCP ring" from the server to all the clients, which store the received data on their local disks.

One machine is the master and distributes the data to the others. The master can be a machine of the cluster or some other machine (in the current version of dolly it should be the same architecture though). It stores the image of the partition or disk to be cloned or has the partition on a local disk. The server should be on a fast switched network (as all the other machines too) for fast cloning.

All other machines are clients. They receive the data from the ring, store it to the local disk and send it to the next machine in the ring.

The cloning process is depicted in the following two figures. Usually there are more than two clients, but you get the idea:

      +--------+  +----------+ +----------+
      | Master |  | Client 1 | | Client 2 |
      +----+---+  +---|------+ +----+-----+
            \         |            /
             \    +---+----+      /
              +---+ Switch |-----+
                  +--------+

Above: Cloning process, physical network


     +--------+  Data   +----------+  Data  +----------+
     | Master |-------->| Client 1 |------->| Client 2 |
     +--------+         +----------+        +----------+
         ^                   |                   |
         | Data              | Data              | Data
         |                   V                   V
      +------+            +------+            +------+
      | Disk |            | Disk |            | Disk |
      +------+            +------+            +------+

Above: Cloning process, virtual network with TCP connections
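
The store-and-forward step that each client performs can be sketched in a few lines of Python. This is only an illustration of the idea, not Dolly's actual code: the function name relay, the chunk size and the in-memory buffers that stand in for TCP connections and disks are all assumptions made for this sketch. The two relay calls also run one after the other here, whereas in a real ring every machine forwards data while it is still receiving.

```python
import io

CHUNK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def relay(upstream, local_disk, downstream=None, chunk_size=CHUNK_SIZE):
    """Copy everything from upstream to local_disk and, if given, to downstream.

    Returns the number of bytes handled.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:                    # an empty read means the sender is done
            break
        local_disk.write(chunk)          # store the data locally
        if downstream is not None:       # the last client has no successor
            downstream.write(chunk)      # pass the same data along the ring
        total += len(chunk)
    return total


# Simulate master -> client 1 -> client 2 with in-memory buffers.
image = bytes(range(256)) * 40           # 10240 bytes of stand-in partition data
link_1_to_2 = io.BytesIO()               # stands in for the TCP connection
disk_1, disk_2 = io.BytesIO(), io.BytesIO()

sent_1 = relay(io.BytesIO(image), disk_1, link_1_to_2)
link_1_to_2.seek(0)                      # client 2 reads what client 1 forwarded
sent_2 = relay(link_1_to_2, disk_2)
```

After both calls, disk_1 and disk_2 hold the same bytes as the master's image. Each machine sends the data only once, which is why the master does not become the bottleneck.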

We chose this method instead of a multicast scheme because it is simple to implement, doesn't require writing a reliable multicast protocol and works quite well with existing technologies. One could also use the master as an NFS server and copy the data to each client, but this puts quite a high load on the server and makes it the bottleneck. Furthermore, it would not be possible to directly clone partitions from one machine to some others without any filesystem in the partition.

Different cloning possibilities

There are different possibilities to clone your master machine:

WARNING: You can NOT clone an OS which is currently in use. That's why we have a small second Linux installation on all of our machines so that we can boot this to clone our regular Linux partition.

Changes since version 0.2

We applied some changes to Dolly since version 0.2. Most of them are not very important.

Change in version 0.57

Besides some bug fixes and smaller improvements, it's now possible to split an image into multiple files for archival and send the multiple-file image to the clients. This makes it possible to store arbitrarily large partitions on file systems with a file size limit. For details and examples, see the section about the configuration file below (parameters infile and outfile).

Configuration file

You need a configuration file for the cloning process. Its format is strict, but easy. It contains the following entries (note that the order of the entries is fixed):
(The text after "Syntax:" explains the syntax of the entry, the lines following "EG:" are example lines)
  1. The file/partition you want to clone, preceded by the keyword "infile", or "compressed infile" in case of a compressed image. This file or partition needs to be available on the master only. Dolly will warn you if you try to use a compressed infile which does not end with ".gz". The compressed keyword is important so that the master can inform the clients when they have to use gunzip before writing a file. The optional keyword "split" after the filename instructs Dolly to read all files with the given name and an appended number, separated by an underscore.
    Syntax: [compressed] infile <input file or device> [split]
    EG: infile /dev/sda10
    -> Will just send the partition /dev/sda10 to all clients.
    EG: compressed infile /images/cloneimages/sda10_WinNTRes.gz
    -> Will send the given file compressed to all the clients, instructing them to uncompress the image before writing it.
    EG: infile /images/cloneimages/sda split
    -> Will send all files of the form /images/cloneimages/sda_<number> in order to the clients.
    EG: compressed infile /images/cloneimages/sda.gz split
    -> Will send all files of the form /images/cloneimages/sda.gz_<number> in order to the clients, instructing them to decompress the incoming stream before writing it.

  2. The file or partition you want to write (usually it's a partition, but you can also write to a file) after the keyword "outfile". This file needs to be available on the clients only. The optional keyword "compressed" instructs the server to compress the data before sending it, so the client will store the data compressed. The optional keyword "split" after the filename, followed by a number and a multiplier, instructs the client to write the data in chunks of no more than the given size. This is useful if the file system on your client does not allow files greater than a certain size. The files will be stored with the given name and an appended _<number>.
    Syntax: [compressed] outfile <output file or device> [split n(k|M|G|T)]
    EG: outfile /dev/sda10
    -> Will store the incoming data stream to the partition sda10.
    EG: compressed outfile /images/cloneimages/sda10_SuSE81.gz
    -> Will store the compressed data stream in the given file.
    EG: compressed outfile /images/cloneimages/sda_all.gz split 2G
    -> Will store the incoming compressed data stream in the directory /images/cloneimages/ in files sda_all.gz_0, sda_all.gz_1 and so on.

    Instead of the first two entries ("infile" and "outfile") it is also possible to use the single line "dummy [MB]", where MB is the number of Megabytes to transfer in dummy mode. If <MB> is set to 0, then the clients will just terminate. This is useful when benchmarking with different options, so the clients can run all the time. To finally terminate them on all clients, just set dummy to 0.
    NOTE: It is probably better to use the newer "-t" switch on the server to specify the number of seconds the benchmarks should run. In that case you can leave the <MB> blank.
    Syntax: dummy [MB]
    EG: dummy 128

    The optional keyword "segsize" is mostly used to benchmark switches. It specifies the maximal size of TCP segments during the network transfer. Usually you don't need to specify this option at all.
    Syntax: segsize TCP_MAXSEG_size
    EG: segsize 128

    With the optional keyword "add" it is possible to add more interfaces to use. The network traffic is then evenly distributed across the interfaces. This option is useful if you have for example two fast ethernet interfaces in your machines: One for administrative purposes and one for your main application on the cluster. This option is not so useful if you have multiple interfaces with different bandwidths. In this case just use the fastest available.
    You have to specify the number of additional interfaces and the suffixes of those interfaces. For example, in a cluster where the machines are named slave0..slave15 on their default interfaces and all the machines have a second interface named slave0-fast..slave15-fast, you should use the line specified below (EG).
    Syntax: add nr:suffix{:suffix}
    EG: add 1:-fast

    The optional keyword "fanout" was mostly used during performance tests of different network topologies. You will rarely need it in practice. Fanout specifies the number of outlinks from the server and the following machines (except the leaves). A fanout of 1 is a linear list (the default behaviour of Dolly and usually the fastest), 2 is a binary tree, 3 is a ternary tree, etc. Dolly automatically connects all the specified clients with the desired topology.
    Syntax: fanout <fanout>
    EG: fanout 1

    The optional keyword "hyphennormal" instructs Dolly to treat the '-' character in hostnames as any other character. By default the hyphen is used to separate the base hostnames from the names of the different interfaces (e.g. "node12-giga"). You might use this parameter if your hostnames include a hyphen (e.g. "node-12").
    Syntax: hyphennormal
    EG: hyphennormal

  3. After the keyword "server" follows the hostname of the server (or master). This is required for the last machine in the ring to be able to send the end-acknowledge back to the server.
    Syntax: server <master machine>
    EG: server cluster-master

  4. This entry has the keyword "firstclient" followed by the hostname of the first client in the ring. You should use the hostname of the machine here, not the name of the interface where you want to connect.
    Syntax: firstclient <name of first machine>
    EG: firstclient cluster-1

  5. This entry has the keyword "lastclient" followed by the hostname of the last client in the ring. You should use the hostname of the machine here, not the name of the interface where you want to connect.
    Syntax: lastclient <name of last machine>
    EG: lastclient cluster-9

  6. This entry specifies how many clients are in the ring. The keyword is "clients" followed by the actual number of clients. This number does not include the master.
    Syntax: clients <number of clients>
    EG: clients 9

  7. The following lines contain the interface-names of the client machines. The number of machines must match the above number of clients (see 6.). You should use the name of the interface on which the machines will receive the data.
    Syntax: <name of client 1>
    <name of client 2>
    [...]
    <name of client n>

    EG: cluster-1-giga
    cluster-2-giga
    [...]
    cluster-9-giga

  8. The last entry in the config file consists of the keyword "endconfig" and marks the end of the configuration file.
    Syntax: endconfig
    EG: endconfig

Dolly options

Dolly has a few options which are explained here:

Starting the process

To start the cloning, you need to start dolly on each machine. It is recommended to start it with the "-v" (verbose) option. The order in which you start the programs on the master and the clients doesn't matter. You must give the "-s" (server) option on exactly one machine (the master).

When the machines have found each other and the ring is completed, the cloning starts. Dolly will print some progress information every 10 MBytes.

Example

In this example we assume a cluster of 16 machines, named node0..node15. We want to clone the partition sda5 from node0 to all other nodes. The configuration file (let's name it dollytab.cfg) should then look as follows:
  infile /dev/sda5
  outfile /dev/sda5
  server node0
  firstclient node1
  lastclient node15
  clients 15
  node1
  node2
  node3
  node4
  node5
  node6
  node7
  node8
  node9
  node10
  node11
  node12
  node13
  node14
  node15
  endconfig
Next, we start Dolly on all the clients. No options are required for the clients (but you might want to add the "-v" option for verbose progress reports). Finally, Dolly is started on the server as follows:
  dolly -v -s -f dollytab.cfg
That's all.
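
For larger clusters, typing the client list by hand gets tedious. The following Python sketch generates the configuration shown in this example. The helper name build_config is hypothetical. The sketch also assumes that the first and last entries of the client list can serve as the "firstclient" and "lastclient" hostnames. That holds here because the interface names equal the hostnames, but it would not hold for names like cluster-1-giga.

```python
def build_config(infile, outfile, server, clients):
    """Return the text of a Dolly config file with entries in the required order."""
    if not clients:
        raise ValueError("at least one client is required")
    lines = [
        f"infile {infile}",
        f"outfile {outfile}",
        f"server {server}",
        f"firstclient {clients[0]}",     # assumes interface name == hostname
        f"lastclient {clients[-1]}",     # same assumption as above
        f"clients {len(clients)}",       # the master is not counted
        *clients,                        # one client name per line
        "endconfig",
    ]
    return "\n".join(lines) + "\n"


# node0 is the master, node1..node15 are the clients.
nodes = [f"node{i}" for i in range(1, 16)]
config_text = build_config("/dev/sda5", "/dev/sda5", "node0", nodes)
```

Writing config_text to dollytab.cfg gives the same file as listed above, without the leading indentation.
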
ICS [Laboratory for Computersystems], DINFK [Dept. of Computer Science], ETHZ [Swiss Institute of Technology], Patagonia Cluster Project.

Maintained by Felix Rauch.
Last changed: 8-may-2003