Patagonia - A Dual Use Cluster of PCs for Computation and Education
Computer clusters in major research universities are used for research
and education. While both kinds of installations can be called
clusters, the installations are far from looking identical.
The PCs of an education cluster are workstations that are
sparsely distributed across tables in a classroom and have screens,
keyboards and mice, while clusters of PCs for high performance
computing are densely packed rack mounted systems kept in a cooled
machine room and wired to one central operator console. Despite the
completely different look of the two kinds of clusters, the current
trends in technology mandate that they are built with nearly the same
components.
Typically the education
clusters are only used during the day and maybe in the evenings,
whereas the compute clusters are often idle. Complementary use
would therefore greatly improve the cost effectiveness of the installations.
As a consequence we initiated the Patagonia cluster project at ETH
Zürich to build a cluster that fits both the needs of education and
the needs of research. While education has priority during the day on
our cluster, research has priority during the night and with some
limitations during vacations.
While our ideas and the corresponding experimental study clearly
originate from a university environment, we would like to mention the
striking similarities to many modern corporate computing environments
using PCs. In those environments the computing needs are quite
similar. Most companies rely on a rapidly growing number of compute
intensive tasks in data mining, combinatorial optimization and process
simulations in addition to the typical personal computing needs of a
large number of employees at their desks.
Patagonia Talk:
acrobat,
compressed postscript,
html.
Patagonia CloneSys - A Tool to Install Multi-Boot Environments
Everybody knows the drudgery involved in manually setting up and
rolling out new PCs, updating existing PCs, and recovering failed PCs.
CloneSys makes a big dent in the time required to conduct these jobs.
It first creates an exact image of a PC's hard drive or disk partitions,
effectively taking a snapshot of all the files - hidden, visible,
and active - that make up the operating system, applications, and
configuration settings. The image can then be copied to any number of
PCs, thereby creating completely identical installations. Moreover, it
can be copied to many PCs simultaneously. This process is completely
operating system independent, which makes the system very easy and
fast. As the raw disk data is copied, no file system initialization is
needed; even the partitioning is implicitly done with a total clone.
To support multi-boot environments the installation of single
partitions is supported as well.
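
The raw-copy idea described above can be illustrated with a short Python sketch. It is not taken from the CloneSys scripts, which build on standard UNIX tools; the function name `clone_raw`, the chunk size and the checksum step are assumptions made for illustration. The function copies bytes from a source to a target without interpreting any file system, which is why the approach does not depend on the operating system stored in the image.

```python
import hashlib
import os
import tempfile


def clone_raw(source_path, target_path, chunk_size=1024 * 1024):
    """Copy source to target byte for byte.

    Returns (bytes_copied, sha256_hex). No file system is interpreted,
    so the content may belong to any operating system.
    Hypothetical helper, not part of CloneSys.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of source reached
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


if __name__ == "__main__":
    # Demonstration on ordinary files standing in for a partition.
    with tempfile.TemporaryDirectory() as workdir:
        master = os.path.join(workdir, "master.img")
        clone = os.path.join(workdir, "clone.img")
        with open(master, "wb") as f:
            f.write(os.urandom(4 * 1024 * 1024))
        copied, checksum = clone_raw(master, clone)
        print(f"copied {copied} bytes, sha256 {checksum}")
```

Pointing the same function at a device node instead of a regular file would image a whole partition or disk. That requires root privileges and overwrites the target, so it should only be tried on a machine set aside for the purpose.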
To keep the system even simpler than commercially available tools
such as Norton Ghost from Symantec, ImageCast from Innovative
Software Ltd. or DriveImagePro from PowerQuest,
we based it on freely available UNIX tools and Linux. For an initial
boot of an uninstalled machine we use
muLinux, which is a
minimalistic, but mostly complete, script-based Linux distribution that
fits on a single 1722k floppy disk. It includes many basic system functions,
such as Ethernet support, NFS, Samba, FTP and DHCP, which is all we
need. Furthermore, we install a small Linux distribution permanently on
the hard disk, which lets us update the cluster very fast by simply
booting this Linux on all machines and remotely executing our cloning
scripts.
Locking mechanisms on partitions make it possible to install special setups on a
part of the cluster while preventing those images from being overwritten by others.
Relevant Materials:
-
F. Rauch, Ch. Kurmann, T. Stricker:
Partition Repositories for Partition Cloning - OS Independent
Software Maintenance in Large Clusters of PCs.
Proceedings of the IEEE International Conference on Cluster Computing 2000, Chemnitz, Germany, Nov 28 - Dec 2, 2000.
Available formats:
abstract,
acrobat,
compressed postscript.
-
F. Rauch, Ch. Kurmann, T. Stricker:
Partition Cast - Modelling and Optimizing the Distribution of
Large Data Sets in PC Clusters.
Euro-Par 2000 -- Parallel Processing, Arndt Bode and Thomas Ludwig (Editors),
Springer, Lecture Notes in Computer Science 1900, ISBN 3-540-67956-1.
Presented as distinguished paper at: European Conference on Parallel Computing,
Euro-Par 2000, Munich, Germany, August 29 - September 1, 2000.
A similar paper appeared as technical report No. 343,
Department of Computer Science, ETH Zürich.
Available formats:
abstract,
acrobat,
compressed postscript.
Slides:
acrobat,
postscript,
compressed postscript.
-
F. Rauch, C. Kurmann, B. M. Müller-Lagunez, T. M. Stricker:
Patagonia - A Dual Use Cluster of PCs for Computation and Education.
Proc. of the second workshop on Cluster-Computing, 25./26. March 1999,
Karlsruhe, Germany.
Available formats:
abstract,
acrobat,
postscript,
compressed-postscript.
Talk:
acrobat,
postscript,
compressed-postscript.
Download
The Patagonia CloneSys is designed to provide easy system
installations on multi-boot and multi-purpose clusters of PCs. Our
aim was to keep it simple and to avoid developing a proprietary
system. Therefore it is based on standard UNIX tools and can easily be
ported to other flavours of UNIX.
The scripts and ideas are subject to
change and are provided as they are, without any support.
You can download our Cloning System Scripts
clonesys_v0.9.tar.gz (91 kB). They run on standard
UNIX systems, but are especially designed for Linux systems to install
multi-boot PC clusters. Read the Quick Reference
(quickinit.pdf) and the
CloneSys Documentation
(clonesys.pdf)
for further information.
Other tools and helpful documentation:
Please note that the materials listed here are helpful for our own
cluster maintenance. We provide them in the hope that they can be
helpful for others too, but we cannot give any support for them.
- CAT: Cluster Administration Tool. It
helps to gain an overview of the current state of a multi-use cluster
and shows the currently running operating system, machine load and
users with their corresponding idle times for each node. The client
part of the tool as well as the daemon for Linux were mostly written in
Tcl/Tk, while the daemon for Windows NT is in C.
Files: The tool including source CAT.tar.gz (234 kB), the documentation rspuler.pdf (95 kB, in German, includes
some screenshots), the English
abstract.
- Dolly. This is a program to clone harddisks or partitions
over a fast switched network. It does so by building a virtual TCP
ring between the machine with the disk/partition to be cloned and all
the machines where the disk/partition should be written. It works with
raw partitions as well as (possibly compressed) image files. We were
able to clone a 2 GB Windows NT partition using dolly to 15 machines
over Gigabit Ethernet in less than 4 minutes.
Files: The latest stable version of Dolly, including source and README,
is 0.57: dolly.0.57.tar.gz (21 kB).
A newer, but not yet as thoroughly tested, version is 0.58C: dolly.0.58C.tar.gz (25 kB). The
most notable feature of version 0.58 is that it now allows the use of
the standard input and standard output to read and write
files, i.e. you can now use tar to clone directory
trees. This version might break some third-party scripts, because
Dolly now prints all of its regular output to standard error instead
of standard output. Version 0.58C introduces a flag that tells Dolly
not to sync() before it exits, thereby reducing the runtime
for smaller files.
The current documentation is in HTML or ASCII. There is also a directory with other versions of Dolly.
A research paper on Dolly was presented at
Euro-Par 2000
(29.8.-1.9.2000 in Munich, Germany). See above (relevant materials) for more information.
Atsushi Manabe of KEK (Tsukuba, Japan) provides an alternative
implementation called Dolly+ with some
other features (e.g. multi-file transfers and a fail-safe mechanism to
bypass crashed nodes).
A new utility based on Dolly is nettee, a network 'tee' program maintained by David Mathog. It is a simpler, cleaned-up program based on Dolly's source code.
- Zip bootdisk which runs Linux in a ramdisk. This
documentation describes the steps required to generate a pair of
floppy and Zip disks that boots Linux completely from a
RAM disk. This is useful to boot a whole cluster of new (or empty)
machines to a small Linux system and install the full installation
over the network.
Available formats: HTML
Dolly together with the Zip bootdisk can be useful to install the
hard disks of all machines in a new cluster within a reasonable time.
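
The forwarding scheme described in the Dolly entry above can be sketched in a few lines of Python. This is a simplified in-memory simulation, not Dolly's implementation: the names `relay` and `clone_through_chain` are hypothetical, byte buffers stand in for TCP connections and disks, and the nodes run one after another here, whereas real nodes would stream concurrently. The point it illustrates is that every node writes each chunk to its own disk and passes the same chunk on to its successor, so the source only has to send the image once.

```python
import io


def relay(upstream, local_disk, downstream=None, chunk_size=64 * 1024):
    """Read chunks from upstream, store them locally and forward them.

    The last node in the chain has no successor (downstream is None).
    Returns the number of bytes handled. Hypothetical helper.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:  # upstream has delivered the whole image
            break
        local_disk.write(chunk)
        if downstream is not None:
            downstream.write(chunk)
        total += len(chunk)
    return total


def clone_through_chain(image, node_count, chunk_size=64 * 1024):
    """Push an image through a chain of nodes; return each node's disk content."""
    disks = [io.BytesIO() for _ in range(node_count)]
    upstream = io.BytesIO(image)  # the machine holding the master image
    for index, disk in enumerate(disks):
        is_last = index == node_count - 1
        link = None if is_last else io.BytesIO()  # stands in for a TCP link
        relay(upstream, disk, link, chunk_size)
        if link is not None:
            link.seek(0)  # the next node reads what this node forwarded
            upstream = link
    return [disk.getvalue() for disk in disks]


if __name__ == "__main__":
    image = bytes(range(256)) * 4096  # 1 MiB of test data
    copies = clone_through_chain(image, node_count=15)
    print(all(copy == image for copy in copies))
```

For scale, the measurement quoted above (a 2 GB partition cloned to 15 machines in less than 4 minutes) corresponds to at least 2·10^9 B / 240 s ≈ 8.3 MB/s arriving at each node, or about 125 MB/s written across the whole cluster, taking 1 GB as 10^9 bytes.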
Departement Informatik der ETH Zürich
May 2005
Christian Kurmann,
< kurmann@inf.ethz.ch >
Felix Rauch,
< rauch@inf.ethz.ch >