This document contains information on the operational details of the maitre-d; for information on setting up the schedule and its interface see the scheduling discussion.
Recall that dispatcher queries regarding remote server availability and launch restrictions are handled by the maitre_d. This daemon-mode alfred is typically launched on a centralized server machine by a system administrator or automatically when a system reboots. The recommended launch options for the daemon are:
alfred -maitre_d -v -log /usr/tmp/maitre_d.log
The log file will contain important error messages and brief job history information. The name of the log file is arbitrary; however, it should be in a directory that won't be cleared on reboot. Also, if several different users may restart the maitre-d, the log file needs write permissions that allow each of them to write to it.
There is typically a single maitre_d running on a single host somewhere on the network; this host is listed in the alfred.ini file so that dispatchers know where to look for it (connections are TCP sockets to a predefined port). If no network-wide maitre_d is defined or running, the individual dispatchers can do their own scheduling, with the loss of some functions such as global job prioritization (this is known as chaos mode). It is also possible to start several maitre-d processes, so that there is a primary daemon and one or more fallback maitre-d servers.
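The primary/fallback arrangement can be pictured with a short sketch. This is not Alfred's actual connection logic: the function name, the idea that candidates are tried strictly in order, and the use of an empty result to signal chaos mode are all assumptions made for illustration. Only the use of TCP sockets comes from the text above.

```python
import socket


def connect_first_available(candidates, timeout=2.0):
    """Try each (host, port) pair in order and return the first that accepts.

    `candidates` is a hypothetical ordered list: primary first, fallbacks after.
    Returns (address, socket) on success, or (None, None) if nothing answers,
    which a dispatcher could treat as the cue to schedule on its own.
    """
    for address in candidates:
        try:
            conn = socket.create_connection(address, timeout=timeout)
            return address, conn
        except OSError:
            # Host down, connection refused, or timed out: try the next one.
            continue
    return None, None
```

A caller would pass the host and port taken from its own configuration; the values are deliberately not shown here because the real port number and alfred.ini syntax are not given in this document.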
The scheduling information is organized as a hierarchy of groups which define Crews: groups of people who have access to groups of servers during blocks of time.
All of the remote service information is kept in the alfred master schedule file, which by default is installed into $RMANTREE/lib/alfred/alfred.schedule (if the environment variable RMANTREE isn't set, /usr/local/prman is used).
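As a small illustration of the default location rule, the sketch below computes the schedule path from an environment mapping. The function name `schedule_path` is hypothetical; the RMANTREE variable, the fallback directory, and the relative path are the ones stated above.

```python
import os
import posixpath

DEFAULT_RMANTREE = "/usr/local/prman"  # fallback named in the text above


def schedule_path(env=None):
    """Return the default master schedule location for the given environment."""
    env = os.environ if env is None else env
    # Use RMANTREE when it is set, otherwise fall back to the default tree.
    root = env.get("RMANTREE", DEFAULT_RMANTREE)
    return posixpath.join(root, "lib", "alfred", "alfred.schedule")
```

Passing the environment in as a mapping keeps the function easy to check without changing the real process environment.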
In general, it is a good idea to limit the number of people who can alter the master schedule. Since this schedule represents the global resource allocation scheme for all alfred dispatchers, it usually makes sense for it to be under the control of a project coordinator who is making all the other system-wide resource decisions. Ultimately, access to the schedule is controlled simply by file read/write permissions. Alfred recognizes when a user has write permission to the file, and enables the editing portions of the interface accordingly. Hence the access can be restricted to a single individual, a group, or everyone. If there is a centralized maitre_d running, the schedule file need only be readable by whoever starts the daemon. If there is no centralized maitre_d host specified (chaos mode), then the schedule needs to be readable by every user who might start an alfred dispatcher.
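An administrator can confirm what a given account is allowed to do with the schedule file before relying on it. The following is a minimal sketch using the standard `os.access` call; the function name is hypothetical, and this is not a description of how Alfred itself performs the check.

```python
import os


def schedule_access(path):
    """Report whether the current user can read and write the schedule file."""
    return {
        "readable": os.access(path, os.R_OK),  # needed by whoever starts the daemon
        "writable": os.access(path, os.W_OK),  # needed to edit the schedule
    }
```

Run it as each account in question: the daemon's owner needs `readable` to be true, and only the people who should edit the schedule need `writable`.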
The maitre-d periodically monitors the last modification time on the master schedule file, as returned by stat(2), and rebuilds its internal data when the file is updated. Note: if the maitre-d is reading the schedule file via NFS, then it may take a few moments for file updates to propagate to the maitre-d's host.
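The modification-time check can be sketched as follows. This is an illustrative polling helper, not the maitre-d's own code: the class name, the choice to report the first successful stat as a change, and the decision to ignore a temporarily missing file are all assumptions.

```python
import os


class ScheduleWatcher:
    """Detect schedule updates by comparing successive modification times."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = None  # nothing has been loaded yet

    def changed(self):
        """Return True when the file's mtime differs from the last one seen."""
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except OSError:
            # File missing or unreachable (for example, an NFS hiccup):
            # keep the data already in memory and try again later.
            return False
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

A daemon loop would call `changed()` at some interval and rebuild its internal tables whenever it returns True.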
Each time a dispatcher makes a check-out request for a server, the maitre-d performs a series of checks before assigning an appropriate server to the dispatcher. All of the checks must pass:
If a check fails, and all of the available servers have been checked or are in use, then the dispatcher's request is put onto a waiting queue. The dispatcher will look for other types of work to do and periodically check its queued request. The priority mechanism determines the actual order in which queued requests from multiple dispatchers are handled.
The maitre-d also performs a few other tasks, most notably generating current server status reports for the "watch servers" display. It also handles HTTP connections from users who want to use their web browser to see job updates. This includes handling wrangler mode requests from Alfred "superusers".
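The waiting queue for unsatisfied check-out requests can be pictured as a priority queue. The sketch below is only a model: the actual priority mechanism is not described in this document, so the rules used here (a larger number means higher priority, and equal priorities are served in arrival order) are assumptions, as are the class and method names.

```python
import heapq
import itertools


class WaitingQueue:
    """Hold queued dispatcher requests until a server becomes available."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier requests first

    def push(self, dispatcher, priority):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), dispatcher))

    def pop(self):
        """Return the next dispatcher to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```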
Pixar Animation Studios