What is Alfred?
- Alfred is a Scriptable Work Distribution System
- Provides Fully Integrated Network Rendering for
MTOR
- Automatically Handles Parallel Rendering using
Pixar's RenderMan™
- Easily Scriptable for Custom Applications
- User scripts specify what to launch,
and keywords which describe server requirements; the script
structure also describes a hierarchy of task dependencies.
- Alfred chooses where and when
to run the task, based on the best available remote servers,
task dependencies, and job priority.
- A fully integrated User Interface allows users to easily
track job progress, detect errors, browse output logs, reprioritize
and delete jobs.
The Basic Alfred 4.0 Architecture
Please see the Alfred 4.x Overview
for a brief description of the features and capabilities of the
Alfred 4.0 architecture.
Alfred 4.5 - Release Notes
New Features
- Throughput and Responsiveness -
Several significant improvements have been made to both the dispatcher
and maitre-d transaction handling. As a result, slot check-out
latency has been greatly reduced, and more state management
activity occurs asynchronously. At large sites, these speed-ups
should dramatically improve slot assignment throughput and reduce
or eliminate many time-out disconnects. Also, user interface response
times will be much better on loaded networks.
- DIRMAP - Per-server path mapping -
Support has been added to allow certain platform-specific pathnames
to be replaced with an appropriate network path on a per-server
basis. See below.
- Monitoring "live" task output -
Output generated by tasks is accumulated by Alfred, and it is typically
viewed by clicking on any specific task node and selecting "See
Output Log". There is now also a way to immediately see new output,
from all tasks, as it is received. There is a new checkbox in the
Preferences dialog which opens a scrolling output area just below
the scrolling launch log in the main Alfred window. This new log
window shows the live combined output from all running tasks, much
like following the output logs with the Unix "tail" command. This log
provides a handy way to detect that something odd or interesting has
happened.
- Super-slot checkouts -
Support has been added for checking out multiple slots on the
same physical host as a group. The new "-samehost 1"
option to Cmd and RemoteCmd triggers this
mode.
- Shared ping status -
Pings defined in the alfred.schedule are used to confirm
that a server slot is really ready to accept new work. Even though
a given slot may be available in the schedule, there are often external
factors which should be considered before really dispatching work to
the associated host. Alfred server pings perform these tests. There
are built-in pings, plus a mechanism for launching arbitrary
site-specified queries. Since multiple server slots can be defined
for each real host (usually one per CPU), a ping failure on one slot
will often fail on all of the other slots on that host too. The
cumulative effect of running the same failing test on many slots is
that the maitre-d becomes bogged down. The 4.2 release now automatically
causes a disqualifying ping from one slot to disqualify all slots on
the same host. This results in improved performance. However, there
are a few important cases in which a ping should only disqualify one
slot at a time; these pings can be indicated with a small bit of
additional ping syntax:
set alfServerPing(somekey) {/usr/local/bin/someping %s}
set alfServerPing(somekey) {+/usr/local/bin/someping %s}
In this example, the first ping affects all slots on the same
host (the new default behavior), while the second ping will be
retried on each slot. The '+' prefix on the ping executable
indicates the per-slot mode.
- Less stringent nameserver requirements -
The way in which Alfred finds its fully-qualified hostname has been
relaxed. Alfred now uses the default nameserver value, however misconfigured,
for the initial discovery phases. After the initial setup is complete,
a new setting called "localDomain" in alfred.ini comes into play. It
is used primarily to ensure that http-mode URLs and authentication
cookies are generated for the correct domain. If you specify a domain,
e.g. "pixar.com", then it will be used in all URLs. If you use an empty
string, then nameserver results will be used directly, which are often
already fully qualified at many sites. Use the string "?" to indicate
that alfred should guess at the domain name using some built-in
heuristics.
- Forcing inactive dispatchers to exit -
There is a new setting in alfred.ini which can be used to
force inactive dispatchers to exit. This can be useful when there
are a limited number of dispatcher licenses available, or to just
reduce the "clutter" of uninteresting maitre-d connections.
The parameter "timerDoneJobDispatchExit" specifies
the number of minutes that a dispatcher should remain inactive
before shutting itself down. The default value "0" (zero) indicates
that it should continue running even when inactive. The term
inactive means that the dispatcher has no connected UI and
that all of the jobs remaining on the local queue are done.
Note: in most circumstances the dispatcher will exit when the
user closes their UI; this new setting controls the behavior of
dispatchers that are running without a connected UI.
- Maitre-d start-up wait period -
Another new parameter in alfred.ini is
"timerMaitredInitialWait" which is used to
set the number of seconds that a restarted maitre-d should wait
before allocating new slots. This interval should be long enough
to allow all previously running dispatchers to register their
current slot usage with the new maitre-d. Registering current
usage ensures that the new maitre-d won't assign server slots to
new jobs when they are already in use by previously running jobs.
- Distinct displays for UI and Dispatcher environment variables -
The About Alfred menu item provides a button
which displays the current environment variable settings, as seen by
the user interface. There is a new button which fetches the environment
from the dispatcher, which can be on a remote system in some cases.
- Skip all tasks with errors -
A new menu entry on the job control menu allows you to mark all
tasks which have errors as done. This allows any tasks which
depend on these errored-out tasks to proceed as if they had
completed successfully. This can be useful when there are a
lot of trivial errors pending during test renders, etc.
- Tunable retry interval for problem slots -
There is a new "tainting" scheme which causes a slot to be marked as
problematic and become unscheduled for a short period when it
turns out to be overcommitted. The interval can be changed with
a new alfred.ini setting: alfConfig(timerProblemSlotReuseInterval)
- Process sweeping during job deletion -
A mechanism for quickly shutting down all running tasks has been
implemented. When a job is deleted, Alfred must shut down any remaining
running tasks. Since each launched process may require some time to
do its own clean-ups, the shut down process must wait before applying
its successively more severe shut down mechanisms. The new process
sweeping scheme applies each shut down pass to all running processes
in parallel, rather than the previous serial approach which could
take a long time if there were a lot of running tasks.
- Detecting a retry launch -
A new substitution pattern "%r" has been added for use
in the command arguments for applications launched by alfred, via Cmd
and RemoteCmd. Before execing the command, Alfred will substitute
a zero or a one in place of the %r depending on whether the launch is
the first attempt, or a retry of a previously failed or ejected command,
respectively.
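The "%r" substitution described in the last item above can be pictured with a short Python sketch. This is not Alfred's code: the function name expand_retry_flag and the "myrender" command line are hypothetical, and the sketch handles only %r, not Alfred's other substitution patterns.

```python
def expand_retry_flag(command: str, is_retry: bool) -> str:
    """Replace every %r with 0 (first attempt) or 1 (retry of a failed or ejected command)."""
    return command.replace("%r", "1" if is_retry else "0")


# Hypothetical command line, for illustration only.
cmd = "myrender -retry %r scene.rib"
first_launch = expand_retry_flag(cmd, is_retry=False)
relaunch = expand_retry_flag(cmd, is_retry=True)
```

A launched application could use the resulting 0 or 1 to decide, for example, whether to resume from partial output instead of starting over.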
Bug Fixes
- The job-parallel dispatching scheme was
incorrectly making use of one dispatching rule from the spill-over
scheme. This would sometimes result in one job on the local queue
acquiring more slots than the others, rather than the roughly equal
progress expected in parallel mode. Fixed.
- Alfserver failure under Windows 2000 -
When Alfserver was started as a service under
Windows 2000 and a userid other than "LocalSystem"
was applied to the server, it would fail to register properly with
the Service Control Manager during start-up. The SCM would then
kill the (running) alfserver after several minutes.
This has been fixed (4.1.1 patch).
- RemoteCmd set-up -
Several types of RemoteCmd initialization problems,
due to unresponsive servers etc., are now handled differently
to help avoid error conditions and to requeue remote tasks.
- Web interface now uses client-pull refreshes -
The HTTP interface provides several mechanisms for automatically
updating a browser view of an active job. Previously, the mechanism
was automatically chosen based on the browser-supplied "User-Agent"
value, and a server-push mechanism was used for Netscape
browsers, as per the Netscape developer recommendations. This
push mechanism appears to cause problems with some newer browsers,
and so the new default mechanism for all browsers is a client-pull
scheme in which the browser is directed to auto-refresh the page
periodically using a "Refresh" directive in the http headers. To
disable this new behavior and revert to server-push, add the following
to the alfred.ini file:
set alfConfig(httpDoServerPush) 1
- Empty ping definitions -
A maitre-d coredump caused by empty ping definitions in the
schedule file has been fixed. An empty definition now
correctly indicates that no ping is required for the given
service key. This is the same behavior as if no ping definition
were present, except in the case of the "default" ping. You
can now completely disable the default low-overhead test of
whether a host is responding on the network by using:
set alfServerPing(default) {}
This is not recommended in general since the default test
provides a very useful server sanity check.
- Illegal slot names -
Slot names containing "-" (hyphen) were incorrectly being
rejected as illegal by the syntax checker used by the external
editor interface from the Advanced schedule interface. The
following characters are now considered valid:
a-z A-Z 0-9 _ - .
- Incomplete Watch-Servers windows -
Alfred scripts in which Task title strings contained certain "problematic"
characters were sometimes causing protocol problems during watch-servers
updates. These errors could cause the status display to be incomplete or
blank. This has been fixed.
- Multiple dispatchers -
The environment variable TMPDIR (or TEMP for Windows) was incorrectly being
used to establish the location of temporary files used by Alfred
to detect previously launched instances of the application. This
could sometimes result in multiple dispatchers or UIs being started
when Alfred was launched from environments which had different values
for this variable. These small files are now always placed in
/tmp (or C:/TEMP on Windows) so that they can be found
reliably by subsequent invocations.
- Corrected lookup for RemoteCmdFallback -
The documented alfred.ini setting "alfConfig(RemoteCmdFallbackCommand)"
was being ignored. The value "RemoteCmdFallback" was being queried
instead, and since it was not defined in alfred.ini, a default value
was always used instead. This has been fixed, and
"RemoteCmdFallbackCommand" will be read correctly now. For
compatibility with sites that used "RemoteCmdFallback" as a workaround,
that value will be accepted as well, and takes precedence.
- Task-Tree view of restarted jobs -
When "Restart entire job" was selected from the Job Control menu, the
task-tree diagram (the "DAG" view) was sometimes lost. This has been fixed.
- Metrics reports handled more efficiently -
The inbound queue of metrics updates from remote alfservers is now
drained more efficiently; this helps to keep the maitre-d status more
accurately up to date and also helps prevent the loss of some metrics
updates (UDP multicast packets) in high-load situations.
- Watch-Servers UI lock-up on Linux -
Changing from the "Slot List" tab to any other tab would cause an
infinite callback loop in the tk sliderbar handler. This would
make the UI freeze, and peg the CPU at full usage. This has been
fixed.
- Remote dispatcher behavior preferences -
When an Alfred UI is directed to connect to a remote dispatcher, the
displayed preference values for settings controlling behavior such as
"dispatching scheme" were previously taken from the UI owner's preferences.
They are now taken from the current settings of the dispatcher, regardless
of owner. Furthermore, applying new preferences to a remote dispatcher
only affects dispatching behavior preferences on the dispatcher; UI
preferences are ignored.
- Alfserver -rmap would sometimes have
trouble running RemoteCmd applications following another job
which used "netrender -R". This has been fixed.
- Tilde expansion was sometimes being performed, incorrectly, on
tildes that appeared within a word, rather than just those that appear at
the beginnings of blank-delimited words.
- A maitre-d crash caused by the way unresponsive alfservers were being
handled has been fixed. The crash was usually triggered by the server
suffering a kernel panic or having its network cable unplugged, both of
which leave the peer socket open.
- Userids containing dots were sometimes causing the dispatcher to
refuse to authenticate alfred UI connections; fixed.
- Mixed-case official hostnames were sometimes causing the
maitre-d to skip valid slots during huntgroup searches; fixed. This
problem was caused by using mixed case names in the underlying system
definition and/or nameserver, not in the alfred.schedule definition.
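As a concrete reading of the "Illegal slot names" fix above, the listed character set can be expressed as a regular expression. This is a minimal Python sketch, not the syntax checker Alfred actually uses; the function name is hypothetical, and rejecting empty names is an assumption the notes do not state.

```python
import re

# Characters listed as valid in the notes above: a-z A-Z 0-9 _ - .
VALID_SLOT_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_slot_name(name: str) -> bool:
    """Return True if the whole name uses only the listed characters."""
    # Assumption: an empty name is treated as invalid.
    return VALID_SLOT_NAME.fullmatch(name) is not None
```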
Dispatcher DIRMAP Support
As one part of the "dirmap" support in the RAT applications, Alfred now
allows sites to define mappings between platform-specific pathnames
which are applied before commands are dispatched.
Alfred scripts describe commands to be executed and a hierarchical
execution order. For commands which are executed remotely the script
defines the type of remote server required using abstract keywords such
as "pixarRender". The centralized Alfred "maitre-d" keeps track of
server availability and assigns server slots to requesting dispatchers.
The current dirmap implementation requires site configuration of
the Alfred job scripts and of the remote server keyword lists in the
master alfred.schedule file.
The first step is to annotate each server slot definition in the master
schedule with a "dirmap zone" name. These zone names are arbitrary
strings which are used to categorize path mappings according to server
type. The dispatcher will pick mappings from a job-specific list
after a remote host has been bound (see below). The two zones referred
to by MTOR-generated scripts are "UNC" and "NFS" which are intended to
correspond to the typical Windows and Unix network access modes respectively.
The appropriate mode for a given server is the one used natively by
that system when accessing remote files.
In order to minimize protocol and file-format changes required to
support dirmapping, zone specification is done by simply adding
an additional keyword to each server slot definition. This new
keyword must have the format "zone=NAME". Most slots will already
have keywords such as "pixarNRM" or "pixarRender"; the zone name
can just be added to the list:
"pixarNRM pixarMTOR pixarRender zone=NFS"
The zone name is used as an "index" into the dirmap list associated
with each job.
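To make the keyword convention concrete, the following Python sketch separates ordinary service keywords from the "zone=NAME" keyword in a slot's keyword list. It only illustrates the format described above; the function name is hypothetical, and how the maitre-d actually parses the schedule is not documented here.

```python
def split_slot_keywords(keywords: str):
    """Separate service keys from the optional zone=NAME keyword."""
    services, zone = [], None
    for word in keywords.split():
        if word.startswith("zone="):
            zone = word[len("zone="):]
        else:
            services.append(word)
    return services, zone


services, zone = split_slot_keywords("pixarNRM pixarMTOR pixarRender zone=NFS")
```

A slot without a zone keyword yields a zone of None in this sketch; what Alfred does in that case is not covered by these notes.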
Here's a very simple example Alfred script which contains NO
dirmap functionality:
##AlfredToDo 3.0
Job -title "a simple job" -subtasks {
    Task -title {frame 1} -cmds {
        RemoteCmd {prman -Progress f:/rib/test.rib} -service {pixarRender}
    }
}
To make this script "dirmap-enabled" we must indicate which
pathnames are subject to mapping using the new "%D()" notation. We
must also add a path mapping definition to the Job using the new
"-dirmaps" option. Here's the revised script:
##AlfredToDo 3.0
Job -title "a simple job" -dirmaps {
    {{f:/rib} {//myhost/ribshare} UNC}
    {{x:/textures} {//shared/textures} UNC}
} -subtasks {
    Task -title {frame 1} -cmds {
        RemoteCmd {prman -Progress %D(f:/rib/test.rib)} -service {pixarRender}
    }
}
The format of the -dirmaps option is a set of nested lists. The top
level is just a sequence of mappings:
-dirmaps {map1 map2 ...}
Each mapping itself has three components:
map1: {FromPattern ToPattern ZoneName}
map2: {FromPattern ToPattern ZoneName}
When the dispatcher requests a remote server from the maitre-d it
receives both the assigned hostname and the zone string as part
of the reply. The dispatcher then uses the zone name to find applicable
mappings in the -dirmaps list when doing %D() substitutions.
The dirmap list can contain several mappings for the same zone; the
patterns are applied in sequence until a match is found.
The example script above illustrates how paths are marked for
substitution. The path is wrapped with "%D(path)" in the command
specification. The leading part of the path string is compared to
each FromPattern entry in the dirmap (in the appropriate zone).
If a match is found, then the matching prefix is replaced with the
corresponding ToPattern and the new path is substituted into the
command line. If no dirmap entry matches the given path, it is
inserted verbatim.
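The substitution rules above can be sketched in Python as follows. This is an illustration of the described behavior, not the dispatcher's implementation. It assumes a plain, case-sensitive prefix comparison (the notes do not say whether FromPattern supports wildcards or ignores case) and paths that contain no ")" character; the function names are hypothetical.

```python
import re

# Each mapping is (FromPattern, ToPattern, ZoneName), as in the -dirmaps option.
DIRMAPS = [
    ("f:/rib", "//myhost/ribshare", "UNC"),
    ("x:/textures", "//shared/textures", "UNC"),
]

# Matches %D(path); assumes the path itself contains no ")".
D_MARKER = re.compile(r"%D\(([^)]*)\)")


def map_path(path, zone, dirmaps):
    """Apply the first mapping in the given zone whose FromPattern prefixes the path."""
    for from_prefix, to_prefix, map_zone in dirmaps:
        if map_zone == zone and path.startswith(from_prefix):
            return to_prefix + path[len(from_prefix):]
    return path  # no mapping matched: the path is inserted verbatim


def expand_dirmaps(command, zone, dirmaps):
    """Replace every %D(path) in a command with its mapped path for the zone."""
    return D_MARKER.sub(lambda m: map_path(m.group(1), zone, dirmaps), command)


cmd = "prman -Progress %D(f:/rib/test.rib)"
unc_cmd = expand_dirmaps(cmd, "UNC", DIRMAPS)
nfs_cmd = expand_dirmaps(cmd, "NFS", DIRMAPS)
```

With the example mappings, a server in the "UNC" zone receives the //myhost/ribshare path, while a server in a zone with no matching entry receives the original f:/rib path unchanged.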
Known Problems and Workarounds
- Alfred Scripting: Tasks in Iterate templates must be grouped
as subtasks under a single Task node. If the template contains
several "top-level" Tasks then the current release has
trouble both traversing the tasks and drawing a correct diagram
of the job. For example, a job structured like this:
Iterate n -from 1 -to 10 -by 1 -template {
    Task -title {Step $n part 1} -cmds {
        Cmd {process -step $n -part 1}
    }
    Task -title {Step $n part 2} -cmds {
        Cmd {process -step $n -part 2}
    }
}
should be restructured to have a container Task, like this:
Iterate n -from 1 -to 10 -by 1 -template {
    Task -title {Step $n} -subtasks {
        Task -title {part 1} -cmds {
            Cmd {process -step $n -part 1}
        }
        Task -title {part 2} -cmds {
            Cmd {process -step $n -part 2}
        }
    }
}