ALFRED RELEASE NOTES - Version 4.5
 

What is Alfred

  • Alfred is a Scriptable Work Distribution System

  • Provides Fully Integrated Network Rendering for MTOR

  • Automatically Handles Parallel Rendering using Pixar's RenderMan®

  • Easily Scriptable for Custom Applications

  • User scripts specify what to launch, and keywords which describe server requirements; the script structure also describes a hierarchy of task dependencies.

  • Alfred chooses where and when to run the task, based on the best available remote servers, task dependencies, and job priority.

  • A fully integrated User Interface allows users to easily track job progress, detect errors, browse output logs, reprioritize and delete jobs.

 

The Basic Alfred 4.0 Architecture

Please see the Alfred 4.x Overview for a brief description of the features and capabilities of the Alfred 4.0 architecture.

 


Alfred 4.5 - Release Notes

New Features

  • Throughput and Responsiveness   -   Several significant improvements have been made to both the dispatcher and maitre-d transaction handling. As a result, slot check-out latency has been greatly reduced, and more state management activity occurs asynchronously. At large sites, these speed-ups should dramatically improve slot assignment throughput and reduce or eliminate many time-out disconnects. Also, user interface response times will be much better on loaded networks.

  • DIRMAP - Per-server path mapping   -   Support has been added to allow certain platform-specific pathnames to be replaced with an appropriate network path on a per-server basis. See below.

  • Monitoring "live" task output   -   Output generate by tasks is accumulated by Alfred, and it is typically viewed by clicking on any specific task node and selecting "See Output Log". There is now also a way to immediately see new output, from all tasks, as it is received. There is a new checkbox in the Preferences dialog which opens a scrolling output area just below the scrolling launch log in the main Alfred window. This new log window shows the live combined output from all running tasks. This new window is sort of like watching the output logs with the unix "tail" command. This log provides a handy way to detect that something odd or interesting has happened.

  • Super-slot checkouts   -   Support has been added for checking out multiple slots on the same physical host as a group. The new "-samehost 1" option to Cmd and RemoteCmd triggers this mode.
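
    For illustration, here is a minimal sketch of a super-slot request in a job script; the -atleast/-atmost multi-slot options shown here are assumed from existing multi-server checkout usage (they are not defined in these notes), and "myrender" is a hypothetical multithreaded command:

        # sketch: check out two slots, both forced onto the same physical host
        RemoteCmd {myrender -threads 2 f:/rib/test.rib} \
            -service {pixarRender} -atleast 2 -atmost 2 -samehost 1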

  • Shared ping status   -   Pings defined in the alfred.schedule are used to confirm that a server slot is really ready to accept new work. Even though a given slot may be available in the schedule, there are often external factors which should be considered before actually dispatching work to the associated host. Alfred server pings perform these tests. There are built-in pings, plus a mechanism for launching arbitrary site-specified queries. Since multiple server slots can be defined for each real host (usually one per CPU), a ping failure on one slot will often occur on all of the other slots on that host too. The cumulative effect of running the same failing test on many slots is that the maitre-d becomes bogged down. Alfred now automatically causes a disqualifying ping on one slot to disqualify all slots on the same host, which improves performance. However, there are a few important cases in which a ping should only disqualify one slot at a time; such pings can be indicated with a small bit of additional ping syntax:

        set alfServerPing(somekey) {/usr/local/bin/someping %s}
        set alfServerPing(somekey) {+/usr/local/bin/someping %s}

    In this example, the first ping affects all slots on the same host (the new default behavior), while the second ping will be retried on each slot. The '+' prefix on the ping executable indicates the per-slot mode.

     

  • Less stringent nameserver requirements   -   The way in which alfred finds its fully-qualified hostname has been relaxed. Alfred now uses the default nameserver value, however misconfigured, for the initial discovery phases. After the initial setup is complete, a new setting called "localDomain" in alfred.ini comes into play. It is used primarily to ensure that http-mode URLs and authentication cookies are generated for the correct domain. If you specify a domain, e.g. "pixar.com", then it will be used in all URLs. If you use an empty string, then nameserver results will be used directly; at many sites these are already fully qualified. Use the string "?" to indicate that alfred should guess at the domain name using some built-in heuristics.
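
    A hedged example of how this might look in alfred.ini; the alfConfig() form is an assumption here, following the convention of the other settings mentioned in these notes:

        set alfConfig(localDomain) "pixar.com"   ;# always use this domain in URLs and cookies
        # set alfConfig(localDomain) ""          ;# use nameserver results directly
        # set alfConfig(localDomain) "?"         ;# let alfred guess using built-in heuristics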

  • Forcing inactive dispatchers to exit   -   there is a new setting in alfred.ini which can be used to force inactive dispatchers to exit. This can be useful when there are a limited number of dispatcher licenses available, or simply to reduce the "clutter" of uninteresting maitre-d connections. The parameter "timerDoneJobDispatchExit" specifies the number of minutes that a dispatcher should remain inactive before shutting itself down. The default value "0" (zero) indicates that it should continue running even when inactive. The term inactive means that the dispatcher has no connected UI and that all of the jobs remaining on the local queue are done. Note: in most circumstances the dispatcher will exit when the user closes their UI; this new setting controls the behavior of dispatchers that are running without a connected UI.

  • Maitre-d start-up wait period   -   another new parameter in alfred.ini is "timerMaitredInitialWait" which is used to set the number of seconds that a restarted maitre-d should wait before allocating new slots. This interval should be long enough to allow all previously running dispatchers to register their current slot usage with the new maitre-d. Registering current usage ensures that the new maitre-d won't assign server slots to new jobs when they are already in use by previously running jobs.
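
    A minimal alfred.ini sketch covering the two new timers described above; the values are arbitrary examples, and the alfConfig() form is assumed from the other settings in these notes:

        set alfConfig(timerDoneJobDispatchExit) 30   ;# idle, UI-less dispatchers exit after 30 minutes (0 = never)
        set alfConfig(timerMaitredInitialWait) 120   ;# restarted maitre-d waits 120 seconds for dispatchers to re-register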

  • Distinct displays for UI and Dispatcher environment variables   -   The About Alfred menu item provides a button which displays the current environment variable settings, as seen by the user interface. There is a new button which fetches the environment from the dispatcher, which can be on a remote system in some cases.

  • Skip all tasks with errors   -   a new menu entry on the job control menu allows you to mark all tasks which have errors as done. This allows any tasks which depend on these errored-out tasks to proceed as if they had completed successfully. This can be useful when there are a lot of trivial errors pending during test renders, etc.

  • Tunable retry interval for problem slots   -   There is a new "tainting" scheme which causes a slot to be marked as problematic and become unscheduled for a short period when it turns out to be overcommitted. The interval can be changed with a new alfred.ini setting: alfConfig(timerProblemSlotReuseInterval).

  • Process sweeping during job deletion   -   a mechanism for quickly shutting down all running tasks has been implemented. When a job is deleted, Alfred must shut down any remaining running tasks. Since each launched process may require some time to do its own clean-up, the shutdown sequence must wait before applying its successively more severe shutdown mechanisms. The new process-sweeping scheme applies each shutdown pass to all running processes in parallel, rather than the previous serial approach, which could take a long time when there were many running tasks.

  • Detecting a retry launch   -   A new substitution pattern "%r" has been added for use in the command arguments for applications launched by alfred, via Cmd and RemoteCmd. Before execing the command, Alfred will substitute a zero or a one in place of the %r depending on whether the launch is the first attempt, or a retry of a previously failed or ejected command, respectively.
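
    For example, a command could pass the flag along to its own wrapper script; the wrapper name here is purely hypothetical:

        # %r expands to 0 on the first launch attempt and 1 on a retry
        RemoteCmd {render_wrapper.sh -retry %r f:/rib/test.rib} -service {pixarRender}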

Bug Fixes

     

  • The job-parallel dispatching scheme was incorrectly making use of one dispatching rule from the spill-over scheme. This would sometimes result in one job on the local queue acquiring more slots than the others, rather than the roughly equal progress expected in parallel mode. Fixed.

     

  • Alfserver failure under Windows 2000   -   When Alfserver was started as a service under Windows 2000 and a userid other than "LocalSystem" was applied to the server, it would fail to register properly with the Service Control Manager during start-up. The SCM would then kill the (running) alfserver after several minutes. This has been fixed (4.1.1 patch).

     

  • RemoteCmd set-up   -   Several types of RemoteCmd initialization problems, due to unresponsive servers etc., are now handled differently to help avoid error conditions and to requeue remote tasks.

     

  • Web interface now uses client-pull refreshes   -   The HTTP interface provides several mechanisms for automatically updating a browser view of an active job. Previously, the mechanism was automatically chosen based on the browser-supplied "User-Agent" value, and a server-push mechanism was used for Netscape browsers, as per the Netscape developer recommendations. This push mechanism appears to cause problems with some newer browsers, and so the new default mechanism for all browsers is a client-pull scheme in which the browser is directed to auto-refresh the page periodically using a "Refresh" directive in the http headers. To disable this new behavior and revert to server-push, add the following to the alfred.ini file:

        set alfConfig(httpDoServerPush) 1

     

  • Empty ping definitions   -   A maitre-d coredump caused by empty ping definitions in the schedule file has been fixed. An empty definition now correctly indicates that no ping is required for the given service key. This is the same behavior as if no ping definition were present, except in the case of the "default" ping. You can now completely disable the default low-overhead test of whether a host is responding on the network by using:

        set alfServerPing(default) {}

    This is not recommended in general, since the default test provides a very useful server sanity check.

     

  • Illegal slot names   -   Slot names containing "-" (hyphen) were incorrectly being rejected as illegal by the syntax checker used by the external editor option of the Advanced schedule interface. The following characters are now considered valid:   a-z A-Z 0-9 _ - .

     

  • Incomplete Watch-Servers windows   -   Alfred scripts in which Task title strings contained certain "problematic" characters were sometimes causing protocol problems during watch-servers updates. These errors could cause the status display to be incomplete or blank. This has been fixed.

     

  • Multiple dispatchers   -   The environment variable TMPDIR (or TEMP for Windows) was incorrectly being used to establish the location of temporary files used by Alfred to detect previously launched instances of the application. This could sometimes result in multiple dispatchers or UIs being started when Alfred was launched from environments which had different values for this variable. These small files are now always placed in /tmp (or C:/TEMP on Windows) so that they can be found reliably by subsequent invocations.

     

  • Corrected lookup for RemoteCmdFallback   -   The documented alfred.ini setting "alfConfig(RemoteCmdFallbackCommand)" was being ignored. The value "RemoteCmdFallback" was being queried instead, and since it was not defined in alfred.ini, a built-in default was always used. This has been fixed, and "RemoteCmdFallbackCommand" is now read correctly. For compatibility with sites that used "RemoteCmdFallback" as a workaround, that value is still accepted as well, and takes precedence.

     

  • Task-Tree view of restarted jobs   -   When "Restart entire job" was selected from the Job Control menu, the task-tree diagram (the "DAG" view) was sometimes lost. This has been fixed.

     

  • Metrics reports handled more efficiently   -   the inbound queue of metrics updates from remote alfservers is now drained more efficiently; this helps to keep the maitre-d status more accurately up to date and also helps prevent the loss of some metrics updates (UDP multicast packets) in high-load situations.

     

  • Watch-Servers UI lock-up on Linux   -   changing from the "Slot List" tab to any other tab would cause an infinite callback loop in the tk sliderbar handler. This would make the UI freeze, and peg the CPU at full usage. This has been fixed.

     

  • Remote dispatcher behavior preferences   -   when an Alfred UI is directed to connect to a remote dispatcher, the displayed preference values for settings controlling behavior such as the "dispatching scheme" were previously taken from the UI owner's preferences. They are now taken from the current settings of the dispatcher, regardless of owner. Furthermore, applying new preferences to a remote dispatcher only affects dispatching-behavior preferences on the dispatcher; UI preferences are ignored.

     

  • Alfserver -rmap would sometimes have trouble running RemoteCmd applications following another job which used "netrender -R". This has been fixed.

     

  • Tilde expansion was sometimes being performed, incorrectly, on tildes appearing within a word, rather than only on those at the beginning of blank-delimited words.

     

  • A maitre-d crash caused by a problem in the way unresponsive alfservers were being handled has been fixed. It was usually triggered by a server suffering a kernel panic or having its network cable unplugged, both of which leave the peer socket open.

     

  • Userids containing dots were sometimes causing the dispatcher to refuse to authenticate alfred UI connections; fixed.

     

  • Mixed-case official hostnames were sometimes causing the maitre-d to skip valid slots during huntgroup searches; fixed. This problem was caused by using mixed case names in the underlying system definition and/or nameserver, not in the alfred.schedule definition.

 


Dispatcher DIRMAP Support

As one part of the "dirmap" support in the RAT applications, Alfred now allows sites to define mappings between platform-specific pathnames; these mappings are applied before commands are dispatched.

Alfred scripts describe commands to be executed and a hierarchical execution order. For commands which are executed remotely the script defines the type of remote server required using abstract keywords such as "pixarRender". The centralized Alfred "maitre-d" keeps track of server availability and assigns server slots to requesting dispatchers.

The current dirmap implementation requires site configuration of the Alfred job scripts and of the remote server keyword lists in the master alfred.schedule file.

The first step is to annotate each server slot definition in the master schedule with a "dirmap zone" name. These zone names are arbitrary strings which are used to categorize path mappings according to server type. The dispatcher will pick mappings from a job-specific list after a remote host has been bound (see below). The two zones referred to by MTOR-generated scripts are "UNC" and "NFS", which are intended to correspond to the typical Windows and Unix network access modes respectively. The appropriate mode for a given server is the one used natively by that system when accessing remote files.

In order to minimize the protocol and file-format changes required to support dirmapping, zone specification is done by simply adding an additional keyword to each server slot definition. This new keyword must have the format "zone=NAME". Most slots will already have keywords such as "pixarNRM" or "pixarRender"; the zone name can just be added to the list:
    "pixarNRM pixarMTOR pixarRender zone=NFS"

The zone name is used as an "index" into the dirmap list associated with each job.

Here's a very simple example Alfred script which contains NO dirmap functionality:


  ##AlfredToDo 3.0
  Job -title "a simple job" -subtasks {
      Task -title {frame 1} -cmds {
          RemoteCmd {prman -Progress f:/rib/test.rib} -service {pixarRender} 
      }
  }

To make this script "dirmap-enabled" we must indicate which pathnames are subject to mapping, using the new "%D()" notation. We must also add a path mapping definition to the Job using the new "-dirmaps" option. Here's the revised script:

  ##AlfredToDo 3.0

  Job -title "a simple job" -dirmaps {

	{{f:/rib}       {//myhost/ribshare}  UNC}
	{{x:/textures}  {//shared/textures}  UNC}

  } -subtasks {

      Task -title {frame 1} -cmds {
          RemoteCmd {prman -Progress %D(f:/rib/test.rib)} -service {pixarRender} 
      }

  }

The format of the -dirmaps option is a set of nested lists. The top level is just a sequence of mappings:
    -dirmaps {map1 map2 ...}

Each mapping itself has three components:
    map1: {FromPattern ToPattern ZoneName}
    map2: {FromPattern ToPattern ZoneName}

When the dispatcher requests a remote server from the maitre-d, it receives both the assigned hostname and the zone string as part of the reply. The dispatcher then uses the zone name to find the applicable mappings in the -dirmaps list when doing %D() substitutions. The dirmaps list can contain several mappings for the same zone; the patterns are applied in sequence until a match is found.

The example script above illustrates how paths are marked for substitution. The path is wrapped with "%D(path)" in the command specification. The leading part of the path string is compared to each FromPattern entry in the dirmap (in the appropriate zone). If a match is found, then the matching prefix is replaced with the corresponding ToPattern and the new path is substituted into the command line. If no dirmap entry matches the given path, it is inserted verbatim.
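
For the example job above, a server slot tagged with zone UNC would cause the leading "f:/rib" to match the first mapping, so the dispatched command should become:

    prman -Progress //myhost/ribshare/test.rib

A slot bound in a zone with no matching entry in this job's -dirmaps list would leave the path unchanged.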

 


Known Problems and Workarounds

  • Alfred Scripting:   Tasks in Iterate templates must be grouped as subtasks under a single Task node. If the template contains several "top-level" Tasks then the current release has trouble both traversing the tasks and drawing a correct diagram of the job. For example, a job structured like this:
        Iterate n -from 1 -to 10 -by 1 -template {
            Task -title {Step $n part 1} -cmds {
                Cmd {process -step $n -part 1}
            }
            Task -title {Step $n part 2} -cmds {
                Cmd {process -step $n -part 2}
            }
        }
    	
    Should be restructured to have a container Task, like this:
        Iterate n -from 1 -to 10 -by 1 -template {
    
            Task -title {Step $n} -subtasks {
    
                Task -title {part 1} -cmds {
                    Cmd {process -step $n -part 1}
                }
                Task -title {part 2} -cmds {
                    Cmd {process -step $n -part 2}
                }
    
            }
    
        }
    	

 


 

Pixar Animation Studios
(510) 752-3000 (voice)   (510) 752-3151 (fax)
Copyright © 2000 Pixar. All rights reserved.
RenderMan® is a registered trademark of Pixar.