 
  Codename Project BatCave
  	Alfred MySQL Logging and Web-based Database Browsing
 
  
	The project codenamed "BatCave" is an effort to export Alfred status
	and task execution history to a generic database in a form which allows
	people to develop monitoring and analysis tools which are specific
	to their needs and interests.
    
	Alfred components, such as the dispatcher, maitre-d, and alfserver
	have been modified to (optionally) log various data and events directly
	to a MySQL database.
	 
	The database provides a highly programmable environment for creating
	custom status queries and for doing historical analysis.  The database
	server can also offload the live status reporting/formatting function
	performed by the maitre-d in "watch-servers" mode, which can improve
	the maitre-d performance at large sites.
	 
	A php-based web interface for viewing the records in this SQL database
	is also shipped as part of the project. The web interface provides a
	broad range of basic functions and is fully customizable, so new
	site-specific queries and reports can easily be added.
     
	Several new alfred.ini settings control the basic database logging
	capability.  Bootstrap scripts are also provided for creating the
	database tables and the web interface.
	 
	This capability has broad implications for integrating Alfred with
	existing production databases, as well as with site control and
	planning projects.  Alfred features in this area will continue to
	grow and evolve from the initial groundwork being released here.
	 
	For more details, see the Project BatCave documentation.
     
	 
	 
	Your job scripts can update MySQL records too   --  
	In addition to the built-in Alfred data logging, Alfred scripts can
	also add or modify records in the database.  One approach is to simply
	have one of the job tasks invoke a command-line application that
	makes the desired updates; this could be a custom application that
	makes use of the MySQL API, or it could be a direct or indirect
	invocation of the 'mysql' command-line client which comes with MySQL.
	There are also new Alfred scripting options,
	Job -sqlset and Task -sqlset
	which can update the BatCave tables directly.  The typical usage
	would involve creating one or more site-specific columns in the
	existing Job or Task tables; these custom columns
	would then be updated with values from the job script when the
	corresponding Job or Task record is created.  One generic column
	called "jobgroup" is already provided in the default
	Job table; it is intended for arbitrary site use in this
	way.  For example:
	 Job -title "my job" \
	-sqlset {jobgroup='shot-c57a'} \
	-subtasks ...   
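
	Sites that generate job scripts from a pipeline tool may want to
	build the -sqlset option programmatically.  The following is a
	minimal Python sketch of that idea, not part of Alfred; the helper
	name sqlset_option is hypothetical, and it assumes that the text
	inside the braces is passed through as a SQL-style assignment, as
	in the example above.

```python
def sqlset_option(column, value):
    """Render a '-sqlset {column='value'}' option for a generated job script.

    Hypothetical helper: assumes the braced text is a SQL-style assignment,
    as in the jobgroup example above.
    """
    if "{" in value or "}" in value:
        # Braces would unbalance the brace-quoted job-script word.
        raise ValueError("value must not contain braces")
    # Double any embedded single quotes, the standard SQL literal escape.
    quoted = value.replace("'", "''")
    return "-sqlset {%s='%s'}" % (column, quoted)


if __name__ == "__main__":
    print('Job -title "my job" \\')
    print("\t" + sqlset_option("jobgroup", "shot-c57a") + " \\")
    print("\t-subtasks ...")
```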
  
   The Cmd/RemoteCmd "launch expression" which defines the command-line
  to be executed may contain several special "%" substitution macros
  that are expanded by Alfred before the command is launched.  There
  are substitutions for the names of bound servers, etc.  A few new
  ones have been added which are especially useful when dealing with
  the batcave databases:
   
  %J   expands to the batcave MySQL Job "jid" for the current job.
  %j   expands to the internal dispatcher job-id
	  for the current job.  Note that these are not globally unique.
  %t   expands to the Task "tid" for the current task.
  	While not globally unique, it is unique within the job, and is
	used both internally by Alfred and in the batcave tables (as "tid").
  %c   expands to the batcave MySQL Cmd "cmdid" for the current
	  Cmd/RemoteCmd.

  So an example usage might be the following pointless, and somewhat recursive,
  command:
   Cmd {mysql -h host -u user -D db \
      -e {select commandline from Cmd where jid=%J \
          and tid=%t and cmdid=%c}}   
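
  To illustrate what the substitution step does to a launch expression,
  here is a small Python sketch.  It only imitates the behavior described
  above and is not Alfred's implementation; the function name
  expand_macros and the id values are made up for the example.

```python
import re


def expand_macros(expr, ids):
    """Replace %X macros in expr using the ids mapping.

    Illustration only: macros that are not in the mapping are left as-is.
    """
    def repl(match):
        key = match.group(1)
        return str(ids[key]) if key in ids else match.group(0)

    return re.sub(r"%([A-Za-z])", repl, expr)


if __name__ == "__main__":
    # Hypothetical ids for one Cmd of one Task in one Job.
    ids = {"J": 1042, "j": 7, "t": 3, "c": 15}
    query = "select commandline from Cmd where jid=%J and tid=%t and cmdid=%c"
    print(expand_macros(query, ids))
```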
  New and Different settings in alfred.ini 
  
   Alfred administrators are encouraged to browse through the
  new alfred.ini and perhaps "diff" it with the one
  currently in use.  There have been several changes and additions
  which may be relevant.  For example, there is now a way to limit
  the number of output log records retained on disk for each task
  in a job;  this can help reduce log file sizes when tasks have
  "run-away" diagnostics (like prints in shaders).
     
  Enable/Disable the "Watch Servers" and
  		"Master Schedule" menu items 
  
    
	There are new configuration settings in alfred.ini for
	controlling whether the Watch Servers and 
	Master Schedule menu items are enabled or
	disabled in the Alfred user interface.  Large sites
	may want to consider disabling Watch Servers
	in particular since it can add considerable load to the
	maitre-d process, when there are a lot of servers to
	monitor.  Also, many of the Project BatCave features (above)
	are intended to provide similar or improved functionality.
     
  
    
    Alfserver and the Alfred maitre-d "discover" each other on the network
    using multicast packets addressed to a particular multicast "session"
	address.
	 
	Having found each other, alfservers then deliver periodic status
	updates called metrics to the maitre-d (and now also to a MySQL
	database, see the BatCave discussion above).  These metrics are
	used as a basic measure of server health and the values can be
	used to make specific server assignment decisions. Starting with
	Alfred 6.5, metrics are reported to the maitre-d using point-to-point
	"unicast" udp packets.  In previous releases the metrics were also
	multicast back to the discovery address, for use by potentially many
	interested listeners.  The unicast approach can reduce some network
	overhead at large sites, especially in situations where the one-to-many
	nature of multicast traffic causes problems for smart network
	switches that try to optimize one-to-one communications.
	 
    Routers on the network ensure that the multicast discovery messages
	are delivered to all "subscribed" systems.  By default, Alfred and
	alfserver use the multicast "session" address 239.255.224.99, port
	9002/udp.  Sites can change this multicast address by adding the
	hostname "alf-status" to the site nameserver (e.g. DNS, NIS,
	/etc/hosts, etc), and picking a new multicast address for it from
	the multicast range (224.0.0.0 - 239.255.255.255).  Note that there
	are IANA numbering conventions which apply to multicast addresses.
	 
    Alternatively, conventional "unicast" communications can be used
	for both discovery and metrics delivery. This is done by
	simply adding "alf-status" as a hostname alias for the maitre-d
	host's regular IP address, rather than using a multicast address.
	This approach is actually a way to bypass the "discovery" phase.
	The alfserver metrics will be sent as standard UDP packets directly
	to the named maitre-d.  Note that this approach should not be used
	with fallback maitre-ds, since alfservers would only know about
	the one named host, and metrics would only be delivered to that
	one host.
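
	When changing the "alf-status" entry, it can be useful to check
	which of the two modes a given address would select.  The Python
	sketch below is a site-side diagnostic idea, not Alfred code: it
	resolves the name, falls back to the default session address when
	the name is not defined, and reports whether the result lies in the
	multicast range.  The function names are hypothetical.

```python
import ipaddress
import socket

DEFAULT_SESSION_ADDR = "239.255.224.99"  # default from the text above
DEFAULT_PORT = 9002                      # udp


def resolve_status_address(hostname="alf-status", resolver=socket.gethostbyname):
    """Return the address for hostname, or the default if it does not resolve."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return DEFAULT_SESSION_ADDR


def discovery_mode(addr):
    """Classify an IPv4 address as 'multicast' (224.0.0.0/4) or 'unicast'."""
    return "multicast" if ipaddress.ip_address(addr).is_multicast else "unicast"


if __name__ == "__main__":
    addr = resolve_status_address()
    print("%s:%d/udp -> %s" % (addr, DEFAULT_PORT, discovery_mode(addr)))
```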
	 
	A new alfserver configuration setting, "metricsDelivery", can now
	be set to "multicast" to force metrics to be sent to the multicast
	address (as in releases prior to 6.5), so that any other interested
	listeners can receive them simultaneously.
	 
	There is also a new way to deliver configuration overrides to all
	alfservers from the maitre-d:  create a file called
	$RATTREE/etc/alfsite.ini, containing the overrides, in
	an ini location accessible to the maitre-d.  Its
	contents will be sent by the maitre-d to the alfservers as part of
	the discovery process, along with the site metrics definitions.
   
  New Task Menu Item:  Try this task next 
  
  There is a new item that appears on the Task menu when you click
  on a particular task's box in the job diagram window.  This new
  entry allows you to request that the given task should be dispatched
  next, if possible.  This simply changes the local dispatcher's
  "next task" logic and does not affect the actual job
  priorities relative to jobs from other dispatchers.  This action
  is only available on tasks that are "Ready" to execute.
   
  
  The "inner loop" of the maitre-d server assignment algorithm has
  been changed.  The new code is both more uniform (there's only
  one assignment entry point), and more accurate in the face of
  a wide ranging mix of incoming request types and frequencies.
  
  The assigner code is also provided in source-code form, as
  has been true in prior releases, so sites can create an 
  "assigner plug-in" that implements an alternative set of policies.
   
  Note: there is currently no backward compatibility support for
  assigner plug-ins written for prior releases.  Sites that have
  such plug-ins will need to port the relevant changes to the new
  algorithm.  The existence of old plug-ins will not cause errors,
  since the maitre-d will just fall back to using the default 
  built-in scheme.  It distinguishes new plug-ins by searching for
  the new, versioned, name of the assigner object factory method.
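
  The fallback behavior can be pictured with the following Python
  sketch.  It is an analogy only: the real plug-in interface is defined
  by the assigner source code shipped with the release, and the factory
  name "create_assigner_v2" and the class BuiltinAssigner are
  hypothetical stand-ins.

```python
import types

# Hypothetical versioned factory name; the real name is in the shipped source.
FACTORY_NAME = "create_assigner_v2"


class BuiltinAssigner:
    """Stand-in for the default built-in assignment scheme."""
    name = "builtin"


def load_assigner(plugin):
    """Use the plug-in only if it exports the versioned factory."""
    factory = getattr(plugin, FACTORY_NAME, None) if plugin is not None else None
    if callable(factory):
        return factory()
    # Old plug-ins lack the versioned name, so they are silently ignored.
    return BuiltinAssigner()


if __name__ == "__main__":
    old_plugin = types.SimpleNamespace(create_assigner=lambda: "old")
    print(load_assigner(old_plugin).name)  # falls back to the built-in scheme
```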
   
   Improved the handling of assignment requests; see above.
  
   Improved the load-balancing among jobs on the local dispatching 
    queue when in "job parallel" mode.
  
  Connections to remote Alfred dispatchers, using "alfred -h user@host",
  now use the site maitre-d, if available, to determine the remote connection
  port.  The prior use of 'rsh' for this purpose, while nominally
  somewhat more secure, added unnecessary complexity at most sites and
  is increasingly unlikely to work as rsh support dwindles.  A new ini
  setting (rshForDispatcherDiscovery) can be used to restore the old
  behavior.
  
  Certain "alfserver not responding" situations are now handled
  	more correctly, and those servers are more consistently taken out of the
	assignment pool for a period specified by "timerAvoidNoListener".
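
	As a rough model of this avoidance period, consider the Python
	sketch below.  It is not the maitre-d's code; the class AvoidList
	and its method names are invented for illustration, and the time
	values are plain seconds supplied by the caller.

```python
class AvoidList:
    """Keep unresponsive servers out of an assignment pool for a fixed period."""

    def __init__(self, avoid_seconds):
        # Plays the role of the "timerAvoidNoListener" period.
        self.avoid_seconds = avoid_seconds
        self._avoid_until = {}

    def mark_unresponsive(self, host, now):
        """Record that host did not respond at time now."""
        self._avoid_until[host] = now + self.avoid_seconds

    def is_assignable(self, host, now):
        """True if host is not currently being avoided."""
        until = self._avoid_until.get(host)
        if until is None:
            return True
        if now >= until:
            # The avoidance period has expired; return host to the pool.
            del self._avoid_until[host]
            return True
        return False
```

	A caller would mark a host when a connection attempt fails and
	consult is_assignable() before each assignment.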
  
   A bug was fixed in the handling of Alfred "maitredHost" lists in
    configuration files other than $RATTREE/etc/alfred.ini, such as found
	via $RAT_SCRIPT_PATHS.  If the primary maitre-d went offline, dispatchers
	using the alternate configuration file locations would sometimes end
	up in "chaos mode" (using a private, local, maitre-d), and then be
	unable to reconnect to the main maitre-d when it came back online.
  
  Fetching task output logs with lines longer than 1024 characters
  	sometimes failed due to faulty encoding for transmission. This
	has been fixed.
  
  Support for handling log files greater than 2GB in size has been
  	enabled on Linux systems.  This should fix problems loading
	existing job checkpoints for large jobs, and address crashes
	or other misbehavior when logs grew above 2GB.
  
   A new alfred.ini setting, "maxTaskOutput",
  	limits the number of records logged on a per-task basis.
	Some problems with large task output logs can be avoided
	if no individual task is allowed to log more than 5000
	records, for example.
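
	The effect of a per-task cap can be sketched in Python as follows.
	This is an illustration rather than Alfred's logging code: the
	class name is hypothetical, and it assumes that records beyond the
	limit are simply dropped, which the text above does not specify.

```python
from collections import defaultdict


class TaskOutputLimiter:
    """Cap the number of output records kept for each task."""

    def __init__(self, max_records):
        self.max_records = max_records       # e.g. 5000
        self.records = defaultdict(list)     # task id -> retained records
        self.dropped = defaultdict(int)      # task id -> records discarded

    def log(self, task_id, line):
        """Keep line if the task is under its cap; return True if kept."""
        if len(self.records[task_id]) < self.max_records:
            self.records[task_id].append(line)
            return True
        # Assumption: records over the cap are discarded, not rotated.
        self.dropped[task_id] += 1
        return False
```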
  
   Upon receipt of SIGHUP on unix-style systems,
      the Alfred dispatcher and maitre-d now explicitly close and
  	reopen their diagnostic log files (as specified by the 
	"-log filename" command-line option).
	This should allow them to interoperate better with log
	rotation facilities such as logrotate
	on Linux systems.
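
	The close-and-reopen pattern is the usual way for a long-running
	process to cooperate with log rotation: once the old file has been
	renamed, reopening the same path creates a fresh file.  A minimal
	Python sketch of the pattern follows; it is not Alfred's
	implementation, and the names ReopenableLog and
	install_sighup_handler are hypothetical.

```python
import signal


class ReopenableLog:
    """Append-mode log file that can be closed and reopened by path."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def write(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()

    def reopen(self):
        # After rotation the old handle points at the renamed file;
        # reopening by path creates (or appends to) the new one.
        self._fh.close()
        self._fh = open(self.path, "a")

    def close(self):
        self._fh.close()


def install_sighup_handler(log):
    """Reopen the log on SIGHUP (unix-style systems only)."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, lambda signum, frame: log.reopen())
```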
  
  Fixed some cases where paths containing blank spaces would cause
    problems for launching certain applications.
  
  Fixed several problems involving Alfserver's handling of RMANCONFIG.
  
  Better temporary file names for Alfred jobs and logs are now chosen,
  to prevent occasional collisions on some systems.
  
   An issue with retrying preflight tasks in a job that caused Alfred 
  to crash has been fixed.
  
   Several problems related to "skipping" subtasks of a "shared server"
  parent task have been fixed.
  
   The timerMaitredQueue setting from alfred.ini is now obeyed properly.
  
   Alfred's Help menu now uses the HelpURLs set of preferences.
   
   There are several minimum version requirements for the servers
  used for the "BatCave" functions (e.g. php, mysql, alfserver).
  See the 
  Project BatCave documentation for details.
  
   Alfserver support for scriptable, key-based, per-command,
  environment configuration was recently extended to include
  support for "netrender -R key ..."  This includes the ability
  to select which user will own the resulting prman process.
  Currently, the alfserver ownership mode "login" is not a viable option
  for netrender connections.  This is because netrender and prman exchange
  some of their data on sockets which are connected to stdin and stdout;
  these connections do not survive the login set-up.  The alternative
  "setuid" mode works as expected, and is frequently a better choice
  anyway, from an administrative point of view.  Note that "login"
  continues to be supported for RemoteCmd usage, although again,
  "setuid" is usually the more manageable choice.
   
 Bug Fixes
	 An intermittent problem spooling new rendering jobs to Alfred
	from within Maya on Mac OS X has been fixed.
	 A problem which prevented Alfred from being able to open
	"../resources/alfred.brt" on Mac OS X when installed remotely on
	a case-sensitive file system has been addressed.
	 An issue with unicast metrics delivery interruption from Windows
	alfservers that occurred when the maitre-d shut down has been fixed.
	Note that the full fix for this problem requires an updated
	alfserver.exe (12.5.2 and beyond).
	 A problem causing alfserver to sometimes repeatedly request
	alfsite.ini from the maitre-d has been fixed.
	 An issue with the envkey settings in alfserver.ini on Windows
	that prevented establishing a correct RATTREE has been fixed.
	 Job queries by hostname on the BatCave's main page now execute properly.
	 A potentially serious bug has been fixed regarding the way in
	which very long metrics definitions are buffered during transmission.
   
 Enhancements
       Threaded metrics processing:  The Alfred maitre-d can now be configured to handle inbound metrics reports from Alfservers using a separate thread within the maitre-d process. See the alfred.ini setting "metricsReceiveThreaded" for the configuration information. This option should be considered at sites where the maitre-d's slot assignment throughput is diminished by large numbers of inbound metrics reports. (NOTE: This is for Linux and OS X only.)
       New low-latency pings have been implemented for the Alfred maitre-d. This allows sites that are using metrics to rely on them as preflights for costly pings. See the Low-Latency Pings discussion for details. 
 Bug Fixes
	 A bug in the "maitre-d initializing" wait period has been fixed. An incorrect conditional test allowed some slot assignments to occur before the entire initial-wait period had expired.
        A bug that caused the Alfred maitre-d to crash due to a dispatcher submitting "reconstruct current state" messages before disconnect messages from the prior instance of that dispatcher were fully processed has been fixed. 