Job Priority and the Maitre-d Fairness Scheme


Priorities come into play when there are several users running jobs on the network simultaneously. They determine who is allowed to check out a particular server when it becomes available, and who must wait for another one.

Alfred 3.0 differs substantially from previous versions in the way it uses priority values when allocating servers to dispatchers. Specifically, the new scheme is both more "fair" by default and much easier to tune while the system is running. Both the regular alfred user interface and the web browser interface can be used to change priorities on a per-job basis.

When there are active jobs on the system, the "watch-servers" window shows a brief listing of server usage, including the priority at which each server was acquired (the +N column):


 Animus     --> 1.1hr,  bones@lluvia  +10  "Shot 22b" (netrender) 
 Ormolu     --> 1.9min, david@sssph    +5  "Frame 3"  (netrender) 
 Grotto1    -->  28sec, david@sssph    +5  "Frame 5"  (netrender) 
 Grotto2    --> 5.3min, doc@spatula    +1  "mirrors1" (rendrib)
 Drizzle    --> 2.0min, david@sssph    +5  "Frame 2"  (netrender) 
 Alembic    --> 1.9min, david@sssph    +5  "Frame 4"  (netrender) 
 Cistern    --          butch(nimby) - shields raised
 [...]

Dispatchers get their default priority settings from the Crew definitions in the master schedule.

In the following examples there are three Crews defined, all of which are currently enabled. Each Crew gives a group of users access to a group of servers during blocks of time at a particular priority. Users and servers may appear in several Crew definitions. A user can access a particular server if any one of their Crews permits it. Similarly, where enabled Crews overlap, the user's priority on a given server is the maximum of the priorities those Crews define.
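The merge rule just described can be sketched in a few lines of Python: access is granted if any enabled Crew permits it, and the maximum priority wins where Crews overlap. This is a minimal illustration, not Alfred's implementation. The `Crew` record, its fields, `effective_priorities`, and the user "david" are hypothetical. The +1 priority for "pool" is an assumed value, and the server lists are simplified. The other crew names and priorities echo the huntgroup example that follows.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Crew:
    """Hypothetical Crew record: who may use which servers, at what priority."""
    name: str
    users: Set[str]
    servers: Set[str]
    priority: float
    enabled: bool = True


def effective_priorities(user: str, crews: List[Crew]) -> Dict[str, float]:
    """Return {server: priority} for every server the user may access.

    A server is accessible if any enabled Crew lists both the user and the
    server; where several Crews overlap, the highest priority wins.
    """
    result: Dict[str, float] = {}
    for crew in crews:
        if not crew.enabled or user not in crew.users:
            continue
        for server in crew.servers:
            result[server] = max(result.get(server, crew.priority), crew.priority)
    return result


# Illustrative crews; "pool" at +1 is an assumed value.
crews = [
    Crew("pool", {"david"}, {"Cerberus-1"}, 1),
    Crew("development", {"david"}, {"Cerberus-1", "Codger", "Grotto"}, 5),
    Crew("production", {"david"}, {"Cerberus-1"}, 10),
]
prios = effective_priorities("david", crews)
print(prios["Cerberus-1"], prios["Codger"])  # 10 5
```

Disabling a Crew (setting `enabled = False`) simply drops its grants from the merge. In this sketch, Cerberus-1 would then fall back to the +5 from "development".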

 

Users can modify the default server access and priorities defined for them in Crews using the "huntgroup" interface.

In the example above, none of the user's Crews are selected, and so there are no accessible servers showing under the "Servers" tab.

When a Crew is enabled by clicking on its name, all of the servers associated with that Crew become available. For example, when the crew named "pool" is selected, several server entries become enabled. Note the "clock" icon next to these entries: it means that this Crew's servers are inaccessible at the moment because the Crew definition restricts access to certain times of day (after 6:00PM, for example). The user is still allowed to select this Crew because its servers may become available over the course of a job.

When the "development" crew is added, more servers become available, including some that are not time-locked and can be used right now. Note especially the server (slot) named "Cerberus-1": it was accessible but time-locked via the "pool" crew. Because it is also listed in the "development" crew definition, which isn't time-locked, it becomes available now.
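The time-lock behavior can be modeled as a daily access window per Crew. A server is usable right now if any selected Crew lists it and that Crew's window is open. The sketch below works under stated assumptions. `TimedCrew`, `usable_now`, and the server "Spare-1" are hypothetical names. The 08:00 end of the "pool" window is an assumed value; the text only gives "after 6:00PM" as an example.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional, Set


@dataclass
class TimedCrew:
    """Hypothetical Crew with an optional daily access window."""
    name: str
    servers: Set[str]
    start: Optional[time] = None  # None means the Crew is not time-locked
    end: Optional[time] = None

    def is_open(self, now: time) -> bool:
        if self.start is None or self.end is None:
            return True
        if self.start <= self.end:
            return self.start <= now < self.end
        # The window wraps past midnight, e.g. 18:00 until 08:00.
        return now >= self.start or now < self.end


def usable_now(server: str, crews: List[TimedCrew], now: time) -> bool:
    """A server is usable if any selected Crew lists it and is open right now."""
    return any(server in c.servers and c.is_open(now) for c in crews)


selected = [
    TimedCrew("pool", {"Cerberus-1", "Spare-1"}, time(18, 0), time(8, 0)),
    TimedCrew("development", {"Cerberus-1", "Codger", "Grotto"}),
]
print(usable_now("Cerberus-1", selected, time(14, 0)))  # True, via "development"
print(usable_now("Spare-1", selected, time(14, 0)))     # False, "pool" is closed
```

The Crew stays selectable even while its window is closed, matching the point that its servers may open up over the course of a job.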

When the "production" crew is also enabled, even more servers become available. Again, note that the user gets the highest priority available from the overlapping crew definitions: Cerberus-1 was enabled at +5 by the "development" crew but is now available at +10, the default priority of "production". The servers named "Codger" and "Grotto" are unchanged because they are listed only in the "development" crew.

Finally, the Server Priority Bias field is used to add 100 to the user's default priority on each of the servers. Some crew definitions may restrict the maximum change that a user can apply. The bias can also be negative, which lowers the user's priority on each server.
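One way to picture the bias is to add it to each granting Crew's priority and clamp it where that Crew limits the change, then keep the highest result. This is a hedged sketch, not Alfred's exact rule. The symmetric clamp to [-max_bias, +max_bias], the per-Crew application before taking the maximum, and the names `biased_priority` and `merged_priority` are all assumptions. The caps on "development" and "production" are invented for illustration. The sketch also does not stop a negative bias from driving a priority to zero or below.

```python
from typing import List, Optional, Tuple


def biased_priority(crew_priority: float, bias: float,
                    max_bias: Optional[float] = None) -> float:
    """Add a per-job bias to one Crew's priority.

    If the Crew restricts the allowed change, the bias is clamped to
    the range [-max_bias, +max_bias] (an assumed, symmetric rule).
    """
    if max_bias is not None:
        bias = max(-max_bias, min(bias, max_bias))
    return crew_priority + bias


def merged_priority(grants: List[Tuple[float, Optional[float]]],
                    bias: float) -> float:
    """Apply the bias to each granting Crew, then keep the highest result.

    grants holds (crew_priority, max_bias) for every enabled Crew that
    gives this user access to the server in question.
    """
    return max(biased_priority(p, bias, m) for p, m in grants)


# Cerberus-1 via "development" (+5, bias capped at 50: assumed) and
# "production" (+10, unrestricted: assumed).
cerberus = [(5, 50), (10, None)]
print(merged_priority(cerberus, 100))  # 110
print(merged_priority(cerberus, -3))   # 7
```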

 


The Goal of Fairness-Based Priorities

So, given a set of priorities, how are they applied?

The system was designed with a few basic goals in mind:


How Priorities Relate to Percentage of Servers

  1. the priorities of all competing jobs are summed

  2. each job receives the fraction:  (itsPriority / Sum_of_Priorities)
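The two steps above translate directly into code. In this sketch, `fair_shares` and `ideal_counts` are hypothetical helper names. The priorities and the 12-server pool are invented values for illustration, and the sketch assumes every job competes for the same servers.

```python
from typing import Dict


def fair_shares(priorities: Dict[str, float]) -> Dict[str, float]:
    """Step 1: sum the priorities; step 2: give each job p / sum."""
    total = sum(priorities.values())
    return {job: p / total for job, p in priorities.items()}


def ideal_counts(priorities: Dict[str, float], n_servers: int) -> Dict[str, float]:
    """Average number of contested servers each job should hold."""
    total = sum(priorities.values())
    return {job: n_servers * p / total for job, p in priorities.items()}


jobs = {"A": 1, "B": 3, "C": 2}   # user*priority: A*1, B*3, C*2
shares = fair_shares(jobs)        # A: 1/6, B: 1/2, C: 1/3
print(ideal_counts(jobs, 12))     # {'A': 2.0, 'B': 6.0, 'C': 4.0}
```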

Here are some simple examples. For clarity, assume that each user's priority is the same on all of their servers, and that the same servers are in contention among all jobs. The notation is: user*priority

So, priorities determine the relative allocation of servers. A job's (average) allocation relative to another job's is simply the ratio of their respective priorities. In the example above, A will always be given half as many servers as C when they are competing. This is independent of the number of other jobs on the queue: the actual number of servers each job holds will vary with the total number of competing jobs, but any two jobs will maintain a constant ratio of servers held.

The actual priority values are arbitrary; in fact, they can be any positive real number. Large numbers can be used to rush a job through the system, since such a job will always receive a large percentage of servers relative to other jobs. Two rush jobs with the same high priority will share the server pool.
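Exact fractions make both properties visible. First, any two jobs keep the same ratio no matter who else joins. Second, a pair of equal high-priority rush jobs splits most of the pool between them. This is an illustrative sketch. The job names and priorities (including the rush jobs "R1" and "R2" at 100) are hypothetical. It uses integer priorities only so that Python's `fractions.Fraction` can keep the arithmetic exact, even though real priorities may be any positive real number.

```python
from fractions import Fraction
from typing import Dict


def shares(priorities: Dict[str, int]) -> Dict[str, Fraction]:
    """Exact fair shares p / sum(p), using Fraction to avoid rounding noise."""
    total = sum(priorities.values())
    return {job: Fraction(p, total) for job, p in priorities.items()}


alone = shares({"A": 1, "C": 2})
crowded = shares({"A": 1, "C": 2, "B": 3, "D": 6})
rushed = shares({"A": 1, "C": 2, "R1": 100, "R2": 100})

# Each job's share shrinks as competitors arrive...
print(alone["A"], crowded["A"], rushed["A"])               # 1/3 1/12 1/203
# ...but A always gets half as many servers as C.
print(alone["A"] / alone["C"], rushed["A"] / rushed["C"])  # 1/2 1/2
# The two equal-priority rush jobs split most of the pool between them.
print(rushed["R1"], rushed["R2"])                          # 100/203 100/203
```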

It is important to remember that a given job can have different priorities on each currently available server. This is due to the (desirable) scheduling feature which allows each user to be a member of several active Crews simultaneously. For example, everyone might have access to a general pool of processors at a low priority, and in addition people on particular productions might be given access to a subset of these machines at a high priority, effectively guaranteeing them access when they need it but allowing the cycles to be used by others when they don't.

For a given scheduling period, the current priority values are found by overlaying the crew definitions and taking the highest priority available for each user. The user-defined per-job priority bias modifies these defaults: it is added to the crew values as they are merged together (some crews also enforce a maximum allowed range of bias values). As a result, the percentages at the heart of the fairness algorithm are always computed with respect to a particular available server. The maitre-d, which performs these computations, looks at the list of dispatchers competing for a server and asks: which of these jobs needs the server the most? Here "need" is defined as the gap between a dispatcher's ideal percentage of servers, given its priority on the server in question, and the percentage it actually holds.
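The maitre-d's question can be sketched as an argmax over need. For each competing dispatcher, need is the ideal fraction from its priority on the freed server minus the fraction of servers it currently holds. This is a simplified, hypothetical model, not the maitre-d's actual code. The `Dispatcher` record, `need`, and `pick_dispatcher` are invented names. The denominator for "actual" (servers held by the current competitors) is an assumption, since the text does not define it. The example treats the visible rows of the watch-servers listing as the whole picture and assumes each job has the same priority on the freed server as shown there.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dispatcher:
    """Hypothetical view of one job competing for a freed server."""
    name: str
    priority: float  # this job's priority on the server being assigned
    held: int        # servers the job currently holds


def need(d: Dispatcher, competitors: List[Dispatcher]) -> float:
    """Ideal fraction (priority / sum of priorities) minus actual fraction held."""
    total_priority = sum(c.priority for c in competitors)
    total_held = sum(c.held for c in competitors)
    ideal = d.priority / total_priority
    actual = d.held / total_held if total_held else 0.0
    return ideal - actual


def pick_dispatcher(competitors: List[Dispatcher]) -> Dispatcher:
    """Give the server to the dispatcher with the largest need."""
    return max(competitors, key=lambda d: need(d, competitors))


# Visible rows of the watch-servers listing, treated as the whole picture.
waiting = [
    Dispatcher("bones", 10, 1),
    Dispatcher("david", 5, 4),
    Dispatcher("doc", 1, 1),
]
print(pick_dispatcher(waiting).name)  # bones
```

Here bones, at +10 but holding one server, is furthest below its ideal share. If bones already held ten servers, the same computation would favor david instead.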

 

Pixar Animation Studios
(510) 752-3000 (voice)   (510) 752-3151 (fax)
Copyright © 1996- Pixar. All rights reserved.
RenderMan® is a registered trademark of Pixar.