Multi-threaded RenderMan 
March, 2007

 
Introduction

Performance Expectations

Performance Tuning


1 Introduction to Multi-threaded RenderMan

RenderMan now supports multi-threaded rendering. Within a single prman invocation, multiple threads process the image simultaneously. If multiple processing units are available, this usually results in a faster time to completion for a given set of input data. The primary advantage of multi-threaded rendering over multi-process rendering (the method employed by netrender and the -p:n modes) is a reduced memory footprint. When multi-processing, each process consumes nearly the same amount of memory per frame as a single process would. With multi-threading, all threads combined use only slightly more memory than a single process.
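The memory argument above can be made concrete with a rough back-of-the-envelope model. This is only a sketch: the function names, the 4000 MB scene, and the 100 MB per-thread overhead are hypothetical values chosen for illustration, not measurements of prman.

```python
def multiprocess_memory_mb(single_process_mb: float, workers: int) -> float:
    """Multi-process: each process holds nearly a full copy of the scene."""
    return single_process_mb * workers


def multithread_memory_mb(single_process_mb: float, workers: int,
                          per_thread_overhead_mb: float) -> float:
    """Multi-thread: one shared scene plus a small overhead per extra thread.

    per_thread_overhead_mb is an assumed figure; the real overhead is
    scene dependent.
    """
    return single_process_mb + (workers - 1) * per_thread_overhead_mb


# Hypothetical scene: 4000 MB in a single process, rendered with 4 workers.
processes = multiprocess_memory_mb(4000, 4)
threads = multithread_memory_mb(4000, 4, per_thread_overhead_mb=100)
print(f"multi-process: ~{processes:.0f} MB, multi-thread: ~{threads:.0f} MB")
```

Under these assumed numbers the multi-process render needs roughly four times the memory of a single process, while the multi-threaded render stays close to the single-process figure.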

By default, prman will determine the number of processing units available and will operate in multi-threaded mode. A single invocation of prman will only consume one license by default (to prevent consumption of many licenses on machines with many processing units). The user can override the default behavior by specifying the number of processing units that will be used with the -t:n option, where n is the number of processing units the user wants the renderer to utilize.

Because prman consumes much less memory when using threads than when launching multiple processes, the default behavior for prman is to only utilize threading. If the user specifies -p:n, it will be interpreted as -t:n. If both -t and -p are used, only the value provided via -t will be considered. This default behavior can be reverted to the old multi-process behavior by placing the following in the rendermn.ini file:

/prman/parallelmode  multiprocess

The user can also change the default number of processing units with a setting in the rendermn.ini file:

/prman/nprocessors  1

This is useful if one wants to override the default behavior, which queries the system for the number of processing units available. If the nprocessors setting is found, it will use that setting instead of querying the system. Note, this setting affects the default number of processing units utilized when using the -p:n option as well.
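The precedence described in this section can be summarized in a small model. This is a sketch of the documented behavior in the default (threaded) mode, not prman's actual code: the function and parameter names are hypothetical, and it does not model licensing or any cap on the default thread count.

```python
import os
from typing import Optional


def resolve_thread_count(t: Optional[int] = None,
                         p: Optional[int] = None,
                         ini_nprocessors: Optional[int] = None,
                         system_units: Optional[int] = None) -> int:
    """Return the number of threads implied by the settings described above.

    t               -- value given with -t:n, if any
    p               -- value given with -p:n, if any (treated as -t:n)
    ini_nprocessors -- /prman/nprocessors from rendermn.ini, if set
    system_units    -- processing units reported by the system
    """
    if t is not None:
        return t                    # -t wins when both -t and -p are given
    if p is not None:
        return p                    # -p:n is interpreted as -t:n
    if ini_nprocessors is not None:
        return ini_nprocessors      # ini setting replaces the system query
    if system_units is None:
        system_units = os.cpu_count() or 1  # stand-in for prman's own query
    return system_units


print(resolve_thread_count(t=4, p=8))                           # 4
print(resolve_thread_count(ini_nprocessors=1, system_units=16))  # 1
```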


2 Performance Expectations

prman in multi-threaded mode should almost always render faster than prman in single-threaded mode. The amount of speedup is highly scene dependent, because scenes make varying use of resources that must be shared between processing units (RAM and disk). As a general rule of thumb, the scalability of multi-threaded rendering improves with more complicated scenes, which is fortunate since it is exactly those kinds of scenes that most need speeding up.

In the current 13.5 release, there is one area where PRMan will not exploit multi-threading to its fullest potential: when very many RiProcedurals are used, such that the render runtime is dominated by the cost of evaluating user procedurals (for example, very expensive Dynamic Shared Objects or RunProgram-style procedurals). This will be improved in future releases.

When using prman in multi-threaded mode, one has to reconcile real-time (also known as elapsed-time) statistics with user-time statistics. When multi-threading, user-time is the total amount of processor time used by all processing units on the system. This will normally be more than the real-time because multiple processing units are active simultaneously.
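One way to reconcile the two statistics is to divide user-time by real-time, which gives the average number of processing units that were busy during the render. The sketch below uses hypothetical function names and made-up timings; it measures processor utilization, not speedup relative to a single-threaded render.

```python
def average_busy_units(user_seconds: float, real_seconds: float) -> float:
    """Average number of processing units kept busy over the render."""
    if real_seconds <= 0:
        raise ValueError("real_seconds must be positive")
    return user_seconds / real_seconds


def utilization(user_seconds: float, real_seconds: float, threads: int) -> float:
    """Fraction of the requested processing units that were kept busy."""
    return average_busy_units(user_seconds, real_seconds) / threads


# Hypothetical statistics from a 4-thread render.
busy = average_busy_units(user_seconds=340.0, real_seconds=100.0)
print(f"average busy units: {busy:.2f}")
print(f"utilization: {utilization(340.0, 100.0, threads=4):.0%}")
```

A ratio close to the thread count suggests the threads were rarely idle; a ratio near 1 suggests the render was effectively serialized for most of its run.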

In multi-threaded mode prman should use much less memory per scene than multi-processing mode. However, some of the rendering subsystems will utilize more memory in multi-threaded mode than single-threaded mode to maintain performance. One subsystem that will utilize more memory is texturing (both 2D and 3D). The texturing system will create texture caches per processing unit that will consume slightly more memory than an invocation of prman that utilizes only a single processor. Likewise, ray tracing will create a geometry cache per processor and will consume slightly more memory in multi-threaded mode than in single-threaded mode. Finally, hiding can occur simultaneously in multiple buckets, which will lead to increased visible point memory footprint; this is noticeable especially for scenes with high depth complexity.



If a scene employs shaders that use old-style RSL plugins, those plugins should be ported to the new format. Old-style RSL plugins cause the multi-threaded renderer to take a lock, which allows only one old-style RSL plugin to execute at a time. This can significantly reduce the effectiveness of the multi-threaded renderer.


3 Performance Tuning

There are several options that can be used to control the performance of multi-threaded prman. The first option, of course, is -t:n, where the user specifies n processing units to be used. If the system has more than two processing units (the default is to utilize up to two), a user can specify more processing units (which will, in turn, use more licenses), and performance should increase with the number of processing units utilized. Specifying more processing units than are physically available on the system will most likely result in a slower render time.



Generally, PRMan's multi-threaded rendering does not benefit from Hyper-Threading Technology or other similar "virtual processor" technology. This is because PRMan tends to be compute bound, not I/O bound. To put it another way: PRMan already uses the processing core to the fullest extent, leaving little opportunity for the processor to schedule another thread, which is what Hyper-Threading needs to work efficiently. We recommend that, when specifying the number of processing units to be used by PRMan, you count the actual CPUs and not the virtual CPUs presented by Hyper-Threading.
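The recommendation above can be sketched as a small helper that derives a thread count from the physical core count and builds the matching command line. The function names, the file name scene.rib, and the default of two virtual CPUs per core are assumptions for illustration; check how your own machines report their CPUs.

```python
from typing import List


def physical_core_count(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores from the logical (virtual) CPU count.

    threads_per_core=2 assumes Hyper-Threading is enabled; pass 1 if the
    machine exposes no virtual CPUs.
    """
    return max(1, logical_cpus // threads_per_core)


def prman_command(rib_path: str, threads: int) -> List[str]:
    """Build an argument list of the form: prman -t:n file.rib"""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["prman", f"-t:{threads}", rib_path]


# Hypothetical machine reporting 16 logical CPUs with Hyper-Threading on.
cores = physical_core_count(16)
print(" ".join(prman_command("scene.rib", cores)))  # prman -t:8 scene.rib
```

The resulting list can be passed to subprocess.run on a machine where prman is installed and on the PATH.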

Another option that can be used to improve the efficiency of multi-threaded prman is the bucket size. This can be controlled with the option:

Option "limits" "bucketsize" [16 16]

As in the single-threaded case, the renderer bucket size controls the trade-off between speed and memory usage. Increasing the bucket size will generally result in a faster, more efficient render, at the cost of increased memory utilization. Decreasing the bucket size will decrease the efficiency but also decrease the memory footprint. In the multi-threaded case, this is still true; in fact, the effects are magnified. Increasing the bucket size will decrease the likelihood of contention between threads for shared resources (increasing speed and efficiency), while at the same time increasing the amount of memory used. In particular, note that the amount of visible point memory is directly proportional to the bucket size times the number of threads. This is an issue for scenes with high depth complexity. For multi-threaded rendering, we recommend the same basic guidelines as for single-threaded rendering: leave the bucketsize limit at the default setting unless memory consumption becomes an issue, at which point decrease bucketsize gradually until memory consumption becomes acceptable.
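The proportionality between visible point memory, bucket size, and thread count can be illustrated with a relative scaling factor. This sketch assumes that "bucket size" means the bucket's pixel area (width times height) and compares against a 16 by 16 bucket on one thread; the function name is hypothetical and the result is a ratio, not a byte count.

```python
from typing import Tuple


def visible_point_memory_factor(bucket: Tuple[int, int], threads: int,
                                ref_bucket: Tuple[int, int] = (16, 16),
                                ref_threads: int = 1) -> float:
    """Visible point memory relative to a reference configuration.

    Assumes memory is proportional to bucket area times thread count.
    """
    area = bucket[0] * bucket[1]
    ref_area = ref_bucket[0] * ref_bucket[1]
    return (area * threads) / (ref_area * ref_threads)


for size in [(8, 8), (16, 16), (32, 32)]:
    factor = visible_point_memory_factor(size, threads=4)
    print(f"bucketsize {size} with 4 threads: {factor:g}x the reference")
```

Under this assumption, halving each bucket dimension offsets a fourfold increase in thread count, which is why decreasing bucketsize is the suggested remedy when memory becomes a problem.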

The ray tracer is tuned to be very effective when multi-threading. It will allocate by default a 60 Mbyte geometry cache per thread. This is controlled with the option:

Option "limits" "int geocachememory" [61440]

If a smaller cache per thread is required it can be specified with this option, but this will significantly impact the speed of ray tracing.
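Because the geometry cache is allocated per thread, its total size grows with the thread count. The sketch below assumes the geocachememory value is expressed in kilobytes, which is inferred from the default of 61440 corresponding to 60 Mbyte; the function name is hypothetical.

```python
KB_PER_MB = 1024


def geocache_total_mb(threads: int, geocachememory_kb: int = 61440) -> float:
    """Total ray tracing geometry cache across all threads, in Mbyte.

    geocachememory_kb is the per-thread value, assumed to be in kilobytes.
    """
    return threads * geocachememory_kb / KB_PER_MB


for n in (1, 2, 8):
    print(f"{n} thread(s): {geocache_total_mb(n):g} Mbyte of geometry cache")
```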


 

Pixar Animation Studios
(510) 752-3000 (voice)  (510) 752-3151 (fax)
Copyright © 1996- Pixar. All rights reserved.
RenderMan® is a registered trademark of Pixar.