Multiple Importance Sampling
In the previous article I described Importance Sampling, a method for reducing the variance of a Monte Carlo estimator. In this article I will discuss a technique called Multiple Importance Sampling which allows us to combine samples from multiple different probability distributions that we think match the shape of the integrand, reducing variance without introducing bias.
Let's say we're trying to estimate the integral of a product of two functions, $F = \int f(x)\, g(x)\, \mathrm{d}x$. Suppose we have two probability density functions, $p_f$ and $p_g$, which have a similar shape to $f$ and $g$ respectively. It is not obvious how we should sample and weight the random variables to construct an unbiased estimator of $F$ given this information.
Trying Some Stuff
One way to do this would be to importance sample strictly according to $p_f$, completely ignoring $p_g$:

$$\langle F \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)\, g(X_i)}{p_f(X_i)}, \qquad X_i \sim p_f$$
This works well if $g$ is approximately constant over the domain, as the shape of the integrand will match the shape of $p_f$ closely. However if this assumption does not hold, then the scheme will be poor.
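As a concrete illustration, here is a minimal sketch of this first strategy in Python. The particular $f$, $g$ and $p_f$ are made-up examples on $[0, 1]$, chosen so that $p_f$ is proportional to $f$; they are not taken from anything above.

```python
import numpy as np

# Illustrative functions only: p_f is chosen to be proportional to f.
f = lambda x: 2.0 * x
g = lambda x: np.cos(x)
p_f = lambda x: 2.0 * x                 # a valid PDF on [0, 1]

rng = np.random.default_rng(0)
N = 100_000
X = np.sqrt(rng.random(N))              # X_i ~ p_f via inverse CDF (the CDF is x^2)

# Importance sample strictly according to p_f, ignoring the shape of g.
estimate = np.mean(f(X) * g(X) / p_f(X))
print(estimate)                         # ~0.764, the true value of this toy integral
```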
Another idea would be to construct a single probability density function $p$ whose shape matches that of the product $f\, g$, and use importance sampling as before, picking the $X_i$ according to $p$:

$$\langle F \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)\, g(X_i)}{p(X_i)}$$
This is totally reasonable, but in many cases it is not computationally feasible to sample random variables from $p$. Typically, sampling random variables from a given PDF involves inverse transform sampling, which is only possible if you can compute the inverse CDF.
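For instance, a hypothetical density $p(x) = 3x^2$ on $[0, 1]$ is easy to sample because its CDF inverts in closed form, whereas a density proportional to $f(x)\, g(x)$ generally is not:

```python
import numpy as np

# Inverse transform sampling for the toy density p(x) = 3x^2 on [0, 1]:
# the CDF is P(x) = x^3, so X = P^{-1}(U) = U^(1/3) with U ~ Uniform(0, 1).
rng = np.random.default_rng(0)
U = rng.random(10_000)
X = np.cbrt(U)                          # samples distributed according to p

# For a density proportional to f(x)·g(x) we would need both the
# normalisation constant and the inverse CDF, which rarely exist in closed form.
```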
Instead of trying to construct a single PDF, we can sample from each distribution, then average the samples:

$$\langle F \rangle = \frac{1}{2N} \sum_{i=1}^{N} \frac{f(X_i)\, g(X_i)}{p_f(X_i)} + \frac{1}{2N} \sum_{i=1}^{N} \frac{f(Y_i)\, g(Y_i)}{p_g(Y_i)}$$
where the $X_i$ are picked according to $p_f$, and the $Y_i$ according to $p_g$. Since each term of this sum is a Monte Carlo estimator with expected value $\frac{F}{2}$, we sum these to obtain an unbiased estimator of $F$. Note that this means we are now evaluating the integrand $2N$ times, with $N$ samples being taken according to each of $p_f$ and $p_g$.
The problem with this method is that it can lead to unnecessarily high variance. To see this, consider the variance of the estimator:

$$V[\langle F \rangle] = \frac{1}{4}\, V\!\left[ \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)\, g(X_i)}{p_f(X_i)} \right] + \frac{1}{4}\, V\!\left[ \frac{1}{N} \sum_{i=1}^{N} \frac{f(Y_i)\, g(Y_i)}{p_g(Y_i)} \right]$$
Since the variance is additive, if either $p_f$ or $p_g$ is a bad match for the integrand, we will get high variance - even if the other PDF is a perfect match.
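A sketch of this averaging scheme, reusing the made-up $f$, $g$ and densities from the earlier snippet (again purely illustrative):

```python
import numpy as np

f = lambda x: 2.0 * x
g = lambda x: np.cos(x)
p_f = lambda x: 2.0 * x                        # matches the shape of f
p_g = lambda x: np.cos(x) / np.sin(1.0)        # matches the shape of g, normalised on [0, 1]

rng = np.random.default_rng(1)
N = 100_000
X = np.sqrt(rng.random(N))                     # X_i ~ p_f  (inverse CDF: sqrt(u))
Y = np.arcsin(rng.random(N) * np.sin(1.0))     # Y_i ~ p_g  (inverse CDF: arcsin(u·sin 1))

# Each half is an unbiased estimator of F/2, so the sum is unbiased for F,
# but its variance contains a contribution from *both* techniques.
estimate = (np.mean(f(X) * g(X) / p_f(X)) + np.mean(f(Y) * g(Y) / p_g(Y))) / 2.0
print(estimate)                                # ~0.764
```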
Generalizing
Let's try and think about the problem a bit more generally. The last technique was definitely along the right lines.
Weights
It seems reasonable to explore estimators of the form

$$\langle F \rangle = \frac{1}{N} \sum_{i=1}^{N} \left[ w_f(X_i)\, \frac{f(X_i)\, g(X_i)}{p_f(X_i)} + w_g(Y_i)\, \frac{f(Y_i)\, g(Y_i)}{p_g(Y_i)} \right]$$
where $w_f$ and $w_g$ are some 'weighting' functions. In the first strategy, we tried $w_f \equiv 1$ and $w_g \equiv 0$, and in the last strategy we tried $w_f \equiv w_g \equiv \frac{1}{2}$.
The notation at this point is starting to get a bit heavy, so try to keep in mind that we're looking for some expression $\langle F \rangle$ such that $E[\langle F \rangle] = F$. Clearly not all choices of weighting functions leave us with an unbiased estimator. To see what constraints are required on the weighting functions to keep the estimator unbiased, consider the expected value of $\langle F \rangle$:

$$E[\langle F \rangle] = \int w_f(x)\, \frac{f(x)\, g(x)}{p_f(x)}\, p_f(x)\, \mathrm{d}x + \int w_g(x)\, \frac{f(x)\, g(x)}{p_g(x)}\, p_g(x)\, \mathrm{d}x = \int \big( w_f(x) + w_g(x) \big)\, f(x)\, g(x)\, \mathrm{d}x$$
Therefore we can see that we require $w_f(x) + w_g(x) = 1$ (wherever $f(x)\, g(x) \neq 0$) for the estimator to remain unbiased. It is also required that $w_f(x) = 0$ whenever $p_f(x) = 0$, so as to avoid a division by zero when evaluating the first term of $\langle F \rangle$ (and likewise for $w_g$ and $p_g$ and the second term).
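To make the role of the weights concrete, here is a sketch using an arbitrary pair of weighting functions that satisfy both constraints (continuing the made-up example from above); any such pair keeps the estimator unbiased.

```python
import numpy as np

f = lambda x: 2.0 * x
g = lambda x: np.cos(x)
p_f = lambda x: 2.0 * x
p_g = lambda x: np.cos(x) / np.sin(1.0)

# Arbitrary weights: w_f + w_g = 1 everywhere, and w_f vanishes at x = 0,
# the only point where p_f is zero.
w_f = lambda x: x
w_g = lambda x: 1.0 - x

rng = np.random.default_rng(2)
N = 100_000
X = np.sqrt(rng.random(N))                   # X_i ~ p_f
Y = np.arcsin(rng.random(N) * np.sin(1.0))   # Y_i ~ p_g

estimate = np.mean(w_f(X) * f(X) * g(X) / p_f(X) + w_g(Y) * f(Y) * g(Y) / p_g(Y))
print(estimate)                              # still ~0.764, since the weights sum to one
```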
Sampling Techniques
Currently we've been sampling from 2 PDFs (let's call each of these a 'sampling technique'), but we can generalise this. Let $n$ be the number of sampling techniques, with PDFs $p_1, \dots, p_n$ that each match the shape of the integrand in some way. Let $X_{i,j}$ be the $j$-th sample drawn according to $p_i$. Given some set of weighting functions $w_1, \dots, w_n$, we can then rewrite $\langle F \rangle$ as:

$$\langle F \rangle = \sum_{i=1}^{n} \frac{1}{N} \sum_{j=1}^{N} w_i(X_{i,j})\, \frac{f(X_{i,j})\, g(X_{i,j})}{p_i(X_{i,j})}$$
Note that the order of summation is not important here. We still require the two properties from before: $\sum_{i=1}^{n} w_i(x) = 1$ wherever $f(x)\, g(x) \neq 0$, and $w_i(x) = 0$ whenever $p_i(x) = 0$.
Sample Counts
So far we've been taking exactly $N$ samples from each sampling technique, resulting in $nN$ samples in total. This can be generalised to allow more or fewer samples for each sampling technique.
Let $N_i$ be the number of samples taken for sampling technique $i$ (so that the total number of samples used by the estimator is $N = \sum_{i=1}^{n} N_i$). Then:

$$\langle F \rangle = \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} w_i(X_{i,j})\, \frac{f(X_{i,j})\, g(X_{i,j})}{p_i(X_{i,j})}$$
The proof that this results in an unbiased estimator is trivial and left as an exercise to the reader.
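The general estimator translates almost line-for-line into code. The sketch below is my own phrasing of the formula above: every name (the list of PDFs, samplers, per-technique sample counts and weight functions) is hypothetical rather than anything standard.

```python
import numpy as np

def mis_estimate(integrand, pdfs, samplers, counts, weights):
    """A sketch of the general MIS estimator.

    pdfs[i](x)     -- the density p_i
    samplers[i](n) -- draws n samples distributed according to p_i
    counts[i]      -- N_i, the number of samples for technique i
    weights[i](x)  -- w_i(x); must sum to 1 over i wherever the integrand is non-zero
    """
    total = 0.0
    for i in range(len(pdfs)):
        X = samplers[i](counts[i])                                   # X_{i,j} ~ p_i
        total += np.mean(weights[i](X) * integrand(X) / pdfs[i](X))  # (1/N_i) Σ_j w_i f g / p_i
    return total

# Usage with the made-up example from earlier and constant weights 1/3 and 2/3:
integrand = lambda x: 2.0 * x * np.cos(x)
pdfs      = [lambda x: 2.0 * x, lambda x: np.cos(x) / np.sin(1.0)]
rng       = np.random.default_rng(3)
samplers  = [lambda n: np.sqrt(rng.random(n)), lambda n: np.arcsin(rng.random(n) * np.sin(1.0))]
weights   = [lambda x: np.full_like(x, 1.0 / 3.0), lambda x: np.full_like(x, 2.0 / 3.0)]
print(mis_estimate(integrand, pdfs, samplers, [40_000, 60_000], weights))   # ~0.764
```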
Weighting Heuristics
Now that I've introduced the general MIS formulation, we can discuss some different choices of weighting functions.
Balance Heuristic
Veach determined that the following choice of weights is reasonable:

$$w_i(x) = \frac{N_i\, p_i(x)}{\sum_k N_k\, p_k(x)}$$
When inserted into the MIS definition above, this gives:

$$\langle F \rangle = \sum_{i=1}^{n} \sum_{j=1}^{N_i} \frac{f(X_{i,j})\, g(X_{i,j})}{\sum_k N_k\, p_k(X_{i,j})}$$
Veach shows that (assuming non-negative weights) the balance heuristic is close to optimal. Specifically, if $\langle F \rangle_{\text{bal}}$ denotes the balance heuristic estimator and $\langle F \rangle_w$ the estimator obtained from any other valid choice of weights, then:

$$V[\langle F \rangle_{\text{bal}}] \leq V[\langle F \rangle_w] + \left( \frac{1}{\min_i N_i} - \frac{1}{\sum_i N_i} \right) F^2$$
That is to say, even the optimal choice of weights can reduce the variance below that of the balance heuristic by at most the second term on the right hand side. A proof of this is given in the Chapter 9 Appendix of Veach's thesis.
We can interpret the balance heuristic in a more natural way by rewriting it slightly. Let $c_i = N_i / N$, i.e. the fraction of samples given to technique $i$. Then:

$$\begin{aligned} \langle F \rangle &= \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} \frac{N_i\, p_i(X_{i,j})}{\sum_k N_k\, p_k(X_{i,j})} \cdot \frac{f(X_{i,j})\, g(X_{i,j})}{p_i(X_{i,j})} \\ &= \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{N_i} \frac{f(X_{i,j})\, g(X_{i,j})}{\sum_k c_k\, p_k(X_{i,j})} \end{aligned}$$

Note that on the second line the $N_i$ factors have cancelled, leaving a single sum over all $N$ samples. One may notice that this resembles a standard Monte Carlo estimator with probability density function $\bar{p}(x) = \sum_k c_k\, p_k(x)$, which Veach calls the 'combined sample density'. This represents the distribution of a random variable drawn by picking technique $k$ with probability $c_k$ and then sampling from $p_k$.
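A sketch of the balance heuristic in this 'combined sample density' form, again using the made-up densities from before:

```python
import numpy as np

# Illustrative densities and samplers on [0, 1], as in the earlier snippets.
integrand = lambda x: 2.0 * x * np.cos(x)                 # f(x)·g(x)
pdfs      = [lambda x: 2.0 * x, lambda x: np.cos(x) / np.sin(1.0)]
rng       = np.random.default_rng(4)
samplers  = [lambda n: np.sqrt(rng.random(n)), lambda n: np.arcsin(rng.random(n) * np.sin(1.0))]
counts    = [25_000, 75_000]                              # the N_i need not be equal
N = sum(counts)
c = [n_i / N for n_i in counts]                           # c_i = N_i / N

# Balance heuristic: every sample is divided by the combined sample density
# Σ_k c_k p_k(x), and the result is averaged over all N samples.
total = 0.0
for i in range(len(pdfs)):
    X = samplers[i](counts[i])                            # X_{i,j} ~ p_i
    combined = sum(c_k * p_k(X) for c_k, p_k in zip(c, pdfs))
    total += np.sum(integrand(X) / combined)
print(total / N)                                          # ~0.764
```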
Cutoff Heuristic
The balance heuristic can introduce unnecessary variance in the case that one of the sampling techniques is a very close match to the integrand (which is the typical situation in computer graphics). The reason for this is explored in more detail in Veach's thesis on pages 270-273.
The way to address this is to 'sharpen' the weighting functions - i.e. increase the value of high weights while decreasing the value of low weights. For brevity, we will drop the argument $x$ and write $q_i = N_i\, p_i(x)$. Veach defines the cutoff heuristic with parameter $\alpha$ by:

$$w_i = \begin{cases} 0 & \text{if } q_i < \alpha\, q_{\max} \\ \dfrac{q_i}{\sum_k \{\, q_k \mid q_k \geq \alpha\, q_{\max} \,\}} & \text{otherwise} \end{cases}$$

where $q_{\max} = \max_k q_k$.
This has the effect of 'cutting off' any samples that are generated with a PDF less than $\alpha\, q_{\max}$ (and therefore increasing the value of the other weights so as to sum to 1). In the case of $\alpha = 0$ this reduces to the balance heuristic.
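A sketch of the cutoff heuristic as a per-sample weight function (my own translation of the definition above; the function name is made up):

```python
import numpy as np

def cutoff_weights(q, alpha):
    """Cutoff heuristic weights at a single sample point.

    q     -- array of q_i = N_i p_i(x), one entry per technique
    alpha -- cutoff parameter; alpha = 0 recovers the balance heuristic
    """
    q = np.asarray(q, dtype=float)
    keep = q >= alpha * q.max()          # techniques that survive the cutoff
    w = np.where(keep, q, 0.0)
    return w / w.sum()                   # surviving weights renormalised to sum to 1

print(cutoff_weights([0.05, 1.0, 0.8], alpha=0.1))   # the first technique is cut off
```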
Power Heuristic
The power heuristic with parameter $\beta$ is defined as:

$$w_i = \frac{q_i^{\beta}}{\sum_k q_k^{\beta}}$$
In the case of $\beta = 1$, this also reduces to the balance heuristic. For larger values of $\beta$, more sharpening occurs. Veach determined empirically that the value $\beta = 2$ gives good results in typical scenes.
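The power heuristic as a weight function, in the same style (the $\beta = 2$ default follows Veach's recommendation):

```python
import numpy as np

def power_weights(q, beta=2.0):
    """Power heuristic weights at a single sample point.

    q    -- array of q_i = N_i p_i(x), one entry per technique
    beta -- sharpening exponent; beta = 1 recovers the balance heuristic
    """
    q = np.asarray(q, dtype=float) ** beta
    return q / q.sum()

# With two techniques and one sample from each, this is the familiar
# p_f^2 / (p_f^2 + p_g^2) weighting seen in many renderers.
print(power_weights([0.2, 0.8]))   # [0.0588..., 0.9411...], sharper than the balance heuristic's [0.2, 0.8]
```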
Conclusion
Although I have not directly linked multiple importance sampling to computer graphics (see this PBRT section for a brief summary), MIS is such a crucial technique for reducing variance in offline rendering that it was mentioned as a reason for Eric Veach's technical Oscar. You might be amused to watch the video of him going up on stage and receiving the award from Michael B. Jordan and Kristen Bell, who clearly have no understanding of the words they are reading off the teleprompter.
Further Reading
There is a reasonable amount of literature surrounding MIS that is worthy of note. Cornuet et al. (2012) describe a method for adaptive multiple importance sampling that effectively modifies the weights according to samples obtained at runtime. Kondapaneni et al. (2019) revisit some of Veach's derivations, showing that the variance of the balance heuristic estimator can be improved beyond the bound given by Veach if negative weights are allowed. They also describe a technique for estimating the optimal MIS weights, since they cannot be computed directly.
Sbert et al. (2016) define an estimator based on the 'one sample model' of MIS that Veach describes but I have omitted. This involves randomly selecting one of the proposal distributions to sample from, rather than sampling from them all at the same time and combining their contributions. Their method is provably better than the balance heuristic for the same number of samples.