Monday, January 26, 2015

RANSAC with a-contrario approach

That's quite a mouthful for a blog title. Bear with me, I will explain everything. But first some motivation: I was reading a piece of code which was utilizing a class named ACRANSAC.  It turns out that the acronym stands for RANSAC with a contrario approach. But what is that? I got curious, read upon the method, and here I will describe my understanding.

ACRANSAC is an improved form of RANSAC. Which begs the question:

So, what is RANSAC?

Let's say that you have a bunch of measurements, and some mathematical model you would like to fit to your measurements. The catch is that your measurements are not only noisy, but they contain so called outliers. Outliers are data points inexplicable through your model. Unmodelled effects, egregious measurement errors, mis-identifications, and that short. They will only lead you astray if you try to fit the model to them. On the other hand, inliners are the data points which can be explained through the model, even tough they are distorted by noise. Obviously inliners and outliners doesn't carry badges to tell them apart. You have to separate them yourself while estimating the model parameters. Madness! How can a poor computer make any sense out of this?

Here comes RANSAC to the rescue. The acronym stands for 'random sample and consensus'. The idea is that you randomly sample a few data points. Just the minimal number which is sufficient to fully determine your model. Then you use this minimal sample to calculate the model parameters. Are these parameters good, or totally bogus? Who knows. If you were lucky and draw from the inliners then your estimation can be quite close to the truth. But how to tell if you were lucky?

And this is where the consensus part comes in to the picture. You use your putative model parameters, and check it against all the data points. Are any of them in agreement? (That is their computed error is smaller than a threshold value.) The idea is that the correct model will lead to the biggest consensus set (data points in agreement with the model). So what you do is just repeat the previous process from the random selection. After enough repetition you can be confident that the biggest consensus set you have found so far contains only inliners. You then take those data points, and only those, and fits your model to them. (Using least-squares for example.) Those fitted parameters is what you return with.

The algorithm is not guaranteed to be correct, but the probability of it returning the right values grows with every additional iteration.  If you are interested in the details I kindly point you towards the awesome 'RANSAC for Dummies' tutorial from Marco Zuliani.

Whoa this is kind of fuzzy, a concrete example or two please?

Sure thing! I will show you a toy example, which is easy to follow,  and the one I'm really interested in as well.

The toy example is this: You are trying to fit a line to points. Some of the points are known to belong to the line (plus-minus some noise), but a small percentage of them are totally off. 
Here your minimal sample size is 2, because two points are enough to give you a line. And the error metric we are using to filter the consensus set out would be the geometric distance between each of the points and the hypothetical line.

My application is a bit more involved. I wish to find out the movement of a camera from a video stream. To do that I have to first find the relative camera position between two images. To do so I first have to find corresponding points between these images. Unfortunately that matching is never perfect so from there I will have a lot of outliers. Then I will model the camera's movement using something called the epipolar constraint.  To do that I will need 5 to 8 point pairs minimum. Depends on who you ask. :) As you can see it's a bit more complicated than the toy example, but that's what you get if you want to do something interesting.

So far so good. What about the a contrario thingy?

The trouble with all this is that the method can be quite sensitive about the threshold you are using to select the consensus set. Too low a value and you are missing legitimate inliners, two high value and outliers will skew your final estimation. Parameter tuning can help, of course, but if you plan to explore, let's say, a martian cave with your algorithm then a bit more robustness can't hurt. 

So here is what you do: You draw a minimal sample as before. Fit the putative model, and calculate the error of all the datapoints given that model. So far no surprises. But here comes the twist! Instead of thresholding we order the datapoints by their error from the lowest to the highest (best fit to worst fit).   Then we iterate through them and ask ourselves the following question: Let's assume that these \(k\) datapoints from the very beginning to this element are all the inliners.  Their error is all being less or equal to that of the current point's error \(\epsilon_k\). If these points would be all uniformly sampled independent random points, how many times would we expect to encounter \(k\) of them to fall bellow the \(\epsilon_k\) error treshold by pure chance? 

The less this number is, the more meaningful the putative inliner set is. There is a name of this number. It's called Number of False Alarms, or NFA for short, because it describes how many times we would expect to to encounter this configuration by accident from white noise.

So what we do is find the smallest datapoint in the ordered list with the smallest NFA, then we say that that datapoint and every one before is an inliner. We repeat this until it gets boring*, and the inliner set with the smallest NFA wins!

* Obviously there is a principled approach to choose the right iteration count. I'm just joking there.


Uhmmmm......

I see that you are confused. I were too. Let's see this NFA thing on our toy example, shall we? 
To calculate the NFA we note that \({NFA}=N_{test} P( k, \epsilon ) \) That is we have to calculate the number of tests \(N_{test}\) and the probability that all \(k\) out of \(n\) points fall no further than \(\epsilon\) from the putative line. 
$$ N_{test}=(n-2)\binom {n} {k }\binom {k} {2} $$
\((n-2)\) because that's how many different choices there are for \(k\), since we know that there is at least \(2\) inliners, and there can't be more than \(n\) with which we sign the number of all datapoints. \(  \binom {n} {k } \) signifies the number of different \(k\) sized sets we can form from, and \(\binom {k} {2}\) because that's how many different ways we can choose the minimal sample from the inliners. 
The probability of one point falling closer than \(\epsilon\) to the line can be estimated by taking the ratio of the area of \(\epsilon\)  wide box around the line, and the image area. Since the points are sampled independently we can multiply these probabilities together. That gives us:
$$P( k, \epsilon ) = \left( \frac {\epsilon*l_{line}} {A_{image}} \right)^{k-2}$$

What's great about this formulation is that it expresses the counterbalancing nature of the problem very well. We would like to find the largest set to be inliners if possible, because more datapoints gives us a better estimation accuracy. But adding more points makes the active band around the line wider, therefore grows the possibility that some points just fallen there by accident. Therefore we expect that if we take a snapshot of the algorithm, and plot the NFAs as a function of \(k\), will be 'convexish'. The NFA values will start high for low \(k\)s, then they will decrease and start climbing again. 
On this picture the dots symbolize our datapoints. The blue line is the exact line we wish to find, the red dots are the currently sampled minimal set. The red line shows the putative model. As we can see it goes through exactly the red dots, and it was a lucky draw: the red line is quite close to the blue one but still not an exact match. Here is the log scale plot of NFA values vs inliner counts:
And the inliners belonging to the smallest NFA value are marked with a green dot on the image.



Things I were reading to try to understand all of this:

Automatic Homographic Registration of a Pair of Images, with A Contrario Elimination of Outliers
Lionel Moisan, Pierre Moulon, Pascal Monasse

A probabilistic criterion to detect rigid point matches between two images and estimate the fundamental matrix
L Moisan, B Stival

From Gestalt Theory to Image Analysis. A Probabilistic Approach.
A. Desolneux, L. Moisan, J.-M. Morel. 

1 comment: