IVF-PQ gains speed by increasing the data compression level and reducing precision, both during build (e.g. via pq_dim and pq_bits) and during search (by adjusting the internal lookup table types). See our blog post for more info.
These knobs lower the achieved recall, and sometimes their penalty is larger than that of the main search parameter, n_probes.
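To give a rough sense of how much the build-time knobs compress the data, here is a minimal sketch of the arithmetic. It assumes each vector is stored as pq_dim codes of pq_bits bits each and that the input is float32. It ignores codebooks, cluster centers, and any padding or alignment in the actual index layout. The function names are hypothetical and not part of the library.

```python
def pq_bytes_per_vector(pq_dim: int, pq_bits: int) -> int:
    """Bytes needed for one PQ-encoded vector, rounded up to a whole byte.

    Assumption: the code for one vector is pq_dim codes of pq_bits bits each.
    """
    return (pq_dim * pq_bits + 7) // 8


def compression_ratio(dim: int, pq_dim: int, pq_bits: int, dtype_bytes: int = 4) -> float:
    """Ratio of raw vector size to PQ-encoded size (codes only).

    dtype_bytes defaults to 4, i.e. float32 input (an assumption).
    """
    return (dim * dtype_bytes) / pq_bytes_per_vector(pq_dim, pq_bits)


# 128-dim float32 vectors: 512 bytes raw.
print(compression_ratio(128, pq_dim=32, pq_bits=8))  # 32 bytes of codes
print(compression_ratio(128, pq_dim=16, pq_bits=5))  # 10 bytes of codes
```

The second configuration stores far fewer bits per vector, which is the kind of setting where the recall penalty tends to show up.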
The problem is that it's hard to know in advance what recall level to expect for various combinations of parameters, input data, and distance metrics. We attempt to do that in the tests, but there's always a compromise: either set the expected recall too high and risk occasional CI failures, or set it too low and risk missing a bug.
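For illustration only, the kind of check being discussed might look like the sketch below. This is not the project's actual test helper; `recall_at_k`, `check_recall`, `min_recall`, and `eps` are hypothetical names, and `eps` is an assumed slack term meant to absorb run-to-run variation.

```python
def recall_at_k(found, expected):
    """Fraction of true neighbors recovered, pooled over all queries.

    found / expected: one sequence of neighbor ids per query.
    """
    hits = 0
    total = 0
    for found_row, expected_row in zip(found, expected):
        # Count ids that appear in both the ANN result and the ground truth.
        hits += len(set(found_row) & set(expected_row))
        total += len(expected_row)
    return hits / total


def check_recall(found, expected, min_recall, eps=0.0):
    """Return (passed, recall); passes when recall >= min_recall - eps."""
    recall = recall_at_k(found, expected)
    return recall >= min_recall - eps, recall


# Two queries, k = 3: the second query misses one true neighbor.
found = [[0, 1, 2], [3, 4, 9]]
expected = [[0, 1, 2], [3, 4, 5]]
passed, recall = check_recall(found, expected, min_recall=0.8)
print(passed, round(recall, 3))
```

The compromise described above lives entirely in the choice of `min_recall` and `eps`: a tight pair makes the test flaky, a loose pair lets regressions through.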
This issue tracks progress on, and updates to, the threshold logic.