
Neural Ratio Estimation (NRE)

Introduction

As we have seen, the output of the prior + simulator pipeline is an array of pairs $(x_i, \theta_i)$ drawn from the joint distribution

$$ (x_i, \theta_i) \sim p(x, \theta) = p(x \mid \theta)\, p(\theta). $$

We now consider the shuffled pairs $(x_i, \theta_j)$ with $i \neq j$, where $x_i$ is the output of the forward model applied to $\theta_i$. These pairs are sampled from the product distribution

$$ (x_i, \theta_j) \sim p(x)\, p(\theta). $$
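
To make this concrete, the sketch below draws joint and shuffled pairs for a hypothetical one-dimensional toy model; the Gaussian prior, Gaussian simulator, and sample size are illustrative assumptions, not part of the method described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: prior theta ~ N(0, 1), simulator x | theta ~ N(theta, 0.5^2).
n = 10_000
theta = rng.normal(loc=0.0, scale=1.0, size=n)   # theta_i ~ p(theta)
x = rng.normal(loc=theta, scale=0.5)             # x_i ~ p(x | theta_i)

# Joint pairs (x_i, theta_i) ~ p(x, theta): keep the original alignment.
joint_pairs = np.stack([x, theta], axis=1)

# Shuffled pairs (x_i, theta_j) ~ p(x) p(theta): permute theta to break the pairing.
# For large n, the few accidental i == j fixed points of the permutation are negligible.
perm = rng.permutation(n)
shuffled_pairs = np.stack([x, theta[perm]], axis=1)
```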

The idea of NRE is to train a classifier to learn the ratio

$$ r(x, \theta) \equiv \frac{p(x, \theta)}{p(x)\, p(\theta)} = \frac{p(x \mid \theta)}{p(x)}, $$

which is the likelihood-to-evidence ratio. Applying Bayes' theorem connects $r(x, \theta)$ to the Bayesian inverse problem:

$$ r(x, \theta) = \frac{p(x, \theta)}{p(x)\, p(\theta)} = \frac{p(\theta \mid x)}{p(\theta)}. $$

In other words, $r(x, \theta)$ equals the posterior-to-prior ratio. Therefore, one can obtain samples from the posterior distribution of $\theta$ given an approximation of $r(x, \theta)$ and samples from the prior $p(\theta)$.
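
One possible way to turn an approximate ratio into posterior samples, assumed here for illustration rather than prescribed by the text, is sampling-importance-resampling: reweight prior draws by $r(x, \theta)$ and resample, since $p(\theta \mid x) = r(x, \theta)\, p(\theta)$. The callables `log_r` and `prior_sample` below are hypothetical placeholders for a trained ratio estimator and the prior sampler.

```python
import numpy as np

def posterior_samples(x_obs, log_r, prior_sample, n_prior=100_000, n_post=1_000, rng=None):
    """Resample prior draws with weights proportional to r(x_obs, theta).

    Hypothetical callables:
      log_r(x, theta)  -> array of log-ratio estimates, one per theta
      prior_sample(n)  -> n draws from p(theta)
    """
    rng = rng or np.random.default_rng()
    theta = prior_sample(n_prior)                 # theta_k ~ p(theta)
    logw = log_r(x_obs, theta)                    # log r(x_obs, theta_k)
    w = np.exp(logw - logw.max())                 # stabilise before normalising
    w /= w.sum()
    idx = rng.choice(n_prior, size=n_post, p=w)   # resample in proportion to r
    return theta[idx]
```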

More specifically, the binary classifier $d_\phi(x, \theta)$ with learnable parameters $\phi$ is trained to distinguish the $(x_i, \theta_i)$ pairs sampled from the joint distribution from their shuffled counterparts. We label pairs with a variable $y$, such that $y = 1$ refers to joint pairs and $y = 0$ to shuffled pairs. The classifier is trained to approximate

$$ d_\phi(x, \theta) \approx p(y = 1 \mid x, \theta) = \frac{p(x, \theta \mid y = 1)\, p(y = 1)}{p(x, \theta \mid y = 0)\, p(y = 0) + p(x, \theta \mid y = 1)\, p(y = 1)} = \frac{p(x, \theta)}{p(x)\, p(\theta) + p(x, \theta)} = \frac{r(x, \theta)}{1 + r(x, \theta)}, $$

where we used $p(y = 0) = p(y = 1) = 0.5$.
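
Because $d_\phi = r / (1 + r)$ is the sigmoid of the network's pre-activation output, the classifier's logit directly estimates $\log r(x, \theta)$. The PyTorch sketch below is one possible (assumed) architecture, a small MLP acting on the concatenated $(x, \theta)$; the hidden sizes and activation are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class RatioClassifier(nn.Module):
    """Hypothetical MLP d_phi(x, theta). The returned logit equals log r(x, theta),
    since d = sigmoid(logit) = r / (1 + r) implies logit = log r."""

    def __init__(self, x_dim: int, theta_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + theta_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
        # Logit of the classifier, i.e. an estimate of log r(x, theta).
        return self.net(torch.cat([x, theta], dim=-1)).squeeze(-1)

    def classifier_output(self, x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
        # d_phi(x, theta) in (0, 1).
        return torch.sigmoid(self.forward(x, theta))
```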

The classifier learns the parameters $\phi$ by minimizing the binary cross-entropy, defined as

$$ \mathcal{L}(d_\phi) = -\int \mathrm{d}\theta\, \mathrm{d}x \,\Big[ p(x, \theta)\, \log d_\phi(x, \theta) + p(x)\, p(\theta)\, \log\big(1 - d_\phi(x, \theta)\big) \Big]. $$
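In practice the integral is replaced by a Monte-Carlo estimate over a mini-batch: joint pairs keep their original alignment (label $y = 1$), and an in-batch permutation of $\theta$ provides the shuffled pairs (label $y = 0$). The sketch below assumes a model such as the classifier above, whose forward pass returns the logit.

```python
import torch
import torch.nn.functional as F

def nre_loss(model, x, theta):
    """Mini-batch Monte-Carlo estimate of the binary cross-entropy above.

    `model(x, theta)` is assumed to return the classifier logit (see the
    previous sketch). Joint pairs keep the (x_i, theta_i) alignment; an
    in-batch permutation of theta emulates draws from p(x) p(theta).
    """
    perm = torch.randperm(theta.shape[0])
    logit_joint = model(x, theta)            # pairs from p(x, theta), label y = 1
    logit_shuffled = model(x, theta[perm])   # pairs from p(x) p(theta), label y = 0
    return (
        F.binary_cross_entropy_with_logits(logit_joint, torch.ones_like(logit_joint))
        + F.binary_cross_entropy_with_logits(logit_shuffled, torch.zeros_like(logit_shuffled))
    ) / 2
```

A standard optimiser (e.g. Adam) stepping on this loss then fits $\phi$; after training, the logit evaluated at $(x, \theta)$ serves as the estimate of $\log r(x, \theta)$.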
