Normalizing flows¶
Introduction¶
In a nutshell, a normalizing flow is based on the idea of a set of invertible and differentiable transformations between two distributions. Let's say that \(x\) follows a base distribution with density \(p(x)\), and we consider the bijection \(f: x \to z\). The transformed variable \(z\) will follow a distribution with density

\[ p(z) = p(x) \left| \frac{\partial f}{\partial x} \right|^{-1}. \]
The above relation is easily generalized to the multidimensional case by replacing the partial derivative with the determinant of the Jacobian of the transformation, \(\left| \det \frac{\partial f}{\partial x} \right|\).
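As a quick illustration (not part of the text above, just a numerical check under simple assumptions), the sketch below verifies the one-dimensional relation: with a standard normal base \(x \sim \mathcal{N}(0, 1)\) and the bijection \(f(x) = e^x\), the density obtained from the change of variables coincides with the log-normal density.

```python
import numpy as np
from scipy import stats

# Base variable x ~ N(0, 1) and bijection z = f(x) = exp(x),
# so z should be log-normally distributed.
f_inv = np.log                  # inverse transformation x = f^{-1}(z)
df_dx = np.exp                  # derivative of f with respect to x

z = np.linspace(0.1, 5.0, 200)
x = f_inv(z)

# Change of variables: p(z) = p(x) |df/dx|^{-1}, evaluated at x = f^{-1}(z)
p_z = stats.norm.pdf(x) / np.abs(df_dx(x))

# Agrees with the analytic log-normal density
assert np.allclose(p_z, stats.lognorm.pdf(z, s=1.0))
```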
These transformations can be composed arbitrarily, e.g.

\[ z = (f_2 \circ f_1)(x), \]

which amounts to multiplying the Jacobian determinants,

\[ p(z) = p(x) \left| \det \frac{\partial f_1}{\partial x} \right|^{-1} \left| \det \frac{\partial f_2}{\partial f_1(x)} \right|^{-1}. \]
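For instance, a minimal sketch (using two arbitrary invertible affine maps on \(\mathbb{R}^2\), chosen here purely for illustration) shows that the Jacobian determinant of the composition is the product of the individual Jacobian determinants:

```python
import numpy as np

# Two invertible affine maps f1(x) = A1 x and f2(y) = A2 y; their Jacobians
# are simply the matrices A1 and A2.
A1 = np.array([[2.0, 0.3], [0.0, 1.5]])
A2 = np.array([[1.0, -0.5], [0.2, 0.8]])

det_composed = np.linalg.det(A2 @ A1)                  # Jacobian of f2 ∘ f1
det_product = np.linalg.det(A2) * np.linalg.det(A1)    # product of Jacobians
assert np.isclose(det_composed, det_product)

# Hence, for z = f2(f1(x)):  p(z) = p(x) / (|det A1| |det A2|)
```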
In the context of inference, a generative model can generally be viewed as a (possibly non-linear) transformation from a simple distribution (e.g. a prior) to a more complex one (e.g. the posterior). The idea of the method is that a well-parameterized composition of normalizing flows will approximate the inverse of this transformation.
Typically, the normalizing flows can be represented by a neural network architecture whose parameters \(\phi\) are optimized to maximize a suitable objective function, for instance the log-posterior, or to minimize a suitable distance metric in the case of variational inference.
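As a rough sketch of this idea (a single hypothetical affine flow and a known one-dimensional Gaussian target, written in plain PyTorch rather than this package's interface), the flow parameters can be fitted by minimizing a Monte Carlo estimate of the KL divergence between the flow distribution and the target:

```python
import torch

torch.manual_seed(0)

# Hypothetical one-layer flow: z = exp(log_a) * eps + b
log_a = torch.zeros(1, requires_grad=True)   # log of the flow's scale
b = torch.zeros(1, requires_grad=True)       # shift of the flow
opt = torch.optim.Adam([log_a, b], lr=5e-2)

base = torch.distributions.Normal(0.0, 1.0)     # base distribution
target = torch.distributions.Normal(3.0, 0.5)   # "posterior" to approximate

for step in range(2000):
    eps = base.sample((256,))
    z = torch.exp(log_a) * eps + b               # flow transformation
    # log q(z) = log p_base(eps) - log |dz/d eps|  (change of variables)
    log_q = base.log_prob(eps) - log_a
    loss = (log_q - target.log_prob(z)).mean()   # Monte Carlo KL(q || target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.exp(log_a).item(), b.item())  # should approach 0.5 and 3.0
```

After training, the fitted scale and shift approach the target's standard deviation and mean, i.e. the flow has learned to map the base distribution onto the target.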
More resources¶
For a more formal discussion of the subject, we refer the reader to the review paper by Kobyzev et al. [1].
References¶
[1]: Kobyzev, Ivan, Simon J. D. Prince, and Marcus A. Brubaker. "Normalizing flows: An introduction and review of current methods." IEEE Transactions on Pattern Analysis and Machine Intelligence 43.11 (2020): 3964-3979.