Flow Models and Flow Matching
Flow Models (Normalizing Flows)
Concept:
A “flow model,” more precisely known as a Normalizing Flow (NF), is a type of generative model that explicitly learns a probability distribution by transforming a simple, known distribution (e.g., a standard Gaussian) into a complex, target data distribution (e.g., images, audio) through a sequence of invertible transformations.
Key Characteristics:
- Invertibility: Each transformation in the sequence must be invertible, meaning you can easily go from the simple distribution to the complex data distribution and back. This is crucial for both sampling and density estimation.
- Jacobian Determinant: To compute the likelihood of a data point, NFs rely on the change-of-variable formula, which requires calculating the determinant of the Jacobian matrix of the transformation. This can be computationally expensive, and designing architectures that allow for efficient Jacobian calculation is a major challenge in NF research (e.g., NICE, RealNVP, Glow); see the coupling-layer sketch after this list.
- Explicit Likelihood: NFs explicitly model the probability density function of the data, allowing for direct likelihood evaluation. This is an advantage over GANs, which don’t directly model likelihood.
- Training: Typically trained by maximizing the likelihood of the training data.
- Continuous Normalizing Flows (CNFs): A special type of Normalizing Flow where the sequence of discrete transformations is replaced by a continuous ordinary differential equation (ODE). The transformation is defined by a learned “velocity field” that describes how data points move through time. This can offer greater flexibility and expressivity.
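To make the invertibility and Jacobian points concrete, here is a minimal sketch of an affine coupling layer in the RealNVP style. This is illustrative PyTorch, not the actual NICE/RealNVP/Glow code; the class name, layer sizes, and the `tanh` squashing of the scale are my own choices. Because only half of the dimensions are transformed, the Jacobian is triangular and its log-determinant is just a sum of the predicted log-scales.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Illustrative invertible coupling layer with a cheap log|det J|."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        # Small MLP predicting per-dimension log-scale s and shift t from x1.
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, (dim - self.half) * 2),
        )

    def forward(self, x):
        # Data -> latent; also returns log|det J| for the likelihood.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                           # keep scales well-behaved
        z2 = x2 * torch.exp(s) + t                  # only x2 is transformed
        log_det = s.sum(dim=1)                      # triangular Jacobian => sum
        return torch.cat([x1, z2], dim=1), log_det

    def inverse(self, z):
        # Latent -> data: exact closed-form inverse, no iterative solve.
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)
```

Training such a stack then maximizes the change-of-variable likelihood, $ \log p(\mathbf{x}) = \log p_{\text{base}}(\mathbf{z}) + \sum \log \lvert \det J \rvert $ summed over layers.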
Analogy: Imagine stretching, squishing, and twisting a simple rubber sheet (your simple distribution) to perfectly match the shape of a complex object (your data distribution).
Flow Matching
Concept:
Flow Matching (FM) is a newer paradigm for training generative models, particularly Continuous Normalizing Flows (CNFs) and models that learn a continuous transformation from noise to data. It addresses some of the challenges in traditional CNF training and also offers a more unified view with Diffusion Models.
Instead of directly maximizing the likelihood (which can involve complex Jacobian computations and ODE solving during training), Flow Matching works by regressing a neural network to match a predefined “vector field” that describes the desired continuous transformation between a source distribution (e.g., noise) and the target data distribution.
Key Characteristics:
- Direct Vector Field Regression: The core idea is to train a neural network to predict the velocity at which a data point should move at a given time and position, rather than learning the transformation function itself directly or computing complex Jacobian determinants.
- Predefined Paths: Flow Matching defines a reference probability path (often a simple interpolation, like a straight line or a Gaussian path) between the source and target distributions. The model then learns the vector field that makes samples follow this predefined path.
- Simplified Training Objective: The training objective typically becomes a simple mean-squared error (MSE) loss, where the learned vector field is compared to the target vector field defined by the reference path. This simplifies training, making it more stable and efficient compared to traditional CNFs (which require ODE solving during training for likelihood calculation) or score-based diffusion models (which rely on score matching).
- No Invertibility Requirement During Training: Unlike traditional NFs, the learned vector field doesn’t inherently need to be invertible during training, which allows for more flexible network architectures. Invertibility is only implicitly enforced if you want to sample by integrating the ODE in reverse.
- Efficient Sampling: After training, sampling from a Flow Matching model involves numerically solving an ODE defined by the learned vector field; see the Euler-sampler sketch after this list. Because the training objective directly learns this field, sampling can often be done with fewer steps and more efficiently than some diffusion models, especially with higher-order ODE solvers.
- Generalization of Diffusion: Flow Matching can be seen as a generalization of score-based diffusion models. When the reference path in Flow Matching is a Gaussian diffusion process, Flow Matching essentially becomes an alternative, often more stable, way to train diffusion models.
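To show what that ODE solve looks like in practice, here is a minimal sampling sketch using an explicit Euler solver. This is illustrative PyTorch; `v_net` is a hypothetical trained network that takes the current state `z` and time `t` and returns `dz/dt` (the ODE itself is spelled out in the appendix).

```python
import torch

@torch.no_grad()
def sample(v_net, shape, steps=50):
    """Integrate dz/dt = v(z, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    z = torch.randn(shape)                          # z(0) ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], 1), i * dt)       # current time, per sample
        z = z + v_net(z, t) * dt                    # one explicit Euler step
    return z                                        # approximate data sample
```

Swapping Euler for a higher-order solver (e.g., Heun or RK4) is what lets a well-trained flow get away with few integration steps.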
Analogy: Instead of trying to define the exact shape of the clay sculpture (the target distribution) and how to arrive at it through complex transformations (NFs), Flow Matching gives you a blueprint of how the clay should move at every moment in time to go from a blob to the final sculpture. Your neural network learns to be the sculptor’s hand, applying the exact push and pull described by the blueprint.
Connection and Difference
Flow Matching is a specialized and highly effective training paradigm for a class of “Flow Models,” particularly Continuous Normalizing Flows (CNFs). In both cases, the core idea is to estimate the velocity field that carries samples from noise to data.
Here’s a breakdown of why this phrasing is accurate:
- “Flow Model” (Normalizing Flow / CNF): This refers to the overarching generative modeling framework where you transform a simple distribution into a complex one using invertible mappings. When these transformations are continuous in time, they are specifically called Continuous Normalizing Flows (CNFs). CNFs are defined by an Ordinary Differential Equation (ODE) whose right-hand side is a learned velocity field (or vector field).
- “Estimating the Velocity Field”: Both traditional CNF training and Flow Matching aim to learn this velocity field. This field dictates how samples flow from the simple noise distribution to the complex data distribution.
- “Special Solution/Training Paradigm”: This is where Flow Matching shines. Traditional CNFs, while conceptually powerful, often face challenges in training due to:
  - Computing the Jacobian determinant: Needed for likelihood estimation, which is computationally expensive and constrains network architectures. (The formula responsible for this cost is sketched below.)
  - ODE solving during training: Backpropagating through ODE solvers can also be complex.
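For reference, the cost comes from the instantaneous change-of-variables formula for CNFs (from the Neural ODE literature, Chen et al., 2018), written here in the notation of the appendix: the log-density evolves along the flow according to

$$ \frac{d \log p(\mathbf{z}(t))}{dt} = -\operatorname{tr}\!\left( \frac{\partial \mathbf{v}(\mathbf{z}(t), t)}{\partial \mathbf{z}(t)} \right) $$

Evaluating this trace (or an estimator of it) at every solver step during training is exactly the work Flow Matching avoids.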
In essence:
- Flow Models (CNFs): The type of model that uses a continuous transformation defined by a velocity field.
- Flow Matching: A specific and advanced method for training these continuous-time generative models by efficiently learning that crucial velocity field, often leading to more stable training and faster sampling.
Appendix
ODE (Ordinary Differential Equation)
The concept of a Continuous Normalizing Flow (CNF) is that the transformation from a simple distribution (e.g., Gaussian noise) to a complex data distribution happens continuously over time.
Imagine a point starting in the noise space and smoothly “flowing” through time until it arrives in the data space. This continuous movement is described by an Ordinary Differential Equation (ODE).
A classic example: Imagine a simple object moving along a line. If its velocity is constant, say 5 meters per second, then:

$$ \frac{dx}{dt} = 5 $$

Here, x(t) is the unknown position function. This is a very simple ODE. Solving it means finding x(t). We know $ x(t) = 5t + C $ (where C is the starting position).
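As a quick numerical sanity check (an illustrative snippet, not part of the original discussion; the starting position 2.0 is arbitrary), we can integrate this ODE with the same Euler scheme used for sampling and compare against the closed-form solution:

```python
def euler_solve(v, x0, t_end=1.0, steps=100):
    """Integrate dx/dt = v(x, t) from x(0) = x0 with explicit Euler steps."""
    x, dt = x0, t_end / steps
    for i in range(steps):
        x = x + v(x, i * dt) * dt      # x_{k+1} = x_k + v(x_k, t_k) * dt
    return x

x1 = euler_solve(lambda x, t: 5.0, x0=2.0)   # constant velocity field
print(x1)   # 7.0, matching the closed form x(1) = 5 * 1 + 2
```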
Let’s break down each part of the ODE that defines a CNF, $ \frac{d\mathbf{z}(t)}{dt} = \mathbf{v}(\mathbf{z}(t), t) $:
- $ \mathbf{z}(t) $:
  - This represents a data point (or a sample) at a specific “time” t.
  - At t=0, z(0) is a sample from your simple base distribution (e.g., standard Gaussian noise).
  - At t=1 (or some maximum time T), z(1) (or z(T)) is a sample from your complex target data distribution (e.g., an image).
  - As t changes from 0 to 1, z(t) traces a continuous path from noise to data.
- $ \frac{d\mathbf{z}(t)}{dt} $:
  - This is the derivative of z(t) with respect to time t.
  - In physics, this is exactly what velocity is. It tells you the instantaneous rate of change of the position z(t) at any given time t.
  - It’s a vector that points in the direction a data point is moving, and its magnitude indicates the speed.
- $ \mathbf{v}(\mathbf{z}(t), t) $:
  - This is the “right-hand side” of the ODE.
  - It represents the velocity field (or vector field).
  - This is the neural network that the CNF (and Flow Matching) learns.
  - Inputs to the network: It takes the current position z(t) and the current time t as input.
  - Output of the network: It outputs the instantaneous velocity v at that specific position z(t) and time t. (A minimal network sketch follows this list.)
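Concretely, this network can be as simple as an MLP that concatenates the position and the time. A minimal sketch follows; the name `VelocityField`, the hidden width, and the SiLU activations are illustrative choices, not from any particular paper.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Hypothetical v(z, t): maps (position, time) to an instantaneous velocity."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),   # +1 input for the time t
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),                  # output has z's shape
        )

    def forward(self, z, t):
        # z: (batch, dim); t: (batch, 1) with values in [0, 1].
        return self.net(torch.cat([z, t], dim=1))
```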
Analogy: Imagine a river.
- z(t) is the position of a tiny boat at time t.
- dz(t)/dt is the boat’s velocity.
- v(z(t), t) is the current of the river at that specific location and time. The neural network’s job is to learn this “current map” of the river. If you know the current at every point and time, you can predict where the boat will go.
How Flow Matching Uses the ODE’s Right-Hand Side (and simplifies training):
Flow Matching still learns the same velocity field $ \mathbf{v}(\mathbf{z}(t), t) $ (the right-hand side of the ODE). The difference is how it trains this neural network:
- Predefined Path: Flow Matching first defines a simple “straight line” or linear interpolation path between a noise sample $ \mathbf{z}_0 $ and a data sample $ \mathbf{x} $:

$$ \mathbf{z}(t) = (1 - t)\,\mathbf{z}_0 + t\,\mathbf{x} $$

- Target Velocity: For this predefined path, we can easily calculate its instantaneous velocity (its derivative with respect to t):

$$ \mathbf{v}_{\text{target}}(\mathbf{z}(t), t) = \frac{d\mathbf{z}(t)}{dt} = \mathbf{x} - \mathbf{z}_0 $$

- Direct Regression: Flow Matching then trains its neural network $ \mathbf{v}(\mathbf{z}(t), t) $ (the right-hand side of the ODE) by simply trying to match this known target velocity. The loss function becomes a straightforward Mean Squared Error (MSE):

$$ \mathcal{L} = \mathbb{E}_{\mathbf{z}_0, \mathbf{x}, t}\left[ \left\| \mathbf{v}(\mathbf{z}(t), t) - (\mathbf{x} - \mathbf{z}_0) \right\|^2 \right] $$
The key insight here is: Flow Matching doesn’t need to compute the Jacobian determinant of v during training. It just needs to ensure its learned v matches a simple, predefined target velocity. This simplifies the optimization problem significantly.
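Putting those three steps together, here is a minimal training-loop sketch. This is illustrative PyTorch reusing the hypothetical `VelocityField` from above; `v_net`, `optimizer`, and the batch `x` are assumed to come from the surrounding training script.

```python
import torch

def train_step(v_net, optimizer, x):
    """One Flow Matching step: regress v(z(t), t) onto the target velocity x - z0."""
    z0 = torch.randn_like(x)                        # noise endpoint z(0)
    t = torch.rand(x.shape[0], 1)                   # t ~ Uniform(0, 1)
    zt = (1 - t) * z0 + t * x                       # point on the linear path
    target = x - z0                                 # dz/dt of that path
    loss = ((v_net(zt, t) - target) ** 2).mean()    # the MSE objective above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that, exactly as described above, no Jacobian determinants and no ODE solves appear anywhere in this loop; the ODE solver is only needed later, at sampling time.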