This work reframes continuous transport by treating the integration parameter as distance traveled rather than an arbitrary fixed time horizon. Under an arc-length (constant-speed) parameterization, truncating computation yields a state at a calibrated geodesic radius, and geodesic progress becomes predictable as compute increases. [3] [13] [9]
Contributions.
Recent work has emphasized that the choice of geometry underlying the conditional path in flow matching can be as important as the parameterization used to traverse it. Riemannian Flow Matching (RFM) extends flow matching beyond Euclidean spaces by constructing geometry-aware target vector fields on manifolds [15]. Metric Flow Matching (MFM) further promotes manifold-respecting interpolants by learning approximate geodesics under a data-induced Riemannian metric via kinetic-energy minimization [16], while Fisher Flow Matching equips categorical distributions with the Fisher–Rao metric and transports mass along closed-form geodesics on a hypersphere representation [17]. Wasserstein Flow Matching lifts these ideas to “distributions of distributions” by appealing to the Riemannian structure of Wasserstein space and by explicitly connecting constant-speed Wasserstein geodesics to the Benamou–Brenier dynamic formulation [18] [4]. In discrete settings, $\alpha$-Flow unifies continuous-state discrete flow matching variants under information-geometric $\alpha$-representations and a generalized kinetic-energy perspective [19]. Complementary to these constructions, Energy Guided Geometric Flow Matching proposes learning a metric tensor via score/energy-based objectives to better capture data geometry for geodesic-like interpolants [20].
The present work is orthogonal in emphasis: rather than proposing a new metric or manifold construction, we isolate a structural property of distance-minimizing geodesics under arc-length (constant-speed) parameterization—namely prefix/suffix optimality—and show that this property induces an “anytime” semantics in which truncation corresponds to a well-defined intermediate transport with compute proportional to geodesic distance. This viewpoint clarifies what is and is not preserved when the standard constant-time parameterization used in flow matching [3] is replaced by a constant-speed (arc-length) scheduler. A closely related but distinct line of work uses learned reparameterizations to shorten or align conditional probability paths, e.g. CAR-Flow in conditional flow matching [21]; our contribution is not an alignment mechanism but theorems showing that, in the idealized regime where the learned field coincides with a constant-speed geodesic, subsegment optimality yields exact failure recovery and enables segmentwise parallel training without losing geodesic consistency. For broader context on design choices and variants of the flow matching framework, see the recent guide and reference implementation [23].
Finally, the connection between flow-based generative modeling and optimal transport continues to be refined. Dynamic Conditional Optimal Transport through Simulation-Free Flows derives a dynamical conditional OT formulation generalizing Benamou–Brenier and proposes learning the induced geodesic path of measures for conditional generation [22]. At a higher level, Peyré surveys the dual Eulerian/Lagrangian viewpoints for diffusion and optimal transports in machine learning, highlighting both the non-uniqueness of advecting vector fields and the opportunity to design evolutions with favorable stability and computational properties [24]. Our results can be read as a concrete instance of this agenda: by committing to arc-length parameterization of distance-minimizing transport paths, one obtains a computation-calibrated notion of partial progress together with deterministic recovery and parallelization guarantees that do not hold for generic constant-time parameterizations.
In continuous normalizing flows and flow matching, probability mass is transported by a velocity field [1] [2] [3] [10] [13]
\[ \frac{dx}{dt}=v(x(t),t),\qquad t\in[0,1]. \]
The endpoint is
\[ x(1)=x(0)+\int_0^1 v(x(t),t)\,dt. \]
The path length is
\[ \ell=\int_0^1 \lVert v(x(t),t)\rVert\,dt. \]
Because the time horizon is fixed, distance is encoded in the magnitude of the velocity field: \(\text{larger displacement} \Rightarrow \text{larger }\lVert v\rVert\). Compute cost is therefore tied to a fixed clock, not geometric distance.
Let \(x(t)\) be any smooth trajectory with velocity \(\dot x(t)=v(x(t),t)\). Define arc length [7]
\[ s(t)=\int_0^t \lVert \dot x(r)\rVert\,dr. \]
Reparameterizing by arc length yields the unit-speed curve
\[ \tilde x(s)=x\bigl(t(s)\bigr),\qquad \Bigl\lVert \frac{d\tilde x}{ds}\Bigr\rVert=1, \]
where \(t(s)\) inverts \(s(t)\) (assuming \(\lVert\dot x\rVert>0\)).
Thus any variable-speed trajectory can be rewritten as a constant-speed trajectory under a different parameter. The two descriptions are equivalent until we impose constraints (a fixed time horizon versus a fixed speed).
Introduce a new time parameter \(\tau\) and impose a speed constraint \(c>0\):
\[ \frac{dx}{d\tau}=u\bigl(x(\tau),\tau\bigr),\qquad \lVert u(x,\tau)\rVert=c. \]
Velocity magnitude is now fixed:
\[ \lVert \dot x(\tau)\rVert=c \quad\text{for all }\tau. \]
Distance traveled satisfies
\[ \ell(\tau)=\int_0^\tau \lVert \dot x(r)\rVert\,dr=c\,\tau. \]
Distance is encoded in arrival time, not speed.
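The reparameterization above can be checked numerically. The following is a minimal sketch on a toy trajectory of our own choosing (the curve \(x(t)=(t^2,0)\), speed \(c\), and grid sizes are illustrative assumptions, not from the text): it accumulates arc length along a variable-speed path and resamples it at constant speed, so that distance traveled equals \(c\tau\).

```python
import numpy as np

# Toy variable-speed trajectory x(t) = (t^2, 0) on t in [0, 1] (our example).
t = np.linspace(0.0, 1.0, 10_001)
x = np.stack([t**2, np.zeros_like(t)], axis=1)

# Accumulated arc length s(t) = \int_0^t ||x'(r)|| dr via trapezoid rule.
speed = np.linalg.norm(np.gradient(x, t, axis=0), axis=1)   # ~ 2t here
s = np.concatenate([[0.0], np.cumsum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))])
total_length = s[-1]                                        # ~ 1.0 for this curve

# Constant-speed reparameterization: invert s(t) on an evenly spaced tau grid.
c = 0.5                                                     # chosen speed
tau = np.linspace(0.0, total_length / c, 201)               # arrival time T = length / c
t_of_tau = np.interp(c * tau, s, t)                         # t such that s(t) = c * tau
x_const = np.stack([t_of_tau**2, np.zeros_like(tau)], axis=1)

# Distance traveled up to parameter tau is now c * tau by construction.
print(total_length, c * tau[-1])
```

Distance is read off the parameter: the state at `tau` sits at arc length `c * tau` along the curve, which is the calibration the fixed-speed formulation makes literal.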
The fixed-speed form in Section 3 writes \(\|u\|=1\) to emphasize arc-length parameterization. For the minimal-arrival-time definition it is standard to allow the control bound \(\|u\|\le 1\); under time minimization any subunit-speed segment can be accelerated (while remaining feasible) to reduce arrival time, so optimal trajectories satisfy \(\|u\|=1\) almost everywhere.
Given \(x,y\in\mathbb{R}^d\), define the minimal time to reach \(y\) from \(x\):
\[ T(x,y)=\inf\bigl\{T\ge 0 \;:\; \exists\,x(\cdot)\ \text{with}\ x(0)=x,\ x(T)=y,\ \lVert\dot x\rVert\le 1\bigr\}. \]
Define the induced distance
\[ d(x,y)=T(x,y). \]
Under isotropic constraints this reduces to Euclidean distance; under position-dependent constraints it induces a warped geometry.
Let \(G(x)\) be symmetric positive definite. Define the metric-speed functional
\[ \lVert \dot x\rVert_{G}=\sqrt{\dot x^\top G(x)\,\dot x}. \]
Then the path length is
\[ \mathcal L[x]=\int_0^T \sqrt{\dot x(t)^\top G\bigl(x(t)\bigr)\,\dot x(t)}\,dt, \]
and the induced distance is
\[ d_G(x,y)=\inf\bigl\{\mathcal L[x] \;:\; x(0)=x,\ x(T)=y\bigr\}. \]
Imposing constant metric-speed \(\sqrt{\dot x^\top G(x)\dot x}=c\) implies \(T = d_G(x,y)/c\). This makes the “distance = speed × time” identity literal in the learned geometry.
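For a constant SPD metric the identity above can be verified directly, since straight lines are then geodesics and \(d_G(x,y)=\sqrt{(y-x)^\top G\,(y-x)}\) in closed form. The following sketch (the particular \(G\), endpoints, and speed \(c\) are our illustrative choices) integrates the metric speed along the constant-metric-speed line and confirms \(T=d_G(x,y)/c\):

```python
import numpy as np

# Constant SPD metric: straight lines are geodesics, d_G has a closed form.
G = np.array([[2.0, 0.5], [0.5, 1.0]])           # symmetric positive definite
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])

d_G = np.sqrt((y - x0) @ G @ (y - x0))           # closed-form geodesic distance

c = 2.0                                          # imposed metric speed
T = d_G / c                                      # predicted arrival time

# Constant-metric-speed parameterization x(tau) = x0 + (c*tau/d_G)*(y - x0).
tau = np.linspace(0.0, T, 10_001)
xs = x0 + (c * tau / d_G)[:, None] * (y - x0)
dx = np.gradient(xs, tau, axis=0)
metric_speed = np.sqrt(np.einsum('ni,ij,nj->n', dx, G, dx))   # should be ~ c

# Numerical metric length via trapezoid rule; should match d_G = c * T.
length = float(np.sum(0.5 * (metric_speed[1:] + metric_speed[:-1]) * np.diff(tau)))
print(d_G, length, T)
```

The printed length matches \(d_G\) and the arrival time matches \(d_G/c\), making "distance = speed × time" literal in the chosen geometry.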
In Benamou–Brenier dynamic optimal transport [4] [5] [6] [14] [13],
\[ W_2^2(\rho_0,\rho_1)=\inf_{(\rho,v)} \int_0^1\!\!\int \lVert v(x,t)\rVert^2\,\rho_t(x)\,dx\,dt \quad\text{s.t.}\quad \partial_t\rho_t+\nabla\!\cdot(\rho_t v)=0, \]
one minimizes kinetic energy over a fixed time horizon. Switching to fixed-speed trajectories replaces energy minimization with length minimization under a speed constraint, yielding geodesics in the induced geometry. Conceptually: energy minimization over a fixed horizon is replaced by arrival-time (length) minimization at fixed speed.
Under fixed metric-speed \(c\), length and time are coupled: \(\ell=cT\). If a numerical integrator uses step size \(\Delta\tau\), then the number of steps required to traverse a trajectory of duration \(T\) scales as
\[ N=\frac{T}{\Delta\tau}=\frac{\ell}{c\,\Delta\tau}=\frac{d_G(x_0,\mathcal M)}{c\,\Delta\tau}, \]
where \(\mathcal M\) is a target set (e.g., a data manifold or a terminal distribution region) and the last equality holds along a minimizing geodesic. Hence the scaling law [11] [9]
\[ \text{compute}\;\propto\; d_G(x_0,\mathcal M). \]
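The compute-distance coupling reduces to elementary arithmetic, sketched below (the speed \(c\), step size, and the helper `steps_needed` are our illustrative choices): doubling the geodesic distance exactly doubles the number of integrator steps.

```python
import math

# Under constant speed c and fixed step size dtau, a trajectory of length ell
# takes T = ell / c time units, hence N = ceil(T / dtau) integration steps.
c, dtau = 0.5, 0.01

def steps_needed(ell: float) -> int:
    T = ell / c                     # arrival time under constant speed
    return math.ceil(T / dtau)      # number of fixed-size integrator steps

print(steps_needed(1.0), steps_needed(2.0))   # compute grows linearly in distance
```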
The core structural fact is that a distance-minimizing geodesic is itself distance-minimizing on every subinterval. Under an arc-length (constant-speed) parameterization, this yields an anytime guarantee: truncating computation at parameter time \(\tau\) returns a point at a calibrated geodesic distance from the start, and (along a minimizing geodesic) the remaining distance to the target decreases linearly in \(\tau\).
Let \((\mathcal X, d_G)\) be the metric space induced by the length functional
\[ \mathcal L[x]=\int_0^T \sqrt{\dot x(t)^\top G\bigl(x(t)\bigr)\,\dot x(t)}\,dt, \]
where \(G(x)\) is symmetric positive definite. Let \(x^\star:[0,T]\to\mathcal X\) be a minimizing geodesic from \(x_0\) to \(y\): \(x^\star(0)=x_0\), \(x^\star(T)=y\), and \(\mathcal L[x^\star]=d_G(x_0,y)\).
Then for every \(\tau\in[0,T]\): the prefix \(x^\star|_{[0,\tau]}\) is a minimizing geodesic from \(x_0\) to \(x^\star(\tau)\), and the suffix \(x^\star|_{[\tau,T]}\) is a minimizing geodesic from \(x^\star(\tau)\) to \(y\).
In particular, the distances are exact and match the segment lengths:
\[ d_G\bigl(x_0,x^\star(\tau)\bigr)=\mathcal L\bigl[x^\star|_{[0,\tau]}\bigr],\qquad d_G\bigl(x^\star(\tau),y\bigr)=\mathcal L\bigl[x^\star|_{[\tau,T]}\bigr]. \]
Fix \(\tau\in[0,T]\). Suppose, for contradiction, there exists another curve \(\gamma\) from \(x_0\) to \(x^\star(\tau)\) with strictly smaller length: \(\mathcal L[\gamma] < \mathcal L[x^\star|_{[0,\tau]}]\). Concatenating \(\gamma\) with the suffix \(x^\star|_{[\tau,T]}\) yields a new curve \(\tilde x\) from \(x_0\) to \(y\) with \(\mathcal L[\tilde x] = \mathcal L[\gamma] + \mathcal L[x^\star|_{[\tau,T]}] < \mathcal L[x^\star]\), contradicting \(\mathcal L[x^\star]=d_G(x_0,y)\). Therefore the prefix is length-minimizing, hence distance-minimizing, and \(d_G(x_0,x^\star(\tau))=\mathcal L[x^\star|_{[0,\tau]}]\). The suffix claim follows by the same splicing argument (equivalently, apply the argument to the reversed curve). \(\blacksquare\)
Under the assumptions of Theorem 1, additionally assume \(x^\star\) is parameterized at constant metric-speed \(c>0\):
\[ \sqrt{\dot x^\star(t)^\top G\bigl(x^\star(t)\bigr)\,\dot x^\star(t)}=c \quad\text{for all } t\in[0,T]. \]
Then for every \(\tau\in[0,T]\):
\[ d_G\bigl(x_0,x^\star(\tau)\bigr)=c\,\tau,\qquad d_G\bigl(x^\star(\tau),y\bigr)=c\,(T-\tau). \]
Proof. By constant metric-speed, \(\mathcal L[x^\star|_{[0,\tau]}]=\int_0^\tau c\,ds=c\tau\) and \(\mathcal L[x^\star|_{[\tau,T]}]=\int_\tau^T c\,ds=c(T-\tau)\). Apply Theorem 1 to convert segment lengths to distances. \(\square\)
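The calibration in Corollary 1.1 can be verified numerically in the simplest case \(G=I\), where minimizing geodesics are straight lines. The sketch below (endpoints and speed are our illustrative choices) checks that every point on the constant-speed line sits at distance \(c\tau\) from the start and \(c(T-\tau)\) from the target:

```python
import numpy as np

# Euclidean case G = I: the minimizing geodesic from x0 to y is the straight
# line, traversed here at constant speed c.
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
d = np.linalg.norm(y - x0)         # d_G(x0, y) = 5.0
c = 2.0
T = d / c                          # arrival time 2.5

for tau in np.linspace(0.0, T, 6):
    x_tau = x0 + (c * tau / d) * (y - x0)      # constant-speed geodesic point
    to_start = np.linalg.norm(x_tau - x0)
    to_goal = np.linalg.norm(y - x_tau)
    assert np.isclose(to_start, c * tau)        # prefix calibration
    assert np.isclose(to_goal, c * (T - tau))   # suffix calibration
    assert np.isclose(to_start + to_goal, d)    # prefix + suffix = total
print("calibration verified")
```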
Anytime generator principle. Under arc-length (constant-speed) geodesic transport, truncating computation after time \(\tau\) returns a point at known geodesic radius \(c\tau\) from the start, and (along a minimizing geodesic) exactly \(c(T-\tau)\) from the target. Compute becomes distance traveled.
This section is intentionally idealized: it assumes exact geodesic structure (constant-speed, distance-minimizing) and exact velocity matching. The purpose is to isolate the geometric target property; practical models may only satisfy it approximately. [3] [10] [13]
We now formalize how Theorem 1 transfers into a flow matching setting when the learned velocity field represents a constant-speed Wasserstein geodesic. The essential point is that the arc-length calibration in Corollary 1.1 is a statement about minimizing geodesics parameterized by arc length, and therefore it lifts from trajectories in \((\mathcal X,d_G)\) to curves of measures in \((\mathcal P_2(\mathcal X),W_{2,G})\) whenever the flow matching model exactly realizes such a geodesic. [3] [6]
Let \(\{\rho_s\}_{s\in[0,1]}\) denote a \(W_{2,G}\)-minimizing geodesic between \(\rho_0\) and \(\rho_1\), parameterized at constant metric speed \(c>0\) (in the sense that the metric derivative equals \(c\) a.e.). Let \(v^*(x,s)\) be an optimal velocity field solving the continuity equation
\[ \partial_s \rho_s + \nabla\!\cdot\bigl(\rho_s\, v^*(\cdot,s)\bigr)=0, \]
and assume the constant-speed condition holds:
\[ \Bigl(\int v^*(x,s)^\top G(x)\,v^*(x,s)\,d\rho_s(x)\Bigr)^{1/2}=c \quad\text{for a.e. } s\in[0,1]. \]
Suppose a flow matching model \(v_\theta(x,s)\) satisfies
\[ v_\theta(x,s)=v^*(x,s) \quad\text{for all } x \text{ and } s. \]
Let \(\Phi_s\) be the induced flow map:
\[ \frac{d}{ds}\,\Phi_s(x)=v_\theta\bigl(\Phi_s(x),s\bigr),\qquad \Phi_0=\mathrm{id}. \]
Under the assumptions above, for every \(s\in[0,1]\):
(i) The pushforward distribution satisfies \((\Phi_s)_\#\rho_0=\rho_s\).
(ii) The curve \(\{\rho_s\}_{s\in[0,1]}\) is a constant-speed minimizing geodesic, and every prefix and suffix segment is minimizing between its endpoints.
(iii) The distance to the endpoints is linear in \(s\):
\[ W_{2,G}(\rho_0,\rho_s)=c\,s,\qquad W_{2,G}(\rho_s,\rho_1)=c\,(1-s). \]
Proof. Because \(v_\theta=v^*\), the continuity equation is satisfied exactly and the pushforward identity follows from flow-map theory. Since \(\{\rho_s\}\) is assumed to be a minimizing \(W_{2,G}\)-geodesic parameterized at constant speed \(c\), its (metric) length on \([0,s]\) equals \(\int_0^s c\,d\tau=cs\), and similarly on \([s,1]\) equals \(c(1-s)\). If a prefix were not minimizing between \(\rho_0\) and \(\rho_s\), one could splice a shorter competitor into the full curve, contradicting global minimality—exactly the splicing argument in Theorem 1. [6] Therefore every prefix and suffix is minimizing and the endpoint distances are linear in \(s\). \(\square\)
Suppose numerical integration halts at \(s=\tau\in[0,1]\). Then:
(i) the current state lies at exact geodesic distance \(c\tau\) from \(\rho_0\);
(ii) restarting integration from \(s=\tau\) continues along the same minimizing geodesic;
(iii) the remaining distance to \(\rho_1\) is exactly \(c(1-\tau)\).
In particular, truncation preserves optimality of both the prefix and suffix segments.
Proof. Immediate from Theorem 2(iii) and the prefix/suffix optimality in Theorem 2(ii). \(\square\)
This gives a concrete interpretation of intermediate states in arc-length flow matching: truncation time is literally “distance traveled,” not merely “fraction of a training clock.” In practical settings the learned field is approximate, but the exact statement clarifies what property is being targeted.
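The linear-distance property of Theorem 2(iii) can be exhibited on a case with closed forms. It is a standard optimal transport fact that between one-dimensional Gaussians, \(W_2\bigl(\mathcal N(m_0,\sigma_0^2),\mathcal N(m_1,\sigma_1^2)\bigr)=\sqrt{(m_1-m_0)^2+(\sigma_1-\sigma_0)^2}\), and the displacement geodesic interpolates means and standard deviations linearly. The sketch below (endpoints chosen by us for illustration) checks that truncation at \(s\) leaves a distribution at exactly \(s\cdot W_2(\rho_0,\rho_1)\):

```python
import numpy as np

# Closed-form W2 distance between 1-D Gaussians (standard OT identity).
def w2_gauss1d(m0, s0, m1, s1):
    return np.sqrt((m1 - m0) ** 2 + (s1 - s0) ** 2)

m0, s0 = 0.0, 1.0
m1, s1 = 4.0, 4.0
total = w2_gauss1d(m0, s0, m1, s1)                 # = 5.0

for s in np.linspace(0.0, 1.0, 11):
    ms = (1 - s) * m0 + s * m1                     # geodesic mean at s
    ss = (1 - s) * s0 + s * s1                     # geodesic std dev at s
    assert np.isclose(w2_gauss1d(m0, s0, ms, ss), s * total)        # prefix
    assert np.isclose(w2_gauss1d(ms, ss, m1, s1), (1 - s) * total)  # suffix
print("W2 geodesic is constant speed, total distance:", total)
```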
We now isolate precisely what fails under the standard constant-time parameterization \(t\in[0,1]\). Even if the learned field represents the same geometric curve of measures, the parameter \(t\) does not, in general, measure arc length in \((\mathcal P_2(\mathcal X),W_{2,G})\). This mismatch is the reason the anytime property is parameterization-sensitive. [3] [6] [13]
Let \(\{\rho_t\}_{t\in[0,1]}\) be the curve induced by a (possibly optimal) velocity field \(v^*(x,t)\). Define the accumulated length (arc length) in Wasserstein space:
\[ \Lambda(t)=\int_0^t \Bigl(\int v^*(x,r)^\top G(x)\,v^*(x,r)\,d\rho_r(x)\Bigr)^{1/2} dr. \]
In general \(\Lambda(t)\neq t\,\Lambda(1)\), i.e. the speed profile is not constant. Therefore stopping at \(t=\tfrac12\) does not imply that \(\rho_{1/2}\) lies halfway in geodesic distance between \(\rho_0\) and \(\rho_1\):
\[ \Lambda\bigl(\tfrac12\bigr)\neq\tfrac12\,\Lambda(1) \quad\text{in general}. \]
Equivalently, the equal-distance partition points \(t^\star\) solving \(\Lambda(t^\star)=\tfrac12\Lambda(1)\) depend on the unknown speed profile induced by the learned field. Thus the constant-time parameter is not intrinsically meaningful for truncation or segment decomposition.
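The failure mode is easy to exhibit on a hypothetical speed profile of our own construction: take the curve of Gaussians \(\rho_t=\mathcal N(t^2,1)\). Its Wasserstein speed is \(|{\tfrac{d}{dt}t^2}|=2t\), so \(\Lambda(t)=t^2\) is nonlinear in \(t\); stopping at \(t=\tfrac12\) covers only a quarter of the total distance, and the true halfway point \(t^\star=\sqrt{1/2}\) depends on the speed profile:

```python
import numpy as np

# Hypothetical non-constant-speed curve: rho_t = N(t^2, 1), Wasserstein
# speed |d(t^2)/dt| = 2t, so arc length Lambda(t) = t^2.
t = np.linspace(0.0, 1.0, 100_001)
speed = 2.0 * t                                    # metric derivative of rho_t
Lam = np.concatenate([[0.0], np.cumsum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))])

total = Lam[-1]                                    # total arc length, ~ 1.0
half_clock = np.interp(0.5, t, Lam)                # Lambda(1/2), ~ 0.25
t_star = np.interp(0.5 * total, Lam, t)            # equal-distance point, ~ 0.7071

print(half_clock, 0.5 * total, t_star)
```

Here `half_clock` is far from `0.5 * total`: half the clock is not half the distance, and locating `t_star` requires knowing the speed profile, which is exactly what the constant-time parameter hides.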
We now prove a structural decomposability result. The constant-speed parameterization makes each subinterval correspond to a fixed amount of geometric progress, enabling segmentwise training that composes correctly. This is analogous in spirit to parallel-in-time methods for dynamical systems, but here the guarantee is geometric: it follows from prefix/suffix optimality of constant-speed geodesics. [8]
Partition \([0,1]\) into \(m\) intervals \(0=s_0<s_1<\cdots<s_m=1\). For each \(j\), denote \(\rho_{s_j}\) by \(\rho_j\). Under constant speed, the segment lengths satisfy
\[ W_{2,G}(\rho_{j-1},\rho_j)=c\,(s_j-s_{j-1}),\qquad j=1,\dots,m. \]
Assume \(\{\rho_s\}_{s\in[0,1]}\) is a constant-speed minimizing \(W_{2,G}\)-geodesic with velocity field \(v^*(x,s)\). For each segment \([s_{j-1},s_j]\), train an independent model \(v^{(j)}_\theta(x,s)\) such that
\[ v^{(j)}_\theta(x,s)=v^*(x,s) \quad\text{for all } x \text{ and } s\in[s_{j-1},s_j]. \]
Define a piecewise field \(v_\theta(x,s)=v^{(j)}_\theta(x,s)\) for \(s\in[s_{j-1},s_j]\), and let \(\Phi_s\) be its flow map. Then:
(i) Each segment flow realizes a minimizing geodesic between \(\rho_{j-1}\) and \(\rho_j\).
(ii) Concatenating the segment flows yields the global minimizing geodesic from \(\rho_0\) to \(\rho_1\).
(iii) Consequently, segment training may proceed independently (in parallel) without loss of global optimality, provided each segment is exact.
Proof. By Theorem 1 applied in \((\mathcal P_2(\mathcal X),W_{2,G})\) as in Theorem 2, every restriction of a constant-speed minimizing geodesic to a subinterval is itself minimizing between its endpoints. Therefore the restriction of \(v^*\) to \([s_{j-1},s_j]\) generates a minimizing geodesic from \(\rho_{j-1}\) to \(\rho_j\). If \(v^{(j)}_\theta\) matches this restriction exactly, the induced segment flow equals the true segment flow. Concatenation of identical segment maps equals the global map, hence the global curve is recovered and remains minimizing. \(\square\)
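The composition argument can be checked numerically on a toy Euclidean case of our own construction (the field, endpoints, knots, and Euler integrator below are illustrative assumptions): the constant velocity field is integrated either globally on \([0,1]\) or independently on segments started from the true intermediate points, and each segment flow lands exactly on the next segment's starting point.

```python
import numpy as np

# Toy constant-speed transport: straight line from x0 to y over [0, 1].
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
c = np.linalg.norm(y - x0)                         # speed chosen so T = 1
v = c * (y - x0) / np.linalg.norm(y - x0)          # constant velocity field

def integrate(x, s_start, s_end, n=1000):
    """Forward Euler on [s_start, s_end] (exact here: the field is constant)."""
    h = (s_end - s_start) / n
    for _ in range(n):
        x = x + h * v
    return x

# Global integration over the full interval.
x_global = integrate(x0, 0.0, 1.0)

# Segmentwise integration: each segment starts from the true geodesic point.
knots = [0.0, 0.3, 0.7, 1.0]
x_seg = x0
for a, b in zip(knots[:-1], knots[1:]):
    x_true_start = x0 + a * (y - x0)               # exact prefix endpoint
    x_seg = integrate(x_true_start, a, b)          # independent segment flow
    assert np.allclose(x_seg, x0 + b * (y - x0), atol=1e-6)  # lands on next knot

print(x_global, x_seg)                             # both reach y = [3, 4]
```

Because each segment reproduces the true restriction of the field, its endpoint coincides with the next segment's start, so the segments may be computed (and in the learned setting, trained) independently and still compose into the global geodesic.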
The obstruction in constant-time flow matching is the lack of an a priori equal-distance segmentation: subintervals \([t_{j-1},t_j]\) need not correspond to fixed geodesic lengths, and the restriction of a non-constant-speed parameterization need not preserve the minimizing prefix/suffix structure. Thus there is no comparable guarantee that independently trained segments will compose into a globally minimizing path.
The reparameterized (arc-length) formulation yields three mathematically distinct advantages:
First, truncation corresponds to an intrinsic progress measure: the remaining distance to the target is known exactly in the idealized setting (Corollary 2.1), giving a principled failure-recovery interpretation.
Second, intermediate states become meaningful objects: \(\rho_s\) is not merely the distribution at a chosen clock time but the distribution a fixed geodesic distance \(cs\) from \(\rho_0\).
Third, constant-speed geodesics admit segmentwise decomposition with correct composition (Theorem 3), suggesting distributed or parallel training schemes in which segments correspond to fixed-distance progress along the optimal path.
These are not generic properties of learning a time-indexed vector field. They are consequences of imposing a geometric calibration between parameter time and metric distance—exactly the point of replacing constant-time transport with constant-speed transport.
This work contrasts constant-time transport (standard continuous normalizing flows and flow matching) with a constant-speed formulation in which the transport parameter measures geometric distance traveled. In the base space \((\mathcal X,d_G)\), constant-speed parameterization turns the parameter into arc length, yielding geodesic prefix/suffix optimality (Theorem 1) and, under constant metric speed, an explicit distance-vs-parameter calibration (Corollary 1.1). [7]
We then show that the same structural property transfers to flow matching whenever the learned velocity field realizes a constant-speed Wasserstein geodesic (Theorem 2), giving an exact failure-recovery corollary and a clean explanation of why constant-time flow matching lacks such guarantees. Finally, we prove a segmentwise parallelization theorem (Theorem 3) that follows purely from geodesic prefix/suffix optimality and therefore is specific to arc-length parameterization. [6] [8]
Next natural steps are (i) quantitative stability bounds under approximate velocity matching, and (ii) error propagation controls for segmentwise training when each segment is learned only up to a specified tolerance.