This work reframes continuous transport by treating the integration parameter as distance traveled rather than an arbitrary fixed time horizon. Under an arc-length (constant-speed) parameterization, truncating computation yields a state at a calibrated geodesic radius, and geodesic progress becomes predictable as compute increases. [3] [13] [9]
Contributions.
Recent work has emphasized that the choice of geometry underlying the conditional path in flow matching can be as important as the parameterization used to traverse it. Riemannian Flow Matching (RFM) extends flow matching beyond Euclidean spaces by constructing geometry-aware target vector fields on manifolds [15]. Metric Flow Matching (MFM) further promotes manifold-respecting interpolants by learning approximate geodesics under a data-induced Riemannian metric via kinetic-energy minimization [16], while Fisher Flow Matching equips categorical distributions with the Fisher–Rao metric and transports mass along closed-form geodesics on a hypersphere representation [17]. Wasserstein Flow Matching lifts these ideas to “distributions of distributions” by appealing to the Riemannian structure of Wasserstein space and by explicitly connecting constant-speed Wasserstein geodesics to the Benamou–Brenier dynamic formulation [18] [4]. In discrete settings, $\alpha$-Flow unifies continuous-state discrete flow matching variants under information-geometric $\alpha$-representations and a generalized kinetic-energy perspective [19]. Complementary to these constructions, Energy Guided Geometric Flow Matching proposes learning a metric tensor via score/energy-based objectives to better capture data geometry for geodesic-like interpolants [20].
The present work is orthogonal in emphasis: rather than proposing a new metric or manifold construction, we isolate a structural property of distance-minimizing geodesics under arc-length (constant-speed) parameterization—namely prefix/suffix optimality—and show that this property induces an “anytime” semantics in which truncation corresponds to a well-defined intermediate transport with compute proportional to geodesic distance. This viewpoint clarifies what is and is not preserved when the standard constant-time parameterization used in flow matching [3] is replaced by a constant-speed (arc-length) scheduler. A closely related but distinct line of work uses learned reparameterizations to shorten or align conditional probability paths, e.g. CAR-Flow in conditional flow matching [21]; our contribution is not an alignment mechanism but theorems showing that, in the idealized regime where the learned field coincides with a constant-speed geodesic, subsegment optimality yields exact failure recovery and enables segmentwise parallel training without losing geodesic consistency. For broader context on design choices and variants of the flow matching framework, see the recent guide and reference implementation [23].
Finally, the connection between flow-based generative modeling and optimal transport continues to be refined. Dynamic Conditional Optimal Transport through Simulation-Free Flows derives a dynamical conditional OT formulation generalizing Benamou–Brenier and proposes learning the induced geodesic path of measures for conditional generation [22]. At a higher level, Peyré surveys the dual Eulerian/Lagrangian viewpoints for diffusion and optimal transports in machine learning, highlighting both the non-uniqueness of advecting vector fields and the opportunity to design evolutions with favorable stability and computational properties [24]. Our results can be read as a concrete instance of this agenda: by committing to arc-length parameterization of distance-minimizing transport paths, one obtains a computation-calibrated notion of partial progress together with deterministic recovery and parallelization guarantees that do not hold for generic constant-time parameterizations.
In continuous normalizing flows and flow matching, probability mass is transported by a velocity field [1] [2] [3] [10] [13]
\[ \frac{dx}{dt}=v(x(t),t),\qquad t\in[0,1]. \]
The endpoint is
\[ x(1)=x(0)+\int_0^1 v(x(t),t)\,dt. \]
The path length is
\[ \ell=\int_0^1 \lVert v(x(t),t)\rVert\,dt. \]
Because the time horizon is fixed, distance is encoded in the magnitude of the velocity field: \(\text{larger displacement} \Rightarrow \text{larger }\lVert v\rVert\). Compute cost is therefore tied to a fixed clock, not geometric distance.
Let \(x(t)\) be any smooth trajectory with velocity \(\dot x(t)=v(x(t),t)\). Define arc length [7]
\[ s(t)=\int_0^t \lVert \dot x(r)\rVert\,dr. \]
Reparameterizing by arc length yields the unit-speed curve
\[ \tilde x(s)=x\bigl(t(s)\bigr),\qquad \Bigl\lVert \frac{d\tilde x}{ds}\Bigr\rVert=1, \]
where \(t(s)\) inverts \(s(t)\) (assuming \(\lVert\dot x\rVert>0\)).
Thus any variable-speed trajectory can be rewritten as a constant-speed trajectory under a different parameter. The two descriptions are equivalent until we impose constraints (a fixed time horizon versus a fixed speed).
Introduce a new time parameter \(\tau\) and impose a speed constraint \(c>0\):
\[ \frac{dx}{d\tau}=u\bigl(x(\tau),\tau\bigr),\qquad \lVert u(x,\tau)\rVert=c. \]
Velocity magnitude is now fixed:
\[ \lVert \dot x(\tau)\rVert=c \quad\text{for all }\tau. \]
Distance traveled satisfies
\[ \ell(\tau)=\int_0^\tau \lVert \dot x(r)\rVert\,dr=c\,\tau. \]
Distance is encoded in arrival time, not speed.
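The reparameterization above can be checked numerically. The following is a minimal sketch on a toy trajectory of our own choosing (the curve \(x(t)=(t^2,0)\), speed \(c\), and grid sizes are illustrative assumptions, not from the text): it accumulates arc length along a variable-speed path and resamples it at constant speed, so that distance traveled equals \(c\tau\).

```python
import numpy as np

# Toy variable-speed trajectory x(t) = (t^2, 0) on t in [0, 1] (our example).
t = np.linspace(0.0, 1.0, 10_001)
x = np.stack([t**2, np.zeros_like(t)], axis=1)

# Accumulated arc length s(t) = \int_0^t ||x'(r)|| dr via trapezoid rule.
speed = np.linalg.norm(np.gradient(x, t, axis=0), axis=1)   # ~ 2t here
s = np.concatenate([[0.0], np.cumsum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))])
total_length = s[-1]                                        # ~ 1.0 for this curve

# Constant-speed reparameterization: invert s(t) on an evenly spaced tau grid.
c = 0.5                                                     # chosen speed
tau = np.linspace(0.0, total_length / c, 201)               # arrival time T = length / c
t_of_tau = np.interp(c * tau, s, t)                         # t such that s(t) = c * tau
x_const = np.stack([t_of_tau**2, np.zeros_like(tau)], axis=1)

# Distance traveled up to parameter tau is now c * tau by construction.
print(total_length, c * tau[-1])
```

Distance is read off the parameter: the state at `tau` sits at arc length `c * tau` along the curve, which is the calibration the fixed-speed formulation makes literal.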
The fixed-speed form in Section 3 writes \(\|u\|=1\) to emphasize arc-length parameterization. For the minimal-arrival-time definition it is standard to allow the control bound \(\|u\|\le 1\); under time minimization any subunit-speed segment can be accelerated (while remaining feasible) to reduce arrival time, so optimal trajectories satisfy \(\|u\|=1\) almost everywhere.
Given \(x,y\in\mathbb{R}^d\), define the minimal time to reach \(y\) from \(x\):
\[ T(x,y)=\inf\bigl\{T\ge 0 \;:\; \exists\,x(\cdot)\ \text{with}\ x(0)=x,\ x(T)=y,\ \lVert\dot x\rVert\le 1\bigr\}. \]
Define the induced distance
\[ d(x,y)=T(x,y). \]
Under isotropic constraints this reduces to Euclidean distance; under position-dependent constraints it induces a warped geometry.
Let \(G(x)\) be symmetric positive definite. Define the metric-speed functional
\[ \lVert \dot x\rVert_{G}=\sqrt{\dot x^\top G(x)\,\dot x}. \]
Then the path length is
\[ \mathcal L[x]=\int_0^T \sqrt{\dot x(t)^\top G\bigl(x(t)\bigr)\,\dot x(t)}\,dt, \]
and the induced distance is
\[ d_G(x,y)=\inf\bigl\{\mathcal L[x] \;:\; x(0)=x,\ x(T)=y\bigr\}. \]
Imposing constant metric-speed \(\sqrt{\dot x^\top G(x)\dot x}=c\) implies \(T = d_G(x,y)/c\). This makes the “distance = speed × time” identity literal in the learned geometry.
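For a constant SPD metric the identity above can be verified directly, since straight lines are then geodesics and \(d_G(x,y)=\sqrt{(y-x)^\top G\,(y-x)}\) in closed form. The following sketch (the particular \(G\), endpoints, and speed \(c\) are our illustrative choices) integrates the metric speed along the constant-metric-speed line and confirms \(T=d_G(x,y)/c\):

```python
import numpy as np

# Constant SPD metric: straight lines are geodesics, d_G has a closed form.
G = np.array([[2.0, 0.5], [0.5, 1.0]])           # symmetric positive definite
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])

d_G = np.sqrt((y - x0) @ G @ (y - x0))           # closed-form geodesic distance

c = 2.0                                          # imposed metric speed
T = d_G / c                                      # predicted arrival time

# Constant-metric-speed parameterization x(tau) = x0 + (c*tau/d_G)*(y - x0).
tau = np.linspace(0.0, T, 10_001)
xs = x0 + (c * tau / d_G)[:, None] * (y - x0)
dx = np.gradient(xs, tau, axis=0)
metric_speed = np.sqrt(np.einsum('ni,ij,nj->n', dx, G, dx))   # should be ~ c

# Numerical metric length via trapezoid rule; should match d_G = c * T.
length = float(np.sum(0.5 * (metric_speed[1:] + metric_speed[:-1]) * np.diff(tau)))
print(d_G, length, T)
```

The printed length matches \(d_G\) and the arrival time matches \(d_G/c\), making "distance = speed × time" literal in the chosen geometry.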
In Benamou–Brenier dynamic optimal transport [4] [5] [6] [14] [13],
\[ W_2^2(\rho_0,\rho_1)=\inf_{(\rho,v)} \int_0^1\!\!\int \lVert v(x,t)\rVert^2\,\rho_t(x)\,dx\,dt \quad\text{s.t.}\quad \partial_t\rho_t+\nabla\!\cdot(\rho_t v)=0, \]
one minimizes kinetic energy over a fixed time horizon. Switching to fixed-speed trajectories replaces energy minimization with length minimization under a speed constraint, yielding geodesics in the induced geometry. Conceptually: energy minimization over a fixed horizon is replaced by arrival-time (length) minimization at fixed speed.
Under fixed metric-speed \(c\), length and time are coupled: \(\ell=cT\). If a numerical integrator uses step size \(\Delta\tau\), then the number of steps required to traverse a trajectory of duration \(T\) scales as
\[ N=\frac{T}{\Delta\tau}=\frac{\ell}{c\,\Delta\tau}=\frac{d_G(x_0,\mathcal M)}{c\,\Delta\tau}, \]
where \(\mathcal M\) is a target set (e.g., a data manifold or a terminal distribution region) and the last equality holds along a minimizing geodesic. Hence the scaling law [11] [9]
\[ \text{compute}\;\propto\; d_G(x_0,\mathcal M). \]
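The compute-distance coupling reduces to elementary arithmetic, sketched below (the speed \(c\), step size, and the helper `steps_needed` are our illustrative choices): doubling the geodesic distance exactly doubles the number of integrator steps.

```python
import math

# Under constant speed c and fixed step size dtau, a trajectory of length ell
# takes T = ell / c time units, hence N = ceil(T / dtau) integration steps.
c, dtau = 0.5, 0.01

def steps_needed(ell: float) -> int:
    T = ell / c                     # arrival time under constant speed
    return math.ceil(T / dtau)      # number of fixed-size integrator steps

print(steps_needed(1.0), steps_needed(2.0))   # compute grows linearly in distance
```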
The core structural fact is that a distance-minimizing geodesic is itself distance-minimizing on every subinterval. Under an arc-length (constant-speed) parameterization, this yields an anytime guarantee: truncating computation at parameter time \(\tau\) returns a point at a calibrated geodesic distance from the start, and (along a minimizing geodesic) the remaining distance to the target decreases linearly in \(\tau\).
Let \((\mathcal X, d_G)\) be the metric space induced by the length functional
\[ \mathcal L[x]=\int_0^T \sqrt{\dot x(t)^\top G\bigl(x(t)\bigr)\,\dot x(t)}\,dt, \]
where \(G(x)\) is symmetric positive definite. Let \(x^\star:[0,T]\to\mathcal X\) be a minimizing geodesic from \(x_0\) to \(y\): \(x^\star(0)=x_0\), \(x^\star(T)=y\), and \(\mathcal L[x^\star]=d_G(x_0,y)\).
Then for every \(\tau\in[0,T]\): the prefix \(x^\star|_{[0,\tau]}\) is a minimizing geodesic from \(x_0\) to \(x^\star(\tau)\), and the suffix \(x^\star|_{[\tau,T]}\) is a minimizing geodesic from \(x^\star(\tau)\) to \(y\).
In particular, the distances are exact and match the segment lengths:
\[ d_G\bigl(x_0,x^\star(\tau)\bigr)=\mathcal L\bigl[x^\star|_{[0,\tau]}\bigr],\qquad d_G\bigl(x^\star(\tau),y\bigr)=\mathcal L\bigl[x^\star|_{[\tau,T]}\bigr]. \]
Fix \(\tau\in[0,T]\). Suppose, for contradiction, there exists another curve \(\gamma\) from \(x_0\) to \(x^\star(\tau)\) with strictly smaller length: \(\mathcal L[\gamma] < \mathcal L[x^\star|_{[0,\tau]}]\). Concatenating \(\gamma\) with the suffix \(x^\star|_{[\tau,T]}\) yields a new curve \(\tilde x\) from \(x_0\) to \(y\) with \(\mathcal L[\tilde x] = \mathcal L[\gamma] + \mathcal L[x^\star|_{[\tau,T]}] < \mathcal L[x^\star]\), contradicting \(\mathcal L[x^\star]=d_G(x_0,y)\). Therefore the prefix is length-minimizing, hence distance-minimizing, and \(d_G(x_0,x^\star(\tau))=\mathcal L[x^\star|_{[0,\tau]}]\). The suffix claim follows by the same splicing argument (equivalently, apply the argument to the reversed curve). \(\blacksquare\)
Under the assumptions of Theorem 1, additionally assume \(x^\star\) is parameterized at constant metric-speed \(c>0\):
\[ \sqrt{\dot x^\star(t)^\top G\bigl(x^\star(t)\bigr)\,\dot x^\star(t)}=c \quad\text{for all } t\in[0,T]. \]
Then for every \(\tau\in[0,T]\):
\[ d_G\bigl(x_0,x^\star(\tau)\bigr)=c\,\tau,\qquad d_G\bigl(x^\star(\tau),y\bigr)=c\,(T-\tau). \]
Proof. By constant metric-speed, \(\mathcal L[x^\star|_{[0,\tau]}]=\int_0^\tau c\,ds=c\tau\) and \(\mathcal L[x^\star|_{[\tau,T]}]=\int_\tau^T c\,ds=c(T-\tau)\). Apply Theorem 1 to convert segment lengths to distances. \(\square\)
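The calibration in Corollary 1.1 can be verified numerically in the simplest case \(G=I\), where minimizing geodesics are straight lines. The sketch below (endpoints and speed are our illustrative choices) checks that every point on the constant-speed line sits at distance \(c\tau\) from the start and \(c(T-\tau)\) from the target:

```python
import numpy as np

# Euclidean case G = I: the minimizing geodesic from x0 to y is the straight
# line, traversed here at constant speed c.
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
d = np.linalg.norm(y - x0)         # d_G(x0, y) = 5.0
c = 2.0
T = d / c                          # arrival time 2.5

for tau in np.linspace(0.0, T, 6):
    x_tau = x0 + (c * tau / d) * (y - x0)      # constant-speed geodesic point
    to_start = np.linalg.norm(x_tau - x0)
    to_goal = np.linalg.norm(y - x_tau)
    assert np.isclose(to_start, c * tau)        # prefix calibration
    assert np.isclose(to_goal, c * (T - tau))   # suffix calibration
    assert np.isclose(to_start + to_goal, d)    # prefix + suffix = total
print("calibration verified")
```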
Anytime generator principle. Under arc-length (constant-speed) geodesic transport, truncating computation after time \(\tau\) returns a point at known geodesic radius \(c\tau\) from the start, and (along a minimizing geodesic) exactly \(c(T-\tau)\) from the target. Compute becomes distance traveled.
This section is intentionally idealized: it assumes exact geodesic structure (constant-speed, distance-minimizing) and exact velocity matching. The purpose is to isolate the geometric target property; practical models may only satisfy it approximately. [3] [10] [13]
We now formalize how Theorem 1 transfers into a flow matching setting when the learned velocity field represents a constant-speed Wasserstein geodesic. The essential point is that the arc-length calibration in Corollary 1.1 is a statement about minimizing geodesics parameterized by arc length, and therefore it lifts from trajectories in \((\mathcal X,d_G)\) to curves of measures in \((\mathcal P_2(\mathcal X),W_{2,G})\) whenever the flow matching model exactly realizes such a geodesic. [3] [6]
Let \(\{\rho_s\}_{s\in[0,1]}\) denote a \(W_{2,G}\)-minimizing geodesic between \(\rho_0\) and \(\rho_1\), parameterized at constant metric speed \(c>0\) (in the sense that the metric derivative equals \(c\) a.e.). Let \(v^*(x,s)\) be an optimal velocity field solving the continuity equation
\[ \partial_s \rho_s + \nabla\!\cdot\bigl(\rho_s\, v^*(\cdot,s)\bigr)=0, \]
and assume the constant-speed condition holds:
\[ \Bigl(\int v^*(x,s)^\top G(x)\,v^*(x,s)\,d\rho_s(x)\Bigr)^{1/2}=c \quad\text{for a.e. } s\in[0,1]. \]
Suppose a flow matching model \(v_\theta(x,s)\) satisfies
\[ v_\theta(x,s)=v^*(x,s) \quad\text{for all } x \text{ and } s. \]
Let \(\Phi_s\) be the induced flow map:
\[ \frac{d}{ds}\,\Phi_s(x)=v_\theta\bigl(\Phi_s(x),s\bigr),\qquad \Phi_0=\mathrm{id}. \]
Under the assumptions above, for every \(s\in[0,1]\):
(i) The pushforward distribution satisfies \((\Phi_s)_\#\rho_0=\rho_s\).
(ii) The curve \(\{\rho_s\}_{s\in[0,1]}\) is a constant-speed minimizing geodesic, and every prefix and suffix segment is minimizing between its endpoints.
(iii) The distance to the endpoints is linear in \(s\):
\[ W_{2,G}(\rho_0,\rho_s)=c\,s,\qquad W_{2,G}(\rho_s,\rho_1)=c\,(1-s). \]
Proof. Because \(v_\theta=v^*\), the continuity equation is satisfied exactly and the pushforward identity follows from flow-map theory. Since \(\{\rho_s\}\) is assumed to be a minimizing \(W_{2,G}\)-geodesic parameterized at constant speed \(c\), its (metric) length on \([0,s]\) equals \(\int_0^s c\,d\tau=cs\), and similarly on \([s,1]\) equals \(c(1-s)\). If a prefix were not minimizing between \(\rho_0\) and \(\rho_s\), one could splice a shorter competitor into the full curve, contradicting global minimality—exactly the splicing argument in Theorem 1. [6] Therefore every prefix and suffix is minimizing and the endpoint distances are linear in \(s\). \(\square\)
Suppose numerical integration halts at \(s=\tau\in[0,1]\). Then:
(i) the current state lies at exact geodesic distance \(c\tau\) from \(\rho_0\);
(ii) restarting integration from \(s=\tau\) continues along the same minimizing geodesic;
(iii) the remaining distance to \(\rho_1\) is exactly \(c(1-\tau)\).
In particular, truncation preserves optimality of both the prefix and suffix segments.
Proof. Immediate from Theorem 2(iii) and the prefix/suffix optimality in Theorem 2(ii). \(\square\)
This gives a concrete interpretation of intermediate states in arc-length flow matching: truncation time is literally “distance traveled,” not merely “fraction of a training clock.” In practical settings the learned field is approximate, but the exact statement clarifies what property is being targeted.
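The linear-distance property of Theorem 2(iii) can be exhibited on a case with closed forms. It is a standard optimal transport fact that between one-dimensional Gaussians, \(W_2\bigl(\mathcal N(m_0,\sigma_0^2),\mathcal N(m_1,\sigma_1^2)\bigr)=\sqrt{(m_1-m_0)^2+(\sigma_1-\sigma_0)^2}\), and the displacement geodesic interpolates means and standard deviations linearly. The sketch below (endpoints chosen by us for illustration) checks that truncation at \(s\) leaves a distribution at exactly \(s\cdot W_2(\rho_0,\rho_1)\):

```python
import numpy as np

# Closed-form W2 distance between 1-D Gaussians (standard OT identity).
def w2_gauss1d(m0, s0, m1, s1):
    return np.sqrt((m1 - m0) ** 2 + (s1 - s0) ** 2)

m0, s0 = 0.0, 1.0
m1, s1 = 4.0, 4.0
total = w2_gauss1d(m0, s0, m1, s1)                 # = 5.0

for s in np.linspace(0.0, 1.0, 11):
    ms = (1 - s) * m0 + s * m1                     # geodesic mean at s
    ss = (1 - s) * s0 + s * s1                     # geodesic std dev at s
    assert np.isclose(w2_gauss1d(m0, s0, ms, ss), s * total)        # prefix
    assert np.isclose(w2_gauss1d(ms, ss, m1, s1), (1 - s) * total)  # suffix
print("W2 geodesic is constant speed, total distance:", total)
```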
We now isolate precisely what fails under the standard constant-time parameterization \(t\in[0,1]\). Even if the learned field represents the same geometric curve of measures, the parameter \(t\) does not, in general, measure arc length in \((\mathcal P_2(\mathcal X),W_{2,G})\). This mismatch is the reason the anytime property is parameterization-sensitive. [3] [6] [13]
Let \(\{\rho_t\}_{t\in[0,1]}\) be the curve induced by a (possibly optimal) velocity field \(v^*(x,t)\). Define the accumulated length (arc length) in Wasserstein space:
\[ \Lambda(t)=\int_0^t \Bigl(\int v^*(x,r)^\top G(x)\,v^*(x,r)\,d\rho_r(x)\Bigr)^{1/2} dr. \]
In general \(\Lambda(t)\neq t\,\Lambda(1)\), i.e. the speed profile is not constant. Therefore stopping at \(t=\tfrac12\) does not imply that \(\rho_{1/2}\) lies halfway in geodesic distance between \(\rho_0\) and \(\rho_1\):
\[ \Lambda\bigl(\tfrac12\bigr)\neq\tfrac12\,\Lambda(1) \quad\text{in general}. \]
Equivalently, the equal-distance partition points \(t^\star\) solving \(\Lambda(t^\star)=\tfrac12\Lambda(1)\) depend on the unknown speed profile induced by the learned field. Thus the constant-time parameter is not intrinsically meaningful for truncation or segment decomposition.
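The failure mode is easy to exhibit on a hypothetical speed profile of our own construction: take the curve of Gaussians \(\rho_t=\mathcal N(t^2,1)\). Its Wasserstein speed is \(|{\tfrac{d}{dt}t^2}|=2t\), so \(\Lambda(t)=t^2\) is nonlinear in \(t\); stopping at \(t=\tfrac12\) covers only a quarter of the total distance, and the true halfway point \(t^\star=\sqrt{1/2}\) depends on the speed profile:

```python
import numpy as np

# Hypothetical non-constant-speed curve: rho_t = N(t^2, 1), Wasserstein
# speed |d(t^2)/dt| = 2t, so arc length Lambda(t) = t^2.
t = np.linspace(0.0, 1.0, 100_001)
speed = 2.0 * t                                    # metric derivative of rho_t
Lam = np.concatenate([[0.0], np.cumsum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))])

total = Lam[-1]                                    # total arc length, ~ 1.0
half_clock = np.interp(0.5, t, Lam)                # Lambda(1/2), ~ 0.25
t_star = np.interp(0.5 * total, Lam, t)            # equal-distance point, ~ 0.7071

print(half_clock, 0.5 * total, t_star)
```

Here `half_clock` is far from `0.5 * total`: half the clock is not half the distance, and locating `t_star` requires knowing the speed profile, which is exactly what the constant-time parameter hides.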
We now prove a structural decomposability result. The constant-speed parameterization makes each subinterval correspond to a fixed amount of geometric progress, enabling segmentwise training that composes correctly. This is analogous in spirit to parallel-in-time methods for dynamical systems, but here the guarantee is geometric: it follows from prefix/suffix optimality of constant-speed geodesics. [8]
Partition \([0,1]\) into \(m\) intervals \(0=s_0<s_1<\cdots<s_m=1\). For each \(j\), denote \(\rho_{s_j}\) by \(\rho_j\). Under constant speed, the segment lengths satisfy
\[ W_{2,G}(\rho_{j-1},\rho_j)=c\,(s_j-s_{j-1}),\qquad j=1,\dots,m. \]
Assume \(\{\rho_s\}_{s\in[0,1]}\) is a constant-speed minimizing \(W_{2,G}\)-geodesic with velocity field \(v^*(x,s)\). For each segment \([s_{j-1},s_j]\), train an independent model \(v^{(j)}_\theta(x,s)\) such that
\[ v^{(j)}_\theta(x,s)=v^*(x,s) \quad\text{for all } x \text{ and } s\in[s_{j-1},s_j]. \]
Define a piecewise field \(v_\theta(x,s)=v^{(j)}_\theta(x,s)\) for \(s\in[s_{j-1},s_j]\), and let \(\Phi_s\) be its flow map. Then:
(i) Each segment flow realizes a minimizing geodesic between \(\rho_{j-1}\) and \(\rho_j\).
(ii) Concatenating the segment flows yields the global minimizing geodesic from \(\rho_0\) to \(\rho_1\).
(iii) Consequently, segment training may proceed independently (in parallel) without loss of global optimality, provided each segment is exact.
Proof. By Theorem 1 applied in \((\mathcal P_2(\mathcal X),W_{2,G})\) as in Theorem 2, every restriction of a constant-speed minimizing geodesic to a subinterval is itself minimizing between its endpoints. Therefore the restriction of \(v^*\) to \([s_{j-1},s_j]\) generates a minimizing geodesic from \(\rho_{j-1}\) to \(\rho_j\). If \(v^{(j)}_\theta\) matches this restriction exactly, the induced segment flow equals the true segment flow. Concatenation of identical segment maps equals the global map, hence the global curve is recovered and remains minimizing. \(\square\)
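The composition argument can be checked numerically on a toy Euclidean case of our own construction (the field, endpoints, knots, and Euler integrator below are illustrative assumptions): the constant velocity field is integrated either globally on \([0,1]\) or independently on segments started from the true intermediate points, and each segment flow lands exactly on the next segment's starting point.

```python
import numpy as np

# Toy constant-speed transport: straight line from x0 to y over [0, 1].
x0, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
c = np.linalg.norm(y - x0)                         # speed chosen so T = 1
v = c * (y - x0) / np.linalg.norm(y - x0)          # constant velocity field

def integrate(x, s_start, s_end, n=1000):
    """Forward Euler on [s_start, s_end] (exact here: the field is constant)."""
    h = (s_end - s_start) / n
    for _ in range(n):
        x = x + h * v
    return x

# Global integration over the full interval.
x_global = integrate(x0, 0.0, 1.0)

# Segmentwise integration: each segment starts from the true geodesic point.
knots = [0.0, 0.3, 0.7, 1.0]
x_seg = x0
for a, b in zip(knots[:-1], knots[1:]):
    x_true_start = x0 + a * (y - x0)               # exact prefix endpoint
    x_seg = integrate(x_true_start, a, b)          # independent segment flow
    assert np.allclose(x_seg, x0 + b * (y - x0), atol=1e-6)  # lands on next knot

print(x_global, x_seg)                             # both reach y = [3, 4]
```

Because each segment reproduces the true restriction of the field, its endpoint coincides with the next segment's start, so the segments may be computed (and in the learned setting, trained) independently and still compose into the global geodesic.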
The obstruction in constant-time flow matching is the lack of an a priori equal-distance segmentation: subintervals \([t_{j-1},t_j]\) need not correspond to fixed geodesic lengths, and the restriction of a non-constant-speed parameterization need not preserve the minimizing prefix/suffix structure. Thus there is no comparable guarantee that independently trained segments will compose into a globally minimizing path.
The reparameterized (arc-length) formulation yields three mathematically distinct advantages:
First, truncation corresponds to an intrinsic progress measure: the remaining distance to the target is known exactly in the idealized setting (Corollary 2.1), giving a principled failure-recovery interpretation.
Second, intermediate states become meaningful objects: \(\rho_s\) is not merely the distribution at a chosen clock time but the distribution a fixed geodesic distance \(cs\) from \(\rho_0\).
Third, constant-speed geodesics admit segmentwise decomposition with correct composition (Theorem 3), suggesting distributed or parallel training schemes in which segments correspond to fixed-distance progress along the optimal path.
These are not generic properties of learning a time-indexed vector field. They are consequences of imposing a geometric calibration between parameter time and metric distance—exactly the point of replacing constant-time transport with constant-speed transport.
This work contrasts constant-time transport (standard continuous normalizing flows and flow matching) with a constant-speed formulation in which the transport parameter measures geometric distance traveled. In the base space \((\mathcal X,d_G)\), constant-speed parameterization turns the parameter into arc length, yielding geodesic prefix/suffix optimality (Theorem 1) and, under constant metric speed, an explicit distance-vs-parameter calibration (Corollary 1.1). [7]
We then show that the same structural property transfers to flow matching whenever the learned velocity field realizes a constant-speed Wasserstein geodesic (Theorem 2), giving an exact failure-recovery corollary and a clean explanation of why constant-time flow matching lacks such guarantees. Finally, we prove a segmentwise parallelization theorem (Theorem 3) that follows purely from geodesic prefix/suffix optimality and therefore is specific to arc-length parameterization. [6] [8]
Next natural steps are (i) quantitative stability bounds under approximate velocity matching, and (ii) error propagation controls for segmentwise training when each segment is learned only up to a specified tolerance.