Stochastic Redistribution vs. LP Surrogates for Circuit Balancing

Jonathan Landers

Abstract

We study circuit balancing when service points (SPs) exhibit stochastic load deviations with lognormally distributed overshoot probabilities. The operator periodically redistributes SPs across circuits to minimize imbalance. We model SP deviation dynamics, derive aggregate circuit loads, and compare a discrete Redistribution objective (RMS risk) with a linear LP surrogate (L1 risk). Our main theorem shows that, under exogenous stochastic dynamics and lognormal heterogeneity, the LP relaxation is exact almost surely, reducing a stochastic assignment problem to a convex program. We also delineate when this equivalence breaks (RMS objectives, capacities, topology) and how to extend it with MILP/MIQP formulations.

Components of the Paper

1) The Lognormal Formalization

2) RMS ↔ L1 Equivalence under Lognormal

With \( A_T=\tfrac{1}{T}\sum_t \varepsilon_t \) and \( B_T=\sqrt{\tfrac{1}{T}\sum_t \varepsilon_t^2} \), we show \( J_{L_2}(x) = \dfrac{B_T}{A_T}\, J_{L_1}(x) \). Because the ratio is independent of \(x\), both objectives have the same minimizer, so the RMS problem never needs to be solved directly in this setting.

3) The L1 Objective Collapse to a Small LP

The time dimension factors out; only \(m\) absolute-value epigraphs remain: \[ \min_{x,u}\ \sum_i u_i \quad \text{s.t.}\quad u_i \ge \sum_j a_j x_{ij},\ \ u_i \ge -\sum_j a_j x_{ij},\ \ \sum_i x_{ij}=1,\ 0\le x_{ij}\le 1. \] This LP is small, exact, and polynomial-time solvable; an optimal one-hot (integral) solution always exists.

Together, these three steps reduce a hard nonlinear RMS problem to an equivalent L1 objective, and then to an exact small LP.

1. Problem Setting (stochastic primitives)

We formalize the grid and the stochastic processes that drive deviations and, ultimately, redistribution decisions.

Let circuits \(\mathcal{C}=\{C_1,\dots,C_m\}\) and service points \(\mathcal{P}=\{P_1,\dots,P_n\}\). Each SP \(P_j\) is assigned to one circuit:

\[ x_{ij}\in\{0,1\}, \qquad \sum_{i=1}^m x_{ij}=1 \quad (\forall j). \]

For discrete time \(t=1,\dots,T\), actual and predicted loads satisfy

\[ a_j(t)=p_j(t)+\delta_j(t), \qquad \delta_j(t)=b_j+\epsilon_j\sin(\Omega_j t+\phi_j)+\eta_j(t), \]

where \(\eta_j(t)\) is zero-mean noise. We encode asymmetric stochastic spikes via

\[ \pi_j:=\Pr[\delta_j(t)>0]\sim \mathrm{LogNormal}(\mu_\pi,\sigma_\pi^2), \qquad b_j=-\,\sigma_j\,\Phi^{-1}\!\big(1-\pi_j\big), \]

so most SPs are biased slightly negative, while a fat-tailed minority spike positive more often.
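For concreteness, the following sketch simulates this deviation model on synthetic data. All parameter values, the Gaussian choice for \(\eta_j\), and the clipping of \(\pi_j\) into \((0,1)\) (needed so \(\Phi^{-1}(1-\pi_j)\) is defined) are illustrative assumptions, not prescriptions of the model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, T = 50, 200                        # service points, horizon (assumed)
mu_pi, sigma_pi = -1.5, 0.6           # lognormal overshoot parameters (assumed)
sigma_j = rng.uniform(0.5, 1.5, n)    # per-SP deviation scales (assumed)

# pi_j ~ LogNormal, clipped into (0,1) so Phi^{-1}(1 - pi_j) is defined
pi = np.clip(rng.lognormal(mu_pi, sigma_pi, n), 1e-4, 1 - 1e-4)
b = -sigma_j * norm.ppf(1.0 - pi)     # bias b_j = -sigma_j * Phi^{-1}(1 - pi_j)

Omega = rng.uniform(0.1, 0.5, n)      # per-SP frequencies (assumed)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
eps_amp = rng.uniform(0.2, 0.8, n)    # sinusoid amplitudes epsilon_j (assumed)

t = np.arange(1, T + 1)
# delta_j(t) = b_j + epsilon_j * sin(Omega_j t + phi_j) + eta_j(t)
delta = (b[:, None]
         + eps_amp[:, None] * np.sin(Omega[:, None] * t + phi[:, None])
         + 0.1 * rng.standard_normal((n, T)))   # zero-mean noise eta_j(t)

print("share of SPs biased negative:", (b < 0).mean())
```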

Circuit aggregates (random processes) are

\[ \Delta_i(t)=\sum_{j=1}^n x_{ij}\,\delta_j(t), \qquad \Delta_S(t)=\sum_{i=1}^m \Delta_i(t). \]

Intuition. The operator cannot change \(\delta_j(\cdot)\) realizations, but can redistribute which sums they enter, reshaping the distribution of \(\Delta_i(\cdot)\).

2. Stochastic Objectives & Risk (what we control)

We cast redistribution as minimizing a risk proxy of random circuit deviations, using sample-path estimators over a horizon \(T\).

Redistribution (RMS risk, discrete)

\[ \min_{x\in\{0,1\}^{m\times n}} J(x):=\sum_{i=1}^m R_i(x),\quad R_i(x)=\sqrt{\frac{1}{T}\sum_{t=1}^T \Big(\sum_{j=1}^{n} x_{ij}\,\delta_j(t)\Big)^2 }. \]

Intuition. RMS emphasizes volatility/peaks, not just average error. It is physically meaningful: thermal/operational stress tracks RMS more closely than the mean.

LP surrogate (L1 risk, convex)

\[ \min_{x\in[0,1]^{m\times n}} \tilde J(x):=\sum_{i=1}^m\sum_{t=1}^{T} w_{it}\, \Big|\sum_{j=1}^{n} x_{ij}\,\delta_j(t)\Big| \qquad \text{s.t. } \sum_{i=1}^{m} x_{ij}=1. \]

Linearization with \(z_{it}\ge0\) gives

\[ \begin{aligned} &\min_{x,z}\ \sum_{i,t} w_{it} z_{it} \\ &\text{s.t.}\ z_{it}\ge \sum_{j} x_{ij}\delta_j(t),\quad z_{it}\ge -\sum_{j} x_{ij}\delta_j(t),\\ &\hspace{22mm}\sum_i x_{ij}=1,\quad 0\le x_{ij}\le 1. \end{aligned} \]

Intuition. \(\tilde J\) targets robustness to outliers and is a tight piecewise-linear envelope for RMS, enabling efficient LP solvers on stochastic data.
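As a sketch of how this linearization can be assembled in practice, the following builds the time-expanded LP and solves it with SciPy's HiGHS backend under uniform weights \(w_{it}=1/T\); the function name and the dense matrix construction are illustrative choices, not part of the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_surrogate(delta, m):
    """Time-expanded L1 LP; delta has shape (n, T), weights w_it = 1/T."""
    n, T = delta.shape
    nx, nz = m * n, m * T                 # x variables, then z epigraph vars
    c = np.concatenate([np.zeros(nx), np.full(nz, 1.0 / T)])

    # Inequalities: z_it >= +/- sum_j x_ij delta_j(t)
    A_ub = np.zeros((2 * m * T, nx + nz))
    b_ub = np.zeros(2 * m * T)
    row = 0
    for i in range(m):
        for t in range(T):
            for sign in (1.0, -1.0):
                A_ub[row, i * n:(i + 1) * n] = sign * delta[:, t]
                A_ub[row, nx + i * T + t] = -1.0
                row += 1

    # Equalities: sum_i x_ij = 1 for every SP j
    A_eq = np.zeros((n, nx + nz))
    for j in range(n):
        for i in range(m):
            A_eq[j, i * n + j] = 1.0
    b_eq = np.ones(n)

    bounds = [(0, 1)] * nx + [(0, None)] * nz
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:nx].reshape(m, n), res.fun
```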

3. Main Result (stochastic/lognormal structural reduction)

Under lognormal overshoot dynamics, the redistribution problem with an L1 objective collapses to an exact LP almost surely, and thus also in expectation.

We give two complementary proofs that lead to the same optimal redistribution. First, a simple analytic reduction shows that with a global lognormal shock the RMS objective is a fixed positive multiple of the L1 surrogate, so both objectives have the same minimizers (§3.2). Second, a geometric/pathwise argument shows that on any realized trajectory, the LP surrogate admits an optimal solution whose assignment is one-hot per service point (equivalently, integral), so the LP optimum equals the binary optimum (§3.1). In the lognormal setting, these two views are consistent: the proportionality in §3.2 explains why the assignment-polytope geometry in §3.1 yields the same decision rule.

Stochastic model

\[ \pi_j \sim \mathrm{LogNormal}(\mu_\pi,\sigma_\pi^2), \qquad b_j = -\,\sigma_j\,\Phi^{-1}\!\big(1-\pi_j\big), \] \[ \delta_j(t)= b_j + \epsilon_j \sin(\Omega_j t+\phi_j) + \eta_j(t), \] where \(\mathcal{F}_t\) denotes the natural filtration generated by \(\{\delta_j(\tau)\}_{\tau\le t,\,j}\) and \(\{\eta_j(t)\}_t\) is martingale-difference noise with \(\mathbb{E}[\eta_j(t)\mid \mathcal{F}_{t-1}]=0\). Circuit aggregates are \(\Delta_i(t)=\sum_j x_{ij}\delta_j(t)\) with risks \[ J(x)=\sum_i \sqrt{\tfrac{1}{T}\sum_t \Delta_i(t)^2}, \qquad \tilde J(x)=\sum_{i,t} w_{it}\,|\Delta_i(t)|. \]

Theorem (Lognormal L2→L1 Proportionality)

Assume a global multiplicative driver \(\varepsilon_t>0\) with \[ \delta_j(t)=\tilde a_j\,\varepsilon_t,\qquad \tilde a_j:=a_j\tau_j,\qquad \varepsilon_t\sim \mathrm{LogNormal}(\mu,\sigma^2), \] where \(\tau_j>0\) is the per-SP scale tuner of §6.2, and take uniform temporal weights \(w_{it}\equiv \tfrac{1}{T}\) (or any weights independent of the circuit index \(i\)). For an assignment \(x\), define \(s_i(x):=\sum_j \tilde a_j\,x_{ij}\) and the sample factors \[ A_T:=\tfrac{1}{T}\sum_{t=1}^T \varepsilon_t,\qquad B_T:=\sqrt{\tfrac{1}{T}\sum_{t=1}^T \varepsilon_t^2}\,. \] Then

\[ J_{L_2}(x)=\sum_{i=1}^m \sqrt{\tfrac{1}{T}\sum_{t=1}^T \big(s_i(x)\,\varepsilon_t\big)^2}=B_T\sum_{i=1}^m |s_i(x)|, \] \[ J_{L_1}(x)=\sum_{i=1}^m \tfrac{1}{T}\sum_{t=1}^T \big|s_i(x)\,\varepsilon_t\big|=A_T\sum_{i=1}^m |s_i(x)|, \] hence \(\quad J_{L_2}(x)=\dfrac{B_T}{A_T}\,J_{L_1}(x) \quad \) with a ratio independent of \(x\).

Collapsed “Small” LP (post-collapse; circuit-only)

Because time factors out, minimizing either objective is equivalent to the following LP with only \(m\) absolute-value epigraphs:

\[ \begin{aligned} \min_{x,u}\quad & \sum_{i=1}^m u_i\\ \text{s.t.}\quad & u_i \ge \sum_{j=1}^n \tilde a_j x_{ij},\qquad u_i \ge -\sum_{j=1}^n \tilde a_j x_{ij}\quad(\forall i),\\ & \sum_{i=1}^m x_{ij}=1\quad(\forall j),\qquad 0\le x_{ij}\le 1. \end{aligned} \]

Extreme-point integrality. Over the assignment polytope, an optimal solution exists with each column one-hot; thus this LP returns a discrete redistribution without rounding.
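A minimal sketch of the collapsed LP follows, assuming illustrative coefficients \(\tilde a_j\). An integral optimum exists by the extreme-point argument, but the solver is not forced to return one, so the sketch snaps each SP to its largest coordinate as a simple one-hot heuristic.

```python
import numpy as np
from scipy.optimize import linprog

def solve_collapsed_lp(a_tilde, m):
    """Minimize sum_i u_i with u_i >= |sum_j a_tilde_j x_ij|."""
    n = a_tilde.size
    nx = m * n                                   # x variables, then u_1..u_m
    c = np.concatenate([np.zeros(nx), np.ones(m)])

    A_ub = np.zeros((2 * m, nx + m))             # u_i >= +/- sum_j a_j x_ij
    for i in range(m):
        for k, sign in enumerate((1.0, -1.0)):
            A_ub[2 * i + k, i * n:(i + 1) * n] = sign * a_tilde
            A_ub[2 * i + k, nx + i] = -1.0
    b_ub = np.zeros(2 * m)

    A_eq = np.zeros((n, nx + m))                 # sum_i x_ij = 1 for each SP
    for j in range(n):
        for i in range(m):
            A_eq[j, i * n + j] = 1.0
    b_eq = np.ones(n)

    bounds = [(0, 1)] * nx + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    x = res.x[:nx].reshape(m, n)
    # A one-hot optimum exists; snap each column to its largest coordinate
    # to report a discrete redistribution.
    return x.argmax(axis=0), res.fun

a_tilde = np.random.default_rng(1).normal(size=12)
print(solve_collapsed_lp(a_tilde, m=3))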

Proof (analytic; full)

By the global factorization \(\Delta_i(t)=\sum_j x_{ij}\delta_j(t)=\big(\sum_j \tilde a_j x_{ij}\big)\varepsilon_t=s_i(x)\varepsilon_t\). Therefore \[ \sqrt{\tfrac{1}{T}\sum_t \Delta_i(t)^2}=\sqrt{\tfrac{1}{T}\sum_t s_i(x)^2\varepsilon_t^2}=|s_i(x)|\,B_T, \] and \[ \tfrac{1}{T}\sum_t |\Delta_i(t)|=\tfrac{1}{T}\sum_t |s_i(x)|\,|\varepsilon_t|=|s_i(x)|\,A_T. \] Summing over \(i\) yields the two displayed identities and the ratio \(J_{L_2}(x)=(B_T/A_T)J_{L_1}(x)\) with \(B_T/A_T>0\) independent of \(x\). If \(x^\star\) minimizes \(J_{L_1}\), then for any \(x\), \(J_{L_1}(x^\star)\le J_{L_1}(x)\Rightarrow J_{L_2}(x^\star)=(B_T/A_T)J_{L_1}(x^\star)\le (B_T/A_T)J_{L_1}(x)=J_{L_2}(x)\), so the minimizers coincide for any finite \(T\). Finally, by the strong law of large numbers, \(A_T\to \mathbb{E}[\varepsilon]=e^{\mu+\sigma^2/2}\) and \(B_T\to \sqrt{\mathbb{E}[\varepsilon^2]}=\sqrt{e^{2\mu+2\sigma^2}}=e^{\mu+\sigma^2}\), hence \(B_T/A_T\to e^{\sigma^2/2}\). \(\square\)
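The proportionality and the limit \(B_T/A_T \to e^{\sigma^2/2}\) are easy to check numerically; the sketch below uses illustrative parameters and a few random assignments.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, T = 0.0, 0.5, 100_000
eps = rng.lognormal(mu, sigma, T)                 # global driver epsilon_t
A_T, B_T = eps.mean(), np.sqrt((eps ** 2).mean())

a_tilde = rng.normal(size=10)                     # effective coefficients (assumed)
m = 3
for _ in range(3):                                # a few random assignments x
    assign = rng.integers(0, m, size=a_tilde.size)
    s = np.array([a_tilde[assign == i].sum() for i in range(m)])
    Delta = s[:, None] * eps[None, :]             # Delta_i(t) = s_i(x) eps_t
    J_L2 = np.sqrt((Delta ** 2).mean(axis=1)).sum()
    J_L1 = np.abs(Delta).mean(axis=1).sum()
    assert np.isclose(J_L2 / J_L1, B_T / A_T)     # ratio independent of x

print(B_T / A_T, np.exp(sigma ** 2 / 2))          # close for large T
```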

3.1 Geometric Proof (pathwise LP exactness on realized trajectories)

Setup. Fix a realized sample path \(\{\delta_j(t)\}_{t=1}^T\) and, as in the theorem, take weights \(w_{it}=w_t\ge 0\) independent of the circuit index. Consider the LP surrogate with the standard linearization of absolute values:

\[ \begin{aligned} \min_{x,z}\quad & \sum_{i=1}^m\sum_{t=1}^T w_{it}\,z_{it} \\ \text{s.t.}\quad & z_{it}\ge \sum_{j=1}^n x_{ij}\,\delta_j(t),\qquad z_{it}\ge -\sum_{j=1}^n x_{ij}\,\delta_j(t) \quad (\forall i,t),\\ & \sum_{i=1}^m x_{ij}=1 \quad (\forall j),\qquad x_{ij}\ge 0,\ z_{it}\ge 0. \end{aligned} \]

The feasible set in \(x\) is the assignment polytope \(P:=\{x\in\mathbb{R}_+^{m\times n}:\ \sum_i x_{ij}=1\ \forall j\}\), i.e., a Cartesian product of simplices—each column \(x_{\cdot j}\) lies in \(\Delta_m\). The extreme points of \(P\) are exactly the one-hot matrices (each column has a single 1 and zeros elsewhere).

Proof (geometric/pathwise; full)

Step 1 (tightness of linearization). Because all weights \(w_{it}\ge 0\), decreasing any slack in the two inequalities for \(z_{it}\) preserves feasibility and never increases the objective (and strictly decreases it whenever \(w_{it}>0\)). Hence some optimal solution satisfies \[ z_{it}= \bigg|\sum_{j=1}^n x_{ij}\,\delta_j(t)\bigg| \quad \text{for all } (i,t), \] and consequently the optimal LP value equals \[ \min_{x\in P}\ \sum_{i,t} w_{it}\,\bigg|\sum_j x_{ij}\,\delta_j(t)\bigg|. \]

Step 2 (existence of an extreme-point minimizer). Let \(x\in P\) be optimal but not an extreme point of \(P\), so \(x=\theta x^{(1)}+(1-\theta)x^{(2)}\) for distinct \(x^{(1)},x^{(2)}\in P\) and \(\theta\in(0,1)\). Convexity of the absolute value gives, for each \((i,t)\), \[ \bigg|\sum_j \big(\theta x^{(1)}_{ij}+(1-\theta)x^{(2)}_{ij}\big)\delta_j(t)\bigg| \ \le\ \theta\,\bigg|\sum_j x^{(1)}_{ij}\delta_j(t)\bigg| + (1-\theta)\,\bigg|\sum_j x^{(2)}_{ij}\delta_j(t)\bigg|, \] so the cost of \(x\) is at most the corresponding convex combination of the endpoint costs. Convexity alone, however, only bounds the cost of \(x\) from above, so we use the circuit-independence of the weights to exhibit an extreme point attaining the optimum. By the triangle inequality over circuits and \(\sum_i x_{ij}=1\), every feasible \(x\) satisfies \[ \sum_{i,t} w_t\,\bigg|\sum_j x_{ij}\,\delta_j(t)\bigg| \ \ge\ \sum_t w_t\,\bigg|\sum_j \delta_j(t)\bigg|, \] and the right-hand side is attained by any one-hot assignment that routes every SP to a single circuit, which is an extreme point of \(P\). Hence the LP optimum is attained at an extreme point, i.e., one-hot per column.

Step 3 (equality with the binary problem). On one-hot assignments \(x\), the LP cost \(\sum_{i,t} w_{it}\,| \sum_j x_{ij}\delta_j(t)|\) equals the binary problem’s cost on the same \(x\), and both feasible sets share those one-hot assignments. Hence the optimal LP value equals the optimal binary value on the realized path. \(\square\)

Discussion. The key geometric facts are: (i) the linearization is tight at an optimum; (ii) with weights independent of the circuit index, minimizing a separable sum of absolute values over a product of simplices admits an extreme-point minimizer; (iii) the extreme points of the assignment polytope are exactly the one-hot matrices. Thus the LP surrogate is pathwise exact and returns a discrete redistribution.
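On a tiny instance this pathwise exactness can be verified by brute force, comparing the LP value against an enumeration of all \(m^n\) one-hot assignments (uniform weights, as in the theorem). The sketch reuses solve_lp_surrogate from the illustrative code after §2.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
m, n, T = 2, 6, 8
delta = rng.normal(size=(n, T))                    # arbitrary realized path

def l1_cost(assign):
    cost = 0.0
    for i in range(m):
        Delta_i = delta[assign == i].sum(axis=0)   # circuit aggregate path
        cost += np.abs(Delta_i).mean()             # (1/T) sum_t |Delta_i(t)|
    return cost

best_binary = min(l1_cost(np.array(a))
                  for a in itertools.product(range(m), repeat=n))
_, lp_value = solve_lp_surrogate(delta, m)         # sketch from Section 2
print(best_binary, lp_value)                       # agree up to solver tolerance
```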

3.2 Simple Analytic Reduction (global lognormal shock)

With \(\Delta_i(t)=s_i(x)\varepsilon_t\), RMS and L1 factor as \(\sqrt{\tfrac{1}{T}\sum_t \Delta_i(t)^2}=|s_i(x)|\,B_T\) and \(\tfrac{1}{T}\sum_t|\Delta_i(t)|=|s_i(x)|\,A_T\), giving \(J_{L_2}(x)=\frac{B_T}{A_T}\,J_{L_1}(x)\). Because \(\frac{B_T}{A_T}>0\) does not depend on \(x\), both objectives pick the same assignments for any finite horizon, and in the ergodic limit the ratio tends to \(e^{\sigma^2/2}\). This proportionality explains why the geometric/pathwise exactness yields the same decision rule under a global lognormal driver. (See §6 for when this collapse breaks and how to restore it.)

3.3 Remarks

Intuitively, a single multiplicative factor makes every circuit’s deviation a scaled copy of one shock, so RMS and L1 differ only by a scalar the operator cannot influence. Geometrically, the assignment polytope already encodes one-hot redistributions; tight linearization plus convexity ensures an extreme-point minimizer. Two views—same optimizer.

4. When Equivalence Breaks (and how to fix it)

The reduction hinges on linearity and decoupling; realistic systems re-introduce coupling. Three features break the collapse: a genuine RMS objective without a global driver (quadratic coupling across time), per-circuit capacities (coupling across SPs on the same circuit), and network topology or power-flow limits (coupling across circuits). Each can be handled with MILP/MIQP or piecewise-linear constructions, as discussed in §6 and the Conclusion.

Rule of thumb. Use the LP when penalizing absolute deviations and there is no capacity coupling; upgrade to MILP/MIQP when you care about RMS and/or must honor capacities.
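As a hedged sketch of the MILP upgrade, the following adds per-circuit capacity constraints and declares \(x\) binary, using SciPy's milp (available from SciPy 1.9). The capacity vector cap, the load vector load, and the collapsed coefficients a_tilde are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_capacitated_milp(a_tilde, load, cap):
    """Collapsed L1 objective with binary x and capacities sum_j load_j x_ij <= cap_i."""
    m, n = cap.size, a_tilde.size
    nv = m * n + m                                  # x variables, then u_i
    c = np.concatenate([np.zeros(m * n), np.ones(m)])

    rows, lbs, ubs = [], [], []
    for i in range(m):
        for sign in (1.0, -1.0):                    # u_i >= +/- sum_j a_j x_ij
            r = np.zeros(nv)
            r[i * n:(i + 1) * n] = sign * a_tilde
            r[m * n + i] = -1.0
            rows.append(r); lbs.append(-np.inf); ubs.append(0.0)
        r = np.zeros(nv)                            # capacity of circuit i
        r[i * n:(i + 1) * n] = load
        rows.append(r); lbs.append(-np.inf); ubs.append(cap[i])
    for j in range(n):                              # sum_i x_ij = 1
        r = np.zeros(nv)
        r[j:m * n:n] = 1.0
        rows.append(r); lbs.append(1.0); ubs.append(1.0)

    cons = LinearConstraint(np.array(rows), np.array(lbs), np.array(ubs))
    integrality = np.concatenate([np.ones(m * n), np.zeros(m)])  # x binary
    bounds = Bounds(np.zeros(nv),
                    np.concatenate([np.ones(m * n), np.full(m, np.inf)]))
    res = milp(c=c, constraints=cons, integrality=integrality, bounds=bounds)
    return res.x[:m * n].reshape(m, n).round(), res.fun

rng = np.random.default_rng(5)
a_tilde, load = rng.normal(size=8), rng.uniform(0.5, 1.5, 8)
cap = np.full(3, load.sum())        # loose capacities keep this instance feasible
print(solve_capacitated_milp(a_tilde, load, cap))
```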

5. Practical Metrics & Evaluation (stochastic reporting)

We evaluate redistribution on sample paths and across Monte-Carlo draws to reflect stochastic risk.

\[ \Delta_i(t)=\sum_j x_{ij}\delta_j(t),\quad R_i=\sqrt{\tfrac1T\sum_t \Delta_i(t)^2},\quad J=\sum_i R_i,\quad R_S=\sqrt{\tfrac1T\sum_t \Delta_S(t)^2}. \]

Interpretation. Minimizing \(J\) suppresses persistent volatility at the feeder level; minimizing \(\tilde J\) suppresses large absolute swings and is robust to outliers. Report \(J,\tilde J\) per trajectory and their expectations/quantiles (e.g., CVaR) over draws of \(\{\delta_j(t)\}\).
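A reporting sketch along these lines follows, where sample_delta is a stand-in path generator (e.g., the simulation of §1) and the draw count and tail level are assumptions.

```python
import numpy as np

def risk_metrics(delta, assign, m):
    """delta: (n, T) realized path; assign[j] = circuit index of SP j."""
    Delta = np.stack([delta[assign == i].sum(axis=0) for i in range(m)])
    R = np.sqrt((Delta ** 2).mean(axis=1))     # per-circuit RMS risks R_i
    return R, R.sum()                          # (R_i)_i and the aggregate J

def monte_carlo_report(sample_delta, assign, m, draws=500, alpha=0.95):
    """sample_delta() returns one (n, T) draw of the deviation paths."""
    Js = np.array([risk_metrics(sample_delta(), assign, m)[1]
                   for _ in range(draws)])
    q = np.quantile(Js, alpha)
    return {"mean_J": Js.mean(),
            "q95_J": q,
            "CVaR95_J": Js[Js >= q].mean()}    # CVaR: mean of the upper tail
```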

6. Lognormal Collapse: Conditions, Scale Tuning, and Realism

We clarify when the \(L_2 \rightarrow L_1\) collapse holds, how a per–SP scale tuner preserves it, and why this setup matches realistic load balancing.

6.1 Collapse requires a global lognormal multiplier

The collapse relies on a rank-1 stochastic structure in time:

\[ \delta_j(t) \;=\; a_j\,\varepsilon_t \quad (\varepsilon_t \sim \mathrm{LogNormal}(\mu,\sigma^2)). \]

With a global lognormal multiplier \( \varepsilon_t \), circuit aggregates factor as

\[ \Delta_i(t) \;=\; \sum_j x_{ij}\,\delta_j(t) \;=\; \Big(\sum_j a_j x_{ij}\Big)\,\varepsilon_t \;=\; s_i\,\varepsilon_t, \]

yielding, over a horizon \(T\),

\[ R_i(x) \;=\; |s_i|\,B_T,\qquad \tilde R_i(x) \;=\; |s_i|\,A_T,\qquad \frac{B_T}{A_T}\xrightarrow[\;T\to\infty\;]{\text{a.s.}} e^{\sigma^2/2}. \]

The ratio is independent of \(x\), hence \(J_{L_2}(x)=\kappa\,J_{L_1}(x)\) and both objectives share the same minimizer.

If instead each SP has its own multiplier, \( \delta_j(t)=a_j\,\varepsilon_{j,t} \), cross terms depend on \(x\) and the collapse breaks in general.
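The contrast is easy to see numerically: with a global driver, the sample ratio \(J_{L_2}/J_{L_1}\) is the same for every assignment, while per-SP multipliers make it assignment-dependent. All distributional parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
m, n, T = 3, 12, 50_000
a = rng.normal(size=n)

def ratio(delta, assign):
    Delta = np.stack([delta[assign == i].sum(axis=0) for i in range(m)])
    J2 = np.sqrt((Delta ** 2).mean(axis=1)).sum()
    J1 = np.abs(Delta).mean(axis=1).sum()
    return J2 / J1

eps_global = rng.lognormal(0.0, 0.5, T)
delta_global = a[:, None] * eps_global[None, :]   # rank-1 structure in time
eps_per_sp = rng.lognormal(0.0, 0.5, (n, T))
delta_per_sp = a[:, None] * eps_per_sp            # per-SP multipliers

for _ in range(3):
    assign = rng.integers(0, m, size=n)
    print(ratio(delta_global, assign),    # constant across assignments
          ratio(delta_per_sp, assign))    # varies with the assignment
```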

6.2 Scale tuner preserves collapse

A realistic way to capture heterogeneous sensitivity while preserving the rank-1 structure is a scale tuner:

\[ \delta_j(t)=a_j\,\tau_j\,\varepsilon_t \;=\; \tilde a_j\,\varepsilon_t,\qquad \tau_j>0. \]

Then \( \Delta_i(t) = (\sum_j \tilde a_j x_{ij})\,\varepsilon_t \) and the same factorization applies; \(L_2\) and \(L_1\) retain the same minimizer.

6.3 Realism: why this matches load balancing

In practice, many SPs respond to a common exogenous driver (weather, system-wide demand), which is well approximated by a shared multiplicative shock, and the lognormal law reproduces the asymmetry of §1: most SPs sit slightly below prediction while a fat-tailed minority spikes positive. The scale tuner \(\tau_j\) absorbs heterogeneous sensitivity to that common driver without breaking the rank-1 structure the collapse requires.

7. Interactive 3D Demo (Latent a–p–d Space)

We provide an interactive visualization that animates the redistribution process on a single realized sample path. Service points (SPs) appear as glowing spheres labeled with their per–SP statistics—actual a, prediction p, and deviation d—and are grouped inside semi-opaque circuit spheres. The animation proceeds in two phases: the first half shows the original distribution; at the midpoint, points smoothly transition to their optimized circuits (with a progress bar indicating time). Circuit spheres fade out and back in to highlight the redistribution event, and the overlay displays per-circuit risks and aggregate objective values.

7.1 Mapping and Visual Design

Let circuits be \(\mathcal{C}=\{C_1,\dots,C_m\}\) and SPs \(\mathcal{P}=\{P_1,\dots,P_n\}\) with assignments \(x_{ij}\in\{0,1\}\), \(\sum_i x_{ij}=1\). For each SP \(P_j\), the demo displays the pair \((a_j, p_j)\) and the residual \(d_j\) (sign convention consistent with the paper), and places a labeled sphere for \(P_j\) inside the sphere of its assigned circuit.

7.2 Objective and Per-Circuit Risk

The demo’s text panel summarizes the risk the operator seeks to minimize. For a realized trajectory \(\{\delta_j(t)\}_{t=1}^T\), circuit-level RMS risk and the aggregate objective are

\[ R_i(x)\;=\;\sqrt{\frac{1}{T}\sum_{t=1}^{T}\Big(\sum_{j=1}^{n}x_{ij}\,\delta_j(t)\Big)^2}, \qquad J(x)\;=\;\sum_{i=1}^m R_i(x). \]

The overlay reports per-circuit \(R_i\) and the aggregate \(J\) before and after redistribution, mirroring the paper’s redistribution objective (RMS risk) and its LP surrogate discussion.

7.3 Animation Procedure

  1. Initialization (original assignment). Points are positioned within their circuits and labeled with a, p, d and an SP identifier.
  2. Midpoint transition. At \(t=T/2\), each point interpolates from its “before” circuit center to its “after” circuit center over a short window, while circuit spheres dissolve and reappear to emphasize the segmentation change (a minimal interpolation sketch follows this list).
  3. Optimized phase. Points remain stationary in their new circuits; the overlay updates to show per-circuit \(R_i\) and aggregate \(J\) for the redistributed configuration. The timeline then resets and loops.
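A minimal sketch of the midpoint interpolation in step 2, assuming smoothstep easing and the names below; this is an illustration, not the demo's actual implementation.

```python
import numpy as np

def smoothstep(u):
    u = np.clip(u, 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)             # eases in and out of the move

def sp_positions(before, after, frame, total_frames, window=0.1):
    """before/after: (n, 3) circuit-center anchors for each SP sphere."""
    tau = frame / total_frames                  # normalized timeline in [0, 1]
    u = smoothstep((tau - 0.5) / window)        # 0 before t = T/2, 1 after it
    return (1.0 - u) * before + u * after       # interpolated (n, 3) positions
```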

7.4 Relation to the Model (and Deviations)

7.5 Interpretation

Visually, the operator’s action is to redistribute which deviations sum together, reshaping each circuit’s aggregate process \(\Delta_i(t)=\sum_j x_{ij}\delta_j(t)\) and, consequently, its risk \(R_i\). A successful redistribution brings the post-transition points into a configuration whose reported \(J\) is lower than before, indicating reduced circuit stress on the realized path (RMS) or reduced absolute swings under the surrogate (L1).

7.6 Practical Notes

Key message from the paper

We show a structural reduction from a hard nonlinear optimization problem to an exact linear program: the exponential discrete search over assignments becomes a polynomial-time LP and, after the lognormal collapse, a small LP whose size no longer grows with \(T\), with identical minimizers under the global lognormal setup.

8. Conclusion (stochastic takeaways)

The stochastic framing reveals a structural shortcut between discrete and convex formulations. Under lognormal overshoot probabilities and exogenous SP dynamics, the L1 redistribution problem admits a convex, integral LP solution on every realization, showing that stochasticity does not destroy tractability. In realistic grids, where RMS, capacities, or topology enter, MILP/MIQP and piecewise-linear constructions preserve the stochastic framing while respecting physics and operations.

Future directions. (i) variance bounds under lognormal asymmetry; (ii) capacity-aware MILP with CVaR-based stochastic regularization; (iii) topology-aware mixed-integer OPF that couples redistribution to volt/VAR limits.