DBSCAN’s parameter minPts is a discrete knob, but the phenomenon it controls is not: as the required local support increases, low-density bridges and fringes recede, while dense cores persist. This article moves that intuition into a continuous setting by treating clustering as a function of a density threshold \(\lambda\), where clusters are the connected components of the superlevel set \(L_\lambda = \{x : p(x)\ge \lambda\}\) of a smooth density \(p\). We make two concrete contributions. First, we derive an explicit calculus for how both the mass and the centroid of a surviving component change with \(\lambda\); the formulas reveal that raising \(\lambda\) peels an infinitesimal boundary shell whose “lever arm” drives centroid motion. Second, we prove a sharp high-threshold limit: as \(\lambda\) approaches the value of a strict mode, the component contracts to that mode and its centroid converges to the modal location; across multiple modes, the total inter-center separation converges to the sum of the pairwise distances between the modes. Throughout, the emphasis is not on rebranding known tools, but on packaging them into a clean, operational picture: a density sweep has a geometry, and that geometry admits derivatives.
When you turn the DBSCAN dial, you can feel what it is trying to do. With \(\varepsilon\) fixed, increasing minPts does not “optimize an objective” so much as it demands a stricter local witness for density. Thin bridges snap. Stray filaments vanish. The surviving regions tighten around their cores. None of this wants to be discrete; the discreteness comes from counting.
The continuous analogue is to replace “neighbor count exceeds a threshold” with the statement “density exceeds a threshold.” One then studies a one-parameter family of sets \[ L_\lambda := \{x\in\mathbb{R}^3 : p(x)\ge \lambda\}, \] and declares clusters to be connected components of \(L_\lambda\). This is the classical level-set picture: \(\lambda\) is a continuous density knob, and raising \(\lambda\) literally erodes the superlevel sets from the outside in.
Our goal is modest and very hands-on. We want formulas you can hold in your head while thinking about what a density sweep is doing. The first theorem gives a derivative law for component mass and component centroid as functions of \(\lambda\). The second theorem shows what happens near the top of a peak: the set collapses to the mode, and the centroid has nowhere else to go.
Let \(p:\mathbb{R}^3\to\mathbb{R}\) be a smooth density (or a smooth surrogate, such as a kernel density estimate). For \(\lambda\in\mathbb{R}\), define the superlevel set \(L_\lambda\) as above. For values of \(\lambda\) where \(L_\lambda\) has multiple connected components, write them as \(A_1(\lambda),\dots,A_{r(\lambda)}(\lambda)\).
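To make the definition concrete, here is a minimal sketch of extracting the components of \(L_\lambda\) on a coarse grid. Everything specific in it is an assumption made for illustration: the two-bump surrogate density `p` (unnormalized), the grid bounds and spacing, and the use of 6-neighbor adjacency as a discrete stand-in for connectedness.

```python
import math
from collections import deque

# Hypothetical two-bump surrogate density (unnormalized); the centers and
# width are illustrative choices, not values taken from the article.
CENTERS = [(-1.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
SIGMA = 0.7

def p(x):
    """Sum of two isotropic Gaussian bumps, each with peak height 1."""
    return sum(math.exp(-math.dist(x, c) ** 2 / (2 * SIGMA ** 2)) for c in CENTERS)

def superlevel_components(lam, lo=-3.0, step=0.5, n=13):
    """Group grid points with p >= lam into 6-connected components.

    Returns a list of components, each a list of (x1, x2, x3) grid points.
    """
    def coord(idx):
        return tuple(lo + step * v for v in idx)

    # Grid indices that belong to the (discretized) superlevel set.
    inside = {
        (i, j, k)
        for i in range(n) for j in range(n) for k in range(n)
        if p(coord((i, j, k))) >= lam
    }
    neighbors = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    components, seen = [], set()
    for start in inside:
        if start in seen:
            continue
        # Breadth-first flood fill from an unvisited point.
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            i, j, k = queue.popleft()
            comp.append(coord((i, j, k)))
            for di, dj, dk in neighbors:
                nb = (i + di, j + dj, k + dk)
                if nb in inside and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components

if __name__ == "__main__":
    for lam in (0.15, 0.5):
        print(lam, len(superlevel_components(lam)))
```

For this particular surrogate, a low threshold keeps the bridge between the two bumps, while a higher one removes it and leaves one component per bump. The grid resolution controls how faithfully the discrete components track the true ones.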
For a component \(A(\lambda)\), we will track its geometric (uniform) centroid \[ \mu(\lambda) := \frac{1}{\mathrm{Vol}(A(\lambda))}\int_{A(\lambda)} x\,dx, \] where \(dx\) is Lebesgue measure on \(\mathbb{R}^3\). (One can also track a probability-weighted centroid \(\int_{A(\lambda)} x\,p(x)\,dx \big/ \int_{A(\lambda)} p(x)\,dx\); the uniform choice is the cleanest lens for “erosion of a set.”)
When there are \(r\) well-defined components \(A_1(\lambda),\dots,A_r(\lambda)\) across a range of \(\lambda\), we will also use the global separation functional \[ S(\lambda):=\sum_{1\le i<j\le r}\|\mu_i(\lambda)-\mu_j(\lambda)\|, \] a crude but surprisingly informative way to “listen” to what a density sweep is doing to cluster geometry.
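As a small sketch of how these two quantities might be computed once each component is represented by a finite set of equally weighted sample points (for instance voxel centers), consider the following. The function names `uniform_centroid` and `separation` are hypothetical, and the point-cloud representation is an assumption rather than part of the model above.

```python
import math
from itertools import combinations

def uniform_centroid(points):
    """Mean of equally weighted points: a discrete stand-in for mu(lambda)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def separation(centroids):
    """S = sum over pairs i < j of the Euclidean distance between centroids."""
    return sum(math.dist(a, b) for a, b in combinations(centroids, 2))

if __name__ == "__main__":
    # Three illustrative centroids; pairwise distances are 5, 12 and 13.
    print(separation([(0, 0, 0), (3, 4, 0), (0, 0, 12)]))
```

With a single component there are no pairs, so `separation` returns 0; the functional only carries information once at least two components exist.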
The formulas look technical until you read them as a story. Write \(M(\lambda):=\mathrm{Vol}(A(\lambda))\) for the mass of the component and \(N(\lambda):=\int_{A(\lambda)}x\,dx\) for its first moment, so that \(\mu=N/M\). \(M'(\lambda)\) is negative because raising \(\lambda\) can only remove points. The factor \(1/\|\nabla p\|\) in the boundary integrals is the key: if the density climbs steeply across the boundary, then a small increase in \(\lambda\) removes a geometrically thin layer; if the density is flat, the same \(d\lambda\) corresponds to a thicker bite out of the set.
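A one-line way to see where the factor \(1/\|\nabla p\|\) comes from, assuming \(\nabla p\neq 0\) on the boundary (the symbols \(h\) for the local shell thickness and \(n\) for the unit normal are introduced here only for illustration): since \(p\) increases into the set, \(n=\nabla p/\|\nabla p\|\) points inward, and moving a distance \(h\) along it from a boundary point \(x\) gives \[ p(x+h\,n)=\lambda+h\,\|\nabla p(x)\|+O(h^2). \] Raising the threshold by \(d\lambda\) therefore moves the boundary inward by \(h\approx d\lambda/\|\nabla p(x)\|\), and the volume removed is approximately \[ \int_{\partial A(\lambda)}\frac{d\lambda}{\|\nabla p(x)\|}\,dS(x). \]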
The centroid derivative is the most useful line. It says: the centroid recoils from the shell that is being removed. Each boundary point contributes its lever arm \(x-\mu(\lambda)\), reweighted by \(1/\|\nabla p(x)\|\), and the minus sign means the centroid moves away from the side where that weighted shell is heaviest. Boundary points far from the centroid have more torque; flat regions of the density make the torque stronger. This is the continuous analogue of the discrete “peeling identity” you can derive when a cluster loses fringe points.
Proof. Let \(f:\mathbb{R}^3\to\mathbb{R}^k\) be a smooth test function (we will use \(f\equiv 1\) and \(f(x)=x\)). Define \(F_f(\lambda):=\int_{A(\lambda)} f(x)\,dx\). For \(\lambda\) in the persistence interval, one may represent the component integral using the indicator of the superlevel set: \[ F_f(\lambda)=\int f(x)\,\mathbf{1}\{p(x)\ge \lambda\}\,dx, \] where the integral is restricted to the chosen component (this can be formalized by integrating over a fixed open set that contains this component and no other for all \(\lambda\) in the interval). Differentiating with respect to \(\lambda\) gives, in the distributional sense, \[ \frac{d}{d\lambda}\mathbf{1}\{p(x)\ge \lambda\}=-\delta(p(x)-\lambda), \] so \[ F_f'(\lambda)=-\int f(x)\,\delta(p(x)-\lambda)\,dx. \] Now apply the surface delta (coarea) identity, which is valid when \(\nabla p\) does not vanish on the level set \(\{p=\lambda\}\): \[ \int f(x)\,\delta(p(x)-\lambda)\,dx =\int_{p(x)=\lambda}\frac{f(x)}{\|\nabla p(x)\|}\,dS(x). \] Restricting to the boundary of the chosen component yields \[ F_f'(\lambda)=-\int_{\partial A(\lambda)}\frac{f(x)}{\|\nabla p(x)\|}\,dS(x). \] Taking \(f\equiv 1\) gives the formula for \(M'(\lambda)\). Taking \(f(x)=x\) gives the formula for \(N'(\lambda)\). Finally, since \(\mu=N/M\), the quotient rule gives \[ \mu' = \frac{N'}{M} - \mu\,\frac{M'}{M} = -\frac{1}{M}\int_{\partial A}\frac{x}{\|\nabla p\|}\,dS +\frac{\mu}{M}\int_{\partial A}\frac{1}{\|\nabla p\|}\,dS = -\frac{1}{M}\int_{\partial A}\frac{x-\mu}{\|\nabla p\|}\,dS. \] \(\square\)
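The boundary-integral formula for the mass derivative can be sanity-checked in a case where everything is available in closed form. The sketch below assumes the unnormalized isotropic bump \(p(x)=\exp(-\|x\|^2/2)\), which is an illustrative choice and not a density used elsewhere in the article. Its superlevel set for \(0<\lambda<1\) is a ball of radius \(R(\lambda)=\sqrt{-2\ln\lambda}\), and on the boundary sphere \(\|\nabla p\|=R\lambda\).

```python
import math

def radius(lam):
    """Radius of {x : exp(-|x|^2 / 2) >= lam}, for 0 < lam < 1."""
    return math.sqrt(-2.0 * math.log(lam))

def mass(lam):
    """Volume of the superlevel ball."""
    return 4.0 / 3.0 * math.pi * radius(lam) ** 3

def mass_derivative_boundary(lam):
    """-(surface area) / ||grad p||, since ||grad p|| is constant on the sphere."""
    r = radius(lam)
    area = 4.0 * math.pi * r ** 2
    grad_norm = r * lam  # ||grad p(x)|| = ||x|| * p(x) = r * lam on the boundary
    return -area / grad_norm

def mass_derivative_fd(lam, h=1e-6):
    """Central finite difference of the closed-form mass."""
    return (mass(lam + h) - mass(lam - h)) / (2.0 * h)

if __name__ == "__main__":
    for lam in (0.2, 0.5, 0.8):
        print(lam, mass_derivative_boundary(lam), mass_derivative_fd(lam))
```

In this symmetric example the lever arms \(x-\mu\) cancel over the sphere, so the centroid formula gives \(\mu'(\lambda)=0\), consistent with the centroid staying at the origin.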
This theorem is the continuous version of a familiar empirical feeling: crank the density threshold high enough and each cluster becomes “just the top of a peak.” Once you are in that regime, centers stop being negotiable. They are pinned to the geometry of the modes.
Proof. Because \(x^\star\) is a strict local maximum and the Hessian \(H=\nabla^2 p(x^\star)\) is negative definite, a second-order Taylor expansion at \(x^\star\) shows that there exist \(c>0\) and a neighborhood \(U\) of \(x^\star\) such that for all \(x\in U\), \[ p(x) \le p(x^\star) - c\,\|x-x^\star\|^2. \] Now fix \(\lambda<p(x^\star)\) sufficiently close to \(p(x^\star)\) so that the component \(A(\lambda)\) lies in \(U\). If \(x\in A(\lambda)\), then \(p(x)\ge \lambda\), hence \[ p(x^\star) - c\,\|x-x^\star\|^2 \ge p(x) \ge \lambda, \] so \[ \|x-x^\star\| \le \sqrt{\frac{p(x^\star)-\lambda}{c}}. \] Thus \(A(\lambda)\subseteq B\!\left(x^\star,\sqrt{\frac{p(x^\star)-\lambda}{c}}\right)\), and the radius of this ball shrinks to zero as \(\lambda\uparrow p(x^\star)\). Therefore \(\mathrm{diam}(A(\lambda))\to 0\). Since the centroid \(\mu(\lambda)\) lies in the convex hull of \(A(\lambda)\) and the ball is convex, the centroid lies in the same shrinking ball, hence \(\mu(\lambda)\to x^\star\). For the multi-mode statement, apply the same argument to each component \(A_i(\lambda)\) around \(x_i^\star\) in the range of \(\lambda\) where the components are separated. Then \(\mu_i(\lambda)\to x_i^\star\) for each \(i\). By continuity of the norm, \(\|\mu_i(\lambda)-\mu_j(\lambda)\|\to\|x_i^\star-x_j^\star\|\), and summing over \(i<j\) yields the stated limit for \(S(\lambda)\). \(\square\)
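The collapse of the centroid onto the mode can be watched numerically in an example where the centroid is genuinely off-center at low thresholds. The sketch below assumes the hypothetical unnormalized density \(p(x)=\exp\!\big(-(g(x_1)+x_2^2+x_3^2)\big)\) with \(g(s)=e^s-1-s\), chosen only because it is smooth, asymmetric in \(x_1\), and has a strict mode at the origin with \(p(0)=1\). The slice of the superlevel set at \(x_1=s\) is a disc of area \(\pi(t-g(s))\) with \(t=-\ln\lambda\), so the \(x_1\)-coordinate of the uniform centroid reduces to a one-dimensional integral.

```python
import math

def g(s):
    """Asymmetric convex profile with g(0) = 0, g'(0) = 0, g''(0) = 1."""
    return math.expm1(s) - s

def endpoint(t, direction):
    """Solve g(s) = t on one side of 0 (direction = -1.0 or +1.0) by bisection."""
    lo, hi = 0.0, direction
    while g(hi) < t:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def centroid_x1(lam, n=4000):
    """x1-coordinate of the uniform centroid of {p >= lam}, for 0 < lam < 1."""
    t = -math.log(lam)
    a, b = endpoint(t, -1.0), endpoint(t, 1.0)
    h = (b - a) / n
    num = den = 0.0
    for i in range(n):
        s = a + (i + 0.5) * h  # midpoint rule
        w = t - g(s)           # cross-section area divided by pi
        num += s * w
        den += w
    return num / den

if __name__ == "__main__":
    for lam in (0.5, 0.9, 0.99, 0.999):
        print(lam, centroid_x1(lam))
```

Because \(g\) grows more slowly for negative \(s\), the set extends further in that direction and the centroid sits at negative \(x_1\); as \(\lambda\uparrow 1\) the offset shrinks toward zero, as the theorem requires.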
DBSCAN lives in finite samples, with \(\varepsilon\)-neighborhood counts and a discrete \(\text{minPts}\) gate. The level-set model is not a claim that DBSCAN literally computes \(L_\lambda\). It is a way to name the underlying geometric act: threshold a notion of density; take connected components; watch what survives as the threshold rises.
Theorem 1 tells you what “peeling” really means in a smooth model: it is not mysterious; it is a boundary integral. Theorem 2 tells you where the story ends: the sweep ultimately resolves into modes. Between these extremes is where the interesting behavior lives: components splitting at saddle densities, centers drifting as bridges erode, and global observables like \(S(\lambda)\) showing plateaus and cliff-edges. Those middle regimes are exactly where DBSCAN users instinctively tune parameters by eye; the continuous picture explains what the eye is seeing.
Generated 2026-02-15.