A Continuous Sensitivity Theory for Density-Based Clustering Scale

Jonathan R. Landers

A defining parameter of density-based clustering is the interaction scale $\varepsilon$, intuitively a kind of radius. Empirically one observes a monotone qualitative behavior: small radii produce many clusters, large radii produce few. Yet the mathematical object that the algorithm returns—the number of clusters—is a discontinuous function of $\varepsilon$. The function

$$C(\varepsilon)$$

is piecewise constant, changing only when a pairwise distance or a density threshold is crossed. Consequently its classical derivative vanishes almost everywhere and carries no information about how clustering geometry evolves as the scale changes.

This mismatch reflects a deeper fact. What varies continuously is not the discrete partition, but the strength of connectivity in the dataset. The combinatorial clustering produced by density-based clustering is a thresholded observable of an underlying continuous geometric process. The purpose of this note is to construct a differentiable quantity that behaves as a continuous analogue of cluster count and to compute its sensitivity with respect to the scale. The result is a closed-form derivative and an explicit bound revealing a universal scaling law: the sensitivity of clustering to scale is governed by an $\varepsilon^{-2}$ factor.
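To make the step-like behavior of the hard-threshold count concrete, here is a minimal sketch (assuming NumPy; the dataset, function name, and union-find helper are illustrative, not part of the construction above) that counts connected components of the $\varepsilon$-neighborhood graph for a toy one-dimensional dataset:

```python
import numpy as np

def hard_cluster_count(X, eps):
    """Number of connected components of the eps-neighborhood graph,
    via a simple union-find over all pairs with ||x_i - x_j|| <= eps."""
    n = len(X)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(X[i] - X[j]) <= eps:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})

# Two well-separated pairs of points on a line: C(eps) jumps 4 -> 2 -> 1
# as eps crosses the pairwise distances, and is constant in between.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
print([hard_cluster_count(X, e) for e in (0.5, 1.5, 12.0)])  # [4, 2, 1]
```

Between the jump points, perturbing $\varepsilon$ changes nothing, which is exactly the vanishing classical derivative described above.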

Continuous relaxation of neighborhoods

Let $X=\{x_1,\dots,x_n\}\subset \mathbb{R}^d$ be a finite dataset with distances $d_{ij}=\|x_i-x_j\|$. Replace the hard-threshold neighborhood relation $d_{ij}\le\varepsilon$ by a smooth interaction weight. Let $\psi:[0,\infty)\to(0,1]$ be continuously differentiable and strictly decreasing, with $\psi(0)=1$ and $\psi(r)\to 0$ as $r\to\infty$. Define the weighted adjacency matrix

$$A_\varepsilon(i,j)= \begin{cases} \psi(d_{ij}/\varepsilon), & i\neq j,\\[4pt] 0, & i=j. \end{cases}$$

Define the degree matrix by

$$(D_\varepsilon)_{ii}=\sum_{j=1}^n A_\varepsilon(i,j),$$

and the graph Laplacian by

$$L_\varepsilon=D_\varepsilon-A_\varepsilon.$$

As $\varepsilon$ varies, $L_\varepsilon$ defines a one-parameter family of weighted graphs interpolating between weakly connected and strongly connected regimes. Thus $\varepsilon$ continuously controls connectivity strength rather than combinatorial connectivity.
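The construction can be sketched in a few lines (assuming NumPy; the Gaussian $\psi(r)=e^{-r^2}$ is one admissible kernel satisfying the hypotheses, chosen for illustration rather than canonically):

```python
import numpy as np

def smooth_laplacian(X, eps, psi=lambda r: np.exp(-r**2)):
    """Build L_eps = D_eps - A_eps for a smooth interaction psi.
    The Gaussian psi is C^1, strictly decreasing, psi(0) = 1,
    and psi(r) -> 0 as r -> infinity, as required."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # d_ij
    A = psi(D / eps)
    np.fill_diagonal(A, 0.0)           # A_eps(i, i) = 0 by definition
    return np.diag(A.sum(axis=1)) - A  # degree matrix minus adjacency

X = np.random.default_rng(0).normal(size=(6, 3))
L = smooth_laplacian(X, eps=1.0)
print(np.allclose(L.sum(axis=1), 0.0))       # rows sum to zero
print(np.linalg.eigvalsh(L).min() > -1e-10)  # positive semidefinite
```

The two printed checks confirm the defining Laplacian properties: each row sums to zero, and all eigenvalues are nonnegative up to rounding.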

A differentiable surrogate for cluster count

The number of connected components of a graph equals the multiplicity of the zero eigenvalue of its Laplacian. This integer quantity is unstable under small perturbations of edge weights. A stable analogue is obtained through diffusion.

For $t>0$, define the effective cluster count

$$K_t(\varepsilon)=\mathrm{tr}\!\big(e^{-tL_\varepsilon}\big).$$

Let the eigenvalues of $L_\varepsilon$ be $0=\lambda_1(\varepsilon)\le \cdots \le \lambda_n(\varepsilon)$. Then

$$K_t(\varepsilon)=\sum_{m=1}^n e^{-t\lambda_m(\varepsilon)}.$$

For large $t$, all positive eigenvalues are exponentially suppressed, so $K_t(\varepsilon)$ concentrates on the near-zero spectrum and behaves like a smooth proxy for the number of connected components.
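This concentration is easy to observe numerically. The sketch below (assuming NumPy, with the Gaussian kernel as an illustrative $\psi$) evaluates $K_t(\varepsilon)$ from the eigenvalues of the symmetric Laplacian; for two tight, well-separated pairs of points and large $t$, the result is close to 2:

```python
import numpy as np

def effective_count(X, eps, t, psi=lambda r: np.exp(-r**2)):
    """K_t(eps) = tr(exp(-t L_eps)) = sum_m exp(-t lambda_m),
    computed from the spectrum of the weighted Laplacian."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = psi(D / eps)
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A
    lam = np.linalg.eigvalsh(L)
    return np.exp(-t * lam).sum()

# Two tight, well-separated pairs: the trace concentrates on the
# near-zero spectrum, and K_t approaches the visual cluster count, 2.
X = np.array([[0.0], [0.1], [10.0], [10.1]])
print(effective_count(X, eps=0.5, t=50.0))
```

The within-pair weights are close to 1 while the cross-pair weights are astronomically small, so two eigenvalues sit near zero and the other two are exponentially suppressed.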

Differentiability and derivative formula

Theorem. Assume $\psi\in C^1$. Then $K_t(\varepsilon)$ is differentiable for all $\varepsilon>0$ and satisfies

$$\frac{d}{d\varepsilon}K_t(\varepsilon) = -t\,\mathrm{tr}\!\left(e^{-tL_\varepsilon}L_\varepsilon'\right),$$

where $L_\varepsilon'=\frac{d}{d\varepsilon}L_\varepsilon$. Moreover, for $i\neq j$,

$$A_\varepsilon'(i,j) = \psi'\!\left(\frac{d_{ij}}{\varepsilon}\right)\left(-\frac{d_{ij}}{\varepsilon^2}\right),$$

and

$$L_\varepsilon' = D_\varepsilon' - A_\varepsilon', \qquad (D_\varepsilon')_{ii}=\sum_{j=1}^n A_\varepsilon'(i,j).$$

Proof. The Laplacian depends smoothly on $\varepsilon$ through the entries $A_\varepsilon(i,j)$, hence $L_\varepsilon$ is differentiable as a matrix-valued function. We use the Fréchet derivative of the matrix exponential: for differentiable $M(\varepsilon)$,

$$\frac{d}{d\varepsilon}e^{M(\varepsilon)} = \int_0^1 e^{sM(\varepsilon)}M'(\varepsilon)e^{(1-s)M(\varepsilon)}\,ds.$$

Apply this with $M(\varepsilon)=-tL_\varepsilon$ to obtain

$$\frac{d}{d\varepsilon}e^{-tL_\varepsilon} = -t\int_0^1 e^{-stL_\varepsilon}L_\varepsilon' e^{-(1-s)tL_\varepsilon}\,ds.$$

Taking the trace and using cyclic invariance of the trace yields

$$\frac{d}{d\varepsilon}\mathrm{tr}(e^{-tL_\varepsilon}) = -t\int_0^1 \mathrm{tr}\!\left(e^{-tL_\varepsilon}L_\varepsilon'\right)\,ds = -t\,\mathrm{tr}\!\left(e^{-tL_\varepsilon}L_\varepsilon'\right),$$

hence the claimed derivative formula for $K_t(\varepsilon)$. Finally, for $i\neq j$,

$$A_\varepsilon(i,j)=\psi(d_{ij}/\varepsilon) \quad\Rightarrow\quad A_\varepsilon'(i,j)=\psi'(d_{ij}/\varepsilon)\left(-\frac{d_{ij}}{\varepsilon^2}\right),$$

and since $L_\varepsilon=D_\varepsilon-A_\varepsilon$, the expression for $L_\varepsilon'$ follows. $\square$
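The closed-form derivative can be checked against a central finite difference. In the sketch below (assuming NumPy; the Gaussian kernel $\psi(r)=e^{-r^2}$, with $\psi'(r)=-2re^{-r^2}$, and all function names are illustrative), $e^{-tL_\varepsilon}$ is formed via the eigendecomposition of the symmetric Laplacian:

```python
import numpy as np

def laplacian_and_derivative(X, eps):
    """L_eps and L'_eps for psi(r) = exp(-r^2), so that
    A'_eps(i,j) = psi'(d_ij/eps) * (-d_ij/eps^2) by the chain rule."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    R = D / eps
    A = np.exp(-R**2)
    dA = (-2.0 * R * np.exp(-R**2)) * (-D / eps**2)
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(dA, 0.0)
    L = np.diag(A.sum(axis=1)) - A
    dL = np.diag(dA.sum(axis=1)) - dA
    return L, dL

def K_and_dK(X, eps, t):
    """K_t(eps) and the closed-form derivative -t tr(exp(-t L) L')."""
    L, dL = laplacian_and_derivative(X, eps)
    lam, V = np.linalg.eigh(L)            # L is symmetric
    E = (V * np.exp(-t * lam)) @ V.T      # exp(-t L)
    return np.exp(-t * lam).sum(), -t * np.trace(E @ dL)

rng = np.random.default_rng(0)
X, eps, t, h = rng.normal(size=(8, 2)), 1.0, 2.0, 1e-6
_, dK = K_and_dK(X, eps, t)
fd = (K_and_dK(X, eps + h, t)[0] - K_and_dK(X, eps - h, t)[0]) / (2 * h)
print(abs(dK - fd))  # small: the identity matches the finite difference
```

The agreement to finite-difference accuracy is a direct numerical confirmation of the theorem.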

A universal sensitivity bound via Frobenius norms

The derivative identity is exact; to see scaling, we bound it using only Frobenius norms and elementary inequalities.

Proposition. If $|\psi'(r)|\le M$ for all $r\ge 0$, then for all $\varepsilon>0$,

$$\left|K_t'(\varepsilon)\right| \le \frac{t M n}{\varepsilon^2}\,\sqrt{2\sum_{1\le i<j\le n} d_{ij}^2}.$$

Proof. The Cauchy–Schwarz inequality for the Frobenius inner product, $|\mathrm{tr}(AB)|\le \|A\|_F\|B\|_F$, gives

$$|K_t'(\varepsilon)| = t\left|\mathrm{tr}\!\left(e^{-tL_\varepsilon}L_\varepsilon'\right)\right| \le t\,\left\|e^{-tL_\varepsilon}\right\|_F\,\left\|L_\varepsilon'\right\|_F.$$

Since $L_\varepsilon$ is a graph Laplacian with nonnegative edge weights, it is positive semidefinite, so its eigenvalues satisfy $\lambda_m\ge 0$, hence

$$\left\|e^{-tL_\varepsilon}\right\|_F^2 = \mathrm{tr}\!\left(e^{-2tL_\varepsilon}\right) = \sum_{m=1}^n e^{-2t\lambda_m} \le \sum_{m=1}^n 1 = n,$$

so

$$\left\|e^{-tL_\varepsilon}\right\|_F\le \sqrt{n}.$$

Next, for $i\neq j$ we have

$$|A_\varepsilon'(i,j)| = \left|\psi'\!\left(\frac{d_{ij}}{\varepsilon}\right)\right|\frac{d_{ij}}{\varepsilon^2} \le \frac{M}{\varepsilon^2}d_{ij}.$$

Because $(L_\varepsilon')_{ij}=-A_\varepsilon'(i,j)$ for $i\neq j$, the off-diagonal contribution to $\|L_\varepsilon'\|_F^2$ is bounded by

$$\sum_{i\neq j} (L_\varepsilon')_{ij}^2 = \sum_{i\neq j} (A_\varepsilon'(i,j))^2 \le \frac{M^2}{\varepsilon^4}\sum_{i\neq j} d_{ij}^2 = \frac{2M^2}{\varepsilon^4}\sum_{1\le i<j\le n} d_{ij}^2.$$

For the diagonal, note that

$$(L_\varepsilon')_{ii} = (D_\varepsilon')_{ii} = \sum_{j\neq i} A_\varepsilon'(i,j),$$

so by Cauchy–Schwarz,

$$(L_\varepsilon')_{ii}^2 \le (n-1)\sum_{j\neq i} (A_\varepsilon'(i,j))^2.$$

Summing over $i$ and using symmetry $d_{ij}=d_{ji}$ gives

$$\sum_{i=1}^n (L_\varepsilon')_{ii}^2 \le (n-1)\sum_{i=1}^n\sum_{j\neq i} (A_\varepsilon'(i,j))^2 \le (n-1)\cdot \frac{2M^2}{\varepsilon^4}\sum_{1\le i<j\le n} d_{ij}^2.$$

Combining diagonal and off-diagonal pieces yields

$$\|L_\varepsilon'\|_F^2 = \sum_{i=1}^n (L_\varepsilon')_{ii}^2 + \sum_{i\neq j} (L_\varepsilon')_{ij}^2 \le \frac{2nM^2}{\varepsilon^4}\sum_{1\le i<j\le n} d_{ij}^2,$$

so

$$\|L_\varepsilon'\|_F \le \frac{M}{\varepsilon^2}\sqrt{2n\sum_{1\le i<j\le n} d_{ij}^2}.$$

Putting everything together,

$$|K_t'(\varepsilon)| \le t\,\sqrt{n}\cdot \frac{M}{\varepsilon^2}\sqrt{2n\sum_{1\le i<j\le n} d_{ij}^2} = \frac{t M n}{\varepsilon^2}\,\sqrt{2\sum_{1\le i<j\le n} d_{ij}^2},$$

which is the claim. $\square$
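The proposition can be sanity-checked numerically. In the sketch below (assuming NumPy and the Gaussian kernel; for $\psi(r)=e^{-r^2}$ one may take $M=\sqrt{2}\,e^{-1/2}$, the maximum of $|\psi'(r)|=2re^{-r^2}$, attained at $r=1/\sqrt{2}$), the exact derivative is compared against the explicit bound:

```python
import numpy as np

# Check |K_t'(eps)| <= (t M n / eps^2) * sqrt(2 * sum_{i<j} d_ij^2)
# for psi(r) = exp(-r^2), with M = sqrt(2) * exp(-1/2) >= |psi'(r)|.
rng = np.random.default_rng(1)
X, eps, t = rng.normal(size=(10, 2)), 0.7, 3.0
n = len(X)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
R = D / eps
A = np.exp(-R**2)
dA = 2.0 * R * D / eps**2 * np.exp(-R**2)   # psi'(r) * (-d/eps^2)
np.fill_diagonal(A, 0.0)
np.fill_diagonal(dA, 0.0)
L = np.diag(A.sum(axis=1)) - A
dL = np.diag(dA.sum(axis=1)) - dA
lam, V = np.linalg.eigh(L)
dK = -t * np.trace((V * np.exp(-t * lam)) @ V.T @ dL)  # exact derivative
M = np.sqrt(2.0) * np.exp(-0.5)
bound = t * M * n / eps**2 * np.sqrt(2.0 * (np.triu(D, 1)**2).sum())
print(abs(dK) <= bound)  # True: the bound is loose but valid
```

The bound is far from tight (the Frobenius estimates discard cancellation), but it exhibits exactly the $\varepsilon^{-2}$ scaling claimed.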

Interpretation

The exact derivative formula shows that the effective cluster count changes because diffusion mixing changes as $\varepsilon$ varies. The factor $\varepsilon^{-2}$ is fundamental: it comes directly from differentiating the normalized distance $d_{ij}/\varepsilon$. Thus the geometry is most sensitive to $\varepsilon$ at small scales and stabilizes at larger scales. In regimes where the weighted graph is close to disconnected, large $t$ forces $K_t(\varepsilon)$ to behave like an integer cluster count; hard-threshold discontinuities can be viewed as integer "quantizations" of this smooth diffusion-based connectivity functional.