1. Context
A stationary time series $f(t)$ contains (i) a marginal distribution of values and (ii) a dependence structure that orders those values in time. Collapsing $f$ to its empirical density $P$ destroys temporal information; keeping only the rank ordering destroys scale. A natural question is whether these two aspects can be separated losslessly. Sklar’s theorem (Sklar, 1959) affirms this for multivariate distributions; applied to the finite-dimensional laws of a time series, it provides the sought factorization into a marginal law and a copula (dependence) component.
2. Mathematical Decomposition
Let $P$ denote the (continuous) marginal distribution of $f(t)$ with CDF $F_P$. Define the copula (rank) process
$$ \phi(t) \;=\; F_P\!\big(f(t)\big)\in[0,1]. $$
By the probability integral transform, $\phi(t)$ is marginally $\mathrm{Unif}(0,1)$ for each $t$. The original series is recovered via the quantile map
$$ f(t) \;=\; F_P^{-1}\!\big(\phi(t)\big). $$
Thus, $\big(P,\phi\big)$ is a lossless representation: $P$ encodes value distribution; $\phi$ encodes temporal structure (serial dependence, ordering, regimes).
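To make the round trip concrete, here is a minimal sketch (assuming, purely for illustration, a Gaussian AR(1) series with a fitted Gaussian marginal; the names `f`, `phi`, `f_rec` are ours, and any continuous $F_P$ with a quantile function works the same way):

```python
# Minimal sketch: probability integral transform and lossless reconstruction.
# Illustrative assumptions: Gaussian AR(1) data, Gaussian marginal fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stationary series: Gaussian AR(1) with lag-1 correlation rho.
rho, n = 0.8, 10_000
f = np.empty(n)
f[0] = rng.normal()
for t in range(1, n):
    f[t] = rho * f[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

# Marginal component P, here fitted as N(mu, sigma^2).
mu, sigma = f.mean(), f.std()

# Copula (rank) process: phi(t) = F_P(f(t)), marginally Unif(0,1).
phi = stats.norm.cdf(f, loc=mu, scale=sigma)

# Quantile map recovers the series: f(t) = F_P^{-1}(phi(t)).
f_rec = stats.norm.ppf(phi, loc=mu, scale=sigma)
assert np.allclose(f, f_rec)  # lossless up to floating-point error
```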
3. Likelihood Factorization
For a block $f_{1:n}=(f_1,\dots,f_n)$ with $u_t=F_P(f_t)$, the joint density admits the copula factorization
$$ p(f_{1:n}) \;=\; c(u_{1:n}) \prod_{t=1}^n p_P(f_t), $$
where $p_P$ is the marginal density and $c(\cdot)$ is the copula density governing $(u_1,\dots,u_n)$ (Joe, 1997). The log-likelihood splits accordingly:
$$ \ell(\theta,\psi) \;=\; \sum_{t=1}^n \log p_\theta(f_t) \;+\; \log c_\psi(u_{1:n}). $$
This backbone underlies joint maximum likelihood as well as two-stage schemes such as IFM (Joe & Xu, 1996; Chen & Fan, 2006) and semiparametric rank-based pseudo-likelihood (Genest et al., 1995).
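As a hedged illustration of the split, the sketch below evaluates the two terms for a Gaussian marginal ($\theta=(\mu,\sigma)$) and a first-order (Markov) Gaussian copula ($\psi=\rho$); both model choices are assumptions made for concreteness, not the only options:

```python
# Sketch: the two terms of the split log-likelihood, under illustrative
# assumptions of a Gaussian marginal and a first-order Gaussian copula.
import numpy as np
from scipy import stats

def split_loglik(f, mu, sigma, rho):
    # Marginal term: sum_t log p_theta(f_t).
    marg = stats.norm.logpdf(f, loc=mu, scale=sigma).sum()
    # Uniformize, then map to normal scores z_t = Phi^{-1}(u_t).
    u = stats.norm.cdf(f, loc=mu, scale=sigma)
    z = stats.norm.ppf(u)
    # Gaussian AR(1) copula log-density: joint normal log-density of z
    # minus the sum of standard normal marginal log-densities.
    cop = stats.norm.logpdf(z[0])
    cop += stats.norm.logpdf(z[1:], loc=rho * z[:-1],
                             scale=np.sqrt(1 - rho**2)).sum()
    cop -= stats.norm.logpdf(z).sum()
    return marg, cop  # total log-likelihood = marg + cop
```

As a sanity check, for $\rho = 0$ the copula term vanishes identically, as it should under serial independence.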
4. Information-Theoretic View
Let $h(\cdot)$ denote (differential) entropy. The multi-information (total correlation) of the block is $I(f_{1:n})=\sum_{t=1}^n h(f_t)-h(f_{1:n})$ (Watanabe, 1960). Using the copula factorization one obtains the identity
$$ h(f_{1:n}) \;=\; n\,h(P) \;-\; I(f_{1:n}), $$
where $h(P)=-\mathbb{E}[\log p_P(f)]$ is the marginal entropy. Ma & Sun (2011) showed that copula entropy equals $-I$, confirming that all dependence information is carried by the copula (i.e., by $\phi$).
As $n\to\infty$ (when the limits exist), the entropy rate satisfies $\bar h = h(P) - \bar I$, where $\bar I = \lim_{n\to\infty} I(f_{1:n})/n$ is the per-sample dependence information. This yields a crisp “bit budget”: marginal bits ($n\,h(P)$) versus structural bits ($I(f_{1:n})$).
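To make the bit budget concrete, consider again the Gaussian AR(1) case (an assumption chosen for its closed forms): $h(P)=\tfrac12\log(2\pi e\,\sigma^2)$, $\bar h=\tfrac12\log\!\big(2\pi e\,\sigma^2(1-\rho^2)\big)$, and hence $\bar I=-\tfrac12\log(1-\rho^2)$. A quick numeric check:

```python
# Numeric check of h_bar = h(P) - I_bar for a Gaussian AR(1)
# (illustrative closed forms; entropies in nats).
import numpy as np

rho, sigma2 = 0.8, 1.0
h_P   = 0.5 * np.log(2 * np.pi * np.e * sigma2)                 # marginal entropy
h_bar = 0.5 * np.log(2 * np.pi * np.e * sigma2 * (1 - rho**2))  # entropy rate
I_bar = -0.5 * np.log(1 - rho**2)                               # dependence/sample
assert np.isclose(h_bar, h_P - I_bar)
```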
5. Learning: Joint vs. Two-Stage
Joint MLE. Maximize $\ell(\theta,\psi)$ over $(\theta,\psi)$ simultaneously; this is statistically efficient but couples the marginal and copula parameters through $u_t=F_\theta(f_t)$ (Patton, 2006).
IFM (two-stage). First fit $\theta$ by maximizing $\sum_t \log p_\theta(f_t)$; then fit $\psi$ by maximizing $\log c_\psi(\hat u_{1:n})$ with $\hat u_t = F_{\hat\theta}(f_t)$ (a minimal sketch follows below). IFM is consistent and asymptotically normal, typically with only a minor efficiency loss relative to joint MLE (Joe & Xu, 1996; Chen & Fan, 2006).
Pseudo-likelihood. Replace $F_\theta$ by empirical ranks $\tilde u_t$ to estimate $\psi$ semiparametrically; this is robust to marginal misspecification (Genest et al., 1995).
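A minimal IFM sketch under the same illustrative Gaussian-marginal / Gaussian-copula assumptions as above (the helper name `ifm_fit` is ours):

```python
# Two-stage IFM sketch: Stage 1 fits the marginal by marginal MLE; Stage 2
# fixes the marginal, uniformizes, and fits the copula parameter.
# Gaussian marginal and first-order Gaussian copula are illustrative choices.
import numpy as np
from scipy import stats, optimize

def ifm_fit(f):
    # Stage 1: marginal MLE for theta = (mu, sigma).
    mu, sigma = f.mean(), f.std()
    # Stage 2: u_t = F_theta_hat(f_t), then maximize the copula likelihood.
    u = stats.norm.cdf(f, loc=mu, scale=sigma)
    z = stats.norm.ppf(u)

    def neg_copula_loglik(rho):
        # Conditional Gaussian AR(1) terms on the normal scores;
        # terms not involving rho are dropped.
        return -stats.norm.logpdf(z[1:], loc=rho * z[:-1],
                                  scale=np.sqrt(1 - rho**2)).sum()

    res = optimize.minimize_scalar(neg_copula_loglik,
                                   bounds=(-0.99, 0.99), method="bounded")
    return mu, sigma, res.x  # (theta_hat, psi_hat)
```

For the pseudo-likelihood variant, one would replace `u` by scaled empirical ranks, e.g. `stats.rankdata(f) / (len(f) + 1)`, leaving Stage 2 unchanged.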
6. Machine Learning & PAC Learnability
The decomposition aligns with ML notions of disentanglement: $P$ captures value complexity (content), while $\phi$ captures structural complexity (temporal style). Modular learning trains these two parts independently and composes them via $f=F_P^{-1}\!\circ \phi$.
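The same composition runs in reverse for generation: sample a uniform path from a dependence model and push it through any quantile function. The sketch below pairs a Gaussian AR(1) copula (the “style”) with an exponential marginal (the “content”); both components are arbitrary illustrative choices:

```python
# Sketch of modular generation via f = F_P^{-1} o phi: a copula model supplies
# the temporal ordering, an independently chosen marginal supplies the values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_phi(n, rho):
    # Uniform path with Gaussian AR(1) dependence (the "style" component).
    z = np.empty(n)
    z[0] = rng.normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.normal()
    return stats.norm.cdf(z)

phi = sample_phi(1_000, rho=0.9)
f_synth = stats.expon.ppf(phi)  # exponential values, AR(1)-style ordering
```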
6.1 A PAC-style composition argument
Let $\mathcal H_P$ be a hypothesis class for marginals (e.g., parametric families, flows) and $\mathcal H_\phi$ a class for dependence models on $[0,1]$ (e.g., Markov copulas, sequence models). Consider the composed class
$$ \mathcal H_f \;=\; \big\{\, F_P^{-1}\!\circ h_\phi \;:\; P\in\mathcal H_P,\; h_\phi\in\mathcal H_\phi \,\big\}. $$
Suppose $\mathcal H_P$ and $\mathcal H_\phi$ are PAC-learnable with sample complexities $m_P(\varepsilon,\delta)$ and $m_\phi(\varepsilon,\delta)$ under appropriate, stable losses (e.g., proper scoring rules for densities, such as the log score). Then, under mild Lipschitz/monotonicity conditions on $F_P^{-1}$ and stability of the uniformization/quantile maps, a standard covering-number or Rademacher-complexity composition bound implies
$$ m_f(\varepsilon,\delta) \;\lesssim\; m_P\!\Big(\tfrac{\varepsilon}{2},\tfrac{\delta}{2}\Big) \;+\; m_\phi\!\Big(\tfrac{\varepsilon}{2},\tfrac{\delta}{2}\Big), $$
i.e., the composed class remains PAC-learnable with sample complexity bounded by the sum (up to constants) of those of the parts. This mirrors the information identity $h(f_{1:n}) = n h(P) - I(f_{1:n})$: learnability of the whole reduces to learning the marginal component and the dependence component.
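The key step can be sketched in one line. Writing $(\hat P,\hat h_\phi)$ for the learned pair, $(P^\ast,\phi^\ast)$ for the target, and assuming (as above) that $F_{P^\ast}^{-1}$ is $L$-Lipschitz, the triangle inequality gives

$$ \big\| F_{\hat P}^{-1}\!\circ \hat h_\phi \;-\; F_{P^\ast}^{-1}\!\circ \phi^\ast \big\| \;\le\; \big\| F_{\hat P}^{-1} - F_{P^\ast}^{-1} \big\|_\infty \;+\; L\,\big\| \hat h_\phi - \phi^\ast \big\|, $$

so driving each term below $\varepsilon/2$ with its own sample budget yields the stated bound.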
7. Takeaways
- Decomposition. $f(t)=F_P^{-1}(\phi(t))$ with $\phi(t)=F_P(f(t))$ is lossless and canonical.
- Information split. $h(f_{1:n})=n h(P)-I(f_{1:n})$ quantifies marginal vs. dependence bits.
- Estimation. Joint MLE is asymptotically most efficient; two-stage IFM and rank-based pseudo-likelihood are consistent, modular, and often near-efficient.
- Learnability. Under mild conditions, PAC learnability of $\mathcal H_P$ and $\mathcal H_\phi$ implies PAC learnability of the composed class $\mathcal H_f$.
References
- Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229–231.
- Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.
- Joe, H. & Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models. Tech. Rep. 166, UBC.
- Chen, X. & Fan, Y. (2006). Estimation of copula-based semiparametric time series models. Journal of Econometrics.
- Patton, A. J. (2006). Modelling asymmetric exchange rate dependence. International Economic Review.
- Genest, C., Ghoudi, K., & Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika.
- Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development.
- Ma, J. & Sun, Z. (2011). Mutual information is copula entropy. Tsinghua Science and Technology.
- Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM.