1. Context
A stationary time series $f(t)$ contains (i) a marginal distribution of values and (ii) a dependence structure that orders those values in time. Collapsing $f$ to its empirical density $P$ destroys temporal information; keeping only the rank ordering destroys scale. A natural question is whether these two aspects can be separated losslessly. Sklar’s theorem (Sklar, 1959) affirms this for multivariate distributions; applied to the finite-dimensional laws of a time series, it provides the sought factorization into a marginal law and a copula (dependence) component.
2. Mathematical Decomposition
Let $P$ denote the (continuous) marginal distribution of $f(t)$ with CDF $F_P$. Define the copula (rank) process
$$ \phi(t) \;=\; F_P\!\big(f(t)\big)\in[0,1]. $$
By the probability integral transform, $\phi(t)$ is marginally $\mathrm{Unif}(0,1)$ for each $t$. The original series is recovered via the quantile map
$$ f(t) \;=\; F_P^{-1}\!\big(\phi(t)\big). $$
Thus, $\big(P,\phi\big)$ is a lossless representation: $P$ encodes value distribution; $\phi$ encodes temporal structure (serial dependence, ordering, regimes).
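To make the round trip concrete, here is a minimal sketch (assuming, purely for illustration, a Gaussian AR(1) series with a fitted Gaussian marginal; the names `f`, `phi`, `f_rec` are ours, and any continuous $F_P$ with a quantile function works the same way):

```python
# Minimal sketch: probability integral transform and lossless reconstruction.
# Illustrative assumptions: Gaussian AR(1) data, Gaussian marginal fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stationary series: Gaussian AR(1) with lag-1 correlation rho.
rho, n = 0.8, 10_000
f = np.empty(n)
f[0] = rng.normal()
for t in range(1, n):
    f[t] = rho * f[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

# Marginal component P, here fitted as N(mu, sigma^2).
mu, sigma = f.mean(), f.std()

# Copula (rank) process: phi(t) = F_P(f(t)), marginally Unif(0,1).
phi = stats.norm.cdf(f, loc=mu, scale=sigma)

# Quantile map recovers the series: f(t) = F_P^{-1}(phi(t)).
f_rec = stats.norm.ppf(phi, loc=mu, scale=sigma)
assert np.allclose(f, f_rec)  # lossless up to floating-point error
```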
3. Likelihood Factorization
For a block $f_{1:n}=(f_1,\dots,f_n)$ with $u_t=F_P(f_t)$, the joint density admits the copula factorization
$$ p(f_{1:n}) \;=\; c(u_{1:n}) \prod_{t=1}^n p_P(f_t), $$
where $p_P$ is the marginal density and $c(\cdot)$ is the copula density governing $(u_1,\dots,u_n)$ (Joe, 1997). The log-likelihood splits accordingly:
$$ \ell(\theta,\psi) \;=\; \sum_{t=1}^n \log p_\theta(f_t) \;+\; \log c_\psi(u_{1:n}). $$
This backbone underlies joint maximum likelihood as well as two-stage schemes such as IFM (Joe & Xu, 1996; Chen & Fan, 2006) and semiparametric rank-based pseudo-likelihood (Genest et al., 1995).
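As a hedged illustration of the split, the sketch below evaluates the two terms for a Gaussian marginal ($\theta=(\mu,\sigma)$) and a first-order (Markov) Gaussian copula ($\psi=\rho$); both model choices are assumptions made for concreteness, not the only options:

```python
# Sketch: the two terms of the split log-likelihood, under illustrative
# assumptions of a Gaussian marginal and a first-order Gaussian copula.
import numpy as np
from scipy import stats

def split_loglik(f, mu, sigma, rho):
    # Marginal term: sum_t log p_theta(f_t).
    marg = stats.norm.logpdf(f, loc=mu, scale=sigma).sum()
    # Uniformize, then map to normal scores z_t = Phi^{-1}(u_t).
    u = stats.norm.cdf(f, loc=mu, scale=sigma)
    z = stats.norm.ppf(u)
    # Gaussian AR(1) copula log-density: joint normal log-density of z
    # minus the sum of standard normal marginal log-densities.
    cop = stats.norm.logpdf(z[0])
    cop += stats.norm.logpdf(z[1:], loc=rho * z[:-1],
                             scale=np.sqrt(1 - rho**2)).sum()
    cop -= stats.norm.logpdf(z).sum()
    return marg, cop  # total log-likelihood = marg + cop
```

As a sanity check, for $\rho = 0$ the copula term vanishes identically, as it should under serial independence.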
4. Information-Theoretic View
Let $h(\cdot)$ denote (differential) entropy. The multi-information (total correlation) of the block is $I(f_{1:n})=\sum_{t=1}^n h(f_t)-h(f_{1:n})$ (Watanabe, 1960). Using the copula factorization one obtains the identity
$$ h(f_{1:n}) \;=\; n\,h(P) \;-\; I(f_{1:n}), $$
where $h(P)=-\mathbb{E}[\log p_P(f)]$ is the marginal entropy. Ma & Sun (2011) showed that copula entropy equals $-I$, confirming that all dependence information is carried by the copula (i.e., by $\phi$).
As $n\to\infty$ (when the limits exist), the entropy rate satisfies $\bar h = h(P) - \bar I$, where $\bar I = \lim_{n\to\infty} I(f_{1:n})/n$ is the per-sample dependence information. This yields a crisp “bit budget”: marginal bits ($n\,h(P)$) versus structural bits ($I(f_{1:n})$).
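To make the bit budget concrete, consider again the Gaussian AR(1) case (an assumption chosen for its closed forms): $h(P)=\tfrac12\log(2\pi e\,\sigma^2)$, $\bar h=\tfrac12\log\!\big(2\pi e\,\sigma^2(1-\rho^2)\big)$, and hence $\bar I=-\tfrac12\log(1-\rho^2)$. A quick numeric check:

```python
# Numeric check of h_bar = h(P) - I_bar for a Gaussian AR(1)
# (illustrative closed forms; entropies in nats).
import numpy as np

rho, sigma2 = 0.8, 1.0
h_P   = 0.5 * np.log(2 * np.pi * np.e * sigma2)                 # marginal entropy
h_bar = 0.5 * np.log(2 * np.pi * np.e * sigma2 * (1 - rho**2))  # entropy rate
I_bar = -0.5 * np.log(1 - rho**2)                               # dependence/sample
assert np.isclose(h_bar, h_P - I_bar)
```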
5. Learning: Joint vs. Two-Stage
Joint MLE. Maximize $\ell(\theta,\psi)$ over $(\theta,\psi)$ simultaneously; this is statistically efficient but couples the marginal and copula parameters through $u_t=F_\theta(f_t)$ (Patton, 2006).
IFM (two-stage). First fit $\theta$ by maximizing $\sum_t \log p_\theta(f_t)$; then fit $\psi$ by maximizing $\log c_\psi(\hat u_{1:n})$ with $\hat u_t = F_{\hat\theta}(f_t)$ (a minimal sketch follows below). IFM is consistent and asymptotically normal, typically with only a minor efficiency loss relative to joint MLE (Joe & Xu, 1996; Chen & Fan, 2006).
Pseudo-likelihood. Replace $F_\theta$ by empirical ranks $\tilde u_t$ to estimate $\psi$ semiparametrically; this is robust to marginal misspecification (Genest et al., 1995).
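A minimal IFM sketch under the same illustrative Gaussian-marginal / Gaussian-copula assumptions as above (the helper name `ifm_fit` is ours):

```python
# Two-stage IFM sketch: Stage 1 fits the marginal by marginal MLE; Stage 2
# fixes the marginal, uniformizes, and fits the copula parameter.
# Gaussian marginal and first-order Gaussian copula are illustrative choices.
import numpy as np
from scipy import stats, optimize

def ifm_fit(f):
    # Stage 1: marginal MLE for theta = (mu, sigma).
    mu, sigma = f.mean(), f.std()
    # Stage 2: u_t = F_theta_hat(f_t), then maximize the copula likelihood.
    u = stats.norm.cdf(f, loc=mu, scale=sigma)
    z = stats.norm.ppf(u)

    def neg_copula_loglik(rho):
        # Conditional Gaussian AR(1) terms on the normal scores;
        # terms not involving rho are dropped.
        return -stats.norm.logpdf(z[1:], loc=rho * z[:-1],
                                  scale=np.sqrt(1 - rho**2)).sum()

    res = optimize.minimize_scalar(neg_copula_loglik,
                                   bounds=(-0.99, 0.99), method="bounded")
    return mu, sigma, res.x  # (theta_hat, psi_hat)
```

For the pseudo-likelihood variant, one would replace `u` by scaled empirical ranks, e.g. `stats.rankdata(f) / (len(f) + 1)`, leaving Stage 2 unchanged.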
6. Machine Learning & PAC Learnability
The decomposition aligns with ML notions of disentanglement: $P$ captures value complexity (content), while $\phi$ captures structural complexity (temporal style). Modular learning trains these two parts independently and composes them via $f=F_P^{-1}\!\circ \phi$.
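The same composition runs in reverse for generation: sample a uniform path from a dependence model and push it through any quantile function. The sketch below pairs a Gaussian AR(1) copula (the “style”) with an exponential marginal (the “content”); both components are arbitrary illustrative choices:

```python
# Sketch of modular generation via f = F_P^{-1} o phi: a copula model supplies
# the temporal ordering, an independently chosen marginal supplies the values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_phi(n, rho):
    # Uniform path with Gaussian AR(1) dependence (the "style" component).
    z = np.empty(n)
    z[0] = rng.normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.normal()
    return stats.norm.cdf(z)

phi = sample_phi(1_000, rho=0.9)
f_synth = stats.expon.ppf(phi)  # exponential values, AR(1)-style ordering
```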
6.1 A PAC-style composition argument
Let $\mathcal H_P$ be a hypothesis class for marginals (e.g., parametric families, flows) and $\mathcal H_\phi$ a class for dependence models on $[0,1]$ (e.g., Markov copulas, sequence models). Consider the composed class
$$ \mathcal H_f \;=\; \big\{\, F_P^{-1}\!\circ h_\phi \;:\; P\in\mathcal H_P,\; h_\phi\in\mathcal H_\phi \,\big\}. $$
Suppose $\mathcal H_P$ and $\mathcal H_\phi$ are PAC-learnable with sample complexities $m_P(\varepsilon,\delta)$ and $m_\phi(\varepsilon,\delta)$ under appropriate, stable losses (e.g., proper scoring rules for densities, such as the log score). Then, under mild Lipschitz/monotonicity conditions on $F_P^{-1}$ and stability of the uniformization/quantile maps, a standard covering-number or Rademacher-complexity composition bound implies
$$ m_f(\varepsilon,\delta) \;\lesssim\; m_P\!\Big(\tfrac{\varepsilon}{2},\tfrac{\delta}{2}\Big) \;+\; m_\phi\!\Big(\tfrac{\varepsilon}{2},\tfrac{\delta}{2}\Big), $$
i.e., the composed class remains PAC-learnable with sample complexity bounded by the sum (up to constants) of those of the parts. This mirrors the information identity $h(f_{1:n}) = n h(P) - I(f_{1:n})$: learnability of the whole reduces to learning the marginal component and the dependence component.
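The key step can be sketched in one line. Writing $(\hat P,\hat h_\phi)$ for the learned pair, $(P^\ast,\phi^\ast)$ for the target, and assuming (as above) that $F_{P^\ast}^{-1}$ is $L$-Lipschitz, the triangle inequality gives

$$ \big\| F_{\hat P}^{-1}\!\circ \hat h_\phi \;-\; F_{P^\ast}^{-1}\!\circ \phi^\ast \big\| \;\le\; \big\| F_{\hat P}^{-1} - F_{P^\ast}^{-1} \big\|_\infty \;+\; L\,\big\| \hat h_\phi - \phi^\ast \big\|, $$

so driving each term below $\varepsilon/2$ with its own sample budget yields the stated bound.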
7. Takeaways
- Decomposition. $f(t)=F_P^{-1}(\phi(t))$ with $\phi(t)=F_P(f(t))$ is lossless and canonical.
- Information split. $h(f_{1:n})=n h(P)-I(f_{1:n})$ quantifies marginal vs. dependence bits.
- Estimation. Joint MLE is asymptotically most efficient; two-stage IFM and rank-based pseudo-likelihood are consistent, modular, and often near-efficient.
- Learnability. Under mild conditions, PAC learnability of $\mathcal H_P$ and $\mathcal H_\phi$ implies PAC learnability of the composed class $\mathcal H_f$.
References
- Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229–231.
- Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.
- Joe, H. & Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models. Tech. Rep. 166, UBC.
- Chen, X. & Fan, Y. (2006). Estimation of copula-based semiparametric time series models. Journal of Econometrics.
- Patton, A. J. (2006). Modelling asymmetric exchange rate dependence. International Economic Review.
- Genest, C., Ghoudi, K., & Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika.
- Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development.
- Ma, J. & Sun, Z. (2011). Mutual information is copula entropy. Tsinghua Science and Technology.
- Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM.