Projection matrix
Let us consider a linear model: \begin{equation} \label{eq:fir} \mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{e} \end{equation} where \(\mathbf{y} \in \Re^{T\,\times\,1}\) is a response vector over \(T\) timepoints, \(\mathbf{X} \in \Re^{T\,\times\,P}\) is a design matrix of \(P\) predictors such that \(\mathbf{X}'\mathbf{X}\) is invertible, \(\mathbf{b} \in \Re^{P\,\times\,1}\) is an unknown weight vector, and \(\mathbf{e} \in \Re^{T\,\times\,1}\) is a Gaussian noise vector, \(\mathbf{e} \sim \mathcal{N} (\mathbf{0}, \sigma^2\mathbf{I}_{T})\), where \(\sigma^2\) is the noise variance and \(\mathbf{I}_T \in \Re^{T\,\times\,T}\) is the identity matrix.
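As a quick illustration, here is a minimal NumPy sketch that simulates this model; the sizes \(T\) and \(P\), the noise level \(\sigma\), and the weight vector \(\mathbf{b}\) are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

# Minimal simulation of the model above. T, n_pred, sigma, and the weight
# vector b are arbitrary illustrative choices, not values from the text.
rng = np.random.default_rng(0)
T, n_pred, sigma = 200, 20, 1.0

X = rng.standard_normal((T, n_pred))   # design matrix: T timepoints x P predictors
b = rng.standard_normal(n_pred)        # "unknown" weights (known here because we simulate)
e = rng.normal(0.0, sigma, size=T)     # i.i.d. Gaussian noise, N(0, sigma^2)
y = X @ b + e                          # response vector
```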
In ridge regression with cross-validation over multiple data partitions, if we use the Pearson correlation as the prediction-performance measure, its expectation can be simplified:
\begin{equation} \label{eq:expcorr1} \mathbb{E}\left[\text{corr} \left( \hat{\mathbf y}_{2},\, \mathbf y_2 \right) \right] \propto \mathbf{s}_1' \mathbf{P}' \mathbf{s}_2 \end{equation}
where \(\mathbf{s}_i = \mathbf{X}_i \mathbf{b}\) is the true signal in the \(i\)-th partition; the first partition is the training set and the second is the test set.
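To make the left-hand side concrete, the following sketch estimates \(\mathbb{E}[\text{corr}(\hat{\mathbf y}_2, \mathbf y_2)]\) by Monte Carlo over noise realizations, fitting ridge regression in closed form on the first partition and predicting the second; the sizes, the penalty \(\lambda\), and the number of simulations are hypothetical choices.

```python
import numpy as np

# Monte Carlo estimate of E[corr(y2_hat, y2)] under a closed-form ridge fit.
# T, n_pred, lam, n_sims, and b are hypothetical choices for illustration.
rng = np.random.default_rng(0)
T, n_pred, lam, n_sims = 200, 20, 1.0, 500

X1 = rng.standard_normal((T, n_pred))  # training design matrix
X2 = rng.standard_normal((T, n_pred))  # test design matrix
b = rng.standard_normal(n_pred)
s1, s2 = X1 @ b, X2 @ b                # true signals in the two partitions

corrs = []
for _ in range(n_sims):
    y1 = s1 + rng.standard_normal(T)   # training responses (noise resampled each run)
    y2 = s2 + rng.standard_normal(T)   # test responses
    b_hat = np.linalg.solve(X1.T @ X1 + lam * np.eye(n_pred), X1.T @ y1)  # ridge fit
    corrs.append(np.corrcoef(X2 @ b_hat, y2)[0, 1])

print("Monte Carlo E[corr(y2_hat, y2)]:", np.mean(corrs))
```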
Depending on the optimal \(\lambda\), the ridge projection matrix \(\mathbf P = \mathbf{X}_2 \mathbf{X}_1^{+} + \lambda \mathbf{I}\) lies between two extreme cases:
\[\begin{cases} \mathbf{P} \approx \lambda \mathbf{I} & \text{if } \lambda \gg \|\mathbf{X}_2 \mathbf{X}_1^{+}\| \\ \mathbf{P} = \mathbf{X}_2 \mathbf{X}_1^{+} & \text{if } \lambda = 0 \end{cases}\]
where \((\cdot)^+\) denotes the Moore–Penrose pseudoinverse.
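A small numerical sketch of these two regimes, using the expression for \(\mathbf{P}\) given above on hypothetical data: at \(\lambda = 0\) the quantity \(\mathbf{s}_1'\mathbf{P}'\mathbf{s}_2\) is driven by \(\mathbf{X}_2\mathbf{X}_1^{+}\), while for a large \(\lambda\) it is dominated by \(\lambda\,\mathbf{s}_1'\mathbf{s}_2\).

```python
import numpy as np

# The two regimes of P = X2 X1^+ + lambda*I from the text, evaluated on
# hypothetical data: s1' P' s2 at lambda = 0 versus a lambda much larger
# than ||X2 X1^+||, where the result is dominated by lambda * s1' s2.
rng = np.random.default_rng(0)
T, n_pred = 200, 20
X1 = rng.standard_normal((T, n_pred))
X2 = rng.standard_normal((T, n_pred))
b = rng.standard_normal(n_pred)
s1, s2 = X1 @ b, X2 @ b

P0 = X2 @ np.linalg.pinv(X1)           # lambda = 0 case: P = X2 X1^+
lam = 100.0 * np.linalg.norm(P0)       # lambda >> ||X2 X1^+||
P_big = P0 + lam * np.eye(T)           # P is now dominated by lambda * I

print("lambda = 0     :", s1 @ P0.T @ s2)      # s1' P' s2
print("large lambda   :", s1 @ P_big.T @ s2)   # approx. lambda * s1' s2
print("lambda * s1's2 :", lam * (s1 @ s2))
```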
This shows that with stronger regularization (i.e., a weaker signal), the projection matrix approaches a scaled identity matrix, so the prediction accuracy depends mostly on the inner product of the two true signals (\(\mathbf{s}_1'\mathbf{s}_2\)), which should be zero in the usual case.