Encoding modeling

Understanding neural representation
Updated: 2024-09-08

Linear modeling with nonlinear transformations of external information has been widely used to understand how the human brain processes real-world environments (Kim, 2025; Kim et al., 2024; Kim et al., 2023; Kim, 2022; Leahy* et al., 2021).

Fig 1. Overview of Linearized Encoding Analysis (LEA)
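The idea of a linear model on top of a nonlinear transformation can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the analysis code of the cited articles: the compressive `log1p` transform, the weight of 1.5, the noise level, and all function names are hypothetical choices made only for illustration. The model is fitted on one half of the samples and evaluated on the held-out half.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_lea(n=400, seed=0):
    """Simulate a stimulus, transform it nonlinearly, fit a linear model."""
    rng = random.Random(seed)
    stimulus = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Assumed nonlinear feature extraction (hypothetical compressive transform).
    feature = [math.log1p(s) for s in stimulus]
    # Simulated response: linear in the feature plus independent noise.
    response = [1.5 * f + rng.gauss(0.0, 0.5) for f in feature]
    half = n // 2
    # Fit on the first half, predict the held-out second half.
    intercept, slope = fit_line(feature[:half], response[:half])
    predicted = [intercept + slope * f for f in feature[half:]]
    return slope, pearson(predicted, response[half:])


slope, accuracy = simulate_lea()
print(f"estimated weight: {slope:.2f}, held-out prediction r: {accuracy:.2f}")
```

In this toy setting the held-out correlation reflects how much of the simulated response variance the transformed feature explains. Real encoding analyses typically involve many features, time lags, and regularization, none of which are shown here.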

Methodological issues

Reverse double-dipping

This article (Kim, 2025) elucidates a methodological pitfall of cross-validation for evaluating predictive models applied to naturalistic neuroimaging data, namely ‘reverse double-dipping’ (RDD). In a broader context, this problem is also known as ‘leakage in training examples’, which is difficult to detect in practice. RDD can occur when predictive modeling is applied to data from a conventional neuroscientific design, characterized by a limited set of stimuli repeated across trials and/or participants. It results in spuriously high predictive performance due to overfitting to repeated signals, even in the presence of independent noise. Through comprehensive simulations and real-world examples following a theoretical formulation, the article shows how such information leakage can occur and how severely it can compromise results and conclusions when combined with widespread informal reverse inference. The article concludes with practical recommendations for researchers to avoid RDD in their experimental design and analysis.

Fig 2. Reverse double-dipping: data dips you, twice!
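The leakage described above can be reproduced in a toy simulation. The sketch below is not the article's simulation. It assumes a deliberately flexible nearest-neighbour predictor as a stand-in for an overfitted model, features that are pure random numbers unrelated to the response, and hypothetical names and parameter values throughout. The model is trained on one trial of stimulus A and then tested either on a repeated trial of the same stimulus (with independent noise) or on an unseen stimulus B.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def nearest_neighbour_predict(train_x, train_y, test_x):
    """Predict each test sample with the response at the closest training feature."""
    predictions = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        predictions.append(train_y[nearest])
    return predictions


def simulate_rdd(n_samples=400, noise_sd=1.0, seed=1):
    rng = random.Random(seed)
    # Features are random and independent of the stimulus-driven signals,
    # so they carry no genuine information about the response.
    feat_a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    feat_b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    sig_a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    sig_b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    def trial(signal):
        # Each trial adds fresh, independent noise to the repeated signal.
        return [s + rng.gauss(0.0, noise_sd) for s in signal]

    train_y = trial(sig_a)    # stimulus A, trial 1 (training)
    test_same = trial(sig_a)  # stimulus A, trial 2 (repeated stimulus)
    test_new = trial(sig_b)   # stimulus B (never seen in training)

    pred_same = nearest_neighbour_predict(feat_a, train_y, feat_a)
    pred_new = nearest_neighbour_predict(feat_a, train_y, feat_b)
    return pearson(pred_same, test_same), pearson(pred_new, test_new)


r_repeated, r_unseen = simulate_rdd()
print(f"repeated stimulus r: {r_repeated:.2f}, unseen stimulus r: {r_unseen:.2f}")
```

Because the features are identical across repetitions of stimulus A, the predictor simply replays the training response, which shares the stimulus-driven signal with the test trial. The apparent accuracy on the repeated stimulus therefore says nothing about whether the features are encoded, while the accuracy on the unseen stimulus stays near zero.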

Time series prediction

Fig 3. Spurious correlation between smooth time series
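A small simulation illustrates why smooth time series tend to correlate by chance. This is only a sketch: the moving-average smoother, the series length, the window width, and the function names are assumptions chosen for illustration. It draws pairs of independent white-noise series and compares the average absolute correlation before and after smoothing.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def moving_average(x, width):
    """Smooth a series with a boxcar window (output is shorter by width - 1)."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def mean_abs_correlation(n_pairs=200, length=200, width=1, seed=0):
    """Average |r| between pairs of independent series smoothed with `width`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = [rng.gauss(0.0, 1.0) for _ in range(length)]
        b = [rng.gauss(0.0, 1.0) for _ in range(length)]
        # width=1 leaves the series unchanged (white noise).
        total += abs(pearson(moving_average(a, width), moving_average(b, width)))
    return total / n_pairs


white = mean_abs_correlation(width=1)
smooth = mean_abs_correlation(width=20)
print(f"mean |r| white noise: {white:.3f}, smoothed: {smooth:.3f}")
```

Smoothing introduces autocorrelation, which reduces the effective number of independent samples, so independent series produce larger chance correlations than a naive sample count would suggest.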

Resources

Kim, 2024-09-07, Linearized Encoding Modeling: a Predictive Analysis Methodology for Music Perception, Korean Society for Music Perception and Cognition (KSMPC) Summer School 24, Session 3 lecture. [slides] [code] [repo]

References

2025

preprint
Reverse Double-Dipping: When Data Dips You, Twice—Stimulus-Driven Information Leakage in Naturalistic Neuroimaging

Seung-Goo Kim

bioRxiv, 2025


This article elucidates a methodological pitfall of cross-validation for evaluating predictive models applied to naturalistic neuroimaging data, namely ‘reverse double-dipping’ (RDD). In a broader context, this problem is also known as ‘leakage in training examples’, which poses challenges in detecting it in practice. This issue can occur when predictive modeling is employed with data from a conventional neuroscientific design, characterized by a limited set of stimuli repeated across trials and/or participants, resulting in spurious predictive performances due to overfitting to repeated signals, even in the presence of independent noise. Through comprehensive simulations and real-world examples following theoretical formulation, the article underscores how such information leakage can occur and how severely it could compromise the analysis when it is combined with widely spread informal reverse inference. The article concludes with practical recommendations for researchers to avoid RDD in their experiment design and analysis.
@article{kim2025rdd,
  author = {Kim, Seung-Goo},
  doi = {10.1101/2025.04.01.646146},
  elocation-id = {2025.04.01.646146},
  eprint = {https://www.biorxiv.org/content/early/2025/04/05/2025.04.01.646146.full.pdf},
  journal = {bioRxiv},
  publisher = {Cold Spring Harbor Laboratory},
  title = {Reverse Double-Dipping: When Data Dips You, Twice{\textemdash}Stimulus-Driven Information Leakage in Naturalistic Neuroimaging},
  year = {2025},
  bdsk-url-1 = {https://www.biorxiv.org/content/early/2025/04/05/2025.04.01.646146},
  bdsk-url-2 = {https://doi.org/10.1101/2025.04.01.646146},
}

2024

CerCor

Linguistic modulation of the neural encoding of phonemes

Seung-Goo Kim, Federico De Martino, and Tobias Overath

Cerebral Cortex, 2024


@article{kim2024cc,
  journal = {Cerebral Cortex},
  author = {Kim, Seung-Goo and De Martino, Federico and Overath, Tobias},
  date-modified = {2024-04-25 14:43:05 +0200},
  doi = {10.1093/cercor/bhae155},
  title = {Linguistic modulation of the neural encoding of phonemes},
  year = {2024},
}

2023

ICMPC

Emotion-relevant Representations of Music Extracted by Convolutional Neural Networks Are Encoded in Medial Prefrontal Cortex

Seung-Goo Kim, Tobias Overath, and Daniela Sammler

Proceedings – The Joint Conference of the 17th International Conference on Music Perception and Cognition (ICMPC) and the 7th Conference of the Asia-Pacific Society for the Cognitive Sciences of Music (APSCOM), 2023


@inproceedings{kim2023icmpc,
  author = {Kim, Seung-Goo and Overath, Tobias and Sammler, Daniela},
  booktitle = {Proceedings -- The Joint Conference of the 17th International Conference on Music Perception and Cognition (ICMPC) and the 7th Conference of the Asia-Pacific Society for the Cognitive Sciences of Music (APSCOM)},
  date = {2023-08-01},
  date-modified = {2024-04-25 18:48:05 +0200},
  organization = {International Conference on Music Perception and Cognition (ICMPC)},
  title = {Emotion-relevant Representations of Music Extracted by Convolutional Neural Networks Are Encoded in Medial Prefrontal Cortex},
  year = {2023},
}

2022

FNsci
On the encoding of natural music in computational models and human brains

Seung-Goo Kim

Frontiers in Neuroscience, 2022


This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
@article{kim2022fn,
  author = {Kim, Seung-Goo},
  doi = {10.3389/fnins.2022.928841},
  issn = {1662-453X},
  journal = {Frontiers in Neuroscience},
  title = {On the encoding of natural music in computational models and human brains},
  volume = {16},
  year = {2022},
  bdsk-url-1 = {https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2022.928841},
  bdsk-url-2 = {https://doi.org/10.3389/fnins.2022.928841},
}

2021

FNsci

An Analytical Framework of Tonal and Rhythmic Hierarchy in Natural Music Using the Multivariate Temporal Response Function

J. Leahy*, Seung-Goo Kim*, J. Wan, and 1 more author

Frontiers in Neuroscience, 2021
