Reconstructing Speech from Human Auditory Cortex: Computer Program Translates Brain Waves Into Individual Words with 80-90% Accuracy, according to a study by Brian N. Pasley, Stephen V. David, Nima Mesgarani, Adeen Flinker, Shihab A. Shamma, Nathan E. Crone, Robert T. Knight, and Edward F. Chang.

Abstract: How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
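To make the idea of a linear reconstruction model concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the model details, so the time-lagged ridge regression, the function names (lagged_design, fit_reconstruction, reconstruct, demo), the number of lags, the regularization strength, and the synthetic data are all assumptions chosen for illustration. The sketch learns a linear map from multichannel "neural" responses back to a spectrogram and scores it by correlation on held-out data.

```python
import numpy as np

def lagged_design(responses, n_lags):
    """Row t holds responses[t], ..., responses[t + n_lags - 1], zero-padded at the end."""
    n_times, n_channels = responses.shape
    padded = np.vstack([responses, np.zeros((n_lags - 1, n_channels))])
    return np.hstack([padded[lag:lag + n_times] for lag in range(n_lags)])

def fit_reconstruction(responses, spectrogram, n_lags=5, ridge=1.0):
    """Ridge-regularized least squares from lagged responses to spectrogram frames."""
    X = lagged_design(responses, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ spectrogram)

def reconstruct(responses, weights, n_lags=5):
    """Apply fitted weights to new responses to estimate the spectrogram."""
    return lagged_design(responses, n_lags) @ weights

def demo(seed=0):
    """Synthetic check (assumed data, not from the study): returns mean held-out correlation."""
    rng = np.random.default_rng(seed)
    n_times, n_freqs, n_channels, delay = 2000, 8, 16, 2
    spectrogram = rng.standard_normal((n_times, n_freqs))
    # Hypothetical encoding: each channel is a delayed, noisy mix of frequency bands.
    mixing = rng.standard_normal((n_freqs, n_channels))
    responses = np.zeros((n_times, n_channels))
    responses[delay:] = spectrogram[:-delay] @ mixing
    responses += 0.1 * rng.standard_normal(responses.shape)
    split = 1500  # fit on the first part, evaluate on the rest
    weights = fit_reconstruction(responses[:split], spectrogram[:split])
    predicted = reconstruct(responses[split:], weights)
    actual = spectrogram[split:]
    corrs = [np.corrcoef(predicted[:, f], actual[:, f])[0, 1] for f in range(n_freqs)]
    return float(np.mean(corrs))

if __name__ == "__main__":
    print("mean held-out correlation:", demo())
```

The lags look forward in time because a neural response follows the sound that caused it, so the spectrogram at time t is estimated from responses at t and shortly after. A purely linear map like this one can only capture what the abstract calls slow and intermediate fluctuations; the nonlinear modulation-energy representation mentioned there is not sketched here.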

Summary: Spoken language is a uniquely human trait. The human brain has evolved computational mechanisms that decode highly variable acoustic inputs into meaningful elements of language such as phonemes and words. Unraveling these decoding mechanisms in humans has proven difficult, because invasive recording of cortical activity is usually not possible. In this study, we take advantage of rare neurosurgical procedures for the treatment of epilepsy, in which neural activity is measured directly from the cortical surface and therefore provides a unique opportunity for characterizing how the human brain performs speech recognition. Using these recordings, we asked what aspects of speech sounds could be reconstructed, or decoded, from higher order brain areas in the human auditory system. We found that continuous auditory representations, for example the speech spectrogram, could be accurately reconstructed from measured neural signals. Reconstruction quality was highest for sound features most critical to speech intelligibility and allowed decoding of individual spoken words. The results provide insights into higher order neural speech processing and suggest it may be possible to read out intended speech directly from brain activity.
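One simple way to picture how a reconstructed spectrogram could be turned into a word label is template matching: compare the reconstruction with the spectrogram of each candidate word and pick the best match. The sketch below assumes this approach; the summary does not describe the identification procedure, so the function identify_word, the labels word_a, word_b, and word_c, and the random templates are hypothetical, and all spectrograms are assumed to share the same shape.

```python
import numpy as np

def identify_word(reconstruction, candidates):
    """Pick the candidate whose spectrogram correlates best with the reconstruction.

    reconstruction: 2-D array (time x frequency).
    candidates: dict mapping a word label to a spectrogram of the same shape.
    Returns (best_label, scores) where scores maps each label to its correlation.
    """
    scores = {
        label: float(np.corrcoef(reconstruction.ravel(), template.ravel())[0, 1])
        for label, template in candidates.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores

def make_example(seed=0):
    """Synthetic example (assumed data): a noisy copy of one template stands in for a reconstruction."""
    rng = np.random.default_rng(seed)
    candidates = {
        label: rng.standard_normal((40, 8))
        for label in ("word_a", "word_b", "word_c")
    }
    reconstruction = candidates["word_b"] + 0.5 * rng.standard_normal((40, 8))
    return reconstruction, candidates

if __name__ == "__main__":
    reconstruction, candidates = make_example()
    label, scores = identify_word(reconstruction, candidates)
    print(label, scores)
```

Real spoken words differ in duration, so a practical version would need some form of time alignment before comparison; that step is left out of this sketch.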

Full paper: http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001251