Odette Scharenborg
Abstract: Speech recognition is the mapping of a continuous, highly variable speech signal onto discrete, abstract representations. The question of how speech is represented and processed in the human brain and in automatic speech recognition (ASR) systems, although crucial in both the field of human speech processing and the field of automatic speech processing, has historically been investigated in the two fields separately. This webinar will discuss how comparisons between humans and deep neural network (DNN)-based ASR systems, and cross-fertilization of the two research fields, can provide valuable insights into the way humans process speech and can improve ASR technology. Specifically, it will present the results of several experiments, carried out on both human listeners and DNN-based ASR systems, on the representation of speech and on lexically-guided perceptual learning, i.e., the ability to adapt a sound category on the basis of new incoming information, resulting in improved processing of subsequent input. It will explain how listeners adapt to the speech of new speakers, and will present the results of a lexically-guided perceptual learning study carried out on a DNN-based ASR system, analogous to the human experiments. To investigate the speech representations and adaptation processes in the DNN-based ASR systems, activations in the hidden layers of the DNN were visualized. These visualizations revealed that DNNs use speech representations similar to those used by human listeners, without being explicitly taught to do so, and showed an adaptation of the phoneme categories similar to what is assumed to happen in the human brain.
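To make the kind of analysis described above concrete, the sketch below shows one common way of extracting and visualizing hidden-layer activations of a DNN acoustic model: frames are passed through the network, the activations of a chosen hidden layer are captured with a forward hook, and a 2-D projection is colored by phoneme label to look for phoneme-like clusters. This is a minimal illustration, not the systems from the work presented in the webinar; the model architecture, layer choice, number of phoneme classes, and the random stand-in data are all assumptions made for the example.

```python
# Minimal sketch: visualizing hidden-layer activations of a (hypothetical)
# DNN acoustic model. All shapes, labels, and data here are illustrative.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Assumed feed-forward acoustic model: 40-dim filterbank frames -> phoneme posteriors.
model = nn.Sequential(
    nn.Linear(40, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 48),  # 48 phoneme classes (assumed)
)

# Capture the output of the second hidden layer's nonlinearity with a forward hook.
activations = []
def hook(module, inputs, output):
    activations.append(output.detach())

model[3].register_forward_hook(hook)

# Stand-in data: random "frames" with random phoneme labels. A real analysis
# would use time-aligned speech so each frame carries a known phoneme label.
frames = torch.randn(1000, 40)
labels = torch.randint(0, 48, (1000,))

with torch.no_grad():
    model(frames)

# Project the captured activations to 2-D and color by phoneme label.
hidden = torch.cat(activations).numpy()
proj = PCA(n_components=2).fit_transform(hidden)
plt.scatter(proj[:, 0], proj[:, 1], c=labels.numpy(), cmap="tab20", s=5)
plt.title("Hidden-layer activations, colored by phoneme label")
plt.show()
```

With a trained model and real aligned speech, frames of the same phoneme clustering together in such a projection is the kind of evidence that the internal representations are phoneme-like; tracking how those clusters shift after exposure to ambiguous sounds is one way to probe the adaptation effects discussed above.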
Odette Scharenborg is an Associate Professor and Delft Technology Fellow at Delft University of Technology, working on automatic speech processing. She has an interdisciplinary background in automatic speech recognition and psycholinguistics, and uses knowledge of how humans process speech to develop inclusive automatic speech recognition systems that are able to recognise speech from everyone, irrespective of how they speak or the language they speak. Since 2017, she has been on the Board of the International Speech Communication Association, where she currently serves as Vice-President. Since 2018, she has been a member of the IEEE Speech and Language Processing Technical Committee, and she is a Senior Associate Editor of IEEE Signal Processing Letters.