INTERSPEECH 2018 – Special Session

The independence of source and filter in vowel production and perception in cross-cultural speech communication



Speech has traditionally been thought of as the product of two mostly independent variables, source and filter. However, the contribution of the acoustic correlates of source and filter to the perception of speech, such as the naturalness of voices or the categorisation of vowels, is far from understood. There is now strong evidence that high-pitched vowels (with fundamental frequencies up to 1 kHz) remain well recognisable despite extremely wide harmonic spacing and poorly sampled formant frequencies. The contribution of source and filter components to speech perception can vary strongly between languages, cultures and production styles. There is cross-cultural evidence from Chinese opera singing, for example, in which high-pitched vowels in the soprano register are well recognisable, a phenomenon that is typically not found in Western opera. In addition, phonation type, vocal effort and phoneme context can strongly affect source and filter interactions in speech. Consequently, the correspondence between perceived sound categories and patterns of spectral peaks or whole spectral envelopes is in question. For this special session, we invite contributions from different disciplines, based on multilingual and multicultural evidence, that explain the complex relationship between source and filter in speech production and perception.
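To make the point about wide harmonic spacing concrete, the following minimal sketch counts how many harmonics of a given fundamental frequency fall within a fixed band, and how far the nearest harmonic lies from a spectral peak. The 3 kHz band, the 300 Hz peak and the function names are illustrative assumptions and are not taken from the studies listed below.

```python
def harmonics_up_to(f0_hz, f_max_hz):
    """Return the harmonic frequencies k * f0 (k = 1, 2, ...) that do not exceed f_max_hz."""
    count = int(f_max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def distance_to_nearest_harmonic(f0_hz, target_hz, f_max_hz):
    """Return the distance in Hz from target_hz to the closest harmonic up to f_max_hz."""
    return min(abs(h - target_hz) for h in harmonics_up_to(f0_hz, f_max_hz))


# Illustrative values only (assumptions, not measurements):
# a 3 kHz analysis band and a hypothetical spectral peak at 300 Hz.
BAND_HZ = 3000
PEAK_HZ = 300

for f0 in (100, 250, 500, 1000):
    n = len(harmonics_up_to(f0, BAND_HZ))
    gap = distance_to_nearest_harmonic(f0, PEAK_HZ, BAND_HZ)
    print(f"f0 = {f0:4d} Hz: {n:2d} harmonics up to {BAND_HZ} Hz, "
          f"nearest harmonic is {gap} Hz from the {PEAK_HZ} Hz peak")
```

Under these assumptions, a fundamental frequency of 100 Hz places 30 harmonics in the band, whereas 1 kHz places only three, none of them near the assumed peak. This is the sense in which the spectral envelope is poorly sampled at high fundamental frequencies.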


The objective of this session is to bring together researchers

  •  … from multiple fields such as speech technology (synthesis and recognition), speech acoustics, singing research and speech perception/auditory phonetics, to jointly address the effect of fundamental frequency on vowel category production and perception.
  •  … from multiple linguistic backgrounds in which phonetically different ranges of fundamental frequency are typical in speech and singing, to compare source-filter dependence across a variety of languages and/or singing cultures.

This topic is typically scattered over numerous sessions at INTERSPEECH and other speech conferences. We see a strong need to bring it together in a single special session.


We aim to have talks from different disciplines, based on multilingual evidence, arguing for dependence or independence of source and filter in the source-filter model. Depending on the number of submissions, we will also have posters.

At the Zurich University of the Arts, the applicants created a large database of more than 35,000 vowels produced in a controlled way by 70 nonprofessional speakers and professional singers and actresses/actors, at fundamental frequencies from low to high (up to 1 kHz), with varying basic production parameters such as vowel context, phonation type, vocal effort, fundamental frequency, and production style. This database will be demonstrated at the special session, with the aim of testing hypotheses arising from the contributions directly on the database in follow-up research.


Here is a selection of articles that contribute to the topic. Please feel free to suggest others by mail and we will add them to the list: 

  • Assmann, P. and Nearey, T. M. (2007): Relationship between fundamental and formant frequencies in voice preferences. Journal of the Acoustical Society of America, 122 (2), EL35-43.
  • Maurer, D. (in press): Why a phenomenology of vowel sounds is needed. In Tagungsband der 13. Tagung Phonetik und Phonologie im Deutschsprachigen Raum. P und P 13, 28. - 29. September 2017, Humboldt-Universität zu Berlin.

  • Friedrichs, D., Maurer, D., Rosen, S., and Dellwo, V. (2017): Vowel recognition at fundamental frequencies up to 1 kHz reveals point vowels as acoustic landmarks. Journal of the Acoustical Society of America, 142 (2), pp. 1025–1033.


  • Maurer, D. (2016): Acoustics of the Vowel – Preliminaries. Bern/Frankfurt: Peter Lang. Open access (free download).

  • Friedrichs, D., Maurer, D., Suter, H., and Dellwo, V. (2015): Vowel identification at high fundamental frequencies in minimal pairs. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK. Paper number 0434, 1-4.


  • Friedrichs, D., Maurer, D., Dellwo, V. (2015): The phonological function of vowels is maintained at fundamental frequencies up to 880 Hz. Journal of the Acoustical Society of America, 138, EL36.


  • Maurer, D., Mok, P., Friedrichs, D., Dellwo, V. (2014): Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese Opera singer. Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech 2014, 2132-2133.

  • Sundberg, J. (2013). Perception of singing. In D. Deutsch (Ed.), The Psychology of music. (3rd ed.). London: Academic Press.
  • Ménard, L., Schwartz, J.-L., Boë, L.-J., Kandel, S., Vallée, N. (2002). Auditory normalization of French vowels synthesized by an articulatory model simulating growth from birth to adulthood. Journal of the Acoustical Society of America 111(4), 1892-1905.


  • Maurer, D., and Landis, T. (2000): Formant pattern ambiguity of vowel sounds. International Journal of Neuroscience, 100 (1-4), 39-76.


  • Maurer, D., and Landis, T. (1996): Intelligibility and spectral differences in high pitched vowels. Folia Phoniatrica et Logopaedica, 48, 1-10.


  • Maurer, D., and Landis, T. (1995): Fo-dependence, number alteration, and non-systematic behaviour of the formants in German vowels. International Journal of Neuroscience, 83 (1-2), 25-44.


  • Maurer, D., Gröne, B., Landis, T., Hoch, G., and Schönle, P.W. (1993): Reexamination of the relation between the vocal tract and the vowel sound with EMA in vocalizations. Clinical Linguistics & Phonetics, 7, 129-143


  • Maurer, D., Cook, N., Landis, T., and d'Heureuse, C. (1992): Are measured differences between the formants of men, women and children due to F0 differences? Journal of the International Phonetic Association, 21, 66-79.


  • Hirahara, T., & Kato, H. (1992). The effect of F0 on vowel identification. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (Eds.), Speech perception, production and linguistic structure (pp. 89-112). Tokyo: Ohmsha.
  • Traunmüller, H. (1988). Paralinguistic variation and invariance in the characteristic frequencies of vowels. Phonetica, 45(1), 1-29.


  • Traunmüller, H. (1981). Perceptual dimension of openness in vowels. The Journal of the Acoustical Society of America, 69(5), 1465-1475.


  • Smith, L. A., & Scott, B. L. (1980). Increasing the intelligibility of sung vowels. The Journal of the Acoustical Society of America, 67(5), 1795-1797.


  • Fant, G., Carlson, R., & Granström, B. (1974). The [e] –[ø] ambiguity. Proceedings of the Speech Communication Seminar, Stockholm, Aug. 1–3, 117–121.
  • Fujisaki, H., & Kawashima, T. (1968). The roles of pitch and higher formants in the perception of vowels. IEEE transactions on audio and electroacoustics, 16(1), 73-77.


  • Miller, R. L. (1953). Auditory tests with synthetic vowels. The Journal of the Acoustical Society of America, 25(1), 114-121.


  • Potter, R. K., & Steinberg, J. C. (1950). Towards the specification of speech. The Journal of the Acoustical Society of America, 22(6), 807–820.


Recent short contributions (pilot studies)

  • Maurer, D., Dellwo, V., Suter, H., Kathiresan, T. (2017, abstract): Formant pattern and spectral shape ambiguity of vowel sounds revisited in synthesis: changing perceptual vowel quality by only changing the fundamental frequency. Journal of the Acoustical Society of America, 141(5):3469-3470. 

  • Maurer, D., & Suter, H. (2017b, abstract). Vowel synthesis related to equal-amplitude harmonic series in frequency ranges > 1 kHz combined with single harmonics < 1 kHz, and including variation of fundamental frequency. The Journal of the Acoustical Society of America, 141(5), 3469-3469. 

  • Maurer, D., & Suter, H. (2017a, abstract). "Flat" vowel spectra revisited in vowel synthesis. The Journal of the Acoustical Society of America, 141(5), 3469-3469.  
