Categories and Gradience in Intonation

Research Objectives

Intonation, or the melody of speech, plays a central role in human communication. It is one of many elements in the speech signal that needs to be decoded to translate speech sound into meaning, but its role in speech understanding is crucial, since it can give us immediate cues to the start of a new word or phrase in the speech stream, and to the meaning of utterances. As a consequence, when the intonation is wrong, communication often breaks down.

In spite of its importance, intonation is still very poorly understood. It is notoriously difficult to analyse because it is carried by a continuous sound signal, it has multiple functions, and it interacts with other elements in the speech signal that convey meaning.

We know that at some stage in the comprehension process, some of this continuous information is interpreted categorically and decoded into distinct meaningful units, such as a rising pattern that marks a question. However, sometimes it makes a more gradient contribution to meaning, when gradual increases or decreases in a particular feature like pitch convey, for instance, a more angry or less timid tone of voice. These variations in form and their contribution to meaning are closely intertwined, and difficult to disentangle. To make matters worse, they are ignored in virtually all cognitive, neuropsychological and neurobiological studies of intonation. As a result, it is unclear exactly how intonation is realised in speech, what units are involved, how it contributes to speech comprehension, and how it is processed in the brain. This lack of understanding holds back progress in the speech sciences.

The main objective of this research is to test the central principle of the currently predominant theoretical framework for intonation analysis (the ‘Autosegmental-Metrical approach’). This principle allows us to analyse the intonational features of the speech signal in a discrete, insightful way by disentangling the interaction between the categorical and gradient information mentioned above. Since it helps us identify the critical features of intonation patterns, we can use it to pin down their role in language processing and the neural architecture that supports it.
More specifically, the objectives of the project are to:

  1. test the Autosegmental-Metrical tenet that intonation is best analysed in terms of distinct phonological and phonetic levels of representation, which are defined by their contribution to meaning (‘linguistic’ and ‘paralinguistic’), and can be realised through gradient as well as categorical variation in form
  2. provide the first neurobiological evidence for a more refined, linguistically informed model of the neural underpinnings of intonation
  3. show that hierarchically organised processing is a universal characteristic of language processing, encompassing both segmental and suprasegmental properties, where dissociations in lower-level auditory and higher-level linguistic subprocesses reflect distinctions made in current phonological theory
  4. combine linguistic evidence from production and perception experiments with direct evidence from neuroimaging
  5. evaluate established methods for testing category membership in intonation
  6. provide a template for future studies.

The findings will allow linguists, cognitive psychologists, neuropsychologists, speech engineers and computer scientists to improve their theories and models of language and speech processing.

Secondary beneficiaries are applied researchers in neuropsychology, language teaching and speech synthesis and recognition. In these disciplines it has long been known that deficiencies in prosodic performance are a major factor in impeding communication (e.g. cross-culturally, or in aphasic speech). This is not surprising given the crucial role that prosody and intonation have been shown to play in (human) speech segmentation and recognition.

In all of these disciplines, computer-aided programmes are being developed to optimise speech processing conditions for their respective user groups. For the neuropsychologist, knowing which areas of the brain are crucially implicated in processing intonation for different functions will allow fine-tuning of treatments for stroke rehabilitation and speech therapy. In second language teaching, a better understanding of which aspects of an intonation contour crucially affect how it is processed and perceived by a listener can be used to improve teaching methods by helping learners focus their attention on the most critical aspects of the spoken L2. Similarly, in speech synthesis and recognition, a better understanding of the categorical phonological and gradient phonetic information in intonation can be used to raise success rates, which no longer show improvement from computational and engineering techniques alone. Thus, the fundamental research carried out in this project will not only have scientific impacts; indirectly, it may also benefit non-academic communities in the longer term.

More generally, the research advances our insight into how listeners integrate cues from channels of communication that simultaneously convey linguistic messages and information about the speaker’s state of mind (such as their attitude or feelings). Our understanding of human communication and the neural and cognitive systems that support it crucially hinges on such insights.