
Speech Perception

articulatory gesture

In speaking we produce an overlapping sequence of articulatory gestures, which are motor actions involving coordinated movements of the lips, tongue, velum and larynx. These gestures are the units in terms of which we plan utterances (Browman and Goldstein 1992; Goldstein, Fowler, et al. 2003).
These are the actions I want to focus on first in thinking about what we experience when we encounter others’ actions.

Browman & Goldstein 1986, figure 1

‘Trajectory of lower lip in [abe] as measured by tracking infra-red LED placed on subject's lower lip’
‘Not every utterance of word transcribed with /b/ will display exactly the trajectory of Fig. 1.: the trajectory will vary with vowel context, syllable position, stress, speaking rate and speaker. We must, therefore, ultimately characterise /b/ as a family of patterns of lip movement’ \citep[p.~224]{browman:1986_towards}
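One way to make the ‘family of patterns’ idea concrete is the task-dynamics idealization associated with articulatory phonology, on which a gesture is modelled as a critically damped spring driving a tract variable (such as lip aperture) toward a target. The sketch below is only a toy under that assumption: the function name, stiffness value and starting apertures are invented for illustration, not Browman and Goldstein's parameters. It shows how one and the same closure gesture yields different lip trajectories in different vowel contexts.

\begin{verbatim}
import numpy as np

def gesture_trajectory(x0, target, stiffness, dt=0.001, steps=300):
    """Toy task-dynamic gesture: a critically damped second-order
    system driving a tract variable (lip aperture, mm) to a target."""
    damping = 2.0 * np.sqrt(stiffness)   # critical damping (unit mass)
    x, v = x0, 0.0
    traj = []
    for _ in range(steps):
        a = -stiffness * (x - target) - damping * v
        v += a * dt                      # semi-implicit Euler step
        x += v * dt
        traj.append(x)
    return np.array(traj)

# The same /b/ lip-closure gesture (target aperture 0 mm) launched from
# different vowel contexts (hypothetical starting apertures):
for start_mm in (8.0, 12.0, 16.0):
    traj = gesture_trajectory(x0=start_mm, target=0.0, stiffness=900.0)
    print(f"start {start_mm:5.1f} mm -> aperture at 100 ms: {traj[99]:5.2f} mm")
\end{verbatim}

Different contexts give different trajectories, yet all are generated by one gesture with one target, which is the sense in which /b/ is ‘a family of patterns of lip movement’.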

Speech and auditory perception involve distinct processes

A schematic spectrogram for a synthetic sound which is normally perceived as [ra]. The horizontal axis represents time, the vertical frequency.
A schematic spectrogram for [la].
In the middle you see the \emph{base}, i.e. the part of the spectrogram common to [ra] and [la]. This is played to one ear.
Below you see the transitions, i.e. the parts of the spectrogram that differ between [ra] and [la]. When played in isolation these sound like a chirp. When played at the same time as the base but in the other ear, subjects hear a chirp and a [ra] or a [la], depending on which transition is played.
How do we know that the same stimulus may be processed by different perceptual systems concurrently? In particular, how do we know that speech and auditory processing are distinct? A phenomenon called “duplex perception” demonstrates their distinctness. Artificial speech-like stimuli for two syllables, [ra] and [la], are generated. The acoustic signal for each syllable is artificially broken into two parts, the “base” and the “transition” (see Fig. *** below). The syllables share the same base but differ in their transitions. When a transition is played alone it sounds like a chirp, quite unlike anything we normally hear in speech. Duplex perception occurs when the base and a transition are played together but in separate ears. In this case, subjects hear both the chirp that they hear when the transition is played in isolation and the syllable [la] or [ra]. Which syllable they hear depends on which transition is played, so speech processing must have combined the base and transition. By contrast, auditory processing must have failed to combine them, for otherwise the chirp would not have been heard. The perception resulting from the duplex presentation therefore involves simultaneous auditory and speech processing. This shows that auditory and speech processing are distinct perceptual processes.
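The dichotic layout is easy to picture as a two-channel signal: one ear gets the base, the other the transition. The sketch below is purely schematic; real duplex stimuli are formant-synthesized syllables, and the sine and chirp stand-ins, frequencies and durations here are all invented.

\begin{verbatim}
import numpy as np

SR = 22050                              # sample rate in Hz (illustrative)
t = np.arange(int(0.3 * SR)) / SR       # 300 ms of samples

# Stand-in for the shared 'base' of [ra] and [la]: a steady tone.
base = 0.5 * np.sin(2 * np.pi * 700 * t)

# Stand-in for a 50 ms formant 'transition': a frequency sweep, which
# heard alone sounds like a chirp.
f_sweep = np.linspace(1600, 2400, t.size)
phase = 2 * np.pi * np.cumsum(f_sweep) / SR
transition = 0.5 * np.sin(phase) * (t < 0.05)

# Duplex (dichotic) presentation: base to one ear, transition to the other.
stereo = np.stack([base, transition], axis=1)   # shape: (samples, 2 ears)
print(stereo.shape)
\end{verbatim}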
The duplex case is unusual. We can’t normally hear the chirps we make in speaking because speech processing inhibits this level of auditory processing. But plainly speech is subject to some auditory processing for we can hear extra-linguistic qualities of speech; some of these provide cues to emotional state, gender and class. Perception of these extra-linguistic qualities enables us to distinguish stimuli within a category. As already mentioned, this is a problem for Repp’s operational definition. Our ability to discriminate stimuli is the product of both categorical speech processing and non-categorical auditory processing. If we want to get at the essence of categorical perception it seems there is no alternative but to appeal to particular perceptual processes rather than behaviours.
Source: \citep{Liberman:1981xk}
Here are 12 speech-like sounds. Acoustically, each differs from its neighbours no more than any other does: the stimuli form a continuum of equal acoustic steps.
They would be labelled differently
And within a label they are relatively hard to discriminate, whereas ...
Discriminating acoustically no less similar stimuli that are given different labels is easier (faster and more accurate).
This is categorical perception: speed and accuracy of discrimination map onto labelling ...
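A toy way to see why label-based discrimination peaks at the category boundary: if listeners could compare stimuli only via their labels, two stimuli would be discriminable just when they happen to be labelled differently. The identification probabilities below are invented (a sigmoid over a hypothetical 12-step continuum); the classical Haskins studies predicted discrimination from identification data in a similar spirit.

\begin{verbatim}
import numpy as np

# Hypothetical identification function: probability of labelling each of
# 12 continuum steps as [ra], a sigmoid centred on the category boundary.
steps = np.arange(1, 13)
boundary, width = 6.5, 1.5
p_ra = 1.0 / (1.0 + np.exp(-(steps - boundary) / width))

# If only labels are available, two independently labelled neighbours are
# discriminable when their labels differ:
#   P(different) = p1*(1 - p2) + (1 - p1)*p2
for i in range(len(steps) - 1):
    p1, p2 = p_ra[i], p_ra[i + 1]
    d = p1 * (1 - p2) + (1 - p1) * p2
    print(f"pair {steps[i]:2d}/{steps[i+1]:2d}: P(labels differ) = {d:.2f}")
\end{verbatim}

Predicted discriminability is greatest for the pair straddling the boundary and near chance within a category, matching the labelling pattern just described.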
Categorical perception of mating calls and perhaps other acoustic signals is widespread in non-human animals including monkeys, mice and chinchillas (Ehret 1987; Kuhl 1987), and is even found in cognitively unsophisticated animals such as frogs (Baugh, Akre and Ryan 2008) and crickets (Wyttenbach, May and Hoy 1996).

What are the objects of categorical perception?

The location of the category boundaries changes depending on contextual factors such as the speaker’s dialect, or the rate at which the speaker talks; both factors dramatically affect which sounds are produced.
This means that in two different contexts, different stimuli may result in the same perceptions, and the same stimulus may result in different perceptions.
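A minimal sketch of that point, with invented numbers: suppose the /b/-/p/ boundary along a voice-onset-time (VOT) continuum sits at a longer VOT in slow speech than in fast speech (rate-dependent boundary shifts of this kind are well documented, though these values are made up). Then one acoustic stimulus is perceived differently in the two contexts, and different stimuli can be perceived alike.

\begin{verbatim}
def label(vot_ms, boundary_ms):
    """Toy categorical labelling along a VOT continuum."""
    return "/b/" if vot_ms < boundary_ms else "/p/"

# Hypothetical rate-dependent boundaries (ms of voice onset time):
boundaries = {"slow speech": 30.0, "fast speech": 22.0}

stimulus = 25.0   # one and the same acoustic stimulus
for context, b in boundaries.items():
    print(f"{context}: VOT {stimulus} ms heard as {label(stimulus, b)}")

# Conversely, different stimuli can yield the same percept:
print("fast, 20 ms:", label(20.0, boundaries["fast speech"]))  # /b/
print("slow, 28 ms:", label(28.0, boundaries["slow speech"]))  # /b/
\end{verbatim}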
Co-articulation: the fact that phonic gestures overlap (this is what makes it possible to talk fast).

What are the objects of categorical perception?

1. Speech perception is categorical

2. The category boundaries correspond (imperfectly but robustly) to differences in articulatory gestures

3. The best explanation of (2) involves the hypothesis that the objects of speech perception are articulatory gestures

\emph{Articulatory Gesture:} In speaking we produce an overlapping sequence of articulatory gestures, which are motor actions involving coordinated movements of the lips, tongue, velum and larynx. These gestures are the units in terms of which we plan utterances (Browman and Goldstein 1992; Goldstein, Fowler, et al. 2003).