Cued Speech is a system developed by Dr. Cornett in 1967 to complement speechreading. It is based on the association of speech articulation with cues formed by the hand. For each consonant-vowel (CV) sequence pronounced, the cuer uses one hand to point to a specific position (indicating a subset of vowels) with a specific handshape (indicating a subset of consonants).
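The coding principle above (one manual cue per CV syllable, made of a handshape and a position) can be sketched as a small lookup. The tables below are hypothetical and deliberately incomplete; they are not the official French Cued Speech charts, and the sketch ignores syllables that are not CV. The function names are illustrative only.

```python
from dataclasses import dataclass

# HYPOTHETICAL, deliberately incomplete tables for illustration only; they are
# NOT the official French Cued Speech charts. A handshape number stands for a
# subset of consonants and a position name stands for a subset of vowels.
HANDSHAPE_OF_CONSONANT = {"p": 1, "d": 1, "k": 2, "v": 2, "t": 5, "m": 5}
POSITION_OF_VOWEL = {"a": "side", "o": "side", "i": "mouth", "u": "chin"}


@dataclass(frozen=True)
class Cue:
    handshape: int  # finger configuration shown by the hand
    position: str   # place where the hand points near the face


def cue_for_cv(consonant: str, vowel: str) -> Cue:
    """Return the single manual cue that codes one consonant-vowel syllable."""
    return Cue(HANDSHAPE_OF_CONSONANT[consonant], POSITION_OF_VOWEL[vowel])


def cues_for_syllables(syllables):
    """Map a sequence of (consonant, vowel) pairs to one cue per syllable."""
    return [cue_for_cv(c, v) for c, v in syllables]


def syllables_consistent_with(cue: Cue):
    """All (consonant, vowel) pairs sharing this cue: the hand alone is ambiguous."""
    return {
        (c, v)
        for c, shape in HANDSHAPE_OF_CONSONANT.items() if shape == cue.handshape
        for v, pos in POSITION_OF_VOWEL.items() if pos == cue.position
    }


if __name__ == "__main__":
    print(cues_for_syllables([("p", "a"), ("t", "i")]))
    # /pa/, /po/, /da/, /do/ share one cue; the lips must tell them apart.
    print(syllables_consistent_with(Cue(1, "side")))
```

The last function shows why Cued Speech complements rather than replaces speechreading: a cue only narrows the syllable down to a small set, and in the real system the items grouped under one handshape or one position are chosen to look different on the lips.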
We used a Vicon motion capture system (12 cameras, 120 Hz, 1 megapixel) to record dynamic corpora (with tens of markers) pronounced and cued by our speaker. We also recorded static corpora (with hundreds of markers) to complement the dynamic data.
We processed and analyzed the corpora, in particular a corpus of 238 sentences selected to contain all French diphones. The main observation was that the hand "key" is produced in advance of the corresponding acoustic unit.
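One way to quantify this advance is sketched below. It rests on assumptions the text does not state: the "key" instant is taken as the speed minimum of a single hand marker inside a search window, and acoustic syllable onsets come from a separate segmentation of the audio. This is not the authors' analysis method, and the function names are hypothetical.

```python
import numpy as np

FRAME_RATE_HZ = 120.0  # Vicon capture rate given in the text


def hand_target_times(hand_xyz, windows):
    """For each (start_s, end_s) search window, return the time of minimum hand speed.

    hand_xyz: (n_frames, 3) trajectory of one hand marker sampled at FRAME_RATE_HZ.
    ASSUMPTION: the hand reaches its cue target where its speed is lowest.
    """
    velocity = np.gradient(hand_xyz, 1.0 / FRAME_RATE_HZ, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    times = []
    for start_s, end_s in windows:
        lo = int(round(start_s * FRAME_RATE_HZ))
        hi = int(round(end_s * FRAME_RATE_HZ))
        # Index of the slowest frame inside the window, converted back to seconds.
        times.append((lo + int(np.argmin(speed[lo:hi]))) / FRAME_RATE_HZ)
    return np.array(times)


def hand_advance(hand_times_s, acoustic_onsets_s):
    """Per-syllable lead of the hand over the sound; positive means the hand is early."""
    return np.asarray(acoustic_onsets_s) - np.asarray(hand_times_s)


if __name__ == "__main__":
    # Toy trajectory (not measured data): the hand decelerates to a stop at t = 0.5 s.
    t = np.arange(120) / FRAME_RATE_HZ
    toy = np.stack([(t - 0.5) ** 2, np.zeros_like(t), np.zeros_like(t)], axis=1)
    key_times = hand_target_times(toy, [(0.3, 0.7)])
    print(key_times, hand_advance(key_times, [0.6]))
```

Averaging `hand_advance` over many syllables gives a mean anticipation; the toy values above only exercise the code and say nothing about the recorded corpus.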
Given this production synchronisation, we implemented the Text-to-Cued Speech system in two steps: a first concatenative synthesis system delivers the sound and facial movements, and a second concatenative synthesis system delivers the head movements, hand movements, and hand shapes.
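A minimal sketch of how the second step could reuse the timing produced by the first is shown below. It assumes the first pass exports acoustic syllable onsets and that the hand anticipates each onset by a constant lead; the class names and the `lead_s` parameter are hypothetical, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    label: str             # e.g. "pa"
    acoustic_onset: float  # seconds, taken from the first (audio + face) synthesis pass


@dataclass
class HandTarget:
    label: str
    time: float  # seconds at which the hand should reach its cue target


def schedule_hand_targets(syllables: List[Syllable], lead_s: float) -> List[HandTarget]:
    """Second-pass timing: reach each hand target lead_s seconds before its sound.

    lead_s is a HYPOTHETICAL constant anticipation; target times are clamped
    at t = 0, the start of the synthesized timeline.
    """
    if lead_s < 0:
        raise ValueError("lead_s must be non-negative: the hand anticipates the sound")
    return [HandTarget(s.label, max(0.0, s.acoustic_onset - lead_s)) for s in syllables]


if __name__ == "__main__":
    first_pass = [Syllable("pa", 0.20), Syllable("ti", 0.45)]
    for target in schedule_hand_targets(first_pass, lead_s=0.10):
        print(target)
```

A constant lead is the simplest choice and keeps the hand targets in the same order as the syllables; a more faithful model could let the lead vary with the syllable or the hand transition, which the text does not specify.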
We can synthesize any French text input and render it on a low-definition shape model (the original one, computed from the motion capture data), a high-definition model, or a textured model.
To evaluate our talking and cueing head, we designed two separate tests: the first evaluating segmental intelligibility and the second evaluating long-term comprehension. It appears that our avatar is able to transmit significant linguistic information with minimal cognitive effort at the segmental level. Nevertheless, the suprasegmental organization of the synthetic French Cued Speech should be improved to ease discourse segmentation and emphasis in the long-term comprehension task.
Presentation of the virtual CS talker by herself
Superimposition of the avatar on a broadcast TV program (Karambolage, ARTE TV)
This work was supported by the ANR ARTUS project, which aimed to replace subtitles with a talking head able to produce French Cued Speech.