Cued Speech Synthesis

What is Cued Speech?

Cued Speech is a system developed by Dr. Cornett in 1967 and designed to complement speechreading. It associates speech articulation with cues formed by the hand. For each consonant-vowel (CV) sequence pronounced, the cuer uses one hand to point to a specific position near the face (indicating a subset of vowels) while forming a specific handshape (indicating a subset of consonants).
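To make the CV-to-cue decomposition concrete, here is a minimal Python sketch. It maps each CV syllable to one (handshape, position) pair. The tables are hypothetical placeholders chosen for illustration and are NOT the official French Cued Speech chart. The names `HANDSHAPE_OF_CONSONANT`, `POSITION_OF_VOWEL`, `cue_for_cv` and `cue_sequence` are likewise invented for this example.

```python
# Hypothetical cue tables: groupings are placeholders for illustration only,
# NOT the official French Cued Speech chart.
HANDSHAPE_OF_CONSONANT = {
    "p": 1, "d": 1,   # consonants sharing a handshape
    "k": 2, "v": 2,
    "t": 5, "m": 5,
}
POSITION_OF_VOWEL = {
    "a": "side", "o": "side",  # vowels sharing a position
    "i": "mouth",
    "u": "chin",
}


def cue_for_cv(consonant: str, vowel: str) -> tuple[int, str]:
    """Return the (handshape, position) pair cued for one CV syllable."""
    return HANDSHAPE_OF_CONSONANT[consonant], POSITION_OF_VOWEL[vowel]


def cue_sequence(syllables: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Map a sequence of CV syllables to one hand cue per syllable."""
    return [cue_for_cv(c, v) for c, v in syllables]


print(cue_sequence([("p", "a"), ("t", "i")]))  # [(1, 'side'), (5, 'mouth')]
```

Because each handshape and each position stands for a whole subset of phonemes, a cue alone is ambiguous. In the sketch, "pa" and "da" receive the same cue. The lips are expected to resolve such ambiguities, which is why the system complements speechreading rather than replacing it.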

Implementation

Data recordings

We used a Vicon motion capture system (12 cameras, 120 Hz, 1 megapixel) to record dynamic corpora (with tens of markers) pronounced and cued by our speaker. We also recorded static corpora (with hundreds of markers) to complement the dynamic data.

Data Analysis

We processed and analyzed the corpora, in particular a corpus of 238 sentences selected to cover all French diphones. The main observation was that the hand "key" precedes the corresponding acoustic unit.
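One way to quantify such a lead is sketched below. This is an assumption, not necessarily the analysis used here. The sketch treats a hand target as reached where the hand marker speed is locally minimal. It then averages the delay between each target and its paired acoustic onset. The pairing of targets with acoustic units (for example from a phonetic segmentation) is assumed to have been done already. The function names are hypothetical. The 120 Hz frame rate is the one quoted for the recordings.

```python
import numpy as np

FRAME_RATE_HZ = 120.0  # motion capture sampling rate quoted above


def hand_target_times(positions: np.ndarray, fs: float = FRAME_RATE_HZ) -> np.ndarray:
    """Times (s) of local speed minima of one hand marker trajectory.

    positions: array of shape (n_frames, 3), one 3-D position per frame.
    Assumption: a cue target is reached where hand speed is locally minimal.
    """
    # speed[j] is the speed between frame j and frame j + 1
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fs
    # interior local minima of the speed profile (plateaus keep their first frame)
    is_min = (speed[1:-1] < speed[:-2]) & (speed[1:-1] <= speed[2:])
    frames = np.nonzero(is_min)[0] + 1
    return frames / fs


def mean_hand_advance(target_times, acoustic_onsets) -> float:
    """Mean lead (s) of paired hand targets over acoustic onsets.

    Positive values mean the hand reaches its target before the sound starts.
    """
    targets = np.asarray(target_times, dtype=float)
    onsets = np.asarray(acoustic_onsets, dtype=float)
    return float(np.mean(onsets - targets))
```

Speed minima are only a simple heuristic for locating targets. With real marker data, the trajectory would usually be low-pass filtered first, so that measurement noise does not create spurious minima.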

Synthesis

Given this production synchronization, we decided to implement the Text-to-Cued Speech system in two steps: a first concatenative synthesis system delivers the sound and facial movements, and a second concatenative synthesis system delivers the head movements, hand movements and handshapes.
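The sketch below shows one hypothetical way the second step could consume the timing produced by the first. It does not cover the unit selection and concatenation themselves. The first stage is assumed to output each CV syllable with its acoustic onset time. The second stage then places one hand key per syllable, `lead_s` seconds earlier. `Syllable`, `HandKey`, `schedule_hand_keys` and `lead_s` are illustrative names, not part of the described system. The lead value is a free parameter that would have to come from the data analysis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Syllable:
    consonant: str
    vowel: str
    onset_s: float  # acoustic onset from the audio + face synthesis stage


@dataclass
class HandKey:
    handshape: int
    position: str
    target_s: float  # time at which the hand should reach this cue


def schedule_hand_keys(
    syllables: list[Syllable],
    cue_for_cv: Callable[[str, str], tuple[int, str]],
    lead_s: float,
) -> list[HandKey]:
    """Second stage (sketch): one hand key per CV syllable, placed lead_s early.

    Targets are clipped at 0 so none falls before the start of the utterance.
    """
    keys = []
    for syl in syllables:
        shape, pos = cue_for_cv(syl.consonant, syl.vowel)
        keys.append(HandKey(shape, pos, max(0.0, syl.onset_s - lead_s)))
    return keys
```

Passing the cue lookup in as a function keeps the scheduler independent of any particular cue chart. Driving the hand stage from the acoustic timings of the first stage is what lets the hand lead be imposed consistently.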

Figure: audiovisual text-to-Cued Speech synthesis scheme

We can synthesize any French text input and render it with a low-definition shape model (the original one, computed from the motion capture data), a high-definition model or a textured model.

Figures: low-definition synthesis, high-definition synthesis, textured synthesis

Evaluation

To evaluate our talking and cueing head, we proposed two separate tests: the first evaluating segmental intelligibility and the second evaluating long-term comprehension. It appears that our avatar is able to transmit significant linguistic information with minimal cognitive effort at the segmental level. Nevertheless, the suprasegmental organization of the synthetic French Cued Speech should be improved to ease discourse segmentation and emphasis in the long-term comprehension task.
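The test protocol is not detailed here. As a hedged illustration only, a segmental intelligibility test is commonly scored as the percentage of stimuli correctly identified in each presentation condition. The condition labels and the `identification_rates` helper below are hypothetical.

```python
from collections import defaultdict


def identification_rates(trials) -> dict[str, float]:
    """Percent of correctly identified stimuli per presentation condition.

    trials: iterable of (condition, stimulus, response) triples.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for condition, stimulus, response in trials:
        total[condition] += 1
        correct[condition] += int(stimulus == response)
    return {c: 100.0 * correct[c] / total[c] for c in total}


trials = [
    ("lips only", "pa", "ba"),
    ("lips only", "pa", "pa"),
    ("lips + cues", "pa", "pa"),
    ("lips + cues", "ta", "ta"),
]
print(identification_rates(trials))  # {'lips only': 50.0, 'lips + cues': 100.0}
```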

Some examples

The virtual CS talker presenting herself

Superimposition of the avatar on a broadcast TV program (Karambolage, ARTE TV)

Selected publications

Analysis and synthesis of the three-dimensional movements of the head, face and hand of a speaker using Cued Speech

Gibert, G., Bailly, G., Beautemps, D., Elisei, F. & Brun, R.

Journal of the Acoustical Society of America

ARTUS: synthesis and audiovisual watermarking of the movements of a virtual agent interpreting subtitling using Cued Speech for deaf televiewers

Bailly, G., Attina, V., Baras, C., Bas, P., Baudry, S., Beautemps, D., Brun, R., Chassery, J.-M., Davoine, F., Elisei, F., Gibert, G., Girin, L., Grison, D., Léoni, J.-P., Liénard, J., Moreau, N. & Nguyen, P.

AMSE - Advances in Modelling

This work was supported by the ANR ARTUS project, which aimed to replace subtitles with a talking head able to produce French Cued Speech.

Guillaume Gibert