Cross-language synthesis with MBROLA

In my MBROLA TTS application, I'd like to supply a variety of English-speaking "voices".

MBROLA currently offers the English voices en1, us1, us2 and us3. I use all of them, but I'd like more!

For my application, it's fine (indeed, an advantage) to output English with a "foreign" accent (in this context, one that is neither British nor American).

As an experiment, I wrote some quick-and-dirty PERL scripts which take an en1 .pho file as input, and produce .pho files which can be played using the br1, de1, de2, fr1, fr2, fr3, fr4, nl2 and sw1 MBROLA voices.

Essentially, the scripts simply remap the source phonemes, and check/replace illegal phoneme pairs.
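As a rough illustration of the remap-then-check idea, here is a minimal sketch in Python (not the Perl scripts offered on this page). It assumes the usual .pho layout of one phoneme per line: a symbol, a duration in ms, optional pitch pairs, and ";" for comment lines. The mapping table, the fallback table and the set of available phoneme pairs are all hypothetical placeholders; a real table would have to be derived from the target voice's diphone database.

```python
# Hypothetical en1 -> target-voice phoneme map (SAMPA symbols).
# These entries are illustrative only, not the author's tables.
PHONEME_MAP = {"T": "s", "D": "z", "{": "E"}

# Hypothetical substitutes to try when a pair is missing in the target voice.
FALLBACKS = {"R": ["r", "l"]}


def parse_pho(text):
    """Parse .pho text into (phoneme, duration, pitch_fields) tuples.

    Blank lines and ';' comment lines are skipped. Every remaining
    line is assumed to carry at least a phoneme and a duration.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        entries.append((fields[0], fields[1], fields[2:]))
    return entries


def remap(entries, phoneme_map):
    """Replace each source phoneme by its target equivalent, if mapped."""
    return [(phoneme_map.get(p, p), d, rest) for p, d, rest in entries]


def fix_pairs(entries, diphones, fallbacks):
    """Swap the second phoneme of any pair the target voice lacks.

    Works left to right, so a substituted phoneme becomes the left
    half of the next pair. If no fallback fits, the pair is left as is.
    """
    fixed = list(entries)
    for i in range(1, len(fixed)):
        left = fixed[i - 1][0]
        phoneme, duration, rest = fixed[i]
        if (left, phoneme) in diphones:
            continue
        for alt in fallbacks.get(phoneme, []):
            if (left, alt) in diphones:
                fixed[i] = (alt, duration, rest)
                break
    return fixed


def format_pho(entries):
    """Write the entries back out in .pho form, one phoneme per line."""
    lines = [" ".join([p, d, *rest]) for p, d, rest in entries]
    return "\n".join(lines) + "\n"
```

A conversion would then chain the steps, for example format_pho(fix_pairs(remap(parse_pho(text), PHONEME_MAP), diphones, FALLBACKS)), where diphones is the set of (left, right) pairs known to exist in the target voice.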

Although the code is very primitive, the results were promising. To my ear, the output sounds like an authentic Brazilian, Dutch, French, German or Swedish native speaker using English.

It would be perfectly possible to design a far better and far more general system, one that maps any source voice A to any destination voice B, instead of my rather lame, manually derived, ad-hoc implementation.

Below are downloadable PERL scripts, MBROLA .pho files and audio samples encoded with RealPlayer.

Items marked with an icon are "singing" files, created with another program. Click the icon to play the MIDI, or right-click to save.

I'd be interested to correspond with anyone who may be working along parallel lines in this area.

Mike Hamilton

Update 22 June 2000:

Here are some results from some recent work:

Please note that these files were NOT generated with the PERL scripts below.

I took this text:

Melbourne ATIS

Melbourne terminal information PAPA, issued at one three zero zero ZULU. Landings and departures runway three four. Wind three six zero at one two. Ceiling and visibility OK. Temperature one zero, dew point six. QNH one zero zero five. Advise controller on initial contact that you have information PAPA.

... and created a British English .pho file which I rendered into an audio file with Mbrola:

Next, I processed the British .pho file to produce .pho files for
br1, br2, br3, de1, de2, de3, fr1, fr2, fr3, fr4, fr5, nl2, nl3 and sw1,
and rendered them all into audio with Mbrola.
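A batch rendering step like this one could be scripted roughly as follows. This is only a sketch in Python: it relies on the standard MBROLA command form (mbrola, then the voice database, the .pho file and the output file), while the directory layout, the file names and the choice of .wav output are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Target voices listed in the text above.
VOICES = ["br1", "br2", "br3", "de1", "de2", "de3",
          "fr1", "fr2", "fr3", "fr4", "fr5", "nl2", "nl3", "sw1"]


def mbrola_command(voice, voice_root, pho_dir, out_dir):
    """Build the argument list for rendering one voice.

    Assumed (hypothetical) layout: the database for voice 'de1' is at
    <voice_root>/de1/de1 and its converted input is <pho_dir>/de1.pho.
    """
    database = Path(voice_root) / voice / voice
    pho_file = Path(pho_dir) / f"{voice}.pho"
    out_file = Path(out_dir) / f"{voice}.wav"
    return ["mbrola", str(database), str(pho_file), str(out_file)]


def render_all(voices, voice_root, pho_dir, out_dir, dry_run=True):
    """Return the commands for all voices; run them unless dry_run is set."""
    commands = [mbrola_command(v, voice_root, pho_dir, out_dir) for v in voices]
    if not dry_run:
        for command in commands:
            # Requires the mbrola binary on PATH and the voices installed.
            subprocess.run(command, check=True)
    return commands
```

Calling render_all(VOICES, "voices", "pho", "audio") returns the command lists without running anything; passing dry_run=False would invoke MBROLA for each voice in turn.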

(Note: the .mp3 audio files are all approx. 90k.)

Note that the br2 and br3 examples exhibit some severe "burbling" effects caused by the lack of suitable phonemes in the target voice database.

Here are the PERL scripts (right-click to save):


(music by Rolf Soja, lyrics by Frank Dostal)

Yes Sir, I can boogie
But I need a certain song;
I can boogie,
Boogie woogie
All night long.
Yes Sir,
I can boogie
If you stay, you can't go wrong.
I can boogie, Boogie woogie
All night long.


(music by Richard Rodgers, lyrics by Oscar Hammerstein)

Let's start at the very beginning,
A very good place to start
When you read you begin with A, B, C,
When you sing you begin with do-re-mi
The first three notes just happen to be
Doe, a deer, a female deer
Ray, a drop of golden sun
Me, a name I call myself
Far, a long, long way to run
Sew, a needle pulling thread
La, a note to follow sew
Tea, a drink with jam and bread,
That will bring us back to Do, oh oh oh


It would be a considerable invention indeed, that of a machine able to mimic our speech, with its sounds and articulations.
I think it's not impossible.


Mbrola was developed by Thierry Dutoit.
It's a speech synthesizer based on the concatenation of diphones.
It takes a list of phonemes as input, together with prosodic information, and produces speech at the sampling frequency of the diphone database.


I'm Popeye the sailorman,
I live in a caravan,
I eat all my spinach
and that's how I finish
I'm Popeye the sailorman.


To be, or not to be: that's the question.