This is a prototype of Estonian neural speech synthesis developed by the NLP research group at the University of Tartu. The model, trained on Estonian news and literature corpora, can currently imitate the speech of ten different speakers. We are also working on text-to-speech synthesis for other Finno-Ugric languages and have two voices available for synthesizing Võro. Although the project is not finished and a lot of work remains, the neural speech synthesis model already sounds much more natural than previous approaches.
One of the biggest advantages of our speech synthesis model is its natural intonation. Although the audio can sometimes contain artefacts, it sounds far less robotic than the output of earlier speech synthesis approaches.
Kollastes kummikutes kummitus kummitas kõrges kummutis.
Oled sa kunagi varem midagi nii ilusat kuulnud?
A tricky part for any speech synthesis system is pronouncing numbers, symbols and abbreviations correctly. This is especially relevant for Estonian, where the correct form of a word often depends on the context and sentence structure. We have put a lot of effort into making sure that our system pronounces these elements correctly even though they are not written out word for word.
Sooja on 2–5 kraadi ning puhub kagutuul 7 m/s.
Tallinna-Tartu mnt 49. km-l on toimunud ränk liiklusõnnetus.
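To make the normalization problem concrete, here is a minimal sketch of expanding numbers and abbreviations into words before synthesis. It is an illustration only and not the method used by our system: the lookup tables, the function names `normalize` and `spell_number`, and the rule of leaving every number followed by a period untouched are all assumptions made for this example. It produces only base (nominative) number words, so it handles the weather sentence above but deliberately does not attempt the case-inflected ordinal that "49. km-l" requires.

```python
import re

# Toy lookup tables (assumptions for illustration); a real system needs
# much larger, inflection-aware resources.
NUMBER_WORDS = {
    0: "null", 1: "üks", 2: "kaks", 3: "kolm", 4: "neli", 5: "viis",
    6: "kuus", 7: "seitse", 8: "kaheksa", 9: "üheksa", 10: "kümme",
}
ABBREVIATIONS = {
    "m/s": "meetrit sekundis",
    "mnt": "maantee",
}


def spell_number(match):
    """Replace a digit string with its nominative word form if known."""
    value = int(match.group(0))
    return NUMBER_WORDS.get(value, match.group(0))


def normalize(text):
    # Numeric ranges such as "2–5" are read with "kuni" between the numbers.
    text = re.sub(r"(\d+)\s*[–-]\s*(\d+)", r"\1 kuni \2", text)
    # Expand abbreviations only when they stand as whole tokens.
    for abbreviation, expansion in ABBREVIATIONS.items():
        pattern = r"(?<!\w)" + re.escape(abbreviation) + r"(?!\w)"
        text = re.sub(pattern, expansion, text)
    # Spell out standalone numbers. Numbers followed by a period are left
    # untouched, because they may be ordinals that need context-dependent
    # inflection, which this sketch does not attempt.
    text = re.sub(r"(?<!\w)\d+(?![\w.])", spell_number, text)
    return text


print(normalize("Sooja on 2–5 kraadi ning puhub kagutuul 7 m/s."))
# Sooja on kaks kuni viis kraadi ning puhub kagutuul seitse meetrit sekundis.
```

Applied to the second sentence, the sketch expands "mnt" but leaves "49. km-l" as written, which shows why table lookups alone are not enough for Estonian: the spoken form depends on the grammatical role of the number in the sentence.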
In addition to this web application, our speech synthesis models can be used for free via a simple API. If you would like to collaborate on an even better system for your use case, let us know!
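As a rough idea of what calling a text-to-speech API from code might look like, the sketch below builds an HTTP POST request with a JSON body. Everything specific in it is an assumption for illustration: the URL is a placeholder, and the field names `text` and `speaker`, the speaker value, and the helper `build_tts_request` are hypothetical rather than taken from the actual API. The request is only constructed, not sent.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); replace with the real API address.
API_URL = "https://example.invalid/text-to-speech"


def build_tts_request(text, speaker):
    """Build a POST request carrying the text and the chosen voice.

    The JSON field names "text" and "speaker" are hypothetical.
    """
    payload = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tts_request("Oled sa kunagi varem midagi nii ilusat kuulnud?", "speaker-1")
print(request.get_method(), request.full_url)
```

With a real endpoint, passing the request to `urllib.request.urlopen` and writing the response bytes to a file would be the usual next step, assuming the service returns audio data directly.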