Q & A
What is Burcas?
Burcas is a modest attempt at concatenative singing synthesis for Swedish. The system employs the MBROLA speech generator and a prerecorded diphone data base.
Burcas is described in some detail (from a phonetic rather than an implementational point of view) in Uneson 2003 and Uneson 2002; see References.
What are the input and the output of Burcas?
Burcas takes as input a text file containing the lyrics and a MIDI file (possibly containing multiple voices) that determines timing and pitch. The output of Burcas is in .pho format, that is, control parameters (phoneme, duration, frequency) for the MBROLA speech generator. The MBROLA generator, in turn, outputs audio files in standard format, one for each part (probably to be mixed, edited and panned in some appropriate application).
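To make the .pho control parameters concrete, here is a minimal Python sketch that turns one sung syllable into .pho lines. Each line holds a phoneme, its duration in milliseconds, and optional pairs of (position in percent of the phoneme, pitch in Hz). Pitch is derived from the MIDI note number by equal temperament (A4 = note 69 = 440 Hz). The duration rule (each consonant gets a fixed length, the vowel absorbs the rest of the note), the function names and the SAMPA-style symbols are hypothetical illustrations, not the method Burcas actually uses.

```python
# Minimal sketch: turn one sung syllable into MBROLA .pho lines.
# ASSUMPTIONS (not taken from Burcas): consonants get a fixed, hypothetical
# duration and the vowel fills the rest of the note; symbols are SAMPA-style.


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def syllable_to_pho(phonemes, vowel_index, note, note_ms, consonant_ms=60):
    """Return .pho lines of the form 'phoneme duration_ms percent hz ...'.

    phonemes     -- phoneme symbols of the syllable, e.g. ["s", "O", "N"]
    vowel_index  -- index of the vowel that carries the sustained tone
    note         -- MIDI note number taken from the score
    note_ms      -- note duration in milliseconds (from the MIDI timing)
    consonant_ms -- hypothetical fixed duration for every consonant
    """
    hz = round(midi_to_hz(note))
    n_consonants = len(phonemes) - 1
    # The vowel gets whatever the consonants leave over, but never less
    # than one consonant length (very short notes may then run over).
    vowel_ms = max(note_ms - n_consonants * consonant_ms, consonant_ms)
    lines = []
    for i, ph in enumerate(phonemes):
        dur = vowel_ms if i == vowel_index else consonant_ms
        # Flat pitch: the same target at 0 % and 100 % of the phoneme.
        lines.append(f"{ph} {dur} 0 {hz} 100 {hz}")
    return lines


if __name__ == "__main__":
    # Swedish "sång" (SAMPA /sON/) sung on A4 for half a second.
    print("\n".join(syllable_to_pho(["s", "O", "N"], 1, 69, 500)))
```

The resulting lines could be written to a file and rendered with `mbrola <database> song.pho song.wav`, where `<database>` is a placeholder for whichever diphone data base is installed.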
The MIDI parser makes use of Günther Nagler's free MIDI-to-text converter (http://www2.iicm.edu/Cpub#freemidi).
What data base is used in Burcas?
Currently, the data base used ("Ofelia", female speaker, southern Swedish dialect; recorded by Adina Svensson for her master's thesis) was produced from, and primarily for, spoken language. A sung data base would certainly be preferable. However, the diphone method has certain inherent limitations, and the considerable work involved in recording such a data base might be better invested elsewhere. See the thesis for further discussion.
How does it sound?
Well, a few samples are available on this page. Generally speaking, consonant transitions sound fine, whereas extended vowels and melismas do not. As a rule, the overall impression of naturalness is greatly enhanced by the presence of other voices or instruments.
It should perhaps be stressed that Burcas is a zero-budget, one-man project; some restrictions of scope are inevitable. Most notably, timbre is *not* an issue. In fact, from Burcas' point of view, the MBROLA generator is a black box with three control parameters (phoneme, duration, frequency); anything that cannot be controlled by manipulating these cannot be controlled at all. (Kindly spare me comparisons with Lyricos and the like; see next question.)
Has this been done before?
Occasional attempts have been made at non-concatenative approaches to singing synthesis. Mention may be made of the SPASM system by Perry Cook et al., which builds on an articulatory vocal tract model with a graphical interface; the general CHANT formant waveform synthesis by Xavier Rodet et al.; and the formant synthesis methods explored by Johan Sundberg at the Royal Institute of Technology, Stockholm (e.g. the MUSSE system).
Concatenative singing synthesis is less explored (and not at all for Swedish). However, the few existing projects do show encouraging results. Michael Macon led the Lyricos project at the Georgia Institute of Technology. Lyricos is closed, but it has a descendant called Flinger, at CSLU, Oregon Graduate Institute. Furthermore, Yoram Meron has developed a trainable system for high-quality singing synthesis at the University of Tokyo, described in his PhD thesis.
All of these take a much more ambitious approach than Burcas. They employ dynamic unit selection from prerecorded corpora, following the leading trend in speech synthesis during the last few years. They also provide sophisticated DSP algorithms for the interpretation of MIDI parameters like vibrato and velocity.
What would you use singing synthesis for?
Synthesis of singing may be of use to composers and arrangers of vocal music, with synthetic voices standing in for real singers at the draft stage. It may also serve as a musicological and phonetic research tool for studying temporal aspects of articulation in singing, for instance to produce stimuli for perception tests.
A far more ambitious aim is 'high-quality' synthesis of singing, where the synthesized sound is not considered a working tool but rather a piece of art in its own right. Systems such as Flinger and that of Meron suggest that this is not entirely out of reach. Burcas does not achieve this, and does not claim to.
Can I have the source code?
No, at least not in its present alpha-ish state (which, unfortunately, may last indefinitely). Burcas is coursework in phonetics rather than, for instance, computer science. Modularity, documentation, maintainability, exception handling etc. have unfortunately received very little attention, and I currently have no time to remedy that. This may or may not change in the future. However, a possible successor would most likely be rewritten from scratch.
Is Burcas an acronym? What does it stand for?
Well, it is probably an acronym, only I can't remember for what. Have to think about it.