Malay statistical parametric speech synthesis with intelligibility improvement using artificial intelligence

Abstract
Speech synthesis is a valuable aid in many applications, so it is important to build a speech synthesizer that is simple, reliable, lightweight, and easy to use. However, conventional speech synthesizers require tedious human effort to prepare a high-quality recorded database, and the intelligibility of synthetic speech may decrease when polyphones (characters with more than one pronunciation) appear, because the synthesizer may not contain definitions for them. Moreover, the speech synthesizers available on the market are mostly built with the unit selection method, which requires a large database and relies on the knowledge of Malay linguists. In this study, the statistical parametric speech synthesis method was adopted, using lab speech and free speech data harvested online. Intelligibility was improved using Active Learning and a Feedforward Neural Network with Back-Propagation, while the amount of training data remained the same throughout the study. The results were evaluated using a perception test. The listening test showed that the intelligibility of the synthetic speech improved by about 20% to 30% with the artificial intelligence technique. Volunteers were invited to take part in the Active Learning experiment, and the results showed no conflict between the volunteers' answers and the correct answers. In conclusion, a lightweight Malay speech synthesizer was created without relying on the knowledge of Malay linguists. Using free sources as training data eases the human effort of preparing a training database, and using the artificial intelligence technique improves the intelligibility of synthetic speech with the same amount of training data.
Description
Thesis (Ph.D. (Biomedical Engineering))
Keywords
Speech synthesis, Speech processing systems