The aim of this thesis has been to develop a Bayesian formulation of the switching NSSM and a learning algorithm for its parameters. In this context, a learning algorithm means a procedure for optimising the parameters of the model to best describe the given data. The learning algorithm is based on an approximation method called ensemble learning, which provides a principled approach to global optimisation of the performance of the model. Similar switching models exist, but they use only linear state-space models (SSMs).
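In brief, ensemble learning fits a tractable approximating distribution q over the unknown variables to the true posterior by minimising a cost function of the following standard form (the notation here is generic and the thesis's exact parameterisation may differ):

```latex
C = \int q(\boldsymbol{\theta})
    \ln \frac{q(\boldsymbol{\theta})}{p(\mathbf{X}, \boldsymbol{\theta})}
    \, d\boldsymbol{\theta}
  = D_{\mathrm{KL}}\!\big( q(\boldsymbol{\theta}) \,\big\|\, p(\boldsymbol{\theta} \mid \mathbf{X}) \big)
    - \ln p(\mathbf{X})
```

Since the Kullback-Leibler divergence is non-negative, minimising C both drives q towards the true posterior and yields a lower bound on the model evidence ln p(X), which is what makes the approach principled for model optimisation.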
In practice, the switching model has been implemented by extending the existing NSSM model and learning algorithm developed by Dr. Harri Valpola [58,60]. The performance of the developed model has been verified by applying it to a data set of Finnish speech in two different experiments. In the first experiment, the switching NSSM was compared with a plain HMM, a standard NSSM without switching, and a static nonlinear factor analysis model that completely ignores the temporal structure of the data. In the second experiment, segments of speech with known annotation, i.e. the sequence of phonemes in the word, were used. The correct segmentation of the word into individual phonemes was, however, not known and had to be learnt by the model.
Even though the development of the model has been motivated here by examples from speech recognition, the purpose of this thesis has not been to develop a working speech recognition system. Such a system could probably be built by extending the work presented here.