Discussion

In this work, a Bayesian switching nonlinear state-space model (switching NSSM) and a learning algorithm for its parameters were developed. The switching model combines two dynamical models, a hidden Markov model (HMM) and a nonlinear state-space model (NSSM). The HMM models the long-term behaviour of the data and controls the NSSM, which describes the short-term dynamics of the data. To be used in practice, the switching NSSM, like any other model, needs an efficient method for learning its parameters. In this work, the Bayesian approximation method called ensemble learning was used to derive a learning algorithm for the model.

The requirements of the learning algorithm set some limitations on the structure of the model. The learning algorithm for the NSSM, which was used as the starting point for this work, was computationally intensive. The additions needed for the switching model therefore had to be designed very carefully to avoid making the model computationally intractable. Most existing switching state-space model structures use entirely different dynamical models for each state of the HMM. Such an approach would have resulted in a more powerful model, but the computational burden would have been too great to be practical. The developed switching model therefore uses the same NSSM for all the HMM states. The HMM is only used to model the prediction errors of the NSSM.
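The structural choice described above can be illustrated with a minimal generative sketch. It is a toy under stated assumptions and not the model used in the experiments: the continuous state is a scalar, the function `g` is a hypothetical stand-in for the MLP dynamics, and each HMM state is assumed to contribute only its own Gaussian distribution for the prediction error. The names `simulate_switching_nssm`, `trans`, `err_mean` and `err_std` are introduced here for illustration only.

```python
import math
import random


def simulate_switching_nssm(T, trans, err_mean, err_std, seed=0):
    """Sample T steps from a toy switching model with one shared dynamics map.

    trans    -- HMM transition matrix as a list of rows (hypothetical values)
    err_mean -- per-HMM-state mean of the prediction error (assumption)
    err_std  -- per-HMM-state standard deviation of the prediction error (assumption)
    """
    rng = random.Random(seed)

    def g(s):
        # Hypothetical stand-in for the nonlinear dynamics learnt by the MLP.
        return 0.9 * math.tanh(s)

    m, s = 0, 0.0
    states, path = [], []
    for _ in range(T):
        # HMM step: draw the next discrete state from row m of the transition matrix.
        m = rng.choices(range(len(trans)), weights=trans[m])[0]
        # The same mapping g is used in every HMM state; only the
        # distribution of the prediction error depends on m.
        s = g(s) + rng.gauss(err_mean[m], err_std[m])
        states.append(m)
        path.append(s)
    return states, path


states, path = simulate_switching_nssm(
    T=50,
    trans=[[0.95, 0.05], [0.10, 0.90]],
    err_mean=[0.0, 0.5],
    err_std=[0.05, 0.30],
)
```

A model with a separate dynamical mapping for each HMM state would replace the single `g` with one function per state, which is the more powerful but more expensive alternative mentioned above.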

The learning algorithm based on ensemble learning seems well suited for the switching NSSM. Models of this kind usually suffer from the problem that the exact posterior of the hidden states is an exponentially growing mixture of Gaussian distributions. The ensemble learning approach solves this problem elegantly by finding the best approximate posterior consisting only of a single Gaussian distribution. It would also be possible to approximate the posterior with a mixture of a few Gaussian distributions instead of the single Gaussian. This would, however, increase the computational burden considerably while achieving little gain.
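The scale of the exponential growth can be made concrete with a small calculation. In the standard switching state-space setting, an HMM with K states observed over T time steps has K^T possible discrete state sequences, and the exact posterior has one mixture component for each of them. The function name and the example values below are hypothetical and chosen only for illustration.

```python
def exact_posterior_components(num_hmm_states, num_steps):
    """Number of mixture components in the exact posterior: one per
    possible discrete state sequence, i.e. K ** T."""
    return num_hmm_states ** num_steps


# Two HMM states over ten time steps already give 1024 components.
small = exact_posterior_components(2, 10)

# Four HMM states over one hundred time steps give more than 10**60 components.
large = exact_posterior_components(4, 100)
```

The single-Gaussian approximation keeps the number of components fixed at one regardless of the length of the data.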

Despite the suboptimal model structure, the switching model performs remarkably well in the experiments. It outperforms all the other tested models in the speech modelling problem by a large margin. Additionally, the switching model yields a segmentation of the data into different discrete dynamical states. This allows the model to be used for many different segmentation tasks.

For a segmentation problem, the closest competitor of the switching NSSM is the plain HMM. The speech modelling experiment shows that the switching NSSM is significantly better at modelling the speech data than the HMM. This means that the switching model can get more out of the same data. The greatest problem of the switching model is the computational cost. It is approximately two orders of magnitude slower to train than the HMM. The difference in actually using the fully trained models should, however, be smaller. Much of the difference in training times is caused by the fact that the weights of the MLP networks take a great many iterations of the training algorithm to converge. When the system is in operation and the MLPs are not being trained, the other parts needed for recognition purposes converge much more quickly. The usage of the switching model in such a recognition system has not been thoroughly studied. It might therefore be possible to find ways of optimising the usage of the model to make its computational cost comparable with that of the HMM.

In the segmentation experiment, the switching model learnt the desired segmentation of an annotated data set of individual words of speech into different phonemes. However, the initial experiments on using the model for recognition, i.e. finding the segmentation without the annotation, are not as promising. The poor recognition performance is probably due to the fact that the HMM part of the model only uses the prediction error of the continuous NSSM. When the state of the dynamics changes, the prediction error grows. Thus it is easy for the HMM to see that something is changing but not how it is changing. The HMM would need more influence on the description of the dynamics of the data to know which state the system is moving to.

The design of the model was motivated by computational efficiency and the resulting algorithm seems successful in that sense. The learning algorithm for the switching model is only about 25 % slower than the corresponding algorithm for the NSSM. This could probably still be improved as Matlab is far from an optimal programming environment for HMM calculations. The forward and backward calculations of the MLP networks, which require evaluating the Jacobian matrix of the network at each input point, constitute the slowest part of the NSSM training. Those parts remain unchanged in the switching model and still take most of the time.
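To show why the Jacobian evaluation is costly, the following sketch computes it for a small network. It assumes an MLP with a single hidden layer of tanh units, f(x) = B tanh(A x + a) + b; this form and the names `mlp_forward` and `mlp_jacobian` are assumptions made for illustration, and the code is plain Python and not the Matlab implementation used in this work. Under that assumption the Jacobian is J = B diag(1 - h^2) A, where h is the vector of hidden unit activations.

```python
import math


def mlp_forward(x, A, a, B, b):
    """Evaluate f(x) = B tanh(A x + a) + b for list-based vectors and matrices.

    Returns the output vector and the hidden activations h.
    """
    h = [math.tanh(sum(A[j][i] * x[i] for i in range(len(x))) + a[j])
         for j in range(len(A))]
    y = [sum(B[k][j] * h[j] for j in range(len(h))) + b[k]
         for k in range(len(B))]
    return y, h


def mlp_jacobian(x, A, a, B, b):
    """Jacobian of f at x: J[k][i] = sum_j B[k][j] * (1 - h[j]**2) * A[j][i]."""
    _, h = mlp_forward(x, A, a, B, b)
    return [[sum(B[k][j] * (1.0 - h[j] ** 2) * A[j][i] for j in range(len(h)))
             for i in range(len(x))]
            for k in range(len(B))]


# Hypothetical weights: 2 inputs, 3 hidden units, 2 outputs.
A = [[0.5, -0.3], [0.2, 0.8], [-0.7, 0.1]]
a = [0.1, -0.2, 0.0]
B = [[1.0, -0.5, 0.3], [0.4, 0.9, -1.1]]
b = [0.0, 0.2]
J = mlp_jacobian([0.3, -0.6], A, a, B, b)
```

Each Jacobian costs a product of three matrices, and it has to be recomputed at every input point on every iteration, which is consistent with this step dominating the training time.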

There are at least two important lines of future work: giving the model more expressive power by making better use of the HMM part and optimising the speed of the NSSM part.

Improving the model structure to give the HMM a larger role would be important for making the model work in recognition problems. The HMM should be given control over at least some of the parameters of the dynamical mapping. It is, however, difficult to do this without increasing the computational burden of using the model. Time usually helps in this respect as computers become faster. Joining the two component models together more tightly would also make the learning algorithm even more complicated. Nevertheless, it might be possible to find a better compromise between the computational burden and the power of the model.

Another way to help with essentially the same problem would be optimising the speed of the NSSM. The present algorithm is very good at modelling the data but it is also rather slow. Using a different kind of structure for the model might help with the speed problem. This would, however, probably require a completely new architecture for the model.

All in all, the nonlinear switching state-space models form an interesting field of study. At present, the algorithms seem computationally too intensive for most practical uses, but this is likely to change in the future.