Most data analysis methods are based on developing a model that could be used to recreate the studied data set. Speech recognition systems, for example, are often built around a model that could in principle be used as a speech generator. The success of the recogniser depends heavily on how realistic the speech data produced by the generator is.
The speech generators used by most modern speech recognition systems are based on the hidden Markov model (HMM). The HMM is a discrete model. It has a finite number of different internal states that produce different kinds of output. Typically there are a couple of states for each phoneme or pair of phonemes. The whole dynamical process of producing speech is thus modelled by discrete transitions between the states corresponding to the different phonemes.
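To make the idea of an HMM as a generator concrete, the following is a minimal sketch of sampling from a toy two-state HMM. It is not a speech model: the function name `sample_hmm`, the scalar Gaussian outputs, and all numerical values are assumptions chosen only for illustration.

```python
import random


def sample_hmm(n_steps, init, trans, means, std, seed=0):
    """Draw a state sequence and outputs from a toy HMM.

    init  : initial state probabilities
    trans : trans[i][j] is the probability of moving from state i to state j
    means : mean of the (assumed Gaussian) output of each state
    std   : common output standard deviation (illustrative assumption)
    """
    rng = random.Random(seed)
    n_states = len(init)
    state = rng.choices(range(n_states), weights=init)[0]
    states, outputs = [], []
    for _ in range(n_steps):
        states.append(state)
        # The output depends only on the current discrete state.
        outputs.append(rng.gauss(means[state], std))
        # Discrete transition to the next state.
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return states, outputs


# Hypothetical two-state example; the numbers carry no special meaning.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
means = [-1.0, 1.0]
states, outputs = sample_hmm(20, init, trans, means, std=0.3)
```

In a recogniser the states would correspond to phonemes or parts of phonemes and the outputs to acoustic feature vectors, but the generative structure is the same: a discrete state sequence, with each output drawn from a distribution selected by the current state.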
The model of human speech implied by the HMM is not a very realistic one. The dynamics of the mouth and the vocal cords used to produce the speech are continuous. The discrete model is only a very crude approximation of the ``true'' model. A more realistic approach would be to model the data with a continuous model. The process of producing speech is clearly nonlinear and this should be reflected by its model. A good candidate for the task is the nonlinear state-space model (NSSM). The NSSM can be described as the continuous counterpart of the HMM. The problem with models like the NSSM is that they concentrate on modelling the short-term structure of the data. On their own, they are therefore not very well suited for speech recognition.
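For reference, a nonlinear state-space model is commonly written in the following generic form. The notation here is an assumption made for illustration and is not fixed by the text above: $\mathbf{s}(t)$ denotes the continuous hidden state, $\mathbf{x}(t)$ the observation, $\mathbf{g}$ and $\mathbf{f}$ nonlinear mappings, and $\mathbf{m}(t)$ and $\mathbf{n}(t)$ noise terms.

```latex
\begin{align}
  \mathbf{s}(t) &= \mathbf{g}(\mathbf{s}(t-1)) + \mathbf{m}(t) \\
  \mathbf{x}(t) &= \mathbf{f}(\mathbf{s}(t)) + \mathbf{n}(t)
\end{align}
```

The first equation plays the role of the HMM transition probabilities and the second that of the HMM output distributions, with a continuous state vector in place of the discrete state.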
There are speech recognition systems that try to get the best of both worlds by combining the two different kinds of models into one hybrid structure. Such systems have performed well in several difficult real-world problems but they are often rather specialised. The training algorithms for such models are usually based on heuristic measures rather than on generally accepted mathematical principles.
In this work, a hybrid model structure that combines the HMM with another dynamical model, the continuous NSSM, is studied. The resulting model is called the switching nonlinear state-space model (switching NSSM). It has the power of a continuous NSSM to model the short-term dynamics of the data. However, above the NSSM there is still the familiar HMM to divide the data into different discrete states corresponding, for example, to the different phonemes.
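The layered structure can be illustrated with a small generative sketch. How the discrete HMM state influences the continuous dynamics is not specified in this passage, so the coupling used below (each discrete state adds its own offset to the state update) is purely an assumption for illustration, as are the function name `sample_switching_nssm`, the scalar state, and the choice of `tanh` and `sin` as the nonlinearities.

```python
import math
import random


def sample_switching_nssm(n_steps, trans, offsets, seed=0):
    """Sample from a toy switching state-space model with a scalar state.

    trans   : trans[i][j] is the probability of moving from discrete
              state i to discrete state j (the HMM layer)
    offsets : per-state offset added to the continuous dynamics
              (hypothetical coupling, chosen only for illustration)
    """
    rng = random.Random(seed)
    n_modes = len(trans)
    mode = rng.randrange(n_modes)
    s = 0.0
    modes, hidden, observed = [], [], []
    for _ in range(n_steps):
        # Continuous layer: nonlinear dynamics, shifted by the active mode.
        s = math.tanh(s) + offsets[mode] + rng.gauss(0.0, 0.05)
        # Nonlinear observation of the continuous state.
        x = math.sin(s) + rng.gauss(0.0, 0.1)
        modes.append(mode)
        hidden.append(s)
        observed.append(x)
        # Discrete layer: HMM transition to the next mode.
        mode = rng.choices(range(n_modes), weights=trans[mode])[0]
    return modes, hidden, observed


# Hypothetical two-mode example; the numbers carry no special meaning.
trans = [[0.95, 0.05], [0.10, 0.90]]
offsets = [-0.5, 0.5]
modes, hidden, observed = sample_switching_nssm(50, trans, offsets)
```

The discrete sequence `modes` segments the data in the way an HMM would, while the short-term behaviour within each segment is produced by the continuous nonlinear dynamics.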