
There are several possible architectures for switching SSMs. Figure 4.3 shows some of the most basic ones [43]. The first subfigure corresponds to the case where the function and possibly the noise model in Equation (4.13) differ between the states of the switching variable. In the second subfigure, the function and the noise depend on the switching variable. The two approaches can, of course, also be combined. The third subfigure shows an interesting architecture proposed by Ghahramani and Hinton [18], in which there are several completely separate SSMs and the switching variable chooses between them. Their model is especially interesting because it uses ensemble learning to infer the model parameters.
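To make the first architecture concrete, the sketch below samples from a switching SSM in which the discrete HMM state selects the dynamics of a continuous state. It is only an illustration under stated assumptions: the model is scalar and linear-Gaussian (the function in Equation (4.13) may well be nonlinear), the initial HMM state is uniform, and the names `sample_switching_ssm`, `trans`, `a`, `q` and `r` are hypothetical.

```python
import random

def sample_switching_ssm(T, trans, a, q, r, seed=0):
    """Sample T steps of a hypothetical scalar switching SSM.

    trans[i][j]: probability of moving from HMM state i to HMM state j.
    a[m], q[m]:  dynamics coefficient and process-noise std in HMM state m.
    r:           observation-noise std (shared by all HMM states here).
    """
    rng = random.Random(seed)
    n_states = len(trans)
    m = rng.randrange(n_states)  # assumption: uniform initial HMM state
    s = rng.gauss(0.0, 1.0)      # assumption: standard normal initial continuous state
    states, cont, obs = [], [], []
    for t in range(T):
        if t > 0:
            # switching variable: draw the next HMM state from row m of trans
            m = rng.choices(range(n_states), weights=trans[m])[0]
        # the HMM state chooses which dynamics (and noise level) apply
        s = a[m] * s + rng.gauss(0.0, q[m])
        # observation model is the same for every HMM state in this sketch
        x = s + rng.gauss(0.0, r)
        states.append(m)
        cont.append(s)
        obs.append(x)
    return states, cont, obs
```

Switching observation models (the second subfigure) would instead keep the dynamics fixed and index the observation function and noise by `m`.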
One of the problems with switching SSMs is that the exact E-step of the EM algorithm is intractable, even if the individual continuous hidden states are Gaussian. Assuming the HMM has $N$ states, the posterior of the continuous state variable at the first time step will be a mixture of $N$ Gaussians, one for each HMM state. When this is propagated forward according to the dynamical model, every component splits into $N$ new ones at each time step, so the mixture grows exponentially as the number of possible HMM state sequences increases. Finally, when the full observation sequence of length $T$ is taken into account, the posterior of each continuous state variable will be a mixture of $N^T$ Gaussians.
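The exponential growth can be checked directly with an exact forward filter that never merges components. The sketch assumes the same kind of scalar linear-Gaussian model with switching dynamics (the name `exact_filter` and its parameters are hypothetical): each component is tracked with a Kalman update, and each is expanded once for every possible next HMM state. It covers only the forward (filtering) pass, not the full smoothed posterior discussed above.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def exact_filter(obs, trans, a, q, r, mu0=0.0, v0=1.0):
    """Exact filtering for a hypothetical scalar switching SSM.

    Returns one mixture per time step; each component is a tuple
    (weight, mean, var, hmm_state). No components are merged or pruned.
    """
    n = len(trans)
    comps = [(1.0, mu0, v0, None)]  # prior before the first observation
    history = []
    for x in obs:
        new = []
        for w, mean, var, prev in comps:
            for j in range(n):
                # assumption: uniform initial HMM state, transitions afterwards
                p_switch = 1.0 / n if prev is None else trans[prev][j]
                # predict through the dynamics selected by HMM state j
                m_pred = a[j] * mean
                v_pred = a[j] ** 2 * var + q[j] ** 2
                # Kalman update for the observation x = s + noise with std r
                s_var = v_pred + r ** 2
                gain = v_pred / s_var
                lik = gauss_pdf(x, m_pred, s_var)
                new.append((w * p_switch * lik,
                            m_pred + gain * (x - m_pred),
                            (1 - gain) * v_pred, j))
        total = sum(c[0] for c in new)
        comps = [(w / total, m, v, j) for w, m, v, j in new]
        history.append(comps)
    return history
```

With two HMM states and four observations, `[len(h) for h in exact_filter(...)]` is `[2, 4, 8, 16]`, i.e. $N^t$ components after $t$ observations.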
Ensemble learning is a very useful method for developing a tractable algorithm for this problem, although other, heuristic methods exist for the same purpose. These typically collapse the distribution with a greedy procedure, for example by merging the mixture components into a single Gaussian at each time step, and this may cause inaccuracies. This is not a problem with ensemble learning, since it considers the whole sequence and minimises the Kullback-Leibler divergence, which in this case has no local minima.
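As an example of the greedy collapsing mentioned above (not of ensemble learning), the sketch below merges a 1-D Gaussian mixture into a single Gaussian by moment matching, keeping only the overall mean and variance. The function name `collapse` is hypothetical, and applying such a step after every time step is just one common heuristic among several.

```python
def collapse(components):
    """Moment-match a 1-D mixture [(weight, mean, var), ...] to one Gaussian.

    Returns (mean, var) of the single Gaussian with the same first two
    moments as the mixture.
    """
    total = sum(w for w, _, _ in components)
    mean = sum(w * m for w, m, _ in components) / total
    # law of total variance: average within-component variance
    # plus the spread of the component means around the overall mean
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in components) / total
    return mean, var
```

For instance, `collapse([(0.5, -2.0, 0.1), (0.5, 2.0, 0.1)])` gives mean 0 and variance 4.1: two narrow, well-separated modes become one broad Gaussian centred where the original mixture has almost no probability mass, which illustrates the kind of inaccuracy a greedy collapse can introduce.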