The most important NSSM learning algorithm for this work is the one by Dr. Harri Valpola [58,60]. It uses MLP networks to model the nonlinearities and ensemble learning to optimise the model. It is discussed in detail in Sections 5.2 and 6.2 and is therefore skipped for now.

Even though it is not exactly a learning algorithm for complete NSSMs,
the *extended Kalman filter* (EKF) is an important building block
for many such algorithms. The EKF extends standard Kalman filtering
to nonlinear models. The nonlinear functions f and g
of Equation (4.13) must be known in advance. The
algorithm works by linearising the functions about the estimated
posterior mean. The posterior probability of the state variables is
evaluated with a forward-backward type iteration, assuming the
posterior of each sample to be Gaussian [42,28,32].
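A single EKF iteration can be sketched numerically. The nonlinearities f and g below are illustrative stand-ins (in the algorithms above they would come from Equation (4.13)), and the Jacobians are computed by finite differences rather than analytically:

```python
import numpy as np

# Toy stand-ins for the model nonlinearities of the NSSM
#   s(t) = f(s(t-1)) + process noise,  x(t) = g(s(t)) + observation noise.
def f(s):                # illustrative state dynamics
    return np.tanh(s)

def g(s):                # illustrative observation mapping
    return np.sin(s)

def jacobian(fn, s, eps=1e-6):
    """Numerical Jacobian of fn at s (central differences)."""
    cols = []
    for i in range(s.size):
        ds = np.zeros(s.size)
        ds[i] = eps
        cols.append((fn(s + ds) - fn(s - ds)) / (2 * eps))
    return np.column_stack(cols)

def ekf_step(m, P, x, Q, R):
    """One EKF iteration: linearise f and g about the estimated mean."""
    # Prediction: propagate mean and covariance through the linearised dynamics
    F = jacobian(f, m)
    m_pred = f(m)
    P_pred = F @ P @ F.T + Q
    # Update: linearise the observation mapping and apply the Kalman gain
    G = jacobian(g, m_pred)
    S = G @ P_pred @ G.T + R
    K = P_pred @ G.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (x - g(m_pred))
    P_new = (np.eye(m.size) - K @ G) @ P_pred
    return m_new, P_new
```

Iterating `ekf_step` forward over the observations gives the Gaussian filtering approximation; the backward (smoothing) pass of the forward-backward iteration mentioned above is omitted from this sketch.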

Briegel and Tresp [7] present an NSSM that uses MLP networks as the model of the nonlinearities. The learning algorithm is based on a Monte-Carlo generalised EM algorithm, i.e. an EM algorithm with stochastic estimates of the conditional posteriors at the different steps. The Monte-Carlo E-step is further optimised by generating the samples of the hidden states from an approximate Gaussian distribution instead of the true posterior. The approximate posterior is found using either the EKF or a similar alternative method.
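The idea of sampling from a Gaussian approximation instead of the intractable true posterior can be sketched with one-dimensional importance sampling. The target density, the Gaussian parameters and the sample size below are illustrative assumptions, not the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an intractable (unnormalised) log-posterior of a hidden state
def log_p(s):
    return -0.5 * (s - 1.0) ** 2 - 0.1 * s ** 4

# Assumed Gaussian approximation q(s) = N(m, v), as might be found by an EKF pass
m, v = 0.8, 0.5

# Monte-Carlo E-step: draw samples from q, correct with importance weights
s = rng.normal(m, np.sqrt(v), size=5000)
log_q = -0.5 * (s - m) ** 2 / v           # normalising constants cancel below
w = np.exp(log_p(s) - log_q)
w /= w.sum()                              # self-normalised importance weights
E_s = np.sum(w * s)                       # estimated posterior mean
```

Expectations needed in the M-step are then replaced by such weighted sample averages, which is what makes the E-step "stochastic".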

Roweis and Ghahramani [51,19] use RBF networks to model the nonlinearities. They use the standard EM algorithm with the EKF for the approximate E-step. The parameterisation of the RBF network allows an exact M-step for some of the parameters. Not all the parameters of the network can, however, be adapted this way.
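The reason some parameters admit an exact M-step is that with the basis centres and widths held fixed, the RBF network output is linear in the output weights, so those weights have a closed-form least-squares solution. The centres, width and toy data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# RBF network y = sum_j W_j * phi_j(s) with Gaussian basis functions
# phi_j(s) = exp(-(s - c_j)^2 / (2 * sigma^2)).  With c_j and sigma fixed,
# y is linear in the weights W, so W has an exact (least-squares) M-step;
# c_j and sigma enter nonlinearly and cannot be updated this way.
centres = np.linspace(-2.0, 2.0, 5)       # assumed fixed basis centres
sigma = 0.7                               # assumed fixed basis width

def phi(s):
    """Basis activations, shape (len(s), n_centres)."""
    return np.exp(-(s[:, None] - centres[None, :]) ** 2 / (2 * sigma ** 2))

# Toy state/observation pairs standing in for the E-step statistics
s = rng.uniform(-2.0, 2.0, 200)
y = np.sin(s) + 0.05 * rng.normal(size=200)

Phi = phi(s)
W, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # exact M-step for the weights
```

The remaining parameters (centres and widths) would have to be adapted by other means, which is the limitation noted above.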