Continuing the learning process with the old model but new data requires initial estimates for the new hidden states. If the new data is a direct continuation of the old, the predictions of the old states provide a reasonable initial estimate for the new ones and the algorithm can continue the adaptation from there.
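The continuation case can be sketched as rolling the learned dynamics forward from the last old state. This is only an illustration: the function name `predict_initial_states` and the callable `dynamics`, standing in for the mean of the learned state-transition mapping, are hypothetical and not taken from the text.

```python
import numpy as np

def predict_initial_states(dynamics, last_state, n_steps):
    """Roll a state-transition mapping forward to initialise new hidden states.

    dynamics   : hypothetical callable mapping one state vector to the next
    last_state : estimated mean of the final hidden state of the old data
    n_steps    : number of new time steps that need an initial estimate
    """
    states = np.empty((n_steps, last_state.shape[0]))
    s = last_state
    for t in range(n_steps):
        s = dynamics(s)      # one-step prediction from the previous estimate
        states[t] = s
    return states
```

The returned array only provides starting values; the learning algorithm is then expected to continue adapting them together with the rest of the model.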
If the new data forms an entirely separate sequence, the problem is more difficult. Knowing the model, we can still do much better than starting at random or using the same initialisation as in the very beginning.
One way to find the estimates is to use an auxiliary MLP network to model the inverse of the observation mapping. This MLP can be trained using standard supervised back-propagation, with the estimated means of the hidden states and the corresponding observations as the training set. Their roles are of course inverted, so that the observations are the inputs and the hidden states the outputs. The auxiliary MLP cannot give perfect estimates for the states, but its estimates can usually be adapted very quickly by using the standard learning algorithm to update only the hidden states.
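A minimal sketch of such an auxiliary network is given here, assuming a single tanh hidden layer trained by plain batch gradient descent on the squared error. The function names, the layer size and the learning rate are illustrative assumptions, not values from the text.

```python
import numpy as np

def train_inverse_mlp(x_obs, s_mean, n_hidden=10, lr=0.05, n_epochs=3000, seed=0):
    """Fit an MLP from observations (inputs) to estimated state means (targets).

    x_obs  : array (n_samples, n_obs) of old observations
    s_mean : array (n_samples, n_states) of estimated means of the old states
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, n_in = x_obs.shape
    n_out = s_mean.shape[1]
    W1 = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    for _ in range(n_epochs):
        # Forward pass
        h = np.tanh(x_obs @ W1 + b1)
        err = h @ W2 + b2 - s_mean
        # Backward pass: gradients of the mean squared error
        grad_W2 = h.T @ err / n
        grad_b2 = err.mean(axis=0)
        delta_h = (err @ W2.T) * (1.0 - h ** 2)
        grad_W1 = x_obs.T @ delta_h / n
        grad_b1 = delta_h.mean(axis=0)
        # Gradient descent step
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2

def initial_state_estimates(params, x_new):
    """Map new observations to initial estimates of their hidden states."""
    W1, b1, W2, b2 = params
    return np.tanh(x_new @ W1 + b1) @ W2 + b2
```

The output of `initial_state_estimates` would serve only as a starting point. The states would then be refined with the standard learning algorithm while the model parameters are kept fixed.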