Non-negative Matrix Factorization with sparseness constraints

Patrik Hoyer

In this project, the goal was to incorporate explicit control of representation sparseness into the framework of NMF. In a sense, we are combining the ideas of ICA and NMF.

Contents

Non-negative Matrix Factorization Adding a sparseness term to the NMF objective NMF with sparseness constraints

Non-negative Matrix Factorization

Non-negative Matrix Factorization (NMF) is a technique which aims to explain a large dataset as a combination of a relatively small number of factors. In spirit, it is very similar to Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), but the crucial difference is the use of non-negativity constraints: The decomposition is purely additive; no cancellations between components are allowed.

NMF was first developed by Paatero (see Paatero and Tapper, 1994) under the name of Positive Matrix Factorization (PMF). Lee and Seung (1999, 2001) brought the technique to the attention of the machine learning community.

To illustrate NMF, consider the following situation. We have data in the form of customer receipts from a particular store. The data is arranged into a matrix specifying how much each customer bought of each product, as shown below.

Suppose we wish to find a decomposition which describes the data using customer 'profiles'. For instance, some customers might buy sugar, flour, and yeast, and are presumably going to do some baking. Others might buy beer, snacks, and balloons and are evidently planning a party. Still others buy baby food and diapers. We want to automatically find such profiles, and deduce which items go together (and hence should perhaps be placed close to each other). Note that this is a form of clustering, but it does not force each product into a single cluster, or each customer as buying just from one profile.

One approach would be to use Principal Component Analysis. However, the problem with PCA is that it describes the data using both additive and subtractive interactions. Such cancellations among components are often counter-intuitive and hence the representation found does not correspond to the kind we seek.

NMF solves the problem by approximating the original matrix as a product of two low-dimensional matrices, as illustrated below.

The key property of NMF is that it allows no negative values in either of the matrices, and hence allows only additive combinations of profiles. In other words, profiles can only include positive quantities of products, and customer affiliations with the profiles cannot be negative.

References

• nmfpack - MATLAB code for NMF as well as its various extensions

Adding a sparseness term to the NMF objective

Several researchers have pointed out problems with NMF. In many cases, if one generates artificial data which seems to correspond to the above kind of decomposition, NMF nevertheless fails to find the original generating profiles.

If the original data is sparse (meaning it contains a large proportion of zeros or almost zeros) then it is possible to get around the above problem by adding a sparseness term to the objective function. In other words, one does not simply consider the quality of the approximation but one also seeks to emphasize the sparseness of one (or both) of the matrices of the decomposition. We have (Hoyer, 2002) shown how to accomplish this, including how to adapt the multiplicative algorithm of Lee and Seung so as to take into account the sparseness term. In addition, we have shown empirically that the resulting algorithm can find the original generating profiles in some cases which NMF cannot solve.

References

• P.O. Hoyer, "Non-negative sparse coding", in Neural Networks for Signal Processing XII (Proc. IEEE Workshop on Neural Networks for Signal Processing 2002, Martigny, Switzerland), pp. 557-565, 2002.

NMF with sparseness constraints

The above approach, of adding a sparseness term to the objective function, has one major drawback. The trade-off between representation accuracy and sparseness is controlled by the relative weighting of the two terms of the objective, and it is impossible a priori to know how sparse the representation becomes as a function of the weighting. In other words, the sparseness of the decomposition can only indirectly be controlled.

To get more explicit control over the sparseness of the decomposition, we have developed a method of optimizing the NMF representation under sparseness constraints. Thus, the user may directly set the desired level of sparseness, and the algorithm finds the best decomposition conforming to this setting.