Machine Learning Coffee Seminar – Tapio Pahikkala – A Link between Coding Theory and Cross-Validation with Applications
23.3.2020 @ 09:00 - 10:00
The session will be held remotely via Zoom: https://aalto.zoom.us/j/9776440568
Abstract: We study the combinatorics of cross-validation-based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, that the data are randomly assigned to two classes. In particular, we study how estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pair of data points with different class labels is held out from the training set in turn, behave under the null without any prior assumptions about the learning algorithm or the data. We observe that the maximal number of different assignments of w nonzero and n-w zero class labels to the data for which any fixed learning algorithm can achieve zero LPOCV error equals the maximal size of a constant weight error-correcting code of length n, weight w and Hamming distance four between code words. We then introduce the concept of a light constant weight code and show similar results for bounded LPOCV errors. These results enable the design of new LPOCV-based statistical tests of a learning algorithm's ability to distinguish two classes from each other, analogous to the classical Wilcoxon-Mann-Whitney U test for fixed functions.
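To make leave-pair-out cross-validation concrete, here is a minimal Python sketch of an LPOCV AUC estimate as described in the abstract. It is an illustration under stated assumptions, not the speaker's implementation: the learner interface (`learn(X_train, y_train)` returning a scoring function), the one-dimensional features, and the two toy learners `fixed_identity` and `centroid_sign` are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical interface: a learner maps training data to a scoring function.
Scorer = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[int]], Scorer]


def lpocv_auc(X: Sequence[float], y: Sequence[int], learn: Learner) -> float:
    """Leave-pair-out cross-validation estimate of AUC.

    For every (positive, negative) pair, train on all remaining points, score
    the two held-out points, and count the pair as correct if the positive
    point receives the higher score (ties count as one half).
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    total = 0.0
    for i in pos:
        for j in neg:
            train = [k for k in range(len(y)) if k != i and k != j]
            f = learn([X[k] for k in train], [y[k] for k in train])
            si, sj = f(X[i]), f(X[j])
            total += 1.0 if si > sj else 0.5 if si == sj else 0.0
    return total / (len(pos) * len(neg))


def fixed_identity(_X: Sequence[float], _y: Sequence[int]) -> Scorer:
    # Ignores the training data, i.e. a fixed function: the LPOCV estimate
    # then equals the ordinary AUC (the normalized Mann-Whitney U statistic).
    return lambda x: x


def centroid_sign(X: Sequence[float], y: Sequence[int]) -> Scorer:
    # Toy learner (hypothetical): orient the score so that the class with the
    # larger training mean scores higher. Empty classes fall back to 0.0.
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mean_pos = sum(pos) / len(pos) if pos else 0.0
    mean_neg = sum(neg) / len(neg) if neg else 0.0
    direction = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: direction * x


if __name__ == "__main__":
    X = [0.1, 0.4, 0.35, 0.8]
    y = [0, 0, 1, 1]
    print(lpocv_auc(X, y, fixed_identity))  # 0.75
    print(lpocv_auc(X, y, centroid_sign))   # 0.5
```

With a fixed scoring function, the estimate reduces to the usual AUC, which is the Mann-Whitney U statistic divided by the number of positive-negative pairs. When the learner is retrained for every held-out pair, the two quantities can differ, as the toy data above show.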
Speaker: Tapio Pahikkala
Affiliation: Associate Professor, Department of Future Technologies, University of Turku
Machine Learning Coffee Seminar is organized weekly by the Finnish Center for Artificial Intelligence (FCAI). The location alternates between Aalto University and the University of Helsinki.