Spring 2017

History of Previous Talks in Spring 2017

Learning Data Representation by Large-Scale Neighbor Embedding

Date: June 12, 2017

Abstract: Machine learning, the state of the art in data science, increasingly influences our lives. Encoding data in a suitable vector space is the fundamental starting point for machine learning. A good vector coding should respect the relations among the data items. However, conventional methods that preserve pairwise or higher-order relationships are very slow, so they can handle only small-scale data sets. We have been developing a family of unsupervised methods called large-scale Neighbor Embedding (NE) that substantially accelerates the vector coding. Our method can thus learn low-dimensional vector representations for mega-scale data according to their neighborhoods in the original space. With our efficient algorithms and a wealth of neighborhood information, large-scale Neighbor Embedding significantly outperforms small-scale NE and many other existing approaches for learning data representations. Besides generic feature extraction, our work also delivers two important tools as special cases of Neighbor Embedding, one for data visualization and one for cluster analysis. These tools scale up both applications by an order of magnitude and enable interactive visualization and clustering of current-sized data sets. Because neighborhood information is naturally and massively available in many areas, our method has wide application as a critical component in scientific research, next-generation DNA sequence analysis, natural language processing, educational cloud, financial data analysis, market studies, and other fields.

Speaker: Zhirong Yang

Affiliation: Department of Computer Science, Aalto University

Place of Seminar: Aalto University

Machine Learning Coffee Seminars

Learning Data Representation by Large-Scale Neighbor Embedding

Statistical Ecology with Gaussian Processes

Graphics Meets Vision Meets Machine Learning

On Priors and Bayesian Variable Selection in Large p, Small n Regression

Machine Learning for Image-Based Localization

Empirical Parameterization of Exploratory Search Systems Based on Bandit Algorithms

Nintendo Wii Fit-Based Balance Testing to Detect Sleep Deprivation: Approximate Bayesian Computation Approach

Learning With Spectral Kernels

Multilayer Networks

Learning to Rank: Applications to Bioinformatics

Future AI: Autonomous machine learning and beyond

Small Data AUC Estimation of Machine Learning Methods: Pitfalls and Remedies

Differentially Private Bayesian Learning

Inverse Modeling in Behavioral Sciences and HCI

Compressed Sensing for Semi-Supervised Learning From Big Data Over Networks

Variable Selection From Summary Statistics

Towards Perfect Density Estimation

Likelihood-free Inference and Predictions for Computational Epidemiology

Metabolite Identification Through Machine Learning

Probabilistic Programming: Bayesian Modeling Made Easy

Unsupervised Machine Learning for Matrix Decomposition