Gaussian Process Classification
The second group project I worked on at COMPASS centred on learning how Gaussian process classification works. It is a more involved procedure than Gaussian process regression: the non-Gaussian likelihood means the posterior is no longer available in closed form, and it is not nearly as straightforward.
Our work drew on several techniques from the recent literature that improve Gaussian process classification:
- Pseudo-Marginal Likelihood: An importance sampling procedure that substitutes an unbiased estimate of the intractable marginal likelihood into the MCMC acceptance ratio.
- Subset Selection: An entropy-based measure that chooses a subset of the full dataset that retains as much information as possible, known as the Informative Vector Machine (IVM).
- Laplace Approximation: A Gaussian approximation to the posterior over the latent variables, centred at the posterior mode.
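To give a flavour of the last of these, here is a toy one-dimensional sketch of the Laplace approximation: a single latent value with a Gaussian prior and a logistic likelihood, where Newton's method finds the posterior mode and the curvature there gives the variance of the Gaussian approximation. This is an illustrative sketch only (the helper name `laplace_approx_1d` is my own, and our package works with the full latent vector rather than a scalar):

```python
import math

def laplace_approx_1d(y, prior_var=1.0, iters=25):
    """Laplace approximation to p(f | y) for a single latent f.

    Prior: f ~ N(0, prior_var).  Likelihood: p(y | f) = sigmoid(y * f),
    with y in {-1, +1}.  Returns (mode, variance) of the Gaussian
    approximation N(mode, variance) to the posterior.

    Illustrative sketch only -- not the implementation from our package.
    """
    f = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-y * f))        # sigmoid(y * f)
        # Gradient and Hessian of the log posterior (up to a constant):
        # log p(f | y) = log sigmoid(y f) - f^2 / (2 prior_var) + const
        grad = y * (1.0 - p) - f / prior_var
        hess = -p * (1.0 - p) - 1.0 / prior_var   # always negative
        f -= grad / hess                           # Newton step
    return f, -1.0 / hess                          # mode and variance

mode, var = laplace_approx_1d(y=+1)
print(mode, var)  # mode is positive; variance shrinks below the prior's
```

The key point the sketch makes is that the approximation is fully determined by two local quantities: where the log posterior peaks, and how sharply it curves there. In the full procedure the same Newton iteration runs over the whole latent vector, with the kernel matrix playing the role of `prior_var`.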
Together, these approximations make Gaussian process classification feasible. Without them, the posterior is analytically intractable, and naive sampling schemes become prohibitively slow on datasets of any reasonable size.
Finally, we compared the results on an e-mail spam dataset, achieving higher prediction accuracy than a JAGS implementation of logistic regression. We combined our code, written in Rcpp, into an R package, available here.