User Guide
There are 3 main actions needed to train and use the different models: initialization, training, and prediction.
Initialization
Possible models
There are currently seven possible Gaussian Process models:
`GP`
corresponds to the original GP regression model; it necessarily uses a Gaussian likelihood.
`GP(X_train, y_train, kernel)`
`VGP`
is a variational GP model: a multivariate Gaussian approximates the true posterior. There is no inducing-point augmentation involved, so it is well suited for small datasets (~10^3 samples).
`VGP(X_train, y_train, kernel, likelihood, inference)`
`SVGP`
is a variational GP model augmented with inducing points. The optimization is done on those points, allowing stochastic updates and scalability to large datasets. The trade-off can be slightly lower accuracy and the need to select the number and location of the inducing points (this is however an actively worked-on problem).
`SVGP(X_train, y_train, kernel, likelihood, inference, n_inducingpoints)`
`OnlineSVGP`
is an online variational GP model. It is based on the streaming method of Bui et al. (2017) and supports all likelihoods, even with multiple latents.
`OnlineSVGP(kernel, likelihood, inference, n_latent, inducing_points)`
`MOVGP`
is a multi-output variational GP model.
`MOSVGP`
is a multi-output sparse variational GP model, based on Moreno-Muñoz et al. (2018).
`VStP`
is a variational Student-T model where the prior is a multivariate Student-T distribution with scale `K`, mean `μ₀` and degrees of freedom `ν`. The inference is done automatically by augmenting the prior as a scale mixture of inverse gammas.
`VStP(X_train, y_train, kernel, likelihood, inference, ν)`
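As an illustration, here is a minimal sketch of initializing one of these models for binary classification. The kernel is assumed to come from KernelFunctions.jl, and the data are placeholders:

```julia
using AugmentedGaussianProcesses
using KernelFunctions  # assumed source of the kernel constructor

# Placeholder data: 200 samples with 2 features, binary labels
X_train = rand(200, 2)
y_train = rand([-1, 1], 200)

# Sparse variational GP with 20 inducing points, following the constructor shown above
model = SVGP(X_train, y_train,
             SqExponentialKernel(),  # kernel
             LogisticLikelihood(),   # likelihood
             AnalyticVI(),           # inference
             20)                     # number of inducing points
```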
Likelihood
`GP` can only have a Gaussian likelihood, while `VGP` and `SVGP` have more choices. Here are the ones currently implemented:
Regression
For regression, four likelihoods are available:
- The classical `GaussianLikelihood`, for Gaussian noise.
- The `StudentTLikelihood`, assuming noise from a Student-T distribution (more robust to outliers).
- The `LaplaceLikelihood`, with noise from a Laplace distribution.
- The `HeteroscedasticLikelihood` (in development), where the noise is a function of the input: $\mathrm{Var}(X) = \lambda\sigma^{-1}(g(X))$, where $g(X)$ is an additional Gaussian Process and $\sigma$ is the logistic function.
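For instance, a robust regression model could be sketched as follows (reusing the setup above; the degrees-of-freedom argument of `StudentTLikelihood` is an assumption):

```julia
# Placeholder 1-D regression data
X_train = rand(100, 1)
y_train = sin.(vec(X_train)) .+ 0.1 .* randn(100)

# Full variational GP with Student-T noise (more robust to outliers)
model = VGP(X_train, y_train,
            SqExponentialKernel(),
            StudentTLikelihood(3.0),  # ν = 3 degrees of freedom (argument assumed)
            AnalyticVI())
```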
Classification
For classification one can select among:
- The `LogisticLikelihood`: a Bernoulli likelihood with a logistic link.
- The `BayesianSVM` likelihood, based on the frequentist SVM and equivalent to using a hinge loss.
Event Likelihoods
For likelihoods such as Poisson or Negative Binomial, we approximate a parameter by σ(f). Two likelihoods are implemented:
- The `PoissonLikelihood`: a discrete Poisson process (one parameter per point) with the scale parameter defined as λσ(f).
- The `NegBinomialLikelihood`: the Negative Binomial likelihood, where r is fixed and the success probability p is defined as σ(f).
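A count-data model might be set up as follows (a sketch; the default `PoissonLikelihood` constructor arguments are an assumption):

```julia
# Placeholder count data: non-negative integer observations
X_train = rand(150, 1)
y_train = rand(0:10, 150)

model = VGP(X_train, y_train,
            SqExponentialKernel(),
            PoissonLikelihood(),  # rate defined as λσ(f); default λ assumed
            AnalyticVI())
```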
Multi-class classification
There are two available likelihoods for multi-class classification:
- The `SoftMaxLikelihood`, the most common approach. However, no analytical solving is possible.
- The `LogisticSoftMaxLikelihood`, a modified softmax where the exponential function is replaced by the logistic function. This allows for a fully conjugate model (see the corresponding paper).
More options
There is a project to make distributions from Distributions.jl work directly as likelihoods.
Inference
Inference can be done in various ways.
- `AnalyticVI`: Variational Inference with closed-form updates. For non-Gaussian likelihoods, this relies on augmented versions of the likelihoods. For Stochastic Variational Inference, one can use `AnalyticSVI` with the mini-batch size as an argument.
- `GibbsSampling`: Gibbs sampling of the true posterior. This also relies on an augmented version of the likelihoods and is only valid for the `VGP` model at the moment.

The next two methods rely on a numerical approximation of an integral, and I therefore recommend using the classical `Descent` approach, as it will use the natural gradient updates anyway; `ADAM` seems to give erratic results.

- `QuadratureVI`: Variational Inference with gradients computed by estimating the expected log-likelihood via quadrature.
- `MCIntegrationVI`: Variational Inference with gradients computed by estimating the expected log-likelihood via Monte Carlo integration.

We also use AdvancedHMC.jl to provide an HMC algorithm, although Gibbs sampling is generally preferable when available.
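For example, stochastic variational inference on mini-batches could be selected like this (a sketch; the batch size and model settings are placeholders):

```julia
# AnalyticSVI takes the mini-batch size as its argument
model = SVGP(X_train, y_train,
             SqExponentialKernel(),
             LogisticLikelihood(),
             AnalyticSVI(64),  # closed-form updates on mini-batches of 64 points
             50)               # 50 inducing points
```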
Compatibility table
Not all inference methods are implemented/valid for all likelihoods; here is the compatibility table between them.
Likelihood/Inference | AnalyticVI | GibbsSampling | QuadratureVI | MCIntegrationVI |
---|---|---|---|---|
GaussianLikelihood | ✔ (Analytic) | ✖ | ✖ | ✖ |
StudentTLikelihood | ✔ | ✔ | ✔ | ✖ |
LaplaceLikelihood | ✔ | ✔ | ✔ | ✖ |
HeteroscedasticLikelihood | ✔ | (dev) | (dev) | ✖ |
LogisticLikelihood | ✔ | ✔ | ✔ | ✖ |
BayesianSVM | ✔ | (dev) | ✖ | ✖ |
LogisticSoftMaxLikelihood | ✔ | ✔ | ✖ | (dev) |
SoftMaxLikelihood | ✖ | ✖ | ✖ | ✔ |
PoissonLikelihood | ✔ | ✔ | ✖ | ✖ |
NegBinomialLikelihood | ✔ | ✔ | ✖ | ✖ |
(dev) means that the feature is possible and may be developed and tested, but it is not available yet. All contributions or requests are very welcome!
Additional Parameters
Hyperparameter optimization
One can optimize the kernel hyperparameters as well as the inducing point locations by maximizing the ELBO. All derivations are already hand-coded (no AD needed). One can select the optimization scheme via:
- The `optimiser` keyword: can be `nothing` or `false` for no optimization, or an optimiser from the Flux.jl library (see the list of Optimisers).
- The `Zoptimiser` keyword: similar to `optimiser`, it is used for optimizing the inducing point locations; it is set to `nothing` by default (no optimization).
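For instance (a sketch; `ADAM` and `Descent` are Flux.jl optimisers, the step sizes are placeholders, and passing the keywords to the model constructor is assumed):

```julia
using Flux.Optimise: ADAM, Descent

model = SVGP(X_train, y_train,
             SqExponentialKernel(),
             LogisticLikelihood(),
             AnalyticVI(),
             20;
             optimiser = ADAM(0.01),       # optimize kernel hyperparameters
             Zoptimiser = Descent(0.001))  # also optimize inducing point locations
```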
PriorMean
The `mean` keyword allows you to add different types of prior means:
- `ZeroMean`, a constant mean that cannot be optimized.
- `ConstantMean`, a constant mean that can be optimized.
- `EmpiricalMean`, a vector mean with a different value for each point.
- `AffineMean`, where μ₀ is given by `X * w + b`.
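For example (a sketch; the `ConstantMean` constructor argument is an assumption):

```julia
# GP with an optimizable constant prior mean passed via the `mean` keyword
model = GP(X_train, y_train, SqExponentialKernel();
           mean = ConstantMean(1.0))  # constant value 1.0 (argument assumed)
```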
Training
Training is straightforward after initializing the model, by running:
`train!(model, 100; callback = callbackfunction)`
where the `callback` option is for running a function at every iteration. `callbackfunction` should be defined as:
```julia
function callbackfunction(model, iter)
    # do things here
end
```
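For example, a simple callback that logs progress every 10 iterations (a minimal sketch):

```julia
function callbackfunction(model, iter)
    # Print the iteration number every 10 iterations
    if iter % 10 == 0
        @info "Training progress" iteration = iter
    end
end

train!(model, 100; callback = callbackfunction)
```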
Prediction
Once the model has been trained, it is finally possible to compute predictions. There are always three possibilities:
- `predict_f(model, X_test, covf=true, fullcov=false)`: computes the parameters (mean and covariance) of the latent normal distribution at each test point. If `covf=false`, only the mean is returned; if `fullcov=true`, the full covariance matrix is returned instead of only the diagonal.
- `predict_y(model, X_test)`: computes the point estimate of the predictive likelihood for regression, or the label of the most likely class for classification.
- `proba_y(model, X_test)`: returns the mean and the variance of each point for regression, or the predictive likelihood of obtaining the class `y=1` for classification.
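For example, on some placeholder test inputs:

```julia
X_test = rand(20, 2)

μ, Σ = predict_f(model, X_test, covf = true)  # latent mean and (diagonal) covariance
ŷ    = predict_y(model, X_test)               # point estimates or most likely labels
p    = proba_y(model, X_test)                 # predictive mean/variance or class probabilities
```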
Miscellaneous
🚧 In construction – Should be developed in the near future 🚧
Saving/Loading models
Once a model has been trained, it is possible to save its state in a file using `save_trained_model(filename, model)`; a partial version of the model will be saved in `filename`.
It is then possible to reload this file using `load_trained_model(filename)`. However, note that it will not be possible to train the model further! This function is only meant for doing further predictions.
🚧 Pre-made callback functions 🚧
There is one (for now) premade function to return an MVHistory object and a callback function for the training of binary classification problems. The callback will store the ELBO and the variational parameters at every iteration included in `iterpoints`. If `X_test` and `y_test` are provided, it will also store the test accuracy and the mean and median test log-likelihood.