API Library
Module
General Framework for the data augmented Gaussian Processes
Model Types
AugmentedGaussianProcesses.GP
— Type. Class for Gaussian Process models
GP(X::AbstractArray{T1,N1}, y::AbstractArray{T2,N2}, kernel::Union{Kernel,AbstractVector{<:Kernel}};
noise::Real=1e-5, opt_noise::Bool=true, verbose::Int=0,
optimizer::Bool=Adam(α=0.01),atfrequency::Int=1,
mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
IndependentPriors::Bool=true,ArrayType::UnionAll=Vector)
Argument list :
Mandatory arguments
- `X` : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
- `y` : input labels, can be either a vector of labels for multiclass and single output or a matrix for multi-outputs (note that only one likelihood can be applied)
- `kernel` : covariance function, can be either a single kernel or a collection of kernels for multiclass and multi-output models
Keyword arguments
- `noise` : Initial noise of the model
- `opt_noise` : Flag for optimizing the noise σ² = Σ(y-f)²/N
- `mean` : Prior mean, either a constant, a vector, or a `PriorMean` object (see the Prior Means section)
- `verbose` : How much the model prints (0:nothing, 1:very basic, 2:medium, 3:everything)
- `optimizer` : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl)
- `IndependentPriors` : Flag for setting independent or shared parameters among latent GPs
- `atfrequency` : Number of variational-parameter iterations between two hyperparameter optimizations
- `ArrayType` : Option to use a different array type for storage (allows for GPU usage)
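As an illustration, here is a minimal sketch of building and training an exact GP regression model. The `RBFKernel(1.0)` constructor argument (a lengthscale) and the toy data are assumptions made for the example, not part of the signature above.

```julia
using AugmentedGaussianProcesses

# Toy regression data: X is an N×D matrix, y a vector of N targets.
X = rand(100, 1)
y = sin.(vec(X) .* 2π) .+ 0.1 .* randn(100)

# Exact GP with an RBF kernel; the Gaussian noise is optimized during training.
model = GP(X, y, RBFKernel(1.0); noise=1e-3, opt_noise=true, verbose=2)
train!(model, iterations=50)
```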
AugmentedGaussianProcesses.VGP
— Type. Class for variational Gaussian Process models (non-sparse)
VGP(X::AbstractArray{T1,N1},y::AbstractArray{T2,N2},kernel::Union{Kernel,AbstractVector{<:Kernel}},
likelihood::LikelihoodType,inference::InferenceType;
verbose::Int=0,optimizer::Union{Bool,Optimizer,Nothing}=Adam(α=0.01),atfrequency::Integer=1,
mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
IndependentPriors::Bool=true,ArrayType::UnionAll=Vector)
Argument list :
Mandatory arguments
- `X` : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
- `y` : input labels, can be either a vector of labels for multiclass and single output or a matrix for multi-outputs (note that only one likelihood can be applied)
- `kernel` : covariance function, can be either a single kernel or a collection of kernels for multiclass and multi-output models
- `likelihood` : likelihood of the model, currently implemented : Gaussian, Bernoulli (with logistic link), Multiclass (softmax or logistic-softmax), see Likelihood Types
- `inference` : inference for the model, can be analytic, numerical or by sampling; check the model documentation to know what is available for your likelihood, see the Compatibility Table
Keyword arguments
- `verbose` : How much the model prints (0:nothing, 1:very basic, 2:medium, 3:everything)
- `optimizer` : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl)
- `atfrequency` : Number of variational-parameter iterations between two hyperparameter optimizations
- `mean` : Prior mean, either a constant, a vector, or a `PriorMean` object (see the Prior Means section)
- `IndependentPriors` : Flag for setting independent or shared parameters among latent GPs
- `ArrayType` : Option to use a different array type for storage (allows for GPU usage)
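For orientation, a sketch of a non-sparse variational GP for binary classification, combining the `LogisticLikelihood` and `AnalyticVI` types documented further below; the kernel argument and the toy data are illustrative assumptions.

```julia
using AugmentedGaussianProcesses

# Toy binary classification data with ±1 labels.
X = rand(200, 2)
y = sign.(X[:, 1] .- X[:, 2])

# VGP with the augmented logistic likelihood and closed-form variational updates.
model = VGP(X, y, RBFKernel(0.5), LogisticLikelihood(), AnalyticVI())
train!(model, iterations=100)
```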
AugmentedGaussianProcesses.SVGP
— Type. Class for sparse variational Gaussian Process models
SVGP(X::AbstractArray{T1},y::AbstractArray{T2},kernel::Union{Kernel,AbstractVector{<:Kernel}},
likelihood::LikelihoodType,inference::InferenceType, nInducingPoints::Int;
verbose::Int=0,optimizer::Union{Optimizer,Nothing,Bool}=Adam(α=0.01),atfrequency::Int=1,
mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
IndependentPriors::Bool=true,Zoptimizer::Union{Optimizer,Nothing,Bool}=false,
ArrayType::UnionAll=Vector)
Argument list :
Mandatory arguments
- `X` : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
- `y` : input labels, can be either a vector of labels for multiclass and single output or a matrix for multi-outputs (note that only one likelihood can be applied)
- `kernel` : covariance function, can be either a single kernel or a collection of kernels for multiclass and multi-output models
- `likelihood` : likelihood of the model, currently implemented : Gaussian, Student-T, Laplace, Bernoulli (with logistic link), Bayesian SVM, Multiclass (softmax or logistic-softmax), see Likelihood Types
- `inference` : inference for the model, can be analytic, numerical or by sampling; check the model documentation to know what is available for your likelihood, see the Compatibility Table
- `nInducingPoints` : number of inducing points
Keyword arguments
- `verbose` : How much the model prints (0:nothing, 1:very basic, 2:medium, 3:everything)
- `optimizer` : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl)
- `atfrequency` : Number of variational-parameter iterations between two hyperparameter optimizations
- `mean` : Prior mean, either a constant, a vector, or a `PriorMean` object (see the Prior Means section)
- `IndependentPriors` : Flag for setting independent or shared parameters among latent GPs
- `Zoptimizer` : Optimizer for the inducing point locations (to be selected from GradDescent.jl)
- `ArrayType` : Option to use a different array type for storage (allows for GPU usage)
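As a sketch, a sparse variational GP for robust regression with 50 inducing points, trained stochastically with `AnalyticSVI`; the kernel argument, the likelihood parameter, and the toy data are assumptions made for the example.

```julia
using AugmentedGaussianProcesses

# Larger toy regression problem where a sparse model pays off.
X = rand(5_000, 3)
y = X * [1.0, -0.5, 2.0] .+ 0.2 .* randn(5_000)

# SVGP with a Student-t likelihood (ν = 3), analytic stochastic updates on
# mini-batches of 100 points, and 50 inducing points.
model = SVGP(X, y, RBFKernel(1.0), StudentTLikelihood(3.0), AnalyticSVI(100), 50)
train!(model, iterations=200)
```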
Likelihood Types
Gaussian Likelihood
Classical Gaussian noise : $p(y|f) = \mathcal{N}(y|f,\epsilon)$
GaussianLikelihood(ϵ::T=1e-3) #ϵ is the variance
There is no augmentation needed for this likelihood, which is already conjugate.
Student-T likelihood
Student-t likelihood for regression: $\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\sigma\Gamma(\nu/2)}\left(1+(y-f)^2/(\sigma^2\nu)\right)^{-(\nu+1)/2}$, see the wiki page
StudentTLikelihood(ν::T,σ::Real=one(T)) #ν is the number of degrees of freedom
#σ is the variance for local scale of the data.
For the analytical solution, the likelihood is augmented with a latent variable $\omega \sim \mathcal{IG}(\frac{\nu}{2},\frac{\nu}{2})$, where $\mathcal{IG}$ is the inverse-Gamma distribution. See the paper Robust Gaussian Process Regression with a Student-t Likelihood.
Laplace likelihood
Laplace likelihood for regression: $\frac{1}{2\beta}\exp\left(-\frac{|y-f|}{\beta}\right)$ see wiki page
LaplaceLikelihood(β::T=1.0) # Laplace likelihood with scale β
For the analytical solution, the likelihood is augmented with a latent variable $\omega \sim \text{Exp}\left(\omega \mid \frac{1}{2 \beta^2}\right)$, where Exp is the Exponential distribution. The variational posterior is approximated as $q(\omega) = \mathcal{GIG}\left(\omega \mid a,b,p\right)$, a Generalized Inverse Gaussian distribution.
Logistic Likelihood
Bernoulli likelihood with a logistic link: $p(y|f) = \sigma(yf) = \frac{1}{1+\exp(-yf)}$ (for more info see the wiki page)
LogisticLikelihood()
For the analytic version, the likelihood is augmented with a latent variable $\omega \sim \text{PG}(\omega\mid 1, 0)$, where PG is the Pólya-Gamma distribution. See the paper Efficient Gaussian Process Classification Using Pólya-Gamma Data Augmentation.
Heteroscedastic Likelihood
Gaussian with heteroscedastic noise given by another gp: $p(y|f,g) = \mathcal{N}(y|f,(\lambda\sigma(g))^{-1})$
HeteroscedasticLikelihood([kernel=RBFKernel(),[priormean=0.0]])
Augmentation is described here (#TODO)
Bayesian SVM
The Bayesian SVM is a Bayesian interpretation of the classical SVM. $p(y|f) \propto \exp\left(-2\max(1-yf,0)\right)$
BayesianSVM()
For the analytic version, the likelihood is augmented with a latent variable $\omega$ with an improper flat prior $1_{[0,\infty)}$ (its posterior is nevertheless a valid Generalized Inverse Gaussian distribution). For reference see this paper.
SoftMax Likelihood
Multiclass likelihood with Softmax transformation: $p(y=i|\{f_k\}) = \exp(f_i)/\sum_{j} \exp(f_j)$
There is no possible augmentation for this likelihood
The Logistic-Softmax likelihood
The multiclass likelihood with a logistic-softmax mapping: $p(y=i|\{f_k\}) = \sigma(f_i)/\sum_k \sigma(f_k)$, where σ is the logistic function. It has similar properties to the softmax mapping.
For the analytical version, the likelihood is augmented multiple times. A paper with the details is under submission.
Poisson Likelihood
Inference Types
AnalyticVI
Variational Inference solver for conjugate or conditionally conjugate likelihoods (non-Gaussian likelihoods are made conjugate via augmentation). All data is used at each iteration (use AnalyticSVI for stochastic updates).
AnalyticVI(;ϵ::T=1e-5)
Keywords arguments
- `ϵ::T` : convergence criteria
AugmentedGaussianProcesses.AnalyticSVI
— Function. AnalyticSVI: Stochastic Variational Inference solver for conjugate or conditionally conjugate likelihoods (non-Gaussian likelihoods are made conjugate via augmentation)
AnalyticSVI(nMinibatch::Integer;ϵ::T=1e-5,optimizer::Optimizer=InverseDecay())
- `nMinibatch::Integer` : Number of samples per mini-batch
Keywords arguments
- `ϵ::T` : convergence criteria
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `InverseDecay()` (ρ=(τ+iter)^-κ)
GibbsSampling
Draw samples from the true posterior via Gibbs Sampling.
GibbsSampling(;ϵ::T=1e-5,nBurnin::Int=100,samplefrequency::Int=10)
Keywords arguments
- `ϵ::T` : convergence criteria
- `nBurnin::Int` : Number of samples discarded before starting to save samples
- `samplefrequency::Int` : Frequency of sampling
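A sketch of using Gibbs sampling instead of variational inference; whether a given likelihood supports sampling should be checked in the Compatibility Table, and the data, kernel argument, and number of iterations here are illustrative assumptions.

```julia
using AugmentedGaussianProcesses

X = rand(150, 2)
y = sign.(X[:, 1] .- X[:, 2])

# Sample from the augmented posterior: discard 200 burn-in samples,
# then keep one sample every 5 iterations.
model = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(),
            GibbsSampling(nBurnin=200, samplefrequency=5))
train!(model, iterations=2000)  # iterations correspond to sampling steps here
```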
QuadratureVI
Variational Inference solver approximating the gradients via numerical integration (quadrature)
QuadratureVI(ϵ::T=1e-5,nGaussHermite::Integer=20,optimizer::Optimizer=Momentum(η=0.0001))
Keyword arguments
- `ϵ::T` : convergence criteria
- `nGaussHermite::Int` : Number of points for the integral estimation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Momentum(η=0.0001)`
AugmentedGaussianProcesses.QuadratureSVI
— Function. QuadratureSVI
Stochastic Variational Inference solver approximating the gradients via numerical integration (quadrature)
QuadratureSVI(nMinibatch::Integer;ϵ::T=1e-5,nGaussHermite::Integer=20,optimizer::Optimizer=Adam(α=0.1))
- `nMinibatch::Integer` : Number of samples per mini-batch
Keyword arguments
- `ϵ::T` : convergence criteria, which can be user defined
- `nGaussHermite::Int` : Number of points for the integral estimation (for the QuadratureVI)
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`
MCIntegrationVI
Constructor for Variational Inference via MC Integration approximation.
MCIntegrationVI(;ϵ::T=1e-5,nMC::Integer=1000,optimizer::Optimizer=Adam(α=0.1))
Keyword arguments
- `ϵ::T` : convergence criteria, which can be user defined
- `nMC::Int` : Number of samples per data point for the integral evaluation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`
AugmentedGaussianProcesses.MCIntegrationSVI
— Function. MCIntegrationSVI(nMinibatch::Integer;ϵ::T=1e-5,nMC::Integer=1000,optimizer::Optimizer=Adam(α=0.1))
Constructor for Stochastic Variational Inference via MC integration approximation.
Argument
- `nMinibatch::Integer` : Number of samples per mini-batch
Keyword arguments
- `ϵ::T` : convergence criteria, which can be user defined
- `nMC::Int` : Number of samples per data point for the integral evaluation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`
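When no conjugate augmentation is available, the numerical solvers above can be used instead. A sketch with quadrature-based VI follows; it assumes the chosen likelihood is compatible with this inference type (see the Compatibility Table), and the data and kernel argument are illustrative.

```julia
using AugmentedGaussianProcesses

X = rand(100, 2)
y = sign.(X[:, 1] .- X[:, 2])

# Same kind of model as before, but the expected log-likelihood gradients are
# estimated by numerical quadrature instead of an analytic augmentation.
model = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(), QuadratureVI())
train!(model, iterations=100)
```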
Functions and methods
AugmentedGaussianProcesses.train!
— Function. train!(model::AbstractGP;iterations::Integer=100,callback=0,conv_function=0)
Function to train the given GP model.
Keyword Arguments
- `iterations::Int` : Number of iterations (not necessarily epochs!) for training
- `callback::Function` : Callback function called at every iteration. Should be of type `function(model,iter) ... end`
- `conv_function::Function` : Convergence function to be called every iteration, should return a scalar and take the same arguments as `callback`
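A sketch of the callback and convergence hooks; the model setup, the callback body, and the placeholder convergence metric are illustrative assumptions.

```julia
using AugmentedGaussianProcesses

X = rand(100, 1)
y = sin.(vec(X))
model = GP(X, y, RBFKernel(1.0))

# Print a short message every 10 iterations.
cb(model, iter) = iter % 10 == 0 && println("iteration $iter done")

# Convergence function: any scalar metric works; a placeholder value here.
conv(model, iter) = 0.0

train!(model, iterations=100, callback=cb, conv_function=conv)
```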
AugmentedGaussianProcesses.predict_f
— Function. Compute the mean of the predicted latent distribution of f on X_test for the variational GP model. Return also the variance if covf=true and the full covariance if fullcov=true.
Compute the mean of the predicted latent distribution of f on X_test for a sparse GP model. Return also the variance if covf=true and the full covariance if fullcov=true.
AugmentedGaussianProcesses.predict_y
— Function. predict_y(model::AbstractGP{T,<:RegressionLikelihood},X_test::AbstractMatrix)
Return the predictive mean of X_test
predict_y(model::AbstractGP{T,<:ClassificationLikelihood},X_test::AbstractMatrix)
Return the predicted most probable sign of X_test
predict_y(model::AbstractGP{T,<:MultiClassLikelihood},X_test::AbstractMatrix)
Return the predicted most probable class of X_test
predict_y(model::AbstractGP{T,<:EventLikelihood},X_test::AbstractMatrix)
Return the expected number of events for the locations X_test
AugmentedGaussianProcesses.proba_y
— Function. proba_y(model::AbstractGP,X_test::AbstractMatrix)
Return the probability distribution p(y_test|model,X_test) :
- Tuple of vectors of mean and variance for regression
- Vector of probabilities of y_test = 1 for binary classification
- DataFrame with one column per class containing the probability of each class for multi-class classification
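A sketch of the prediction functions on a trained classification model; the keyword name `covf` for the latent prediction follows the description above, and the data, kernel argument, and training length are illustrative assumptions.

```julia
using AugmentedGaussianProcesses

X = rand(200, 2)
y = sign.(X[:, 1] .- X[:, 2])
model = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(), AnalyticVI())
train!(model, iterations=50)

X_test = rand(20, 2)
f_mean, f_var = predict_f(model, X_test, covf=true)  # latent mean and variance
y_pred = predict_y(model, X_test)                    # most probable label (±1)
p = proba_y(model, X_test)                           # probability of y_test = 1
```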
Kernels
Radial Basis Function Kernel, also called RBF or SE (Squared Exponential)
Matern Kernel
Kernel functions
Create the covariance matrix between the matrices X1 and X2 with the covariance function kernel
Compute the covariance matrix of the matrix X, optionally computing only the diagonal terms
Compute the covariance matrix between the matrices X1 and X2 with the covariance function kernel in the preallocated matrix K
Compute the covariance matrix of the matrix X in the preallocated matrix K, optionally computing only the diagonal terms
Return the variance of the kernel
Return the lengthscale of the IsoKernel
Return the lengthscales of the ARD Kernel
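A sketch of the kernel-matrix helpers; the kernel constructor argument, the argument order (data first, kernel last), and whether these helpers are exported at the top level are assumptions made for illustration.

```julia
using AugmentedGaussianProcesses

k = RBFKernel(1.0)             # lengthscale argument assumed
X1 = rand(10, 2); X2 = rand(5, 2)

K12 = kernelmatrix(X1, X2, k)  # 10×5 cross-covariance between X1 and X2
K11 = kernelmatrix(X1, k)      # 10×10 covariance of X1 with itself
v = getvariance(k)             # variance of the kernel
ls = getlengthscales(k)        # lengthscale(s) of the kernel
```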
Prior Means
ZeroMean
ZeroMean()
Construct a prior mean fixed to 0 that cannot be changed.
ConstantMean
ConstantMean(c::T=1.0;opt::Optimizer=Adam(α=0.01))
Construct a prior mean with constant value c. Optionally set an optimizer opt (Adam(α=0.01) by default).
EmpiricalMean
EmpiricalMean(c::V=1.0;opt::Optimizer=Adam(α=0.01)) where {V<:AbstractVector{<:Real}}
Construct a prior mean with the vector of values c. Optionally give an optimizer opt (Adam(α=0.01) by default).
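A sketch of passing a non-zero prior mean to a model; the kernel argument and the toy data are illustrative assumptions.

```julia
using AugmentedGaussianProcesses

X = rand(100, 1)
y = 2.0 .+ sin.(vec(X))

# GP whose prior mean is an optimizable constant instead of the default ZeroMean.
model = GP(X, y, RBFKernel(1.0); mean=ConstantMean(2.0))
train!(model, iterations=50)
```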
Index
AugmentedGaussianProcesses.AnalyticVI
AugmentedGaussianProcesses.BayesianSVM
AugmentedGaussianProcesses.ConstantMean
AugmentedGaussianProcesses.EmpiricalMean
AugmentedGaussianProcesses.GP
AugmentedGaussianProcesses.GaussianLikelihood
AugmentedGaussianProcesses.GibbsSampling
AugmentedGaussianProcesses.HeteroscedasticLikelihood
AugmentedGaussianProcesses.KernelModule.MaternKernel
AugmentedGaussianProcesses.KernelModule.RBFKernel
AugmentedGaussianProcesses.LaplaceLikelihood
AugmentedGaussianProcesses.LogisticLikelihood
AugmentedGaussianProcesses.LogisticSoftMaxLikelihood
AugmentedGaussianProcesses.MCIntegrationVI
AugmentedGaussianProcesses.PoissonLikelihood
AugmentedGaussianProcesses.QuadratureVI
AugmentedGaussianProcesses.SVGP
AugmentedGaussianProcesses.SoftMaxLikelihood
AugmentedGaussianProcesses.StudentTLikelihood
AugmentedGaussianProcesses.VGP
AugmentedGaussianProcesses.ZeroMean
AugmentedGaussianProcesses.AnalyticSVI
AugmentedGaussianProcesses.KernelModule.getlengthscales
AugmentedGaussianProcesses.KernelModule.getvariance
AugmentedGaussianProcesses.KernelModule.kernelmatrix
AugmentedGaussianProcesses.KernelModule.kernelmatrix!
AugmentedGaussianProcesses.MCIntegrationSVI
AugmentedGaussianProcesses.QuadratureSVI
AugmentedGaussianProcesses.predict_f
AugmentedGaussianProcesses.predict_y
AugmentedGaussianProcesses.proba_y
AugmentedGaussianProcesses.train!