API

API Library


Module

General framework for data-augmented Gaussian Processes


Model Types

Class for Gaussian Process models

GP(X::AbstractArray{T}, y::AbstractArray, kernel::Kernel;
    noise::Real=1e-5, opt_noise::Bool=true, verbose::Int=0,
    optimizer::Union{Bool,Optimizer,Nothing}=Adam(α=0.01), atfrequency::Int=1,
    mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
    IndependentPriors::Bool=true, ArrayType::UnionAll=Vector)

Argument list:

Mandatory arguments

  • X : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
  • y : input labels, can be either a vector of labels for multiclass and single-output problems or a matrix for multi-output problems (note that only one likelihood can be applied)
  • kernel : covariance function, can be either a single kernel or a collection of kernels for multiclass and multi-output models

Keyword arguments

  • noise : Initial noise of the model
  • opt_noise : Flag for optimizing the noise σ² = Σ(y-f)²/N
  • mean : Prior mean, either a constant, a vector, or a PriorMean object (see Prior Means)
  • verbose : How much the model prints (0: nothing, 1: very basic, 2: medium, 3: everything)
  • optimizer : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl) or set it to false to keep the hyperparameters fixed
  • IndependentPriors : Flag for setting independent or shared parameters among the latent GPs
  • atfrequency : Number of variational parameter iterations between hyperparameter optimizations
  • ArrayType : Option for using a different type of array for storage (allows for GPU usage)
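A minimal construction sketch (assuming the package is loaded as AugmentedGaussianProcesses and that a kernel constructor such as RBFKernel, listed in the Kernels section below, is available; the data is arbitrary):

using AugmentedGaussianProcesses  # package name assumed for this sketch

X = rand(100, 3)   # 100 observations with 3 dimensions
y = rand(100)      # one label per observation

# GP regression model with optimized noise and an Adam optimizer for the kernel hyperparameters
model = GP(X, y, RBFKernel(1.0);
           noise=1e-3, opt_noise=true,
           optimizer=Adam(α=0.01), verbose=2)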

Class for variational Gaussian Process models (non-sparse)

VGP(X::AbstractArray{T}, y::AbstractVector, kernel::Kernel,
    likelihood::LikelihoodType, inference::InferenceType;
    verbose::Int=0, optimizer::Union{Bool,Optimizer,Nothing}=Adam(α=0.01), atfrequency::Integer=1,
    mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
    IndependentPriors::Bool=true, ArrayType::UnionAll=Vector)

Argument list:

Mandatory arguments

  • X : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
  • y : input labels, can be either a vector of labels for multiclass and single-output problems or a matrix for multi-output problems (note that only one likelihood can be applied)
  • kernel : covariance function, a single kernel from the KernelFunctions.jl package
  • likelihood : likelihood of the model; currently implemented: Gaussian, Bernoulli (with logistic link), Multiclass (softmax or logistic-softmax), see Likelihood Types
  • inference : inference for the model, can be analytic, numerical or by sampling; check the model documentation to know what is available for your likelihood, see the Compatibility Table

Keyword arguments

  • verbose : How much the model prints (0: nothing, 1: very basic, 2: medium, 3: everything)
  • optimizer : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl) or set it to false to keep the hyperparameters fixed
  • atfrequency : Number of variational parameter iterations between hyperparameter optimizations
  • mean : PriorMean object, see Prior Means
  • IndependentPriors : Flag for setting independent or shared parameters among the latent GPs
  • ArrayType : Option for using a different type of array for storage (allows for GPU usage)
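A sketch of a binary classification setup with this constructor (same kernel and package assumptions as the GP sketch above; the data is arbitrary):

X = rand(200, 2)
y = rand([-1, 1], 200)   # binary labels

# Non-sparse variational GP with a logistic link and analytic variational inference
model = VGP(X, y, RBFKernel(1.0),
            LogisticLikelihood(), AnalyticVI();
            verbose=2, optimizer=Adam(α=0.01))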

Class for sparse variational Gaussian Processes

SVGP(X::AbstractArray{T1},y::AbstractVector{T2},kernel::Kernel,
    likelihood::LikelihoodType,inference::InferenceType, nInducingPoints::Int;
    verbose::Int=0,optimizer::Union{Optimizer,Nothing,Bool}=Adam(α=0.01),atfrequency::Int=1,
    mean::Union{<:Real,AbstractVector{<:Real},PriorMean}=ZeroMean(),
    Zoptimizer::Union{Optimizer,Nothing,Bool}=false,
    ArrayType::UnionAll=Vector)

Argument list:

Mandatory arguments

  • X : input features, should be a matrix N×D where N is the number of observations and D the number of dimensions
  • y : input labels, can be either a vector of labels for multiclass and single-output problems or a matrix for multi-output problems (note that only one likelihood can be applied)
  • kernel : covariance function, can be either a single kernel or a collection of kernels for multiclass and multi-output models
  • likelihood : likelihood of the model; currently implemented: Gaussian, Student-t, Laplace, Bernoulli (with logistic link), Bayesian SVM, Multiclass (softmax or logistic-softmax), see Likelihood Types
  • inference : inference for the model, can be analytic, numerical or by sampling; check the model documentation to know what is available for your likelihood, see the Compatibility Table
  • nInducingPoints : number of inducing points

Keyword arguments

  • verbose : How much the model prints (0: nothing, 1: very basic, 2: medium, 3: everything)
  • optimizer : Optimizer for the kernel hyperparameters (to be selected from GradDescent.jl) or set it to false to keep the hyperparameters fixed
  • atfrequency : Number of variational parameter iterations between hyperparameter optimizations
  • mean : PriorMean object, see Prior Means
  • IndependentPriors : Flag for setting independent or shared parameters among the latent GPs
  • Zoptimizer : Optimizer for the inducing point locations (to be selected from GradDescent.jl), false by default (fixed inducing points)
  • ArrayType : Option for using a different type of array for storage (allows for GPU usage)
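A sketch combining the sparse model with stochastic updates (same assumptions as above; the number of inducing points and the mini-batch size are arbitrary):

X = rand(10_000, 5)
y = rand(10_000)

# Sparse variational GP: 100 inducing points, mini-batches of 200 samples
model = SVGP(X, y, RBFKernel(1.0),
             GaussianLikelihood(), AnalyticSVI(200), 100;
             optimizer=Adam(α=0.01), Zoptimizer=false)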

Likelihood Types

GaussianLikelihood(σ²::T=1e-3) #σ² is the variance

Gaussian noise :

\[ p(y|f) = N(y|f,σ²)\]

There is no augmentation needed for this likelihood, which is already conjugate to a Gaussian prior.

StudentTLikelihood(ν::T,σ::Real=one(T))

Student-t likelihood for regression:

\[ p(y|f,\nu,\sigma) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\sigma\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{(y-f)^2}{\sigma^2\nu}\right)^{-\frac{\nu+1}{2}}\]

ν is the number of degrees of freedom and σ is the scale of the data.

For the analytical solution, it is augmented via:

\[ p(y|f,\omega) = \mathcal{N}(y|f,\sigma^2\omega)\]

where $ω \sim IG(0.5ν, 0.5ν)$ and IG is the inverse-gamma distribution. See the paper Robust Gaussian Process Regression with a Student-t Likelihood.

LaplaceLikelihood(β::T=1.0)  #  Laplace likelihood with scale β

Laplace likelihood for regression:

\[ p(y|f) = \frac{1}{2\beta}\exp\left(-\frac{|y-f|}{\beta}\right)\]

(see the Wikipedia page)

For the analytical solution, it is augmented via:

\[ p(y|f,\omega) = \mathcal{N}(y|f,\omega^{-1})\]

where $ω \sim Exp(ω | 1/(2β^2))$ and Exp is the exponential distribution. We use the variational distribution $q(ω) = GIG(ω | a,b,p)$.

LogisticLikelihood()

Bernoulli likelihood with a logistic link:

\[ p(y|f) = \sigma(yf) = \frac{1}{1+\exp(-yf)},\]

(for more information see the Wikipedia page)

For the analytic version of the likelihood, it is augmented via:

\[ p(y|f,\omega) = \exp\left(\frac{1}{2}\left(yf - (yf)^2 \omega\right)\right)\]

where $ω \sim PG(ω | 1, 0)$ and PG is the Pólya-Gamma distribution. See the paper: Efficient Gaussian Process Classification Using Pólya-Gamma Data Augmentation.

HeteroscedasticLikelihood(λ::T=1.0)

Gaussian likelihood with heteroscedastic noise given by another GP:

\[ p(y|f,g) = \mathcal{N}(y|f,(\lambda\,\sigma(g))^{-1})\]

where σ is the logistic function.

The augmentation will be described in a future paper.

BayesianSVM()

The Bayesian SVM is a Bayesian interpretation of the classical SVM.

\[ p(y|f) \propto \exp(-2\max(1-yf,0))\]

For the analytic version of the likelihood, it is augmented via:

\[ p(y|f,\omega) = \frac{1}{\sqrt{2\pi\omega}}\exp\left(-\frac{(1+\omega-yf)^2}{2\omega}\right)\]

where $ω \sim 𝟙_{[0,∞)}$ has an improper prior (its posterior is however a valid distribution, a Generalized Inverse Gaussian). For a reference, see this paper.

SoftMaxLikelihood()

Multiclass likelihood with Softmax transformation:

\[ p(y=i|\{f_k\}) = \frac{\exp(f_i)}{\sum_k \exp(f_k)}\]

There is no possible augmentation for this likelihood.

LogisticSoftMaxLikelihood()

The multiclass likelihood with a logistic-softmax mapping:

\[ p(y=i|\{f_k\}_{k=1}^K) = \frac{\sigma(f_i)}{\sum_{k=1}^K \sigma(f_k)}\]

where σ is the logistic function. This likelihood has the same properties as softmax.

For the analytical version, the likelihood is augmented multiple times. More details can be found in the paper Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation.

PoissonLikelihood(λ::T=1.0)

Poisson likelihood where a Poisson distribution is defined at every point in space (careful, it is different from continuous Poisson processes):

\[ p(y|f) = \text{Poisson}(y|\lambda\,\sigma(f))\]

where σ is the logistic function. Augmentation details will be released at some point (open an issue if you want to see them).

NegBinomialLikelihood(r::Int=10)

Negative Binomial likelihood with number of failures r:

\[ p(y|r,f) = \binom{y+r-1}{y}(1-\sigma(f))^r \sigma(f)^y\]

where σ is the logistic function.

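A short sketch of how the likelihood objects above are constructed before being passed to a VGP or SVGP model (the parameter values are arbitrary):

# Regression likelihoods
gauss = GaussianLikelihood(1e-3)      # Gaussian noise with variance 1e-3
stut  = StudentTLikelihood(3.0)       # Student-t with ν = 3 degrees of freedom
lap   = LaplaceLikelihood(1.0)        # Laplace with scale β = 1

# Classification and count likelihoods
logit = LogisticLikelihood()          # binary classification
lsm   = LogisticSoftMaxLikelihood()   # multiclass classification
pois  = PoissonLikelihood(2.0)        # count data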

Inference Types

AnalyticVI

Variational Inference solver for conjugate or conditionally conjugate likelihoods (non-Gaussian likelihoods are made conjugate via augmentation). All the data is used at each iteration (use AnalyticSVI for stochastic updates).

AnalyticVI(;ϵ::T=1e-5)

Keyword arguments

- `ϵ::T` : convergence criterion

AnalyticSVI

Stochastic Variational Inference solver for conjugate or conditionally conjugate likelihoods (non-Gaussian likelihoods are made conjugate via augmentation).

AnalyticSVI(nMinibatch::Integer;ϵ::T=1e-5,optimizer::Optimizer=InverseDecay())

Argument

- `nMinibatch::Integer` : Number of samples per mini-batch

Keyword arguments

- `ϵ::T` : convergence criterion
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `InverseDecay()` (ρ=(τ+iter)^-κ)
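A sketch pairing the two analytic solvers with the model types above (X and y as in the earlier sketches; the mini-batch size and number of inducing points are arbitrary):

# Full-batch analytic variational inference for a non-sparse model
vgp = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(), AnalyticVI(ϵ=1e-6))

# Stochastic analytic variational inference with mini-batches of 100 samples and 50 inducing points
svgp = SVGP(X, y, RBFKernel(1.0), LogisticLikelihood(),
            AnalyticSVI(100; optimizer=InverseDecay()), 50)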
GibbsSampling(;ϵ::T=1e-5,nBurnin::Int=100,samplefrequency::Int=1)

Draw samples from the true posterior via Gibbs Sampling.

Keyword arguments

- `ϵ::T` : convergence criterion
- `nBurnin::Int` : Number of samples discarded before starting to save samples
- `samplefrequency::Int` : Frequency of sampling

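For instance, a sampling-based classification model could be set up as follows (arbitrary settings; check the Compatibility Table for which likelihoods support sampling):

model = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(),
            GibbsSampling(nBurnin=100, samplefrequency=2))
train!(model; iterations=1000)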

QuadratureVI

Variational Inference solver approximating the gradients via numerical integration (Gauss-Hermite quadrature).

QuadratureVI(;ϵ::T=1e-5,nGaussHermite::Integer=20,optimizer::Optimizer=Momentum(η=0.0001))

Keyword arguments

- `ϵ::T` : convergence criterion
- `nGaussHermite::Int` : Number of points for the integral estimation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Momentum(η=0.0001)`

QuadratureSVI

Stochastic Variational Inference solver approximating the gradients via numerical integration (Gauss-Hermite quadrature).

QuadratureSVI(nMinibatch::Integer;ϵ::T=1e-5,nGaussHermite::Integer=20,optimizer::Optimizer=Adam(α=0.1))

Argument

- `nMinibatch::Integer` : Number of samples per mini-batch

Keyword arguments

- `ϵ::T` : convergence criterion (can be user defined)
- `nGaussHermite::Int` : Number of points for the integral estimation (Gauss-Hermite quadrature)
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`
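A sketch of both quadrature solvers (arbitrary settings; check the Compatibility Table for which likelihoods support quadrature-based inference):

# Numerical variational inference with 30 Gauss-Hermite points
model = VGP(X, y, RBFKernel(1.0), LogisticLikelihood(),
            QuadratureVI(nGaussHermite=30))

# Stochastic counterpart with mini-batches of 100 samples and 50 inducing points
smodel = SVGP(X, y, RBFKernel(1.0), LogisticLikelihood(),
              QuadratureSVI(100), 50)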

MCIntegrationVI(;ϵ::T=1e-5,nMC::Integer=1000,optimizer::Optimizer=Adam(α=0.1))

Variational Inference solver by approximating gradients via MC Integration.

Keyword arguments

- `ϵ::T` : convergence criterion (can be user defined)
- `nMC::Int` : Number of samples per data point for the integral evaluation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`

MCIntegrationSVI(nMinibatch::Integer;ϵ::T=1e-5,nMC::Integer=1000,optimizer::Optimizer=Adam(α=0.1))

Stochastic Variational Inference solver by approximating gradients via Monte Carlo integration

Argument

- `nMinibatch::Integer` : Number of samples per mini-batch

Keyword arguments

- `ϵ::T` : convergence criterion (can be user defined)
- `nMC::Int` : Number of samples per data point for the integral evaluation
- `optimizer::Optimizer` : Optimizer used for the variational updates. Should be an Optimizer object from the [GradDescent.jl](https://github.com/jacobcvt12/GradDescent.jl) package. Default is `Adam(α=0.1)`
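As an illustration, the softmax likelihood, which has no augmentation, can be trained with Monte Carlo gradient estimation (hypothetical multiclass data, arbitrary settings; see the Compatibility Table):

X = rand(500, 4)
y = rand(1:3, 500)   # three classes

model = VGP(X, y, RBFKernel(1.0), SoftMaxLikelihood(),
            MCIntegrationVI(nMC=500))
train!(model; iterations=200)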

Functions and methods

train!(model::AbstractGP;iterations::Integer=100,callback=0,convergence=0)

Function to train the given GP model.

Keyword arguments

  • iterations::Int : Number of iterations (not necessarily epochs!) for training
  • callback::Function : Callback function called at every iteration. Should be of type function(model,iter) ... end
  • convergence::Function : Convergence function called at every iteration; should return a scalar and take the same arguments as callback
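A usage sketch with a simple logging callback (the model is assumed to have been constructed as in the earlier sketches):

# Print the iteration number every 10 iterations
logging = (model, iter) -> iter % 10 == 0 && println("iteration $iter")

train!(model; iterations=100, callback=logging)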
predict_f (docstring currently unavailable)

predict_y(model::AbstractGP,X_test::AbstractMatrix)

Return:

- the predictive mean of X_test for regression
- the sign of X_test for classification
- the most likely class for multi-class classification
- the expected number of events for an event likelihood


proba_y(model::AbstractGP,X_test::AbstractMatrix)

Return the probability distribution p(y_test|model,X_test):

- Tuple of vectors of mean and variance for regression
- Vector of probabilities of y_test = 1 for binary classification
- DataFrame with one column per class containing the probability of each class for multi-class classification
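A prediction sketch following training (X_test is arbitrary new data with the same number of dimensions as X):

X_test = rand(20, 3)

ŷ = predict_y(model, X_test)   # point predictions (mean, sign, or most likely class)
py = proba_y(model, X_test)    # predictive distribution / probabilities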

Kernels

  • RBFKernel (docstring currently unavailable)
  • MaternKernel (docstring currently unavailable)

Kernel functions

  • kernelmatrix (docstring currently unavailable)
  • kernelmatrix! (docstring currently unavailable)
  • getvariance (docstring currently unavailable)
  • getlengthscales (docstring currently unavailable)

Prior Means

ZeroMean

ZeroMean()

Construct a prior mean set to 0; it cannot be changed.


ConstantMean

ConstantMean(c::T=1.0;opt::Optimizer=Adam(α=0.01))

Construct a prior mean with a constant value c. Optionally set an optimizer opt (Adam(α=0.01) by default).


EmpiricalMean

EmpiricalMean(c::V=1.0;opt::Optimizer=Adam(α=0.01)) where {V<:AbstractVector{<:Real}}

Construct a constant mean with values c. Optionally give an optimizer opt (Adam(α=0.01) by default).

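These objects are passed through the mean keyword of the model constructors, for example (same kernel and data assumptions as the earlier sketches):

# GP regression with a constant prior mean of 0.5, optimized with Adam
model = GP(X, y, RBFKernel(1.0);
           mean=ConstantMean(0.5; opt=Adam(α=0.01)))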

Index