Multi-output Gaussian Processes in PyMC [GSoC Week 01-03]
A personal note on the progress of incorporating Multi-output Gaussian Processes (MOGPs) into PyMC. Weeks 01-03 focus on implementing the Intrinsic Coregionalization Model (ICM).
This work is supported by GSoC, NumFOCUS, and the PyMC team.
1. What has been done?
In the previous weeks, I focused on implementing the Intrinsic Coregionalization Model (ICM) in PyMC.
- I started with a small goal: running an Intrinsic Coregionalization Model (ICM) in PyMC. Most of the code had already been developed in PyMC v3 by Bill Engels (one of my mentors), so I only needed to convert the PyMC v3 notebook into a PyMC v4 notebook.
- The next goal was to replicate the Coregionalized Regression Model example notebook from GPy. The result of ICM for this dataset is in this notebook. In addition, the example from GPyTorch was also translated into PyMC here, with 3-dimensional outputs.
- What about two or more outputs with real datasets? I used the datasets here, with 4 outputs: GOLD, OIL, NASDAQ, and USD. ICM seems to work reasonably well in this notebook, but it still needs further improvement.
There are several issues that I faced along the way:
The issue of Mass matrix contains zeros on the diagonal
This seems to be a common issue: "ValueError: Mass matrix contains zeros on the diagonal." is raised when the input y has shape [n,1].
Should we pass inputs and outputs as lists, similar to GPy: [x1, x2, x3] and [y1, y2, y3]? The advantage is that this can accommodate datasets of different sizes.
The output shape was also discussed in this pull request. I will need to look into it in more detail.
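To make the list-style question above concrete, here is a minimal NumPy sketch of how per-output lists of different lengths could be stacked into a single input matrix whose second column holds the output index, which is the layout that active_dims=[1] expects in the model further down. The helper name stack_outputs and the toy data are assumptions for illustration only; this is not an existing PyMC or GPy function.

```python
import numpy as np


def stack_outputs(xs, ys):
    """Stack per-output inputs/targets into one augmented data set.

    xs, ys: lists of 1-D arrays, one pair per output. Lengths may differ
    between outputs. Returns X of shape (N, 2), whose second column is the
    integer output index, and y of shape (N,).
    (Hypothetical helper, written only for this illustration.)
    """
    X_parts = []
    for idx, x in enumerate(xs):
        x = np.asarray(x, dtype=float).reshape(-1)
        index_col = np.full(x.shape[0], idx, dtype=float)
        X_parts.append(np.column_stack([x, index_col]))
    X = np.vstack(X_parts)
    # Flatten every target so the stacked y is 1-D, not [n, 1]
    y = np.concatenate([np.asarray(v, dtype=float).reshape(-1) for v in ys])
    return X, y


# Toy data: three outputs observed at 5, 3 and 4 input locations
x1, x2, x3 = np.linspace(0, 1, 5), np.linspace(0, 1, 3), np.linspace(0, 1, 4)
y1, y2, y3 = np.sin(x1), np.cos(x2), x3**2

X, y = stack_outputs([x1, x2, x3], [y1, y2, y3])
print(X.shape, y.shape)
```

With this layout the three outputs do not need the same number of observations, and y stays one-dimensional.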
How to use a Kronecker product?
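import numpy as np
import pymc as pm  # PyMC v4; `initval` replaces the PyMC v3 `testval` argument

with pm.Model() as model:
    # Hyperparameters of the kernel over the input (column 0 of X)
    ell = pm.Gamma("ell", alpha=2, beta=0.5)
    eta = pm.Gamma("eta", alpha=2, beta=0.5)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell, active_dims=[0])
    # Coregionalization kernel over the output index (column 1 of X)
    W = pm.Normal("W", mu=0, sigma=3, shape=(2, 2), initval=np.random.randn(2, 2))
    kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=2)
    coreg = pm.gp.cov.Coregion(input_dim=2, active_dims=[1], kappa=kappa, W=W)
    cov_func = coreg * cov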
This coreg * cov does not seem to be a Kronecker product?
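One way to examine this question is the following sketch in plain NumPy (it does not call PyMC). It assumes that multiplying two kernels multiplies their evaluated matrices element-wise, and that the coregion kernel looks up B = W W^T + diag(kappa) by output index. Under those assumptions, the element-wise product coincides with the Kronecker product of B and the input kernel matrix when every output is observed at the same inputs and the rows are stacked output by output. The helper exp_quad and all values are illustrative.

```python
import numpy as np


def exp_quad(x, xp, ls=1.0):
    """Squared-exponential kernel matrix between two 1-D input arrays."""
    d = x[:, None] - xp[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)


rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))
kappa = np.array([0.5, 1.5])
B = W @ W.T + np.diag(kappa)  # 2 x 2 coregionalization matrix

x = np.linspace(0, 1, 4)
# Same inputs for both outputs, stacked output by output:
# column 0 is the input, column 1 is the output index
X = np.column_stack([np.tile(x, 2), np.repeat([0, 1], len(x))])
idx = X[:, 1].astype(int)

# Element-wise product of the two kernels evaluated on the augmented X
K_input = exp_quad(X[:, 0], X[:, 0])      # 8 x 8, uses column 0 only
K_coreg = B[idx[:, None], idx[None, :]]   # 8 x 8, uses column 1 only
K_hadamard = K_coreg * K_input

# Kronecker product of B with the kernel matrix on the unique inputs
K_kron = np.kron(B, exp_quad(x, x))

same = np.allclose(K_hadamard, K_kron)
print(same)
```

When the outputs are observed at different inputs, the element-wise form is still well defined, whereas the Kronecker form requires a shared input grid.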
Several things that I plan to do:
- Implement the linear model of coregionalization (LMC) in PyMC, using Kronecker products with two or more different kernels
- Integrate ICM and LMC into the PyMC GP module [add and/or change several kernels]
- Write an example with real data sets. This may extend the example with 4 outputs in Part 1 above.
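As a rough sketch of the first item in the plan, the LMC covariance on a shared input grid can be written as a sum of Kronecker products, one term per input kernel, each with its own coregionalization matrix. The NumPy code below only illustrates that structure. It reuses the W W^T + diag(kappa) form for each coregionalization matrix, and the helper names, lengthscales, and random values are assumptions, not the planned PyMC implementation.

```python
import numpy as np


def exp_quad(x, ls):
    """Squared-exponential kernel matrix on a 1-D input grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)


def coreg_matrix(W, kappa):
    """Coregionalization matrix B = W W^T + diag(kappa)."""
    return W @ W.T + np.diag(kappa)


rng = np.random.default_rng(1)
x = np.linspace(0, 1, 5)
n_outputs = 3
lengthscales = [0.2, 1.0]  # two different input kernels (illustrative values)

n_total = n_outputs * len(x)
K_lmc = np.zeros((n_total, n_total))
for ls in lengthscales:
    W = rng.normal(size=(n_outputs, 1))          # rank-1 mixing weights
    kappa = rng.gamma(2.0, 1.0, size=n_outputs)  # positive diagonal term
    # One ICM-style term per input kernel; the LMC covariance is their sum
    K_lmc += np.kron(coreg_matrix(W, kappa), exp_quad(x, ls))

min_eig = np.linalg.eigvalsh(K_lmc).min()
print(K_lmc.shape, min_eig > -1e-8)
```

With a single lengthscale the loop reduces to one Kronecker term, which is the ICM case.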