This work is supported by GSoC, NumFOCUS, and the PyMC team.
In the last 12 weeks, I focused on implementing the Intrinsic Coregionalization Model (ICM) and the Linear Coregionalization Model (LCM) in PyMC. All the experimental code is published in this GitHub repository.
Weeks 10-12 focused on implementing ICM and LCM using the Kronecker product. Note that the Kronecker product approach only works when all outputs share the same input data. In addition, the kernels for the input data need to be stationary.
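For reference, the general Linear Coregionalization Model covariance has the form below (standard notation from the multi-output GP literature, not symbols from the code in this post); ICM is the special case with a single term, $Q = 1$:

$$ K = \sum_{q=1}^{Q} B_q \otimes K_q(x, x'), \qquad B_q = W_q W_q^{\top} + \mathrm{diag}(\kappa_q) $$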
I created a PR on the pymc-experimental GitHub repo. This is a work-in-progress (WIP) PR, as I still need to try and test different API options for different kinds of input data. The PR adds a MultiOutputGP supporting both the Hadamard product and the Kronecker product. The project allowed me to learn more about Gaussian Processes (GPs), their advantages, and also their limitations. I think GPs have huge potential for spatial and temporal (time-series) data sets.
Besides, implementing GPs helped me further understand the Multivariate Normal distribution :) There is still a lot to learn and do; I'm especially interested in learning more about other methods for time-series data, and in comparing the performance of these models.
Finally, I would like to thank the PyMC dev team, especially my mentors Chris Fonnesbeck and Bill Engels, for their great guidance and support. I would definitely not have been able to carry out the project without their insightful suggestions. I would love to get involved and contribute more to the PyMC community after this project. Thank you also to NumFOCUS and the GSoC program for providing me the opportunity to work on the Multi-output Gaussian Processes in PyMC project.
This work is supported by GSoC, NumFOCUS, and the PyMC team.
Given input data $x$ and different outputs $o$, the ICM kernel $K$ is calculated by the Kronecker product:
$$ K = K_1(x, x') \otimes K_2(o, o') $$

NOTE: This Kronecker product approach only works when all outputs share the same input data. In addition, the kernels for the input data need to be stationary.
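As a minimal NumPy sketch of the shapes involved (illustrative values only; the kernel, W, and kappa below are made up and are not the PyMC model that follows):

import numpy as np

def rbf(x, ls=0.3):
    """Toy exponentiated quadratic kernel on a 1-D grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

x = np.linspace(0, 1, 50)            # shared inputs for all outputs (N = 50)
K1 = rbf(x)                          # input kernel K_1(x, x'), shape (50, 50)

W = np.random.randn(3, 2)            # 3 outputs, rank-2 mixing (made-up values)
kappa = np.random.rand(3)
K2 = W @ W.T + np.diag(kappa)        # output kernel K_2(o, o'), shape (3, 3)

K = np.kron(K1, K2)                  # ICM covariance over all outputs
print(K.shape)                       # (150, 150) = (N * n_outputs, N * n_outputs)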
import numpy as np
import pymc as pm
from pymc.gp.cov import Covariance
import arviz as az
import matplotlib.pyplot as plt
# set the seed
np.random.seed(1)
import math
%matplotlib inline
%load_ext autoreload
%reload_ext autoreload
%autoreload 2
N = 50
train_x = np.linspace(0, 1, N)
train_y = np.stack([
np.sin(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (1 * math.pi)) + np.random.randn(len(train_x)) * 0.1,
], -1)
train_x.shape, train_y.shape
fig, ax = plt.subplots(1,1, figsize=(12,5))
ax.scatter(train_x, train_y[:,0])
ax.scatter(train_x, train_y[:,1])
ax.scatter(train_x, train_y[:,2])
plt.legend(["y1", "y2", "y3"])
train_x.shape, train_y.shape
x = train_x.reshape(-1,1)
y = train_y.reshape(-1,1)
x.shape, y.shape
task_i = np.linspace(0, 2, 3)[:, None]
Xs = [x, task_i] # For training
Xs[0].shape, Xs[1].shape, x.shape
M = 100
xnew = np.linspace(-0.5, 1.5, M)
Xnew = pm.math.cartesian(xnew, task_i) # For prediction
Xnew.shape
Xs[0].shape, Xs[1]
With the Gaussian noise likelihood, the posterior solve uses the Cholesky factorization $K + \sigma^2 I = L L^\top$:

$$ (K + \sigma^2 I)\,\alpha = L L^\top \alpha = y $$

Let $\beta = L^\top \alpha$. Then $L \beta = y$ gives $\beta = L \backslash y$, and $L^\top \alpha = \beta$ gives $\alpha = L^\top \backslash (L \backslash y)$, i.e. two triangular solves instead of forming an explicit inverse.
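A small NumPy/SciPy sketch of these two triangular solves on a toy 5x5 matrix (made-up values, not the model below), checked against a direct solve:

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
K_y = A @ A.T + 0.1 * np.eye(5)                      # stands in for K + sigma^2 * I
y_toy = rng.normal(size=5)

L = cholesky(K_y, lower=True)                        # K_y = L @ L.T
beta = solve_triangular(L, y_toy, lower=True)        # beta = L \ y
alpha = solve_triangular(L.T, beta, lower=False)     # alpha = L.T \ beta

print(np.allclose(alpha, np.linalg.solve(K_y, y_toy)))   # True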
with pm.Model() as model:
# Kernel: K_1(x,x')
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=3, beta=1)
cov = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=ell)
# Coregion B matrix: K_2(o,o')
W = pm.Normal("W", mu=0, sigma=3, shape=(3,2), initval=np.random.randn(3,2))
kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=3)
coreg = pm.gp.cov.Coregion(input_dim=1, kappa=kappa, W=W)
# Specify the GP. The default mean function is `Zero`.
mogp = pm.gp.LatentKron(cov_funcs=[cov, coreg])
sigma = pm.HalfNormal("sigma", sigma=3)
    # Place a GP prior over the function f.
f = mogp.prior("f", Xs=Xs)
y_ = pm.Normal("y_", mu=f, sigma=sigma, observed=y.squeeze())
coreg.full(task_i).eval()
pm.model_to_graphviz(model)
%%time
with model:
gp_trace = pm.sample(500, chains=1)
%%time
with model:
preds = mogp.conditional("preds", Xnew, jitter=1e-6)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'])
pm.model_to_graphviz(model)
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
f_pred.shape
from pymc.gp.util import plot_gp_dist
fig, axes = plt.subplots(1,1, figsize=(8,4))
plt.plot(x, train_y[:,0], 'ok', ms=3, alpha=0.5, label="Data 1");
plot_gp_dist(axes, f_pred[:, 0:N], x)
plot_gp_dist(axes, f_pred[:,Xnew[:,1] == 0], xnew)
plt.show()
from pymc.gp.util import plot_gp_dist
fig, axes = plt.subplots(1,1, figsize=(8,4))
plt.plot(x, train_y[:,1], 'ok', ms=3, alpha=0.5, label="Data 1");
plot_gp_dist(axes, f_pred[:, N:2*N], x)
plot_gp_dist(axes, f_pred[:,Xnew[:,1] == 1], xnew)
plt.show()
X = pm.math.cartesian(x, task_i)
x.shape, task_i.shape, X.shape
with pm.Model() as model:
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=3, beta=1)
cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)
W = pm.Normal("W", mu=0, sigma=3, shape=(3,2), initval=np.random.randn(3,2))
kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=3)
coreg = pm.gp.cov.Coregion(input_dim=1, kappa=kappa, W=W)
cov_func = pm.gp.cov.Kron([cov, coreg])
sigma = pm.HalfNormal("sigma", sigma=3)
gp = pm.gp.Marginal(cov_func=cov_func)
y_ = gp.marginal_likelihood("f", X, y.squeeze(), noise=sigma)
cov(x).eval().shape, coreg(task_i).eval().shape, cov_func(X).eval().shape
%%time
with model:
gp_trace = pm.sample(500, chains=1)
%%time
with model:
preds = gp.conditional("preds", Xnew, jitter=1e-6)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'])
pm.model_to_graphviz(model)
Xnew.shape
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
f_pred.shape
from pymc.gp.util import plot_gp_dist
fig, axes = plt.subplots(3,1, figsize=(10,10))
for idx in range(3):
axes[idx].plot(x, train_y[:,idx], 'ok', ms=3, alpha=0.5, label=f"Data {idx}");
plot_gp_dist(axes[idx], f_pred[:,Xnew[:,1] == idx], xnew,
fill_alpha=0.5, samples_alpha=0.1)
plt.show()
az.summary(gp_trace)
az.plot_trace(gp_trace);
plt.tight_layout()
X = pm.math.cartesian(x, task_i)
x.shape, task_i.shape, X.shape
with pm.Model() as model:
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=3, beta=1)
cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)
ell2 = pm.Gamma("ell2", alpha=2, beta=0.5)
eta2 = pm.Gamma("eta2", alpha=3, beta=1)
cov2 = eta2**2 * pm.gp.cov.Matern32(1, ls=ell2)
W = pm.Normal("W", mu=0, sigma=3, shape=(3,2), initval=np.random.randn(3,2))
kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=3)
coreg = pm.gp.cov.Coregion(input_dim=1, kappa=kappa, W=W)
cov_func = pm.gp.cov.Kron([cov+cov2, coreg])
sigma = pm.HalfNormal("sigma", sigma=3)
gp = pm.gp.Marginal(cov_func=cov_func)
y_ = gp.marginal_likelihood("f", X, y.squeeze(), noise=sigma)
cov(x).eval().shape, coreg(task_i).eval().shape, cov_func(X).eval().shape
%%time
with model:
gp_trace = pm.sample(500, chains=1)
%%time
with model:
preds = gp.conditional("preds", Xnew, jitter=1e-6)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'])
pm.model_to_graphviz(model)
Xnew.shape
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
f_pred.shape
from pymc.gp.util import plot_gp_dist
fig, axes = plt.subplots(3,1, figsize=(10,10))
for idx in range(3):
axes[idx].plot(x, train_y[:,idx], 'ok', ms=3, alpha=0.5, label=f"Data {idx}");
plot_gp_dist(axes[idx], f_pred[:,Xnew[:,1] == idx], xnew,
fill_alpha=0.5, samples_alpha=0.1)
plt.show()
az.summary(gp_trace)
%load_ext watermark
%watermark -n -u -v -iv -w
This work is supported by GSoC, NumFOCUS, and the PyMC team.
Given input data $x$ and different outputs $o$, the ICM kernel $K$ is calculated by the Hadamard (element-wise) product:

$$ K = K_1(x, x') * K_2(o, o') $$

where $K_2(o, o')$ is broadcast to the shape of $K_1(x, x')$ using the Coregion kernel.

NOTE: This Hadamard product works with either the same input data or different input data across outputs.
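A rough NumPy sketch of that broadcasting (assumed shapes only, not the PyMC internals): each row of the stacked input carries an output index, and the coregionalization matrix is indexed by those indices before the element-wise product.

import numpy as np

def rbf(x, ls=0.3):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

# Stacked inputs: column 0 holds x, column 1 holds the output index
x1 = np.linspace(0, 1, 4)                       # inputs for output 0
x2 = np.linspace(0, 1, 6)                       # inputs for output 1 (a different size is fine)
X_stacked = np.vstack([
    np.column_stack([x1, np.zeros_like(x1)]),
    np.column_stack([x2, np.ones_like(x2)]),
])

W = np.random.randn(2, 1)                       # made-up coregionalization parameters
kappa = np.random.rand(2)
B = W @ W.T + np.diag(kappa)                    # output kernel K_2(o, o')

o = X_stacked[:, 1].astype(int)
K = rbf(X_stacked[:, 0]) * B[o[:, None], o[None, :]]   # Hadamard (element-wise) product
print(K.shape)                                          # (10, 10)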
import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt
# set the seed
np.random.seed(1)
from multi_ouputs import build_XY, ICM, LCM, MultiMarginal
from mo import MultiOutputMarginal
import math
%matplotlib inline
%load_ext autoreload
%reload_ext autoreload
%autoreload 2
N = 50
train_x = np.linspace(0, 1, N)
train_y = np.stack([
np.sin(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (1 * math.pi)) + np.random.randn(len(train_x)) * 0.1,
], -1)
train_x.shape, train_y.shape
fig, ax = plt.subplots(1,1, figsize=(12,5))
ax.scatter(train_x, train_y[:,0])
ax.scatter(train_x, train_y[:,1])
ax.scatter(train_x, train_y[:,2])
plt.legend(["y1", "y2", "y3"])
train_x.shape, train_y.shape
np.vstack([train_y[:,0], train_y[:,1], train_y[:,2]]).shape
x = train_x.reshape(-1,1)
X, Y, _ = build_XY([x,x,x],
[train_y[:,0].reshape(-1,1),
train_y[:,1].reshape(-1,1),
train_y[:,2].reshape(-1,1)])
x.shape, X.shape, Y.shape
M = 100
x_new = np.linspace(-0.5, 1.5, M)[:, None]
X_new, _, _ = build_XY([x_new, x_new, x_new])
X_new.shape
import aesara.tensor as at
with pm.Model() as model:
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=3, beta=1)
cov = eta**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=ell, active_dims=[0])
W = np.random.rand(3,2) # (n_outputs, w_rank)
kappa = np.random.rand(3)
B = pm.Deterministic('B', at.dot(W, W.T) + at.diag(kappa))
sigma = pm.HalfNormal("sigma", sigma=3)
mogp = MultiOutputMarginal(means=0, kernels=[cov], input_dim=2, active_dims=[1], num_outputs=3, B=B)
y_ = mogp.marginal_likelihood("f", X, Y.squeeze(), noise=sigma)
pm.model_to_graphviz(model)
%%time
with model:
gp_trace = pm.sample(500, chains=1)
%%time
with model:
preds = mogp.conditional("preds", X_new)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'], random_seed=42)
pm.model_to_graphviz(model)
from pymc.gp.util import plot_gp_dist
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
fig, axes = plt.subplots(3,1, figsize=(10,10))
n_points = M  # points per output in X_new (assumed; M = 100 above)
for idx in range(3):
plot_gp_dist(axes[idx], f_pred[:,n_points*idx:n_points*(idx+1)],
X_new[n_points*idx:n_points*(idx+1),0],
palette="Blues", fill_alpha=0.5, samples_alpha=0.1)
    axes[idx].plot(x, train_y[:,idx], 'ok', ms=3, alpha=0.5, label=f"Data {idx}");
az.summary(gp_trace)
with pm.Model() as model:
# Priors
ell = pm.Gamma("ell", alpha=2, beta=0.5, shape=2)
eta = pm.Gamma("eta", alpha=3, beta=1, shape=2)
kernels = [pm.gp.cov.ExpQuad, pm.gp.cov.Matern32]
sigma = pm.HalfNormal("sigma", sigma=3)
# Define a list of covariance functions
cov_list = [eta[idx] ** 2 * kernel(2,ls=ell[idx], active_dims=[0])
for idx, kernel in enumerate(kernels)]
# Define a Multi-output GP
mogp = MultiOutputMarginal(means=0, kernels=cov_list, input_dim=2, active_dims=[1], num_outputs=3)
y_ = mogp.marginal_likelihood("f", X, Y.squeeze(), noise=sigma)
pm.model_to_graphviz(model)
# x1, y1
# x2, y2
# x3, y3
%%time
with model:
gp_trace = pm.sample(500, chains=1)
%%time
with model:
preds = mogp.conditional("preds", X_new)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'], random_seed=42)
pm.model_to_graphviz(model)
from pymc.gp.util import plot_gp_dist
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
fig, axes = plt.subplots(3,1, figsize=(10,10))
n_points = M  # points per output in X_new (assumed; M = 100 above)
for idx in range(3):
plot_gp_dist(axes[idx], f_pred[:,n_points*idx:n_points*(idx+1)],
X_new[n_points*idx:n_points*(idx+1),0],
palette="Blues", fill_alpha=0.5, samples_alpha=0.1)
    axes[idx].plot(x, train_y[:,idx], 'ok', ms=3, alpha=0.5, label=f"Data {idx}");
az.summary(gp_trace)
az.plot_trace(gp_trace);
plt.tight_layout()
%load_ext watermark
%watermark -n -u -v -iv -w
This work is supported by GSoC, NumFOCUS, and the PyMC team.
import math
import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt
# set the seed
np.random.seed(1)
%matplotlib inline
%load_ext autoreload
%reload_ext autoreload
%autoreload 2
train_x = np.linspace(0, 1, 50)
train_y = np.stack([
np.sin(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (2 * math.pi)) + np.random.randn(len(train_x)) * 0.2,
np.cos(train_x * (1 * math.pi)) + np.random.randn(len(train_x)) * 0.1,
], -1)
train_x.shape, train_y.shape
fig, ax = plt.subplots(1,1, figsize=(12,5))
ax.scatter(train_x, train_y[:,0])
ax.scatter(train_x, train_y[:,1])
ax.scatter(train_x, train_y[:,2])
plt.legend(["sin", "cos"])
x = train_x
xx = np.concatenate((x, x, x), axis=0)[:,None]
n = len(x)
idx2 = np.ones(n) + 1
idx = np.concatenate((np.zeros(n), np.ones(n), idx2))[:,None]
X = np.concatenate((xx, idx), axis=1)
y = np.concatenate((train_y[:,0], train_y[:,1], train_y[:,2]))
x.shape, X.shape, y.shape
X.shape, y.shape
with pm.Model() as model:
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=2, beta=0.5)
cov = eta**2 * pm.gp.cov.ExpQuad(2, ls=ell, active_dims=[0])
ell2 = pm.Gamma("ell2", alpha=2, beta=0.5)
eta2 = pm.Gamma("eta2", alpha=2, beta=0.5)
    cov2 = eta2**2 * pm.gp.cov.Matern32(2, ls=ell2, active_dims=[0])
W = pm.Normal("W", mu=0, sigma=3, shape=(3,2), initval=np.random.randn(3,2))
kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=3)
coreg = pm.gp.cov.Coregion(input_dim=2, active_dims=[1], kappa=kappa, W=W)
W2 = pm.Normal("W2", mu=0, sigma=3, shape=(3,2), initval=np.random.randn(3,2))
kappa2 = pm.Gamma("kappa2", alpha=1.5, beta=1, shape=3)
coreg2 = pm.gp.cov.Coregion(input_dim=2, active_dims=[1], kappa=kappa2, W=W2)
cov_func1 = coreg * cov #pm.gp.cov.Prod([coreg, cov])
cov_func2 = coreg2 * cov2 #pm.gp.cov.Prod([coreg2, cov2])
cov_func = cov_func1 + cov_func2 #pm.gp.cov.Add([cov_func1, cov_func2])
sigma = pm.HalfNormal("sigma", sigma=3)
gp = pm.gp.Marginal(cov_func=cov_func)
y_ = gp.marginal_likelihood("f", X, y, noise=sigma)
%%time
with model:
gp_trace = pm.sample(500, chains=1)
x_new = np.linspace(-0.5, 1.5, 200)[:, None]
xx_new = np.concatenate((x_new, x_new, x_new), axis=0)
idx2 = np.ones(200) + 1
idx2 = np.concatenate((np.zeros(200), np.ones(200), idx2))[:, None]
X_new = np.concatenate((xx_new, idx2), axis=1)
X_new.shape
with model:
preds = gp.conditional("preds", X_new)
gp_samples = pm.sample_posterior_predictive(gp_trace, var_names=['preds'], random_seed=42)
from pymc.gp.util import plot_gp_dist
fig = plt.figure(figsize=(12,5))
ax = fig.gca()
f_pred = gp_samples.posterior_predictive["preds"].sel(chain=0)
plot_gp_dist(ax, f_pred[:,:200], X_new[:200,0], palette="Blues", fill_alpha=0.5, samples_alpha=0.1)
ax.plot(x, train_y[:,0], 'ok', ms=3, alpha=0.5, label="Data 1");
from pymc.gp.util import plot_gp_dist
fig = plt.figure(figsize=(12,5))
ax = fig.gca()
plot_gp_dist(ax, f_pred[:,200:400], X_new[200:400,0], palette="Blues", fill_alpha=0.9, samples_alpha=0.1)
ax.plot(x, train_y[:,1], 'ok', ms=3, alpha=0.5, label="Data 2");
ax.set_ylim([-4,4])
from pymc.gp.util import plot_gp_dist
fig = plt.figure(figsize=(12,5))
ax = fig.gca()
plot_gp_dist(ax, f_pred[:,400:], X_new[400:,0], palette="Blues", fill_alpha=0.9, samples_alpha=0.1)
ax.plot(x, train_y[:,2], 'ok', ms=3, alpha=0.5, label="Data 3");
ax.set_ylim([-4,4])
az.summary(gp_trace)
az.plot_trace(gp_trace);
%load_ext watermark
%watermark -n -u -v -iv -w
This work is supported by GSoC, NumFOCUS, and the PyMC team.
In the previous weeks, I focused on implementing the Intrinsic Coregionalization Model (ICM) in PyMC.
In the beginning, I started with a small goal: to run an Intrinsic Coregionalization Model (ICM) in PyMC. The main part of the code was already developed in PyMC v3 by Bill Engels (one of my mentors), so I just needed to convert the PyMC v3 notebook into a PyMC v4 notebook.
The next goal was replicating the Coregionalized Regression Model example notebook in GPy. The result of ICM for this dataset is in this notebook. In addition, the example from GPyTorch has also been translated into PyMC here, with 3-dimensional outputs.
What about two or more outputs with real datasets? Using the data sets here with 4 outputs (GOLD, OIL, NASDAQ, and USD), it seems to work alright in this notebook, but it still needs further improvement.
There are several issues that I faced along the way:
This seems to be a popular issue: `ValueError: Mass matrix contains zeros on the diagonal.` when the input y has shape [n, 1].
Should we use inputs and outputs as lists similar to GPy, i.e. [x1, x2, x3] and [y1, y2, y3]? The pro is that it can handle datasets of different sizes.
The output shape was also discussed on this pull request. I will need to look into it in detail.
with pm.Model() as model:
ell = pm.Gamma("ell", alpha=2, beta=0.5)
eta = pm.Gamma("eta", alpha=2, beta=0.5)
cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell, active_dims=[0])
W = pm.Normal("W", mu=0, sigma=3, shape=(2,2), testval=np.random.randn(2,2))
kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=2)
coreg = pm.gp.cov.Coregion(input_dim=2, active_dims=[1], kappa=kappa, W=W)
cov_func = coreg * cov
This coreg * cov does not seem to be a Kronecker product? (A quick numeric check is sketched below.)
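A quick NumPy check (my own sketch, independent of PyMC): when X is the full Cartesian product of the inputs and the output indices, ordered as pm.math.cartesian(x, task_i) produces, the element-wise product of the two kernels evaluated on the stacked X does coincide with the Kronecker product of the two small kernel matrices; for arbitrary stacked X it does not.

import numpy as np

def rbf(x, ls=0.3):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

x = np.linspace(0, 1, 4)                        # N = 4 inputs shared by both outputs
W = np.random.randn(2, 1)
kappa = np.random.rand(2)
B = W @ W.T + np.diag(kappa)                    # 2 x 2 coregionalization matrix

# Cartesian product with the output index varying fastest
X_cart = np.array([[xi, oi] for xi in x for oi in range(2)])
o = X_cart[:, 1].astype(int)

K_hadamard = rbf(X_cart[:, 0]) * B[o[:, None], o[None, :]]
K_kron = np.kron(rbf(x), B)
print(np.allclose(K_hadamard, K_kron))          # True for this Cartesian ordering

So on a full Cartesian grid the two formulations agree (up to row ordering); the Hadamard form additionally covers outputs observed at different inputs.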
Several things that I plan to do:
I also found the PyMCon2020 talk My Journey in Learning and Relearning Bayesian Statistics by Ali Akbar Septiandri really helpful.
The lecture videos and notes from the Machine Learning for Intelligent Systems course at Cornell University are a great introduction to general kernels such as Linear, Polynomial, Radial Basis Function (RBF) (aka the Gaussian kernel), the Exponential kernel, and more:
https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote14.html
Note that not every function $K(\cdot,\cdot) \to \mathbb{R}$ can be used as a kernel. The matrix $K(x_i, x_j)$ has to correspond to real inner products after some transformation $x \to \phi(x)$, and this is the case if and only if $K$ is positive semi-definite.
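For instance, a quick numerical check (my own sketch, not from the lecture notes) that an exponentiated-quadratic Gram matrix is positive semi-definite, i.e. its eigenvalues are non-negative up to floating-point error:

import numpy as np

x = np.random.randn(30, 1)
sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dist)                      # exponentiated quadratic Gram matrix

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-8)                   # True: positive semi-definite up to round-off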
Later, to learn more, A Primer on Gaussian Processes for Regression Analysis by Chris Fonnesbeck (PyData NYC 2019, Youtube link; notebooks on Github link) is a great place to start learning about GPs. He starts with a simple regression problem, then moves on to a simple Gaussian Process model using PyMC.
To understand Gaussian Processes more deeply, I found this lecture on Gaussian Processes from Cornell University really helpful. Many thanks to Kilian Weinberger for uploading his notes as well as lecture videos publicly.
The Gaussian Process Summer Schools are a great place to learn about various topics in GPs. The materials and slides can be found on the gpschool GitHub, while the recordings were published on YouTube.
I would suggest starting with the 2017 Gaussian Process Summer School, as that year has a comprehensive introduction to GPs as well as other topics. However, if you want to check more up-to-date topics on GPs, just watch the recent workshops.
On kernels:
Deep dive into GPs by implementing them from scratch. Building GPs with numpy and scipy is a good way to deeply understand how GPs work. From that, I think it also gives more insight into Multivariate Normal distributions.
At the beginning, it is kind of difficult to understand and work with GPs; it needs resilience. I have watched and re-watched some videos and played with the notebooks several times.
Knowing GPs helps with understanding parametric Bayesian models and distributions, especially the Multivariate Normal distribution.
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import numpy as np
import scipy
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib.gridspec as gridspec
import seaborn as sns
# Set matplotlib and seaborn plotting style
sns.set_style('darkgrid')
np.random.seed(42)
def exp_quadratic(xa, xb):
"""Exponentiated quadratic with σ=1"""
    # L2 distance (squared Euclidean)
sq_norm = -0.5 * scipy.spatial.distance.cdist(xa, xb, 'sqeuclidean')
return np.exp(sq_norm)
xlim = (-3, 3)  # input range for the grid (assumed value)
X = np.expand_dims(np.linspace(*xlim, 25), 1)
Σ = exp_quadratic(X, X)
plt.imshow(Σ, cmap=cm.YlGnBu);
zero = np.array([[0]])
Σ0 = exp_quadratic(X, zero)
plt.plot(X[:,0], Σ0[:,0]);
n_samples = 100
n_funcs = 8
X = np.expand_dims(np.linspace(-4,4, n_samples), 1)
Σ = exp_quadratic(X, X)
ys = np.random.multivariate_normal(mean=np.zeros(n_samples), cov=Σ, size=n_funcs)
for i in range(n_funcs):
plt.plot(X, ys[i], linestyle='-', marker='o', markersize=3)
plt.xlabel('$x$', fontsize=13)
plt.ylabel('$y = f(x)$', fontsize=13)
plt.title((
f'{n_funcs} different function realizations at {n_samples} points\n'
'sampled from a Gaussian process with exponentiated quadratic kernel'))
plt.xlim([-4, 4])
plt.show()
exponentiated_quadratic = exp_quadratic
A = np.array([[1,-2j],[2j,5]])
A, A.shape
L = np.linalg.cholesky(A)
np.dot(L, L.T.conj())
A = [[1,-2j],[2j,5]] # what happens if A is only array_like?
np.linalg.cholesky(A) # an ndarray object is returned
np.linalg.cholesky(np.matrix(A))
def GP(X1, y1, X2, kernel_func):
cov11 = kernel_func(X1, X1)
cov12 = kernel_func(X1, X2)
solved = scipy.linalg.solve(cov11, cov12, assume_a='pos').T
mu2 = solved @ y1
cov22 = kernel_func(X2, X2)
cov2 = cov22 - (solved @ cov12)
return mu2, cov2
def GP2(X1, y1, X2, kernel_func):
K11 = kernel_func(X1, X1)
K12 = kernel_func(X1, X2)
K22 = kernel_func(X2, X2)
#L = np.linalg.cholesky(K11)
mu2 = K12.T.dot(np.linalg.inv(K11)).dot(y1)
cov2 = K22 - K12.T.dot(np.linalg.inv(K11)).dot(K12)
return mu2, cov2
n1 = 40 # Train points
ny = 10 # Number of functions
domain = (-6, 6)
domain[0]+2, domain[1]-2, (n1, 1)
X1 = np.random.uniform(domain[0]+2, domain[1]-2, size=(n1, 1))
X1.shape
%%prun
f_sin = lambda x: (np.sin(x)).flatten()
n1 = 40 # Train points
n2 = 75 # Test points
ny = 5 # Number of functions
domain = (-6, 6)
X1 = np.random.uniform(domain[0]+2, domain[1]-2, size=(n1, 1))
y1 = f_sin(X1)
X2 = np.linspace(domain[0], domain[1], n2).reshape(-1, 1)
# mu2, cov2 = GP(X1, y1, X2, exp_quadratic)
mu2, cov2 = GP2(X1, y1, X2, exp_quadratic)
sigma2 = np.sqrt(np.diag(cov2))
y2 = np.random.multivariate_normal(mean=mu2, cov=cov2, size=ny)
fig, (ax1, ax2) = plt.subplots(
nrows=2, ncols=1, figsize=(6, 6))
# Plot the distribution of the function (mean, covariance)
ax1.plot(X2, f_sin(X2), 'b--', label='$sin(x)$')
ax1.fill_between(X2.flat, mu2-2*sigma2, mu2+2*sigma2, color='red',
alpha=0.15, label='$2 \sigma_{2|1}$')
ax1.plot(X2, mu2, 'r-', lw=2, label='$\mu_{2|1}$')
ax1.plot(X1, y1, 'ko', linewidth=2, label='$(x_1, y_1)$')
# Plot some samples from this function
ax2.plot(X2, y2.T, '-')
ax2.set_xlabel('$x$', fontsize=13)
ax2.set_ylabel('$y$', fontsize=13)
ax2.set_title('5 different function realizations from posterior')
ax1.axis([domain[0], domain[1], -3, 3])
ax2.set_xlim([-6, 6])
plt.tight_layout()
plt.show()
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier("Ihave waiting for a course my whole life.")
alist = ["Covid is good", "I love covid"]
classifier(alist)
classifier = pipeline("zero-shot-classification")
classifier("This is a sensitive topic on transport and libarary",
candidate_labels=["education", "math", "business"])
generator = pipeline("text-generation")
generator("In this notebook, we will")
gen_gpt2 = pipeline("text-generation", model="distilgpt2")
gen_gpt2("In this pandas notebook, we will", max_lenght=20)
unmasker = pipeline("fill-mask")
unmasker("This notebook will show <mask> direction", top_k=2)
ner = pipeline("ner", grouped_entities=True)
ner("I am Dan P who carry out research at Monash Uni in Melbourne City")
qa = pipeline("question-answering")
qa(question="Where to I work",
context="I am Dan P who carry out research at Monash Uni in Melbourne City")
summarizer = pipeline("summarization")
article = """
The ongoing discussions within the presidential palace in Kabul are “utterly extraordinary” after two decades of war, CNN International Security Editor Nick Paton Walsh reports.
“This has been a morning of stunning events and that looks like we are heading towards some sort of transitional government here,” Paton Walsh said. He said names are being floated around, though nothing is confirmed, and President Ashraf Ghani would need to agree to step aside to make way for a transitional administration.
Yesterday Ghani made a brief but sombre address to the nation in which he said he was consulting with elders and other leaders both inside and outside of the country. In the short speech, he told the Afghan people his "focus is to avoid further instability, aggression and displacement," but he did not resign.
As talks on Sunday continue, Paton Walsh said there hasn’t been evidence of Taliban fighters moving into the city. Earlier panic appeared to be a clash around a bank where people were trying to withdraw money.
“I've heard sporadic gunfire here but that seems to be traffic disputes. A quick drive around the city has shown traffic has dissipated until you get towards the airport, so utter chaos and panic here. The traffic in the skies we saw around the embassy appears to have quietened as well so perhaps that might suggest some of that operation is winding up,” he continued.
The apparently last-ditch diplomatic efforts would hopefully avoid the Taliban presumably moving to its next phase of slowly entering the city, which Paton Walsh said would “not be remotely pleasant for anybody living here.”
“There will be elements of resistance too so I think everybody would prefer to avoid that kind of situation,” he added.
"""
summarizer(article)
translator = pipeline("translation", model="t5-base")
translator("Hi, my name is Dan")
import math
import numpy as np
import torch
import torch.nn as nn
copus_a = ["one is one", "two is two", "three is three", "four is four", "five is five",
"six is six", "seven is seven", "eight is eight", "nine is nine"]
copus_b = ["1 = 1", "2 = 2", "3 = 3", "4 = 4", "5 = 5",
"6 = 6", "7 = 7", "8 = 8", "9 = 9"]
embed_a = {"one": [1.0,0,0,0,0,0,0,0,0,0,0,0],
"two": [0,1.0,0,0,0,0,0,0,0,0,0,0],
"three":[0,0,1.0,0,0,0,0,0,0,0,0,0],
"four": [0,0,0,1.0,0,0,0,0,0,0,0,0],
"five": [0,0,0,0,1.0,0,0,0,0,0,0,0],
"six": [0,0,0,0,0,1.0,0,0,0,0,0,0],
"seven":[0,0,0,0,0,0,1.0,0,0,0,0,0],
"eight":[0,0,0,0,0,0,0,1.0,0,0,0,0],
"nine": [0,0,0,0,0,0,0,0,1.0,0,0,0],
"is": [0,0,0,0,0,0,0,0,0,1.0,0,0],
"less": [0,0,0,0,0,0,0,0,0,0,1.0,0],
"more": [0,0,0,0,0,0,0,0,0,0,0,1.0]
}
embed_b = {"9": [1.0,0,0,0,0,0,0,0,0,0,0,0],
"8": [0,1.0,0,0,0,0,0,0,0,0,0,0],
"7": [0,0,1.0,0,0,0,0,0,0,0,0,0],
"6": [0,0,0,1.0,0,0,0,0,0,0,0,0],
"5": [0,0,0,0,1.0,0,0,0,0,0,0,0],
"4": [0,0,0,0,0,1.0,0,0,0,0,0,0],
"3": [0,0,0,0,0,0,1.0,0,0,0,0,0],
"2": [0,0,0,0,0,0,0,1.0,0,0,0,0],
"1": [0,0,0,0,0,0,0,0,1.0,0,0,0],
"=": [0,0,0,0,0,0,0,0,0,1.0,0,0],
"<": [0,0,0,0,0,0,0,0,0,1.0,0,0],
">": [0,0,0,0,0,0,0,0,0,1.0,0,0],
}
def sentence_embed(sentence, embed_dict):
"""Generate an embedding for a sentence"""
res = []
for word in sentence.split():
res.append(embed_dict[word])
return res
inp = sentence_embed("one is one", embed_a)
out = sentence_embed("1 = 1", embed_b)
inp = torch.tensor(inp, dtype=torch.float32)
out = torch.tensor(out, dtype=torch.float32)
inp.shape, out.shape
def dot_attention(q, k, v):
"""inp: input sentence, dk: keyword dimension"""
# Initiate weight matrix for Query, Key and Value
dk = k.size(-1)
logit = (q @ k.transpose(0, -1)) / math.sqrt(dk)
weights = torch.softmax(logit, dim=-1)
res = weights @ v
return res
q, k, v = inp, inp, inp
dot_attention(q, k, v)
class MultiHeadAttention(nn.Module):
def __init__(self, dm, nh):
"""
        dm: model dimension
nh: number of heads
"""
super().__init__()
self.dm, self.nh = dm, nh
self.dk = dm // nh
self.heads = [{"wq":nn.Linear(self.dm, self.dk),
"wk":nn.Linear(self.dm, self.dk),
"wv":nn.Linear(self.dm, self.dk)} for h in range(nh)
]
self.out = nn.Linear(dm, dm)
def forward(self, inp):
res = []
for head in self.heads:
q, k, v = head["wq"](inp), head["wk"](inp), head["wv"](inp)
print(q.shape, k.shape, v.shape)
res.append(dot_attention(q, k, v))
concat = torch.cat(res, 1)
res = self.out(concat)
print(concat.shape, res.shape)
return res
dm = 12
nh = 3
# dk = 12/3 = 4
mul_head = MultiHeadAttention(dm, nh)
mul_head(inp)
from numpy import asarray
from sklearn.preprocessing import OrdinalEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define ordinal encoding
encoder = OrdinalEncoder()
# transform data
result = encoder.fit_transform(data)
print(result)
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define one hot encoding
encoder = OneHotEncoder(sparse=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define one hot encoding
encoder = OneHotEncoder(drop='first', sparse=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
# define the location of the dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/breast-cancer.csv"
# load the dataset
dataset = read_csv(url, header=None)
# retrieve the array of data
data = dataset.values
# separate into input and output columns
X = data[:, :-1].astype(str)
y = data[:, -1].astype(str)
# summarize
print('Input', X.shape)
print('Output', y.shape)
dataset.head()
type(dataset), type(data), type(X), type(y)
y.shape, X.shape, data.shape, dataset.shape
ordinal_encoder = OrdinalEncoder()
X = ordinal_encoder.fit_transform(X)
# ordinal encode target variable
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
# summarize the transformed data
print('Input', X.shape)
print(X[:5, :])
print('Output', y.shape)
print(y[:5])
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# ordinal encode input variables
ordinal_encoder = OrdinalEncoder()
ordinal_encoder.fit(X_train)
X_train = ordinal_encoder.transform(X_train)
X_test = ordinal_encoder.transform(X_test)
# ordinal encode target variable
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
y_train = label_encoder.transform(y_train)
y_test = label_encoder.transform(y_test)
# define the model
model = LogisticRegression()
# fit on the training set
model.fit(X_train, y_train)
# predict on test set
yhat = model.predict(X_test)
# evaluate predictions
accuracy = accuracy_score(y_test, yhat)
print('Accuracy: %.2f' % (accuracy*100))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# one-hot encode input variables
onehot_encoder = OneHotEncoder()
onehot_encoder.fit(X_train)
X_train = onehot_encoder.transform(X_train)
X_test = onehot_encoder.transform(X_test)
# ordinal encode target variable
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
y_train = label_encoder.transform(y_train)
y_test = label_encoder.transform(y_test)
# define the model
model = LogisticRegression()
# fit on the training set
model.fit(X_train, y_train)
# predict on test set
yhat = model.predict(X_test)
# evaluate predictions
accuracy = accuracy_score(y_test, yhat)
print('Accuracy: %.2f' % (accuracy*100))