# Riken AIP Workshop 2019

## Weakly Supervised Classification

### Motivation

• Machine learning from big data is already successful
• In some cases, massive labelled data is not available
• Classification from limited information

### Supervised Classification

A larger number of labelled samples yields better classification performance; the optimal convergence rate of the estimation error is $$O(n^{-\frac{1}{2}})$$.

### Unsupervised Classification

Since collecting labelled samples is costly, we may want to learn a classifier from unlabelled data alone. This is equivalent to clustering.

### Semi-supervised Classification

• Use a large number of unlabelled samples and a small number of labelled samples.
• Find a decision boundary along cluster structure induced by unlabelled samples.

### Positive Unlabelled Classification

Given positive and unlabelled samples:

$$\{x_i^P\}_{i=1}^{n_P} \sim P(x \mid y = +1)$$

$$\{x_i^U\}_{i=1}^{n_U} \sim P(x)$$

The risk of a classifier can be decomposed into two terms:

1. Risk for positive data
2. Risk for negative data

Since negative data is unavailable in the PU setting, the risk cannot be estimated directly.

The U-density is a mixture of the positive and negative class-conditional densities:

$$p(x) = \pi p(x|y=+1) + (1-\pi) p(x|y=-1)$$

so the risk

$$R(f) = \pi E_{p(x|y=+1)} \left[ l(f(x)) \right] + (1-\pi) E_{p(x|y=-1)}\left[ l(-f(x)) \right]$$

can be rewritten by expressing the negative-class term with unlabelled and positive data only:

$$(1-\pi) E_{p(x|y=-1)}\left[ l(-f(x)) \right] = E_{p(x)}\left[ l(-f(x)) \right] - \pi E_{p(x|y=+1)}\left[ l(-f(x)) \right]$$

This gives an unbiased risk estimator computable from PU data.

By estimating error bounds, one can show that PU learning can outperform PN learning when a large amount of PU data is available.
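The unbiased PU risk estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's implementation; the logistic loss, the known class prior `pi`, and the identity score function are assumptions made for the example.

```python
import numpy as np

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

def pu_risk(f_p, f_u, pi):
    """Unbiased PU risk estimate.

    f_p : classifier outputs f(x) on positive samples
    f_u : classifier outputs f(x) on unlabelled samples
    pi  : class prior p(y = +1), assumed known
    """
    # positive part: pi * E_P[ l(f(x)) ]
    r_pos = pi * np.mean(logistic_loss(f_p))
    # negative part rewritten with unlabelled data:
    # E_U[ l(-f(x)) ] - pi * E_P[ l(-f(x)) ]
    r_neg = np.mean(logistic_loss(-f_u)) - pi * np.mean(logistic_loss(-f_p))
    return r_pos + r_neg

# toy check: when the unlabelled set is an exact pi-mixture, the PU
# estimate agrees with the supervised (PN) risk using the true labels
rng = np.random.default_rng(0)
pi = 0.4
n = 200_000
x_p = rng.normal(+2.0, 1.0, int(n * pi))      # positives
x_n = rng.normal(-2.0, 1.0, n - int(n * pi))  # negatives
x_u = np.concatenate([x_p, x_n])              # unlabelled = mixture

f = lambda x: x  # a fixed linear score, just for illustration
r_pu = pu_risk(f(x_p), f(x_u), pi)
r_pn = pi * np.mean(logistic_loss(f(x_p))) + (1 - pi) * np.mean(logistic_loss(-f(x_n)))
```

With the unlabelled set built as an exact mixture, the two risk values coincide up to floating-point error, which is the unbiasedness property in miniature.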

### PNU Classification

• Train PU, PN, and NU classifiers, and combine them.
• Unlabelled data always helps, without cluster assumptions.
• Use unlabelled data for loss evaluation (reducing the bias), not for regularisation.

### Pconf Classification

Only positive data is available, for example:

1. Data from rival companies cannot be obtained
2. Only successful examples are available

If we have positive data equipped with confidence values $$r(x) = p(y = +1 | x)$$, we can still train a classifier.
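One way to use such confidence values is to weight a negative-side loss by the odds $$(1 - r(x))/r(x)$$ of a sample actually being negative. The sketch below follows that positive-confidence risk formulation (up to the constant factor $$\pi$$); it is not necessarily the speaker's exact estimator, and the logistic loss and toy values are assumptions.

```python
import numpy as np

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

def pconf_risk(f_x, r):
    """Empirical positive-confidence risk (up to the constant factor pi).

    f_x : classifier outputs f(x) on positive samples
    r   : confidence values r(x) = p(y=+1 | x) for those samples, in (0, 1]
    """
    # each positive sample also stands in for a negative one,
    # weighted by the odds (1 - r) / r of actually being negative
    return np.mean(logistic_loss(f_x) + (1.0 - r) / r * logistic_loss(-f_x))

f_x = np.array([1.0, -0.5, 2.0])
r_sure = np.ones(3)                 # fully confident positives
risk_sure = pconf_risk(f_x, r_sure)
```

As a sanity check, when every confidence is 1 the odds weight vanishes and the estimator reduces to the plain positive-class risk.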

Others: Similar-unlabelled etc.

## Fast Computation of Uncertainty in Deep Learning

Author: Emtiyaz Khan (https://emtiyaz.github.io/)

Uncertainty quantifies the confidence in the prediction of a model, i.e., how much it does not know.

### Uncertainty in Deep Learning

$$p(D|\theta) = \prod_{i=1}^{N} p(y_i | f_\theta (x_i))$$

i.e., the likelihood of the data given the parameters factorises over examples, with each output $$y_i$$ conditioned on the network prediction $$f_\theta(x_i)$$.

1. Place a prior distribution over the parameters: $$\theta \sim p(\theta)$$
2. Approximate the posterior with a Gaussian:

$$p(\theta | D) \approx q(\theta) = N(\theta | \mu, \sigma^2)$$

3. Find the $$\mu$$ and $$\sigma^2$$ such that $$q$$ is close to the posterior distribution.

$$\max_{\mu, \sigma^2} \; \mathcal{L}(\mu, \sigma^2) = E_q\left[ \log \frac{p(\theta)}{q(\theta)} \right] + \sum_{i=1}^N E_q \left[ \log p(D_i|\theta) \right]$$
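A minimal sketch of maximising this objective, on a conjugate toy model rather than a deep network (the assumptions here: a standard normal prior over a scalar $$\theta$$, a unit-variance Gaussian likelihood, and a closed-form ELBO, so the fitted $$q$$ can be checked against the exact posterior):

```python
import numpy as np

# toy model: theta ~ N(0, 1),   y_i | theta ~ N(theta, 1)
rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=20)
n = len(y)

# For this conjugate model the ELBO of q = N(mu, sigma^2) is available in
# closed form, so we can run plain gradient ascent on (mu, rho), using
# rho = log sigma^2 to keep the variance positive.
mu, rho = 0.0, 0.0
lr = 0.05
for _ in range(5000):
    sigma2 = np.exp(rho)
    # d ELBO / d mu  = sum_i (y_i - mu) - mu
    g_mu = y.sum() - (n + 1) * mu
    # d ELBO / d rho = 0.5 - 0.5 * (n + 1) * sigma^2
    g_rho = 0.5 - 0.5 * (n + 1) * sigma2
    mu += lr * g_mu
    rho += lr * g_rho

# exact posterior for comparison: N( sum(y) / (n+1), 1 / (n+1) )
post_mean = y.sum() / (n + 1)
post_var = 1.0 / (n + 1)
```

Because the model is conjugate, the variational optimum coincides with the exact posterior, so the fitted $$\mu$$ and $$\sigma^2$$ converge to `post_mean` and `post_var`.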

## Data-efficient Probabilistic Machine Learning

Bryan Low

Gaussian Process (GP) Models for Big Data.

### Gaussian Process

• A rich class of Bayesian, non-parametric models
• A GP is a collection of random variables, any finite subset of which follows a multivariate Gaussian distribution

$$R(z_t, s_t) \overset{\Delta}{=} R_1(z_t) + R_2(z_t) + R_3(s_t)$$
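As a concrete illustration of GP inference (not from the talk; the RBF kernel, the noise level, and the toy data are assumptions), the posterior mean and variance of GP regression can be computed in a few lines:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # k(x, x') = v * exp(-(x - x')^2 / (2 l^2))
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of GP regression with an RBF kernel."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha                              # predictive mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)      # predictive covariance
    return mean, np.diag(cov)

x = np.array([-1.0, 0.0, 1.0])
y = np.sin(x)
mean, var = gp_posterior(x, y, x)
```

With a small noise term, the posterior nearly interpolates the training data: the predictive mean at the training inputs is close to the observed targets and the predictive variance there is close to zero, which is the behaviour that makes GPs attractive as data-efficient probabilistic models.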