Riken AIP Workshop 2019
Weakly Supervised Classification
Motivation
- Machine learning from big data is already successful
- In some cases, massive labelled data is not available
- Classification from limited information
Supervised Classification
A large number of labelled samples yield better classification performance.
Optimal convergence rate is $O(1/\sqrt{n})$, where $n$ is the number of labelled samples.
Unsupervised Classification
Since collecting labelled samples is costly, we can try to learn a classifier from unlabelled data alone; this is equivalent to clustering.
Semi-supervised Classification
- Use a large number of unlabelled samples and a small number of labelled samples.
- Find a decision boundary along cluster structure induced by unlabelled samples.
Positive Unlabelled Classification
Given positive and unlabelled samples, we want to train a binary classifier.
The risk of a classifier can be decomposed into two terms:
- Risk for positive data
- Risk for negative data
Since we do not have negative data in the PU setting, the risk for negative data cannot be estimated directly.
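As a sketch in standard PU-learning notation (my symbols, not the slides'), with class prior $\pi = p(y=+1)$ and loss $\ell$, the decomposition is:

```latex
% Risk split into a positive-data term and a negative-data term
R(f) = \pi \,\mathbb{E}_{x \sim p(x \mid y=+1)}\bigl[\ell(f(x), +1)\bigr]
     + (1-\pi)\,\mathbb{E}_{x \sim p(x \mid y=-1)}\bigl[\ell(f(x), -1)\bigr]
```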
The unlabelled-data (U) density is a mixture of the positive and negative class-conditional densities.
Through this we can derive an unbiased risk estimator.
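A minimal sketch of why this works (again in assumed standard notation): the mixture identity lets the negative-data expectation be rewritten in terms of positive and unlabelled expectations only.

```latex
% Unlabelled-data density as a mixture of the class-conditional densities
p(x) = \pi \, p(x \mid y=+1) + (1-\pi)\, p(x \mid y=-1)

% Substituting (1-\pi)\,\mathbb{E}_{N}[\ell(f(x),-1)]
%   = \mathbb{E}_{U}[\ell(f(x),-1)] - \pi\,\mathbb{E}_{P}[\ell(f(x),-1)]
% gives an unbiased risk estimator using only P and U data:
R(f) = \pi \,\mathbb{E}_{P}\bigl[\ell(f(x), +1)\bigr]
     - \pi \,\mathbb{E}_{P}\bigl[\ell(f(x), -1)\bigr]
     + \mathbb{E}_{U}\bigl[\ell(f(x), -1)\bigr]
```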
Comparing estimation error bounds, PU learning can be shown to outperform PN learning when a large amount of PU data is available.
PNU Classification
- Train PU, PN, and NU classifiers and combine them (see the sketch after this list).
- Unlabelled data always helps without cluster assumptions
- Use unlabelled data for loss evaluation (reducing the bias), not for regularisation.
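One common way to write the combination is a convex mixture of the PN risk with a PU (or NU) risk, as in Sakai et al. (2017); the weight $\eta$ below is my notation, not from the talk:

```latex
% Convex combination of the PN and PU risks, \eta \in [0, 1];
% the NU risk can be used in place of the PU risk analogously
R^{\eta}_{\mathrm{PNU}}(f) = (1 - \eta)\, R_{\mathrm{PN}}(f) + \eta\, R_{\mathrm{PU}}(f)
```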
Pconf Classification
Only positive data is available:
- Data from rival companies cannot be obtained
- Only successful examples are available
If we have positive data equipped with confidence scores (the probability that each sample is positive), we can train a classifier.
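A sketch of the risk rewriting (following Ishida et al., 2018, in my own notation): with confidence $r(x) = p(y=+1 \mid x)$ attached to each positive sample, the risk can be expressed with positive data alone.

```latex
% Pconf risk: expectation over positive data only, with the negative-class
% term reweighted by the confidence r(x) = p(y=+1 | x); \pi = p(y=+1)
R(f) = \pi \,\mathbb{E}_{P}\!\left[
         \ell(f(x), +1) + \frac{1 - r(x)}{r(x)}\, \ell(f(x), -1)
       \right]
```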
Others: similar-unlabelled classification, etc.
Fast Computation of Uncertainty in Deep Learning
- author
- Emtiyaz Khan
- links
- https://emtiyaz.github.io/
Uncertainty quantifies the confidence in the prediction of a model, i.e., how much it does not know.
Uncertainty in Deep Learning
Model the data given the parameters, with the output generated by the neural network applied to the input.
- Place a prior distribution over the network parameters
Approximating Inference with Gradients
Find the approximate posterior distribution that maximises the variational lower bound.
Using natural gradients leads to faster and simpler algorithms than standard gradient methods.
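A rough sketch of the setup in standard variational-inference notation (assumed, not copied from the talk): maximise the evidence lower bound over an approximate posterior, and take natural-gradient steps preconditioned by the Fisher information of the variational distribution.

```latex
% Evidence lower bound (ELBO) for an approximate posterior q(\theta)
\mathcal{L}(q) = \mathbb{E}_{q(\theta)}\bigl[\log p(\mathcal{D} \mid \theta)\bigr]
               - \mathrm{KL}\bigl(q(\theta)\,\|\,p(\theta)\bigr)

% Natural-gradient ascent on the variational parameters \lambda of q,
% preconditioned by the Fisher information matrix F(\lambda)
\lambda_{t+1} = \lambda_t + \rho_t\, F(\lambda_t)^{-1} \nabla_{\lambda}\, \mathcal{L}(q_{\lambda_t})
```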
Data-efficient Probabilistic Machine Learning
Bryan Low
Gaussian Process (GP) Models for Big Data.
Gaussian Process
- Is a rich class of Bayesian, non-parametric models
- A GP is a collection of random variables, any finite subset of which follows a joint (multivariate) Gaussian distribution (see the sketch below)
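A minimal sketch of the standard definitions (not specific to this talk): a GP is determined by a mean function and a covariance (kernel) function, and conditioning on noisy observations gives closed-form predictive mean and variance.

```latex
% GP prior with mean function m and covariance function k
f(x) \sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr)

% Posterior predictive at a test point x_*, given training data (X, y),
% noise variance \sigma^2 and kernel matrix K = k(X, X)
\mu_*(x_*)      = m(x_*) + k(x_*, X)\,(K + \sigma^2 I)^{-1}\,\bigl(y - m(X)\bigr)
\sigma^2_*(x_*) = k(x_*, x_*) - k(x_*, X)\,(K + \sigma^2 I)^{-1}\, k(X, x_*)
```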
Task Setting
- Agent explores unknown environment modelled by GP
- Every location has a reward
Lipschitz Continuous Reward Functions
- R_1: Lipschitz continuous (current measurement)
- R_2: Lipschitz continuous after convolution with a Gaussian kernel (current measurement)
- R_3: location history, independent of the current measurement