http://www.cs.columbia.edu/~blei/topicmodeling.html LDA survey - Github

LDA

The Little Book on LDA https://www.youtube.com/watch?v=FkckgwMHP2s http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf

Dirichlet Distribution

https://www2.ee.washington.edu/techsite/papers/documents/UWEETR-2010-0006.pdf

The Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector α of positive reals.

\begin{equation} \theta \sim Dir(\alpha) \end{equation}

\begin{equation} p(\theta) = \frac{1}{\beta(\alpha)} \prod_{i=1}^n \theta_i^{\alpha_i-1} I(\theta \in S) \end{equation}

where \(\theta = (\theta_1, \theta_2, \dots, \theta_n)\), \(\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)\), \(\alpha_i > 0\), and

\begin{equation} S = \left\{x \in \mathbb{R}^n : x_i \ge 0, \sum_{i=1}^{n} x_i = 1 \right\} \end{equation}

and \(\frac{1}{\beta(\alpha)} = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\Gamma(\alpha_2)\dots\Gamma(\alpha_n)}\), where \(\alpha_0 = \sum_{i=1}^{n} \alpha_i\).
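As a quick check on this density, a minimal NumPy/SciPy sketch can draw a sample from \(Dir(\alpha)\) and evaluate the log-density directly from the formula above; the α values here are arbitrary illustrative choices.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])           # alpha_i > 0 (illustrative values)
theta = np.random.dirichlet(alpha)          # one draw from Dir(alpha)
assert np.all(theta >= 0) and np.isclose(theta.sum(), 1.0)  # theta lies in S

# log(1/beta(alpha)) = log Gamma(alpha_0) - sum_i log Gamma(alpha_i)
log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
log_p = log_norm + np.sum((alpha - 1.0) * np.log(theta))

# Should agree with SciPy's implementation of the same density.
assert np.isclose(log_p, dirichlet.logpdf(theta, alpha))
```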

The infinite-dimensional generalization of the Dirichlet distribution is the Dirichlet process.

The Dirichlet distribution is the conjugate prior distribution of the categorical distribution (a generic discrete probability distribution with a given number of possible outcomes) and multinomial distribution (the distribution over observed counts of each possible category in a set of categorically distributed observations). This means that if a data point has either a categorical or multinomial distribution, and the prior distribution of the distribution’s parameter (the vector of probabilities that generates the data point) is distributed as a Dirichlet, then the posterior distribution of the parameter is also a Dirichlet.
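A small numeric sketch of this conjugacy: with a \(Dir(\alpha)\) prior on the category probabilities and observed per-category counts, the posterior is Dirichlet with parameter \(\alpha + \text{counts}\). The prior and counts below are made up for illustration.

```python
import numpy as np

alpha_prior = np.array([1.0, 1.0, 1.0])      # symmetric Dirichlet prior
counts = np.array([10, 2, 5])                # observed counts per category

alpha_posterior = alpha_prior + counts       # posterior is Dir(alpha + counts)

# Posterior mean of each category probability: (alpha_i + n_i) / (alpha_0 + N)
posterior_mean = alpha_posterior / alpha_posterior.sum()
print(posterior_mean)                        # [0.55, 0.15, 0.30]
```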

Exploring a Corpus with the Posterior Distribution

The quantities needed for exploring a corpus are the posterior expectations of the hidden variables, each conditioned on the observed corpus.

A topic is visualized through its posterior per-topic term probabilities \(\hat{\beta}_k\), for example by listing the topic's most probable terms.

A document is visualized through its posterior topic proportions \(\hat{\theta}_{d}\) and the posterior topic assignments \(\hat{z}_{d,n}\) of its words.
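A hedged sketch of both visualizations, using scikit-learn's LDA as one possible toolkit; the tiny corpus, the number of topics, and the parameter choices are illustrative assumptions, not from the text.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stock prices fell sharply today",
        "the market rallied after the report"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Visualize each topic by its most probable terms (an estimate of beta_k).
for k, topic in enumerate(lda.components_):
    top_terms = [vocab[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}: {top_terms}")

# Visualize each document by its topic proportions (an estimate of theta_d).
theta_hat = lda.transform(X)
for d, props in enumerate(theta_hat):
    print(f"doc {d}: {props.round(2)}")
```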

Similar documents can be found by comparing their topic proportions through the Hellinger distance between documents \(d\) and \(f\):

\begin{align*} D_{d,f} = \sum_{k=1}^K \left( \sqrt{\hat{\theta}_{d,k}} - \sqrt{\hat{\theta}_{f,k}}\right)^2 \end{align*}
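Given a documents-by-topics matrix of estimated proportions \(\hat{\theta}\) (such as the one produced in the sketch above), a minimal NumPy sketch can rank documents by this distance; the matrix values below are made up.

```python
import numpy as np

def hellinger_sq(theta_d, theta_f):
    """Squared Hellinger distance between two topic-proportion vectors."""
    return np.sum((np.sqrt(theta_d) - np.sqrt(theta_f)) ** 2)

theta_hat = np.array([[0.9, 0.1],
                      [0.8, 0.2],
                      [0.1, 0.9]])

# Documents most similar to document 0 come first.
dists = [hellinger_sq(theta_hat[0], theta_hat[f]) for f in range(len(theta_hat))]
print(np.argsort(dists))   # [0, 1, 2]
```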

Posterior Inference

Markov Chains

http://setosa.io/ev/markov-chains/

Shortcomings

TopicRNN

http://www.columbia.edu/~jwp2128/Papers/DiengWangetal2017.pdf

In TopicRNN, a latent topic model is used to capture global semantic dependencies so that the RNN can focus its modeling capacity on the local dynamics of the sequences.
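A heavily simplified NumPy sketch of the flavor of this idea (not the paper's exact model): per-step RNN logits are combined with a document-level topic bias, and the topic contribution is switched off for stop words via an indicator. All shapes and values here are illustrative assumptions.

```python
import numpy as np

V, H, K = 1000, 64, 10                 # vocab size, RNN hidden size, topics (assumed)
rng = np.random.default_rng(0)

W = rng.normal(size=(V, H))            # projects the RNN state to word logits
B = rng.normal(size=(V, K))            # projects the topic vector to word logits
theta = rng.dirichlet(np.ones(K))      # document topic proportions (global semantics)

h_t = rng.normal(size=H)               # RNN hidden state at step t (local dynamics)
l_t = 0                                # 1 if the current word is a stop word, else 0

logits = W @ h_t + (1 - l_t) * (B @ theta)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # next-word distribution mixing both signals
```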

Potential Research Topics

TODO Visualization of Perplexity for topic models as a potential topic?