Over the past few weeks, I built this slide deck [1] as a comprehensive overview of topic models in NLP. I assembled it from many other online resources, which I took care to cite throughout.
My motivations were many. I wanted to push myself to provide more visual content on this blog, and an original slide deck seemed like a good stepping stone towards that.
I also wanted to learn more about unsupervised learning techniques (having spent most of my recent time in the supervised domain), in hopes of ultimately building a prototype that could annotate notes to improve product search at work [2].
Further, I wanted to brush up on my statistical fundamentals, since dimensionality reduction keeps popping up everywhere in machine learning.
The fact that gensim, a robust and popular Python library, exists specifically for topic modeling proves (to me, at least) how interesting and practical topic modeling is. I hope you agree!
[1] Exploratory Jupyter notebook for the slide deck: http://nbviewer.jupyter.org/github/iconix/nlp-sandbox/blob/master/topic_models.ipynb