Topic Models and Dimensionality Reduction

07 Dec 2017 . category: Notes

Topic flow chart (SVG)

Over the past few weeks, I spent some time building this slide deck[1] as an overview of topic models in NLP. I assembled it from many online resources, which I took care to cite throughout.


My motivations were many. I wanted to push myself to provide more visual content on this blog, and an original slide deck seemed like a good stepping stone towards that.

I also wanted to learn more about unsupervised learning techniques (having spent most of my recent time in the supervised domain), in hopes of ultimately building a prototype that could annotate notes to improve product search at work[2].

Further, I wanted to brush up on my statistical fundamentals, as dimensionality reduction keeps popping up everywhere in machine learning.

The fact that gensim, a robust and popular Python library, exists specifically for topic modeling shows (to me, at least) how interesting and practical topic modeling is. I hope you agree!

Exploratory Jupyter notebook for the slide deck: http://nbviewer.jupyter.org/github/iconix/nlp-sandbox/blob/master/topic_models.ipynb

Footnotes

  1. PowerPoint today, HTML5 next. I was not aware of how low-res slide decks look when rendered by PowerPoint Online…

  2. The prototype will have to wait, though: unfortunately, business priorities on the team have shifted enough to put the ML incubation team on pause for several months.


Me

Nadja does not particularly enjoy writing about herself.