People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.
— The Master Algorithm by Pedro Domingos (p. 286)
Since reading The Master Algorithm by Pedro Domingos, a University of Washington professor of Computer Science, I’ve been engaged in a thought experiment on how to run Principal Component Analysis (PCA) on the field of machine learning itself… namely, what are the principal components of ML? How can I reduce and differentiate all the jargon, discovering the shape of the field, reducing its dimensionality?
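For anyone who hasn't met PCA in its usual setting, here is a minimal sketch of what "finding the first principal component" means for ordinary numeric data. It is my own illustration, not something from the book: the function name `first_principal_component`, the toy `points`, and the choice of power iteration are all assumptions made to keep the example dependency-free.

```python
import math


def first_principal_component(points, iterations=200):
    """Return a unit vector along the direction of greatest variance.

    Illustrative sketch: uses power iteration on the sample covariance
    matrix and assumes the data has nonzero variance.
    """
    n = len(points)
    dims = len(points[0])

    # Center the data so variance is measured around the mean.
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]

    # Sample covariance matrix (dims x dims).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector drifts toward the dominant eigenvector.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy data lying roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction = first_principal_component(points)
```

For this toy data the returned direction points roughly along the diagonal, which is where most of the spread lies. The thought experiment asks for the equivalent of that diagonal for the field of machine learning as a whole.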
It’s a fun and challenging problem to think about, searching for the minimum set of concepts to describe “the direction along which the spread [of machine learning] is greatest” (p. 214)[4]. Domingos favors his “five tribes” idea in the book as something like a first principal component, an overarching structure for understanding machine learning at its most essential.
| Tribe[5] | Representation[6] | Evaluation[7] | Optimization[8] |
| --- | --- | --- | --- |
| Symbolists | Logic | Accuracy | Inverse Deduction |
| Connectionists | Neural Networks | Squared Error | Gradient Descent |
| Evolutionaries | Genetic Programs | Fitness | Genetic Search |
| Bayesians | Graphical Models | Posterior Probability | Probabilistic Inference |
| Analogizers | Support Vectors | Margin | Constrained Optimization |

Translated from the figure on p. 240.
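To make the three columns concrete, here is a minimal sketch loosely modeled on the Connectionists row. It is my own simplification, not code from the book: a single linear unit stands in for a neural network, and the function names, learning rate, step count, and toy data are all assumptions chosen for illustration.

```python
def predict(weights, x):
    # Representation: the form the model takes. Here, one linear unit
    # y = w * x + b (a stand-in for a full neural network).
    w, b = weights
    return w * x + b


def squared_error(weights, data):
    # Evaluation: a score for how good a model is. With an error
    # measure, lower is better.
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def gradient_descent(data, learning_rate=0.05, steps=2000):
    # Optimization: search for the best-scoring model by repeatedly
    # stepping downhill along the gradient of the squared error.
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict((w, b), x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict((w, b), x) - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weights = gradient_descent(data)
```

On this toy data the learned weights settle close to w = 2 and b = 1. Swapping any one of the three functions for another tribe's choice (say, fitness plus genetic search) would give a different learner built on the same skeleton, which is how I read the point of the table.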
The five tribes analogy also got me thinking about Brandon Rohrer’s breakdown of the “5 questions data science answers”[9]:
When I view the “5 tribes” through the lens of the “5 questions,” I realize that Domingos’s tribe analogy only extends to the first 3 questions, all of which can be answered with supervised machine learning techniques. Domingos touches on techniques that deal with unsupervised learning (question 4) and reinforcement learning (question 5), but he does so in a single chapter (Chapter 8, “Learning Without a Teacher”) that strangely does not tie back to the discussion of which tribes prefer which techniques… rather, unsupervised learning and reinforcement learning are treated like tricks in an opaque bag. It’s as if Domingos is suggesting the tribes have no opinions outside of the supervised domain, which, if true, would be a huge weakness for the tribes as the first principal component. The 5 questions may be a better choice.
So, what is The Master Algorithm? It is … “a general-purpose learner” (p. xxi) … an algorithm that, “if it exists, [it] can derive all knowledge in the world—past, present, and future—from data” (p. xviii) … an algorithm that can “learn to simulate any other algorithm by reading examples of its input-output behavior” (p. 34) … “the unifier of machine learning: it lets any application use any learner, by abstracting the learner into a common form that is all the applications need to know” (p. 237).
What a promise.
What drew me to the book was the idea that I could get a layman’s exposure to the entire problem space of machine learning: what types of problems it can solve, what current solutions look like, and how to express these pieces. It delivered exceedingly well for me on that level.
Originally this post was going to be extremely long and encompass many sources to output a veritable “guide to machine learning” - I figured that teaching what it is would help me learn what it is. My notes and references got longer and longer, and compiling them into a post became too daunting a task. I’m also trying to keep in mind that I tend to not bother reading “walls of words” whenever I come across them in the wild… so I should try not to write them. I haven’t given up on compiling a guide - but I’d like it to be much less verbose, maybe even visual.
[1] From the book: “Textbooks are liable to give you math indigestion. This difficulty is more apparent than real, however. All of the important ideas in machine learning can be expressed math-free” (p. 9).
[2] A couple of flourishes that pulled me out of the book: “A programmer—someone who creates algorithms and codes them up—is a minor god, creating universes at will” (p. 4); day-in-the-life scenarios from the Prologue like, “Crime in your city is noticeably down since the police started using statistical learning to predict where crimes are most likely to occur and concentrating beat officers there” (p. xv) - sure, because ethics in policing will be that straightforward.
[3] The “This Is the World on Machine Learning” chapter is ostensibly the chapter to tackle these issues, and he does discuss topics such as robot warfare, restaurants with human waiters as the future’s hipster/nostalgic trend, the idea that reducing the employment rate will be the new sign of progress in this future (“The transition will be tumultuous, but thanks to democracy, it will have a happy ending” …right), and killer AI (favorite contradiction found on p. 283: “The chances that an AI equipped with the Master Algorithm will take over the world are zero… Of course, if we’re so foolish as to deliberately program a computer to put itself above us, then maybe we’ll get what we deserve”). I don’t know which it is - whether I lack imagination at this level, or whether Domingos is too quick to brush off how ingrained certain human behavior (ambition, pride, harming the ‘other’) is. I did, however, enjoy the discussion in this same chapter of the digital mirror and a society of models and federally-insured data banks.
[4] Borrowing from Domingos’s definition of PCA.
[5] Symbolists believe in the power of pre-existing knowledge and manipulating symbols; Connectionists base their practice on reverse-engineering of the brain; Evolutionaries see natural selection and learning structure as the key; Bayesians hold uncertainty as inescapable; Analogizers recognize similarities and use them to infer other similarities.
[6] Representation: “the formal language in which the learner expresses its models” (p. 240).
[7] Evaluation: “a scoring function that says how good a model is” (p. 241).
[8] Optimization: “the algorithm that searches for the highest-scoring model and returns it” (p. 241).
[9] This 25-minute “Data Science for Beginners” video series is one of my favorite data science references.