The Master Algorithm

23 Sep 2017 · category: Notes

People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.

The Master Algorithm by Pedro Domingos (p. 286)

Book Takeaways

  • + Great at summarizing a broad field that is chock-full of seemingly complex concepts; math-free1 with lots of vocabulary clearly illustrated.
  • + Does a great job at one of its stated goals of providing a conceptual model of machine learning, by breaking down its prevailing schools of thought on how to answer questions like, “how do we learn? is there a better way? what can we predict? can we trust what we’ve learned?” (p. xvii).
  • + As one blurb on the back cover of the book says, “writing breezily but with deep authority” is something that seems to come easily to Domingos. I enjoyed treating this book like a textbook full of gems from an almost 30-year career.
  • – Bombastic at times about the power and opportunities of machine learning, especially in the earlier chapters.2
  • – Its second stated goal of enabling readers to invent the Master Algorithm for themselves (p. xviii)… yeah, I’m not quite there. I appreciate Domingos’s optimism though.
  • – Did not go as deeply into the ethics of machine learning as I had hoped, despite touting this in the prologue.3

What is Machine Learning?

Since reading The Master Algorithm by Pedro Domingos, a professor of computer science at the University of Washington, I’ve been running a thought experiment: what would it look like to apply Principal Component Analysis (PCA) to the field of machine learning itself? In other words, what are the principal components of ML? How can I cut through all the jargon and discover the shape of the field by reducing its dimensionality?

It’s a fun and challenging problem to think about, searching for the minimum set of concepts to describe “the direction along which the spread [of machine learning] is greatest” (p. 214)4. Domingos favors his “five tribes” idea in the book as something like a first principal component, an overarching structure for understanding machine learning at its most essential.
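
To ground the metaphor, here’s a minimal sketch (mine, not the book’s) of what PCA literally computes on some made-up two-dimensional data: center the points, take the covariance matrix, and read off the eigenvector with the largest eigenvalue, i.e. the direction along which the spread is greatest.

```python
# Toy PCA: find "the direction along which the spread is greatest" (p. 214).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data that is mostly spread along one diagonal direction.
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

# Center the data, then take the eigenvectors of the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so the last column is the
# first principal component: the direction of greatest spread.
first_pc = eigenvectors[:, -1]
print("first principal component:", first_pc)
print("share of variance explained:", eigenvalues[-1] / eigenvalues.sum())
```

The thought experiment, then, is asking what the analogous “eigenvector” of the whole field would be.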

| Tribe5 | Representation6 | Evaluation7 | Optimization8 |
| --- | --- | --- | --- |
| Symbolists | Logic | Accuracy | Inverse Deduction |
| Connectionists | Neural Networks | Squared Error | Gradient Descent |
| Evolutionaries | Genetic Programs | Fitness | Genetic Search |
| Bayesians | Graphical Models | Posterior Probability | Probabilistic Inference |
| Analogizers | Support Vectors | Margin | Constrained Optimization |

translated from figure on p. 240.
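
To make those three columns concrete, here’s a toy sketch of the connectionist row on made-up data: the “network” is a single linear neuron (really just linear regression, no hidden layers), the evaluation is squared error, and the optimization is plain gradient descent. The data and hyperparameters here are my own invention, not anything from the book.

```python
# Toy reading of the connectionist row: representation = a one-neuron
# "network", evaluation = squared error, optimization = gradient descent.
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: inputs x and noisy targets y = 2x + 1.
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=100)

w, b = 0.0, 0.0          # representation: y_hat = w * x + b
learning_rate = 0.1

for _ in range(500):
    y_hat = w * x + b
    error = y_hat - y
    loss = np.mean(error ** 2)          # evaluation: squared error
    grad_w = 2 * np.mean(error * x)     # optimization: gradient descent
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, squared error={loss:.4f}")
```

Swap out the entries in each column and you get a different tribe’s learner, which is the sense in which the table decomposes the field.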

The five tribes analogy also got me thinking about Brandon Rohrer’s breakdown of the “5 questions data science answers”9:

  1. Is this A or B? => Classification algorithms
  2. Is this weird? => Anomaly detection algorithms
  3. How much—or—How many? => Regression algorithms
  4. How is this organized? => Clustering algorithms, Dimensionality reduction
  5. What should I do next? => Reinforcement learning algorithms
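
As a rough bridge from those questions to actual tooling, here is one possible mapping onto scikit-learn estimators. The specific picks are mine (not Rohrer’s or Domingos’s), and each question admits many other answers.

```python
# One (of many possible) scikit-learn starting points per question.
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression, LogisticRegression

starting_points = {
    "Is this A or B?": LogisticRegression(),            # classification
    "Is this weird?": IsolationForest(),                 # anomaly detection
    "How much, or how many?": LinearRegression(),        # regression
    "How is this organized?": (KMeans(n_clusters=3),     # clustering
                               PCA(n_components=2)),     # dimensionality reduction
    # "What should I do next?" -- reinforcement learning sits outside
    # scikit-learn; libraries like Gymnasium cover that territory.
}

for question, estimator in starting_points.items():
    print(f"{question} -> {estimator}")
```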

When I view the “5 tribes” through the lens of the “5 questions,” I realize that Domingos’s tribe analogy only extends to the first three questions, all of which can be answered with supervised machine learning techniques. Domingos touches on unsupervised learning (question 4) and reinforcement learning (question 5), but he does so in a single chapter (Chapter 8, “Learning Without a Teacher”) that, strangely, does not tie back to the discussion of which tribes prefer which techniques… instead, unsupervised learning and reinforcement learning are treated like tricks in an opaque bag. It’s as if Domingos is suggesting the tribes have no opinions outside of the supervised domain, which, if true, would be a huge weakness for the tribes as the first principal component. The 5 questions may be a better choice.

The Master Algorithm

So, what is The Master Algorithm? It is … “a general-purpose learner” (p. xxi) … an algorithm that, if it exists, “can derive all knowledge in the world—past, present, and future—from data” (p. xviii) … an algorithm that can “learn to simulate any other algorithm by reading examples of its input-output behavior” (p. 34) … “the unifier of machine learning: it lets any application use any learner, by abstracting the learner into a common form that is all the applications need to know” (p. 237).

What a promise.
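
That “simulate any other algorithm by reading examples of its input-output behavior” claim (p. 34) can at least be gestured at with a toy sketch of my own: hide a trivial “algorithm” behind a function, show a learner only its input-output pairs, and see how well the learner mimics it.

```python
# Toy version of "learn to simulate any other algorithm by reading examples
# of its input-output behavior" (p. 34): mimic a tiny 'algorithm' from
# nothing but (input, output) pairs.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def mystery_algorithm(x):
    """Stand-in for 'any other algorithm'; the learner never sees this code."""
    return np.abs(x) + 0.5 * x

rng = np.random.default_rng(2)
inputs = rng.uniform(-3, 3, size=(500, 1))
outputs = mystery_algorithm(inputs).ravel()

# The learner only observes input-output behavior, not the source code.
mimic = KNeighborsRegressor(n_neighbors=5).fit(inputs, outputs)

test = np.array([[-2.0], [0.3], [2.5]])
print("algorithm:", mystery_algorithm(test).ravel())
print("mimic    :", mimic.predict(test))
```

A k-nearest-neighbors mimic of a one-line function is, of course, a very long way from the full promise.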

What drew me to the book was the idea that I could get a layman’s exposure to the entire problem space of machine learning: what types of problems it can solve, what current solutions look like, and how to express these pieces. It delivered exceedingly well for me on that level.

A note on my blogging process

Originally, this post was going to be extremely long and draw on many sources to produce a veritable “guide to machine learning” - I figured that teaching what it is would help me learn what it is. My notes and references got longer and longer, and compiling them into a post became too daunting a task. I’m also trying to keep in mind that I tend not to bother reading “walls of words” whenever I come across them in the wild… so I should try not to write them. I haven’t given up on compiling a guide - but I’d like it to be much less verbose, maybe even visual.

Footnotes

  1. From the book: “Textbooks are liable to give you math indigestion. This difficulty is more apparent than real, however. All of the important ideas in machine learning can be expressed math-free” (p. 9). 

  2. A couple of flourishes that pulled me out of the book: “A programmer—someone who creates algorithms and codes them up—is a minor god, creating universes at will” (p. 4); day-in-the-life scenarios from the Prologue like, “Crime in your city is noticeably down since the police started using statistical learning to predict where crimes are most likely to occur and concentrating beat officers there” (p. xv) - sure, because ethics in policing will be that straightforward. 

  3. The This Is the World on Machine Learning chapter is ostensibly the chapter that tackles these issues, and he does discuss topics such as robot warfare, restaurants with human waiters as the future’s hipster/nostalgic trend, the idea that a shrinking employment rate will be the new sign of progress in this future (“The transition will be tumultuous, but thanks to democracy, it will have a happy ending” …right), and killer AI (favorite contradiction, found on p. 283: “The chances that an AI equipped with the Master Algorithm will take over the world are zero… Of course, if we’re so foolish as to deliberately program a computer to put itself above us, then maybe we’ll get what we deserve”). I don’t know which it is - whether I lack imagination here, or whether Domingos is too quick to brush off how ingrained certain human behaviors (ambition, pride, harming the ‘other’) are. I did, however, enjoy the discussion in this same chapter of the digital mirror, a society of models, and federally insured data banks. 

  4. Borrowing from Domingos’s definition of PCA. 

  5. Symbolists believe in the power of pre-existing knowledge and manipulating symbols; Connectionists base their practice on reverse-engineering of the brain; Evolutionaries see natural selection and learning structure as the key; Bayesians hold uncertainty as inescapable; Analogizers recognize similarities and use them to infer other similarities. 

  6. Representation: “the formal language in which the learner expresses its models” (p. 240). 

  7. Evaluation: “a scoring function that says how good a model is” (p. 241). 

  8. Optimization: “the algorithm that searches for the highest-scoring model and returns it” (p. 241). 

  9. This 25-minute “Data Science for Beginners” video series is one of my favorite data science references. 

