Podcast: Jordan Ellenberg -- Why Math Matters in Business

Great business leaders are often seen as innovators, inspirational storytellers and brilliant leaders. They are keen and decisive observers. But would we envision any mathematical principles in their toolkit? Just as business finds solutions to various problems and hurdles, mathematical formulas and practices make sense of our chaotic world. What can business-minded individuals learn from these insights? How do principles of randomness and probability factor into shrewd business planning?

Jordan Ellenberg is the internationally-bestselling author of How Not To Be Wrong and the recently-published Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else. He holds a master’s in fiction writing from Johns Hopkins and a Ph.D. in math from Harvard. He has been writing for a general audience about math for over fifteen years and advocates for leaning into the anxieties and misunderstandings many of us have about mathematics.

Transcript: Jordan Ellenberg

Chris Riback: Jordan, thanks for joining me. I appreciate your time.

Jordan Ellenberg: Thanks for having me on, Chris.

Chris Riback: We live obviously in an age where everything has to be measured. And here I am talking to a mathematician as we sit, we wallow in this sea of data, making the point that yes, data matters but the narrative matters, too. The prose matters, too. How do you think about our times and the role? Is it an oversized role that we seem to give data in terms of driving decision making?

Jordan Ellenberg: I think that can be the case, yes, and I think it often is the case. There’s a great book by my friend Cathy O’Neil called Weapons of Math Destruction, which is all about the things that can go wrong, and I mean really wrong, when you blindly say, “Okay, there’s an algorithm, there’s a spreadsheet and whatever the spreadsheet says, that’s the truth. That’s what we’re going to go by.”

I’ll give you an example. To Facebook, I’m a long list of numbers. I’m maybe some ASCII characters and a long list of numbers that represents to it where I sit in a social network, what products it thinks I’m likely to buy, what age it thinks I am, because I haven’t told it my birthday. But it can almost certainly figure that out pretty well. Some long list of some number of numbers.

Now, those are facts about me. The danger is if you start to think that is me, because I’m not a series of 1000 numbers and neither are you.

Chris Riback: That’s fantastic.

Jordan Ellenberg: That’s the mistake you don’t want to make.

Chris Riback: For our business listeners, would you argue that’s a pretty good reason why business people should understand geometry?

Jordan Ellenberg: Yes. I think if you ask most teachers why we are teaching your kids, or you, all this stuff about triangles, is it because you’ve got to go out into the world with a lot of knowledge about triangles? No, not so much. I love triangles, but I think it is true that most people see the point as teaching a certain mode of thinking.

Now, what is that mode? What we do in geometry that’s different from everything else we do in the high school curriculum is we prove theorems. We really prove them by a strict process of logical deduction. Again, you might think, “Oh, I see. The point is to learn to prove theorems.” Kind of, but it’s also true that out there in the business world, nobody’s ever going to ask you to prove a theorem. That’s not what you’re actually doing.

Okay, now I seem to have painted myself into a corner. So, what is the point? Well, the point is that people say they’re proving things all the time. Prove is a much used word. I always say every time I hear someone say the word therefore, my red flag goes up. People all the time are trying to basically assertively express a chain of opinions and then say, “See my logic, you have to accept my conclusion.” I think the point of knowing what a theorem is, is to know what a theorem is not.

It’s to be able to look at that and say, “That is not the thing we do in Euclidean geometry. I can see the spaces between your assertions where you want B to follow from A, but really all that happened is you said A and then you said B.”

Chris Riback: I am so glad that you made that point. I had a teacher once … and as a result, it was just beaten into me and I don’t use it … who would not allow us to ever use the word therefore in our writing and highlighted that therefore was a symbol of an illogical conclusion.

Jordan Ellenberg: Right. It’s a sign. If you find yourself reaching for it and having to say it, it’s probably a sign that you have not justified what you’re saying well enough because you wouldn’t feel the need to reach for that big hammer of a word.

Chris Riback: I’m going to make an assumption here and you’ll please, please correct me if I have it wrong. Among the most significant current intersections of math and business are around machine learning and around the applications of that. I can see on the Zoom here that you’re shaking your head yes, so maybe I’m not so far off.

Explain to me, if you would, what kind of math is machine learning and what in the world does it have to do with being a mountaineer.

Jordan Ellenberg: This is one of the parts of the book I was most excited about writing because, of course, machine learning is incredibly important. It’s incredibly exciting from a scientific point of view. It’s not my research area but I go to tons of seminars about it and learn a lot about it from the practitioners. In many ways, it’s a new field of math that we’re building under our feet as we go. It’s tremendously exciting, and most of my scientific and research life, I’m working in very classical areas of math where the ground rules were laid literally thousands of years ago.

This is a very exciting scientific moment. That being said, a lot of people treat this stuff like magic and it’s not magic. In some sense, the basic mathematical principles of what machine learning is, which I’m about to tell you, are very simple. And, it’s this: It’s a dressed up version of trial and error. The metaphor that I rely on in the book is imagine you’re a mountaineer and you’re trying to get to the top of a mountain. But now let’s imagine that the landscape that you’re exploring is completely overgrown with brush.

It’s a deep forest, and you cannot see the top of the mountain from where you are. You can’t even see what direction it is. All you can see is the immediate neighborhood around you. What would you do? Well, there’s actually a pretty good answer to what you would do. You would look at the ground where you’re standing and try to see in which direction is the upward slope, and maybe of all the directions you would walk, you could see which one has the steepest upward slope and then you take a couple steps that direction and then you reassess.

Now you’re standing in a new place where maybe the slope is different, and now you figure out which way to go affords you the highest slope. What I just told you seems very simple minded, right? But that’s it, dude. That’s machine learning.

Chris Riback: That’s all it is.

Jordan Ellenberg: It’s that plus some technical details, but literally that’s a process which in math we would call gradient descent. That’s the fancy name for just being a mountaineer and looking for the direction of steepest slope.

There are details, but the basic thing that’s driving the method is this very simple trial and error. What’s the best small change I can make to the thing I’m doing right now? Do that and then ask yourself again what’s the best small change from the thing I’m doing right now? It’s such a simple principle, but nobody ever says that. People describe it as if just some Merlin person is stirring a vast cauldron of math and Spotify comes out of it or something. It’s not like that.
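The mountaineer's procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terrain function `height`, the starting point, the step size, and the number of steps are all made up for the example and do not come from the conversation. Machine learning libraries usually phrase the same idea as walking downhill on an error measure rather than uphill on a mountain; the two differ only by a sign.

```python
def height(x, y):
    """Hypothetical terrain: one smooth hill whose summit is at (3, -1)."""
    return -((x - 3.0) ** 2) - ((y + 1.0) ** 2)


def slope(f, x, y, eps=1e-6):
    """Estimate the local slope by probing the ground just around (x, y)."""
    dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return dx, dy


def climb(f, x, y, step=0.1, n_steps=200):
    """Take a small step in the steepest upward direction, then reassess."""
    for _ in range(n_steps):
        dx, dy = slope(f, x, y)
        x, y = x + step * dx, y + step * dy
    return x, y


# The climber never sees the summit, only the slope underfoot.
summit = climb(height, 0.0, 0.0)
print(summit)
```

Nothing in `climb` knows where the summit is. It only ever asks which small change helps most right now, which is the "best small change" loop described above.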

Chris Riback: I think the use of something like therefore, or it is evident that, the tendency of many of us to make an explanation of something like, say, machine learning so much more complicated than it has to be, where you just made it clear, think of it like mountaineering, is also a bit of a trick to try to intimidate others and keep others … Explain something in a convoluted, confusing way and it puts the other person in a situation where they now have to ask, “What do you mean?”, and then well, if they don’t understand what you mean, maybe they’re not smart enough to get it in the first place.

Jordan Ellenberg: Yes, and I would hope that in the scientific community, that motive is not there. Most of us are teachers by profession, like those of us who are in academia, so hopefully we are not trying to confuse or intimidate. The problem is that when academic scientists talk to each other, we have a technical shorthand that we’ve built up over many years and it’s a very efficient means of communication. It’s not there to be confusing or to intimidate other people. It’s there to get things done quickly and precisely.

But, it’s not built for outsiders. I want to be a conduit to say I’m going to put aside the complicated … because as I said, in the implementation, there’s lots of super, super complicated stuff. I don’t want to lie about that, but I also want to be able to tell people what the main ideas are.

Chris Riback: Well, it’s quite a skill to be able to speak both languages for sure. Continuing with this idea and the machine learning, and its applications to the rest of our lives, how significant of a risk is a local optimum?

Jordan Ellenberg: A local optimum, what that means is the following: you’re carrying out this strategy of at each moment, find one small change you can make that improves your strategy a little bit, that seems to give you better results. Now it might happen that you get to a place and you’re like, “Wow, any little change I make, it doesn’t seem to improve my situation. Maybe any little change would make my situation worse.”

You could say, “Okay, at this point, I found it. I found it. I’m at the summit. I’m at the best of all possible worlds,” but that might not always be true. You might be at the top of some little hillock that’s actually far away from the main summit. The example I use of this in the book is it’s like when you’re caught in a procrastination loop. You have some massive pile of crap that’s destroying your desk and office space, and you know you have to deal with it. You know it’s stuff you’re never going to deal with, so you should throw it out but you can’t bring yourself to do it.

At any given moment, the small step of starting to do that, is that going to make your life better? No, it’s going to make your life worse in the short term and in the immediate situation. Your day is going to be better if you don’t start dealing with it.

But of course, you’re not at the real optimum because the real optimum is if you actually deal with the whole pile and deal with it all. It’s like in mountaineering, sometimes you have to go down in order to go up. Sometimes you have to get off your little hillock, go down through the valley to get up to the main summit. That being said, one of the great mysteries of this subject in contemporary machine learning is that this very simple minded process of always just looking for short term advantage, this so called greedy algorithm of gradient descent and just making small changes that improve strategy, that definitely theoretically and in principle can get caught in a local optimum like that.

But usually, it doesn’t and we don’t know why. That’s one of the most interesting theoretical questions. Again I, with one hand, am trying to make the subject sound simple, what the actual mechanism you use is pretty simple, but I also want to make the subject sound deep, because it is because it works much better than it has any right to. That’s a huge open theoretical question. We don’t actually really understand why.
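To make the local optimum concrete, here is a minimal sketch with a made-up one-dimensional terrain: a small hillock near x = -2 and a main summit, twice as high, near x = 2. The function `height`, the two starting points, and the step size are assumptions chosen for illustration. The same greedy climber ends up in different places depending only on where it starts.

```python
import math


def height(x):
    """Hypothetical terrain: a hillock near x = -2 and a taller summit near x = 2."""
    return math.exp(-((x + 2.0) ** 2)) + 2.0 * math.exp(-((x - 2.0) ** 2))


def climb(f, x, step=0.05, n_steps=2000, eps=1e-6):
    """Greedy climbing: always take a small step in the uphill direction."""
    for _ in range(n_steps):
        slope = (f(x + eps) - f(x - eps)) / (2 * eps)
        x += step * slope
    return x


stuck = climb(height, -3.0)  # starts on the flank of the hillock
best = climb(height, 1.0)    # starts on the flank of the main summit

print(stuck, height(stuck))
print(best, height(best))
```

From the top of the hillock, every small step leads downhill, so the climber stops there even though a higher summit exists across the valley.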

Chris Riback: Would you apply that in any way as you think about the we don’t know why? Is there any aspect of that which can be applied to business decision making? On the one hand, I’m listening to you and the first half of what you said made me feel really good about myself. You were giving a mathematical explanation to why I procrastinate, and I plan to quote you on it with my wife and explain why my stuff’s all over. But then you said well, but sometimes you do have to take a step down in order to go forward.

Are there ways to think about it mathematically where you know when you should take that step back, let’s say, to go forward? That step down to go up? Or, does it not work that way?

Jordan Ellenberg: First of all, I think just having the conceptual language of talking about it I find helpful. At least for me. I’m N of one so I can only say that anecdotally. I would also say … Well, a couple things. One, absolutely there’s a real difficulty there because it may not be obvious. You can say, “This hill is not that high. Maybe I need to go down in order to go up.” But now it may not be so obvious which direction to go. Maybe if you choose the wrong way, you’re going down in order to go even further down.

Look, nobody said business was going to be easy. There are some questions that you can’t immediately answer. But I would say actually to me it seems like some of the strategies that machine learning people have devised to get out of local optima when you are stuck in them are useful here. One is just take a random big step. Literally random. Choose a direction at random and take a big step instead of a small step and see if maybe that gets you some place from which you can get to the summit.

The randomness is important because you can get stuck saying, “But, how do I know what big leap to take?” You got to accept that you don’t know, and certainly in math, being a mathematician, you have to accept all the time that there’s a lot of stuff you don’t know and that you’re going to make some leap, make some action forward, try some strategy with no good reason to think it will work.
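The random big step can be sketched as follows. This is one simple way to do it, not a description of any particular production system: the terrain `height`, the leap size, the number of leaps, and the fixed random seed are all assumptions for the example. The climber first gets stuck on the hillock, then repeatedly leaps a random distance, climbs greedily from wherever it lands, and keeps the new spot only if it is higher.

```python
import math
import random


def height(x):
    """Hypothetical terrain: a hillock near x = -2 and a taller summit near x = 2."""
    return math.exp(-((x + 2.0) ** 2)) + 2.0 * math.exp(-((x - 2.0) ** 2))


def climb(f, x, step=0.05, n_steps=500, eps=1e-6):
    """Greedy climbing: always take a small step in the uphill direction."""
    for _ in range(n_steps):
        slope = (f(x + eps) - f(x - eps)) / (2 * eps)
        x += step * slope
    return x


def climb_with_random_leaps(f, x, n_leaps=100, leap=5.0, seed=0):
    """Climb, then try random big steps and keep any that end up higher."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x = climb(f, x)
    for _ in range(n_leaps):
        candidate = climb(f, x + rng.uniform(-leap, leap))
        if f(candidate) > f(x):
            x = candidate
    return x


stuck = climb(height, -3.0)
escaped = climb_with_random_leaps(height, -3.0)
print(height(stuck), height(escaped))
```

No single leap is chosen for a good reason. The method works because many cheap, unjustified leaps are tried and only the ones that pay off are kept.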

Chris Riback: What are under-fitting and over-fitting? Which is more problematic? How do we avoid both of them?

Jordan Ellenberg: Oh, it’s like trying to pick between your two kids. I can never say which is more problematic. Each child is problematic in their own way, right? There are two things that can go wrong when you’re trying to develop a machine learning solution to some problem that you’re trying to study. Under-fitting is a little simpler. It means you just haven’t taken into account enough the data that you have in front of you, the examples that you’ve already seen.

So if you were trying to develop some business strategy, it might mean you just didn’t really take into account what happened before when you did similar strategies. And if you do that, you’re probably going to develop a strategy that doesn’t work so well going forward.

Over-fitting is more subtle. Let’s see how best to do it verbally, because over-fitting means taking the data of the past too much into account, which it seems like wait, isn’t more data always good? Isn’t refining your strategy to more closely be something that would have worked well if you’ve done it in the past, isn’t that the best possible strategy? Well, not necessarily.

For example, you might say … All right, give me a moment to formulate this one because this is a little bit of a challenge but I’m going to do it. I’m just going to sip my coffee, because that’s what I do when I need to think.

Chris Riback: That’s the card you pulled from the deck.

Jordan Ellenberg: Exactly. A typical example of an over-fit strategy might be something like this. You say my strategy is if I encounter a situation in my business that is exactly something I’ve encountered before, I take the course of action that would’ve been the best for that situation knowing what I know now. I look back on all the decisions I’ve made, I decide whether they were right or wrong, and then if I encounter that situation again, I’m going to do the thing that’s right instead of the thing that’s wrong.

And, if I encounter any situation that is not exactly something I’ve encountered before, then I just flip a coin because I have no idea what to do.

Okay, that is not so useful of a strategy to have. It’s not awful, but it’s pretty bad because it means it’s perfectly fit to the data that you have in front of you. It works perfectly on every situation you’ve already experienced, but it totally fails to generalize to new experiences.

So in some sense, where you want to be is somewhere in between those two things. You want to be able to learn from the past, but you want to do something that’s not just memorizing, it’s actually learning. One way to put it, because this brings us back to geometry, is that you want to recognize when a situation you’re facing right now is not identical with a situation before but it’s close enough. You might say, “What’s the closest situation to the current predicament I face?”

Well, there’s that word again, close. The moment you say that, you’re saying there’s some geometry of all possible situations where some are near to each other and some are far from each other. Any machine learning scenario is exactly based on that.
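The contrast between memorizing and using closeness can be sketched like this. Everything concrete here is hypothetical: the two numbers describing each situation, the actions "expand" and "hold", and the use of straight-line (Euclidean) distance as the measure of closeness. Choosing the right notion of "close" is exactly the geometric question raised above, so the distance used here is a placeholder, not a recommendation.

```python
import math
import random

# Hypothetical past situations, each described by two numbers,
# mapped to the action that turned out best in hindsight.
past = {
    (0.0, 1.0): "expand",
    (0.2, 0.9): "expand",
    (1.0, 0.0): "hold",
    (0.9, 0.1): "hold",
}


def memorizer(situation, rng):
    """Over-fit strategy: perfect on seen cases, a coin flip on anything new."""
    if situation in past:
        return past[situation]
    return rng.choice(["expand", "hold"])


def nearest_neighbor(situation):
    """Reuse the action from the closest past situation."""
    closest = min(past, key=lambda seen: math.dist(seen, situation))
    return past[closest]


rng = random.Random(0)
new_situation = (0.8, 0.2)  # never seen before, but close to the "hold" cases
print(memorizer(new_situation, rng))    # a coin flip
print(nearest_neighbor(new_situation))  # borrows from the nearest past case
```

Both strategies give the same answers on situations already in `past`. They differ only on new situations, which is where generalization is tested.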

Chris Riback: In this part, I found myself thinking about the cost and benefits of postmortems. Perhaps you do them after each semester after you teach a class. Maybe you personally, or with a colleague or the dean of your school, go through a postmortem. What went well? What could I have done better? It happens in business. It happens in life all the time. What I took from my reading of over-fitting and trying to understand what you just described was that what becomes important is not just to go through the postmortem of what previously occurred and understand that, and maybe document it in some way, but that the utility of it is its applicability going forward.

The going forward, when I then get into another situation, the key perhaps is not necessarily only having all of my notes from that previous postmortem, but is having an understanding of how close or distant the new situation is to what I went through previously and in what ways is it different, and do I now have to change my strategy based … Are those differences material or not so relevant? Can I apply the data from previous or no, is that not relevant because the delta to the new situation is so significant that it might appear to be the same as previous but in reality it’s not.

Am I getting close to what I’m supposed to be taking away?

Jordan Ellenberg: Totally. In fact, I like that so much. This happens to me a lot. Once the book comes out and you converse with people about it, it is a process of realizing all the cool stuff that you could have put in the book because different people read it and I’m like, “Oh, that concept of the postmortem.” When you say it, I’m like oh. That literally is humans doing the thing that a machine learning algorithm does. You have your existing strategy. You see the results. Then, you go back … This is akin to the process called back propagation in machine learning … and say, “Okay, these are the results I got from this strategy. If the strategy had been a little different in some way, would I have gotten better or worse results?”

That kind of reflection is, in a human way, exactly what the machine is doing. In particular, the point you make about understanding which deltas matter and which deltas don’t, this brings us to Henri Poincaré who is a major figure in the book, one of the founders of modern geometry as we have it today. He’s back from the late 19th, early 20th century. He is also one of the most quotable mathematicians ever. I had to restrain myself from quoting him on every page, but one of his best slogans is he says, “Mathematics is the art of calling different things by the same name.”

Now, this is deep. This is deep because it’s exactly this point you make that you never step in the same river twice. No two situations are identical, but some differences matter and some differences don’t. You can’t think mathematically and you can’t think reasonably unless you’re willing to understand which things you’re willing to call the same because they’re the same enough. They’re the same in the ways that matter, even though they are not literally the same. They are literally different.

So this process of calling different things by the same name is so central to modern geometry but I actually think that that principle is useful for all kinds of thinking.

Chris Riback: And, it would seem to me that there are people who can be incredibly successful because they innately do the math. My presumption is it’s a skill, and maybe it’s one that some people are born with and maybe in others we can develop it, but somehow they do the math very quickly, instinctively in their heads on how big that delta is, how much the difference is and which lessons apply and which don’t.

Jordan Ellenberg: No, but actually I’m curious. Can I ask a question, Chris?

Chris Riback: Please.

Jordan Ellenberg: Because when you say there seem to be some people who seem to be able to more rapidly or more accurately mentally capture which deltas are important and which are not, are there particular people you have in mind? Are there people in the business world who are known for this? I’m just curious.

Chris Riback: I’m certain that there are. Just off the cuff, hearing your question, there are maybe two that come to mind, one in particular. One that comes to mind, and this is not an endorsement but I mean it in a very particular way, potentially is Elon Musk. The part that I mean it on, and I was just having a conversation on this with a friend of mine the other day, is electric cars, battery powered cars totally predated Elon Musk. He didn’t create … but what he noticed, what’s the delta on his execution versus most of the ones that occurred before?

In my opinion and my friend’s opinion, the biggest difference that he saw was it wasn’t necessarily about the fact that the car was electric. That wasn’t going to drive people to buy that car. It was, was it a great car? How did it drive? What were the other interfaces that it provided? What were the other things that it could do that made the experience differentiating and special?

He took what other people were doing, Chevrolet, like many other car makers, were making-

Jordan Ellenberg: And, is.

Chris Riback: And is, and yet he defined which difference really mattered. Apparently, I don’t know, I don’t own a Tesla, the batteries are really good but let’s just say the difference wasn’t that oh my God, I have to make this incredible other worldly battery that’s going to last decades longer than the other ones. No, he recognized that it was the other differences that were most material. Does that work?

Jordan Ellenberg: It does, although it’s funny. For me, the whole appeal is that it’s electric and when I actually sit in one, it feels plastic and cheap and unpleasant but I’m like, “But, it is cool that it’s electric” even though it feels dingy as a car.

Chris Riback: On the seating of the car, as I’ve sat in them, sometimes I feel that way, yes, but the other aspects of what it does, the self-driving, the integration of technology, the fact that you can just download the upgrades and all of a sudden have the next generation of the car, that stuff all seems really cool to me.

Jordan Ellenberg: Yes, and I got to say on the subject of my lack of intrinsic geometric ability, if you saw me try to parallel park, they would take away my PhD for real. That aspect does appeal to me.

Of course, in reality, the space of strategies that you’re searching when you try to develop some algorithm using a machine learning system is way more than three dimensional, right? In the case of GPT-3, one algorithm I write about which generates natural language in a really cool way … or, seemingly natural language … I think if I remember right, it’s 175 billion dimensional.

Okay, so nobody can visualize that. We are not literally visualizing that, and in some sense, one of the glories of modern geometry is we erect all this formal structure. Going back to the beginning, this set of formal and rigid rules that we learn in school, why is it important to have formal and rigid rules? Because the formality entwines with our intuition and allows us to go farther than our intuition can go. We can’t visualize 175 billion dimensional space directly but once we really have a feel for how two dimensional space and three dimensional space work, 175 billion dimensional space works pretty much the same way.

So in some sense, the level of formality I had to learn in order to overcome maybe some slight lack of innate ability to visualize things in three dimensions, it serves you very well if you want to take the geometry farther.

Chris Riback: That’s fascinating. Jordan, thank you. Thank you for your insights. Thanks for just a terrific mind bending book that required and rewarded with all sorts of new thinking. Thank you for your time.

Jordan Ellenberg: Thank you for having me on. This was great.