This article is part of the series "Kaiser Fung and Number Sense".
We continue the conversation by discussing the accuracy of statistics and algorithms. With examples such as shoe recommendations and movie ratings, you’ll learn where algorithms fall short.
Cindy Ng: In part one, Kaiser taught us the importance of looking at the process behind a numerical finding. And today, we’ll continue in part two on how to cultivate numbersense.
Kaiser, do you think algorithms are the answer? And when you’re looking at a numerical finding, how do you know what questions to ask?
Kaiser Fung: So I think…I mean, there are obviously a big pile of questions that you ask, but I think that the most important question not asked out there is the question of accuracy. And I’ve always been struck, and I keep mentioning this to my blog readers, that if you open up any of the articles that are written up, whether it’s the New York Times or the Wall Street Journal, you know, all these papers have big data articles, and they talk about algorithms, they talk about predictive models and so on. But you can never find a quantified statement about the accuracy of these algorithms.
They would all qualitatively tell you that they are all amazing and wonderful. And really it all starts with understanding accuracy. And in the Numbersense book, I addressed this with the Target example of the tendency models. But also in my previous book, I talk about the whole thing around steroids and also lie detector testing, because it’s all kind of the same type of framework. It’s really all about understanding the multiple different ways of measuring accuracy. So starting with understanding false positives and false negatives. But really there are other, more useful metrics derived from those. And you’ll be shocked how bad these algorithms are.
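To make the false positive and false negative framework a little more concrete, here is a minimal Python sketch of how a few accuracy metrics can be derived from those counts. The function name and the counts are hypothetical, chosen only for illustration; they are not figures from either book.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Derive a few accuracy metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    """
    return {
        # Share of real negatives that were wrongly flagged
        "false_positive_rate": fp / (fp + tn),
        # Share of real positives that were missed
        "false_negative_rate": fn / (fn + tp),
        # Share of flagged cases that were actually positive
        "precision": tp / (tp + fp),
        # Share of all cases that were classified correctly
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical example: 1,000 people screened, 50 of them true cases.
metrics = accuracy_metrics(tp=40, fp=95, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, overall accuracy looks high while fewer than a third of the flagged cases are real, which is one way a single headline number can hide how often an algorithm is wrong.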
I mean it’s not that…like from a statistical perspective, they are pretty good. I mean, I try to explain that to people, too. It’s not that we’re all kind of snake oil artists, that…these algorithms do not work at all. I mean, usually, they work if you were to compare with not using the algorithm at all. So you actually have incremental improvements and sometimes pretty good improvements over the case of not using an algorithm.
Now, however, if the case of not using the algorithm leads to, let’s say, 10% accuracy, and now we have 30% accuracy, you would be three times better. However, 30% accuracy still means that 70% of the time you got the wrong thing, right? So there’s an absolute versus relative measurement here that’s important. So once you get into that whole area, it’s very fascinating. It’s because usually the algorithms also do not really make decisions, and there are specific decision rules that are in place, because oftentimes the algorithms only calculate a probability of something.
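The relative versus absolute distinction can be worked through in a few lines of Python. This is only a sketch using the 10% and 30% figures from the conversation; the function name is a hypothetical one introduced for illustration.

```python
def compare_accuracy(baseline_pct, algorithm_pct):
    """Return the relative gain over the baseline and the share still wrong."""
    relative_gain = algorithm_pct / baseline_pct   # how many times better
    still_wrong_pct = 100 - algorithm_pct          # absolute error that remains
    return relative_gain, still_wrong_pct


# Figures from the conversation: 10% without the algorithm, 30% with it.
relative_gain, still_wrong_pct = compare_accuracy(10, 30)
print(f"{relative_gain:.0f}x better than the baseline")
print(f"still wrong {still_wrong_pct}% of the time")
```

Both numbers describe the same algorithm: the first is the relative measurement, the second the absolute one.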
So by analogy, the algorithm might tell you that there’s a 40% chance of rain tomorrow. But somebody has to create a decision rule that says that, you know, based on…I mean, I’m going to carry an umbrella if it’s over 60%…So there’s all this other stuff involved. And then you have to also understand the soft side of it, which is the incentive of the various parties to go either one way or the other. And the algorithm ultimately reflects the designer’s choices, because the algorithm will not make that determination of whether you should bring an umbrella, whether the cutoff is over 60% or under 60%. All it can tell you is that for today it’s 40%.
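The split between the algorithm's output and the human decision rule can be sketched as follows. The function name and the alternative 30% threshold are assumptions made for illustration; only the 40% forecast and the 60% cutoff come from the conversation.

```python
def carry_umbrella(rain_probability, threshold=0.60):
    """Decision rule chosen by a person, not by the forecasting algorithm.

    The algorithm only supplies rain_probability; the threshold
    reflects the preferences of whoever designs the rule.
    """
    return rain_probability > threshold


forecast = 0.40  # all the algorithm says: 40% chance of rain

# Same forecast, two different decision rules, two different decisions.
relaxed = carry_umbrella(forecast)                    # 60% cutoff
cautious = carry_umbrella(forecast, threshold=0.30)   # hypothetical 30% cutoff
print(relaxed, cautious)
```

The forecast never changes here; only the threshold does, which is why the final decision reflects the person who set the rule.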
So I think this notion that the algorithm itself is running on its own is false anyway. And then once you have human input into these algorithms, you also have to wonder about what the humans are doing. And I think in a lot of these books, I try to point out that what also complicates it is that in every case, including the case of Target, there will be different people coming at this from different angles, where they are trying to optimize objectives that are conflicting.
That’s the beginning of this…that sort of asking the question of the output. And I think if we start doing that more, we can avoid some of this. I think a very recent situation that runs into our conversation here is this whole collapse of this…company. I’m not sure if you guys have been following that.
Well, it’s an example of somebody who’s been selling this algorithm. People have been asking…well, a lot of people have not been asking for quantifiable results. The people who have been asking for quantifiable results have been basically pushed back and, you know, they refused all the time to present anything. And then, at this point, I think it’s been acknowledged that it’s all…you know, empty, it’s hot air.
Andy Green: Right, yeah. You had some funny comments, I think it was on your blog, and this is related to these algorithms, about, I guess, buying shoes on the web. On, I don’t know, one of the websites. And you were saying they were always coming up with some recommendations for other types of items that they thought you would be interested in. And what you really wanted was to go into the website and, at least when you went to buy the shoe, have them take you right to the shoe size that you ordered in the past or the color that you ordered.
Kaiser Fung: Right, right, yes.
Andy Green: And that would be the simple, obvious thing to do, instead of trying to come up with an algorithm to figure out what you might like and making suggestions…
Kaiser Fung: Yeah. So I think there are many ways to think about that. Part of it is that oftentimes the most unsexy problems are the most impactful. But people tend to focus on the most sexy problems. So in that particular case, I mean, the whole article was about the idea that what makes prediction inaccurate is not just the algorithm being bad…well, I mean, the algorithms oftentimes are actually not bad. It is that the underlying phenomenon that you are predicting is highly variable.
So I love to use examples like movies, since movie ratings were really big some time ago. So how you rate a movie is not some kind of constant. It depends on the mood, it depends on what you did. It depends on who you are with. It depends on so many things. And if you put the same person in front of the same movie under different settings, they would probably give different ratings. So in that sense, it is very difficult for an algorithm to really predict how you’re going to rate the movie. But what I was pointing out is that there are a lot of other types of things that…the algorithms could predict that have essentially what I call an invariable property.
And a great example of that is the fact that almost always, I mean, it’s still not a hundred percent, but 90% of the time you’re buying stuff for yourself. Therefore, you have certain shirt sizes, shoe sizes and so on. And therefore it would seem reasonable that they should just show you the things that are appropriate for you. And that’s a…it’s not a very sexy type of prediction. But it is a kind of prediction. And there are many, many other situations like that, you know. It’s like if you just think about even using email software, there are certain things that you click on there…it’s because the way it’s designed is not quite the way you use it. So we have all the data available, they’re measuring all this behavior, and it could very well be predicted.
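The "unsexy" kind of prediction described here can be as simple as remembering what a shopper usually orders. The sketch below is only one hypothetical way to do that, not a description of any retailer's system; the function name and the order history are made up for illustration.

```python
from collections import Counter


def usual_size(past_sizes):
    """Guess the shopper's size as the one ordered most often before."""
    if not past_sizes:
        return None  # no history, so no prediction
    # most_common(1) returns a list with one (size, count) pair
    return Counter(past_sizes).most_common(1)[0][0]


# Hypothetical order history: mostly one size, with one gift purchase.
history = ["9.5", "9.5", "10", "9.5"]
default_size = usual_size(history)
print(default_size)
```

Because the underlying behavior is nearly constant, even a rule this simple would be right most of the time, which is the contrast being drawn with movie ratings.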
So I feel like everybody who has done the same with the clicks every time because they’re very much like, “Well, I just say what I mean.”