Tales of Caution for the times of Big Data and Machine Learning

“Big Data processes codify the past. They do not invent the future.”
Cathy O’Neil

A good (meaning smart) title for a book can be an important factor that drives me to pick it up. The last book I read had one of the best puns I have come across, and it used the pun effectively throughout the book as a theme.

The title in question is Weapons of Math Destruction, and if you haven't already noticed, it is a smartly done pun on Weapons of Mass Destruction. The author, Cathy O'Neil, does not stop at using this pun in the title: the recurrent theme of the book is to point out how math has been turned into an effective weapon, operating at mass scale and, in most instances, used for destructive purposes.

According to Cathy O'Neil, a mathematical model qualifies as a WMD if it has three main attributes: it is opaque, it causes damage, and it operates at scale. While these features seem simple enough, the author illustrates them with revealing examples, from the stock markets to college rankings to the controversial recidivism models used in courts of law.

The book was also topical for me, as I now work along the peripheries of the much-hyped machine learning models, which are perfect examples of black-box mathematical models. The story, however, is not all grim; as with all technologies, this force can be turned either way. As the author points out, depending on the objective function being optimised, these models can benefit societies as much as they can harm them.

Unfortunately, in most cases these models are adapted to optimise efficiency, which can have negative consequences. When the trade-off is between efficiency and fairness, it is easier to build models that improve efficiency, simply because efficiency can be measured, while fairness is not a tangible quantity that can be quantified and plugged into a model (at least in comparison to the attributes that measure efficiency). Efficiency can be measured as a reduction in costs of one form or another, and in markets driven by profit, such models count as effective. When models are used to drive efficiency, they end up dislodging fairness from the equation.

This is further aggravated because the models the author labels WMDs usually get no feedback on their predictions. A model that does not correct itself based on its predictions is either destined to go bonkers (the 2008 recession) or becomes so powerful that it propagates a self-fulfilling prophecy, generating data that fits its own predictions. For instance, recidivism models used to predict criminality punish people from poor backgrounds (and, in the US, of non-white ethnicity); those scored as more prone to committing crime have less chance of getting back to a normal life (fewer jobs, for one thing) and end up fulfilling the WMD's prophecy by, most likely, committing more "crime".
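The feedback pathology described above can be sketched in a toy simulation. Everything here (the scoring rule, the probabilities, the function names) is invented for illustration and is not taken from the book; it is only meant to show how a proxy-based score that never sees its own causal effect can look self-confirming:

```python
import random

random.seed(0)

def risk_score(neighborhood_poverty):
    # Proxy-based model: scores a person by where they live, not what they did.
    return 0.8 if neighborhood_poverty > 0.5 else 0.2

def simulate(person_poverty, rounds=5):
    """Each round, a high score reduces job prospects, which deepens poverty
    and raises the chance of rearrest, which the model then counts as a
    correct prediction. The model itself never updates."""
    poverty = person_poverty
    rearrests = 0
    for _ in range(rounds):
        score = risk_score(poverty)
        job_chance = 1.0 - score            # high score -> fewer job offers
        if random.random() > job_chance:    # no job this round
            poverty = min(1.0, poverty + 0.1)
            if random.random() < score:     # deprivation drives rearrest
                rearrests += 1
    return rearrests

# The score helped cause the outcome it claims to have predicted,
# yet from the outside the model merely looks "accurate".
print("poor neighborhood:", simulate(0.7))
print("wealthy neighborhood:", simulate(0.2))
```

The point of the sketch is the missing arrow: nothing feeds the outcomes back into `risk_score`, so the model is never confronted with its own role in producing the data it is judged against.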

All models are wrong. And no model generalises well. This means any given model can only be fitted to the majority of the data and, in the case of people, it can explain only the average person. While average people are a useful notion when seen as data points, in reality each one of us is far from that average data point. The outliers of the system, whom the models do not describe, lie at both extremes: invariably, the super-rich and the poor. The models, going by the examples in the book, seem to benefit the former and punish the latter.

The models the author cautions against, those that use proxies to measure intangible quantities, receive no feedback, are opaque, and are applied to broad populations, are only growing more powerful with Big Data and machine learning. The time is ripe to exercise tremendous caution, whether we are prescribing these models or weighing the results they hand us. Unfortunately, there is very little we can do if we are victims of these increasingly pervasive models, unless we are able to convince our governments to impose stringent regulations on these Weapons of Math Destruction.

About Raghav/Raghu

A fortunate mass of hydrogen cloud conscious enough to be contemplating that very fact.

1 Response to Tales of Caution for the times of Big Data and Machine Learning

  1. Philomath says:

    Good post! The problem of the black box is here to stay, since it's intrinsic to the dissemination of big data. The challenge I see in the near future is the preference for a machine that is accurate over one whose results can be understood.
