Everybody knows Google collects a lot of personal data. And we are all more or less OK with that. But as all this info gets fed into the mighty Google machinery for creating artificial intelligence (AI), a lot of bad human instincts get mixed in with the results.
It’s like that rumor that trace amounts of cocaine and Adderall can be found all over New York public spaces. That, but the trace elements are racism, hatred, and violence, and they are all embedded in the models Google uses to appeal to our sympathies, i.e. to influence our behavior.
In the paper that allegedly got her fired from Google, AI researcher Timnit Gebru argues that large-scale learning models built on Google’s indiscriminate data harvesting have led to racial biases in the company's products.
The purpose of a language model (LM) is to predict the likelihood of a word given the previous words and the larger context of the discussion. Recently, the industry has been leaning towards large "transformer models," which benefit from ever larger quantities of data. "As increasingly large amounts of text are collected from the web in datasets […] this trend of increasingly large LMs can be expected to continue as long as they correlate with an increase in performance," the paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” asserts.
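To make the definition concrete, here is a minimal sketch of next-word prediction using raw bigram counts over a tiny, hypothetical corpus. Real transformer LMs condition on far more context and use learned weights rather than counts, but the underlying task, estimating the probability of a word given what came before, is the same, and the sketch shows how the model simply echoes whatever its training data overrepresents.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for web-scale training text (hypothetical data).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each previous word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probability(prev, candidate):
    """P(candidate | prev) estimated from raw bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0

# After "the", the model favors "cat" simply because the data does:
print(next_word_probability("the", "cat"))   # 0.5 (2 of 4 continuations)
print(next_word_probability("the", "fish"))  # 0.25
```

Scaling this idea up does not change its character: a larger corpus just means the model parrots the statistics, and the biases, of a larger slice of the web.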
The process is also environmentally wasteful, the researchers argue.
These unsupervised, black box "deep learning" models ingest and incorporate the whole range of personal opinions, informed or less so.
“Large datasets based on texts from the Internet overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations,” Gebru and her co-authors write.
There are environmental costs as well that should be considered, and companies like Google are not really doing so, the researchers say. As one example, they point to how training a single large-scale model can consume as much data center energy as a trans-American flight.
A lot more effort, they argue, should instead be put into understanding the potential harms that these large-scale learning models are causing.
Or, better yet, look for ways to learn this user information from smaller, curated data sets, ones that would also be easier on the environment.
One attempted approach to solving the bias problem is instrumenting machine learning workflows with "fairness-aware learning algorithms," each tuned to adjust for some "fair" outcome.
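One such intervention, offered here as an illustrative sketch rather than any particular vendor's implementation, is "reweighing": assigning each training example a weight so that group membership and outcome become statistically independent in the reweighted data. The group labels and hiring outcomes below are hypothetical.

```python
from collections import Counter

# Hypothetical labeled examples: (group, hired) pairs.
data = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("b", 0)]
n = len(data)

group_freq = Counter(g for g, _ in data)   # how common each group is
label_freq = Counter(y for _, y in data)   # how common each outcome is
joint_freq = Counter(data)                 # how common each combination is

def reweigh(group, label):
    """Instance weight that makes group and label independent
    in the reweighted data (the classic 'reweighing' idea)."""
    expected = (group_freq[group] / n) * (label_freq[label] / n)
    observed = joint_freq[(group, label)] / n
    return expected / observed

# Underrepresented (group, label) combinations get weights above 1,
# overrepresented ones below 1.
for g, y in sorted(set(data)):
    print(g, y, round(reweigh(g, y), 2))
```

Here group "a" is hired more often than group "b", so positive examples from "a" are down-weighted (0.75) and positive examples from "b" up-weighted (1.5). The point of the critique that follows is that this kind of purely statistical correction knows nothing about the social context the numbers came from.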
Such interventions, however, are "ineffective, inaccurate, and sometimes dangerously misguided when they enter the societal context that surrounds decision-making systems," argues a 2019 Association for Computing Machinery paper co-authored by Microsoft AI researcher Danah Boyd ("Fairness and Abstraction in Sociotechnical Systems").
By not taking into account the shifting cultural norms around them, the models remain an insufficient accounting of their subjects and, in fact, will remain biased in all sorts of ways.
Let us count the ways in which bias can creep into machine learning systems.
Lack of social context can show up as a bug in many cases where software is reused. In ML, problems are categorized by the learning task used to solve them ("clustering," "reinforcement learning," and so on). But in borrowing a tool, you may borrow some of its assumptions as well.
Models in criminal risk assessment may not carry over to, say, automated hiring.
A hiring system would be only mildly worried about false positives, because they'd result in unneeded interviews and paperwork. The criminal justice system focuses intently on false positives to keep people from being unnecessarily locked up or otherwise detained.
Each use case has a differing acceptable balance of false positives, its own incompatible definition of "fair."
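The trade-off above can be made concrete with a small sketch. The risk scores and ground-truth outcomes below are hypothetical; the point is that the same scoring model yields very different false positive rates depending on where each use case sets its decision threshold.

```python
# Hypothetical risk scores paired with true outcomes (1 = actual positive).
scored = [(0.9, 1), (0.8, 0), (0.7, 1), (0.6, 0), (0.4, 0), (0.2, 0)]

def false_positive_rate(threshold):
    """Share of actual negatives flagged as positive at this threshold."""
    negatives = [score for score, outcome in scored if outcome == 0]
    flagged = [score for score in negatives if score >= threshold]
    return len(flagged) / len(negatives)

# A hiring pipeline can afford a permissive threshold: a false positive
# only costs an extra interview.
print(false_positive_rate(0.5))   # 0.5: two of four negatives flagged

# A pretrial risk context wants a strict threshold: a false positive
# can mean unnecessary detention.
print(false_positive_rate(0.85))  # 0.0: no negatives flagged
```

Neither threshold is "correct" in the abstract; each encodes a judgment about which errors are tolerable, which is exactly why a definition of "fair" tuned for one context can be incompatible with another.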
"Certain assumptions will hold in some social contexts but not others," the researchers assert.
And these assumptions can change over time as well, and even be influenced by the use of the technology itself. Every technology introduced into the public sphere brings forth a reaction, good, bad, or otherwise. This reaction must be measured as well.
In most courts, risk assessment scores are generated for a judge as a recommendation. Some judges factor them into their decisions, others don't. But the risk assessment itself does not factor in how judges respond to the score. A judge may decline to use a mathematically generated risk assessment, wary of an outside influence on her decision-making.
"More attention is needed to understand when technologies trigger value shifts in social systems," the authors write.
It all leads to the challenge of mathematically defining what a vague concept like "fair" is in the first place.
"Fairness and discrimination are complex concepts that philosophers, sociologists, and lawyers have long debated. They are at times procedural, contextual, and politically contestable," the researchers write.
The law is primarily procedural, while machine learning models are defined entirely by outcomes. Firing someone because of their race is illegal, but firing someone in itself is legal: in either case, the outcome is the same, as ML understands it. The difference, and the definition of discrimination itself, lies in the procedure.
Computer science is all about abstractions. With abstractions, all the complexity hidden behind, or surrounding, the inputs and outputs is whisked away. This abstraction flattens what it represents, boiling it down to a collection of performance metrics, which are entirely devoid of the procedural reasoning behind concepts such as justice and legality.
A "sociotechnical system" combats these mistakes by moving out the abstraction boundary, so to speak, to include the decision-making process of humans and their institutions.