The Dirty Data Feeding Predictive Police Algorithms

In real life, the data feeding predictive algorithms is riddled with problems, according to a researcher at the UC Davis School of Law.

The ability to predict crimes before they happen has long been a topic of fascination for science fiction writers and filmmakers. In real life, predictive policing is getting a similar buzz, as dozens of police departments experiment with algorithm-driven programs to help them deploy resources more effectively.

But more attention should be focused on problems with the data that feed predictive algorithms, argues one researcher from the UC Davis School of Law.

“Predictive policing programs can’t be fully understood without an acknowledgment of the role police have in creating its inputs,” writes Elizabeth E. Joh in a paper forthcoming in the William & Mary Bill of Rights Journal. Police aren’t just passive end-users of these data-driven programs; they generate the information that feeds them.

The difference between crime and crime data

“A closer look at the ‘raw data’ fed to these algorithms reveals some familiar problems,” the study maintains.

Even under the best of circumstances, crime data only partially captures the actual crime that occurs in any given place. To become fact, a crime must first be discovered, investigated, and recorded by police.

Racial bias is only one factor that can influence the way police record crime, as well as the rate at which they record it, writes Joh. Other factors include workplace pressures, contract disputes, funding crises, the seriousness of the offense, “wishes of the complainant, the social distance between the suspect and the complainant, and the respect shown to the police.”

Changes in policy, such as the “broken windows” campaign of the early 1990s, also leave an indelible footprint on crime data.

There is also a concern that the algorithms produce self-fulfilling prophecies: send police to an area where crimes occurred in the past, and chances are that they’ll see something, reinforcing the prediction.
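The self-fulfilling prophecy described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of any real predictive policing product: the function name `simulate_feedback`, the two-district setup, and the rule that all patrols go to the district with the most recorded crime are hypothetical simplifications chosen for illustration. Both districts have the same true crime rate; they differ only by one extra report in the historical record.

```python
def simulate_feedback(true_rates, initial_records, days=50, patrols=10):
    """Toy feedback loop: patrols follow recorded crime, and crime is
    only recorded where patrols are sent (an illustrative assumption)."""
    records = list(initial_records)
    for _ in range(days):
        # The "prediction": the district with the most recorded crime so far.
        target = max(range(len(records)), key=lambda i: records[i])
        # Expected number of crimes observed by the patrols sent there.
        records[target] += true_rates[target] * patrols
    return records


# Two districts with identical true crime rates; district 0 starts
# with a single extra report in the historical data.
records = simulate_feedback(true_rates=[0.5, 0.5], initial_records=[11, 10])
print(records)  # [261.0, 10]
```

In this sketch the recorded totals diverge sharply even though the underlying crime rates are the same, because the only new data comes from the place police were already sent. Real systems are more complicated, but the example shows why the source of the data matters as much as the math applied to it.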

As crime forecasting programs become more and more commonplace in police departments across the country, the consequences of data gaps will also grow in scale.

“Many of these issues will become even more difficult to isolate and identify as algorithmic decisionmaking becomes integrated into larger data management systems used by the police.”

The legitimacy of the “black box” algorithms themselves, which remain hidden behind proprietary information laws, is also uncertain. Last year, ProPublica investigated an algorithm created by Northpointe, Inc., and, after comparing risk scores to actual recidivism rates, found the program to be only “somewhat more accurate than a coin flip.”

Ultimately, Joh cautions against “the assumption that algorithmic models don’t have subjectivity baked into them because they involve math.”

The same goes for law enforcement’s role in generating crime data:

“As long as policing is fundamentally a set of decisions by people about other people, the data fed to the machine will remain a concern.”

This summary was prepared by Deputy Content Editor Victoria Mckenzie. A full copy of the report can be obtained here.