New Research Casts More Doubt on Risk Assessment Tools

Two computer scientists, writing in the journal Science Advances, say the two-decade-old COMPAS system “is no more accurate or fair than predictions made by people with little or no criminal justice expertise.” Over the past two decades, the program has been used to assess more than one million criminal offenders.

Two computer scientists have cast more doubt on the accuracy of risk assessment tools.

After comparing predictions made by a group of untrained adults to those of the risk assessment software COMPAS, the authors found that the software “is no more accurate or fair than predictions made by people with little or no criminal justice expertise,” and that, moreover, “a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 features.”

Julia Dressel, a software engineer, and Hany Farid, a computer science professor at Dartmouth, concluded, in a paper published Tuesday by Science Advances, that “collectively, these results cast significant doubt on the entire effort of algorithmic recidivism prediction.”

COMPAS, short for Correctional Offender Management Profiling for Alternative Sanctions, has been used to assess more than one million criminal offenders since its inception two decades ago.

In response to a May 2016 investigation by ProPublica that concluded the software is both unreliable and racially biased, Northpointe, the company that developed COMPAS, defended its results, arguing that the algorithm discriminates between recidivists and non-recidivists equally well for both white and black defendants. ProPublica stood by its own study, and the debate ended in a stalemate.

Rather than weigh in on the algorithm’s fairness, the authors of this study simply compared the software’s results to those of “untrained humans,” and found that “people from a popular online crowdsourcing marketplace—who, it can reasonably be assumed, have little to no expertise in criminal justice—are as accurate and fair as COMPAS at predicting recidivism.”

Each of the untrained participants was randomly assigned 50 cases from a pool of 1,000 defendants and given a few facts, including the defendant’s age, sex and criminal history, but excluding race. They were asked to predict the likelihood of re-offending within two years. The mean and median accuracy of these “untrained humans” were found to be 62.1 percent and 64 percent, respectively.
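To make the arithmetic concrete, here is a minimal Python sketch of how per-participant accuracy, and its mean and median across participants, could be computed. It is not the authors’ code. The yes/no prediction format, the function names, the 20 simulated participants and the `hit_rate` value are all assumptions made for illustration, and the data are synthetic.

```python
import random
import statistics


def participant_accuracy(predictions, outcomes):
    """Fraction of cases where a yes/no prediction matched the real outcome."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


def summarize(accuracies):
    """Return (mean, median) of a list of per-participant accuracies."""
    return statistics.mean(accuracies), statistics.median(accuracies)


def simulate(n_participants=20, pool_size=1000, cases_each=50,
             hit_rate=0.62, seed=0):
    """Synthetic stand-in for the study design (all numbers hypothetical).

    Each simulated participant gets `cases_each` cases drawn at random from
    a pool of `pool_size` defendants and guesses correctly with probability
    `hit_rate`.
    """
    rng = random.Random(seed)
    # Synthetic ground truth: True means the defendant re-offended.
    outcomes = [rng.random() < 0.5 for _ in range(pool_size)]
    accuracies = []
    for _ in range(n_participants):
        case_ids = rng.sample(range(pool_size), cases_each)
        truth = [outcomes[i] for i in case_ids]
        guesses = [t if rng.random() < hit_rate else not t for t in truth]
        accuracies.append(participant_accuracy(guesses, truth))
    return accuracies


if __name__ == "__main__":
    mean_acc, median_acc = summarize(simulate())
    print(f"mean accuracy:   {mean_acc:.3f}")
    print(f"median accuracy: {median_acc:.3f}")
```

Because each participant sees only 50 cases, individual accuracies vary from person to person, which is why a study of this kind reports both a mean and a median.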

The authors then compared these results to COMPAS predictions for the same set of 1,000 defendants, and found the program to have an accuracy of 65.2 percent.

These results caused Dressel and Farid to wonder about the software’s level of sophistication.

Although they do not have access to the algorithm, which is proprietary, they built their own predictive model using the same inputs given to participants in their study.

“Despite using only 7 features as input, a standard linear predictor yields similar results to COMPAS’s predictor with 137 features,” the authors wrote. “We can reasonably conclude that COMPAS is using nothing more sophisticated than a linear predictor or its equivalent.”
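For readers unfamiliar with the term, a linear predictor multiplies each input by a weight, adds the results and compares the total to a threshold. The sketch below trains one such model, a logistic regression fitted by gradient descent, on synthetic data. It is an illustration only and not the authors’ model: the two feature names (`age`, `priors`), the coefficients used to generate the data and the training settings are all assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit weights and a bias by full-batch gradient descent on log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(score) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return True when the weighted sum puts the probability at 0.5 or more."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


def accuracy(weights, bias, rows, labels):
    hits = sum(predict(weights, bias, x) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)


def make_synthetic(n=400, seed=1):
    """Synthetic data: two standardized, hypothetical features per person."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.uniform(-1, 1)     # hypothetical standardized age
        priors = rng.uniform(-1, 1)  # hypothetical standardized prior count
        p = sigmoid(-1.5 * age + 2.0 * priors)  # made-up generating rule
        rows.append([age, priors])
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_synthetic()
    weights, bias = train_logistic(rows, labels)
    print("weights:", weights, "bias:", bias)
    print("training accuracy:", accuracy(weights, bias, rows, labels))
```

The point of the sketch is how little machinery is involved: a handful of weights and one threshold, which is the kind of model the authors say performs about as well as COMPAS.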

Both study participants and COMPAS were found to have the same level of accuracy for black and white defendants.
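A comparison of this kind comes down to computing accuracy separately for each group of defendants. The following Python sketch shows one way to do that. The record layout, the function name and the group labels "A" and "B" are hypothetical, and the toy records are invented purely to exercise the code.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute prediction accuracy separately for each group.

    `records` is an iterable of (group, predicted, actual) tuples, where
    `predicted` and `actual` are booleans (True means re-offended).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Invented toy records, not data from the study.
toy_records = [
    ("A", True, True), ("A", False, True), ("A", False, False), ("A", True, False),
    ("B", True, True), ("B", True, False), ("B", False, False), ("B", False, True),
]

if __name__ == "__main__":
    for group, acc in sorted(accuracy_by_group(toy_records).items()):
        print(group, acc)
```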

The full study, “The accuracy, fairness, and limits of predicting recidivism,” was published in Science Advances and can be found online here. This summary was prepared by Deputy Editor Victoria Mckenzie. She welcomes readers’ comments.