A risk assessment tool used for two decades to assess sex offenders’ likelihood of committing a future offense has been repeatedly exposed as “pseudo-scientific humbug.” So why do New York State courts continue to use it?
A New York appeals court has rejected the notion that risk prediction under the state’s Sex Offender Registration Act (SORA) should have a scientific basis. According to the July 2017 decision in People v. Curry, courts must not only adhere to a risk assessment instrument (RAI) that has been repeatedly exposed as pseudo-scientific humbug, they may not even consider a scientifically validated instrument such as the Static-99.
It wasn’t the first time. For the 20 years since SORA was enacted, courts have used the RAI to classify individuals after they’ve completed their sentences for a designated “sex offense.” The classifications purport to show the person’s likelihood of committing another sex offense in the future.
Persons adjudicated as level 2 or 3 are thought to be very dangerous indeed, and must register with law enforcement for the rest of their lives.
Their photographs, addresses, and a description of the past offense are made publicly available online at the sex offender registry. They may legally be denied jobs and housing, including shelters. They may be evicted, fired or hounded from the neighborhood by civic-minded vigilantes such as Parents for Megan’s Law.
This looks an awful lot like advance punishment for a future crime, like the science fiction film “Minority Report.” It also looks like a second punishment for a past offense—a practice the Constitution frowns on in the Double Jeopardy Clause.
Not at all, say the courts. SORA isn’t punishment, but merely a regulatory measure to protect public safety. As one legislator put it, it’s like affixing warning labels to toxic substances.
In that case, you’d think everyone would be deeply concerned to make sure that the label is as accurate as possible. It hardly contributes to public safety to broadcast over the Internet that Mr. Jones might commit a sex offense at any minute, when in fact he presents no such risk.
But that’s not how courts think.
Risk level under SORA is determined through an adversarial hearing in criminal court where the prosecutor proffers the RAI and typically seeks the highest possible classification. The RAI is a chart, cobbled together by employees of the Department of Parole, that adds up points for factors such as whether the past offense involved contact over or under clothing, or whether the victim was under age eleven or over 62.
The more points, the higher the risk level.
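To make the mechanics concrete, here is a minimal sketch of a point-additive chart of the kind described above. Every factor name, point value, and level cut-off in it is a hypothetical placeholder, not a figure from New York's actual RAI. The sketch only shows the structure: each item is a fact about the past offense, the points are summed, and the total is mapped to a level. Nothing in this sketch's calculation is fitted to observed re-offense rates.

```python
# Hypothetical point-additive chart illustrating the mechanics only.
# Factor names, points, and cut-offs are invented; they are NOT the values in
# New York's actual RAI.
HYPOTHETICAL_POINTS = {
    "contact_under_clothing": 25,
    "victim_under_11": 30,
    "victim_over_62": 30,
    "prior_sex_offense": 30,
}

# (minimum total, level) pairs, checked from the highest threshold down.
HYPOTHETICAL_CUTOFFS = [(110, 3), (75, 2), (0, 1)]


def total_points(factors):
    """Sum points for every factor marked True (an unknown True factor raises KeyError)."""
    return sum(HYPOTHETICAL_POINTS[name] for name, present in factors.items() if present)


def risk_level(score):
    """Map a point total to a level: the more points, the higher the level."""
    for minimum, level in HYPOTHETICAL_CUTOFFS:
        if score >= minimum:
            return level
    raise ValueError("score must be non-negative")


case = {"contact_under_clothing": True, "victim_under_11": True, "prior_sex_offense": True}
score = total_points(case)
print(score, risk_level(score))  # 85 2
```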
Defense attorneys have repeatedly proffered peer-reviewed research and the uncontested expert testimony of psychologists specializing in sex offender recidivism showing that the RAI is based on the facile but discredited assumption that “if he did it before he’ll do it again.” The instrument takes no account of the scientific consensus that recidivism isn’t correlated with the perceived heinousness of the past offense.
The scientific articles cited by the RAI are not only outdated; they don’t remotely stand for the conclusions for which they’re cited. Although the RAI purports to be an objective scientific instrument, it uses its own idiosyncratic system of assigning and weighing points that’s heavily biased towards a finding of maximum risk.
We’ve proffered instruments such as the Static-99 and the SVR-20 which, unlike the RAI, have been tested and validated by mental health professionals. In contrast, nobody except New York judges and District Attorneys uses the RAI.
The judicial response ranges from numb indifference to sputtering indignation. The outstanding exception is Daniel Conviser, a trial judge in Manhattan, who issued a 100-page opinion in 2010 after hearing expert testimony. After analyzing the RAI in detail, he concluded that the instrument is so arbitrary that it violates due process. Unfortunately, his decision isn’t binding on other courts and has been ignored.
The crystal ball approach to risk assessment. Illustration by Squawk
It’s like a drug test that can’t tell the difference between coffee and cocaine.
Even courts that recognize that the RAI may not be “the optimal tool” initially reasoned that there’s no harm in using it because it’s “only a recommendation.” But the Court of Appeals subsequently held that the RAI is so “presumptively reliable” that courts are bound by its conclusions unless the defendant can somehow prove that it overestimates his future risk.
The obvious course, until now, was for the defendant to show that a scientifically tested and validated instrument such as Static-99 put him at a lower risk. No dice, says the Appellate Division. Why? Because although the Static-99 measures the probability of re-offending, it doesn’t say what offense the person will commit if he re-offends.
Which conveniently ignores that no matter what the RAI claims, it doesn’t accurately predict anything.
It’s hard to see how this implacable rejection of science squares with the notion that SORA isn’t punishment but merely a regulatory measure to protect public safety. So long as risk prediction is based on the perceived heinousness of the past crime, it’s nothing but punishment under an alias.
There are now over 40,000 New Yorkers on the sex offender registry, most of whom have been adjudicated as level 2 or 3 based on the RAI. Public safety isn’t served by creating a permanent, ever-growing underclass of people who will remain forever barred from normal civic life based on a pseudo-scientific instrument.
Appellate Squawk is the pseudonym of an appellate attorney in New York City, and the author of a satirical legal blog of that name. Readers’ comments are welcomed.
A Washington State parole board rejected our columnist’s appeal for release from prison for a crime committed when he was a juvenile on the grounds that he had a “moderate to high” likelihood of re-offending. But they appear to have based the decision on a psychological risk assessment tool used to measure adult offenders.
Across the United States, there are hundreds of prisoners originally sentenced to life without the possibility of parole for crimes committed when they were juveniles who now have an opportunity to be freed from newly imposed indeterminate sentences once they complete lengthy minimum terms of confinement. I am one of them.
Call us the Miller family. (After the 2012 Supreme Court ruling in Miller v. Alabama, which held that mandatory life-without-parole sentences for juveniles violate the constitutional protection against cruel and unusual punishment.)
My original sentence was imposed for crimes that I committed when I was 14. However, in light of the Court ruling, the Washington State legislature gave prisoners like me the opportunity to be freed—provided that we are deemed by the parole board to be unlikely to “commit new criminal law violations if released.”
I must admit I rejoiced at this news after serving 20 years of a natural-life sentence. Yet as I moved closer to completing my newly imposed minimum term, I came to realize that the light at the end of the tunnel might actually be a train: my former cellmate, Anthony Powers, was denied parole even though, to many in the know, he was a model of reform.
Take the Deputy Secretary of the Department of Corrections (DOC), for example. Prior to the parole hearing, he wrote to Powers declaring:
Nevertheless, when Powers later underwent the requisite psychological assessment to determine whether he posed a recidivism risk, the conclusion was that he posed a high risk to reoffend.
This made me wary—for the arc of our lives had striking similarities. I too had committed a heinous crime when I was a teen. Therefore, to my mind, if it could be said that “a role model for other offenders” posed a risk to public safety, surely the same could be said for me.
My history provided all the elements necessary to craft a narrative to support keeping me confined permanently or setting me free, notwithstanding the results of a potentially negative psychological risk analysis.
Quite simply, there was the good, the bad, and the ugly.
Were this the parole board’s conception of me, undoubtedly I would be freed.
This is the narrative that I tried to focus upon to prevent being consumed by worry over psychological methodologies that were, quite frankly, a mystery to me. But worrying was becoming all too easy. In doing research to understand the legal landscape governing the authority vested in parole boards, the case law that I read further unsettled me.
All of this reading was chilling. Given the “multiplicity of imponderables” involved in this decision making, it seemed parole boards could do damn near anything.
Although the standard for parole eligibility is less discretionary when, as here, the governing statutes require prisoners to be freed unless a preponderance of the evidence shows that a disqualifying condition is present, in the final analysis how a parole board weighs the evidence is entirely subjective.
Educated guesses and static risk assessments are all that most parole boards are left with. As a consequence, little has changed in the 50 years since the Washington Supreme Court gave voice to the mindset of parole boards:
This raises the question: How can a parole board, with any degree of certainty, use rational means to separate prisoners who are “depraved, sadistic, cruel and ruthless” from those who pose little risk to public safety?
Psychological evaluations to measure a prisoner’s recidivism risk are one way to go about the process. In fact, they are mandated for Washington State prisoners affected by Miller v. Alabama and its progeny.
Prisoners just like me.
Stafford Creek Corrections Center, Aberdeen, Wa., where Jeremiah Bourgeois is currently serving a sentence of 25 years to life. Photo courtesy Washington State Dept. of Corrections
Which leads us back to my pre-parole hearing wariness about psychological risk assessments.
On which side of the coin would I fall after undergoing such an analysis?
Rehabilitated or likely recidivist?
This question was resolved for me on Nov. 7, 2017, when the Indeterminate Sentence Review Board informed me of the following:
“The Board commends Mr. Bourgeois for completing a significant amount of programming. However the Board has determined that he does not meet the statutory criteria for release at this time for the following reasons. Mr. Bourgeois has been assessed in his most recent psychological evaluation at a ‘Moderate to High’ risk to reoffend. Additionally, he has a history of serious violence while in prison, to include two felony assaults against Corrections Officers during his prison stay. Also, Mr. Bourgeois’ offense is particularly heinous as it was a revenge killing against victims of a crime for which they had been willing to testify in court to assist in securing a conviction of their perpetrator, Mr. Bourgeois’ brother.”
And that was the end for me: The parole board took note of the good, but was primarily influenced by the bad—and ugly.
Since this decision was reached, I have come to understand the methodology behind the DOC psychologist’s finding that I am a “Moderate to High risk to reoffend” if conditionally released. Indeed, my discovery gives insight into the difficulty in assessing the recidivism risk of those who have spent decades confined for crimes that they committed when they were minors.
Since there is no large-scale data specific to the parole outcomes of prisoners like me, psychologists within DOC rely upon the Violence Risk Appraisal Guide (VRAG), which was constructed and validated on a cohort composed mostly of white Canadian male forensic patients.
Further, its revised edition (VRAG-R) relies upon a sample of individuals who, for the most part, either pleaded or were found not guilty by reason of insanity and spent an average of four years imprisoned.
The VRAG-R is designed to measure the risk of future violence by those who committed their instant offense when they were adults, not adolescents and, as Dr. John Monahan, a preeminent expert on risk assessments, explains:
The VRAG-R scoring sheet, for instance, assigns more points if a person did not live with their parent(s) until at least age 16, is unmarried, and committed their crime(s) before age 26. These strikes are therefore baked into the cake when assessing those who were confined as adolescents because, ultimately, the assessment does not account for the fact that “children are different.”
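To see why items like these are “baked in,” here is a minimal sketch that scores only three static history items of the kind the paragraph names. The item definitions and weights are invented for illustration and are not the VRAG-R's actual items or point values. The example shows only that, for someone confined at 14, items of this kind are triggered by the circumstances of adolescence rather than by anything done in the decades since.

```python
from dataclasses import dataclass

# Invented weights for three static items like those the text names. These are
# NOT the VRAG-R's real items or point values; they only illustrate the mechanics.
HYPOTHETICAL_WEIGHTS = {
    "left_parents_before_16": 2,
    "unmarried": 1,
    "offended_before_26": 3,
}


@dataclass
class History:
    age_left_parents: int  # age when the person stopped living with a parent
    married: bool
    age_at_offense: int


def static_score(h: History) -> int:
    """Add the weight of each static item the history triggers."""
    score = 0
    if h.age_left_parents < 16:
        score += HYPOTHETICAL_WEIGHTS["left_parents_before_16"]
    if not h.married:
        score += HYPOTHETICAL_WEIGHTS["unmarried"]
    if h.age_at_offense < 26:
        score += HYPOTHETICAL_WEIGHTS["offended_before_26"]
    return score


# Someone locked up at 14 triggers every item, whatever happened since.
confined_at_14 = History(age_left_parents=14, married=False, age_at_offense=14)
adult_offender = History(age_left_parents=19, married=True, age_at_offense=35)
print(static_score(confined_at_14), static_score(adult_offender))  # 6 0
```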
Setting aside whether the VRAG-R is an effective way to assess the risk I pose to public safety, my history, as I said at the beginning, provided the means for crafting a narrative to support keeping me confined permanently or setting me free.
In this instance, I just happened to fall within the category of those believed to be cloaking their criminogenic propensities.
I am still coming to terms with the notion that I am a likely recidivist.
Having been denied parole after 25 years of confinement for crimes committed when I was 14 years old, I can now envision the day when all I will have to live for is writing my monthly columns for The Crime Report.
Jeremiah Bourgeois is a regular contributor to TCR, and an inmate in Washington State, where he has been serving a life sentence since the age of 14. He welcomes comments from readers. Those who wish to express their opinion regarding the decision to deny his release can contact the Indeterminate Sentence Review Board.
Two computer scientists, writing in the journal “Science Advances,” say the two-decade-old COMPAS system is no more accurate or fair than predictions made by people with little or no criminal justice expertise. Over the past two decades, the program has been used to assess more than one million criminal offenders.
Two computer scientists have cast more doubt on the accuracy of risk assessment tools.
After comparing predictions made by a group of untrained adults to those of the risk assessment software COMPAS, the authors found that the software “is no more accurate or fair than predictions made by people with little or no criminal justice expertise,” and that, moreover, “a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 features.”
Julia Dressel, a software engineer, and Hany Farid, a computer science professor at Dartmouth, concluded, in a paper published Tuesday by Science Advances, that “collectively, these results cast significant doubt on the entire effort of algorithmic recidivism prediction.”
COMPAS, short for Correctional Offender Management Profiling for Alternative Sanctions, has been used to assess more than one million criminal offenders since its inception two decades ago.
In response to a May 2016 investigation by ProPublica that concluded the software is both unreliable and racially biased, Northpointe, the company that developed COMPAS, defended its results, arguing the algorithm discriminates between recidivists and nonrecidivists equally well for both white and black defendants. ProPublica stood by its own study, and the debate ended in a stalemate.
Rather than weigh in on the algorithm’s fairness, the authors of this study simply compared the software’s results to those of “untrained humans,” and found that “people from a popular online crowdsourcing marketplace—who, it can reasonably be assumed, have little to no expertise in criminal justice—are as accurate and fair as COMPAS at predicting recidivism.”
Each of the untrained participants was randomly assigned 50 cases from a pool of 1,000 defendants and given a few facts, including the defendant’s age, sex, and criminal history, but excluding race. They were asked to predict the likelihood of re-offending within two years. The mean and median accuracy of these “untrained humans” were 62.1 percent and 64 percent, respectively.
The authors then compared these results to COMPAS predictions for the same set of 1,000 defendants, and found the program to have a median accuracy of 65.2 percent.
These results caused Dressel and Farid to wonder about the software’s level of sophistication.
Although they don’t have access to the algorithm, which is proprietary, they created their own predictive model with the same inputs given to participants in their study.
“Despite using only 7 features as input, a standard linear predictor yields similar results to COMPAS’s predictor with 137 features,” the authors wrote. “We can reasonably conclude that COMPAS is using nothing more sophisticated than a linear predictor or its equivalent.”
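The claim that a simple linear predictor can keep pace with COMPAS is easy to make concrete. Below is a minimal sketch of one common linear predictor, logistic regression fitted by gradient descent, using just two inputs. The assumptions are all ours: the two features are taken to be age and number of prior convictions, the eight records are invented toy data, and the scaling constants are arbitrary. This is not the authors' code or data, and it says nothing about COMPAS's internals.

```python
import math

# Hypothetical toy records: (age, prior convictions, re-arrested within two years).
# Invented for illustration; these are not data from the study.
RECORDS = [
    (19, 6, 1), (22, 4, 1), (24, 5, 1), (27, 3, 1),
    (45, 0, 0), (52, 1, 0), (38, 0, 0), (60, 2, 0),
]


def features(age, priors):
    """Scale the two inputs to similar ranges so gradient descent behaves well."""
    return [(age - 35) / 10, priors / 5]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit one weight per feature plus a bias by batch gradient descent on log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict re-arrest (1) when the modeled probability is at least 0.5."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


def accuracy(w, b, xs, ys):
    """Fraction of cases where the prediction matches the recorded outcome."""
    return sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(ys)


xs = [features(age, priors) for age, priors, _ in RECORDS]
ys = [label for _, _, label in RECORDS]
w, b = fit_logistic(xs, ys)
print("weights:", w, "bias:", b)
print("training accuracy:", accuracy(w, b, xs, ys))
```

On this toy data the fitted weight should come out negative for age and positive for prior convictions. A fair comparison with any tool would measure accuracy on cases held out from fitting, not on the training rows as this toy does.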
Both study participants and COMPAS were found to have the same level of accuracy for black and white defendants.
The full study, “The accuracy, fairness, and limits of predicting recidivism,” was published in Science Advances. This summary was prepared by Deputy Editor Victoria Mckenzie. She welcomes readers’ comments.
Critics charge that despite claims of objectivity, algorithms reproduce existing biases, disproportionately targeting people by class, race, and gender. Reformers say another New York City bill, the Right to Know Act, doesn’t go far enough.
New York City is taking steps to address algorithmic bias in city services. The City Council passed a bill that will require the city to address bias in algorithms used by the police department, courts, and dozens of city agencies, Vice reports. The bill will create a task force to figure out how to test city algorithms for bias, how citizens can request explanations of algorithmic decisions when they don’t like the outcome, and whether it’s feasible for the source code used by city agencies to be made publicly available.
Criminal justice reformers and civil liberties groups charge that despite claims of objectivity, algorithms reproduce existing biases, disproportionately targeting people by class, race, and gender. A ProPublica investigation found that a risk assessment tool was more likely to mislabel black than white defendants. Studies have found facial recognition algorithms were less accurate for black and female faces.
Critics of predictive policing—which uses statistics to determine where cops should spend time on their beats—say it reinforces existing biases and brings cops back to already over-policed neighborhoods.
Rachel Levinson-Waldman of the Brennan Center for Justice said New York’s police department refuses to disclose the source code for the predictive policing program, claiming it would help criminals evade the cops. (Three academics argue in the New York Times that even imperfect algorithms improve the justice system.) The City Council on Tuesday approved the Right to Know Act, which requires changes to day-to-day interactions between police officers and those they encounter.
The measures drew opposition from criminal justice reform groups and the city’s largest officers’ union, the New York Times reports. Reformers said the bill omitted many common street encounters, including car stops and questioning by officers in the absence of any reasonable suspicion of a crime.
Early evidence suggests some risk assessment tools offer promise in rationalizing decisions on granting bail without racial bias. But we still need to monitor how judges actually use the algorithms, says a Boston attorney.
Next Monday morning, visit an urban criminal courthouse. Find a seat on a bench, and then watch the call of the arraignment list.
Files will be shuffled. Cases will be called. Knots of lawyers will enter the well of the court and mutter recriminations and excuses. When a case consumes more than two minutes you will see unmistakable signals of impatience from the bench.
Pleas will be entered. Dazed, manacled prisoners—almost all of them young men of color—will have their bails set and their next dates scheduled.
Some of the accused will be released; some will be detained, and stepped back into the cells.
You won’t leave the courthouse thinking that this is a process that needs more dehumanization.
But a substantial number of criminal justice reformers have argued that if the situation of young men facing charges is to be improved, it will be through reducing each accused person who comes before the court to a predictive score that employs mathematically derived algorithms which weigh only risk.
This system of portraiture, known as risk assessment tools, is claimed to simultaneously reduce pretrial detentions, pretrial crime, and failures to appear in court—or at least that was the claim during a euphoric period when the data revolution first poked its head up in the criminal justice system.
We can have fewer prisoners and less crime. It would be, the argument went, a win/win: a silver bullet that offers liberals reduced incarceration rates and conservatives a whopping cost cut.
These confident predictions came under assault pretty quickly. Prosecutors—represented, for example, by Eric Sidall here in The Crime Report—marshaled tales of judges (“The algorithm made me do it!”) who released detainees who then committed blood-curdling crimes.
Other voices raised fears about the danger that risk assessment tools derived from criminal data trails that are saturated with racial bias will themselves aggravate already racially disparate impacts.
A ProPublica series analyzed the startling racial biases the authors claim were built into one widely used proprietary instrument. Bernard Harcourt of Columbia University argued that “risk” has become a proxy for race.
A 2016 study by Jennifer Skeem and Christopher Lowenkamp dismissed Harcourt’s warnings as “rhetoric,” but found that on the level of particular factors (such as the criminal history factors) the racial disparities are substantial.
Meanwhile, a variety of risk assessment tools have proliferated: Some are simple checklists; some are elaborate “machine learning” algorithms; some offer transparent calculations; others are proprietary “black boxes.”
Whether or not the challenge of developing a race-neutral risk assessment tool from the race-saturated raw materials we have available can ever be met is an argument I am not statistician enough to join.
But early practical experience seems to show that some efforts, such as the Public Safety Assessment instrument, developed by the Laura and John Arnold Foundation and widely adopted, do offer a measure of promise in rationalizing bail decision-making at arraignments without aggravating bias (anyway, on particular measurements of impact).
The Public Safety Assessment (PSA), developed relatively transparently, aims to be an objective procedure that could encourage timid judges to separate the less dangerous from the more dangerous, and to send the less dangerous home under community-based supervision.
At least, this practical experience seems to show that in certain Kentucky jurisdictions where (with a substantial push from the Kentucky legislature) PSA has been operationalized, the hoped-for safety results have been produced—and with no discernible increase in racial disparity in outcomes.
Unfortunately, the same practical experience also shows that those jurisdictions are predominately white and rural, and that there are other Kentucky jurisdictions, predominately minority and urban, where judges have been—despite the legislature’s efforts—gradually moving away from using PSA.
These latter jurisdictions are not producing the same pattern of results.
The judges are usually described as substituting “instinct” or “intuition” for the algorithm. The implication is that they are either simply mobilizing their personal racial stereotypes and biases, or reverting to a primitive traditional system of prophesying risk by opening beasts and fowl and reading their entrails, or crooning to wax idols over fires.
As Malcolm M. Feeley and Jonathan Simon predicted in a 2012 article for Berkeley Law, past decades have seen a paradigm shift in academic and policy circles, and “the language of probability and risk increasingly replaces earlier discourse of diagnosis and retributive punishment.”
A fashion for risk assessment tools was to be expected, they wrote, as everyone tried to “target offenders as an aggregate in place of traditional techniques for individualizing or creating equities.”
But the judges at the sharp end of the system whom you will observe on your courthouse expedition don’t operate in a scholarly laboratory.
They have other goals to pursue besides optimizing their risk-prediction compliance rate, and those goals exert constant, steady pressure on release decision-making.
Some of these “goals” are distasteful. A judge who worships the great God, Docket, and believes the folk maxim that “Nobody pleads from the street” will set high bails to extort quick guilty pleas and pare down his or her room list.
Another judge, otherwise unemployable, who needs re-election or re-nomination, will think that the bare possibility that some guy with a low predictive risk score whom he has just released could show up on the front page tomorrow, arrested for a grisly murder, inexorably points to detention as the safe road to continued life on the public payroll.
They are just trying to get through their days.
But the judges are subject to other pressures that most of us hope they will respect.
For example, judges are expected to promote legitimacy and trust in the law.
It isn’t so easy to resist the pull of “individualizing” and “diagnostic” imperatives when you confront people one at a time.
Somehow, “My husband was detained, so he lost his job, and our family was destroyed, but after all, a metronome did it, it was nothing personal” doesn’t seem to be a narrative that will strengthen community respect for the courts.
Rigorously applying the algorithm may cut the error rate in half, from two in six to one in six, but one in six is still Russian roulette odds, and the community knows that if you play Russian roulette all morning (and every morning) and with the whole arraignment list, lots of people get shot.
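The Russian roulette point is simple arithmetic. As a rough sketch, assume each release decision on the list is independently wrong with probability e, and take a hypothetical list of n = 30 defendants (both the independence assumption and the list size are ours, not the author's):

```latex
\begin{aligned}
\mathbb{E}[\text{wrong calls}] &= n\,e \\
P(\text{at least one wrong call}) &= 1 - (1 - e)^{n} \\
e = 2/6:\quad \mathbb{E} &= 30 \times 2/6 = 10 \\
e = 1/6:\quad \mathbb{E} &= 30 \times 1/6 = 5, \qquad P = 1 - (5/6)^{30} \approx 0.996
\end{aligned}
```

Halving the error rate halves the expected number of wrong calls, but over a full morning's list at least one wrong call remains all but certain.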
No judge can forget this community audience, even if the “community” is limited to the judge’s courtroom work group. It is fine for a judge to know whether the re-offense rate for pretrial releases in a particular risk category is eight in ten, but to the judges, their retail decisions seem to be less about finding the real aggregated rate than about whether this guy is one of the eight or one of the two.
Embedded in this challenge is the fact that you can make two distinct errors in dealing with difference.
First, you can take situations that are alike, and treat them as if they are different: detain an African-American defendant and let an identical white defendant go.
Second, you can take things that are very different and treat them as if they are the same: Detain two men with identical scores, and ignore the fact that one of the two has a new job, a young family, a serious illness, and an aggressive treatment program.
A risk assessment instrument at least seems to promise a solution to the first problem: Everyone with the same score can get the same bail.
But it could be that this apparent objectivity simply finesses the question. An arrest record, after all, is an index of the detainee’s activities, but it is also a measure of police behavior. If you live in an aggressively policed neighborhood your history may be the same as your white counterpart’s, but your scores can be very different.
And risk assessment approaches are extremely unwieldy when it comes to confronting the second problem. A disciplined sticking-to-the-score requires blinding yourself to a wide range of unconsidered factors that might not be influential in many cases, but could very well be terrifically salient in this one.
This tension between the frontline judge and the backroom programmer is a permanent feature of criminal justice life. The suggested solutions to the dissonance range from effectively eliminating the judges by stripping them of discretion in applying the Risk Assessment scores to eliminating the algorithms themselves.
But the judges aren’t going away, and the algorithms aren’t going away either.
As more cautious commentators seem to recognize, the problem of the judges and the algorithms is simply one more example of the familiar problem of workers and their tools.
If the workers don’t pick up the tools it might be the fault of the workers, but it might also be the fault of the design of the tools.
And it’s more likely that the fault does not lie in either the workers or the tools exclusively but in the relationship between the workers, the tools, and the work. A hammer isn’t very good at driving screws; a screw-driver is very bad at driving nails; some work will require screws, other work, nails.
If you are going to discuss these elements, it usually makes most sense to discuss them together, and from the perspectives of everyone involved.
The work that the workers and their tools are trying to accomplish here is providing safety—safety for everyone: for communities, accused citizens, cops on the streets. A look at the work of safety experts in other fields such as industry, aviation, and medicine provides us with some new directions.
To begin with, those safety experts would argue that this problem can never be permanently “fixed” by weighing aggregate outputs and then tinkering with the assessment tool and extorting perfect compliance from workers. Any “fix” we install will be under immediate attack from its environment.
One thing the Kentucky experience indicates is that in courts, as elsewhere, “covert work rules,” workarounds, and “informal drift” will always develop, no matter what formal requirements are imposed from above.
The workers at the sharp end will put aside the tool when it interferes with their perception of what the work requires. Deviations won’t be huge at first; they will be small modifications. But they will quickly become normal.
And today’s small deviation will provide the starting point for tomorrow’s.
What the criminal justice system currently lacks—but can build—is the capacity for discussing why these departures seemed like good ideas. Why did the judge zig, when the risk assessment tool said he or she should have zagged? Was the judge right this time?
Developing an understanding of the roots of these choices can be (as safety and quality experts going back to W. Edwards Deming would argue) a key weapon in avoiding future mistakes.
We can never know whether a “false positive” detention decision was an error, because we can never prove that the detainee if released would not have offended. But we can know that the decision was a “variation” and track its sources. Was this a “special cause variation” traceable to the aberrant personality of a particular judge? (God knows, they’re out there.)
Or was it a “common cause variation,” a natural result of the system (and the tools) that we have been employing?
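One way to put the special-cause/common-cause distinction into practice, borrowed from the statistical process control tradition Deming drew on, is a p-chart of how often each judge departs from the tool's recommendation. The sketch below is our own illustration, not part of the Sentinel Events Initiative; the judge names and counts are invented. A judge whose departure rate falls outside 3-sigma limits around the pooled rate is a candidate special cause; variation inside the limits points back to the system and its tools.

```python
import math


def p_chart(departures):
    """Flag judges whose departure rate falls outside 3-sigma p-chart limits.

    departures maps a judge's name to (departures from the tool, total decisions).
    Returns judge -> (rate, lower limit, upper limit, flagged as special cause).
    """
    total_dep = sum(d for d, _ in departures.values())
    total_n = sum(n for _, n in departures.values())
    p_bar = total_dep / total_n  # pooled departure rate: the "system" level
    results = {}
    for judge, (d, n) in departures.items():
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lower = max(0.0, p_bar - 3 * sigma)
        upper = min(1.0, p_bar + 3 * sigma)
        rate = d / n
        results[judge] = (rate, lower, upper, not lower <= rate <= upper)
    return results


# Hypothetical counts of departures from the tool's recommendation.
data = {
    "Judge A": (22, 200),
    "Judge B": (18, 180),
    "Judge C": (25, 210),
    "Judge D": (60, 190),
}
for judge, (rate, lo, hi, flagged) in p_chart(data).items():
    print(f"{judge}: rate={rate:.3f} limits=({lo:.3f}, {hi:.3f}) special cause={flagged}")
```

In practice the pooled rate is often recomputed after setting aside flagged points, and a flag is a prompt for a non-blaming review, not a verdict on the judge.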
This is the kind of analysis that programs like the Sentinel Events Initiative demonstration projects about to be launched by the National Institute of Justice and the Bureau of Justice Assistance can begin to offer. The SEI program, due to begin January 1, with technical assistance from the Quattrone Center for the Fair Administration of Justice at the University of Pennsylvania Law School, will explore the local development of non-blaming, all-stakeholder reviews of events (not of individual performances) with the goal of enhancing “forward-looking accountability” in 20-25 volunteer jurisdictions.
The “thick data” that illuminates the tension between the algorithm and the judge can be generated. The judges who have to make the decisions, the programmers who have to refine the tools, the sheriff who holds the detained, the probation officer who supervises the released, and the community that has to trust both the process and the results can all be included.
We can mobilize a feedback loop that delivers more than algorithms simply “leaning in” to listen to themselves.
What we need here is not a search for a “silver bullet,” but a commitment to an ongoing practice of critically addressing the hard work of living in the world and making it safe.
James Doyle is a Boston defense lawyer and author, and a frequent contributor to The Crime Report. He has advised in the development of the Sentinel Events Initiative of the National Institute of Justice. The opinions expressed here are his own. He welcomes readers’ comments.
The first baseline measurement of pretrial justice across the U.S. has found most states to be failing, with a few “promising” exceptions, according to the Pretrial Justice Institute.
In a study released Wednesday by the Pretrial Justice Institute, authors measured the rates of pretrial detention, use of available risk assessment tools, and the status of money bail systems in every state.
“Needless” incarceration before trial is the primary cause of states’ failing grades: according to PJI’s findings, two-thirds of the current U.S. jail population has not yet been to trial.
At the forefront of pretrial justice reform are Washington, D.C., where 92 percent of those arrested are released pretrial and no one is detained for inability to pay, and New Jersey, which implemented statewide pretrial services earlier this year, resulting in a 15 percent reduction in the number of pretrial detainees within the first six months.
The report also highlights legislative advances made by Alaska, Arizona, California, Indiana, Maryland, and New Mexico in the area of pretrial justice reform.
While the number of jurisdictions using risk assessment tools has more than doubled in the past four years, authors note that the increase is driven by “a few states and densely populated jurisdictions,” adding that “evidence-based pretrial assessments show that most people released before trial will appear in court and not be arrested on new charges pending trial.”
The study used money bail as its final measure because “financial conditions play such a large role in needlessly detaining people and giving us a false sense of safety,” according to the authors. New Jersey is the only state to have eliminated money bail, so on this measure the national score hovers close to zero: only 3% of Americans live in a jurisdiction that has eliminated cash bail.
“As long as pretrial systems use money as a condition of pretrial release,” concludes the report, “poor and working class people will remain behind bars while those who are wealthy go home, regardless of their likelihood of pretrial success. This is a fundamental injustice.”
In Philip K. Dick’s “Minority Report,” criminals could be identified before they committed a crime. Computer-generated risk algorithms used by courts to determine whether individuals should be released ahead of trial have brought us a step closer to that world–and our challenge is to use them responsibly, says a George Mason University professor.
Should the increased use of computer-generated risk algorithms to determine criminal justice outcomes be cause for concern or celebration?
This is a hard question to answer, but not for the reasons most people think.
Judges around the country are using computer-generated algorithms to predict the likelihood that a person will commit crime in the future. They use these predictions to help determine pretrial custody, sentence length, prison security-level, probation, parole, and post-release supervision.
Proponents argue that by replacing the ad-hoc and subjective assessments of judges with sophisticated risk assessment instruments, we can reduce incarceration without affecting public safety.
Critics respond that they don’t want to live in a “Minority Report” state where people are punished for crimes before they are committed—particularly if risk assessments are biased against blacks.
Which side is right?
It’s hard to answer because there is no single answer: The impacts that risk assessments have in practice depend crucially on how they are implemented.
Risk assessments are tools—no more and no less. They can be used to increase incarceration or decrease incarceration. They can be used to increase racial disparities or decrease disparities.
They can be used to direct “high risk” people towards support and services or to punish them more harshly. They can be implemented in such a broad set of ways that thinking about them monolithically just doesn’t make sense.
Take bail reform, for example.
Bail reform is one of the most active areas of change in criminal justice right now, and risk assessments have been a key part of many reform efforts. The idea behind the current bail reform movement is that pretrial custody decisions should be made on the basis of risk, not resources.
Instead of conditioning pretrial release on the ability to pay bail—which discriminates against the poor—reformers argue that pretrial release should be determined by a defendant’s risk of crime or flight.
Traditionally, risk of crime or flight was evaluated informally by a judge. Now, many jurisdictions are providing judges with computer-generated risk scores to help them decide whether the defendant can be safely released. These risk scores take into account factors like criminal history, age, and sometimes even socio-economic characteristics like employment or stable housing.
One of the more popular pretrial risk assessment instruments, called the PSA, was developed by the Laura and John Arnold Foundation in 2013 and has since been adopted in some thirty jurisdictions as well as three entire states. The results have been mixed.
New Jersey has seen a dramatic decline in its pretrial detention rate: the number of people detained pretrial has dropped by about a third since the PSA was adopted in January. Lucas County, Ohio, home to the low-income city of Toledo, has actually seen an increase in the pretrial detention rate since the PSA was adopted.
And a recent report suggests that Chicago judges have been largely ignoring the PSA. Why such different results in different places? It’s too soon to say for sure, but there are a number of details related to implementation that could make all the difference.
For one, determining what level of risk should be considered “high” is a subjective determination.
In fact, there is little consensus on this issue: depending on the instrument and the jurisdiction, a high risk classification can correspond with a probability of re-arrest that’s as low as 10% or as high as 42%.
With the PSA, jurisdictions can decide themselves where to set the cutoff points between a low, moderate, and high risk ranking.
These groupings are important, because many jurisdictions also adopt specific recommendations for each risk classification. For example, New Jersey uses a decision-making framework that recommends pretrial detention only for defendants with the highest risk scores: this has been defined so as to include only about 5% of arrestees.
In Mecklenburg County, another PSA site, generally only defendants who are ranked “low” or “below-average” on their risk score are recommended for release without secured monetary bond, making it less likely that risk assessment will increase release rates very much.
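The mechanics described above, in which a jurisdiction decides where to draw the lines between risk categories and then attaches a recommendation to each category, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the 1-to-6 score scale, the cutoff values, the recommendation wording and the sample scores are all invented for the example and do not reproduce the PSA or any jurisdiction’s actual framework. It shows only that the same score can produce different recommendations depending on where the cutoffs are set.

```python
from bisect import bisect_right

# Hypothetical three-band scheme; real instruments and jurisdictions differ.
LABELS = ["low", "moderate", "high"]

# Hypothetical recommendation attached to each band.
RECOMMENDATIONS = {
    "low": "release on own recognizance",
    "moderate": "release with conditions",
    "high": "consider detention",
}


def classify(score: int, cutoffs: list[int]) -> str:
    """Map a score to a band.

    cutoffs[i] is the lowest score placed in LABELS[i + 1]; scores below
    cutoffs[0] fall in LABELS[0]. Cutoffs must be sorted ascending.
    """
    if len(cutoffs) != len(LABELS) - 1:
        raise ValueError("need exactly one cutoff between each pair of bands")
    return LABELS[bisect_right(cutoffs, score)]


def recommend(score: int, cutoffs: list[int]) -> str:
    """Return the hypothetical recommendation for a score under given cutoffs."""
    return RECOMMENDATIONS[classify(score, cutoffs)]


def share_high(scores: list[int], cutoffs: list[int]) -> float:
    """Fraction of scores that land in the 'high' band."""
    return sum(classify(s, cutoffs) == "high" for s in scores) / len(scores)


# Two hypothetical jurisdictions scoring on the same invented 1-6 scale.
NARROW_HIGH = [3, 6]  # only a score of 6 counts as "high"
BROAD_HIGH = [2, 4]   # scores of 4, 5 and 6 count as "high"

# Invented scores for eight hypothetical arrestees.
SAMPLE_SCORES = [1, 2, 2, 3, 3, 4, 5, 6]

if __name__ == "__main__":
    print(recommend(4, NARROW_HIGH))  # release with conditions
    print(recommend(4, BROAD_HIGH))   # consider detention
    print(share_high(SAMPLE_SCORES, NARROW_HIGH))  # 0.125
    print(share_high(SAMPLE_SCORES, BROAD_HIGH))   # 0.375
```

Raising the “high” cutoff shrinks the group flagged for detention, which in simplified form is the kind of lever a framework like New Jersey’s relies on when it reserves detention recommendations for the highest scores. The sketch says nothing about whether any particular cutoff predicts re-arrest accurately.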
The impact that risk assessments have in practice will also depend on the extent to which judges use them. In most jurisdictions, judges are given the final say, and if they do not want to follow the recommendations associated with the risk assessment they don’t have to.
A recent survey showed that only a small minority of judges thought that risk assessments were better at predicting future crime than judges.
If judges are skeptical, what would motivate them? They will be more likely to use the risk assessment if they are incentivized to do so; for example, if deviating from the recommendations requires a detailed written reason for doing so.
Or, if there is a system of accountability where their actions are tracked and monitored. Finally, it’s always possible to implement risk assessment in a way that doesn’t involve judicial discretion at all.
Kentucky, a leader in the use of pretrial risk assessments, recently revised its procedures so that all low and moderate risk defendants facing non-serious charges are automatically released immediately after booking.
As for racial disparities, we know very little about how these have been affected by the adoption of risk assessment. But what little we do know suggests that implementation details are important. In a recent study, I found that pretrial risk assessment in Kentucky benefited white defendants more than black defendants, but this was solely because judges in the predominantly white rural counties followed the recommendations of the risk assessment more than judges in the more racially mixed urban counties.
In other words, the increased racial disparities brought on by risk assessment were caused by regional trends in use, not by the bias of the instrument. This pattern might have been reversed if training, monitoring, and accountability in urban areas had been stronger.
Furthermore, risk assessment is more likely to reduce racial disparities if it is used to replace monetary bail. Since black defendants tend to have lower incomes, they tend to be less able to afford bail than white defendants.
One study shows that half the race gap in pretrial detention is explained by race differences in the likelihood of posting a given bond amount.
We already live in a “Minority Report” state: the practice of grounding criminal justice decisions on predictions about future crime has been around a long time. The recent shift towards adopting risk assessment tools simply formalizes this process—and in doing so, provides an opportunity to shape what this process looks like.
Instead of embracing risk assessment wholeheartedly or condemning it without reserve, reformers should ask whether there is a particular implementation design by which risk assessment could advance the much-needed goals of reform.
Megan T. Stevenson is an economist and Assistant Professor of Law at George Mason University. She welcomes comments from readers.
Tools that use algorithms to determine whether to detain accused individuals before a trial are increasingly being used across the country as an alternative to the bail system. But the vice president of the Los Angeles County Association of Deputy District Attorneys argues that the tools also lead to tragedies.
If you aren’t following bail reform, you may not be aware that accompanying the attempt to eliminate bail across the country is the touting of “risk assessment tools” to determine who should be detained on bail before trial.
However, the use of this tool has led to the wholesale release of violent criminals—and tragedy.
Three recent examples in New Mexico, New Jersey and San Francisco illustrate my point.
A story published by the conservative website The Daily Wire said the assessment tool has led to virtually every defendant arrested in New Mexico for a violent crime being released without bail.
The story quoted a report from Albuquerque NBC affiliate KOB4, saying, “Even with the highest rate of failing to appear in court and the highest rate of new criminal activity for a defendant, the tool still recommends that person[s] be released on their own recognizance unless the prosecutors have filed for preventative detention.”
In New Jersey, according to the Washington Post, the tool determined that a man jailed for illegally possessing a gun was not a danger and recommended his release. Days later, that man hunted down a rival and shot at him 22 times, killing him. The family of the victim is now suing the Arnold Foundation, amongst others, for the death.
In San Francisco, the website SFGate reported that a man suspected of murder had been released days earlier after being arrested for possession of two guns. According to the website, the judge, relying on the assessment tool, rejected the District Attorney’s office’s recommendation that the man be kept in jail on a probation violation.
A spokesman for the DA’s office was quoted as saying the use of the tool has caused “many instances of contention.”
He continued: “As it relates to this case along with many other cases, we have a disagreement with how that risk assessment is being calculated. They suggested release with certain conditions, and the judge carried out that recommendation and this defendant was released.”
The Arnold Foundation argues that its tool is needed because “failing to appropriately determine the level of risk that a defendant poses impacts future crime and violence, and carries enormous costs–both human and financial.”
The examples in New Mexico, New Jersey and San Francisco certainly attest to the truth of that statement.
Additional Reading: Risk assessment tools have triggered a contentious debate in the criminal justice community. In June, the Supreme Court refused to hear the case of a Wisconsin man who was sentenced to six years in prison by a judge who consulted the results of a risk assessment algorithm. He argued that the use of the algorithm violated his right to due process.
Eric W. Siddall is Vice President of the Los Angeles Association of Deputy District Attorneys (ADDA), the collective bargaining agent representing nearly 1,000 deputy district attorneys who work for the County of Los Angeles. This is an edited version of an essay that appeared earlier this month on ADDA’s website. Readers’ comments are welcome.
New Jersey’s use of an algorithm to advise judges on pretrial release “is what the new vision of American justice looks like,” NBC News reports. Created by data scientists and criminal-justice researchers, the algorithm — one of dozens of “risk assessment tools” being used around the U.S. — promises to use data to scrub the system of bias by keeping only the most dangerous defendants behind bars, regardless of socioeconomic status. Six months into the new practice, New Jersey jails are already starting to empty, and the number of people locked up while awaiting trial has dropped.
It’s also clear that data is no wonder drug. The new system — driven by years of research involving hundreds of thousands of cases and requiring multimillion-dollar technology upgrades and the hiring of more judges, prosecutors and court workers — still produces contentious decisions about who deserves freedom and who does not. Police officials and prosecutors complain about the release of people charged with gun crimes, fleeing police, attacking an officer, sex offenses and domestic violence, and of those who keep getting re-arrested. In at least two cases, people have been killed by men who’d been released on earlier charges. The bail bond industry, facing extinction, has backed two federal lawsuits seeking to end the algorithm’s use. Defense lawyers and civil rights advocates say people who pose little risk have been ordered detained, only to be given plea deals or have their charges dropped. They fear that authorities are exploiting the new system to generate convictions. It remains unclear whether the new approach will reduce racial disparities, drive down crime rates or be fiscally sustainable. If it works in New Jersey, it could become a model for the rest of the nation.
The proposal by Democrat Kamala Harris and Republican Rand Paul would authorize $10 million a year in grants for states that replace cash bail with a system that considers community risk, not a defendant’s ability to pay. New Jersey has already moved forward with a system that some call a model for the nation.
U.S. Sen. Kamala Harris, a California Democrat, has introduced bipartisan legislation to prod states to reform their bail systems, reports the San Jose Mercury News. The new bill, which Harris co-wrote with Sen. Rand Paul, a Kentucky Republican, and was introduced yesterday, would spend $10 million annually for three years on grants for states that reform their bail systems.
Most courts in the U.S. require money bail, holding defendants in jail before trial until they pay. Advocates say cash bail is unfair to poor defendants who haven’t been convicted of a crime.
Under Harris’ bill — her first major bipartisan legislation — states would be eligible for a grant if they enact reforms such as replacing money bail with systems based on assessing a defendant’s risk to the community, releasing inmates before trial in most cases, or appointing public defenders at the earliest stages of pretrial detention.
In a New York Times commentary, Harris and Paul wrote, “Our justice system was designed with a promise: to treat all people equally. Yet that doesn’t happen for many of the 450,000 Americans who sit in jail today awaiting trial because they cannot afford to pay bail.” They said their proposal encourages better data collection, empowers states to build on best practices, and holds them accountable.
Some states have already moved to change their approach to bail. New Jersey, for example, is shifting away from “money-based” pretrial justice through pretrial risk assessment, in a system NPR describes in the latest episode of its “Planet Money” podcast as a “model” for the nation.