COMPAS Case Study: Investigating Algorithmic Fairness of Predictive Policing

Mallika Chawla
7 min read · Feb 23, 2022


Bernard Parker, left, was rated high risk; Dylan Fugett was rated low risk. (Josh Ritchie for ProPublica)

In July 2016, the Wisconsin Supreme Court upheld the use of Northpointe Inc.'s algorithm 'Correctional Offender Management Profiling for Alternative Sanctions' (COMPAS)¹ in judicial decision-making (State v. Loomis, 2016). The defendant, Eric Loomis, had been assessed by COMPAS as a high-risk individual and was consequently sentenced to six years in prison — a ruling that he challenged as a violation of his due process rights. While his 'risk assessment score' was shared with him, the calculations that transformed the underlying data into that score were not revealed. Multiple cases of this nature have been heard across the United States, in which privately developed algorithms are shielded by their proprietary status even in the face of legitimate issues of social concern and justice. A closer examination of such case law demonstrates the risks associated with large-scale algorithmisation of justice, as well as the limits of the 'technical black box' in protecting basic rights such as due process, transparency and equality.

A few weeks before the Loomis decision, ProPublica² had fuelled an intense debate around 'algorithmic fairness' when it alleged that COMPAS was racially biased against black defendants (ProPublica, 2016). Their analysis of COMPAS risk assessment scores for over 7,000 defendants in Broward County, Florida (which they released publicly as evidence for their findings) led them to conclude that black defendants were incorrectly labelled — "got a false positive" — as 'high risk' of committing a future crime twice as often as their white counterparts. Correspondingly, they found that white defendants were incorrectly judged as 'low risk' more often than black defendants were. They even stated that "the algorithm was somewhat more accurate than a coin flip" (ProPublica, 2016), a sentiment shared by Dressel and Farid's study, which found that COMPAS risk assessments were "no more accurate or fair than the predictions of individual laypeople on the Internet" (Dressel & Farid, 2018).
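For readers who want to see the shape of this error-rate comparison, here is a minimal sketch. It assumes pandas and the publicly released compas-scores-two-years.csv file, with the column names and row filters used in ProPublica's published analysis; treat the filename, columns and filters as assumptions rather than a definitive replication.

```python
import pandas as pd

# Assumed filename and column names from ProPublica's released dataset.
df = pd.read_csv("compas-scores-two-years.csv")

# ProPublica-style filtering (approximate): keep rows with a valid screening
# window and a known two-year recidivism outcome.
df = df[df["days_b_screening_arrest"].between(-30, 30) & (df["is_recid"] != -1)]

# ProPublica grouped the 'Medium' and 'High' score bands together as "higher risk".
df["pred_high_risk"] = df["score_text"].isin(["Medium", "High"])

for race in ["African-American", "Caucasian"]:
    g = df[df["race"] == race]
    fpr = g.loc[g["two_year_recid"] == 0, "pred_high_risk"].mean()    # labelled high risk, did not reoffend
    fnr = (~g.loc[g["two_year_recid"] == 1, "pred_high_risk"]).mean() # labelled low risk, did reoffend
    print(f"{race}: false positive rate {fpr:.1%}, false negative rate {fnr:.1%}")
```

On ProPublica's published numbers, a comparison of this kind is what produced the roughly two-to-one gap in false positive rates.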

In response, Northpointe insisted that COMPAS was accurate under its own measure of fairness. It published a technical analysis of the data used in ProPublica's study, challenging ProPublica's regression models, classification terms and measures of discrimination, as well as ProPublica's interpretation of model errors. Its central rebuttal was that ProPublica had failed to take into account "the different base rates of recidivism for blacks and whites" (Dieterich et al., 2016). Northpointe further contended that by combining the High and Medium levels into a single 'Higher Risk' category, ProPublica had inflated the false positive rate and the corresponding Target Population Error. Since then, however, Northpointe's founder has suggested that race-correlated factors may be at play and that he never intended COMPAS to be the sole evidence on which a decision is based.

This heated public debate between Northpointe and ProPublica also ignited discussion among data science activists, experts and the general public. Six months later, on 10 January 2017, Northpointe Inc. merged with its sister companies, CourtView Justice Solutions Inc. and Constellation Justice Systems Inc., to form 'Equivant'. The merger announcement confirmed that "all existing product lines would remain intact" (Equivant, 2017). It could only be assumed that Equivant continued to compute risk assessments in broadly the same way Northpointe had, and that the re-branding effort was focussed on pivoting the narrative around 'The Northpointe Suite', which includes COMPAS among its range of risk-needs assessment tools.

Zooming out from the case at hand, large-scale algorithmisation in any domain raises common issues that need attention³. Since data is often tainted with historic and structural biases, training on such data perpetuates a 'runaway feedback loop', or the 'garbage in — garbage out' phenomenon, in the algorithm's future predictions. This obstructs progress and keeps societies trapped in a mirror image of their own status quo. Even where discriminatory factors are not used explicitly, other factors can serve as proxies for them. Hence, while COMPAS may not have used race as one of its factors, race could well have seeped in through other factors such as postal codes, education level, poverty or joblessness, all of which carry historic baggage.
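To make the proxy problem concrete, here is a toy illustration on synthetic data (every variable name and number below is hypothetical, not drawn from COMPAS): a model that never sees the protected attribute still scores the two groups very differently, because a correlated feature carries the same signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical population: 'group' is the protected attribute and is never
# shown to the model; 'neighbourhood' is a proxy that is 80% aligned with it.
group = rng.integers(0, 2, n)
neighbourhood = np.where(rng.random(n) < 0.8, group, 1 - group)

# Historic bias baked into the data: recorded outcomes are worse in neighbourhood 1.
outcome = (rng.random(n) < 0.2 + 0.3 * neighbourhood).astype(int)

# A "blind" model trained only on the proxy feature, never on the group itself.
model = LogisticRegression().fit(neighbourhood.reshape(-1, 1), outcome)
scores = model.predict_proba(neighbourhood.reshape(-1, 1))[:, 1]

# Predicted risk still differs sharply between the protected groups.
for g in (0, 1):
    print(f"group {g}: mean predicted risk = {scores[group == g].mean():.2f}")
```

Dropping the protected attribute from the feature list is therefore not, by itself, a guarantee of fairness.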

Another crucial element to consider is the notion of fairness built into an algorithm's design. Northpointe focused on calibration — accurately predicting which defendants were likely to reoffend, so that defendants with the same score reoffend at similar rates across races — while ProPublica was more concerned with defendants who did not end up reoffending yet were mistakenly classified as high risk, that is, with equalising error rates across races. These competing notions of fairness led to the statistical conflict, even though each side was right in its own context. Because it is mathematically impossible for an imperfect model to satisfy both criteria when base rates of reoffending differ between groups, defining and prioritising what we consider fair as a society is of utmost importance in our endeavours to balance the harms and benefits of such technological solutions.
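A short numerical sketch (with made-up numbers chosen only to keep the arithmetic clean) shows why the two criteria cannot hold simultaneously once base rates differ. It uses the standard confusion-matrix identity relating the false positive rate to the base rate, positive predictive value (PPV) and true positive rate (TPR).

```python
# Identity: FPR = p / (1 - p) * (1 - PPV) / PPV * TPR,
# where p is a group's base rate of reoffending. All numbers are illustrative.
def false_positive_rate(base_rate: float, ppv: float, tpr: float) -> float:
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

# Hold the calibration-style criterion (equal PPV) and sensitivity (equal TPR)
# fixed for both groups, but let the base rates differ...
for name, base_rate in [("group A", 0.5), ("group B", 0.3)]:
    print(name, round(false_positive_rate(base_rate, ppv=0.6, tpr=0.7), 2))

# ...and the false positive rates are forced apart (about 0.47 vs 0.20).
# Equality on both dimensions is possible only with equal base rates or a
# perfect predictor, which is the crux of the Northpointe-ProPublica dispute.
```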

Furthermore, cognitive biases such as automation bias and confirmation bias, amongst others, are likely to lead judges to outsource too much of their decision-making to AI-facilitated scores. Strikingly, even the Loomis judgement may have inadvertently facilitated prejudice as a result of the bias blind spot: COMPAS scores are theoretically supposed to be used only to determine probation conditions, not the severity of punishment — something the judgement assumed would be understood by all judges (Lee, 2019).

Looking at how ubiquitous algorithms have become in our lives, and how much more involved they are likely to be in the future, this is a critical juncture at which to assess when data-driven decision-making is ethically justifiable. To prevent previously discriminated-against and marginalised groups from falling into sustained cycles of disadvantage, it is imperative that law- and policymakers re-assess the extent of AI's role in fundamental domains such as education, employment and justice, and that human contribution not be completely supplanted (Liu et al., 2018). A set of standardised principles, such as transparency, accountability, fairness and inclusivity, needs to be framed and adopted while balancing performance with social interests like racial equality (Lee, 2019). Barriers to ethical usage, such as trade secret law, need reform so that recourse is available where legitimate issues are exposed. Finally, the biggest challenge today is the need for critical thought on ways to mitigate historical and structural biases in training data.

Footnotes

¹ Designed to assess a defendant's risk of recidivism, that is, the potential risk that the defendant will commit a crime in the future, as well as their needs in terms of housing, rehabilitation etc. (Equivant, 2019)

² An independent non-profit investigative journalism organisation.

³ While some critics have labelled predictive algorithms a form of "tech-washing" that gives racial bias the appearance of objectivity, this may not always be the case (Lau, 2020).

BIBLIOGRAPHY

  1. Angwin, Julia; Larson, Jeff; Mattu, Surya; and Kirchner, Lauren. (2016, 23 May) Machine bias. ProPublica. Accessed online: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  2. Angwin, Julia; Larson, Jeff; Kirchner, Lauren; and Mattu, Surya. (2017, 5 April) Minority neighborhoods pay higher car insurance than white neighborhoods with the same risk. ProPublica, co-published with Consumer Reports. Accessed online https://www.propublica.org/article/minority-neighborhoods-higher-car-insurance-premiums-white-areas-same-risk
  3. Angwin, Julia. (2016) Sample COMPAS Risk Assessment COMPAS — "CORE". Accessed online https://www.documentcloud.org/documents/2702103-Sample-Risk-Assessment-COMPAS-CORE.html
  4. Angwin, Julia; Larson, Jeff. (2016, 29 July) Technical Response to Northpointe. ProPublica. Accessed online: https://www.propublica.org/article/technical-response-to-northpointe
  5. Barenstein, Matias. (2019, 8 July) ProPublica’s COMPAS Data Revisited. arXiv. Accessed online https://arxiv.org/pdf/1906.04711.pdf
  6. Bathaee, Yavar. (2018). The Artificial Intelligence Black Box and the Failure of Intent and Causation. Harvard Journal of Law and Technology. Accessed online https://jolt.law.harvard.edu/assets/articlePDFs/v31/The-Artificial-Intelligence-Black-Box-and-the-Failure-of-Intent-and-Causation-Yavar-Bathaee.pdf
  7. Dieterich, William; Mendoza, Christina; and Brennan, Tim. (2016, 8 July) COMPAS risk scales: Demonstrating accuracy equity and predictive parity. [Report] Northpointe, Inc. Research Department. Accessed online: https://www.documentcloud.org/documents/2998391-ProPublicaCommentary-Final070616.html
  8. Diakopoulos, Nicholas; Friedler, Sorelle. (2016, 17 November). How to Hold Algorithms Accountable. MIT Technology Review. Accessed online https://www.technologyreview.com/2016/11/17/155957/how-to-hold-algorithms-accountable/
  9. Dressel, Julia; Farid, Hany. (2018) The accuracy, fairness, and limits of predicting recidivism. Science Advances. Accessed online https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5777393/
  10. Equivant. Accessed at https://www.equivant.com/northpointe-risk-need-assessments/
  11. Equivant. (2017, 10 January). CourtView Justice Solutions Inc., Constellation Justice Systems Inc., and Northpointe Inc. Announce Company Rebrand to Equivant. Accessed online https://cjisweb.s3.amazonaws.com/s3fs-public/CourtView%20Justice%20Solutions%20Inc.%2C%20Constellation%20Justice%20Systems%20Inc.%2C%20and%20Northpointe%2C%20Inc..pdf
  12. Equivant (2019) Practitioner's Guide to COMPAS Core. Accessed online http://www.equivant.com/wp-content/uploads/Practitioners-Guide-to-COMPAS-Core-040419.pdf
  13. Flores, A.W.; Lowenkamp, C.T.; and Bechtel, K. (2016). False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias”. Community Resources for Justice. Accessed online http://www.crj.org/assets/2017/07/9_Machine_bias_rejoinder.pdf
  14. Humerick, Jacob. (2020, 15 April) Reprogramming Fairness: Affirmative Action in Algorithmic Criminal Sentencing. Columbia Human Rights Law Review. Accessed online http://hrlr.law.columbia.edu/hrlr-online/reprogramming-fairness-affirmative-action-in-algorithmic-criminal-sentencing/
  15. Jackson, Eugenie & Mendoza, Christina (2020, 31 March) Setting the Record Straight: What the COMPAS Core Risk and Need Assessment Is and Is Not. Harvard Data Science Review. Accessed online https://hdsr.mitpress.mit.edu/pub/hzwo7ax4/release/4
  16. Lau, Tim. (2020, 1 April). Predictive Policing Explained. Brennan Center For Justice. Accessed online https://www.brennancenter.org/our-work/research-reports/predictive-policing-explained
  17. Lee, Andrew. (2019, 19 February). Injustice Ex Machina: Predictive Algorithms in Criminal Sentencing. UCLA Law Review. Accessed online https://www.uclalawreview.org/injustice-ex-machina-predictive-algorithms-in-criminal-sentencing/
  18. Liu, H.W.; Lin, C.F.; and Chen, Y.J. (2018, 20 December). Beyond State v. Loomis: Artificial Intelligence, Government Algorithmization, and Accountability. International Journal of Law and Information Technology. Accessed online https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3313916
  19. Mesa, Natalia. (2021, 13 May). Can the criminal justice system's artificial intelligence ever be truly fair? Accessed online https://massivesci.com/articles/machine-learning-compas-racism-policing-fairness/
  20. Rudin, Cynthia; Wang, Caroline; and Coker, Beau. (2020, 31 March). Broader Issues Surrounding Model Transparency in Criminal Justice Risk Scoring. Harvard Data Science Review. Accessed online https://hdsr.mitpress.mit.edu/pub/8jy98s9q/release/1
  21. Sandvig, Christian; Hamilton, Kevin; Karahalios, Karrie; and Langbort, Cedric. (2014). Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and Discrimination: Converting Critical Concerns Into Productive Inquiry, 22. Accessed online http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
  22. Spielkamp, Matthias. (2017, 12 June). Inspecting Algorithms for Bias. MIT Technology Review. Accessed online https://www.technologyreview.com/2017/06/12/105804/inspecting-algorithms-for-bias/
  23. State v. Loomis. 881 N.W.2d 749 (Wis. 2016).
  24. Walch, Kathleen. (2019, 26 July). The Growth of AI Adoption in Law Enforcement. Forbes. Accessed online https://www.forbes.com/sites/cognitiveworld/2019/07/26/the-growth-of-ai-adoption-in-law-enforcement/?sh=730b6858435d
  25. Washington, Anne. (2021). How to argue with an algorithm: Lessons from the COMPAS-ProPublica Debate. Colorado Technology Law Journal. Accessed online http://ctlj.colorado.edu/wp-content/uploads/2021/02/17.1_4-Washington_3.18.19.pdf
  26. Yong, Ed. (2018, 17 January). A Popular Algorithm Is No Better at Predicting Crimes Than Random People. The Atlantic. Accessed online https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/
