Can neural networks predict the death penalty?

I recently came across an article on the use of a neural network to predict which death row inmates would be executed and which would not. The authors of “An Artificial Intelligence System Suggests Arbitrariness of Death Penalty” argued that because they were able to train a neural network to successfully predict execution decisions using only irrelevant variables, the (human) decisions being made must be arbitrary. Confused yet? Their neural network achieved 93% accuracy, and they argue that because information about DNA testing and the quality of each defendant’s legal representation was omitted, this performance is concerning. In their words,

“What we have demonstrated here is that ANN technology can predict death penalty outcomes at better than 90%. From a practical point of view this is impressive. However, given that the variables employed in the study have no direct bearing on the judicial process raises series [sic] questions concerning the fairness of the justice system.”

That is, the neural network must have identified a useful predictive pattern in the data, but in a sense it was “not supposed to,” so a pattern may exist where one should not be.

There are several problems with the arguments and conclusions of this paper.

First, I don’t think the authors interpreted their result correctly. “Arbitrariness” was not at all demonstrated (despite the paper title). The neural network identified some sort of pattern in the data set that allowed it to successfully predict the outcome for 93% of previously unseen inmates. If inmates were executed “arbitrarily” (i.e., a random decision was made for each one), then the neural network would not have been able to learn a successful predictor. Instead, if the features really are irrelevant to the judicial process (they include sex, race, etc.), then the network’s high performance shows bias in the system: there is some sort of predictive signal even in features that shouldn’t directly affect execution decisions.
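
To make that point concrete, here is a minimal sketch (Python with scikit-learn; the data and feature count are synthetic stand-ins of my own, not the authors’ setup) showing why truly arbitrary decisions could not be predicted at 93%: a classifier trained on randomly assigned labels cannot beat chance on held-out cases.

```python
# Illustration: a network cannot learn to predict genuinely random decisions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 17))           # 17 features, as in the study
y_random = rng.integers(0, 2, size=1000)  # "arbitrary" execution decisions

X_tr, X_te, y_tr, y_te = train_test_split(X, y_random, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)

# Hovers near 0.5: with no real pattern, held-out accuracy stays at chance.
# So the paper's 93% implies a learnable pattern, not arbitrariness.
print("held-out accuracy on arbitrary labels:", net.score(X_te, y_te))
```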

Second, I’m not convinced that the features really are irrelevant. While sex, race, month of sentencing, etc., should (presumably) not be deciding factors in who gets executed, “type of capital offense” sounds quite relevant to me. If the neural network placed a heavy weight on that feature, I would be much less concerned than if it placed a heavy weight on “sex”. What was the neural network’s performance if the capital offense features were omitted? In fact, it would be interesting to use a machine learning feature selection method to pick out the “most useful” features from the 17 used in this study, to help identify any bias present.
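
Here is a hedged sketch of that feature-selection check, using scikit-learn’s mutual-information scores. The data is synthetic, and the columns carrying signal are ones I injected myself for illustration; the study’s actual variables are not reproduced here.

```python
# Rank features by how much information they carry about the outcome.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 17))
# Inject signal into two columns to mimic features carrying predictive weight.
y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

scores = mutual_info_classif(X, y, random_state=1)
ranked = np.argsort(scores)[::-1]
print("features ranked by mutual information:", ranked)
# If "type of capital offense" ranked highest, that would be reassuring;
# if "sex" or "race" did, that would point to bias.
```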

Finally, the evaluation was quite limited, so our confidence in the conclusions should also be limited. The authors trained a single neural network on a single training set and evaluated it on a single test set. More typical methodology would be to use cross-validation, splitting the data set into, say, 10 test sets and, for each one, training a network on the remaining 9. This yields a much better estimate of generalization performance. Also, what about other machine learning methods? Is 93% achieved only by a neural network? What about a support vector machine? (SVMs have been shown to outperform neural networks on a variety of problems.) What about a decision tree, which would yield direct insight into the decisions being made by the learned model? For that matter, what about neural networks with other network structures? Why was a network with a single hidden layer of five nodes used? Was that the only one that worked?
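
A minimal sketch of the evaluation I have in mind, again on synthetic stand-in data: 10-fold cross-validation over several model families, reporting mean and spread across folds rather than a single train/test split.

```python
# Compare model families under 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 17))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

models = {
    "neural net (5 hidden)": MLPClassifier(hidden_layer_sizes=(5,),
                                           max_iter=2000, random_state=0),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
# Mean +/- std across 10 folds is far more trustworthy than one split,
# and a fitted tree's structure can be inspected directly for insight.
```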

Naturally, my critique comes from a machine learning perspective. I have no legal training. I would be very interested in any insights or opinions on this work from those who do have a legal background. What is the value of this kind of study to the field? Is this an important subject to investigate? How could the results be used to positive benefit? What other questions were left unanswered by the authors of this paper?

5 Comments

  1. Susan said,

    March 9, 2009 at 2:23 am

    Wow. That’s fascinating and disturbing. And while I understand that the goal here is not to find the optimal predictive method or even the very most accurate, I do think that a goal should be to shed some light on what factors are predictive.

    In their defense, I think “arbitrary” is probably used in a less precise way here to mean, “based on personal whims rather than the pursuit of justice.” Thus, I think overwhelming bias is exactly what they believe they have found.

  2. Katie said,

    March 10, 2009 at 9:35 am

    I second what Susan said. That is a lot to think about, and a bit scary.

  3. Lowell said,

    March 13, 2009 at 11:03 pm

    It sounded like they were only out to prove their own bias and that they could have improved their experiment in a number of ways as stated in the review. To me it was obvious that the “researchers” are against the death penalty and used this model to try and discredit the practice instead of advancing the science of AI. But that’s just my two cents.

  4. Will Dwinnell said,

    March 27, 2009 at 12:46 pm

    I wonder to what extent the model inputs might be confounded with more directly relevant factors. For instance, while gender in itself is not a reason to put someone to death, it might very well be that gender is related to the nature of the crime. Suppose, for instance, (and I am making this up for illustration’s sake) men in this population commit capital crimes which involve the death of a police officer at a rate of 28%, whereas women do this at a rate of 4%. A model which does not have direct access to a variable indicating the death of an officer might very well exploit the correlation to gender.

    I suggest that a better test would have been an investigation of the incremental improvement of a model built on legitimate variables, once irrelevant ones are included.
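
    A minimal sketch of that incremental test (Python; every number here is synthetic, including the 28%/4% rates above, and the variable coding is hypothetical):

    ```python
    # Does an "irrelevant" variable add anything beyond the legitimate one?
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n = 2000
    gender = rng.integers(0, 2, size=n)  # 1 = "male" (hypothetical coding)
    # Crime type is confounded with gender, per the made-up rates above.
    officer = rng.random(n) < np.where(gender == 1, 0.28, 0.04)
    # The outcome is driven by the nature of the crime, not by gender itself.
    y = (officer & (rng.random(n) < 0.9)).astype(int)

    legit = officer.astype(float).reshape(-1, 1)        # legitimate variable only
    both = np.column_stack([legit, gender.astype(float)])  # plus the irrelevant one

    base = cross_val_score(LogisticRegression(), legit, y, cv=10).mean()
    augmented = cross_val_score(LogisticRegression(), both, y, cv=10).mean()
    # Little or no gain from adding gender suggests that a gender-only
    # model's predictive power merely proxies the legitimate variable.
    print(f"legitimate only: {base:.2f}  with gender added: {augmented:.2f}")
    ```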

  5. Yong said,

    April 7, 2009 at 11:05 am

    This is a very interesting study – perhaps more in terms of the questions raised than the conclusions and methodology. I would think a good question would be to see which of the factors are most predictive of the outcome – but that’s probably more a statistical correlation study than a neural network one.

I suppose I would not necessarily be surprised that jury decisions on capital punishment can be successfully modeled by a neural network based on certain inputs, some of which we might think are reasonable/sensible (# of prior crimes) and others which we would find troubling (e.g., race).

    If as a society, we thought that certain “troubling” factors were “inordinately” driving decisions and wanted to change that, perhaps the process could be modified in light of these findings to be more “fair”? (All these are in quotes, since each such term represents a value judgment.)

    Now, should we use properly trained neural networks in lieu of juries? (I suspect not, but an amusing thought.)

    I would think that a study of legal outcomes (and law generally) by using scientific methods should be of great interest. There is a way to approach law by using economic methods – “law and economics”, but because lawyers aren’t good economists/mathematicians generally (and economists/mathematicians not good lawyers), I don’t know if the results are so good (at least to date)…

    Definitely an interesting article and analysis.
