Does capital punishment work? I hate the idea of the death penalty - not because I think there’s a load of good evidence that it doesn’t really deter criminals, or because I know the exact number of people put to death who later turned out to be innocent. I haven’t actually taken a proper look at the evidence at all. Honestly, my real reason for opposing the death penalty is that it feels wrong - I don’t think a civilised country should be killing criminals, regardless of how heinous their crimes were. Depending on your moral intuitions, that statement is likely to either resonate with you a lot or barely at all, but my instinct that capital punishment is wrong is a very strong one. Obviously, if there were overwhelming evidence that capital punishment had a huge deterrent effect that would save lots and lots of lives, I could (in theory) be swayed. But how good is the evidence, actually?
Among the best papers on the deterrence effect of capital punishment is Dezhbakhsh and Shepherd (2007), which uses a really nice design. In the 1970s, the United States Supreme Court imposed a moratorium on executions - I wasn’t aware of this (am I too young? Too British? Too ignorant?), but there were apparently zero executions in the United States between 1967 and 1977, and states only brought back executions gradually. This gives us a pretty great opportunity for a quasi-experiment - you can compare the murder rate directly before and after the moratorium was introduced, and see whether capital punishment appears to have a deterrent effect or not. The chart above shows the main result - the number of executions is pretty clearly negatively correlated with the number of murders, and there does at least seem to be a strong deterrent effect. Here’s the conclusion from the paper:
The results are boldly clear: Executions deter murders, and murder rates increase substantially during moratoriums. The results are consistent across before-and-after comparisons and regressions regardless of the data’s aggregation level, the time period, or the specific variable used to measure executions.
The robustness checks look pretty legit, and I’m generally impressed with the study design. There are other papers I also find convincing. The moratorium in the United States seems like one of the best opportunities to figure out whether there really is a deterrent effect from capital punishment - while we should definitely have concerns about generalisability, these papers are probably the most convincing in the literature.
So, let’s jump into this paper. Here, they’re using county-level post-moratorium panel data, so that’s data from after the moratorium ended. Since 1977, when the moratorium ended, up until the point that this paper was written in 2003, there had been 683 executions in 31 states, and seven other states had adopted the death penalty but not actually executed anybody. So, what change in murders do we see in counties where there was a high probability of a criminal being executed, as opposed to counties where that probability was low (controlling for a ton of other county-level variables)? The result is pretty clear - one additional execution in a county is associated with an average of 18 fewer future murders (with a 95% confidence interval of 10 to 28 fewer murders). Again, the evidence looks good for the argument that the death penalty does deter people from murder.
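(To make that kind of analysis a bit more concrete, here’s a rough sketch of what a county-level panel regression with fixed effects might look like - hypothetical column names and a plain OLS specification, not the authors’ actual model, which is considerably more sophisticated.)

```python
# Illustrative sketch of a county-level panel regression - NOT the paper's
# actual specification, and all column names here are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per county-year, with columns like
# murders_per_100k, executions, arrest_rate, unemployment, county, year.
df = pd.read_csv("county_panel.csv")  # hypothetical dataset

model = smf.ols(
    "murders_per_100k ~ executions + arrest_rate + unemployment"
    " + C(county) + C(year)",  # county and year fixed effects
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["county"]})

# The coefficient on executions is the (much cruder) analogue of the
# paper's deterrence estimate.
print(model.params["executions"], model.conf_int().loc["executions"])
```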
Here’s another paper that seems to show roughly the same thing, finding that each extra execution in a state is associated with three fewer murders. Some people think that the death penalty would probably only deter criminals who are ‘rational’ in some way - it might not work for crimes of passion or husbands murdering their wives (or vice versa). The authors here find that even this isn’t true: seemingly the death penalty also decreases the number of murders by intimates, and decreasing the wait times for people on death row seems to have some effect in decreasing crime too.
It’s also worth looking at meta-analyses, not just the studies that seem to have the most rigorous designs. Yang and Lester (2008) seems to be the most useful one here, going over a lot of the best economics literature. Here’s the juicy bit:
For the ninety-five studies with adequate data, sixty indicated an overall deterrent effect while only thirty-five indicated a brutalization effect, a statistically significant difference (χ² = 6.58, df = 1, two-tailed p < .02). A simple unweighted mean for all of the Pearson correlation coefficients was −0.115 (standard error = .025), a statistically significant deterrent effect (t = 4.60, df = 94, two-tailed p < .001). Weighting the coefficients by the number of estimates in each study and including studies with inadequate data also indicated a statistically significant deterrent effect.
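As a quick sanity check (mine, not the paper’s), the sixty-vs-thirty-five split tested against a 50/50 null does reproduce that chi-square value:

```python
# Back-of-the-envelope check of the meta-analysis's chi-square statistic
# (my own calculation against a 50/50 null, not code from Yang and Lester).
from scipy.stats import chisquare

deterrent, brutalization = 60, 35   # studies finding each effect
result = chisquare([deterrent, brutalization])  # expected: 47.5 / 47.5

print(round(result.statistic, 2))   # 6.58, matching the quoted value
print(round(result.pvalue, 3))      # ~0.010, consistent with p < .02
```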
So yeah, I’m mostly convinced that the death penalty has a deterrent effect. How strong the deterrent effect is might be up for debate, but meta-analyses I trust (i.e. the ones that look over well-crafted economics papers rather than, ahem, more dubious criminology papers) seem to indicate that there is a deterrent effect, and the best papers that I found also seem to find the same thing.
Lies, Damned Lies, and Meta-Analyses
Okay, time for a confession. The studies above are all cherry-picked, and I don’t really know if the evidence that the death penalty has a strong deterrent effect is particularly good (maybe that’s a question I can try to answer properly at some point). I’ve been annoyed in the past about how many of my articles seem to end up with a fuzzy answer - for instance, my piece on the gender equality paradox ended like this:
The literature on the Gender Equality Paradox is still pretty new, and anything we say about it should reflect that. A rational person is likely to be uncertain about the correlation between gender equality and the number of women in STEM.
This happens over and over again. Here’s the conclusion of the piece on the gender wage gap:
The main thing I think is worth taking away from the literature on the GWG is that figuring out the effect of gender discrimination on wages is hard! You can’t just compare the amount of money that women earn to the amount of money that men earn, and you can’t just use regression with a ton of controls that are correlated with income. Experiments work a lot better to isolate the effects of discrimination, but their external validity might be pretty low (finding there is discrimination against young mothers with STEM degrees in Mexico doesn’t tell you much about discrimination against childless women in the UK in their 40s who want to work in PR). So, basically: *shrugs*.
There are tons of interesting questions where the answer is basically something like this: “It seems like X increases Y, but there are also a bunch of studies that show X not increasing Y. And these other studies showed that in one country it seems like X actually decreases Y. And there was another paper that showed that the effect of X on Y goes away when you control for Z. And in a few more studies it seems like X actually has an even bigger impact on Y than the first studies showed”. These conclusions can be pretty unsatisfying, and more importantly, it’s extremely easy to write an article that doesn’t have a conclusion like this, like I’ve done above with the death penalty. How easy is it?
Open up Elicit or Google Scholar. Write in the question you want to answer - for this article, ‘does the death penalty reduce murders?’ will work well. Hmm, the ‘takeaway from abstract’ column results don’t look too promising for the point I want to make here, although that final study looks like it could be useful. Let’s read over that and write up a quick summary. Ah, it looks like the authors of the paper wrote another study that finds the same thing! I can slot that in there too, although I shouldn’t make it clear that it’s by the same people. How about I write ‘let’s jump into this paper’ instead of giving the names of the authors? Perfect.
But I also probably want to get a meta-analysis, because Scott Alexander has already warned people to beware the man of one study. None of the meta-analyses I found on Elicit look especially useful for me here, but I’ll check out the third one just in case it gives me something juicy. Ah, here we go. Even though the most interesting conclusion of this meta-analysis is that evidence for a deterrent effect is contingent on the type of study, it also claims that more studies than not did find an effect. I can just quote that bit and leave out any other stuff.
And you’re done! You’ve cooked up a piece that seems to be evidence-based while also making the point you wanted to make from the beginning. You get bonus points for making it clear that you don’t have any personal attachment to the position you’re arguing for (in fact, you’re instinctively sceptical of it). If you’re writing a fairly short op-ed, you don’t even need to write any proper details about the studies, you just link to the meta-analysis after giving some anecdotes that support your position.
You’re probably aware of all of this stuff already. You know about cherry-picking, and you know about bad science journalism, and you’ve encountered sensationalist charlatans. But I think some people might underestimate just how easy it is to make an argument that appears to be evidence-based just by cherry-picking studies. It’s also easy for writers to fool themselves into thinking there is consensus in the literature by just flicking through the first few studies they find, even if they’re not intentionally trying to deceive people. And it’s easy for them to conclude that the studies supporting their argument are extremely robust and rigorous, while the ones opposing it - the ones they’ve scrutinised far more closely - are riddled with flaws. I know I’ve made this point over and over again, but please be wary of arguments that imply that there is some strong academic consensus on a controversial topic - it’s just so easy to give that impression when it isn’t actually true.
This definitely isn’t intended as a criticism of Elicit - it’s certainly not a tool that mostly helps people lie! I love Elicit, and it’s really useful for looking over the literature when thinking about how to answer a question, but it’s also easy to cherry-pick the answers you want. This applies to Google Scholar and everything else too, I just prefer to use Elicit because it’s so easy to use.
Really great example of how easy it is to shape the story you want. Another great example I saw recently was a paper on unaffordable housing. It claimed "Almost nowhere in the United States Can You Afford the Median Rent While Making Minimum Wage."
Of course not; you're comparing the lowest incomes with the middle of the road housing cost.
You can make statistics say pretty much anything you want, but it's not healthy for actually having a productive discussion.
In machine learning, it's very common to encounter over-fitting, where your model (analogous to a "simplified summary of the literature") works for the training data ("the literature I'm citing") but not in general ("the real world"). It's standard practice to train on e.g. 80% of your data and run that model against both the training data and the randomly withheld 20%. If it's worse on the 20%, it will probably be that bad in the real world.
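For what it’s worth, here’s a minimal sketch of that standard holdout check (purely illustrative, on a synthetic dataset):

```python
# Minimal sketch of the 80/20 holdout check described above
# (synthetic data; purely illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Train on 80% of the data, randomly withhold 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("training accuracy:", model.score(X_train, y_train))
print("holdout accuracy: ", model.score(X_test, y_test))
# A large gap between the two numbers is the classic sign of over-fitting.
```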
It would be interesting to try something like that, acknowledging that there are far fewer "data points" to split out. But it would encourage humility (and /specificity/) in simplified summaries, especially once you realize that, unless you specify the boundaries of where your summary applies, your holdout set might randomly include the effect's attempted replication with baby Eskimos.