Discussion about this post

Cole Noble

Really great example of how easy it is to shape the story you want. Another great example I saw recently was a paper on unaffordable housing. It claimed "Almost nowhere in the United States Can You Afford the Median Rent While Making Minimum Wage."

Of course not; you're comparing the lowest incomes with middle-of-the-road housing costs.

You can make statistics say pretty much anything you want, but doing so isn't healthy if you actually want a productive discussion.

Matthew Ritter

In machine learning, it's very common to encounter over-fitting, where your model (analogous to a "simplified summary of the literature") works for the training data ("the literature I'm citing") but not in general ("the real world"). It's standard practice to train on, e.g., 80% of your data and run that model against both the training data and the randomly withheld 20%. If it performs worse on the 20%, it will probably perform about that badly in the real world.
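For readers who haven't seen the holdout check in practice, here is a minimal Python sketch of the 80/20 idea. It assumes synthetic data (a straight-line trend plus Gaussian noise), and the two "models" are hypothetical illustrations: a memorizer that just looks up the nearest training point (1-nearest-neighbour, a deliberately over-fit model) and an ordinary least-squares line. None of these function names come from the comment or from any particular library.

```python
import random

def train_test_split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and split it into training and holdout sets."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_lookup(train):
    """Deliberately over-fit 'model': memorize every training point and predict
    the y of the nearest memorized x (1-nearest-neighbour)."""
    memory = list(train)
    def predict(x):
        _, nearest_y = min(memory, key=lambda point: abs(point[0] - x))
        return nearest_y
    return predict

def fit_line(train):
    """Simple least-squares line y = a*x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def mean_squared_error(model, points):
    """Average squared gap between the model's predictions and the true y values."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

def compare_models(seed=42):
    """Return {model name: (training MSE, holdout MSE)} on hypothetical noisy data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    data = [(x, 2.0 * x + rng.gauss(0, 3)) for x in xs]  # assumed trend + noise
    train, holdout = train_test_split(data, train_fraction=0.8, seed=seed)
    results = {}
    for name, fit in [("memorizer", fit_lookup), ("line", fit_line)]:
        model = fit(train)
        results[name] = (mean_squared_error(model, train),
                         mean_squared_error(model, holdout))
    return results

if __name__ == "__main__":
    for name, (train_mse, holdout_mse) in compare_models().items():
        print(f"{name:9s} train MSE={train_mse:6.2f}  holdout MSE={holdout_mse:6.2f}")
```

The memorizer scores a perfect zero on the data it was built from, because every training point is its own nearest neighbour, but it does worse on the withheld 20%. That gap between training and holdout error is the warning sign the comment describes, and it is the analogue of a literature summary that fits only the studies it cites.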

It would be interesting to try something like that, acknowledging that there are far fewer "data points" to split out. But it would encourage humility (and /specificity/) in simplified summaries, especially once you realize that, unless you specify the boundaries of where your summary applies, your holdout set might randomly include the effect's attempted replication with baby Eskimos.
