On Recht's bureaucratic theory of statistics
Engaging in the statistics wars. Does it make sense to talk about ex ante policy on its own?
Like most PhD students several years into their degree, I spend a non-trivial amount of time caught somewhere between cynicism and angst. What is all this statistical theory for, exactly? My days are spent trying to prove esoteric theorems about martingales and type-I errors, but who cares? What are we doing here?
Ben Recht, professor at Berkeley and author of the argmin substack, has also been trying to figure out what statistics is up to as a discipline. I've learned a lot from his blog (I particularly recommend his collection on Paul Meehl's course) and I share many of his criticisms of statistics run amok. We're both, I think, wary of the over-mathematization of systems that are difficult to mathematize, and the ensuing misguided enthusiasm to throw mathematical optimization at everything.
Recht recently wrote a paper on a bureaucratic theory of statistics, which advances his view of what statistics is, and should be, all about. It's a refreshing take and adds a new dimension to the (surprisingly lively) debate about the role of statistical inference. But I find that I’m not entirely convinced by his thesis. This is me thinking through it.
Recht's view is that statistics is less in the business of truth-seeking than in the business of providing clear, transparent rules for decision-making in large organizations. That is, statistics helps facilitate decision-making in bureaucracies. He writes:
From its inception, statistics has been the mathematics of bureaucracy. It provides a numerical foundation for governance by clear, transparent, aggregate rules. Statistics helps governments measure what experts on the ground see and create reasonable metrics for consensus to move forward with policy.
Recht calls the specification of these rules ex ante policy. Concrete examples of such policies are:
The FDA deciding that a drug can enter the market if a clinical trial with 10,000 participants shows a treatment–control difference with estimated Cohen’s d at least 0.4, and the difference is statistically significant at the 0.01 level
Google committing to changing its logo if an A/B test shows that the click-through rate increases by at least 0.1%
The IRS using doubly robust confidence sequences on audit outcomes to give estimates on the tax gap each year
A genomics company searching for associations between genetic variants and a disease committing to apply the Benjamini–Hochberg procedure to control the false discovery rate among the variants it tests
"Ex ante" here means "before we run the experiment." So ex ante policies are statistical commitment devices that organizations adopt to be transparent about their decision-making.
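To make one of these commitment devices concrete, here is a minimal sketch of a rule in the spirit of the FDA example above: approve only if the estimated Cohen's d clears a pre-registered threshold and the two-sample t-statistic is significant at the pre-registered level. The thresholds, sample sizes, and simulated data are all hypothetical; the point is that the rule is fixed before any data arrive.

```python
import math
import random

def cohens_d_and_t(treatment, control):
    """Estimated Cohen's d (pooled SD) and the two-sample t-statistic."""
    nt, nc = len(treatment), len(control)
    mt, mc = sum(treatment) / nt, sum(control) / nc
    vt = sum((x - mt) ** 2 for x in treatment) / (nt - 1)
    vc = sum((x - mc) ** 2 for x in control) / (nc - 1)
    pooled_var = ((nt - 1) * vt + (nc - 1) * vc) / (nt + nc - 2)
    d = (mt - mc) / math.sqrt(pooled_var)
    t = (mt - mc) / math.sqrt(pooled_var * (1 / nt + 1 / nc))
    return d, t

def approve(treatment, control, d_min=0.4, t_crit=2.576):
    """Hypothetical ex ante rule: approve iff estimated d >= d_min and the
    t-statistic exceeds the (large-sample, two-sided) 0.01 critical value."""
    d, t = cohens_d_and_t(treatment, control)
    return d >= d_min and abs(t) > t_crit

# Simulated trial with a genuine effect (true d = 0.5).
rng = random.Random(0)
treatment = [rng.gauss(0.5, 1.0) for _ in range(5000)]
control = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(approve(treatment, control))  # expected to print True: large simulated effect
```

The rule is mechanical: anyone can check whether the criteria were met, which is exactly the transparency that makes it useful to a bureaucracy.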
Recht differentiates ex ante policy from ex post inference, which concerns "drawing conclusions about the verisimilitude of theory or the nature of material reality from empirical evidence." In other words, ex post inference is what everyone thinks statistics is in the business of doing: helping us discover things about the world. (There's a reason we call it statistical inference after all!)
(The vocabulary "ex post" will automatically trigger statisticians. Recht doesn't mean ex post in the sense of illegitimately choosing your parameters after analyzing the data, a practice also known as peeking, p-hacking, or HARKing, but rather inferring parameters from the data, which necessarily occurs after the data are in.)
Examples of ex post inference are:
Helping determine that exposure to asbestos increases your risk of mesothelioma (example)
Helping determine that smoking increases cancer risk (example)
Every discovery of a subatomic particle at CERN or any other high-energy particle collider (e.g., bosons, quarks, neutrinos) (example)
Helping detect the existence of gravitational waves (example)
Helping determine that ibuprofen reduces pain (example)
Recht's paper highlights the importance of ex ante policy, and argues both that (i) statistics has, perhaps unbeknownst to some of its more starry-eyed practitioners, always been in the business of ex ante policy making, and (ii) that it should more openly embrace this role.
What's unclear to me about Recht's thesis is the extent to which he thinks ex ante policy is the only role for statistics. Should we dismiss ex post inference entirely? On page two, Recht seems to say no:
Rulemaking is certainly not the singular valuable application of statistics, but it has been a revolutionary and uncelebrated use. Causal inference researchers, whether potential outcomes advocates for whom all causation stems from hypothetical RCTs, or do-intervention scholars who still argue RCTs should have less relevance, should reckon with this underappreciated role of statistical theory. (emph mine)
But he is also quick to say that ex post inference doesn't work very well. He argues that the so-called scientific "discoveries" made by ex post inference are never clear cut, always messy, and more sociological than logical.
While statistics has admirable aspirations to help answer questions about ex post inference, it’s hard to find grand scientific discoveries solely enabled by RCTs or other causal inference methods. Scientific inference is rarely numerical and always cultural and heuristic.
And he frames the paper as trying to answer the question of what statistical tests actually do, since he's unconvinced that the "lofty goals" aspired to by many statisticians, which include ex post inference, are tenable:
Despite a century of rigorous development of statistical hypothesis testing and causal inference methods, it is not at all clear that Statistics actually does any of these things. We see something different when we look at the practice of statistical testing and statistical inference. Cyberneticist Stafford Beer famously asserted that “the purpose of a system is what it does.” If, after 100 years of fighting and browbeating, we see that statistical testing consistently fails to provide strong epistemological guarantees or determinations of causation, then it’s counterproductive to continue teaching our students that this is the purpose of statistics. But then, what exactly do statistical tests do?
I agree with Recht that ex ante policy is an important part of statistics. I agree that differentiating ex ante policy from ex post inference is a good move, and I agree that statisticians should remain cognizant of this divide.
But I don't think ex ante policy is the only role for statistics. So insofar as Recht is dismissing ex post inference, I disagree with him. Ex post inference will always be an important part of the game for two reasons: (i) it does help us discover things and (ii) you can’t sensibly evaluate a proposed ex ante policy without discussing its properties as a tool for ex post inference.
Regarding (i): dismissing ex post inference too quickly puts you in an awkward position. You have to start denying that statistics played any helpful role in the five examples listed above. Do we actually not know if asbestos or cigarettes are bad for you? Does none of our medicine work? All of these discoveries relied on statistical inference because the effects are difficult to register with your own senses. (Try and feel a gravitational wave.) They require sorting out signal from noise, comparing two datasets and trying to determine if there’s any significant difference between them. This is precisely the domain of statistics.

Elsewhere, Recht has indeed disputed some examples of ostensible ex post inference. He questioned whether we really discovered the Higgs boson, for instance. His criticism seems to rest on the fact that the process was messy and the statistical model very complicated. Not everyone was immediately convinced, and there are still doubters. As he put it above, the inference wasn't purely numerical but also cultural and heuristic.
But this isn't particularly damning of statistics. Scientific discovery is always messy, even in domains that rely little on formal statistics. Darwin’s theory of natural selection took decades to gain acceptance; evidence slowly dribbled in from fossils, embryology, comparative anatomy, and biogeography. So too with plate tectonics: Wegener’s idea of continental drift was mocked for half a century until mid-20th-century oceanographic data made it undeniable. And again with the germ theory of disease. And again with the double-helix model of DNA. And so on and so forth.
Recht is correct that “it’s hard to find grand scientific discoveries solely enabled by RCTs,” but that’s too high a bar for statistics. Statistics, on its own, will rarely enable discovery. But statistics in conjunction with a detailed theory of the phenomenon of interest does enable discovery.
Statistical testing exists as a mechanism to keep us from fooling ourselves. It’s the first line of defense against naive realism—the tendency to accept uncritically what your eyes or gut tell you. It’s easy to come up with a theory of why something works, or of how to cure a disease. It’s harder to get your theory to pass muster in a clinical trial. And while doing so is not proof of correctness, it’s an important bulwark against nonsense.
If the government tells me that I have to inject myself with a virus every year, then they damn well better be able to show the efficacy of doing so in a clinical trial. I’d rather not rely on RFK Jr’s magical ability to diagnose “mitochondrial challenges” by sight. If you feel the same way, then you also value statistics as a tool for ex post inference.
Regarding (ii): If one's only consideration is ex ante policy, then the precise details of the statistical procedure don't matter. Should we set the significance level to 0.05 or 0.01? Should we use asymptotic or non-asymptotic tests? Should we use outlier-robust methods? Should we report error probabilities or risk control? Should we control the false discovery rate or the family-wise error rate? Should we report p-values or e-values (a question close to my own heart)?
From the perspective of ex ante policy, what matters is only that the procedure is transparent, well understood, and fixed ahead of time. But an infinite number of procedures satisfy these criteria. How do you justify the use of one statistical method over another? That is, on what basis do you advocate that ex ante policy be set?
Suppose we set totally arbitrary regulatory rules on the adoption of a new drug. One hundred drugs get proposed, and our rule is: select one at random. This is clear, unbiased, and transparent—everything you could want in an ex ante policy qua policy. It is also clearly unsatisfactory in nearly every way imaginable, most notably because the adoption of a new drug under this scheme has no relationship to its actual effect.
I doubt Recht would advocate for such a policy. But why? Because the policy has terrible properties as a tool of ex post inference!
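A small simulation makes the point vivid. All the numbers here are hypothetical: 100 candidate drugs, 5 of which actually work, and an approval rule that picks one candidate uniformly at random.

```python
import random

random.seed(0)

def random_policy(num_candidates):
    """Approve one candidate uniformly at random: clear, transparent,
    fixed ahead of time -- and completely blind to the data."""
    return random.randrange(num_candidates)

# Hypothetical setup: 100 candidate drugs, 5 of which actually work.
effects = [0.0] * 95 + [0.5] * 5

# Simulate many approval decisions under the random rule.
approvals = [random_policy(len(effects)) for _ in range(10_000)]
hit_rate = sum(effects[i] > 0 for i in approvals) / len(approvals)

# The chance of approving an effective drug is just the base rate (~5%),
# no matter how strong the trial evidence is.
print(round(hit_rate, 3))
```

The rule satisfies every procedural virtue Recht asks of an ex ante policy, yet approvals bear no relationship to efficacy: the hit rate stays pinned at the base rate.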

When we adopt one statistical method over another, we appeal to the method's properties as a tool of inference, i.e., its ability to infer various parameters from the data. Any justification of using one statistical method over another cashes out in some reason like: "we think this gets us adequately close to the truth of the matter." Different methods have different properties, and they are more or less efficient at doing different things. We obviously do, and we obviously should, take that into account when setting policy.
Overall, Recht's take is a refreshing reminder of an important side of statistics. I like the differentiation between ex ante policy and ex post inference, and I'll be using these terms going forward. But I think his focus on ex ante policy leaves an unwary reader with the sense that ex post inference either doesn’t work or doesn't matter. (Again, it's unclear to me to what extent Recht himself actually believes this.) And that's importantly wrong.
You can also watch me ask Ben about some of this in our conversation: