Why Randomized Controlled Trials Work in Public Health…and Not Much Else

If there is a next big thing at the moment in the field of economics, it is the application of techniques from medical research—specifically, “randomized controlled trials,” or RCTs—to assess the effectiveness of development or other government-initiated projects. The general thrust of the work is as simple as it is brilliant: rather than employ complex, and often unreliable, statistical methods to tease out the extent to which a project or policy actually had a beneficial impact on intended beneficiaries, why not follow the tried and true methods employed by pharmaceutical researchers to assess the efficacy of medical treatments? The steps are: 1) randomly divide the experimental population into a treatment group that receives “benefits” from the program, and a control group that doesn’t; 2) assess outcomes for both groups; and 3) determine whether or not a significant difference exists between the two groups.
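For readers who like to see the mechanics, the three steps above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not a description of how any particular study was run: the population of 200 units, the baseline outcomes, the assumed true effect of 5 points, and the choice of a permutation test for step 3 are all assumptions made for the example.

```python
import random
import statistics


def randomize(units, rng):
    """Step 1: randomly split units into a treatment and a control group."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Step 2: compare average outcomes across the two groups."""
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


def permutation_p_value(treated_outcomes, control_outcomes, rng, n_permutations=2000):
    """Step 3: share of random relabelings whose gap is at least as large as observed."""
    observed = abs(difference_in_means(treated_outcomes, control_outcomes))
    pooled = list(treated_outcomes) + list(control_outcomes)
    n_treated = len(treated_outcomes)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(difference_in_means(pooled[:n_treated], pooled[n_treated:]))
        if gap >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: baseline outcomes for 200 units and an assumed true effect.
rng = random.Random(42)
TRUE_EFFECT = 5.0
baseline = {unit: rng.gauss(50, 10) for unit in range(200)}

treated, control = randomize(baseline.keys(), rng)
treated_outcomes = [baseline[u] + TRUE_EFFECT for u in treated]
control_outcomes = [baseline[u] for u in control]

estimate = difference_in_means(treated_outcomes, control_outcomes)
p_value = permutation_p_value(treated_outcomes, control_outcomes, rng)
print(f"estimated effect: {estimate:.2f}, permutation p-value: {p_value:.4f}")
```

Because assignment is random, the two groups differ only by chance before treatment, which is what lets the simple difference in means stand in for the program's effect in this one setting.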

This approach, championed by MIT economist Esther Duflo (winner of the prestigious John Bates Clark Medal, granted to the most promising economist under the age of forty) among others, is a marked improvement on the status quo in development aid, which not infrequently involves assessing program effectiveness by simply checking whether or not the money was spent. Clearly, a world in which aid money is allocated to projects that actually benefit people is preferable to one in which money goes to whoever most effectively maneuvers to get the money to start with and then most reliably manages to spend it. In this way, the use of RCTs in development arguably advances the goal of aid effectiveness.

So, to be clear, RCTs quite credibly represent the “gold standard” in program assessment. If other methods are used, it is usually because RCTs are too expensive or otherwise impractical. Describe your favorite entrepreneurial initiative to an economist and the most probable skeptical response you’ll receive is: “Sounds wonderful, but where’s the evidence of effectiveness? In order to find out whether or not it worked, you really need to do a randomized controlled trial.”

So what is the problem with applying RCTs to development?  The Achilles heel of RCTs is a little thing known to the statistically inclined as “external validity”—a phrase that translates informally to “Who cares?” (In a future post I’ll elaborate on another fundamental flaw of RCTs, which is technically termed the “non-stationarity” of underlying processes.)

The concept of external validity is straightforward. For any assessment, “internal validity” refers to how soundly the trial was conducted and how reliable its results are in the original setting. A professionally conducted RCT that yields a high level of statistical significance is said to be “internally valid.” However, it is fairly obvious that an intervention rigorously proven to work in one setting may or may not work in another setting. This second criterion, the extent to which results apply outside the original research setting, is known as “external validity.” External validity may be low because the populations in the original and the new research setting are not really comparable; for example, results of a clinical trial conducted on adults may not apply to children. But external validity may also be low because the environment in the new study setting is different in some fundamental way, not accounted for by the researcher, from the original study setting. Econometric studies that seek to draw conclusions about effectiveness from data that span large geographical areas or highly varied populations, by contrast, typically have lower levels of internal validity, but higher levels of external validity.

So, once again, the fundamental issue is not the purity of the methodology employed (as exciting as such methodological purity is to the technically inclined) but rather the inherent complexity of the world being studied.

For this precise reason, it turns out that those who most vociferously and naïvely advocate that we apply techniques from public health to economics (a group that does not include Esther Duflo) make a fundamental error.  They fail to appreciate the fact that, when it comes to external validity, public health is the exception that proves the rule. Indeed, in aid-led development in general, of the few real historical successes, nearly all are in public health. Outside of public health, few of the large-scale, top-down development programs have in fact succeeded.

Why is this? Multiple conjectures are possible. But one persuasive one is this: when it comes to biophysical function, people are people. For this reason, a carefully developed medical protocol (read “recipe”) proven to be effective for one population is highly likely to work for another population. The smallpox vaccine tested on one population tended to work on other populations; this made it possible to eradicate smallpox. Oral rehydration therapy tested on one group of children tended to work on other groups of children; millions of children have been spared preventable deaths because the technique has been adopted on a global basis. Indeed, medical protocols have such a high level of external validity that, in the United States alone, tens if not hundreds of thousands of lives could be saved every year through a more determined focus on adherence to their particulars.[1]

These huge successes were achieved, and continue to be achievable, through bold action taken by public health officials. They are rightly celebrated and encouraged, but, outside of other public health applications, not easily replicated. Successes in medicine contrast sharply with failures in other domains. Decades of efforts to design and deploy improved cook-stoves (with the linked aims of reducing both deforestation and the illness and death due to indoor air pollution) have so far primarily yielded an accumulation of Western inventions maladapted to needs and realities in various parts of the world, along with locally developed innovations that cannot be expanded to meet the true scale of the challenge.[2] For development programs in general, and RCTs in particular, public health is the exception that proves the rule.

What does work in areas outside of public health? How is it possible to design, test, and implement effective solutions in environments where complexity and volatility are dominant? The general principle applies: Success requires adaptability and flexibility as well as structure, that is, a societal capacity to scale successful efforts combined with an ingrained practice of entrepreneurial exploration. As the uniquely insightful Mancur Olson wrote in his classic Power and Prosperity (pp. 188-189):

Because uncertainties are so pervasive and unfathomable, the most dynamic and prosperous societies are those that try many, many things. They are societies with countless thousands of entrepreneurs who have relatively good access to credit and venture capital.

What works in development, according to Olson, is entrepreneurial exploration. Why? Because we don’t know what works.

[1] See Atul Gawande (2009), The Checklist Manifesto: How to Get Things Right. New York: Metropolitan Books.

[2] See Burkhard Bilger (2009), Hearth Surgery: The Quest for a Stove That Can Save the World. The New Yorker, December 21.