Policymakers Don’t Learn Much from Impact Evaluations (and what can be done about it)
Jeff Bloem was a Higher Education Solutions Network Fellow with the U.S. Global Development Lab in summer 2016.
My first professional experience in international development was as the M&E specialist for a small NGO in Western Kenya. The NGO was piloting a project and wanted to learn whether, and how, the project could be improved before scaling it up. Although data collection went smoothly and the evaluation was performed adequately, by the time I could share results from the study, the program was already well on its way to being scaled up, both within Kenya and in other countries.
The tricky thing about evidence-based policy is that the right people rarely have the right evidence about the right program in the right location at the right time. Although evaluations of development programs have become increasingly rigorous, similar programs often have different effects in different locations and at different times. Thus, more often than many would like to admit, policy decisions are based on much less than rigorous evidence.
In a new working paper, Eva Vivalt reports on a meta-analysis, performed by the research institute AidGrade, of more than 600 studies of international development programs and finds that impact estimates “are much more heterogeneous than in other fields”. Studies were included in the meta-analysis if they properly measured the impact of one of 20 types of development programs, ranging from “conditional cash transfers to microfinance”. Although it is not especially surprising that the results of impact evaluations differ across contexts, the variation measured within the sample of studies was much larger than expected.
In a recent blog post Vivalt wrote the following:
“In particular, if one tried to use data from the earlier studies to predict the results of later studies, no matter how one tried to model the results or which methods one used to make predictions, it would seem unusual for a prediction to come within 50 percent of the observed value.”
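To make that claim concrete, here is a minimal sketch of the kind of exercise Vivalt describes. This is not AidGrade's data or method; every number below is invented for illustration. It simulates effect estimates for one intervention across many study sites with high between-site variance, uses the naive forecast (the mean of all earlier studies) to predict each later study, and counts how often the prediction lands within 50 percent of the observed value.

```python
import random

random.seed(1)

# Hypothetical effect sizes for one intervention across 40 study sites.
# A between-site standard deviation larger than the mean effect mimics
# the heterogeneity described above; the numbers are purely illustrative.
true_mean, between_site_sd = 0.10, 0.15
effects = [random.gauss(true_mean, between_site_sd) for _ in range(40)]

within_50_pct = 0
predictions = 0
for i in range(5, len(effects)):       # require a few "earlier" studies first
    prediction = sum(effects[:i]) / i  # naive forecast: mean of prior studies
    observed = effects[i]
    if observed != 0 and abs(prediction - observed) <= 0.5 * abs(observed):
        within_50_pct += 1
    predictions += 1

print(f"{within_50_pct}/{predictions} predictions within 50% of observed value")
```

With heterogeneity this large, even a forecast built from every prior study misses the next result by more than 50 percent most of the time, which is the intuition behind the quote.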
All this suggests that policymakers (i.e. anyone who makes decisions about allocating resources for achieving some development objective) don’t actually learn very much from impact evaluations. There are many ways in which one could address this issue.
Perhaps we simply need to remind ourselves that most impact evaluation methods are designed to satisfy internal validity rather than external validity. Perhaps better methods need to be developed for aggregating research results from diverse contexts. Perhaps more work should go into diagnosing problems and prescribing development treatments, rather than searching for an all-encompassing elixir that works for everyone, everywhere. Perhaps development organizations should adopt evaluation methods that allow for tighter feedback loops, quicker learning, and program adaptation.
Over the summer months, I am working in the Office of Evaluation and Impact Assessment, within the U.S. Global Development Lab at USAID, assisting with the MERLIN Program. The MERLIN Program is made up of a mix of organizations, each with its own specialties and strengths, spanning from randomized controlled trials and complex systems modeling to social network analysis and developmental evaluation.
At the beginning of June 2016, the MERLIN Program held its first annual ‘Roundtable’ meeting with all of its partners. Through participating in this event, two important contributions of the MERLIN Program became clear to me, each addressing a shortcoming of my previous work in M&E.
First, the MERLIN Program represents the beginnings of a shift in attitude about how M&E activities relate to other aspects of a development program. When I went to work in Western Kenya, I was the M&E specialist, and everyone else either managed or implemented the program. This organizational structure set M&E aside as an add-on to the organization, rather than making it integral to the fundamental objectives of the project. Given that most of us have little more than a guess (when we are being honest) as to which development programs will work best, when, where, and with whom, a necessary objective of any development project should be to generate evidence, learn, and adapt. This means that M&E needs to be at the forefront of the next generation of development projects.
Second, the specific role of the MERLIN Program is to implement and test several innovative, and more adaptive, methods of M&E within the contexts of USAID. In Western Kenya, I managed a quasi-experimental evaluation with two data-collection waves; in hindsight, what was actually needed was an evaluation methodology that provided more rapid feedback on various aspects of the program’s design. To expand the learning from M&E activities, the M&E toolbox itself needs to be dramatically expanded. Impact evaluations (either randomized controlled trials or quasi-experiments) play a huge and necessary role, but other methodologies are needed for situations where the environment is complex and change occurs in a nonlinear manner.
The MERLIN Program is currently piloting and testing each of its mechanisms. At the end of this initial stage, the MERLIN Program hopes to recommend which of these mechanisms are suited for wider adoption into the MERL toolkit of USAID Operating Units and Implementing Partners. If these mechanisms prove beneficial relative to existing tools, then policymakers may (someday) begin to learn and adapt much more from their M&E activities.