Module 3: Purposes of Evaluations (Plausibility, Probability, Adequacy)

In general, evaluations are conducted for one of three reasons: to determine plausibility, probability, or adequacy. As discussed in the Constraints on Evaluations module, resources for evaluations are limited, and identifying the reason for the evaluation at the outset can save both time and money.

Adequacy

An adequacy assessment is conducted when stakeholders and evaluators are interested only in whether the goals set by program developers were met. For example, if a child health program seeks to reduce child mortality to 25% in selected villages, an adequacy assessment will attempt to show whether this 25% target was reached. The benefit of an adequacy assessment is that it does not require a control group, which can significantly reduce the cost, time, and effort of an evaluation. However, without randomization or a control group, changes in many indicators cannot be attributed directly to program activities. Although limited in what can be inferred from them, adequacy assessments do show progress toward pre-determined targets, which may be sufficient to argue for increased or continued funding.(1)

Case Study: Adequacy assessment of a community nutrition program in Senegal(2)

When planning the evaluation of a Community Nutrition Project (CNP) in Senegal, the evaluators' goal was to identify the failures and successes of program activities in order to strengthen the implementation process. It was therefore decided that an adequacy evaluation was sufficient: one that looked at whether expected process and outcome indicators were met. The process indicators, which were developed by program implementers and stakeholders, included recruiting at least 90% of all underweight children and reaching 80% attendance among mothers. Monitoring tools had been developed so that evaluators could determine, for each child, attendance at monthly weigh-ins, attendance of the mother, and whether the food supplement was distributed. Although not all process targets were met, the evaluators found that nearly all indicators improved in the expected direction. The results of this adequacy assessment were used to improve the delivery of services and recruitment strategies during the next phases of the project.
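The target-by-target comparison at the heart of an adequacy assessment can be sketched in a few lines of Python. This is only an illustration: the 90% and 80% targets come from the case study above, but the observed values, the `Indicator` class, and the `adequacy_report` function are hypothetical and do not reproduce the project's actual results or monitoring tools.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    target: float    # minimum acceptable proportion (0 to 1)
    observed: float  # measured proportion (0 to 1)

    def met(self) -> bool:
        # An adequacy check only asks whether the target was reached.
        return self.observed >= self.target


def adequacy_report(indicators):
    """Map each indicator name to True (target met) or False (not met)."""
    return {ind.name: ind.met() for ind in indicators}


# Targets are taken from the case study; observed values are made up.
indicators = [
    Indicator("recruitment of underweight children", target=0.90, observed=0.85),
    Indicator("attendance among mothers", target=0.80, observed=0.82),
]

report = adequacy_report(indicators)
for name, ok in report.items():
    print(f"{name}: {'met' if ok else 'not met'}")
```

Note that nothing in this comparison involves a control group, which is why an adequacy result on its own says little about whether the program caused the change.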

Plausibility

A plausibility assessment similarly determines whether a program has attained its expected goals, but it also seeks to show that the observed changes are likely effects of program activities rather than of external or confounding sources. This is possible with the use of a control group.(3) Ideally, a plausibility assessment will also incorporate baseline and post-intervention data points to explicitly show improvements in target indicators.(4) The benefit of having a control group is the ability to link program activities to program outcomes; without one, it is possible that, for example, child mortality rates decreased in all villages and not just in those where the intervention took place. Without measuring identical indicators in control villages, there is no way to link a decrease in child mortality to program activities, because external and confounding factors may have contributed. In plausibility assessments, the control groups are not required to be truly randomized; they can be drawn from historical epidemiological databases or chosen internally (i.e. those individuals who were selected for an intervention but chose not to participate in the program activities).(5) This, however, leaves room for selection bias and confounding that the analysis does not account for. Therefore, the results of a plausibility assessment only establish that there was a difference between the control and intervention groups that can most likely, but not conclusively, be attributed to program activities.
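One common way to combine baseline and post-intervention measurements with a control group is a difference-in-differences calculation. The sketch below is illustrative only: the cited sources do not prescribe this method, the function names are hypothetical, and the mortality figures are invented for the example.

```python
def change(baseline, followup):
    """Change in an indicator between baseline and follow-up."""
    return followup - baseline


def difference_in_differences(interv_base, interv_follow, ctrl_base, ctrl_follow):
    """Change in the intervention group minus change in the control group.

    A negative value means the indicator fell further in the intervention
    group than in the control group.
    """
    return change(interv_base, interv_follow) - change(ctrl_base, ctrl_follow)


# Hypothetical child deaths per 1,000 live births (not real data).
did = difference_in_differences(
    interv_base=80, interv_follow=50,  # intervention villages fell by 30
    ctrl_base=82, ctrl_follow=70,      # control villages fell by 12
)
print(did)  # -18
```

In this made-up example, mortality fell in both groups, so only the additional drop of 18 per 1,000 is a candidate for attribution to the program. Because the control group is not randomized, even that remainder may partly reflect selection bias.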

Case Study: Plausibility assessment of a microfinance program in South Africa(6)

The Intervention with Microfinance for AIDS and Gender Equity (IMAGE) program was designed to reduce the prevalence of HIV and intimate partner violence in South Africa. The program had set goals to improve certain household economic variables, such as the ability to pay back debt and meet basic household needs. As in an adequacy assessment, these indicators were measured to determine whether the pre-determined targets were met. However, because the study design included a control group, the evaluation of IMAGE is better able to attribute improvements in variables such as the ability to pay back debt to the program activities. Evaluators found that certain economic variables improved in villages where women were participating in the microfinance program, compared with the control villages.

Probability

Like both plausibility and adequacy assessments, probability evaluations seek to determine the success of a program's activities and outcomes. However, unlike the two previously discussed assessments, probability assessments use the most robust study design, the randomized controlled trial (RCT), to determine the true effect of the program activities on the indicators of interest.(7) This type of assessment is the most expensive and time-consuming of the three, so it should ideally be used only when evaluators and stakeholders have found it necessary for funding or research purposes. RCTs are much more involved in terms of data collection, and more stress is placed on compliance, so the cost of personnel to collect data and of incentives or vouchers to improve participation and compliance rates often drives up the budget.(8) Depending on the evaluation and associated project, it may be impossible or unethical to conduct a true RCT.(9) Furthermore, this strategy is nearly infeasible if the evaluation is not discussed in the initial phases of program planning, because randomized assignment is required and is difficult to introduce mid-intervention. An RCT involves complete randomization when selecting the intervention and control groups in order to reduce the influence of bias on the data. For example, if the evaluators in a plausibility assessment choose to use an internally created control group (see above: plausibility), there is a risk that the control group will be influenced by others in the household or village who are participating in the program (also called spillover effects). With a probability assessment, evaluators take this into consideration when choosing their control groups (ensuring physical and social distance between the groups, for example).
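Random assignment at the level of whole villages or clinics, rather than individuals, is one way to keep the two groups physically and socially separate. The following is a minimal sketch of such an assignment, assuming a simple 50/50 split; the `randomize_clusters` function and the village names are hypothetical, and a real trial would typically add stratification and a documented allocation procedure.

```python
import random


def randomize_clusters(clusters, seed=None):
    """Randomly split clusters (e.g. villages) into two arms of equal size.

    If the number of clusters is odd, the control arm gets the extra one.
    A fixed seed makes the allocation reproducible for auditing.
    """
    rng = random.Random(seed)
    shuffled = list(clusters)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical list of ten villages.
villages = [f"village_{i}" for i in range(1, 11)]
arms = randomize_clusters(villages, seed=42)
print(arms["intervention"])
print(arms["control"])
```

Because every village has the same chance of landing in either arm, neither evaluators nor participants choose who receives the program, which is what removes the selection bias described for internally created control groups.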

Case Study: Probability assessment of a breastfeeding promotion initiative in Belarus(10)

Although it is held that breastfeeding is beneficial for newborns and young children, especially in the prevention of infections, most of the research and programmatic evaluation that has been conducted produces plausibility, not probability, statements of these benefits. Because plausibility evaluations carry a certain amount of bias and confounding, there was a call for more substantial, unbiased evidence for the association between breastfeeding and risk of infection in infants. With the understanding that it is unethical to prevent a woman from breastfeeding her child, the program was developed to promote exclusive and prolonged breastfeeding among women who had shown interest in breastfeeding. In this way, the evaluators were able to randomize women into an intervention group (those women in chosen clinics who received the promotion initiative) and a control group (those women in clinics chosen to continue efforts as usual, with no intervention) without ethical concerns surrounding the control group. Although randomization took place at the clinic level, the groups of women within the clinics were generally similar in terms of age, number of children, smoking habits, etc. This similarity gives researchers more confidence that differences between the groups are due to the project and not to underlying sociodemographic characteristics. A process monitoring system was put in place to ensure that the program was properly implemented. Data, including demographic information, breastfeeding schedules, and medical history, were collected at all well-child visits over the first 12 months. Comparisons were made based on the incidence of respiratory and gastrointestinal infections and the consistency of breastfeeding habits. By measuring each of these indicators in the randomly selected control and intervention clinics, evaluators were able to determine the association between breastfeeding and infant infection risks. Evaluators found a reduction in the risk of gastrointestinal infections in the intervention group (as much as 40% lower) but no reduction in the risk of respiratory infections.
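A statement such as "40% lower risk" is usually derived from a relative risk: the proportion of infants with an infection in the intervention group divided by the same proportion in the control group. The sketch below shows the arithmetic with invented counts chosen only to produce a round number; they are not the trial's data, the function names are hypothetical, and a real analysis of a clinic-randomized trial would also need to account for clustering.

```python
def risk(events, total):
    """Proportion of participants who experienced the event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


def relative_risk(events_interv, n_interv, events_ctrl, n_ctrl):
    """Risk in the intervention group divided by risk in the control group."""
    control_risk = risk(events_ctrl, n_ctrl)
    if control_risk == 0:
        raise ValueError("relative risk is undefined when control risk is zero")
    return risk(events_interv, n_interv) / control_risk


# Hypothetical counts of infants with at least one gastrointestinal infection.
rr = relative_risk(events_interv=90, n_interv=1000, events_ctrl=150, n_ctrl=1000)
reduction = 1 - rr
print(f"relative risk: {rr:.2f}")          # relative risk: 0.60
print(f"risk reduction: {reduction:.0%}")  # risk reduction: 40%
```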

Summary

This module was intended to highlight the details that evaluators must attend to before developing a research plan. Each of these categories of evaluation has benefits and drawbacks, and choosing the correct one depends on a number of factors: How much time and money is allotted for the evaluation? What kinds of decisions, policy or funding, will be made based on the evaluation results? Is it ethical or appropriate to have a control group or to randomize the intervention? What is the goal of the client? The stakeholder? The evaluators? After these questions have been answered, the proper evaluation type can be chosen.


Footnotes

(1) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999). Evaluation designs for adequacy, plausibility, and probability of public health programme performance and impact. International Journal of Epidemiology, 28:10-18.

(2) Gartner, A., Maire, B., Kameli, Y., Traissac, P., and Delpeuch, F. (2006). Process evaluation of the Senegal-community nutrition project: An adequacy assessment of a large scale urban project. Tropical Medicine and International Health, 11(6):955-966.

(3) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999).

(4) Global HIV M&E Information. (2008). Glossary: Plausibility evaluation.

(5) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999).

(6) Kim, J., Ferrari, G., Abramsky, T., Watts, C., Hargreaves, J., Morison, L., Phetla, G., Porter, J., and Pronyk, P. (2009). Assessing the incremental effects of combining economic and health interventions: The IMAGE study in South Africa. Bulletin of the World Health Organization, 87:824-832.

(7) Victora, C.G., Habicht, J.P., and Bryce, J. (2004). Evidence-based public health: Moving beyond randomized trials. American Journal of Public Health, 94(3):400-405.

(8) Ibid.

(9) Black, N. (1996). Why we need observational studies to evaluate the effectiveness of health care. BMJ, 312:1215-1218.

(10) Kramer, M.S., Fombonne, E., Igumnov, S., Vanilovich, I., Matush, L., Mironova, E., Bogdanovich, N., Tremblay, R.E., Chalmers, B., Zhang, X., and Platt, R.W., for the Promotion of Breastfeeding Intervention Trial (PROBIT) Study Group. (2008). Effects of prolonged and exclusive breastfeeding on child behavior and maternal adjustment: Evidence from a large, randomized trial. Pediatrics, 121(3):e435-e440.