Evaluation is a complicated concept, and for it to generate valid conclusions it requires a carefully thought-out approach. The complications arise from the numerous factors that influence both the outcomes and the manner in which conclusions are interpreted. These factors usually stand in a complex interrelationship, which is further compounded by the context within which the evaluation is conducted.
Nevertheless, there are numerous evaluation designs and methods through which the validity of an evaluation's outcomes can be determined, depending on the prevailing needs and conditions. These designs are complex and detailed, and their similarities and differences cannot be treated exhaustively. For the purposes of this essay, comparisons and contrasts are therefore made with reference to outcomes.
One-group evaluation designs are among the simplest methods of arriving at conclusions. They are intended to demonstrate whether an evaluator's informational needs have been met at a particular point in time. As such, findings from one-group designs apply only to that particular time and are not indicative of conclusions at a different time. This does not mean that one-group evaluation designs are not useful.
Within the one-group evaluation designs there are the pretest/posttest and the posttest-only designs. Compliance with a law is best evaluated through the posttest-only design, since it is illogical to pretest compliance before an intervening program is instituted. One-group designs are limited in effectiveness, but such limitations are effectively addressed by the more complex methods generally referred to as quasi-experimental designs, which include time-series designs, the Selective Control Design, and nonequivalent control group designs, among others. A time-series design "increases the interpretability of an evaluation by extending the periods of observations over time", although its findings remain limited to that extended period (Posavac, 2011).
Similarly, nonequivalent control group designs extend the interpretability of findings by incorporating more than one study group. Like the one-group designs, quasi-experimental designs are also effective in evaluating law enforcement programs. For instance, a time-series design was used to test compliance with the pre-marriage HIV testing law in Illinois: an evaluation of compliance with the program was conducted over the 116 months following its introduction (Posavac, 2011).
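The logic of such a before/after time-series comparison can be sketched with a toy calculation. The monthly figures below are hypothetical stand-ins, not the actual Illinois data; they merely show how an evaluator would compare the observation period before the law with the period after it.

```python
# Illustrative sketch of an interrupted time-series comparison.
# The counts are hypothetical, NOT the actual Illinois data.

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical monthly marriage-certificate counts (in thousands)
pre_law  = [8.1, 7.9, 8.2, 8.0, 8.1, 7.8]   # months before the law
post_law = [6.9, 7.0, 6.8, 7.1, 6.9, 7.0]   # months after the law

# Relative change in the average monthly count after the intervention
change = (mean(post_law) - mean(pre_law)) / mean(pre_law)
print(f"Relative change after intervention: {change:.1%}")
```

On its own, such a drop says nothing about causation; as discussed below, the same comparison must be repeated in control states before the change can be attributed to the law.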
Like the one-group designs, the quasi-experimental designs could only posttest compliance with the pre-marriage HIV testing law, since pre-program conditions offered nothing to measure. Additionally, the findings are limited to the 116-month observation period, to partners intending to marry, and to the effectiveness of the program in Illinois. Therefore, the outcomes of all these evaluation methods are tentative indicators rather than absolutely conclusive findings.
Evaluation designs are intended to generate valid conclusions. Yet, as occurs from time to time, there are external influences that affect an evaluator's degree of certainty regarding the validity of outcomes. These are commonly referred to as threats to internal validity; they are detailed, and the ways in which they manifest across all evaluation designs cannot be exhaustively discussed within this essay. Nevertheless, they can be enumerated as maturation, historical occurrences, participant selection criteria, attrition, testing criteria, and measurement methods.
While threats to internal validity significantly affect the certainty of validity in all evaluation designs, they do so in varied fashion. For instance, regression toward the mean is found to impinge on the validity of outcomes to varying extents depending on the evaluation design.
Using the time-series design, it was found that the number of marriage certificates issued in Illinois dropped by 14% after the introduction of the pre-marriage HIV testing law, but stayed constant in other states with a similar law. The drop is likely to create a false impression of the law's effectiveness; in reality, couples from Illinois obtained marriage certificates from adjacent states that had no such laws.
Similarly, with reference to the one-group evaluation design, regression was found to influence the validity of perceived outcomes. For instance, after the law was changed to divert federal funds from foster families to biological families experiencing financial difficulties, there was a significant drop in foster parenthood. While the drop was achieved over a long period of time, measuring the changes at particular times reveals fluctuations of ±25% (Posavac, 2011).
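The regression effect itself is easy to demonstrate with simulated data: units selected because they scored extreme on one noisy measurement tend, on average, to score closer to the mean on a second measurement even when no intervention takes place at all. The numbers below are purely illustrative and assume a simple measurement-error model.

```python
import random

# Sketch of regression toward the mean with simulated data:
# units selected for extreme scores at time 1 score closer to
# the population average at time 2, with no intervention at all.

random.seed(42)
mu = 100.0
true_level = [random.gauss(mu, 10) for _ in range(10_000)]
time1 = [t + random.gauss(0, 10) for t in true_level]  # first noisy measure
time2 = [t + random.gauss(0, 10) for t in true_level]  # second noisy measure

# Select the units that looked worst at time 1 (e.g. lowest compliance)
worst = sorted(range(len(time1)), key=lambda i: time1[i])[:500]

mean_t1 = sum(time1[i] for i in worst) / len(worst)
mean_t2 = sum(time2[i] for i in worst) / len(worst)
print(f"Selected group at time 1: {mean_t1:.1f}, at time 2: {mean_t2:.1f}")
```

The selected group improves between the two measurements purely because its extreme first scores were partly noise, which is exactly why a one-group measurement taken at a single point can overstate a program's effect.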
Thus, if measurement were done at a time when regression toward the mean was at its highest, the program would likely be termed effective. To negate the influence of self-selection on the program's apparent effectiveness, it is vital to include a selectively controlled group: in the Illinois case, the issuance of certificates in adjacent states that had no such law, which captures couples from Illinois intending to marry without taking the test. Thus, the Selective Control Design seems relevant here, as it allows for the inclusion of couples not affected by the program (Shadish, Cook, and Campbell, 2002). Nevertheless, outcomes from all designs are generally affected by threats to internal validity.
The pre-marriage HIV testing law described above focuses only on couples planning to get married. Similarly, the law changing foster care funding focuses only on children from abusive families. Evaluation of the effectiveness of such programs, as previously explained, can be done using both one-group and time-series designs. While each evaluation design is likely to generate different outcomes, it is evident that each is only effective in generating valid results if the participants share similar needs. For instance, it would be illogical for evaluators to incorporate partners without marriage plans into the non-program control group, as the needs of that group of participants fail to match the evaluation criteria.
The analysis above indicates that all the evaluation methods discussed focus on outcomes at the end of the program. This implies that they are summative in nature. But to what extent is this similarity evident? Trochim (2006) asserts that summative evaluation has various considerations. As already indicated, law enforcement agencies can only evaluate the effectiveness of a law using the posttest-only design, such that the effectiveness of the pre-marriage HIV testing law can only be determined at the end of the program.
Similarly, the actual effectiveness of the same law under any of the quasi-experimental designs can only be determined if the behavior of participants is observed for an extended period after the end of the program. This enables evaluators to determine whether any of the threats to validity influenced outcomes during the program. Likewise, while the experimental evaluation design is claimed to yield more valid outcomes than other designs, it too only evaluates the influence of threats to validity at the end of the program. Thus, with reference to evaluating the outcomes and impacts of a program at its end, all the aforementioned evaluation designs show similarities.
As indicated by Trochim (2006), summative evaluation has various considerations, which include meta-analysis of outcomes; meta-analysis involves integrating the estimates of multiple studies to arrive at an aggregated summary judgment. The effectiveness of the pre-marriage HIV testing law was determined by evaluating the outcomes after introduction of the law in Illinois as well as the outcomes in other states where the law was operational. Additionally, comparisons were made with the trends in issuance of marriage certificates in adjacent states without such a law. The outcomes were evaluated over a long period of time and involved different sets of participants.
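The aggregation step described above can be sketched as a simple fixed-effect meta-analysis, in which each study's effect estimate is weighted by the inverse of its variance. The effect sizes and standard errors below are hypothetical placeholders, not values reported by Posavac or Trochim; the sketch only illustrates how estimates from several comparison sites could be combined into one summary judgment.

```python
# Minimal sketch of fixed-effect meta-analytic aggregation:
# combine effect estimates from several studies using
# inverse-variance weights. All values are hypothetical.

studies = [
    # (effect estimate, standard error) - hypothetical values
    (-0.14, 0.05),   # e.g. the Illinois time series
    (-0.02, 0.04),   # e.g. a comparison state with a similar law
    (-0.06, 0.06),   # e.g. a nonequivalent control group study
]

weights = [1 / se**2 for _, se in studies]        # precision of each study
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5             # SE of the pooled estimate
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```

The pooled estimate is pulled toward the more precise studies, which is why a single-site, single-time one-group evaluation cannot support this kind of aggregated judgment.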
However, if the outcome of the pre-marriage HIV testing law were to be determined at a particular time in Illinois alone, such comparisons would not be possible. This indicates that one-group designs are dissimilar to quasi-experimental designs as far as meta-analysis of outcomes is concerned. There are also dissimilarities within the quasi-experimental designs themselves: while time-series designs analyze the outcomes of a program at different times, nonequivalent control group designs aggregate outcomes involving different groups (Shadish, Cook, and Campbell, 2002). This is demonstrated by the manner in which the outcomes of the pre-marriage HIV testing law were validated.
While the outcomes of all the evaluation methods are tentative indicators rather than absolutely conclusive findings, the level of uncertainty varies depending on the evaluation design in question. The level of uncertainty regarding the validity of outcomes would be significantly high if a one-group design were used to evaluate the effectiveness of a complex program. For instance, the effectiveness of the pre-marriage HIV testing program can only be derived through quasi-experimental designs: evaluating its effectiveness over an extended period of time, in this case 116 months, and aggregating the findings.
A time-series design is likely to generate a lower level of uncertainty than any of the one-group designs, which in this case would only evaluate the effectiveness of the pre-marriage HIV testing law at a particular point in time (Posavac, 2011). But are there contrasts within the quasi-experimental designs as far as certainty of outcomes is concerned? Yes, depending on the type of outcome desired.
The law diverting federal funds from foster families to biological families is likely to demonstrate such subtleties. With a time-series design, the level of certainty is likely to be lower when evaluating the validity of outcomes over an extended period of time than when evaluating outcomes involving more than one set of participants. Conversely, with a nonequivalent control group design, the level of certainty is likely to be lower when evaluating outcomes involving more than one set of participants than when evaluating the validity of outcomes over an extended period of time (Shadish, Cook, and Campbell, 2002).
As indicated earlier, threats to internal validity influence the interpretation of outcomes in an almost similar fashion regardless of the evaluation design in question. These threats are enumerated as maturation, historical occurrences, participant selection criteria, attrition, testing criteria, and measurement methods (Posavac, 2011). Regression, nevertheless, is complicated and influences outcomes differently across the evaluation designs. For instance, in the foster care funding program, regression is a valid influence only when evaluating the effectiveness of the program on children from those abusive families that have not responded to counseling or any other correctional therapy.
Thus, in this case regression seems influential in any design that factors in only participants in dire need of help; time-series, experimental, and pretest/posttest designs are therefore likely to be influenced by regression (Shadish, Cook, and Campbell, 2002). However, if the program is to factor in another set of participants, such as children from families likely to be positively affected by counseling, then regression cannot serve as a credible interpretation of outcomes, since the number of children under foster care will decline in any case. Such a reduction would result from improved conditions rather than from the effects of the program.
As evidenced in this essay, evaluation is a complex concept, and drawing comparisons and contrasts between these designs is itself as complicated as the designs are. Nevertheless, an attempt has been made to highlight them, and it reveals an intricate interrelationship between the designs. Drawing conclusions thus ought to be undertaken from a particular approach; in this essay, comparisons and contrasts are made with reference to the law enforcement case studies enumerated by Posavac (2011). Regardless of the complexities herein, clear distinctions have been made on the extent of similarities and differences between the evaluation designs and methodologies.
Reference List
Posavac, E. (2011). Program evaluation: methods and case studies. London: Prentice Hall.
Trochim, W. (2006). Introduction to evaluation. Web.
Shadish, W., Cook, T., and Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.