Content warning: discussions of BMI/weight/obesity, genocide, and residential schools for indigenous children.
Imagine you are working for a nonprofit focused on children’s health and wellness in school. One of the grants you received this year funds a full-time position at a local elementary school for a teacher who will be integrating kinesthetic learning into their lesson plans for math classes for third graders. Kinesthetic learning is learning that occurs when the students do something physical to help learn and reinforce information, instead of listening to a lecture or other verbal teaching activity. You have read research suggesting that students retain information better using kinesthetic teaching methods and that it can reduce student behavior issues. You want to know if it might benefit your community.
When you applied for the grant, you had to come up with some outcome measures that would tell the foundation if your program was worth continuing to fund – if it’s having an effect on your target population (the kids at the school). You told the foundation you would look at three outcomes:
But, you say, this sounds like research! However, we have to take a look at the purpose, origin, effect, and execution of the project to understand the difference, which we do in section 23.1 in this chapter. Those domains are where we can find the similarities and differences between program evaluation and research.
Realistically, as a practitioner, you’re far more likely to engage in program evaluation than you are in research. So, you might ask why you are learning research methods and not program evaluation methods, and the answer is that you will use research methods in evaluating programs. Program evaluation tends to focus less on generalizability, experimental design, and replicability, and instead focuses on the practical application of research methods to a specific context in practice.
Learners will be able to…
Program evaluation can be defined as the systematic process by which we determine if social programs are meeting their goals, how well the program runs, whether the program had the desired effect, and whether the program has merit according to stakeholders (including in terms of the monetary costs and benefits). It’s important to know what we mean when we say “evaluation.” Pruett (2000) [1] provides a useful definition: “Evaluation is the systematic application of scientific methods to assess the design, implementation, improvement or outcomes of a program” (para. 1). That nod to scientific methods is what ties program evaluation back to research, as we discussed above. Program evaluation is action-oriented, which makes it fit well into social work research (as we discussed in Chapter 1).
A quick side note: elsewhere in the text, we’ve talked about stakeholders and gatekeepers as distinct groups of people. While this distinction is accurate and important, in this chapter, I’m only going to use the word “stakeholders” because that’s what you’ll hear in the world of program evaluation and logic models.
Often, program evaluation will consist of mixed methods because its focus is so heavily on the effect of the program in your specific context. Not that research doesn’t care about the effects of programs – of course it does! But with program evaluation, we seek to ensure the way that we are applying our program works in our agency, with our communities and clients. Thinking back to the example at the beginning of the chapter, consider the following: Does kinesthetic learning make sense for your school? What if your classroom spaces are too small? Are the activities appropriate for children with differing physical abilities who attend your school? What if school administrators are on board, but some parents are skeptical?
The project we talked about in the introduction – a real project, by the way – was funded by a grant from a foundation. The reality of the grant funding environment is that funders want to see that their money is not only being used wisely, but is having a material effect on the target population. This is a good thing, because we want to know our programs have a positive effect on clients and communities. We don’t want to just keep running a program because it’s what we’ve always done. (Consider the ethical implications of continuing to run an ineffective program.) It also forces us as practitioners to plan grant-funded programs with an eye toward evaluation. It’s much easier to evaluate your program when you can gather data at the beginning of the program than when you have to work backwards at the middle or end of the program.
As we talked about above, program evaluation and research are similar, particularly in that they both rely on scientific methods. Both use quantitative and qualitative methods, like data analysis and interviews. Effective program evaluation necessarily involves the research methods we’ve talked about in this book. Without understanding research methods, your program evaluation won’t be very rigorous and probably won’t give you much useful information.
However, there are some key differences between the two that render them distinct activities that are appropriate in different circumstances. Research is often exploratory and not evaluative at all, and instead looks for relationships between variables to build knowledge on a subject. It’s important to note at the outset that what we’re discussing below is not universally true of all projects. Instead, the framework we’re providing is a broad way to think about the differences between program evaluation and research. Scholars and practitioners disagree on whether program evaluation is a subset of research or something else entirely (and everything in between). The important thing to know about that debate is that it’s not settled, and what we’re presenting below is just one way to think about the relationship between the two.
According to Mathison (2008) [2] , the differences between program evaluation and research have to do with the domains of purpose, origins, effect and execution.
Domain | Program Evaluation | Research |
Purpose | Judges merit or worth of the program | Produces generalizable knowledge and evidence |
Origins | Stems from policy and program priorities of stakeholders | Stems from scientific inquiry based on intellectual curiosity |
Effect | Provides information for decision-making on specific program | Advances broad knowledge and theory |
Execution | Conducted within a setting of changing actors, priorities, resources and timelines | Usually happens in a controlled setting |
Let’s think back to our example from the start of the chapter – kinesthetic teaching methods for 3rd grade math – to talk more about these four domains.
To understand this domain, we have to ask a few questions: why do we want to research or evaluate this program? What do we hope to gain? This is the why of our project (Mathison). Another way to think about it is as the aim of your research, which is a concept you hopefully remember from Chapter 2.
Through the lens of program evaluation, we’re evaluating this program because we want to know its effects, but also because our funder probably only wants to give money to programs that do what they’re supposed to do. We want to gather information to determine if it’s worth it for our funder – or for us – to invest resources in the program.
If this were a research project instead, our purpose would be congruent, but different. We would be seeking to add to the body of knowledge and evidence about kinesthetic learning, most likely hoping to provide information that can be generalized beyond 3rd grade math students. We’re trying to inform further development of the body of knowledge around kinesthetic learning and children. We’d also like to know if and how we can apply this program in contexts other than one specific school’s 3rd grade math classes. These are not the only research considerations, but just a few examples.
Purpose and origins can feel very similar and be a little hard to distinguish. The main difference is that origins are about the who, whereas purpose is about the why (Mathison). So, to understand this domain, we have to ask about the source of our project – who wanted to get the project started? What do they hope this project will contribute?
For a program evaluation, the project usually arises from the priorities of funders, agencies, practitioners and (hopefully) consumers of our services. They are the ones who define the purpose we discussed above and the questions we will ask.
In research, the project arises from a researcher’s intellectual curiosity and desire to add to a body of knowledge around something they think is important and interesting. Researchers define the purpose and the questions asked in the project.
The effect of program evaluation and research is essentially what we’re going to use our results for. For program evaluation, we will use them to make a decision about whether a program is worth continuing, what changes we might make to the program in the future or how we might change the resources we devote going forward. The results are often also used by our funders to make decisions about whether they want to keep funding our program or not. (Outcome evaluations aren’t the only thing that funders will look at – they also sometimes want to know whether our processes in the program were faithful to what we described when we requested funding. We’ll discuss outcome and process evaluations in section 23.4.)
The effect of research – again, what we’re going to use our results for – is typically to add to the knowledge and evidence base surrounding our topic. Research can certainly be used for decision-making about programs, especially to decide which program to implement in the first place. But that’s not what results are primarily used for, especially by other researchers.
Execution is fundamentally the how of our project. What are the circumstances under which we’re running the project?
The program evaluation projects most of us will work on are usually based in a nonprofit or government agency. Context is extremely important in program evaluation (and program implementation). As most of us will know, these are environments with lots of moving parts. As a result, running controlled experiments is usually not possible, and we sometimes have to be more flexible with our evaluations to work with the resources we actually have and the unique challenges and needs of our agencies. This doesn’t mean that program evaluations can’t be rigorous or use strong research methods. We just have to be realistic about our environments and plan for that when we’re planning our evaluation.
Research is typically a lot more controlled. We do everything we can to minimize outside influences on our variables of interest, which is expected of rigorous research. Of course, some research is extremely controlled, especially experimental research and randomized controlled trials. This all ties back to the purpose, origins, and effects of research versus those of program evaluation – we’re primarily building knowledge and evidence.
In the end, it’s important to remember that these are guidelines, and you will no doubt encounter program evaluation projects that cross the lines of research, and vice versa. Understanding how the two differ will help you decide how to move forward when you encounter the need to assess the effect of a program in practice.
Learners will be able to…
Planning a program evaluation project requires just as much care and thought as planning a research project. But as we discussed in section 23.1, there are some significant differences between program evaluation and research that mean your planning process is also going to look a little different. You have to involve the program stakeholders at a greater level than that found with most types of research, which will sometimes focus your program evaluation project on areas you wouldn’t have necessarily chosen (for better or worse). Your program evaluation questions are far less likely to be exploratory; they are typically evaluative and sometimes explanatory.
For instance, I worked on a project designed to increase physical activity for elementary school students at recess. The school had noticed a lot of kids would just sit around at recess instead of playing. As an intervention, the organization I was working with hired recess coaches to engage the kids with new games and activities to get them moving. Our plan to measure the effect of recess coaching was to give the kids pedometers at a couple of different points during the year, and see if there was any change in their activity level as measured by the number of steps they took during recess. However, the school was also concerned with the rate of obesity among students, and asked us to also measure the height and weight of the students to calculate BMI at the beginning and end of the year. I balked at this because kids are still growing, BMI isn’t a great measure to use for kids, and some kids were uncomfortable with us weighing them (even with parental consent and with no other kids in the room). However, the school was insistent that we take those measurements, and so we did that for all kids whose parents consented and who themselves assented to have their weight measured. We didn’t think BMI was an important measure, but the school did, so this changed an element of our evaluation.
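To make the pedometer plan concrete, here is a minimal sketch of how step counts from two measurement points might be compared. The student IDs, the step counts, and the function name `paired_changes` are all made up for illustration; they are not data or tools from the actual project.

```python
from statistics import mean

# Hypothetical recess step counts keyed by an anonymous student ID.
# None marks a student who was not measured at that point in the year.
fall_steps = {"s01": 850, "s02": 1200, "s03": 640, "s04": 990}
spring_steps = {"s01": 1100, "s02": 1150, "s03": 900, "s04": None}


def paired_changes(before, after):
    """Return the change in steps for students measured at both points."""
    changes = {}
    for student_id, start in before.items():
        end = after.get(student_id)
        if start is None or end is None:
            continue  # skip students without both measurements
        changes[student_id] = end - start
    return changes


changes = paired_changes(fall_steps, spring_steps)
average_change = mean(changes.values())
share_improved = sum(1 for c in changes.values() if c > 0) / len(changes)

print(f"Students with both measurements: {len(changes)}")
print(f"Average change in steps: {average_change:.1f}")
print(f"Share who took more steps: {share_improved:.0%}")
```

Only students measured at both points are compared, so the result describes change within the same kids rather than a difference between two different groups.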
In an ideal world, your program evaluation is going to be part of your overall program plan. This very often doesn’t happen in practice, but for the purposes of this section, we’re going to assume you’re starting from scratch with a program and really internalized the first sentence of this paragraph. (It’s important to note that no one intentionally leaves evaluation out of their program planning; instead, it’s just not something many people running programs think about. They’re too busy… well, running programs. That’s why this chapter is so important!)
In this section, we’re going to learn about how to plan your program evaluation, including the importance of logic models. You may have heard people groan about logic models (or you may have groaned when you read those words), and the truth is, they’re a lot of work and a little complicated. Teaching you how to make one from start to finish is a little bit outside the scope of this section, but what I am going to try to do is teach you how to interpret them and build some evaluation questions from them. (Pro-tip: logic models are a heck of a lot easier to make in Excel than Word.)
The Centers for Disease Control has a great, simple framework for planning your program evaluation project that I’m going to walk through with you below.
It has three primary steps: engaging stakeholders, describing the program and focusing the evaluation.
Stakeholders are the people and organizations that have some interest in or will be impacted by our program. Including as many stakeholders as possible when you plan your evaluation will help to make it as useful as possible for as many people as possible. The key to this step is to listen. However, a note of caution: sometimes stakeholders have competing priorities, and as the program evaluator, you’re going to have to help navigate that. For example, in our kinesthetic learning program, the teachers at your school might be interested in decreasing classroom disruptions or enhancing subject matter learning, while the administration is solely focused on test scores. Here is where it’s a great idea to use your social work ethics and research knowledge to guide conversations and planning. Improved test scores are great, but how much does that actually benefit the students?
Once you’ve got stakeholder input on evaluation priorities, it’s time to describe what’s going into the program and what you hope your participants and stakeholders will get out of it. Here is where a logic model becomes an essential piece of program evaluation. A logic model “is a graphic depiction (road map) that presents the shared relationships among the resources, activities, outputs, outcomes, and impact for your program” (Centers for Disease Control, 2018, para. 1). Basically, it’s a way to show how what you’re doing is going to lead to an intended outcome and/or impact. (We’ll discuss the difference between outcomes and impacts in section 23.4.)
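If it helps to see the pieces laid out, the five components in the CDC definition can be sketched as a simple data structure. This is only an illustration, not a CDC tool: the class name `LogicModel`, the method `missing_components`, and the entries (loosely based on the recess coaching project described earlier) are assumptions I made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    """The five components named in the CDC definition, in the order given."""
    program: str
    resources: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    outcomes: list = field(default_factory=list)
    impact: list = field(default_factory=list)

    def missing_components(self):
        """Names of components that have no entries yet."""
        names = ["resources", "activities", "outputs", "outcomes", "impact"]
        return [name for name in names if not getattr(self, name)]


# Hypothetical entries for illustration only.
recess_model = LogicModel(
    program="Recess coaching",
    resources=["recess coaches", "pedometers"],
    activities=["coaches lead new games and activities at recess"],
    outputs=["recess periods with a coach present"],
    outcomes=["more steps taken during recess"],
)

print(recess_model.missing_components())
```

Leaving `impact` empty here is deliberate: a check like `missing_components` is one way to notice that part of the road map hasn't been thought through yet.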
Logic models have several key components, which I describe in the list below (CDC, 2018). The components are numbered because of where they come in the “logic” of your program – basically, where they come in time order.
The CDC also talks about moderators – what they call “contextual factors” – that affect the execution of your program evaluation. This is an important component of the execution of your project, which we talked about in 23.1. Context will also become important when we talk about implementation science in section 23.3.
Let’s think about our kinesthetic learning project. While you obviously don’t have full information about what the project looks like, you’ve got a good enough idea for a little exercise below.
So now you know what your stakeholder priorities are and you have described your program. It’s time to figure out what questions you want to ask that will reflect stakeholder priorities and are actually possible given your program inputs, activities and outputs.
Why do inputs, activities and outputs matter for your question?
Learners will be able to…
Something we often don’t have time for in practice is evaluating how things are going internally with our programs. How’s it going with all the documentation our agency asks us to complete? Is the space we’re using for our group sessions facilitating client engagement? Is the way we communicate with volunteers effective? All of these things can be evaluated using a process evaluation, which is an analysis of how well your program ended up running, and sometimes how well it’s going in real time. If you have the resources and ability to complete one of these analyses, I highly recommend it – even if it stretches your staff, it will often result in a greater degree of efficiency in the long run. (Evaluation should, at least in part, be about the long game.)
From a research perspective, process evaluations can also help you find irregularities in how you collect data that might be affecting your outcome or impact evaluations. Like other evaluations, ideally, you’re going to plan your process evaluation before you start the project. Take an iterative approach, though, because sometimes you’re going to run into problems you need to analyze in real time.
The RAND corporation is an excellent resource for guidance on program evaluation, and they describe process evaluations this way: “Process evaluations typically track attendance of participants, program adherence, and how well you followed your work plan. They may also involve asking about satisfaction of program participants or about staff’s perception of how well the program was delivered. A process evaluation should be planned before the program begins and should continue while the program is running” (RAND Corporation, 2019, para. 1) [3] .
There are several key data sources for process evaluations (RAND Corporation, 2019) [4] , some of which are listed below.
Using these data sources, you can learn lessons about your program and make any necessary adjustments if you run the program again. It can also give you insights about your staff’s needs (like training, for instance) and enable you to identify gaps in your programs or services.
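Two of the things RAND mentions, attendance and program adherence, come down to simple arithmetic once you have a session log. Here is a minimal sketch, assuming a log that records how many activities were planned and delivered in each session and how many enrolled participants attended. The numbers and the function names are hypothetical.

```python
# Hypothetical session log: activities planned vs. actually delivered,
# plus how many of the enrolled participants attended each session.
ENROLLED = 20
sessions = [
    {"planned": 4, "delivered": 4, "attended": 18},
    {"planned": 4, "delivered": 3, "attended": 15},
    {"planned": 5, "delivered": 5, "attended": 20},
]


def attendance_rate(log, enrolled):
    """Share of possible attendances that actually happened."""
    total_attended = sum(s["attended"] for s in log)
    return total_attended / (enrolled * len(log))


def adherence_rate(log):
    """Share of planned activities that were actually delivered."""
    delivered = sum(s["delivered"] for s in log)
    planned = sum(s["planned"] for s in log)
    return delivered / planned


print(f"Attendance: {attendance_rate(sessions, ENROLLED):.0%}")
print(f"Adherence to work plan: {adherence_rate(sessions):.0%}")
```

Numbers like these won't tell you why a session went off plan, which is where participant satisfaction and staff perceptions come in.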
A further development of process evaluations, implementation science is “the scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice, and, hence, to improve the quality and effectiveness of health services” (Bauer, Damschroder, Hagedorn, Smith & Kilbourne, 2015) [5].
Put more plainly, implementation science studies how we put evidence-based interventions (EBIs) into practice. It’s essentially a form of process evaluation, just at a more macro level. Implementation science is a relatively new field of study that focuses on how to best put interventions into practice, and it’s important because it helps us analyze on a macro level those factors that might affect our ability to implement a program. Implementation science focuses on the context of program implementation, which has significant implications for program evaluation.
A useful framework for implementation science is the EPIS (Exploration, Preparation, Implementation and Sustainment) framework. It’s not the only one out there, but I like it because to me, it sort of mirrors the linear nature of a logic model.
The EPIS framework was developed by Aarons, Hurlburt and Horwitz (first published 2011). (The linked article is behind a paywall, but the abstract is still pretty useful, and if you’re affiliated with a college or university, you can probably get access through your library.) This framework emphasizes the importance of the context in which your program is being implemented: the inner, or organizational, context and the outer context, meaning the political, public policy and social contexts. What’s happening in your organization and in the larger political and social sphere that might affect how your program gets implemented?
There are a few key questions in each phase, according to Aarons, Hurlburt and Horwitz (2011) [6] :
Implementation science is a new and rapidly advancing field, and realistically, it’s beyond what a lot of us are going to be able to evaluate in our agencies at this point. But even taking pieces of it – especially the pieces about the importance of context for our programs and evaluations – is useful. Even if you don’t use it as an evaluative framework, the questions outlined above are good ones to ask when you’re planning your program in the first place.
Learners will be able to…
A lot of us will use “outcome” and “impact” interchangeably, but the truth is, they are different. An outcome is the final condition that occurs at the end of an intervention or program. It is the short-term effect – for our kinesthetic learning example, perhaps an improvement over last year’s end-of-grade math test scores. An impact is the long-term condition that occurs at the end of a defined time period after an intervention. It is the longer-term effect – for our kinesthetic learning example, perhaps better retention of math skills as students advance through school. Because of this distinction, outcome and impact evaluations are going to look a little different.
But first, let’s talk about how these types of evaluations are the same. Outcome and impact evaluations are all about change. As a result, we have to know what circumstance, characteristic or condition we are hoping will change because of our program. We also need to figure out what we think the causal link between our intervention or program and the change is, especially if we are using a new type of intervention that doesn’t yet have a strong evidence base.
For both of these types of evaluations, you have to consider what type of research design you can actually use in your circumstances – are you coming in when a program is already in progress, so you have no baseline data? Or can you collect baseline data to compare to a post-test? For impact evaluations, how are you going to track participants over time?
The main difference between outcome and impact evaluation is the timing and, consequently, the difficulty and level of investment. You can pretty easily collect outcome data from program participants at the end of the program. But tracking people over time, especially for populations social workers serve, can be extremely difficult. It can also be difficult or impossible to control for whatever happened in your participants’ lives between the end of the program and the end of your long-term measurement period.
Impact evaluations require careful planning to determine how your follow-up is going to happen. It’s a good practice to try to keep intermittent contact with participants, even if you aren’t taking a measurement at that time, so that you’re less likely to lose track of them.
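To see why losing track of participants matters, here is a minimal sketch that compares an outcome measurement (end of program) with an impact measurement (a later follow-up) against the same baseline. The scores, the participant IDs, and the function name `mean_change` are hypothetical, and a real evaluation would need a proper analysis plan rather than simple averages.

```python
from statistics import mean

# Hypothetical math scores: baseline, end of program (outcome) and a later
# follow-up (impact). None marks a participant we lost track of.
records = [
    {"id": "p1", "baseline": 60, "post": 72, "follow_up": 70},
    {"id": "p2", "baseline": 55, "post": 65, "follow_up": None},
    {"id": "p3", "baseline": 70, "post": 74, "follow_up": 78},
    {"id": "p4", "baseline": 62, "post": 70, "follow_up": None},
]


def mean_change(rows, later):
    """Average change from baseline among participants measured at `later`."""
    usable = [r for r in rows if r[later] is not None]
    return mean(r[later] - r["baseline"] for r in usable), len(usable)


outcome_change, outcome_n = mean_change(records, "post")
impact_change, impact_n = mean_change(records, "follow_up")
retention = impact_n / len(records)

print(f"Outcome: average change {outcome_change} (n={outcome_n})")
print(f"Impact: average change {impact_change} (n={impact_n})")
print(f"Retained at follow-up: {retention:.0%}")
```

In this made-up data, the impact figure rests on only half of the participants, so it can't be read as the long-term effect for the whole group. That is the practical reason to keep in touch between measurements.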
Learners will be able to…
In a now decades-old paper, Stake and Mabry (1998) [7] point out, “The theory and practice of evaluation are of little value unless we can count on vigorous ethical behavior by evaluators” (p. 99). I know we always say to use the most recent scholarship available, but this point is as relevant now as it was over 20 years ago. One thing they point out that rings particularly true for me as an experienced program evaluator is the idea that we evaluators are also supposed to be “program advocates” (p. 99). We have to work through competing political and ideological differences from our stakeholders, especially funders, that, while sometimes present in research, are especially salient for program evaluation given its origins.
There’s not a rote answer for these ethical questions, just as there are none for the practice-based ethical dilemmas your instructors hammer home with you in classes. You need to use your research and social work ethics to solve these problems. Ultimately, do your best to focus on rigor while meeting stakeholder needs.
One of the most important ethical issues in program evaluation is the implication of not evaluating your program. Providing an ineffective intervention to people can be extremely harmful. And what happens if our intervention actually causes harm? It’s our duty as social workers to explore these issues and not just keep doing what we’ve always done because it’s expedient or guarantees continued funding. I’ve evaluated programs before that turned out to be ineffective, but were required by state law to be delivered to a certain population. It’s not just potentially harmful to clients; it’s also a waste of precious resources that could be devoted to other, more effective programs.
We’ve talked throughout this book about ethical issues and research. All of that is applicable to program evaluation too. Federal law governing IRB practice does not require that program evaluation go through IRB if it is not seeking to gather generalizable knowledge, so IRB approval isn’t a given for these projects. As a result, you’re even more responsible for ensuring that your project is ethical.
Ultimately, social workers should start from a place of humility in the face of cultures or groups of which we are not a part. Cultural considerations in program evaluation look similar to those in research. Something to consider about program evaluation, though: is it your duty to point out potential cultural humility issues as part of your evaluation, even if you’re not asked to? I’d argue that it is.
It is also important we make sure that our definition of success is not oppressive. For example, in Australia, the government undertook a program to remove Aboriginal children from their families and assimilate them into white culture. The program was viewed as successful, but the measures of success were based on oppressive beliefs and stereotypes. This is why stakeholder input is essential – especially if you’re not a member of the group you’re evaluating, stakeholders are going to be the ones to tell you that you may need to reconsider what “success” means.
Unrau, Gabor, and Grinnell (2007) [8] identified several important factors to consider when designing and executing a culturally sensitive program evaluation. First, evaluators need “a clear understanding of the impact of culture on human and social processes generally and on evaluation processes specifically and… skills in cross-cultural communications to ensure that they can effectively interact with people from diverse backgrounds” (p. 419). These are also essential skills in social work practice that you are hopefully learning in your other classes! We should strive to learn as much as possible about the cultures of our clients when they differ from ours.
The authors also point out that evaluators need to be culturally aware and make sure the way they plan and execute their evaluations isn’t centered on their own ethnic experience and that they aren’t basing their plans on stereotypes about other cultures. In addition, when executing our evaluations, we have to be mindful of how our cultural background affects our communication and behavior, because we may need to adjust these to communicate (both verbally and non-verbally) with our participants in a culturally sensitive and appropriate way.
Consider also that the type of information on which you place the most value may not match that of people from other cultures. Unrau, Gabor, and Grinnell (2007) [9] point out that mainstream North American cultures place a lot of value on hard data and rigorous processes like clinical trials. (You might notice that we spend a lot of time on this type of information in this textbook.) According to the authors, though, cultures from other parts of the world value relationships and storytelling as evidence and important information. This kind of information is as important and valid as what we are teaching you to collect and analyze in most of this book.
Being the squeaky wheel about evaluating programs can be uncomfortable. But as you go into practice (or grow in your current practice), I strongly believe it’s your ethical obligation to push for evaluation. It honors the dignity and worth of our clients. My hope is that this chapter has given you the tools to talk about it and, ultimately, execute it in practice.
The systematic process by which we determine if social programs are meeting their goals, how well the program runs, whether the program had the desired effect, and whether the program has merit according to stakeholders (including in terms of the monetary costs and benefits)
Individuals or groups who have an interest in the outcome of the study you conduct
The people or organizations who control access to the population you want to study
The people and organizations that have some interest in or will be affected by our program.
A graphic depiction (road map) that presents the shared relationships among the resources, activities, outputs, outcomes, and impact for your program
An analysis of how well your program ended up running, and sometimes how well it's going in real time.
The scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice, and, hence, to improve the quality and effectiveness of health services.
The final condition that occurs at the end of an intervention or program.
The long-term condition that occurs at the end of a defined time period after an intervention.
Graduate research methods in social work Copyright © 2020 by Matthew DeCarlo, Cory Cummings, Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.