Philanthropy Action

Analysis, Interviews, and Reviews


The most critical challenge that philanthropy has faced over the last several years is proving with evidence, rather than inference, that financial gifts have made a significant impact. This has become even more critical during the present economic crisis as donors give more selectively. In one of the New York Times Magazine’s annual “Money” issues, impact measurement is referred to as “philanthropy’s largest problem.”

Most non-profits and foundations have been more talk than action when it comes to measuring impact. There are several reasons why. First, as everyone acknowledges, measuring impact is hard, and many charities and foundations believe it is too expensive. But another big part of the reason that detailed impact evaluations aren’t de rigueur is that many people in the sector fear finding out the truth about their programs’ impact. I recently had a conversation with a leader of a respected philanthropy in my hometown. After I told him of my interest in measuring the impact of philanthropic giving, he looked at me and said, “you’re the bad guy.” This typical response embodies the concern of many that there is a draconian motive behind impact measurement, rather than a desire to make things better.

But perhaps the single biggest challenge to collecting meaningful impact data is that too many philanthropic organizations simply do not know how to go about measuring. Some don’t even try. Nonprofits today may say they are interested in impact, but rely for their ‘evidence’ on self-generated reports from beneficiaries and implementing agencies. This style of reporting produces “feel-good” stories but that’s about it. It really doesn’t tell you about the overall outcome of a program or how it compares to others—which is ultimately the point of measurement.

There are, of course, some organizations that have taken the first steps toward impact measurement through the use of surveys. Surveys are the most popular form of measurement because they seem to be straightforward, easy and relatively cheap compared to the alternatives. Unfortunately, these first steps are often dangerous. Why? Because surveys can easily lead to bad data and false conclusions that can result in future missteps.

Is there a way for organizations to start getting serious about impact evaluation without either breaking the bank or misleading themselves through surveys?

Yes. Here’s how: 

Point 1: Design Carefully

Good surveys are not as easy as you may think. If you read Philanthropy Action with any frequency, you’ll have read about randomization in surveys and impact studies. Randomization basically means that the study sample is randomly divided into treatment and control groups so a clean comparison between them can be made. Randomization is critically important because without it, you introduce the opportunity for all sorts of bad data.
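
To make the idea concrete, here is a minimal sketch of random assignment in Python. The function name, the participant numbering and the 50/50 split are my own assumptions for illustration, not a prescribed procedure.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into treatment and control groups.

    Hypothetical helper: shuffles the IDs and cuts the list in half,
    so chance alone decides who lands in which group.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"treatment": shuffled[:midpoint], "control": shuffled[midpoint:]}


# Example: 100 participants numbered 1 to 100 (made-up IDs).
groups = assign_groups(range(1, 101), seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because the shuffle decides membership, neither the participants nor the staff choose who is in which group, which is what allows the clean comparison described above.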

The most common problem for surveys that are not randomized is self-selection bias. Most organizations that use surveys, in the nonprofit world and elsewhere, ask for volunteers to fill out their surveys. Unfortunately, the people who volunteer to take surveys often have more in common with each other than they do with the whole group you are trying to assess. Voluntary surveys often attract only the happiest and unhappiest participants, which skews the survey data beyond repair. Even some randomized surveys suffer from problems when potential respondents can easily opt out of taking a survey. This is known as non-response bias.
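
A small simulation can show how this plays out. Everything in the sketch below is invented for illustration: the satisfaction scores, their distribution and the assumption that people with extreme views are far more likely to volunteer.

```python
import random


def simulate_survey(seed=0, population_size=10_000, sample_size=500):
    """Compare a random sample with a volunteer sample from one population."""
    rng = random.Random(seed)
    scores = [1, 2, 3, 4, 5]  # 1 = very unhappy, 5 = very happy
    # Hypothetical population: most people sit in the middle.
    population = rng.choices(scores, weights=[5, 15, 40, 30, 10], k=population_size)

    # Random sample: every person has the same chance of selection.
    random_sample = rng.sample(population, sample_size)

    # Volunteer sample: assumed chance of responding, by score.
    propensity = {1: 0.60, 2: 0.10, 3: 0.05, 4: 0.10, 5: 0.60}
    volunteers = [s for s in population if rng.random() < propensity[s]]

    def extreme_share(responses):
        """Fraction of responses that are a 1 or a 5."""
        return sum(s in (1, 5) for s in responses) / len(responses)

    return {
        "population": extreme_share(population),
        "random_sample": extreme_share(random_sample),
        "volunteers": extreme_share(volunteers),
    }


for name, share in simulate_survey(seed=1).items():
    print(f"{name}: {share:.0%} extreme responses")
```

Under these assumptions the random sample tracks the population closely, while the volunteer sample is dominated by the extremes even though nobody answered dishonestly.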

For charities dealing with sensitive problems, reporting bias is a big problem. Put most simply, respondents often don’t want to be completely honest about such issues as religion, sexual practices or health. Recent work by economists Dean Karlan and Jonathan Zinman in South Africa found that 40 percent of respondents to a survey purposely or accidentally provided inaccurate information about their debts (for instance by denying that they had taken a microfinance loan).

Finally, another common problem with surveys is interviewer bias. This occurs when the interviewer asks a question a certain way, provides an opinion to the respondent or disturbs the respondent’s answering process. Surveys can also be skewed by nothing more complicated than the sex or identity of the interviewer. In the South Africa lending study mentioned above, women were less likely to admit they had taken a loan to male surveyors than to female surveyors. Best practices to guard against interviewer bias include ensuring that no more than 10 percent of the sample is collected by any one person. But that can make the cost of a survey climb quickly.
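
The 10 percent guideline is easy to check once you record who collected each response. The sketch below assumes a simple log with one interviewer name per completed survey; the names and counts are made up.

```python
from collections import Counter


def interviewer_shares(interviewer_per_response):
    """Return each interviewer's share of all collected responses."""
    counts = Counter(interviewer_per_response)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def over_limit(interviewer_per_response, limit=0.10):
    """Return the interviewers who collected more than `limit` of the sample."""
    shares = interviewer_shares(interviewer_per_response)
    return {name: share for name, share in shares.items() if share > limit}


# Hypothetical log: 100 responses, one entry per completed survey.
log = ["Asha"] * 30 + [f"Interviewer {i}" for i in range(1, 8) for _ in range(10)]
print(over_limit(log))
```

Here only the interviewer with 30 of the 100 responses is flagged; the seven who collected exactly 10 percent each are within the guideline.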

Point 2: Use the Right Tools

Even well-designed surveys sometimes miss the mark. That’s because they aren’t always the best tool for answering the question at hand. For instance, over the last 25 years charities of every stripe have spent billions educating people around the world about how to prevent HIV. A survey might give us good data on HIV infection rates over time, but it can’t reliably tell us which prevention education programs were most effective. A better approach is to measure how participants’ knowledge and beliefs change as a result of an education program.

But knowledge and beliefs are what we call latent or hidden traits. I’ve met plenty of people who believe that latent traits can’t truly be measured. But there are good tools for measuring latent traits—specifically Item Response Theory (IRT). IRT is an accepted and rigorously used methodology within the fields of psychology, pain management, business and education. The most famous use of IRT is in the Graduate Record Exam (GRE), required by the majority of graduate schools for entrance, which measures the latent traits of knowledge and aptitude.

Using IRT takes more time and money, but it produces meaningful data when measuring latent traits—which surveys simply cannot do. 
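
For readers who want a feel for what IRT does, here is a rough sketch of one widely used IRT model, the two-parameter logistic (2PL) model. Choosing this particular model is my assumption for illustration, and the item parameters are invented. Real applications estimate them from data with specialized software.

```python
import math


def prob_correct(theta, discrimination, difficulty):
    """2PL model: chance that a person with ability `theta` answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a response pattern for a given ability level."""
    total = 0.0
    for (discrimination, difficulty), correct in zip(items, responses):
        p = prob_correct(theta, discrimination, difficulty)
        total += math.log(p if correct else 1.0 - p)
    return total


def estimate_ability(items, responses):
    """Pick the ability on a grid from -4 to 4 that best explains the responses."""
    grid = [step / 100 for step in range(-400, 401)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, responses))


# Hypothetical items as (discrimination, difficulty) pairs: easy, medium, hard.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
print(estimate_ability(items, [1, 1, 0]))  # got the easy and medium items right
print(estimate_ability(items, [1, 0, 0]))  # got only the easy item right
```

The point is that the hidden trait (here, knowledge) is inferred from the pattern of answers and the properties of each question, rather than from a raw count of right answers.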

Point 3: If You Don’t Have the Right Tools, Don’t Mix in Faulty Ones

As a student in Public Health at Johns Hopkins, I was asked by a friend in Northern India to design a survey to measure the prevalence of tuberculosis (TB) in a neighboring valley. A local hospital was trying to assess the presence of TB there in order to tailor a new program. We ran two well-designed and executed surveys—and found no TB to speak of in the valley. The TB treatment programs for the valley were canceled. Yet shortly thereafter people with advanced cases of TB began showing up at the hospital. 

What went wrong? The tools we had for detecting TB simply weren’t good enough. The hospital had used a cheaper method of TB detection that yielded high rates of “false negatives”: results telling us that someone with TB didn’t have the disease. The hospital had tried to save money on the survey effort but had in effect wasted all the money they’d spent. The lesson: it’s better not to run a survey at all than to run a faulty one.
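
A bit of arithmetic shows how a test with many false negatives can make a disease seem absent. The numbers below are purely hypothetical and are not the figures from our survey. The sketch also assumes the test never produces false positives and that people are sampled independently.

```python
def expected_detected(sample_size, prevalence, sensitivity):
    """Average number of true cases the test would flag in the sample."""
    return sample_size * prevalence * sensitivity


def prob_no_cases_detected(sample_size, prevalence, sensitivity):
    """Chance that the survey flags nobody at all.

    Each sampled person is flagged with probability prevalence * sensitivity.
    """
    return (1.0 - prevalence * sensitivity) ** sample_size


# Hypothetical survey: 300 people tested, 1 percent of whom truly have the disease.
for sensitivity in (0.9, 0.3):
    found = expected_detected(300, 0.01, sensitivity)
    missed_all = prob_no_cases_detected(300, 0.01, sensitivity)
    print(f"sensitivity {sensitivity:.0%}: expect {found:.1f} cases found, "
          f"{missed_all:.0%} chance of finding none")
```

With the weaker test, a survey that finds nothing is not strong evidence that the disease is absent.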

What all this adds up to is a cautionary tale about wading into impact measurement without proper planning. Surveys, poorly designed or poorly executed, can do more harm than good.

That being said, correcting these problems doesn’t have to be a huge expense. Engaging experts to help design a survey upfront does cost more than doing a survey on your own, but it will likely mean the difference between a wasted and a valuable effort. You can engage in some experiments to gain experience in survey design and implementation that will help you improve your surveys—and prove to stakeholders that well-designed surveys are an absolute necessity.

For instance, you can run some test surveys using different methodologies to see how your surveys could be skewed. Here are a few ways to get started:

1. Come up with a question you want answered and a population that you know and that is easily accessible to you. This could be beneficiaries from your projects or could be donors who partner with you. If you work with beneficiaries, you could ask questions about program participation, other organizations they receive services from, how long they’ve been involved, what other needs they have and their satisfaction levels with the services they receive.

2. For a well-run survey, you’ll need to determine how large a sample you need to get accurate results. You can use one of the many calculators on the web, such as the one at Survey System.

3. Next, if your population is over several hundred, you can randomly select which members of your population will be sampled using one of the many random number generators on the web. One way to do this is to number every person in your population and then survey the people whose numbers come up in the random number generator.

4. Now you can run a parallel survey to compare against the results of the well-designed survey. For instance, you can run the survey with a smaller sample of volunteers. You can change the wording of a few questions. You can survey only a group that shares a characteristic, such as women only, people in their 20s only, or people who are wearing jeans when you ask them to participate.

5. You can then compare the results from the various surveys and see how the average response differs—and have powerful first-hand evidence of how good survey design matters.
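
If you would rather not rely on web tools for steps 2 and 3, the sketch below does both in Python. It uses a common textbook formula for sample size (Cochran’s formula with a finite population correction). Whether any particular online calculator uses exactly this formula is an assumption on my part, as are the default settings of a 95 percent confidence level and a 5 percent margin of error.

```python
import math
import random


def required_sample_size(population_size, margin_of_error=0.05, z_score=1.96,
                         expected_proportion=0.5):
    """Sample size from Cochran's formula with a finite population correction.

    z_score=1.96 corresponds to a 95 percent confidence level, and
    expected_proportion=0.5 is the most conservative choice.
    """
    n0 = (z_score ** 2) * expected_proportion * (1 - expected_proportion)
    n0 /= margin_of_error ** 2
    adjusted = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(adjusted)


def choose_participants(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct people, numbered 1..population_size, at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Example: a hypothetical list of 1,000 beneficiaries, numbered 1 to 1,000.
n = required_sample_size(1000)
selected = choose_participants(1000, n, seed=2010)
print(n, selected[:10])
```

The people whose numbers appear in `selected` form the well-designed sample. The volunteer or single-characteristic groups from step 4 can then be surveyed separately and their averages compared against it.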

I’m of course very interested in the results of your tests, so please let me know how it goes!

David Roberts is Executive Director of New Dominion Philanthropy Metrics. He has more than 10 years of experience working in quantitative research and public health. He can be reached at droberts (at)


Bill Huddleston

I understand funders’ desire to have guaranteed results before committing money to a particular organization, and indeed this is not restricted to the non-profit world—some of the more obvious examples are owners of sports teams who think that by having a large payroll, they can ensure a world championship in their particular sport.  It doesn’t work that way (except for the Yankees).

In the non-profit world, I work with a number of youth-oriented groups, and this is where I have never heard any good description of how to measure results beyond the current activities. Once enough time has elapsed, I can tell you how successful some of the participants have become, but there is no way to predict this at the time they are in the youth program itself.

Here’s a list of successful men who were all in the same youth organization and distinguished themselves (men only, because they were in a boys’ youth organization): Steven Spielberg, Bill Bradley, Jimmy Stewart, Hank Aaron, Neil Armstrong, J. Willard Marriott and Gerald Ford, and there are many others.

What do all these men have in common?

They earned the rank of Eagle Scout while in the Boy Scouts of America. Could their future accomplishments, including being the first man to walk on the moon, famous actor, famous director, famous baseball player, famous senator, famous businessman or President of the United States, have been predicted at the time they achieved their Eagle? No way.

Bill Huddleston

February 17, 2010

Diana Rutherford

The Boy Scouts of America is a good example of why external funders are often needed for impact studies. As Mr. Huddleston states, the long-term impact of being an Eagle Scout isn’t known at the time the boy is an Eagle Scout. Perhaps the Boy Scouts wouldn’t fund the long-term tracking study, but an external funder might ... or perhaps the Boy Scouts would like to be able to say that boys who made Eagle Scout versus those who dropped out of Boy Scouts prior to reaching that level of achievement are x% more likely to be high achievers in their lifetimes (or 10 years later, 20 years later, etc). That kind of panel data with comparison group requires significant resources.

That is the major decision each organization must make. In some cases, funders, contributors, and the public may insist upon it. We can KNOW what works, if we are willing to pay the price to find out.

February 22, 2010

Brett Keller

Mr. Huddleston,
I’m a little late to this discussion, but I wanted to point out that your Boy Scouts example is not very applicable to program evaluation. There would have been plenty of ways to measure the impact of a program like the Eagle Scouts in the short term, rather than waiting for a few outcomes as distinguished as those listed. But the more important thing is that good evaluation isn’t about predicting future results, it’s about doing your best to compare the effect of a program or intervention vs. the counterfactual of what would have happened had that program not existed.

The most likely reality is that the Eagle Scouts self-selected into the program and that young men who likely would have already been exceptional chose to participate in this program. Had the Eagle Scouts never existed, I bet most of those men would still have been very successful… but without good data we’ll never know.

Brett Keller

November 29, 2010
