Understanding Context: Evaluation and Measurement in Not-for-Profit Sectors

Dale C. Brandenburg

Many individuals associated with community agencies, health care, public workforce development, and similar not-for-profit organizations view program evaluation as akin to a visit to the dentist’s office: it is painful, but at some point it cannot be avoided. A major reason for this perspective is that evaluation is seen as taking money away from program activities that do good for others, that is, intruding on valuable resources intended for delivering the “real” services of the organization (Kopczynski & Pritchard, 2004). The underlying logic is that since there are limited funds available to serve the public good, why should a portion of those funds be allocated to something other than serving people in need? This is not an unreasonable point, and one that program managers in not-for-profits face on a continuing basis.

The focus of evaluation in not-for-profit organizations has shifted in recent years from administrative data to outcome measurement, impact evaluation, and sustainability (Aspen Institute, 2000), that is, from the short-term to the long-term effects of interventions. Evaluators in the not-for-profit sector view their world as the combination of technical knowledge, communication skills, and political savvy that can make or break the utility and value of the program under consideration. Evaluation in not-for-profit settings tends to value teamwork, collaboration, and generally working together. This chapter is meant to provide a glimpse of a small portion of the evaluation efforts that take place in the not-for-profit sector. It excludes, for example, efforts in public education, but it does provide some context for workforce development efforts.


Evaluation in not-for-profit settings tends to have different criteria for the judgment of its worth than is typically found in corporate and similar settings. Such criteria are likely to include the following:

How useful is the evaluation?

Is the evaluation feasible and practical?

Does the evaluation hold high ethical principles?

Does the evaluation measure the right things, and is it accurate?

Using criteria such as these seems a far cry from the concepts of return on investment that are of vital importance in the profit sector. Even the question of transfer of training can sometimes be of secondary importance to assuring that the program is described accurately. Another difference is the pressure of time. Programs offered by not-for-profit organizations, such as an alcohol recovery program, take a long time to show their effects, and by the time results are visible, the organization has moved on to the next program. Instead we often see that evaluation is relegated to measuring the countable, the number of people who have completed the program, rather than the life-changing impact that decreased alcohol abuse has on participants. While the latter is certainly important, the typical community-based organization (CBO) is limited in its resources to perform the long-term follow-through needed to answer the ultimate utility question. Thus, the choice of what is measured tends to be the result of negotiation with stakeholders. The broad goals of evaluation tend to be grouped among the following:

Understanding and sharing what works with other organizations and communities;

Building sustainability of programs and ensuring funding;

Strengthening the accountability of the programs with various public constituencies;

Influencing the decisions of relevant policy makers and program funders;

Building community capacity so that future engagements have greater community participation; and

Understanding where the program is going so that it results in reflecting on progress that can improve future programs.

These goals reflect some of the typical objectives for applying evaluation in not-for-profit settings. The goals embody specific activities that can be designed to collect evidence on a program’s effectiveness, be accountable to stakeholders, identify projects for improvement, clarify program plans, and improve communication among all groups of stakeholders. The types of programs or actions that are designed to improve outcomes for particular individuals, groups, or communities considered in this chapter include the following:

Direct service interventions: improve the nutrition of pre-school children;

Research endeavors: determine whether race disparities in emergency room care can be reduced;

Advocacy efforts: campaign to influence legislation on proper use of infant car seats; and

Workforce training programs: job training program to reduce unemployment among economically disadvantaged urban residents.

The results of evaluation in not-for-profit settings are typically designed to provide information for future decision making. Most efforts can be grouped into three categories: process evaluation, short-term evaluation, or long-term (outcome) evaluation. In fact, it is rare to see a program evaluation effort that does not include some process evaluation. Process evaluation is considered important because it typically yields an external view of how the program was conducted; in other words, it provides a detailed description of the objectives, activities, resources used, management of the program, and involvement of stakeholders. It would be difficult to judge any outcomes of a program without understanding its components in detail. Short-term evaluation deals with the accomplishment of program objectives or the intermediate links between the program activities and the long-term outcomes. Outcome evaluation is associated with long-term effects, such as health status or system changes, that are often beyond the range of a typical evaluation effort.


One of the key tools used to structure the evaluation process in not-for-profit settings is the stakeholder analysis. It is a key initial stage to develop the primary issues and evaluation questions from which other stages of the evaluation process can be built. The stakeholder analysis is designed to identify the needs and values of the separate stakeholders and combine the results so that an adequate plan can be developed. It is rare to find an evaluation effort in a not-for-profit setting with fewer than three stakeholder groups holding a major interest in the effort. The stakeholder analysis is a means to organize the political process and create an evaluation plan, as well as satisfy the different perspectives and needs for information. A first step in the process is to identify all groups or individuals with a stake in the process, followed by a second step dividing the groups into primary and secondary members. Primary members are those stakeholders who are likely to be direct users of the evaluation results, whereas secondary stakeholders may have an interest in the findings, but the results of the evaluation are not likely to impact them directly.

For example, in the evaluation of a workforce development project for urban adults, primary stakeholders would include the sponsoring agency or funder, the manager of program delivery, program developers, instructors, a third-party program organizer, and partner organizations that might include community-based organizations, a local business association, and a representative from the city government. Secondary stakeholders might include parents or relatives of program participants, local welfare officials, advocacy groups, and the participants themselves. A list of stakeholders can be determined by answering the following questions:

Who pays for program staff time?

Who selects the participants?

Who champions the desired change?

Who is responsible for after-program behavior/performance?

Who determines success?

Who provides advice and counsel on program conditions?

Who provides critical guidance?

Who provides critical facts?

Who provides the necessary assistance?

Note that these questions do not identify any of the purposes of the evaluation, only a means to distinguish who should be involved in evaluation planning. The amount of stakeholder involvement in the evaluation can be limited, or it can be substantial enough that stakeholders assist in the design of the overall process, including the development of data collection protocols. The involvement of stakeholders should also increase the value and usefulness of the evaluation (Greene, 1988), as well as the use of the findings. Such a process increases ownership of the report and possibly leads to a common vision of collective goals (Kopczynski & Pritchard, 2004).

The second step of performing a stakeholder analysis involves the identification of the requirements, or primary evaluation purposes, and aligning those against the list of primary stakeholders. The matrix shown in Table 16.1 is an example of this analysis for a publicly funded workforce development program for inner city adults.

The “requirements,” or purposes, of the evaluation can be developed in a number of ways, but the major sources of the list usually come from the published description of the program, supplemented by interviews with the stakeholders. It would be ideal if the matrix could be completed in a group setting, but it is more likely that the evaluator develops the chart independently and then shares the results in a program advisory meeting. An “X” in a given box marks where a requirement matches the possible use of that information by the selected stakeholder. One can note, for example, that program strengths and weaknesses are needed by staff internal to the development and execution of the program but are not of particular interest to those outside that environment. Another outcome of this process is at least a general expectation of the reporting relationships that would occur during the communication of the interim and final results of the evaluation.

[Table 16.1. Stakeholder analysis matrix for a publicly funded workforce development program; image not reproduced.]
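For readers who maintain such a matrix in a spreadsheet or script, its structure can be sketched as a simple mapping from (stakeholder, requirement) pairs to “X” marks. The stakeholder and requirement labels below are illustrative placeholders, not the actual contents of Table 16.1.

```python
# Hypothetical sketch of a stakeholder-by-requirement matrix like Table 16.1.
# All labels are illustrative placeholders.
requirements = ["Program description", "Strengths and weaknesses",
                "Participant outcomes", "Cost information"]
stakeholders = ["Funder", "Program manager", "Instructors", "City government"]

# An "X" marks where a requirement matches a stakeholder's information needs.
matrix = {
    ("Funder", "Program description"): "X",
    ("Funder", "Participant outcomes"): "X",
    ("Funder", "Cost information"): "X",
    ("Program manager", "Strengths and weaknesses"): "X",
    ("Program manager", "Participant outcomes"): "X",
    ("Instructors", "Strengths and weaknesses"): "X",
    ("City government", "Participant outcomes"): "X",
}

def print_matrix():
    """Render the matrix with stakeholders as columns and requirements as rows."""
    print(f"{'':28}" + "".join(f"{s:>18}" for s in stakeholders))
    for r in requirements:
        row = "".join(f"{matrix.get((s, r), ''):>18}" for s in stakeholders)
        print(f"{r:28}" + row)

print_matrix()
```

Keeping the matrix in this form also makes it easy to check, before reporting begins, which stakeholders expect to see each kind of finding.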

While it might seem that the stakeholder analysis focuses on the program itself, there are other outputs that could be entered into the analysis. Results of evaluation data are often used to tell compelling stories that can increase the visibility and marketing of the organization and increase accountability with board members and the community, as well as attract additional sources of revenue. These findings were supported by a national survey of local United Way organizations, as reported by Kopczynski & Pritchard (2004).


While the stakeholder analysis establishes the initial phase of understanding evaluation purposes, further definition is supplied by a list of evaluation issues. These issues are typically generated during the stakeholder analysis, after further understanding of the program description. Evaluation issues can range from general questions about program impact to detailed questions on the selection of service providers. A sample set of questions, or primary issues, from a low-income workforce development project that leverages resources from a local housing authority (Gorman, 2001) is listed below:

Describe the major partners leading, managing, and staffing each major activity area.

Describe the major program areas and the target population service goals by area. What are the major strategies to be employed to reach these service goals?

To what extent does the program assist in decreasing or alleviating educational barriers to sustained employment? What were the major instructional and training strategies employed to attain the stated goals and objectives?

To what extent does the program assist in decreasing or alleviating personal and social barriers to employment: poor work histories, cultural barriers, substance abuse and developmental disabilities, and limitations of transportation and adequate childcare?

What were the participant impacts: wages, self-efficacy, and employability?

Was there differential effectiveness of the program relative to categories of the target population: public housing residents, non-custodial parents, learning-disabled individuals, persons with limited English proficiency, and other economically disadvantaged groups?

Was there differential effectiveness relative to the categories of high-demand sectors targeted by the program: building/construction trades, health care, and hospitality?

What evidence indicates that the program management and its partners can actively disseminate/replicate this program in other regions via its current programs?

What are some key success stories that serve to illustrate overall program value to participants and community?

How cost-effective was the project in general terms? By program area? How did costs or efficiency of service change over the course of the project?

What were the major lessons learned in the project? How do these relate to self-sufficiency for the target population? Community economic development? Leadership effectiveness?

These results were obtained from a stakeholder analysis and are not yet grouped into evaluation components. Combining these results with the stakeholder analysis assists in defining the overall evaluation plan. It provides the framework for developing the data requirements as well as the type of measurement needed for instrumentation.


The final phase of planning the evaluation is the development of a logic model, a description or map of how the major components of an evaluation are aligned; that is, the connection between how the program is designed and its intended results. Logic models can be used in any phase of the program development cycle (McLaughlin & Jordan, 2004), from initial design to examining long-term impact. A logic model is a means to make the theory behind the intervention more explicit and to discover its underlying assumptions. Even the development of the model, that is, mapping out all of its components, can be instructive for program staff and other stakeholders. A logic model is particularly useful for evaluators, both as an advance organizer and as a planning tool for assessment development (McLaughlin & Jordan, 2004). In many situations, such as community or public health, the specification of a logic model is a proposal requirement. Whether an evaluator builds on an existing logic model or develops a preliminary version for the evaluation plan, the map created is a visualization of how the human and financial investments are intended to satisfy program goals and lead to program improvements. Logic models contain the theoretical and practical program concepts in a sequence from input of resources to ultimate impact.

Most logic models follow a standard nomenclature (see the Kellogg Foundation guidelines [W.K. Kellogg Foundation, 2007] and McLaughlin & Jordan, 2004) containing the following elements:

Resources: program inputs like needs assessment data and capabilities (financial, human, organizational partnerships, and community relationships) that can be allocated to the project.

Activities: the tasks or actions the program implements with its resources to include events, uses of tools, processes, or technology to perform actions to bring about intended changes or results.

Outputs: the direct products of program activities or services delivered by the program, even reports of findings that may be useful to other researchers.

Outcomes: both short-term (one to three years) and longer-term (four to six years) specific changes in the targeted individuals or organizations associated with behavior, functioning, knowledge, skills, or status within the community. Short-term outcomes are those that are assumed to be “caused” by the outputs; long-term outcomes are benefits derived from intermediate outcomes.

Impact: the ultimate consequences or results of change that are both intended and unintended for the individuals, organizations, or communities that are part of the system, generally occurring after program conclusion.

Whatever the process used to create a logic model, such as a focus group, another useful outcome is a listing of key contextual factors not under the control of the program that might have positive or negative influences on it. These context factors can be divided into two components: antecedent conditions and mediating variables (McLaughlin & Jordan, 2004). Geography, economic conditions, and characteristics of the target group to be served are examples of the former, whereas staff turnover, new government incentives, and layoffs at a major local employer are examples of the latter.
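The five standard elements, plus the contextual factors, can be captured in a simple data structure when a team wants to draft or revise a model electronically. The sketch below is illustrative only; the example entries are hypothetical and are not drawn from Figure 16.1.

```python
# Illustrative sketch of the standard logic-model elements as a data structure.
# Entries are hypothetical examples, not the contents of any actual model.
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    resources: list = field(default_factory=list)   # inputs allocated to the project
    activities: list = field(default_factory=list)  # actions taken with the resources
    outputs: list = field(default_factory=list)     # direct products of the activities
    outcomes: dict = field(default_factory=dict)    # short-term vs. long-term changes
    impact: list = field(default_factory=list)      # ultimate intended/unintended results
    context: dict = field(default_factory=dict)     # factors outside program control

model = LogicModel(
    resources=["grant funding", "partner agency staff", "needs assessment data"],
    activities=["neighborhood planning workshops", "prevention training"],
    outputs=["number of workshops held", "residents trained"],
    outcomes={"short_term": ["local groups draft prevention plans"],
              "long_term": ["sustained links among agencies"]},
    impact=["reduced neighborhood violence"],
    context={"antecedent": ["local economic conditions"],
             "mediating": ["staff turnover"]},
)
```

Listing the context factors alongside the model keeps the antecedent and mediating variables visible when the team later interprets results.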

The example shown in Figure 16.1 is a portion of a logic model from public health and concerns a neighborhood violence prevention program (Freddolino, 2005). The needs were derived from data that showed that poverty and violence were significant threats to the health of residents in Central County. The general goals of the program were to increase the capacity of local neighborhood groups to plan and implement local prevention programs and work to establish links between existing programs and agencies. Many logic models are likely more complex than the example shown in that specific actions are linked to given outputs and outcomes in order to differentiate the various conceptual links in a program.

Logic models can be constructed in various ways to illustrate different perspectives of a program. Figure 16.1 represents an activities-based approach that concentrates on the implementation of the program; it is most useful for program management and monitoring. A second approach, based on outcomes, seeks to connect the resources to the desired results and is specifically geared to subdivide the short-term outcomes, long-term outcomes, and ultimate impact of the program. Such models are most useful for designing evaluation and reporting strategies. The logic model developed for a program should suggest the type of measurement required to prove or improve the model specified. Since the model is related to performance objectives, it can also assist in judging the merit or worth of the outcomes observed.

A third type, the theory approach, emphasizes the theoretical constructs behind the idea for the program. Such models concentrate on solution strategies and the prior empirical evidence that connects the selected strategies to potential activities and assumptions. They are most useful for program planning and design. Regardless of the type of model selected, each helps the evaluator provide more comprehensive descriptive information that can lead to effective evaluation design and planning.

[Figure 16.1. Portion of a logic model for a neighborhood violence prevention program (Freddolino, 2005); image not reproduced.]

Collaborative projects can also suffer from a lack of comparability in implementation due to inconsistent allocation or availability of resources across partner organizations. This lack of comparability in program resources often thwarts attempts at gathering data to compare results across sites and leads to misinterpretation of findings, or to no consistent findings at all. An example evaluation plan (Gorman & Brandenburg, 2002) from a multiple-site consortium project managed by a systems integrator and funded by the U.S. Department of Labor is provided in Table 16.3. In this case, the systems integrator, American Community Partnerships, is an affiliated organization associated with a large U.S. labor union whose goals, in part, are to promote high-wage union jobs for economically disadvantaged urban residents. This particular program operated in some of the poorest neighborhoods of the cities involved.

One can note that in complex arrangements such as the one represented in Exhibit 16.1, the management of internal and external partner organizations is crucial to obtaining the needed services for the target population as well as to meeting the long-range goal of program sustainability. Initial sponsor-allocated funds cover the start-up of the partnership, but the leveraging of existing funds and other resources is required for the overall effort to be successful. Other evaluations can be considerably more complex when funded by two or more federal agencies. The report provided by Hamilton and others (2001) is such an example and describes an evaluation funded over five years by the U.S. Departments of Labor and Education.

Foundation Guidelines

Numerous resources are available on outcome measurements that are designed to assist not-for-profit organizations. A current major source of information can be found through the website of the Utica (New York) Public Library, associated with the Foundation Center, a foundation collaborative. The site is annotated and contains links to major foundations, professional associations, government sponsors, technology uses, data sources, statistical information, best practices, benchmarking, and assessment tools.

Another set of evaluation models can be derived from an examination of community change efforts. Two organizations that have funded a considerable number of these actions include the W. K. Kellogg Foundation and the Annie E. Casey Foundation. Both organizations offer assistance in designing evaluation efforts. These foundations, among others, want their grantees to be successful in their efforts, so they have developed evaluation guidelines for potential grantees to build in evaluation efforts at the time of proposal writing. More extensive guidelines exist for larger-scale efforts. Especially instructive in this regard is Kellogg’s perspective on evaluating social change in that they have constructed a framework especially for external evaluators. Based on the dynamics of social change and the context of the program, they detail four types of evaluation designs: exploratory, predictive, self-organizing, and initiative renewal. Given the complexity of the type of program under review, evaluators are charged to pay attention primarily to the major aspects of the system change and disregard those features that are of minor consequence.


agencies and program staff in understanding the major issues to be considered. These organizations can be quite effective because they understand the local and regional context.


As can be concluded from the previous discussion, data collection and measurement for evaluation in not-for-profit settings can range from the deceptively simple to the complex. Most data collection schemes tend to be customized to the environment of the program being evaluated, for a number of reasons. First, the measurement of a program is most often linked to stakeholder considerations and the needs of the sponsoring organization, as opposed to creating an elegant experimental design. Both quantitative and qualitative data are applied to answer the evaluation questions posed at the outset of the investigation.

Second, there is a strong bias for producing data that are useful for all stakeholders, and many users, such as directors of community-based organizations, are not used to interpreting sophisticated statistical analyses. Data on the effectiveness of programs often can be derived from solid descriptions of program activities and survey data. This is not to say that measurement is based on the lowest common denominator of sophistication, but the end result is to use the findings in a way that can improve program activities. The limited availability of resources within not-for-profits generally means that program staff collect data for monitoring purposes, that is, what is countable from an administrative perspective (Kopczynski & Pritchard, 2004). Other not-for-profit organizational shortcomings may enter into the development of a comprehensive data collection scheme, namely “the lack of appreciation of data, lack of training, poorly developed information systems, and high turnover rates” (Kopczynski & Pritchard, 2004, p. 652), besides the issue of limited budgets.

A third reason to customize data collection is that it may be limited by budget considerations. Elegant and rigorous designs cost more to implement and can have an indirect effect on program activities if certain target participants cannot participate in the primary intervention. This is not to say that rigorous designs are not applied in not-for-profit settings, especially in the public health domain. One such effort (Huffman & others, 2002), funded by the Packard Foundation, presents a wealth of information on outcome measurement, valid evaluation constructs, and a set of principles to apply in such settings. Participants in workforce development programs, for example, are often difficult to recruit, so allocating a portion of them to a control condition is not cost-effective in many cases. Even more challenging would be follow-up data collection efforts with a homeless population (Kopczynski & Pritchard, 2004).

It is probably instructive at this point to introduce an example of the type and range of data needed to satisfy requirements in a large-scale workforce development effort. Using the same program (Gorman & Brandenburg, 2002) from the example in Table 16.3, Exhibit 16.1 represents a listing of the data elements selected to provide a comprehensive evaluation.

[Exhibit 16.1. Data elements selected for a comprehensive evaluation of the workforce development program (Gorman & Brandenburg, 2002); image not reproduced.]


If evaluation reports are to be useful for all stakeholders, it is important to consider the communication of evaluation results early in the design of the evaluation process. Such consideration is needed because the demand for clear and credible information from not-for-profits is high (Kopczynski & Pritchard, 2004). In general, it is prudent to negotiate what results are to be presented, and to whom, at the time the stakeholder analysis is conducted. Final evaluation results are typically reported in two stages: (1) a formal written report that contains all details of the design, implementation, data collection, data analysis, findings, and conclusions, with possible recommendations, and (2) a summary version, such as a PowerPoint presentation, that lists the highlights of the investigation. Other reports, such as interim findings, data from a single site in a multiple-site investigation, results of a set of interviews or a survey, or case descriptions, may also be added, depending on funding and stakeholder needs.

Regardless of the type of report, there is still a need to organize reports so that stakeholders may be guided to make appropriate decisions and plan future actions based on the findings. An evaluation that contains as many data elements as depicted in Exhibit 16.1 would need to be organized efficiently if stakeholders were to be able to comprehend and sift through the volume of data likely to be generated. Smock & Brandenburg (1982) suggest a tool to aid in the process of organizing data and presenting findings as portrayed in Table 16.4.

[Table 16.4. Tool for organizing data and presenting findings (Smock & Brandenburg, 1982); image not reproduced.]

While the classification shown may be an oversimplification of the available data, it is nonetheless quite useful in presenting data to less sophisticated users. The concept underlying its structure is that the entirety of the information may be ranked hierarchically into three levels, an artificial trichotomy running from the very general (overall success), through the success or failure of specific program components, to very specific findings that may have meaning for only a single stakeholder. Level I information is the most general: it must be inferred from a variety of findings (rolled up across data elements), permits a general summary of findings in a few statements, and is useful for general management “go or no go” decisions, the kind of final judgment that a sponsoring program officer would be interested in. This type of data would be the primary target for a summary presentation of the findings.

Level II information represents data useful for identifying the strengths and weaknesses of a program by its component parts, for example: the materials worked well, instructional delivery could be improved, stakeholders were satisfied, recruitment should be strengthened, or case work referrals worked well. Comparisons across sites can be made only in cases in which program elements are identical or very similar, and this comparison information, when legitimate, could be included in the Level I report. Level II information is most useful for local program staff to identify areas in which program development and delivery could be improved; it provides a starting point to delve into specifics. Both Level I and Level II information tend to be reliable because the judgments made are cross-referenced and double-checked during data collection and analysis, whereas Level III information is more site-specific and detailed.

Level III information is the most detailed, such as open-ended comments on surveys or specific references to incidents made during interviews, and while such data tend to be rich in detail, they are also less reliable in that general judgments about success or failure of a program component cannot be made. On the other hand, they can provide very useful information for improving specific aspects of a program as well as offer explanations as to why Level II information yielded results in a particular direction.
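An evaluator assembling reports from a large pool of findings can apply this trichotomy as a simple tagging scheme. The sketch below is a hypothetical illustration of the idea; the findings and level assignments are invented for the example.

```python
# Hypothetical sketch: tagging evaluation findings with the three reporting
# levels described above, then filtering by audience. Entries are invented.
findings = [
    ("Program met its overall objectives", 1),
    ("Instructional delivery could be improved", 2),
    ("Recruitment should be strengthened", 2),
    ("One interviewee cited a scheduling conflict at Site B", 3),
]

def by_level(items, level):
    """Return the findings assigned to a given reporting level."""
    return [text for text, lvl in items if lvl == level]

summary_report = by_level(findings, 1)        # sponsor's go/no-go judgment
program_staff_report = by_level(findings, 2)  # strengths and weaknesses by component
site_detail = by_level(findings, 3)           # rich but less generalizable detail
```

Tagging each finding once, at analysis time, lets the same data set feed the summary presentation, the staff report, and the site-level appendices without duplication.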


If one compares how evaluation work is done in not-for-profit settings with how it is done in corporate training or organizational development efforts, there are at least three points of distinction. First, not-for-profit evaluation planning tends to be more complex in that it requires a formal, comprehensive stakeholder analysis. The Aspen Institute (2000) study identified stakeholder analysis as the initial best practice in not-for-profit program evaluation. The complexity is driven more by the political side, because there are likely to be more stakeholders, with considerable variance in their levels of sophistication regarding evaluation methods, and not all of them share the desire even to conduct the evaluation. Specific attempts must be made first to identify all relevant stakeholders and then to perform a process of consensus building to assure that all legitimate evaluation questions have been posed. What happens during this initial stage has considerable influence on the remainder of the evaluation process.

Second, the evaluation methodologies employed tend to be dictated by the sponsoring organizations, which vary considerably in their standards. Evaluations conducted under the U.S. Department of Labor or the Department of Housing and Urban Development carry minor requirements and are quite flexible, whereas projects under the Centers for Disease Control and Prevention or Health and Human Services, or under a major grant from a private foundation, are more prescriptive. In fact, some foundations require a form of “certification” for many of their external evaluators. Most evaluation efforts are likely to be assigned to an external evaluator because most not-for-profit organizations do not have the requisite capabilities to fulfill that role. This adds to project complexity, and especially to the politics of dealing with the recipient organization.

Third, evaluation efforts in not-for-profits can be as rigorous as those in the private sector, employing sophisticated experimental designs, requiring validated data collection tools, and using complex analysis of qualitative data, but there is an underlying requirement that the program be described adequately. This requirement that evaluation address the description of the target activities leads to the use of logic models, which are not often found in private or corporate evaluation efforts. This author believes that a similar process might be useful in those settings.


References

Annie E. Casey Foundation. (1997). Evaluating comprehensive community change. Baltimore, MD: Author.

Aspen Institute. (2000, February). Current evaluation practice in the nonprofit sector. Snapshots, Research Highlights for the Nonprofit Sector Research Fund, 9.

Centers for Disease Control and Prevention. (1999). Framework for program evaluation in public health (MMWR 48, No. RR–11). Atlanta, GA: Author.


Using New Technology to Create a User-Friendly Evaluation Process

William J. Rothwell

Anita Pane Whiteford

Much interest has been focused on emerging technologies in instructional design and delivery (see, for instance, Rothwell, Butler, Maldonado, Hunt, Peters, Li, & Stern, 2004). However, less attention has focused on applying new technology to evaluating instructional or other performance improvement interventions (see Rothwell, 2005; Rothwell, Hohne, & King, 2007). After all, emerging technologies do have the potential to expand the scope of evaluation in terms of number and geographic distribution of participants; types of data that can be collected; size and relational capabilities of storage and retrieval systems; types of analysis conducted; and type of reports generated. Emerging technologies also offer the promise of exciting new ways to select or design data collection tools; collect and analyze data; store and retrieve data and results; and report evaluation results.

This chapter examines some of the new technology-based evaluation tools. The chapter will answer the following questions: (1) What technology-based evaluation tools are available to human performance technology (HPT) practitioners? (2) How can HPT practitioners use these tools to evaluate performance interventions? (3) Are HPT practitioners conducting evaluations of performance interventions? Why or why not? (4) Why use technology-based tools? (5) What does the future hold for HPT practitioners who use technology-based tools to evaluate performance interventions? In addition, this chapter will discuss ethical issues that could be posed by new technology that could take HPT practitioners into uncharted territory.


What Technology-Based Evaluation Tools Are Available to HPT Practitioners?

Technology-based evaluation tools can be understood as electronically enabled applications, which have grown rapidly in recent years, that help HPT practitioners and evaluators perform work faster, better, and with greater skill. With new technology-based applications appearing almost daily, it is difficult to catalog them all. Therefore, this section discusses only some of the technology-based tools for expanding the scope of evaluation, collecting evaluation data, storing and retrieving evaluation data, analyzing evaluation data, and reporting evaluation results.

General Characteristics: Multi-Tasking and Engagement

All evaluation tools available to HPT practitioners expand the scope of evaluation in some way. Most allow users to multi-task: collect, store, retrieve, analyze, and report data. Engagement theory is perhaps the most widely applied theory for explaining the success of technology-based applications for HPT practitioners. “Engagement theory is best applied because the theory is based on three primary means to accomplish engagement: an emphasis on collaborative effort, project-based assignments, and non-academic focus” (Kearsley & Shneiderman, 1999, p. 5). “Engagement theory is intended to be a conceptual framework for technology-based learning and teaching because technology can facilitate engagement in different ways that are difficult to achieve without technology-based applications” (Kearsley & Shneiderman, 1999, p. 1). The theory rests on the foundation of engaged learning, defined by Kearsley and Shneiderman (1999) as “participant activities that involve active cognitive processes such as creating, problem-solving, reasoning, making decisions, and evaluating” (p. 1). HPT practitioners can use engagement theory to aid in selecting appropriate tools for evaluating HPT applications. Practitioners can use the theory to formulate such critical questions as (1) For what target audience is engagement theory most and least effective? (2) What skills, required of participants to be evaluated effectively, are essential to collaborative activities? (3) How should individual differences be evaluated in collaborative work? and (4) What tools would be most appropriate to evaluate collaborative software tools that might be used during the intervention (Kearsley & Shneiderman, 1999)?

As mentioned previously, most technology-based evaluation tools support multi-tasking: they can gather data, store and retrieve data, analyze data, and report data. These features, along with some prime examples of technology-based tools, are discussed in the following sections on stand-alone evaluation tools, built-in evaluation features, tools to evaluate one-way communication interventions, and tools to evaluate two-way communication interventions.

Stand-Alone Evaluation Tools

Some technology-based evaluation tools can be used on their own or can be blended with performance interventions. Two examples are Questionmark and the One-Touch system.

Questionmark. Questionmark’s Perception Assessment Management System enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests, and exams. Questionmark offers evaluators the ability to author questions and assessments easily; bank questions by learning objective; deliver to a browser, PDA, CD-ROM, or paper; provide instant feedback to enhance learning; provide a secure delivery platform for high-stakes exams; give administrators on-demand results, reports, and item analysis; and randomize the presentation of questions and choices.

Idaho Power is a case study in using Questionmark. The company provides electricity to Idaho and to sections of Oregon and Nevada and employs eighteen hundred line service workers and field engineers. Idaho Power sponsors a rigorous apprenticeship program of four and a half years’ duration and administers an average of three to four hundred exams per week, as well as a final examination at the program’s end. Idaho Power began using Questionmark to maintain all exams in one database and to work continuously on creating new exams and improving existing ones.

One-Touch. The One-Touch system is a leading provider of interactive distance communication and learning solutions for corporations, educational institutions, and government offices. One-Touch integrates video, voice, and data over any broadband network with measurable and certifiable results, extending the reach of communication and learning sessions to geographically dispersed participants.

J.C. Penney became a customer of One-Touch in 1996 after closing its management training facility in Texas. The company decided to use distance learning to train 149,000 store associates and so ensure consistency in its training. Store associates see the facilitator’s video live on a classroom television and interact using the One-Touch interactive touch pad. Facilitators can pose questions and post quizzes, providing an important means to measure and establish accountability among associates for their learning. All participant data are gathered in a central database for further analysis, including training attendance records, the number of times an associate calls in, and question and quiz performance. J.C. Penney saved $58,000 in delivering one training class and reduced travel costs for training by $12 million.

Built-In Evaluation Features

Sometimes “evaluation” features are built into a technology-based performance intervention. A good example is a learning management system (LMS), a set of software tools designed to manage user learning interventions. LMSs have become popular technology-based applications in both the corporate and academic worlds. An LMS allows an organization to manage users, roles, courses, instructors, and facilities; post course calendars; post messages to learners; offer assessment and testing, including administering participant pre-tests and post-tests; display scores and transcripts; grade coursework; and host web-based or blended learning.

Ilias is an example of an LMS with a built-in evaluation tool. Ilias is a powerful web-based learning management system that supports tools for collaboration, communication, evaluation, and assessment. Its evaluation and assessment component permits evaluators to create assessments based on multiple-choice questions, single-choice questions, allocation questions, cloze questions (free text, select box), ordering tasks, hot spots (images participants search and click on), and open-ended questions.


Tools to Evaluate One-Way Communication Interventions

Webcasts, podcasts, and web-based surveys rely primarily on one-way communication. A presenter broadcasts a message, or a series of messages, that participants receive over the web, an iPod, or other channels. Because participants cannot respond in real time, these may properly be termed forms of one-way communication.

Podcasts and Webcasts

“A podcast is a digital audio program, a multimedia computer file that can be downloaded to a computer, an iPod, or another device, and then played or replayed on demand” (Islam, 2007, p. 5). A webcast is used on the Internet to broadcast live or delayed audio and/or video transmissions, like traditional television and radio broadcasts (see www.webopedia.com/term/w/webcast.html). Unlike webcasts, podcasts are always available, portable, user-friendly, immediately useable, and inexpensive (Islam, 2007). Podcasts and webcasts can be used in the evaluation phase of performance improvement interventions: the intervention is recorded, which allows multiple reviews to measure its effectiveness and assess its outcomes (Islam, 2007). Facilitators could also review a recording after a training session to evaluate their own training skills and identify possible improvements for future sessions.

Web-Based Surveys

Web-based surveys are assessment tools designed by evaluators and posted on the Internet for participants to evaluate the technology-based application used to deliver the learning intervention. Web-based surveys collect data through self-administered electronic sets of questions on the web (Archer, 2003). Software tools for web-based surveys include Survey Monkey and Zoomerang, among others.

According to Granello and Wheaton (2004) and Archer (2003), web-based surveys can be administered easily by following several important steps:

Determine the target population to be assessed and the purpose of the survey;

Design the layout of the survey and format the questions, remembering to keep the survey short and simple or else risk skewed answers;

Provide a user-friendly welcome screen with clear instructions that make it easy to respond to the questions;

Post questions that have a consistent format and flow;

Test the system before launching it to make certain the glitches are worked out;

Pilot test the survey with a small group similar to the targeted audience; and

Strategically plan the survey distribution with only a few email reminders.
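The steps above lend themselves to partial automation. The following Python sketch is illustrative only, with hypothetical question names and a hypothetical "short survey" threshold (none of it comes from the tools cited in the chapter); it checks that a draft survey stays short and that pilot-test responses are complete before a full launch:

```python
# Hypothetical sketch: automating two of the steps above -- keeping the
# survey short and pilot-testing responses for completeness.

MAX_QUESTIONS = 15  # assumed threshold for "short and simple"

def check_survey_length(questions):
    """Return True if the survey is short enough to avoid skewed answers."""
    return len(questions) <= MAX_QUESTIONS

def incomplete_responses(questions, responses):
    """Return (response, missing-question) pairs from a pilot test."""
    flagged = []
    for r in responses:
        missing = [q for q in questions if not r.get(q, "").strip()]
        if missing:
            flagged.append((r, missing))
    return flagged

questions = ["role", "satisfaction", "comments"]
pilot = [
    {"role": "manager", "satisfaction": "4", "comments": "clear"},
    {"role": "trainer", "satisfaction": "", "comments": "too long"},
]

print(check_survey_length(questions))               # True
print(len(incomplete_responses(questions, pilot)))  # 1
```

A real pilot test would of course also examine question wording and flow, not just blank fields.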


Tools to Evaluate Two-Way Communication Interventions

Many new technologies facilitate the interaction of people in real time. These technologies are called social networks. Examples include group decision support systems, webinars, web conferences, virtual classrooms, instant messaging, chat rooms, blogs, Facebook, YouTube, Twitter, wikis, second-life virtual reality applications, and many others.

Social network websites are legion, and their range encompasses every imaginable topic. Examples include Facebook, MySpace, Windows Live Spaces, and LinkedIn. HPT practitioners may pose questions and then receive real-time, or near-real-time, responses. The greatest promise for real-time data collection and analysis probably exists with group decision support systems, in which live participants interact in real time, or virtual participants interact concurrently, to reach decisions. Participants can collaboratively reach some level of agreement on:

What performance problems are important;

How important they are;

What causes them;

What solutions should be applied to solving those problems by addressing root causes;

How those solutions can be implemented; and

How results may be evaluated.

As a result, those affected by the problems can have a stake in diagnosing the problem, solving it, and establishing and tracking relevant metrics (Rothwell, 2005).

HPT practitioners can pose questions, generate responses from many people, and then facilitate group activities as members analyze their own results. Data collected from these activities are useful for evaluating the process and the results of the group decision support system intervention itself. HPT practitioners using social networks can get many different views to evaluate interventions. They can, for instance, benchmark how other HPT practitioners have done evaluations for similar interventions and share knowledge on the latest evaluation tools and techniques.


Are HPT Practitioners Conducting Evaluations of Performance Interventions?

The most common method of evaluation used by HPT practitioners is Level 1 of Donald Kirkpatrick’s four-level evaluation model. Level 1 is the reaction level, and it focuses on how much people liked the intervention. Participant evaluations, also called “smile sheets,” remain the most popular way to evaluate training, and most smile sheets are administered using paper-and-pencil forms. Some large training companies have advanced as far as using scannable sheets to collect participant reactions. Some are even using web-assisted survey tools.

Web-Assisted Survey Tools

Several organizations have taken the next step, using Survey Monkey and similar web-assisted survey software packages to administer participant evaluations. These organizations also post the results of participant evaluations online for participants and managers to review.

For instance, the leaders of one company experiencing high turnover asked human resources to explore its root cause. The human resources manager surveyed middle managers and production supervisors using Survey Monkey. The software made it easy to collect and analyze the data, and the manager then fed the survey results back to company leaders. Online survey methods are particularly popular with distance education, since online data collection is a natural fit for online training delivery.

Another organization has experimented with real-time data collection using wireless personal digital assistants (PDAs). At the end of a training program, participants are polled in real time to determine their perceptions of the training. Their reactions are compiled in front of them and projected on a screen. They are then asked by the facilitator to offer immediate ideas about how to improve the training for subsequent delivery. This approach, of course, is really an application of a group decision support system.

Beyond Level 1

Why do most HPT practitioners typically evaluate only participant reactions and learning? What about evaluating HPT performance interventions for transfer of skill and impact on the organization? Few HPT practitioners conduct transfer-of-learning studies and impact studies, and they typically give several reasons: such evaluation studies are more time-consuming; evaluators lack the resources, and sometimes the understanding, to conduct sophisticated evaluation studies using robust research designs; and organizational leaders do not ask for evaluation studies.

But it is critically important that HPT practitioners conduct evaluation studies at all levels of Kirkpatrick’s pyramid—including transfer and impact studies— to build support from organizational leaders (Kirkpatrick & Kirkpatrick, 2006; Phillips & Phillips, 2005). Technology-based evaluation tools can help HPT practitioners achieve this goal.


Why Use Technology-Based Tools?

There are many advantages and disadvantages for HPT practitioners in using technology-based tools for evaluation. This section examines both.

Advantages to Using Technology-Based Tools in Evaluation

The advantages of using technology-based tools in evaluation include decreased cost, immediacy, and ease of use for both the respondent and the evaluator:

Cost. New technology offers the promise of reducing the cost of collecting and analyzing data. Instead of paying postage on mailed questionnaires, organizations can use survey software at a per-use cost, and it is less expensive to email surveys to hundreds of people than to mail hard copies.

Immediacy. In most cases, the greatest appeal of new technology applied to evaluating performance interventions is its immediacy, that is, how quickly data can be collected and used. Gone are the days when lengthy analysis could be painstakingly applied in step-by-step, linear project approaches and then followed up by deliberate action. Time has become the only strategic resource. Fast approaches to addressing performance problems have grown more appealing to time-pressured, stressed-out managers and workers alike. Many of these approaches call for concurrent project management, which combines steps or carries them out at the same time.

Ease of Use. From the participants’ point of view, web-based surveys are easier to complete: the respondent is spared the hassle of mailing back a survey form. From the evaluator’s standpoint, some survey software performs back-end statistical analysis, so the entire data set can be recomputed every time a user completes a survey. That can reduce both the time and the cost of collecting and analyzing survey data.
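As a sketch of that "recompute on every submission" idea, and assuming nothing about any particular survey package, summary statistics can be updated incrementally as each response arrives rather than recomputed over the full data set. Welford's online algorithm is one standard way to do this:

```python
# Illustrative sketch: updating the mean and variance of Level 1 ratings
# one response at a time (Welford's online algorithm), so results are
# current the moment each respondent submits.

class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # sample variance; 0.0 until at least two responses exist
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for rating in [4, 5, 3, 4]:  # ratings as they arrive, one per respondent
    stats.add(rating)

print(round(stats.mean, 2))  # 4.0
```

The same pattern extends to per-question tallies, so a dashboard can refresh after every submission without rereading stored responses.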

Disadvantages to Using Technology-Based Tools in Evaluation

Of course, new technology is no panacea; it introduces new challenges. According to Lefever, Dal, and Matthiasdottir (2007), the disadvantages of using online surveys include the following:

The target population being surveyed may lack computer skills, may resist using a computer, or may not have easy access to one;

Technological problems could occur with the equipment or the Internet service;

Emails sent to participants asking for their voluntary participation to complete the survey may be routed to the junk mail box, instead of the inbox, and participants may delete the email from researchers without opening it;

It may be challenging to assemble a representative sample of the target population when participants must be identified through email addresses; and

It may be difficult to verify the identity of participants completing the survey.

Verifying Participants. When using technology-based data collection tools, evaluators often face problems in determining whether the right people are actually responding (Do we really know who logged in?), whether those completing online forms actually completed instruction or participated in a performance intervention, and whether data collected are kept secure and confidential.

Security. Security is a major concern when using the Internet for data collection. Virtual private networks (VPNs) are a common vehicle used by researchers to secure respondents’ completed data collection instruments, although VPNs are by no means foolproof or hacker-proof. According to Wikipedia, a virtual private network is a computer network in which some links between nodes are carried by open connections or virtual circuits in some larger network (for example, the Internet) instead of by physical wires. The link-layer protocols of the virtual network are said to be “tunneled” through the larger network. One common application is to secure communications through the public Internet, but a VPN need not have explicit security features, such as authentication or content encryption. VPNs can, for example, be used to separate the traffic of different user communities over an underlying, secure network.

Overcoming the Disadvantages

With these new problems come new opportunities. Here are some suggestions:

Security. In addition to using a VPN, HPT practitioners may choose programming tools based on security needs. JavaScript can be used to validate responses given by participants in online surveys: it is a programming language that alerts respondents if fields were completed incorrectly or left blank at submission (White, Carey, & Dailey, 2000). Respondents receive error messages and must fix the issues before the final survey submission can be made. Perl is another programming language that can avoid both the cost and the potential for error of tedious manual data processing. Perl scripts can automatically delete field names and other extraneous characters; transform the various data fields into a single subject record; aggregate subject records into single files maintained on the server; and perform any required algorithmic data re-coding before the researcher sees the data (White, Carey, & Dailey, 2000).
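The same validation idea can also run on the server side, so that submissions are checked even if client-side scripting is disabled. This is an illustrative Python sketch, not the JavaScript or Perl code the cited authors describe; the field names and the 1-to-5 rating scale are hypothetical:

```python
# Hypothetical server-side analogue of the client-side validation described
# above: flag blank or malformed fields before a submission is accepted.

REQUIRED_FIELDS = {"participant_id", "q1_rating", "q2_rating"}

def validate_submission(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for field in sorted(REQUIRED_FIELDS):
        value = form.get(field, "").strip()
        if not value:
            errors.append(f"{field}: field left blank")
        elif field.endswith("_rating") and value not in {"1", "2", "3", "4", "5"}:
            errors.append(f"{field}: expected a rating from 1 to 5")
    return errors

# One valid field, one out-of-range rating:
print(validate_submission(
    {"participant_id": "a17", "q1_rating": "4", "q2_rating": "9"}
))  # ['q2_rating: expected a rating from 1 to 5']
```

As in the JavaScript case, the error messages would be shown to the respondent so the problems can be fixed before final submission.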

Identifying Participants. One way to make sure that the intended participants are responding is to ask participants to provide information that only they would be able to supply, such as a password, an encrypted code, or an email address.
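One illustrative way to implement such a coded check, sketched here in Python with an assumed shared secret (this is not a pattern the chapter prescribes): derive a code for each invitee from a keyed hash of their email address, send it with the invitation, and compare codes on submission.

```python
# Hypothetical sketch: issue each intended participant a code tied to their
# email address, then verify it when the survey is submitted. A real
# deployment would manage SECRET_KEY securely and rotate it per study.

import hashlib
import hmac

SECRET_KEY = b"evaluation-2024"  # assumed secret, for illustration only

def issue_code(email):
    """Derive a short verification code from the participant's email."""
    return hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256).hexdigest()[:12]

def verify_code(email, code):
    """Check a submitted code in constant time to resist timing attacks."""
    return hmac.compare_digest(issue_code(email), code)

code = issue_code("pat@example.org")
print(verify_code("pat@example.org", code))       # True
print(verify_code("intruder@example.org", code))  # False
```

This confirms only that the submitter holds a valid invitation code; it cannot prove who actually sat at the keyboard, which is why the chapter treats participant verification as an open problem.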


What Does the Future Hold for HPT Practitioners Who Use Technology-Based Tools to Evaluate Performance Interventions?

The future holds exciting possibilities for applying emerging technology to evaluating performance interventions. Here are a few examples for the immediate future and beyond.

Time Travel with Kirkpatrick . . .

It may be necessary to rethink Kirkpatrick’s well-known four levels of evaluation to include a consideration of time. The three time points are before the intervention, during the intervention, and after the intervention. (See Figure 17.1.)

Kirkpatrick’s four levels are, of course, quite well known, although often criticized (see Holton, 1996). Level 1 focuses on reactions, addressing how much people liked the training or other performance interventions in which they participated. Level 2 focuses on learning, addressing how much people learned from training or from other performance interventions in which they participated. Level 3 focuses on behavior, addressing how much people changed their on-the-job behavior based on the performance intervention. Level 4 focuses on results, addressing how much the organization gained in productivity improvement as a result of the performance intervention. But each level may also be examined at three points in time—before, during, and after the performance intervention.

[Figure 17.1. Kirkpatrick’s four levels of evaluation examined at three points in time: before, during, and after the intervention.]

Adding the time element transforms the “hierarchy” into a “grid.” Each cell of the grid suggests a range of different approaches and focal points of concern. For instance, reaction viewed before the performance intervention focuses on what expectations the participants targeted for change have about the intervention; reaction viewed during the performance intervention focuses on monitoring changing attitudes about the intervention while the intervention is implemented; and reaction viewed after the performance intervention measures how much people liked the change effort upon completion. Each cell of the grid may prompt HPT practitioners to think of several ways by which to measure at each point in time. New, emerging technology may support evaluation in each cell of the grid. Possible ideas about doing that are summarized in Table 17.1.
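The grid itself is straightforward to represent in software. The following minimal Python sketch shows one way; the cell entries are illustrative placeholders drawn from the "reaction" row discussed above, not the contents of Table 17.1:

```python
# Minimal sketch of the level-by-time evaluation grid: each cell pairs one
# Kirkpatrick level with one time point and holds whatever measures an HPT
# practitioner chooses. Entries below are placeholders for illustration.

LEVELS = ["reaction", "learning", "behavior", "results"]
TIMES = ["before", "during", "after"]

grid = {(level, time): [] for level in LEVELS for time in TIMES}

# Example measures for the "reaction" row:
grid[("reaction", "before")].append("survey participant expectations")
grid[("reaction", "during")].append("monitor attitudes mid-intervention")
grid[("reaction", "after")].append("end-of-program reaction sheet")

print(len(grid))  # 12 cells: 4 levels x 3 time points
```

A structure like this could back a planning worksheet, prompting practitioners to propose at least one measure, and one supporting technology, per cell.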

Of course, creative HPT practitioners may come up with additional ideas about ways to apply new technology in conducting evaluation. Use the worksheet in Exhibit 17.1 to stimulate thinking about other ways to evaluate performance interventions on different levels of the hierarchy and at different points in time.

Getting Help from Avatars

Futurists already predict a time when avatars, representations of people often used in second-life applications, will be guided by artificial intelligence and not just by people (see Hill, 2008). That will mean that second-life avatars may be able to interact with people in simulated ways, giving them opportunities to rehearse actions in real life that might not otherwise be possible due to safety or other concerns. If that happens, then it may be possible to simulate efforts to collect, analyze, and evaluate data about instruction—or other performance interventions—before the change effort is implemented. It would go the step beyond mere guesswork forecasting to actual, robust, and simulated pilot-testing of interventions assisted by new technology.

Getting on the Grid

Futurists predict that the Internet and the web will be transformed into the grid in which people, objects, and data can be linked together as “nodes.” That will speed up efforts to achieve results by seamlessly integrating interactions between humans and technology. It will mean that assistive technology, now associated with technology that helps the disabled to perform like the able-bodied, may become associated with the means to integrate technology seamlessly with human efforts to perform work (Foroohar, 2002).

Table 17.1 Possible Ways That Emerging Technologies May Be Used to Evaluate Performance Technology Interventions


One example: eyeglasses that permit users to see while also permitting them to access real-time Internet and video (see “Newer design of close-up computer monitors increases ease of use,” 2002). This could permit real-time, task-specific coaching on how to perform work. It could also give HPT practitioners the ability to analyze data even as they collect it, thereby slashing the time it takes to make data usable.

Another example: cell phones that do everything from accessing the Internet to permitting instant messaging. Those cell phones are already with us, but their applications in collecting, analyzing, and evaluating real-time performance information are only now being considered. Cell-phone-based training and wristwatch-based training, perhaps in three dimensions, may be the next wave after podcasting, opening up the potential to give learners real-time performance assistance and to give HPT practitioners the capability to do real-time data collection, analysis, and evaluation.

What Ethical and Confidentiality Issues Arise When Using New Technology to Evaluate Performance Interventions?

With new forms of technology arise new challenges in ethics and confidentiality. For example, there are issues with almost every method of online intervention technology, especially blogs, wikis, Facebook, YouTube, Twitter, webinars, and web conferences. Blogs, wikis, Facebook, YouTube, and Twitter are social networks where it is easy to behave unethically and to violate copyright, because the individuals posting information need not identify themselves and are neither held accountable nor required to cite their sources. Just Google “media and confidentiality” to find instances in which clients have been specifically discussed by healthcare professionals.

Blogs. Some organizations are beginning to encourage employee groups to create blogs. Yet blogs can also prompt many ethical, confidentiality, and copyright dilemmas. Blogs are a highly popular communication medium, and a multitude of blogs are uncontrolled, permitting individuals to display text, images, audio files, and video on the Internet without permission from the original creator or copyright holder (Kuhn, 2007). Currently, no code of ethics exists for bloggers, so blogging on the web can be as lawless as the Old West. If a code of ethics were established, accountability and anonymity should be two key components: bloggers should be accountable for the information they post, giving proper credit where due, and should preserve anonymity when posting information (Kuhn, 2007). Evaluating blogging as a performance intervention would require the evaluator to identify and evaluate the ethical issues as well as the performance results of the blogging intervention.

Digital Media. The U.S. government has considered issues of ethics and digital media, and it has enacted several federal acts in the hope of upholding ethical standards. A few of these acts are as follows:

Audio Home Recording Act of 1992 (Pub. L. No. 102-563): Exempts from copyright infringement the making of copies of music files in digital form for personal use, provided that those copies are made with approved equipment.

No Electronic Theft Act of 1997 (Pub. L. No. 105-147): Establishes penalties for unauthorized sharing and redistribution of copyrighted material over the Internet.

Digital Millennium Copyright Act of 1998 (Pub. L. No. 105-304): Creates several protection and enforcement mechanisms for copyrighted works in digital form.

Copyright. HPT practitioners should be aware of the resources available from the U.S. Copyright Office website, which provides information about copyright issues, explains how to register a copyright or record a document, and reviews applicable copyright laws and policies. Practitioners who are uncertain or have questions about copyright should refer to that site for additional information. They may also consult the Congress of the United States Congressional Budget Office (2004).

The federal laws listed in this chapter address copyright issues and protect individuals’ work displayed on the Internet. It is very important that HPT practitioners follow these laws when conducting technology-based evaluations. A common case: a practitioner finds an evaluation assessment tool online and assumes that, because it was found online, there is no need to obtain copyright permission for subsequent use. This is a misconception. Whether evaluation material is obtained online or offline, proper permission must be obtained from the copyright holder.

Use of Information. One dilemma for HPT practitioners is how to apply ethics and confidentiality considerations when selecting the appropriate evaluation tool for their needs. Practitioners must know how using the information obtained from data collection may affect participants with regard to ethics and confidentiality. As with any data collection method, HPT practitioners must first gauge the conditions under which the data will be collected and the population to be sampled, and then choose the method that will yield the most desirable results while protecting the confidentiality and well-being of human subjects. Institutional Review Boards (IRBs) are helpful to HPT practitioners in that they can help practitioners remain ethical and preserve confidentiality as they collect data; IRBs will also educate practitioners on legal issues that can arise during data collection. Many federal government guidelines exist to protect human subjects when data are collected.


This chapter listed some examples of technology-based evaluation tools that could be used to evaluate performance interventions. Each new technology holds the promise of providing new, exciting ways to select or create data collection tools and to collect, store, retrieve, analyze, and communicate data in faster, more effective ways. Each new technology may also challenge HPT practitioners to come up with new ways to think about evaluating performance interventions. Indeed, it may be necessary to reinvent Kirkpatrick’s famous hierarchy into a grid that addresses times before, during, and after performance interventions.

The future is likely to offer even more opportunities to apply technology to how HPT practitioners evaluate. Just a few exciting possibilities include avatars that “think”; the emergence of the grid to replace the web and the Internet; and human-machine interactions that are seamlessly integrated in real time.

It is critically important to obey the appropriate copyright laws when using evaluation tools owned by others. Following the copyright laws means obtaining permission from copyright holders to use their work in your evaluation studies. Equally important, it is the duty of HPT practitioners to abide by all ethical standards and confidentiality laws of the U.S. government.


References

Archer, T. M. (2003, August). Web-based surveys. Retrieved July 7, 2008, from

Foroohar, R. (2002). Life in the grid. Retrieved July 7, 2008, from