How Universities Work

 

Week 10: Measuring Quality

Quality is among the most elusive of academic measurements. Everyone speaks about it, everyone is in favor of it, and most academics believe they know it when they see it. But when we ask for quantitative measures of quality for departments, colleges, and universities, the task immediately becomes remarkably complex. The publication The Top American Research Universities defines institutional quality using a variety of measures grouped into nine categories. For the purpose of identifying clusters of high-quality research university campuses, this methodology serves admirably, even if it is not without controversy. However, measuring quality within the university at a level of specificity and reliability high enough to drive value budgeting poses an additional challenge.

Value budgeting requires reasonably robust measures if it is to allocate dollar rewards for improvements in quality. Unless the university can measure something, it cannot fairly or effectively reward its improvement. Incentives and the behaviors that they generate require clear measurement.

Although quality and productivity receive equal treatment in value budgeting, they do not in fact play the same role in institutional improvement. Quality is the main event because the best universities want the highest quality students, faculty, and educational and research programs. Quality is an end in itself, and in theory, perhaps, an institution with one faculty member and one student, both of the highest quality in the nation, could be satisfied. This is the theory of extremes that drives sports and other winner-take-all enterprises. In universities, however, this approach is not effective. Universities do not have seasons, and they do not win championships for their academic performance.

Universities serve as stable, permanent, and continuing generators of knowledge and learning and of their associated services and benefits to society. What matters to universities is sustained high performance in both productivity and quality. Productivity speaks to the university's commitment to deliver a substantial volume of goods and services in exchange for the investment of individuals, state and federal governments, and other support groups. Quality speaks to the university's commitment to deliver products and services of nationally and internationally competitive quality. The institution's success as an enterprise comes from its ability to compete in the marketplace for the resources needed to enhance its quality. The relationship between productivity and quality often escapes academic observers, primarily because they underestimate the importance of both in acquiring the money needed to compete in the national and international quality marketplace.

Value budgeting approaches quality from two perspectives: national and local. Value budgeting makes the academic judgment that quality for research universities must be measured in a national context. Quality is not an abstraction but a specific measurement against a national standard. Each guild has its own national standards of quality, and what counts as quality for one guild may well be irrelevant for another.

To take a simple example, historians look to books and book-length monographs as the primary quality products they evaluate. Journal articles are significant, but they do not substitute for the book. In scientific fields, however, the journal article is the primary vehicle for demonstrating research quality. In some fields single-author publications determine quality, while in others multiple-author publications are the norm. Science fields measure grants, while fine arts, theater, and music look to exhibits and performances. The definitions of quality multiply along with the expansion of fields and subfields within each guild and its disciplines.

Although this may appear hopeless, academics are quite good at evaluating the quality of their colleagues' work in the same fields at other institutions. The process is sometimes opaque to outsiders, but academics know how to measure quality, and if pushed, they can make the evaluation procedure explicit.

If a university adopts value budgeting and provides significant rewards for demonstrated quality improvement, the faculty will develop effective quality measurement. Not perfect, of course, but close enough to provide a fair basis for distributing rewards. Experience demonstrates that quality measurement requires two dimensions: an external benchmark against the national competition and an internal measurement of improvement.

Ideally, all quality measurement would involve national benchmarking. In a national benchmark, the university measures the college or program relative to the best of its type in the nation. Some think the goal is to measure against one's peers, but that is not an appropriate benchmark for universities that compete in the national marketplace for top faculty talent. Peer benchmarking tends to produce comparisons against relatively easy competition, making the unit in question look artificially better than it actually is on a national level. Instead, universities must insist on national benchmarking against standards that reach into the top competitive levels.
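
To make the contrast concrete, here is a minimal Python sketch using invented numbers; the scores and program counts are hypothetical, not data from any real field. It shows how a score that dominates a self-selected peer group can still sit well down the national distribution:

```python
# Hypothetical illustration: one unit's score compared against a national
# field versus a smaller, self-selected peer group. All numbers invented.

national = [120, 340, 95, 410, 560, 210, 180, 730, 305, 265]  # national programs
peers = [95, 120, 180, 210]                                   # chosen "peers"
unit_score = 220

def percentile(score, field):
    """Share of programs in `field` that the unit's score meets or exceeds."""
    return 100 * sum(1 for s in field if s <= score) / len(field)

print(f"vs. national field: {percentile(unit_score, national):.0f}th percentile")
print(f"vs. chosen peers:   {percentile(unit_score, peers):.0f}th percentile")
# The peer comparison flatters the unit; the national one locates it honestly.
```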

Benchmarking is an expensive and time-consuming process. Most colleges, and their programs and departments, know what matters in determining quality in their fields (number of publications, number of grants, type of awards, number of citations), but the data for a benchmark on these elements only sometimes exist in an easily acquired, consistent, and comparative framework. In many cases colleges and departments have to acquire the data in a consistent form from their counterparts and present them to demonstrate their relative national quality. The cost and time involved mean that units cannot undertake the project every year. A three-year cycle for national benchmarking that involves one-third of a university's units each year is probably about as much as it is reasonable to expect.
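
The rotation itself is simple arithmetic. A sketch of the one-third-per-year cycle, with hypothetical unit names and an arbitrary start year:

```python
# A sketch of the three-year benchmarking rotation described above:
# one-third of the units benchmark each year, so every unit is
# benchmarked once per cycle. Unit names and start year are hypothetical.

units = ["History", "Physics", "Chemistry", "Music", "Economics",
         "Biology", "English", "Mathematics", "Theater"]

def rotation(units, start_year=2018, cycle=3):
    """Assign each unit to one year of a repeating benchmarking cycle."""
    schedule = {}
    for i, unit in enumerate(units):
        schedule.setdefault(start_year + i % cycle, []).append(unit)
    return schedule

for year, cohort in sorted(rotation(units).items()):
    print(year, cohort)
```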

While the benchmarks set national levels of performance, they do not serve as well for driving value budgeting because they lag actual performance changes by the time required to develop the measurements (anywhere from one to three years). Value budgeting is an annual, incremental incentive system, and its effectiveness depends on a short cycle between performance change and the delivery of incentive rewards. The solution to this problem is the addition of internal measures of quality improvement.

Internal measures of quality take the same measures identified for national benchmarking and track an individual unit's performance from year to year, identifying and rewarding improvements as they occur. This provides immediate feedback and reward for improvement, enhancing the usefulness of the incentives. In addition, the internal measures can often be more detailed and more finely tuned than what external benchmarks provide.
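
As a concrete illustration of that internal loop, the following sketch tracks two hypothetical units' indicators from one year to the next and splits an incentive pool in proportion to measured gains. The metric names, the improvement rule, and the dollar figure are assumptions for the example, not a prescription from the text:

```python
# A minimal sketch of the internal measurement loop: track each unit's
# own quality indicators year over year and distribute an incentive pool
# in proportion to measured improvement. All figures are hypothetical.

last_year = {"History": {"books": 6, "articles": 22},
             "Physics": {"articles": 88, "grants": 14}}
this_year = {"History": {"books": 9, "articles": 20},
             "Physics": {"articles": 96, "grants": 15}}

def improvement(unit):
    """Sum of relative year-over-year gains across the unit's own metrics."""
    gains = 0.0
    for metric, old in last_year[unit].items():
        new = this_year[unit][metric]
        gains += max(0.0, (new - old) / old)  # reward only improvement
    return gains

pool = 100_000  # annual incentive dollars, an arbitrary figure
scores = {u: improvement(u) for u in this_year}
total = sum(scores.values())  # assumes at least one unit improved
for unit, score in scores.items():
    print(f"{unit}: ${pool * score / total:,.0f}")
```

Each unit is measured against its own prior year on its own guild's metrics, so the reward arrives within the annual budget cycle rather than waiting one to three years for an external benchmark.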

Even though this process appears relatively straightforward, it is not. Many colleges and departments simply do not want to participate in a program that makes the issue of quality explicit. Many academics prefer to speak about quality in the abstract rather than measure it explicitly, either because they worry that their self-image may not match a data-driven evaluation of quality, or because they do not believe in the value budgeting process itself and find that the best attack against it is to deny that it can support quality improvement.

Whatever the motivation, two common errors often appear in the development of quality measures for colleges and schools.

Mistaking resources for quality

In this error, the unit equates quality with the average salaries of its faculty, with the number of faculty in the department, or with the materials accessible through the library for its field.

These are all important issues, but they do not speak to quality; they speak to the resources available in support of quality. If we say the faculty are underpaid, which may well be true, that says nothing about the quality of the work these underpaid faculty do. It is possible for faculty to be underpaid and nonetheless produce high quality, just as it is possible for faculty to be overpaid and produce low quality. The issue for value budgeting is the quality produced. Value budgeting does not tell the unit how to produce quality; instead, it measures and rewards the actual annual improvement in quality.

Asserting that no measurable item of quality exists

In this error, the unit claims that the work done by the faculty is so esoteric or indefinable that it cannot be measured in the quantitative way value budgeting requires. This is the Luddite response to quality improvement.

In every case identified, this proves, on closer examination, to be false. The reason is simple: if there were no consensus in the guild about what defines high quality, then there would be no reason for the university to spend scarce resources in pursuit of what cannot be reliably measured. In practice, every university guild knows what defines quality. Take, for example, the fine arts. Some will say that the quality of a musical performance, a sculpture, or a ballet defies quantification. Perhaps. But it is also the case that those who appreciate music, the visual arts, and ballet have a clear set of criteria for determining excellence: a review in the New York Times, shows at the Metropolitan Museum of Art, performances at Lincoln Center, awards from nationally renowned piano competitions, and so on. The elements that define the marketplace for artistic quality are as easily counted as journal articles in prestigious refereed journals. Once a guild discovers that the rewards available for quality improvement require a reasonable and measurable definition of quality, it almost always finds a way to develop the measures and use them for internal improvement and external benchmarking.

In some fields, national guild associations collect data on publication rates and other defined elements of quality relevant to the field, and where these data exist, they help units demonstrate improvement.

Finally, we come to the very thorny question of teaching quality. This element of quality evaluation offers the greatest challenge and has the least satisfactory resolution. Teaching quality, in the aggregate, is quite difficult to measure. Partly this is because the range of teaching quality at first-rank universities tends to be very narrow: most teaching is good to very good. Partly it comes from the almost complete lack of any national guild standards for evaluating teaching quality. What we have in teaching is a competency floor. The guilds worry quite a bit about ensuring that all members are acceptable, even proficient, teachers, but they do not worry about defining outstanding teaching quality.

Teaching is a performance art as well as a learning experience. The student evaluations used in almost every university in the country have almost no utility in defining teaching quality. These surveys generate data that, on close inspection, tell very little about what the student learned in the course or how well the teacher taught it, although they often tell something about how much the student enjoyed the course or how wisely the student believes the instructor evaluated the student's performance. Because teaching evaluation is most reliable in assessing enjoyment, instructors can improve their evaluations by improving the popular elements of their performances as teachers, a practice that can, although need not, result in significantly lowered expectations for student learning. The aggregate learning of students in a department or college, a presumed outcome of teaching, defies measurement, although some institutions offer tests of critical thinking and other devices designed to discover what students have learned beyond what the grades in their courses signify.

Because the teaching outcomes of individual classes and student coursework are so difficult to aggregate, many colleges and units focus on the progress of students who choose particular undergraduate majors in their units. By tracking students' success in finding jobs, passing licensing examinations, going to graduate school, and similar post-instruction outcomes, units can identify indicators of success that may well reflect learning. Additionally, most units can speak with some authority about the quality of their graduate programs in terms of the qualifications of entering students (GRE and other entry tests such as the GMAT, MCAT, or LSAT). They can speak about postdoctoral appointments awarded in some fields, about pass rates on professional association examinations, and about other external references that speak to the marketability of graduates. The range of possible measures here is relatively wide, but the reliability of the teaching quality indicators is poor, reflecting the national inability to specify the measurable characteristics of quality teaching. National indicators related to teaching almost always fail because, in addition to the elements mentioned here, teaching is a local product with no national marketplace.

The discussion of teaching also raises the issue of student life. Many observers identify student life, student activities, and other non-classroom student interactions as critical elements of the learning provided by universities. Although considerable sympathy exists for this point of view, measures of quality and productivity for student life prove even more difficult to acquire than measures of academic classroom achievement. Professionals in student personnel administration can count participation rates, activities, and involvement, and they can run surveys that attempt to identify the before-and-after impacts of the college experience. In some cases the results serve the purpose of evaluating student life, but whether these activities contribute to academic learning is a question not so easily answered.

These indicators offer some hope for developing good measures of student services improvement, but for now few good measures that would support a system of reward exist. Partly this is because student affairs is much like other social services in that there is an infinite demand for its expertise. Universities can always use more counselors, student advisors, student activities coordinators, specialty service providers, or recreation centers. Establish a student service office and it creates its own clientele in a matter of days. At the same time, the incremental benefit to the university's missions of teaching and research from any additional investment in these services is difficult to establish. For this reason, most student service activities end up as part of the support structure, evaluated in terms similar to physical plant, security, or parking services rather than as part of the educational enterprise. Additionally, few faculty in large research universities take an interest in this activity, and the guilds do not regard it as part of their primary concern. How to measure the quality of student life and create a reward structure that supports that performance remains an unresolved question.

Under value budgeting, every incentive points in the same direction: the measurement of teaching and research performance in quality and productivity, and the reward for improvement. Because the incentives all point in the same direction, over time the quality of the measurements themselves improves as colleges and units learn to measure and receive rewards for improvement.


Measurement of quality never comes without controversy, however, and this topic generates a wide range of issues:

  • What forms of gaming can undermine value budgeting systems for measuring and rewarding quality, and how can universities address such gaming?
  • Why is teaching so difficult to evaluate for quality? Does it make any difference?
  • Why are academics so resistant in many fields to measuring and rewarding quality?
  • What other methods besides financial rewards might motivate colleges and units to perform at a higher level?
  • How can student life become a mainstream element in the university's definition of teaching?
