
The following is an excerpt from Chapter 8: “Inter-rater Reliability” of Dr. Dianna Whitlock’s published book, “Teacher Evaluation as a Growth Process.” To purchase the full book, visit Amazon or Barnes&Noble.

Part 1, Chapter 8: Inter-rater Reliability

A description of a strong teacher evaluation system is not complete without a discussion of inter-rater reliability. Inter-rater reliability is vital to employee evaluation practice because it helps eliminate biases and sustain transparency, consistency, and impartiality (Tillema, as cited in Soslau & Lewis, 2014, p. 21). In addition, a data-driven system of evaluation that creates a feedback-rich culture is considered best practice. Examination by school leadership of quantitative trend data and comparison of evaluators is the essence of professional growth (Graham, Milanowski, & Miller, as cited in Soslau & Lewis, 2014, p. 39). Assurance of inter-rater reliability decreases biases and increases ethical practice in the evaluation procedure.

The purpose of ensuring inter-rater reliability is two-fold. First and most obvious, inter-rater reliability is the practice of ensuring that there is more than one set of eyes evaluating a teacher. Requiring multiple observations by various administrative staff increases inter-rater reliability. In a school with multiple administrators, or a principal and assistant principal, this is fairly easy to manage. Some smaller schools, however, have just one administrator per building. This makes it difficult for the building administrator not only to complete all evaluations, but also to have a second voice when gathering data during observations or making decisions on teacher effectiveness. Multiple observations conducted by each of these evaluators are crucial to sustained inter-rater reliability as well.

The second element of inter-rater reliability is ensuring that all evaluators are looking for the same traits of good teaching. The predetermined rubric is certainly helpful with this, but individual interpretation can lead to different understandings of what good instruction looks like. Guarding against this requires ongoing conversation and training of administrative teams.
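One way to see whether two evaluators are reading the rubric the same way is to compare the ratings they gave the same lessons. The sketch below is an illustration only and is not taken from the book: it assumes each evaluator recorded one rubric level per observed lesson, and it computes simple percent agreement along with Cohen's kappa, a widely used statistic that adjusts agreement for chance. The function names and the sample ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of lessons on which both evaluators chose the same rubric level."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both evaluators must rate the same non-empty set of lessons.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two evaluators, corrected for agreement expected by chance."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)

    # Chance agreement: for each rubric level, multiply the share of lessons
    # each evaluator assigned to that level, then sum over all levels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    levels = set(ratings_a) | set(ratings_b)
    expected = sum(counts_a[level] * counts_b[level] for level in levels) / (n * n)

    if expected == 1.0:
        # Both evaluators used a single identical level for every lesson.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (rubric levels 1 to 4) for eight lessons
# observed by both a principal and an assistant principal.
principal = [3, 3, 2, 4, 3, 2, 1, 3]
assistant = [3, 2, 2, 4, 3, 3, 1, 3]

print(f"Percent agreement: {percent_agreement(principal, assistant):.2f}")
print(f"Cohen's kappa: {cohens_kappa(principal, assistant):.2f}")
```

A kappa well below the raw percent agreement suggests that some of the apparent agreement is what chance alone would produce, which may be a signal that the administrative team needs further conversation about how the rubric is being interpreted.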


It is ultimately the responsibility of school leadership to analyze and track data to ensure that observations are being completed with fidelity. If this is not the case, then additional administrative training may be beneficial. Let’s look at a case study of a small rural school that is working to ensure inter-rater reliability:

South Harrison School Corporation is a small rural school district in southern Indiana. Its superintendent reached out to staff members of Standard For Success, which has provided the district with an online system for managing its teacher evaluation process for several years. The goal of the school district was increased inter-rater reliability, and it was in search of facilitators to guide this discussion among its administrative team. Keep in mind that this is a small district and that observations by multiple administrators could have been a challenge. Therefore, the building principals in the district were scheduled to trade buildings and act as a secondary evaluator in another building. This solved the problem of multiple evaluators in a building, but the district leadership team wanted to take things a step further. Their focus was the second element of inter-rater reliability: ensuring that all evaluators in the district shared the same understanding of what good instruction looks like in their district. For example, while the district rubric contained an indicator on student engagement, what does student engagement look like?

To continue reading, click here to purchase the full book.
