Pilot Instructor Rater Training: The Utility of the Multifacet Item Response Theory Model

In this article, we use a multifacet measurement technique, the multifacet Rasch model, to analyze the results of an interrater reliability (IRR) training program. Our approach is an alternative to the procedures (congruency, consistency, agreement [rwg], sensitivity, and systematic differences) currently used during IRR training within the airline industry. We believe that this multifacet procedure can improve the quality of pilot instructor training by giving pilot instructors important information that is not available with other techniques. We chose the multifacet Rasch method over generalizability (G) theory, another multifacet technique.

Like multifacet Rasch analysis, G-theory provides information about facets (pilot instructors, videotapes of aircrews used in IRR training, and line operational simulation [LOS] grade sheets) and their interactions with one another. However, G-theory partitions the variance attributable to each of these facets using an analysis of variance (ANOVA) framework and thus focuses on groups as the unit of analysis (i.e., whether pilot instructors as a group are reliable, as opposed to how a particular instructor undergoing IRR training performs). Unlike the multifacet Rasch model, G-theory is a classical test theory model.
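To make the variance-partitioning idea concrete, the following is a minimal sketch, not the procedure used in any particular G-study. It assumes a fully crossed design in which every instructor grades every crew exactly once, and it estimates variance components from the mean squares of a two-way ANOVA without replication. The function name, the data layout, and the example grades are hypothetical.

```python
def variance_components(ratings):
    """Estimate variance components for a crews x instructors design.

    ratings[c][i] is the grade instructor i gave crew c (assumed layout:
    fully crossed, one grade per cell, at least 2 crews and 2 instructors).
    """
    n_c = len(ratings)
    n_i = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_c * n_i)
    crew_means = [sum(row) / n_i for row in ratings]
    instr_means = [sum(ratings[c][i] for c in range(n_c)) / n_c
                   for i in range(n_i)]

    # Sums of squares for the two main effects and the residual.
    ss_crew = n_i * sum((m - grand) ** 2 for m in crew_means)
    ss_instr = n_c * sum((m - grand) ** 2 for m in instr_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_resid = ss_total - ss_crew - ss_instr

    ms_crew = ss_crew / (n_c - 1)
    ms_instr = ss_instr / (n_i - 1)
    ms_resid = ss_resid / ((n_c - 1) * (n_i - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    return {
        "crew": max((ms_crew - ms_resid) / n_i, 0.0),
        "instructor": max((ms_instr - ms_resid) / n_c, 0.0),
        "residual": ms_resid,
    }


# Made-up grades: 3 crews (rows) rated by 3 instructors (columns).
example = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
components = variance_components(example)
```

Note that the output describes instructors as a group (one variance estimate for the whole instructor facet); it says nothing about which individual instructor is lenient or severe.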

Within classical test theory, an aircrew’s performance ratings on a LOS are a function of its true score (i.e., true performance) and error (i.e., instructor variability and grade sheet variability). The larger the error component, the less reliable the LOS grades. Generalizability studies partition this error term into identifiable components. Once the potential sources of error and the magnitude of each source’s contribution have been identified, steps can be taken to reduce error and improve reliability. Inasmuch as high reliability is the goal, reducing the variability caused by instructors, grade sheets, and crews is an important component in achieving that goal. Within the air-carrier industry, IRR training has been the primary strategy used for error reduction.
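The relationship between error and reliability described above can be illustrated with a small sketch. It assumes the standard classical test theory definition of reliability as true-score variance divided by total observed variance, with independent error sources whose variances add. The function name and all variance values are hypothetical and chosen only for illustration.

```python
def reliability(true_var, error_vars):
    """Reliability = true-score variance / (true-score + total error variance).

    error_vars maps each error source (e.g., "instructor") to its variance.
    Sources are assumed independent, so their variances simply add.
    """
    total_error = sum(error_vars.values())
    return true_var / (true_var + total_error)


# Made-up variances: the instructor component is the largest error source.
before = reliability(4.0, {"instructor": 1.5, "grade_sheet": 0.5})

# Suppose IRR training shrinks the instructor component (hypothetical values).
after = reliability(4.0, {"instructor": 0.5, "grade_sheet": 0.5})
```

With these illustrative numbers, reliability rises from about 0.67 to 0.80 when only the instructor component is reduced, which is the logic behind targeting the largest identifiable error source first.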

The multifacet Rasch technique is an item response theory (IRT) model that focuses on individual elements of the LOS assessment process. This model is useful for pilot instructor rater training because it provides individual-level, as opposed to group-level, information that can be directly fed back to individual pilot instructors. Information about the LOS grade sheet and the videotapes used for practice and feedback can also be gleaned from this analysis.
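As an illustration of how an individual-level model of this kind works, the sketch below uses the common rating-scale formulation of the many-facet Rasch model, in which the log-odds of a crew receiving rating k rather than k - 1 equals crew ability minus item difficulty minus instructor severity minus the threshold for category k. This formulation is an assumption here, since the text above does not state the equation, and the function names and parameter values are hypothetical.

```python
import math


def category_probabilities(crew_ability, item_difficulty,
                           rater_severity, thresholds):
    """Return P(rating = k) for k = 0..len(thresholds).

    Assumed model: log(P_k / P_{k-1}) = ability - difficulty
                                        - severity - thresholds[k-1],
    with all parameters on the same logit scale.
    """
    eta = crew_ability - item_difficulty - rater_severity
    # Cumulative log-numerators: category 0 is the reference (0.0).
    log_num = [0.0]
    for f_k in thresholds:
        log_num.append(log_num[-1] + eta - f_k)
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_num)
    weights = [math.exp(v - max_log) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probs):
    """Mean rating implied by a list of category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Same crew and grade sheet item, two instructors of differing severity
# (made-up logit values on a four-category scale).
scale = [-1.0, 0.0, 1.0]
lenient = category_probabilities(0.5, 0.0, -0.5, scale)
severe = category_probabilities(0.5, 0.0, 1.0, scale)
```

Because severity enters as a separate parameter for each instructor, the same crew performance yields a lower expected rating from the more severe instructor, and that instructor-specific estimate is the kind of information that can be fed back to an individual during training.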