What is Rasch Analysis?

Rasch analysis was chosen to analyze the MATRICx data because it provides a transparent approach to testing unidimensionality and makes explicit assumptions about the conceptual coherence of the underlying trait being captured. In addition, it provides specific metrics for optimizing the rating scale structure and rescaling raw scores to a continuous measure. Rasch analysis also provides insight into the targeting of items to participants, that is, how well the items reflect the sample’s readiness to collaborate. Data were analyzed to examine the rating scale step structure, construct validity, internal consistency, item performance, and dimensionality, and to establish the item and rating scale step calibrations. In general, Rasch analysis proceeds in a series of iterative steps, ensuring at each step that the overall test psychometrics improve. We initially examined the rating scale structure to ensure that all steps advanced monotonically, then examined item fit and dimensionality, removing misfitting items and examining the extent to which measurement precision improved as a result.

Rasch analysis converts ordinal item responses into log-odds units (logits). Higher logit values reflect greater readiness to collaborate (for persons) and a greater need for a collaborative stance (for items). An important assumption of the Rasch model is that the items cohere to express a single underlying trait (unidimensionality), that more challenging items are more challenging for all respondents, and that the respondents most ready to collaborate will score more highly on the more challenging items. Analyses were conducted using WINSTEPS software to apply a rating scale model.
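For reference, the rating scale model that WINSTEPS estimates can be written in standard Rasch notation (the symbols below are generic, not values from the MATRICx analysis):

\[
P(X_{ni} = x) = \frac{\exp\left[\sum_{k=0}^{x}\left(\theta_n - \delta_i - \tau_k\right)\right]}{\sum_{m=0}^{K}\exp\left[\sum_{k=0}^{m}\left(\theta_n - \delta_i - \tau_k\right)\right]}, \qquad \tau_0 \equiv 0,
\]

where \(\theta_n\) is person \(n\)'s readiness to collaborate in logits, \(\delta_i\) is the calibration of item \(i\) (its need for a collaborative stance), and \(\tau_k\) are the rating scale category thresholds shared by all items.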

Category Threshold Order: The logit values of the rating scale category thresholds should proceed in order across the dimension, with each successive category reflecting greater readiness to collaborate, for each of the items.
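Formally, the estimated thresholds should advance monotonically:

\[
\tau_1 < \tau_2 < \cdots < \tau_{K-1}.
\]

As a purely hypothetical illustration, a four-category scale with thresholds of \(-1.5\), \(0.2\), and \(1.4\) logits is ordered, so each category is the most probable response over some interval of the readiness continuum; a disordered pair (for example, a second threshold estimated below the first) would signal a category that is never the most probable response and that may need to be collapsed with an adjacent category.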

Item Hierarchy: The logit ordering of the items from easiest to endorse to hardest to endorse should make conceptual sense and should represent a broad range of the construct, without multiple items grouped at the same level and without “gaps” in the construct where there are no items to measure person readiness.
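As a hypothetical illustration (these calibrations are not MATRICx values), five items located at roughly \(-2.0\), \(-0.9\), \(0.1\), \(1.0\), and \(2.1\) logits would spread evenly across the continuum, whereas three items clustered at \(0.0\) logits with none above \(1.0\) would be redundant in the middle of the construct and leave a gap at the high-readiness end.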

Item Fit and Unidimensionality: The degree to which each item fits the measurement model is examined using infit and outfit mean squares. Fit statistics indicate the degree to which each person's response string matches the pattern expected given his or her readiness level. Infit (information-weighted) mean square (MnSq) is a chi-square-based statistic that reflects unexpected responses to items located near the person's level. For this study, we considered acceptable infit MnSq values to be between 0.7 and 1.3, with standardized z-scores (zstd) <2.0. Values higher than expected indicate that an item is most likely capturing a different construct. Additionally, a principal component analysis of the residuals was used to further examine dimensionality.
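In standard Rasch notation (not study-specific values), both fit statistics are built from the standardized residual \(z_{ni} = (x_{ni} - E_{ni})/\sqrt{W_{ni}}\), where \(E_{ni}\) and \(W_{ni}\) are the modeled expectation and variance of person \(n\)'s response to item \(i\):

\[
\text{Outfit MnSq}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2}, \qquad
\text{Infit MnSq}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}} = \frac{\sum_{n=1}^{N}\left(x_{ni}-E_{ni}\right)^{2}}{\sum_{n=1}^{N} W_{ni}}.
\]

Both statistics have an expected value of 1.0. The information weighting gives infit its sensitivity to unexpected responses from persons located near the item's calibration, whereas outfit is more sensitive to outlying responses from persons located far from the item.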

In general, the unexplained variance captured by the first contrast should be less than 10%, and the eigenvalue for that contrast should be small, generally <2.0. In addition, items with loadings greater than 0.3 were considered to load on a contrast.
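Because the eigenvalue of a contrast is expressed in item units, it can be read as the strength of the contrast in number of items. As a hypothetical worked example (the item count is illustrative, not the MATRICx item count), a first contrast with an eigenvalue of 2.0 in a 20-item scale has the strength of about two items and accounts for 2.0/20 = 10% of the residual (unexplained) variance, right at the criterion; an eigenvalue of 1.5 in the same scale accounts for 7.5% and would be acceptable.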

Precision and Reliability: The person separation index (PSI) describes the degree to which the tool distinguishes among respondents; the better a tool distinguishes among respondents, the more precise it can be considered to be. Generally, the lowest acceptable PSI for a tool is 2.0, which indicates that the respondents can be grouped into three distinct strata (or that there are two points of division). The separation reliability (SR) is a reliability coefficient describing the ratio of true to observed variance of the measures, with values interpreted in the same way as Cronbach’s alpha. An SR of 0.90 or higher is desired.
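These indices are related by standard formulas: the separation index is \(G = \mathrm{SD}_{\text{true}}/\mathrm{RMSE}\) (the error-adjusted spread of the person measures divided by their average measurement error), the separation reliability is \(R = G^{2}/(1+G^{2})\), and the number of statistically distinct strata is \(H = (4G+1)/3\). A PSI of 2.0 therefore corresponds to \(R = 4/5 = 0.80\) and \(H = 3\) strata, while an SR of 0.90 requires a separation of about 3.0.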

Targeting and Person Fit: Targeting describes the match of the distribution of the items to the respondents and is evaluated by examining the person mean measure and standard deviation relative to the item mean and standard deviation. By default, the items are always centered on a mean calibration of 0.0. If the person mean is higher than the item mean, then the respondents have greater readiness to collaborate than is reflected in the test items.
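As a worked illustration (the value is hypothetical), with the items centered at 0.0 logits, a person mean of +1.0 logit with a person standard deviation comparable to the item standard deviation would indicate that the sample's average readiness sits about one logit above the average item demand, so the items would be relatively easy for this sample to endorse; a person mean near 0.0 with overlapping distributions would indicate well-targeted items.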