mixedsubjectsirt - Item Response Theory Calibration with a Mixed Subjects Design
Integrates item responses generated by large language models
into psychometric calibration studies through a mixed-subjects
design for unidimensional two-parameter and one-parameter
logistic item response theory models. Human pilot responses are
augmented with model-generated responses using a
prediction-powered inference estimator (Angelopoulos, Bates,
Fannjiang, Jordan and Zrnic (2023)
<doi:10.1126/science.adi6000>; Angelopoulos, Duchi and Zrnic
(2023) <doi:10.48550/arXiv.2311.01453>) adapted to marginal
maximum-likelihood estimation, following the mixed-subjects
design of Broska, Howes and van Loon (2025)
<doi:10.1177/00491241251326865>. The estimator is anchored to
the human responses and is asymptotically unbiased for the
human item parameters at any tuning weight; the weight on the
synthetic responses is chosen to minimize the risk propagated
to ability scores, which down-weights uninformative or biased
generated responses. Louis-corrected sandwich standard errors,
ability scoring, cross-fitted tuning, and scale linking are
also provided.
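
The anchoring idea can be illustrated on a much simpler target than item parameters. The following is a minimal sketch in Python, not the package's interface or its marginal maximum-likelihood estimator: it applies a prediction-powered estimator to a single proportion-correct, and picks the tuning weight by minimizing variance rather than propagated ability-score risk. All function names, the simulation design, and the clipping of the weight to [0, 1] are assumptions made for illustration.

```python
import random
from statistics import fmean


def ppi_mean(human, synth_paired, synth_extra, lam):
    """Prediction-powered estimate of the human mean.

    human        : human responses (0/1) on the pilot sample
    synth_paired : synthetic responses for the same pilot units
    synth_extra  : synthetic responses for additional units with no human data
    lam          : tuning weight on the synthetic responses
    The correction term has expectation zero, so the estimate targets the
    human mean for any fixed lam.
    """
    return fmean(human) + lam * (fmean(synth_extra) - fmean(synth_paired))


def variance_minimizing_weight(human, synth_paired, synth_extra):
    """Plug-in weight cov(human, synth) / ((1 + n/N) * var(synth)).

    Assumption: variance is used as the criterion here; the result is
    clipped to [0, 1] for this illustration.
    """
    n, big_n = len(human), len(synth_extra)
    mean_h, mean_s = fmean(human), fmean(synth_paired)
    cov = sum((h - mean_h) * (s - mean_s)
              for h, s in zip(human, synth_paired)) / (n - 1)
    pooled = synth_paired + synth_extra
    mean_p = fmean(pooled)
    var = sum((s - mean_p) ** 2 for s in pooled) / (len(pooled) - 1)
    if var == 0.0:
        return 0.0  # constant synthetic responses carry no information
    return min(1.0, max(0.0, cov / ((1.0 + n / big_n) * var)))


def simulate(rng, n, big_n, p_human=0.6, agreement=0.8, p_fallback=0.9):
    """Hypothetical data: the synthetic responder copies the human answer
    with probability `agreement`, otherwise answers correctly with
    probability `p_fallback` (a deliberately biased fallback)."""
    def synth(h):
        return h if rng.random() < agreement else float(rng.random() < p_fallback)

    human = [float(rng.random() < p_human) for _ in range(n)]
    synth_paired = [synth(h) for h in human]
    synth_extra = [synth(float(rng.random() < p_human)) for _ in range(big_n)]
    return human, synth_paired, synth_extra


if __name__ == "__main__":
    rng = random.Random(0)
    human, paired, extra = simulate(rng, n=200, big_n=2000)
    lam = variance_minimizing_weight(human, paired, extra)
    print("human-only estimate:", fmean(human))
    print("chosen weight:", lam)
    print("prediction-powered estimate:", ppi_mean(human, paired, extra, lam))
```

In this simulation the synthetic responses have a higher mean than the human ones, so pooling them naively would bias the estimate, whereas the correction term cancels that bias in expectation. Setting `agreement=0.0` makes the synthetic responses uninformative, and the chosen weight then shrinks toward zero.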