Seven well-established measurement contracts. Pick the one that matches your research question, not the one that looks coolest.
🏆
Blind comparison (RANKING)
When to use: You want a defensible ranking of items (designs, stimuli, logos, candidate options) free of order effects and anchoring bias.
What it measures: Pairwise/quad votes aggregated into Elo and Bradley-Terry rankings, with standard errors and a publishable-threshold gate.
References
- Bradley & Terry (1952) Biometrika
- Elo (1978) The Rating of Chessplayers
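As a rough illustration of how pairwise votes can turn into a Bradley-Terry ranking, the sketch below fits item strengths with the classic iterative (MM) update. It is not the platform's implementation: the function name `bradley_terry`, the `(winner, loser)` vote format, and the stopping rule are assumptions made for this example, and standard errors and the publishable-threshold gate are left out.

```python
from collections import defaultdict


def bradley_terry(votes, max_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) votes.

    Hypothetical helper. Uses the MM update
        p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ]
    and renormalises so strengths sum to 1. Assumes every item wins
    at least once and that winner != loser in every vote.
    """
    wins = defaultdict(int)  # W_i: total wins per item
    n = defaultdict(int)     # n_ij: comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        wins[loser] += 0     # register the loser as an item
        n[frozenset((winner, loser))] += 1

    items = sorted(wins)
    p = {i: 1.0 / len(items) for i in items}
    for _ in range(max_iter):
        new_p = {}
        for i in items:
            denom = sum(
                count / (p[i] + p[j])
                for pair, count in n.items() if i in pair
                for j in pair - {i}
            )
            new_p[i] = wins[i] / denom
        total = sum(new_p.values())
        new_p = {i: v / total for i, v in new_p.items()}
        converged = max(abs(new_p[i] - p[i]) for i in items) < tol
        p = new_p
        if converged:
            break
    return p


# Illustrative data: A beats B in three of four votes.
votes = [("A", "B")] * 3 + [("B", "A")]
strengths = bradley_terry(votes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking, strengths)
```

With only two items the fitted strengths reduce to the observed win proportions, which makes this a convenient sanity check before ranking larger sets.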
⚡
Reaction time (REACTION_TIME)
When to use: Whenever the dependent variable is speed: Stroop, Simon, flanker, go/no-go, lexical decision, primed classification.
What it measures: Trial-by-trial RT and accuracy, per-condition means, and a timing-audit grade (Reimers & Stewart 2015) on every published study.
References
- MacLeod (1991) Psychological Bulletin – half a century of Stroop
- Simon & Rudell (1967) Journal of Applied Psychology
- Reimers & Stewart (2015) Behavior Research Methods – browser RT timing
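The per-condition summary can be pictured with a small sketch. The trial format (dicts with `condition`, `rt_ms`, `correct`) and the choice to average RT over correct trials only are assumptions for this example, not a description of the actual export.

```python
from collections import defaultdict
from statistics import mean


def summarise_rt(trials):
    """Per-condition mean RT (correct trials only) and accuracy.

    Hypothetical trial format: dicts with 'condition', 'rt_ms', 'correct'.
    """
    by_condition = defaultdict(list)
    for trial in trials:
        by_condition[trial["condition"]].append(trial)

    summary = {}
    for condition, rows in by_condition.items():
        correct_rts = [r["rt_ms"] for r in rows if r["correct"]]
        summary[condition] = {
            # No correct trials means there is no RT to average.
            "mean_rt_ms": mean(correct_rts) if correct_rts else None,
            "accuracy": sum(r["correct"] for r in rows) / len(rows),
        }
    return summary


# Illustrative Stroop-style trials (made-up numbers).
trials = [
    {"condition": "congruent", "rt_ms": 520, "correct": True},
    {"condition": "congruent", "rt_ms": 560, "correct": True},
    {"condition": "incongruent", "rt_ms": 640, "correct": True},
    {"condition": "incongruent", "rt_ms": 700, "correct": False},
    {"condition": "incongruent", "rt_ms": 680, "correct": True},
]
summary = summarise_rt(trials)
stroop_effect = (
    summary["incongruent"]["mean_rt_ms"] - summary["congruent"]["mean_rt_ms"]
)
print(summary, stroop_effect)
```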
🎯
Categorical perception (DISCRIMINATION)
When to use: You want to test whether a linguistic or perceptual category boundary sharpens discrimination, using match-to-sample trials that compare within-category and cross-category pairs.
What it measures: Accuracy, RT, and the cross-minus-within advantage. Stimuli are ΔE-controlled in CIELAB via the lab-generator so luminance does not confound the result.
References
- Winawer et al. (2007) PNAS – Russian blues
- Harnad (ed., 1987) Categorical Perception
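To make "ΔE-controlled" and "cross-minus-within advantage" concrete, here is a minimal sketch. It assumes the simple CIE76 colour difference (Euclidean distance in L\*, a\*, b\*); which ΔE formula the lab-generator actually uses is not stated here, and the function names and stimulus values are hypothetical.

```python
import math
from statistics import mean


def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in (L*, a*, b*)."""
    return math.dist(lab1, lab2)


def cross_minus_within(accuracy_by_type):
    """Mean cross-category accuracy minus mean within-category accuracy."""
    return mean(accuracy_by_type["cross"]) - mean(accuracy_by_type["within"])


# Three made-up stimuli at the same lightness (L* = 50), equally spaced
# along b*. Suppose the category boundary falls between s2 and s3.
s1, s2, s3 = (50, 0, -30), (50, 0, -36), (50, 0, -42)
within_pair_de = delta_e76(s1, s2)  # both stimuli in the same category
cross_pair_de = delta_e76(s2, s3)   # stimuli straddle the boundary
print(within_pair_de, cross_pair_de)

# Made-up per-participant accuracies for each trial type.
advantage = cross_minus_within({"cross": [0.9, 0.8], "within": [0.7, 0.8]})
print(advantage)
```

Because both pairs are the same distance apart and share one L\* value, any accuracy advantage for the cross-category pair cannot be attributed to a larger physical difference or to lightness.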
📏
Method of adjustment (METHOD_OF_ADJUSTMENT)
When to use: The dependent variable is a continuous error: the participant drags a stimulus until it matches a reference. Suits Müller-Lyer, Ebbinghaus, Ponzo, and other length/size illusions.
What it measures: Settled value vs reference, per-trial illusion magnitude in pixels, per-participant adjustment trajectory, group means with normative-range overlay.
References
- Fechner (1860) Elemente der Psychophysik – classical adjustment method
- Müller-Lyer (1889) Archiv für Anatomie und Physiologie
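A minimal sketch of the illusion-magnitude calculation, assuming magnitude is defined as settled value minus reference (in pixels) and that group means are averages of per-participant means. The tuple layout and the sign convention are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean


def illusion_magnitudes(trials):
    """Per-participant and group mean illusion magnitude in pixels.

    Hypothetical input: (participant_id, settled_px, reference_px) tuples.
    Positive values mean the participant overshot the reference.
    """
    per_participant = defaultdict(list)
    for participant_id, settled_px, reference_px in trials:
        per_participant[participant_id].append(settled_px - reference_px)

    participant_means = {
        pid: mean(errors) for pid, errors in per_participant.items()
    }
    group_mean = mean(list(participant_means.values()))
    return participant_means, group_mean


# Made-up trials against a 200 px reference line.
trials = [
    ("p1", 212, 200),
    ("p1", 208, 200),
    ("p2", 196, 200),
    ("p2", 204, 200),
]
participant_means, group_mean = illusion_magnitudes(trials)
print(participant_means, group_mean)
```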
🆎
Between-subjects A/B
When to use: Product or landing-page tests where each participant sees only one variant. Random assignment with counterbalancing, per-variant KPIs, and a built-in χ² test.
What it measures: Conversion rate, click-through, time-on-task per variant; confidence intervals and effect size.
References
- Kohavi, Tang & Xu (2020) Trustworthy Online Controlled Experiments
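For a two-variant conversion test, the χ² statistic can be computed by hand. This sketch assumes a 2×2 table with no continuity correction; whether the built-in test applies a correction is not stated here. The function name and the counts are made up.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (df = 1) on a 2x2 conversion table.

    Hypothetical helper, no Yates correction.
    Returns (chi2, p_value).
    """
    a, b = conv_a, n_a - conv_a  # variant A: converted, not converted
    c, d = conv_b, n_b - conv_b  # variant B: converted, not converted
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column of the table needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For df = 1 the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up result: 30/100 conversions on A, 45/100 on B.
chi2, p_value = chi_square_2x2(30, 100, 45, 100)
print(round(chi2, 2), round(p_value, 4))
```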
📋
Survey
When to use: Attitudes, demographics, patient-reported outcomes (PROs). Drop-in question library for Likert, binary, numeric, and free-text items, with piped logic and jump rules.
What it measures: Per-item descriptives, Cronbach's α, item-total correlations, exportable to SPSS/R/CSV.
References
- DeVellis (2017) Scale Development, 4th ed.
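Cronbach's α follows directly from the item variances and the variance of the total score. The sketch below uses the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). The data layout (one row per respondent, one column per item) and the function name are assumptions for this example.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    Hypothetical helper: `responses` is a list of equal-length rows,
    one row per respondent. Uses sample variances throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses: four respondents, two items.
responses = [[4, 5], [2, 3], [3, 3], [5, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```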
📝
Vignette study
When to use: Factorial moral/judgement studies: short scenarios with manipulated factors, followed by attitude or decision items. Includes AI-assisted scenario generation and text-to-speech (TTS) narration.
What it measures: Full factorial designs with attention checks; item-level scores and between-condition contrasts.
References
- Atzmüller & Steiner (2010) Methodology – experimental vignette studies
- Aguinis & Bradley (2014) Organizational Research Methods
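A full factorial vignette design is every combination of factor levels. As a small sketch, assuming factors are given as a dict of level lists (the factor names and levels here are invented):

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts.

    Hypothetical helper: `factors` maps a factor name to its levels.
    """
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


# Made-up 2 x 3 moral-judgement design.
factors = {
    "intent": ["accidental", "deliberate"],
    "harm": ["none", "minor", "severe"],
}
cells = full_factorial(factors)
for cell in cells:
    print(cell)
```

Each dict is one vignette condition; a 2 × 3 design yields six cells, and adding a factor multiplies the count by its number of levels.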