Why every methodological decision matters

When to use each paradigm, what it measures, and live demonstrations of how bias, low power, and missing counterbalancing turn a clean study into a confidently wrong one. Every section links to a one-click template you can clone and adapt.

Seven paradigms

Seven well-established measurement contracts. Pick the one that matches your research question, not the one that looks coolest.

Blind comparison (RANKING)

When to use: When you want a defensible ranking of items — designs, stimuli, logos, candidate options — free of order effects and anchoring bias.

What it measures: Pairwise or four-way (quad) votes aggregated into Elo and Bradley-Terry rankings, with standard errors and a publishable-threshold gate.
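
As a rough illustration of the Elo half of that aggregation, the sketch below applies the standard Elo update to a single pairwise vote. The K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration, not SciBLIND's actual settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)  # k is an assumed K-factor
    # Whatever A gains, B loses, so the rating pool stays constant.
    return rating_a + delta, rating_b - delta
```

Two items that start level at 1500 move to 1516 and 1484 after one vote. An upset against a much higher-rated item moves both ratings further.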

References

  • Bradley & Terry (1952) Biometrika
  • Elo (1978) The Rating of Chessplayers

Reaction time (REACTION_TIME)

When to use: Whenever the dependent variable is speed: Stroop, Simon, flanker, go/no-go, lexical decision, primed classification.

What it measures: Trial-by-trial RT + accuracy, per-condition means, timing-audit grade (Reimers & Stewart 2015) on every published study.

References

  • MacLeod (1991) Psychological Bulletin – half a century of Stroop
  • Simon & Rudell (1967) Journal of Applied Psychology
  • Reimers & Stewart (2015) Behavior Research Methods – browser RT timing

Categorical perception (DISCRIMINATION)

When to use: Match-to-sample within vs cross-category trials – tests whether a linguistic or perceptual category boundary sharpens discrimination.

What it measures: Accuracy, RT, cross-minus-within advantage; ΔE-controlled CIE Lab stimuli via the lab-generator so luminance never confounds the result.

References

  • Winawer et al. (2007) PNAS – Russian blues
  • Harnad (ed., 1987) Categorical Perception

Method of adjustment (METHOD_OF_ADJUSTMENT)

When to use: When the DV is a continuous error: the participant drags a stimulus until it matches a reference. Typical cases are the Müller-Lyer, Ebbinghaus, Ponzo and other length / size illusions.

What it measures: Settled value vs reference, per-trial illusion magnitude in pixels, per-participant adjustment trajectory, group means with normative-range overlay.

References

  • Fechner (1860) Elemente der Psychophysik – classical adjustment method
  • Müller-Lyer (1889) Archiv für Anatomie und Physiologie

Between-subjects A/B

When to use: Product or landing-page tests where each participant sees only one variant. Random assignment with counterbalancing, per-variant KPIs, and a built-in χ² test.

What it measures: Conversion rate, click-through, time-on-task per variant; confidence intervals and effect size.
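
To make the χ² comparison of two variants concrete, here is a minimal sketch of a Pearson χ² test on a 2×2 conversion table, using only the standard library. Whether SciBLIND's built-in test applies a continuity correction is not stated, so this version assumes none; the function name and arguments are illustrative.

```python
import math


def chi_square_2x2(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pearson chi-square (no continuity correction) for two variants.

    Returns (chi2, p_value) with one degree of freedom.
    """
    a, b = conv_a, n_a - conv_a  # variant A: converted / not converted
    c, d = conv_b, n_b - conv_b  # variant B: converted / not converted
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (e.g. nobody converted): no evidence of a difference.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 df, chi2 is a squared standard normal, so p = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, 30 of 100 conversions against 50 of 100 gives χ² ≈ 8.33, which is significant at the 1% level.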

References

  • Kohavi, Tang & Xu (2020) Trustworthy Online Controlled Experiments

Survey

When to use: Attitudes, demographics, patient-reported outcomes (PROs). Drop-in question library for Likert, binary, numeric, free-text, with piped logic and jump rules.

What it measures: Per-item descriptives, Cronbach's α, item-total correlations; exportable to SPSS/R/CSV.
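
For readers who want to see what the reliability figure is made of, this sketch computes Cronbach's α from raw item scores with the textbook formula α = k/(k−1) · (1 − Σ item variances / variance of totals). The input layout (one list per item) is an assumption for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, each holding one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
```

Two items scored `[1, 2, 3, 4]` and `[1, 2, 4, 3]` give α = 8/9 ≈ 0.89. Items that move together perfectly give α = 1.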

References

  • DeVellis (2017) Scale Development, 4th ed.

Vignette study

When to use: Factorial moral / judgement studies: short scenarios with manipulated factors, followed by attitude or decision items. AI-assisted scenario generation + TTS narration.

What it measures: Full factorial designs with attention checks; item-level scores and between-condition contrasts.
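
A full factorial design simply crosses every level of every factor. The sketch below enumerates the cells for a small design. The factor names and levels are hypothetical, invented only to show the shape of the output.

```python
from itertools import product


def full_factorial(factors):
    """Return one dict per cell of the full factorial design."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical 2 x 3 vignette design.
factors = {
    "intent": ["accidental", "deliberate"],
    "harm": ["none", "minor", "severe"],
}
cells = full_factorial(factors)
```

Here `cells` holds six scenarios, one per combination. Adding a third two-level factor would double that, which is worth checking against your sample-size budget.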

References

  • Atzmüller & Steiner (2010) Methodology – experimental vignette studies
  • Aguinis & Bradley (2014) Organizational Research Methods

New · v2.11.8.0

Methodological demos

These are not »try a paradigm« demos – those live on /demo. These are demonstrations of why each methodological decision matters: drag a contaminant into a clean study and watch the result break.

Methodological demo

What position bias does to your ranking

Ranking is intact · ρ = 1.00

The measured order matches the truth: bias is small enough that Elo recovers the ranking cleanly.

Truth (ground-truth quality)

  1. Item A
  2. Item B
  3. Item C
  4. Item D
  5. Item E

Measured (Elo leaderboard)

  1. Item A
  2. Item B
  3. Item C
  4. Item D
  5. Item E
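
The agreement figure quoted for the two lists can be reproduced by hand. This sketch assumes ρ is Spearman's rank correlation between the true and measured orderings with no ties; the page does not say which coefficient the demo uses.

```python
def spearman_rho(truth, measured):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(truth) != sorted(measured):
        raise ValueError("both rankings must contain the same items")
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two items")
    position = {item: rank for rank, item in enumerate(measured)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((rank - position[item]) ** 2 for rank, item in enumerate(truth))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Identical orderings give ρ = 1 and a fully reversed leaderboard gives ρ = −1.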

How SciBLIND prevents this: every blind-comparison study runs Latin-square counterbalancing on item ordering by default, with the matchmaking layer guaranteeing each item appears in each slot equally often. The leaderboard you publish is the ρ ≈ 1.0 column on the left, not the ρ < 0.5 column you can drag yourself into above.

200 simulated comparisons · seed 42

In the works

Power-curve playgrounds

Queued

Sample-size × effect-size × α calculator with a live curve and SciBLIND-default warning thresholds — so you size your study before you run it, not after.
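
Until that playground ships, the underlying calculation can be approximated in a few lines. This sketch uses the normal approximation for a two-sided comparison of two independent group means. It is not SciBLIND's calculator, and the default α and power values are conventional assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per group for a two-sided, two-sample test (normal approximation).

    d is the standardized effect size (Cohen's d).
    """
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)
```

With the defaults, d = 0.5 needs 63 participants per group and d = 0.2 needs 393. An exact t-test calculation asks for slightly more.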

Replication failure replays

Queued

Animated walkthroughs of published studies that failed replication, with the methodological flaw highlighted and the SciBLIND-built equivalent shown side-by-side.

Stimulus-design previews

Queued

Latin-square counterbalancing as a grid; Elo matchmaking depth as a saturation heatmap; staircase convergence as an animated trace. The inner workings participants never see.

Effect-size reference cards

Queued

Interactive d / OR / r explorers with worked examples mapped to SciBLIND paradigms. Translate a literature claim into a sample-size budget in two clicks.
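
The translations between d, r and OR rest on a few standard conversion formulas. The sketch below shows two of them. Both are approximations: the d-to-r conversion assumes two equal-sized groups and the d-to-OR conversion assumes a logistic distribution. The function names are illustrative.

```python
import math


def d_to_r(d: float) -> float:
    """Cohen's d to a correlation r, assuming two equal-sized groups."""
    return d / math.sqrt(d * d + 4)


def d_to_odds_ratio(d: float) -> float:
    """Cohen's d to an odds ratio via ln(OR) = d * pi / sqrt(3)."""
    return math.exp(d * math.pi / math.sqrt(3))
```

A medium effect of d = 0.5 corresponds to r ≈ 0.24 and an odds ratio of about 2.5.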

Each kind ships when it can do its category justice. The alternative – four half-built demos – would be scientific theatre, and that's the opposite of what /learn is for.

Methods primer

Short, plain-English notes on the concepts researchers ask about most. No equations hidden; no jargon surfaced.

Web-browser RT timing — what you can and cannot measure

Jitter is dominated by display refresh and event-loop latency. On a display refreshing at ≥60 Hz (one frame lasts about 16.7 ms) and with our timing audit, differences of ≥25 ms reproduce cleanly; sub-10 ms effects need lab equipment.

Color ΔE and why it matters

Hand-picked hex codes give uneven perceptual distances. CIE Lab with fixed L* and uniform ΔE between swatches removes luminance confounds from any color discrimination study.
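
A small sketch of the idea, assuming the simple CIE76 definition of ΔE (Euclidean distance in L*a*b*). SciBLIND's lab-generator may well use a different formula, and the helper names here are hypothetical. The swatches sit on a hue circle at fixed L* and chroma, so neighbouring swatches are all the same ΔE apart.

```python
import math


def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in L*a*b* space."""
    return math.dist(lab1, lab2)


def swatches_on_hue_circle(l_star: float, chroma: float, count: int):
    """Evenly spaced hues at fixed L* and chroma, as (L*, a*, b*) tuples."""
    return [
        (
            l_star,
            chroma * math.cos(2 * math.pi * i / count),
            chroma * math.sin(2 * math.pi * i / count),
        )
        for i in range(count)
    ]
```

This sketch does not check whether the resulting colors fall inside the sRGB gamut, which a real stimulus generator would need to do before converting to hex.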

Signal detection theory in 5 minutes

Hit rate and false-alarm rate → d′ (sensitivity) and c (bias). Separates »good at the task« from »trigger-happy.« SciBLIND computes both on every DISCRIMINATION study.
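
The arrow in that summary hides two short formulas: d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2, where z is the inverse of the standard normal CDF. The sketch below computes both from raw counts. The log-linear correction for extreme rates is an assumption of this example, not necessarily what SciBLIND applies.

```python
from statistics import NormalDist


def sdt_measures(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d_prime, criterion_c) from raw trial counts."""
    # Log-linear correction (assumed here): keeps rates away from 0 and 1,
    # where the z-transform would be infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa          # sensitivity
    criterion = -(z_hit + z_fa) / 2  # response bias; negative means "yes"-prone
    return d_prime, criterion
```

A participant with 40 hits, 10 misses, 10 false alarms and 40 correct rejections gets d′ ≈ 1.64 and c = 0, meaning good sensitivity and no bias. Raising both hits and false alarms pushes c negative without necessarily changing d′.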

Adaptive staircases without a PhD

Pick a target accuracy (e.g. 75%). An n-up/m-down rule converges to the matching stimulus level in ~30 trials. Our builder exposes this as a single slider.
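
A minimal sketch of the mechanism, assuming the common 1-up / n-down family and a convention where a larger level means an easier stimulus. Such a rule settles where the chance of n correct responses in a row is 0.5, so it targets an accuracy of 0.5^(1/n): about 70.7% for 1-up/2-down and 79.4% for 1-up/3-down. Hitting exactly 75% needs a different rule, such as a weighted up/down variant. How the SciBLIND slider maps onto these rules is not stated here.

```python
class Staircase:
    """1-up / n-down staircase: harder after n correct in a row, easier after any error."""

    def __init__(self, level: float, step: float, n_down: int = 2):
        self.level = level
        self.step = step
        self.n_down = n_down
        self._streak = 0  # consecutive correct responses so far

    def update(self, correct: bool) -> float:
        """Record one response and return the level for the next trial."""
        if correct:
            self._streak += 1
            if self._streak == self.n_down:
                self.level -= self.step  # make the task harder
                self._streak = 0
        else:
            self.level += self.step      # make the task easier
            self._streak = 0
        return self.level


def convergence_accuracy(n_down: int) -> float:
    """Accuracy a 1-up / n-down rule converges on: 0.5 ** (1 / n_down)."""
    return 0.5 ** (1 / n_down)
```

In practice the threshold estimate is usually taken as the mean level over the last few reversals, rather than the final level alone.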

Counterbalancing and order effects

Latin-square randomization neutralises stimulus-order confounds across participants. Blocked randomization preserves within-condition learning. The builder picks sensibly by default.
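
The simplest Latin square is cyclic: each row is the item list rotated by one more position, so every item occupies every slot exactly once across rows. The sketch below builds one and assigns participants to rows in rotation. The function names are illustrative and this is not SciBLIND's matchmaking code.

```python
def cyclic_latin_square(items):
    """Row i is the item list rotated by i; each item fills each slot exactly once."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(items, participant_index: int):
    """Assign participants to rows of the square in rotation."""
    square = cyclic_latin_square(items)
    return square[participant_index % len(items)]
```

A cyclic square balances position but not which item immediately precedes which. When carry-over between neighbouring stimuli is a concern, a Williams design is the usual next step.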

Experience sampling (ESM / EMA)

Ecological momentary assessment done right: schedules, prompt channels (push / Telegram / WhatsApp / email), compliance benchmarks, and how SciBLIND turns a 2-week diary study into one builder step.

MTurk / Prolific / Bakker completion codes

How completion codes work, why SciBLIND mints them server-side, and how to wire them into your MTurk HIT, Prolific study, or Bakker project. Covers fixed / per-session / JWT-embedded modes.
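
To show why server-side minting matters, here is a sketch of a per-session code derived with an HMAC. The participant cannot guess or forge it without the server's secret, and the server can verify it without storing anything. This is a stand-in for illustration only: SciBLIND's actual scheme, code length, and JWT mode are not described here, and the function names are hypothetical.

```python
import hashlib
import hmac

CODE_LENGTH = 10  # assumed length, for illustration


def mint_code(secret: bytes, session_id: str) -> str:
    """Derive a per-session completion code from a server-held secret."""
    digest = hmac.new(secret, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:CODE_LENGTH].upper()


def verify_code(secret: bytes, session_id: str, code: str) -> bool:
    """Recompute the expected code and compare in constant time."""
    expected = mint_code(secret, session_id).encode("ascii")
    submitted = code.strip().upper().encode("utf-8")
    return hmac.compare_digest(expected, submitted)
```

A fixed code shared by all participants is simpler to set up but can be passed around. A per-session code ties each submission on the recruitment platform to one completed session.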

Ready to build?

Skip the scripting. Describe your research question; the AI configurator pre-fills a valid study. You review every suggestion before participants see it.