Scott Blain, PhD

Studying how minds fail to make AI systems safer for society

Building evaluations and safety frameworks grounded in cognitive science — analyzing how AI systems affect real-world users, where model behavior breaks down, and how psychological insights translate into concrete safeguards.

Cognitive Science B.S. Psychology Ph.D. Psychiatry Postdoc AI Safety & Societal Impacts

2026 AI Safety Fellowship Sequence

CAMBRIA
Mechanistic interpretability & RL
Jan 2026
FIG Fellowship
Consciousness indicator gaming
Dec 2025 - Mar 2026
ERA Fellowship
ToM-based deception detection
Feb - Mar 2026 (Cambridge, UK)
LASR Labs
Intensive safety research (accepted; deferred)
Summer 2026 (London)
0+ Peer-Reviewed
Publications
0% AI Hallucination
Reduction
0+ Citations
0+ Research
Participants

Why This Matters

As AI systems enter high-stakes domains — healthcare, education, personal advice — their societal impact depends on failure modes that psychology has studied for decades. Pattern detection that hallucinates, social reasoning that manipulates, self-reports that deceive. Understanding these in people tells us where to look in models, what evaluations to build, and how to design safeguards that protect the people who use them.

Current AI Safety Research

Empirical evaluation frameworks for measuring and mitigating societal risks from AI systems — from consciousness gaming to social engineering defense.

Preliminary

Consciousness Indicator Gaming

Can LLMs selectively manipulate self-reports on consciousness indicators? Testing 14 frontier models across 108K+ observations reveals universal selective gaming — all models adjust consciousness claims under incentive pressure while leaving factual capabilities stable.

14 models 108K+ observations p < .001 all models
Explore Results →
Preliminary

Social Reasoning Warden

Can a monitor agent defend against AI social engineering? Our three-agent framework shows that a warden agent reduces adversary manipulation success by ~95% — even when adversaries have access to psychological profiles of their targets.

~95% protection 2,259 observations p < 2e-16
Explore Results →

Research Foundations

Pattern Recognition & AI Hallucinations

Humans see patterns in noise (apophenia); LLMs generate confident falsehoods (hallucinations). A metacognitive framework from this research achieved 71% reduction in confabulation — translating cognitive science into concrete model-behavior evaluations.

Explore Research →

Social Intelligence & AI Alignment

The same theory-of-mind abilities that enable cooperation also enable manipulation. Mapping these dual-use capabilities informs how we evaluate AI social reasoning and build safeguards against deceptive behavior in real-world interactions.

Explore Research →

Cybernetic Personality Modeling

Mapped personality mechanisms across 20,000+ participants. These models now inform evaluation of AI persona consistency, user-facing behavior quality, and psychological safety in human-AI interaction.

Explore Research →

Get in Touch

Looking to join teams studying how AI systems impact people and society — building evaluations, analyzing real-world usage, and translating behavioral science into safety guidelines. If you're working on societal impacts, AI psychology, or responsible development — let's talk.

Built with Claude Code