About - Scott Blain, PhD

Why Psychology for AI Safety?

🧠

Model 'Cognition' Diagnostics

Apophenia research maps onto hallucination mitigation.

🤝

Human-Compatible Design

Social-cognitive frameworks can steer AI toward cooperation and away from manipulation.

⚖️

Trait-Based Evaluation

From ad-hoc red-team prompts to psychometrically rigorous eval sets.

Vision

Where others perceive purely technical challenges, I see cognitive analogies rooted in empirical cognitive science research—critical for deeply aligning AI systems with human values.

Model Behaviour Portfolio

                        
                        Metacognitive Hallucination Framework
                        71% Reduction

CBT principles → 71% fewer LLM hallucinations.

LLM Mentalizing Framework

Benchmarks to assess multi-layer nested belief reasoning—a core capability for strategic manipulation and situational awareness.

Dynamic Personality Layers

Working to develop controlled personality-trait tuning systems for predictable, user-aligned behaviour.

Professional Experience & Development

Industry Experience

Senior Data Quality Specialist (Freelance)

2024 – Present

Cohere, San Francisco, CA | Snorkel, Redwood City, CA

Evaluate large language model outputs for safety and alignment, focusing on identifying failure modes and improving reliability. Develop systematic approaches to assess model behavior and create evaluation frameworks based on cognitive science principles.

Essential contributor of qualitative model output evaluations enabling rollout of Cohere's Command A frontier model
Contributed 250+ benchmark items to assess model behavior across domains (e.g., social reasoning, factuality, honesty)
Apply cognitive science frameworks to evaluate model outputs and identify potential risks
Develop prompt engineering methodologies informed by psychology research
Create evaluation rubrics for assessing model behavior in multi-turn dialogues
Contribute to safety datasets and quality improvement initiatives

AI Safety & Ethics Training

◆ Global Challenges Project (2025)

Emerging challenges in AI safety and biosecurity and interdisciplinary approaches to existential risks

◆ Ethics of AI | University of Helsinki (2025)

Ethical AI development/use and applying frameworks from moral philosophy to questions related to contemporary AI

◆ AI Safety Fundamentals | BlueDot Impact (2024)

Technical alignment studies, including inner/outer alignment, interpretability, and safety frameworks

Next Directions

Mechanistic Interpretability

Enhancing mechanistic interpretability research through techniques from model-based cognitive neuroscience and mathematical psychology.

Hierarchical Alignment Frameworks

Developing frameworks for alignment that incorporate universal principles as well as strategies tailored to specific use cases and user personality profiles.

About & Approach

My Journey to AI Safety

Bridging Minds: Human & Machine