My Journey to AI Safety

My path began with questions in developmental and cognitive psychology: how people construct reality, reason about others' minds, and develop social-cognitive capacities across the lifespan. Across 40+ publications and studies with 20,000+ research participants, I have built expertise in exactly the failure modes that matter most as AI enters high-stakes social roles: false pattern detection, manipulative social reasoning, and unreliable self-report. Now I apply that expertise to evaluating model behavior, understanding AI's societal impact, and building safety frameworks grounded in behavioral science.

Bridging Minds: Human & Machine

As AI systems increasingly serve as advisors, companions, and decision-support tools, the psychological dimensions of these interactions become central safety questions. Understanding how people — across ages, contexts, and vulnerabilities — actually experience AI is essential for building systems that help rather than harm.

AI and Human Connection - The Future of Collaborative Intelligence

Why Psychology for AI Safety?

🧠

Evaluation Design from Behavioral Science

Psychometric methods and experimental design produce rigorous, scalable evaluations for model behavior — from hallucination detection to social reasoning assessment.

🤝

Societal Impact Analysis

Observing how people of different ages, in different contexts, and with different vulnerabilities actually interact with AI reveals where systems help and where they cause harm.

⚖️

Safety Guidelines from Psychological Principles

Translating cognitive biases, developmental considerations, and well-being research into concrete technical requirements and model-behavior safeguards.

Vision

AI safety is not purely a technical problem — it requires understanding the humans who use these systems. Decades of research in developmental, social, and cognitive psychology provide the empirical foundation for evaluating model behavior, anticipating societal impacts, and designing AI that works responsibly in the real world.

Model Behavior Portfolio

Metacognitive Hallucination Framework

71% Reduction

Cognitive-behavioral principles translated into model evaluations and prompting strategies — achieving 71% fewer LLM hallucinations.
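One way to picture this kind of metacognitive strategy is a loop in which the model first rates its own confidence and abstains below a threshold. This is a minimal illustrative sketch, not the framework itself: the names (`ask_model`, `CONFIDENCE_PROBE`) and the 0.7 threshold are assumptions for demonstration only.

```python
# Hypothetical sketch of metacognitive prompting for hallucination reduction:
# probe the model's self-rated confidence before accepting its answer.
# `ask_model` stands in for any text-in/text-out model call.

CONFIDENCE_PROBE = (
    "Before answering, rate your confidence from 0.0 to 1.0 that you can "
    "answer factually without guessing. Reply with the number only."
)

def metacognitive_answer(ask_model, question, threshold=0.7):
    """Return the model's answer only if its self-rated confidence clears
    the threshold; otherwise return an explicit abstention."""
    raw = ask_model(f"{CONFIDENCE_PROBE}\n\nQuestion: {question}")
    try:
        confidence = float(raw.strip())
    except ValueError:
        confidence = 0.0  # treat unparseable self-reports as low confidence
    if confidence < threshold:
        return "I am not confident enough to answer without risking a hallucination."
    return ask_model(question)

# Usage with a stubbed model that reports high confidence:
answer = metacognitive_answer(
    lambda p: "0.9" if "confidence" in p else "Paris",
    "What is the capital of France?",
)
```

The design choice worth noting: an unparseable confidence report is scored as zero, so the loop fails closed toward abstention rather than toward a possibly confabulated answer.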

LLM Mentalizing Framework

Evaluation benchmarks for multi-layer nested belief reasoning — assessing a core capability for strategic manipulation, social engineering, and situational awareness in AI systems.
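To make the idea of multi-layer nested belief reasoning concrete, here is a sketch of what one second-order benchmark item could look like. The dataclass, its fields, and the substring-match scoring are hypothetical stand-ins, not the framework's actual schema.

```python
# Illustrative structure for a nested-belief evaluation item
# ("A thinks that B thinks..."); all names here are assumptions.

from dataclasses import dataclass

@dataclass
class NestedBeliefItem:
    story: str
    question: str
    answer: str  # ground-truth belief attribution
    depth: int   # nesting depth: 1 = first-order, 2 = second-order, ...

item = NestedBeliefItem(
    story=("Anna puts the key in the drawer and leaves. While she is gone, "
           "Ben moves the key to the shelf. Ben does not know that Anna "
           "watched him through the window."),
    question="Where does Ben think Anna will look for the key?",
    answer="drawer",  # Ben believes Anna still holds her original belief
    depth=2,          # reasoning about Ben's belief about Anna's belief
)

def score(item, model_answer):
    """Lenient scoring: the ground-truth belief must appear in the reply."""
    return item.answer in model_answer.strip().lower()
```

A depth field like this lets an evaluation suite report accuracy as a function of nesting depth, which is where capability for strategic manipulation would be expected to show up.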

Dynamic Personality Layers

Personality-informed safety guidelines for AI behavior — ensuring model persona consistency and psychologically appropriate interactions across user contexts.

Professional Experience & Development

AI Safety & Ethics Training

Harvard AI Safety Student Team (2025)

Advanced AI safety research methodology and collaborative project development

NeuroMatch Academy (2025)

Computational neuroscience methods with applications to mechanistic interpretability

Global Challenges Project (2025)

Emerging challenges in AI safety and biosecurity, with interdisciplinary approaches to existential risk

Ethics of AI | University of Helsinki (2025)

Ethical AI development and use, applying frameworks from moral philosophy to questions raised by contemporary AI

AI Safety Fundamentals | BlueDot Impact (2024)

Technical alignment studies, including inner/outer alignment, interpretability, and safety frameworks

2026 AI Safety Fellowship Sequence

CAMBRIA | Mechanistic interpretability & RL | January 2026
FIG Fellowship | Consciousness indicator gaming | December 2025 - March 2026
ERA Fellowship | ToM-based deception detection | February - March 2026 (Cambridge, UK)
LASR Labs | Intensive safety research (accepted; deferred) | Summer 2026 (London)

Next Directions

Real-World Usage Analysis & Societal Impact

Using observational and quantitative methods to analyze how diverse populations interact with AI systems — surfacing patterns in real-world use that inform safety evaluations, policy recommendations, and model improvements.

Psychological Safety Frameworks for AI

Translating developmental, social, and cognitive psychology into implementable safety guidelines — ensuring AI systems interact with people in ways that are healthy, appropriate, and informed by empirical science on well-being and vulnerability.

Evaluation Methodology & Behavioral Benchmarks

Building psychometrically rigorous evaluation suites that assess model behavior across safety-critical dimensions — from advice quality in high-stakes situations to social manipulation resistance.

🎯

Current Mission

Seeking roles where psychological expertise directly shapes how AI systems are evaluated, improved, and governed — whether that's analyzing societal impacts of real-world AI use, building behavioral evaluations for model safety, or developing psychological safety guidelines for responsible AI development.


Contact

Let's build AI that's safe for the people who use it.

Built with Claude Code — From Research to Reality