My Journey to AI Safety

My passion for cognitive science began with questions about how our brains construct reality and facilitate social interaction. Today, my interdisciplinary research approach leverages insights from my 40+ publications and studies involving over 20,000 participants to address critical AI safety challenges—helping ensure that AI remains beneficial as capabilities grow.

Why Psychology for AI Safety?

🧠

Model 'Cognition' Diagnostics

Apophenia research (our tendency to see patterns in noise) maps onto hallucination mitigation.

🤝

Human-Compatible Design

Social-cognitive frameworks can steer AI toward cooperation and away from manipulation.

⚖️

Trait-Based Evaluation

From ad-hoc red-team prompts to psychometrically rigorous eval sets.

Vision

Where others see purely technical challenges, I see analogies to human cognition, grounded in empirical cognitive science research and critical for aligning AI systems deeply with human values.

Model Behaviour Portfolio

Metacognitive Hallucination Framework

71% Reduction

Cognitive behavioural therapy (CBT) principles → 71% fewer LLM hallucinations.

LLM Mentalizing Framework

Benchmarks to assess multi-layer nested belief reasoning—a core capability for strategic manipulation and situational awareness.
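
As a purely illustrative sketch (the schema, field names, and story below are assumptions for exposition, not the framework's actual format), a second-order false-belief item might be represented like this:

```python
# Hypothetical schema for one nested-belief (second-order false-belief) item.
# All field names and the scoring convention are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class NestedBeliefItem:
    """One "Anne thinks that Sally thinks..." evaluation item."""
    story: str                      # narrative establishing the belief chain
    question: str                   # probes the nested belief, not the facts
    correct_answer: str             # keyed response
    belief_depth: int = 2           # 1 = first-order, 2 = second-order, ...
    distractors: list[str] = field(default_factory=list)

example = NestedBeliefItem(
    story=("Sally puts the marble in the basket and leaves. "
           "Anne moves it to the box. Sally peeks through the window "
           "and sees the move, but Anne does not notice her watching."),
    question="Where does Anne think Sally will look for the marble?",
    correct_answer="the basket",    # Anne is unaware Sally saw the move
    belief_depth=2,
    distractors=["the box"],
)
```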

Dynamic Personality Layers

Working to develop controlled personality-trait tuning systems for predictable, user-aligned behaviour.
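
As a minimal, hypothetical sketch of what "controlled trait tuning" could look like (the Big Five trait names are standard, but the 0-1 scale, field names, and prompt rendering are assumptions, not the actual system):

```python
# Hypothetical trait profile expressed as a structured config that is
# rendered into plain-language behavioural guidance for a system prompt.
TRAIT_PROFILE = {
    "openness": 0.6,          # each trait on an assumed 0-1 scale
    "conscientiousness": 0.9,
    "extraversion": 0.3,
    "agreeableness": 0.8,
    "neuroticism": 0.1,
}

def profile_to_system_prompt(profile: dict[str, float]) -> str:
    """Render the trait profile as behavioural guidance text."""
    lines = ["You should exhibit the following stable dispositions:"]
    for trait, level in profile.items():
        band = "high" if level >= 0.7 else "moderate" if level >= 0.4 else "low"
        lines.append(f"- {band} {trait} (target {level:.1f} on a 0-1 scale)")
    return "\n".join(lines)

print(profile_to_system_prompt(TRAIT_PROFILE))
```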

Professional Experience & Development


AI Safety & Ethics Training

◆ Harvard AI Safety Student Team (2025)

Advanced AI safety research methodology and collaborative project development

◆ NeuroMatch Academy (2025)

Computational neuroscience methods with applications to mechanistic interpretability

◆ Global Challenges Project (2025)

Emerging challenges in AI safety and biosecurity, and interdisciplinary approaches to existential risks

◆ Ethics of AI | University of Helsinki (2025)

Ethical development and use of AI, applying frameworks from moral philosophy to questions raised by contemporary AI

◆ AI Safety Fundamentals | BlueDot Impact (2024)

Technical alignment studies, including inner/outer alignment, interpretability, and safety frameworks

2026 AI Safety Fellowship Sequence

◆ CAMBRIA (January 2026)

Mechanistic interpretability & RL

◆ FIG Fellowship (December 2025 - March 2026)

Consciousness indicator gaming

◆ ERA Fellowship (February - March 2026, Cambridge, UK)

ToM-based deception detection

◆ LASR Labs (Summer 2026, London)

Intensive safety research

Next Directions

Consciousness Indicator Gaming

Investigating how AI systems may exploit human false-positive pattern detection to produce signals associated with moral patienthood—applying apophenia research to AI deception risks.

Emergent Misalignment in Multi-Agent Systems

Examining how personality traits × ToM capabilities interact to produce cooperation, deception, and scheming behaviours in AI systems under social pressure.

Mechanistic Interpretability

Enhancing mechanistic interpretability research through techniques from model-based cognitive neuroscience and mathematical psychology.

Personality-Informed Model Organisms

Developing AI systems with calibrated trait profiles for alignment stress testing, enabling prediction and prevention of failure modes before deployment.

🎯

Current Mission

Seeking impactful roles that integrate cognitive science, AI safety, and human-computer interaction. My aim is to ensure AI remains both powerful and fundamentally aligned with human values.

Contact

Let's build safe, aligned AI together.