My Journey to AI Safety

My passion for cognitive science began with questions about how our brains construct reality and facilitate social interaction. Today, my interdisciplinary research approach leverages insights from my 35+ publications and studies involving over 20,000 participants to address critical AI safety challenges—helping ensure that AI remains beneficial as capabilities grow.

Bridging Minds: Human & Machine

As we stand at the intersection of human cognition and artificial intelligence, we face unprecedented opportunities to shape how these two forms of intelligence interact, collaborate, and evolve together.

AI and Human Connection - The Future of Collaborative Intelligence

Why Psychology for AI Safety?

🧠

Model 'Cognition' Diagnostics

Apophenia research maps onto hallucination mitigation.

🤝

Human-Compatible Design

Social-cognitive frameworks can steer AI toward cooperation and away from manipulation.

⚖️

Trait-Based Evaluation

From ad-hoc red-team prompts to psychometrically rigorous eval sets.

Vision

Where others perceive purely technical challenges, I see cognitive analogies rooted in empirical cognitive science research—critical for deeply aligning AI systems with human values.

Model Behaviour Portfolio

Metacognitive Hallucination Framework

71% Reduction

CBT principles → 71% fewer LLM hallucinations.

LLM Mentalizing Framework

Benchmarks to assess multi-layer nested belief reasoning—a core capability for strategic manipulation and situational awareness.

Dynamic Personality Layers

Working to develop controlled personality-trait tuning systems for predictable, user-aligned behaviour.

Professional Experience & Development

Industry Experience

AI Safety & Ethics Training

Global Challenges Project (2025)

Emerging challenges in AI safety and biosecurity and interdisciplinary approaches to existential risks

Ethics of AI | University of Helsinki (2025)

Ethical AI development/use and applying frameworks from moral philosophy to questions related to contemporary AI

AI Safety Fundamentals | BlueDot Impact (2024)

Technical alignment studies, including inner/outer alignment, interpretability, and safety frameworks

Next Directions

Mechanistic Interpretability

Enhancing mechanistic interpretability research through techniques from model-based cognitive neuroscience and mathematical psychology.

Hierarchical Alignment Frameworks

Developing frameworks for alignment that incorporate universal principles as well as strategies tailored to specific use cases and user personality profiles.

🎯

Current Mission

Seeking impactful roles integrating cognitive science, AI safety, and human-computer interaction. My aim is ensuring AI remains both powerful and fundamentally aligned with human values.

Back to Home

Explore more research areas and projects

Contact

Let's build safe, aligned AI together.

Built with Claude Code — From Research to Reality