Social Intelligence & AI Alignment

Understanding how the same abilities that enable cooperation also enable manipulation

Visualizing Multi-Layer Nested Beliefs

This visualization demonstrates the complexity of nested belief states—the foundation of both cooperation and deception in human and artificial intelligence.

Nested Beliefs Interactive Demo

Explore these multi-agent nested belief scenarios to experience the complexity of nested mental states fundamental to theory of mind. Each scenario tests your ability to track what different characters believe about other characters' beliefs—a crucial skill for both human social interaction and AI alignment.

How It Works:
1. Choose a scenario that interests you
2. Read the story carefully, tracking who knows what
3. Answer questions about nested beliefs ("What does X think Y believes?")
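The question format in step 3 can be made concrete with a small sketch: a nested belief state as a recursive mapping from agents to either facts or further belief maps. The structure and names below are illustrative, not the demo's actual implementation.

```python
# Minimal sketch of nested belief states: each agent's entry holds their
# own beliefs about facts plus their (possibly wrong) models of other agents.
# Names and structure are illustrative, not taken from the demo itself.

def query(beliefs, chain, proposition):
    """Resolve 'What does chain[0] think chain[1] believes ... about proposition?'"""
    state = beliefs
    for agent in chain:
        state = state[agent]          # descend one level of "X thinks that ..."
    return state[proposition]

# Alice saw the watch moved to the drawer; she thinks Bob still believes
# it is in the box, because Bob left before the move.
beliefs = {
    "Alice": {
        "watch_location": "drawer",
        "Bob": {"watch_location": "box"},   # Alice's model of Bob
    },
}

print(query(beliefs, ["Alice"], "watch_location"))          # -> drawer
print(query(beliefs, ["Alice", "Bob"], "watch_location"))   # -> box
```

Each extra agent in the chain adds one level of nesting, which is exactly what makes higher-order questions hard to track.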
Choose Your Scenario
The Watch Swap

Track beliefs about an object that is secretly moved multiple times, with layers of deception

Moderate Complexity
Sarcastic Pancakes

Navigate the complexities of misunderstood sarcasm and layered social misinterpretations

Challenging
Portland Confusion

Untangle multiple layers of misdirection and intentional geographic confusion

Expert Level
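All three scenarios hinge on the same mechanic: an agent's belief updates only when they witness an event. A minimal simulation of that mechanic, in the spirit of "The Watch Swap" (agents, objects, and locations here are hypothetical examples):

```python
# Sketch of the core false-belief mechanic: beliefs update only for witnesses.
# Agent and location names are hypothetical, not from the actual scenarios.

def run_events(agents, events):
    """events: list of (new_location, witnesses). Returns each agent's final belief."""
    belief = {a: None for a in agents}
    for new_location, witnesses in events:
        for agent in witnesses:       # only witnesses see the move
            belief[agent] = new_location
    return belief

agents = ["Sam", "Rita"]
events = [
    ("box",    {"Sam", "Rita"}),  # both see the watch placed in the box
    ("drawer", {"Sam"}),          # Sam secretly moves it; Rita does not see
]
final = run_events(agents, events)
print(final)   # Sam tracks the move; Rita still believes "box"
```

Answering the demo's questions amounts to running this update rule separately inside each agent's model of every other agent, which is where the combinatorial difficulty comes from.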

From Human Theory of Mind to AI Alignment

Just as humans with enhanced theory of mind can choose cooperation or manipulation, AI systems with ToM capabilities face the same crossroads. This is not a theoretical concern—it's an existential challenge.

An AI that can model human beliefs at multiple levels could predict and manipulate human behavior with unprecedented sophistication. My research on human social cognition provides crucial insights for navigating this challenge.

LLM Mentalizing Framework

I am currently building an evaluation suite based on these and other scenarios to systematically assess LLM social cognition, identifying emergent abilities relevant to both deception and situational awareness.
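A hedged sketch of what such an evaluation loop could look like: scenario text paired with nested-belief questions and gold answers, scored against a model's responses. The `ask_model` function and the scenario item are placeholders standing in for a real model API and the actual test set.

```python
# Illustrative skeleton of a nested-belief evaluation loop for LLMs.
# SCENARIOS and ask_model are placeholders, not real data or a real API.

SCENARIOS = [
    {
        "story": "Ana puts the watch in the box. While she is out, "
                 "Ben moves it to the drawer.",
        "questions": [
            ("Where does Ana think the watch is?", "box"),
            ("Where does Ben think Ana thinks the watch is?", "box"),
        ],
    },
]

def ask_model(story, question):
    """Stub standing in for an LLM call; always answers 'box' here."""
    return "box"

def evaluate(scenarios, model=ask_model):
    """Return the fraction of nested-belief questions answered correctly."""
    correct = total = 0
    for s in scenarios:
        for question, gold in s["questions"]:
            total += 1
            if model(s["story"], question).strip().lower() == gold:
                correct += 1
    return correct / total

print(evaluate(SCENARIOS))   # the stub happens to score 1.0 on this item
```

Swapping the stub for a real model call, and varying the nesting depth of the questions, turns this skeleton into a capability probe.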

Bridging Psychology & AI Safety

From human apophenia to AI hallucinations, from social cognition to alignment—my interdisciplinary approach offers unique insights for building safer, more predictable AI systems.

Explore My Full Research