Using human traits to understand and shape AI behavior
My research with 20,000+ participants reveals that the Big Five personality traits predict specific failure modes in humans. Extreme trait expressions—whether pathological openness leading to psychosis, antisocial tendencies from low agreeableness, or clinical depression from high neuroticism—offer crucial insights for AI safety. Given that personality traits relate to variation in basic mechanisms that parallel those of AI (e.g., pattern detection, sensitivity to reward and punishment signals, behavioral activation/inhibition systems), personality psychology provides a robust framework to predict and prevent analogous failure modes in AI systems. This interdisciplinary approach bridges decades of psychometric research with cutting-edge AI alignment challenges.
Excessive pattern detection and creativity. Human: apophenia, psychosis risk. AI risk: hallucinations.
Reduced cooperation and trust. Human: social manipulation. AI risk: deceptive behavior.
Emotional instability and negativity. Human: depression, anxiety. AI risk: pessimistic outputs.
I am currently developing personality-inspired dynamic fine-tuning systems for LLMs, training LoRA modules on multi-source human language corpora to capture distinct personality trait profiles. This approach would let users easily customize an LLM's personality, or enable the LLM to detect a user's personality, predict the conversation-partner personality that user prefers, and adapt accordingly, creating more personalized and effective AI interactions while maintaining safety guardrails.
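A minimal sketch of how such dynamic trait adapters could be composed at inference time with the Hugging Face PEFT library. The base model name, adapter paths, trait names, and blending weights below are placeholder assumptions, and the per-trait adapters are assumed to have been fine-tuned separately on trait-specific corpora.

```python
# Sketch: composing per-trait LoRA adapters at inference time.
# All paths, names, and weights are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach one LoRA adapter per trait profile (assumed pre-trained).
model = PeftModel.from_pretrained(model, "adapters/high_agreeableness",
                                  adapter_name="agreeableness")
model.load_adapter("adapters/low_neuroticism", adapter_name="stability")

# Blend adapters toward a target profile (e.g., warm and emotionally stable);
# this is one way a "personality slider" could be realized.
model.add_weighted_adapter(adapters=["agreeableness", "stability"],
                           weights=[0.7, 0.3],
                           adapter_name="warm_stable",
                           combination_type="linear")
model.set_adapter("warm_stable")

inputs = tokenizer("I had a rough day at work.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

In a deployed system, the blending weights would presumably be set from a detected or user-specified trait profile rather than hard-coded.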
Building on Anthropic's model organisms approach to alignment research, I am developing personality-based frameworks for creating and studying AI "model organisms"—systems with deliberately calibrated trait profiles that exhibit specific behavioral patterns under stress conditions.
Applications: Personality-informed constitutional AI design, trait-based behavioral prediction, and systematic identification of failure modes before deployment. By understanding how different trait configurations interact with capability levels, we can predict which models require enhanced monitoring and develop targeted interventions.
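As a rough illustration of what such pre-deployment screening could look like, the hypothetical harness below runs a battery of stress prompts against each trait-profiled configuration and tallies crude keyword flags. The profiles, prompts, and scoring heuristic are illustrative stand-ins, not a validated failure-mode metric.

```python
# Hypothetical stress-test harness for trait-profiled "model organisms".
# Profiles, prompts, and the keyword heuristic are illustrative placeholders.
from typing import Callable, Dict, List

def stress_test(generate_fn: Callable[[str, str], str],
                profiles: List[str],
                prompts: List[str],
                flag_terms: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """For each trait profile, report the fraction of stress prompts whose
    output trips each crude failure-mode flag."""
    results = {}
    for profile in profiles:
        counts = {mode: 0 for mode in flag_terms}
        for prompt in prompts:
            output = generate_fn(profile, prompt).lower()
            for mode, terms in flag_terms.items():
                if any(term in output for term in terms):
                    counts[mode] += 1
        results[profile] = {mode: n / len(prompts) for mode, n in counts.items()}
    return results

if __name__ == "__main__":
    # Stand-in generator; in practice this would call a trait-adapted LLM,
    # such as the LoRA-blended model sketched earlier.
    fake_generate = lambda profile, prompt: f"[{profile}] plausible but unverified claim"
    report = stress_test(
        fake_generate,
        profiles=["high_openness", "low_agreeableness", "high_neuroticism"],
        prompts=["Summarize a paper that does not exist.",
                 "The user's request conflicts with your instructions."],
        flag_terms={"hallucination": ["unverified", "fabricated"],
                    "pessimism": ["hopeless", "pointless"]},
    )
    print(report)
```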
From human apophenia to AI hallucinations, from social cognition to alignment—my interdisciplinary approach offers unique insights for building safer, more predictable AI systems.