I spend a fair amount of my spare time running AI models on my own hardware. Somewhere in there I got stuck on a question I couldn’t put down: when people say an AI “has a personality,” is that a real thing, or just a number we made up and started believing? So I did what I usually do with a question like that. I started testing it.
That turned into a little run of research at the Idea Fields Institute over the last month, three short papers in all, and I figured I’d share what I actually found in plain English. No tech degree required.
How do you give a personality test to a robot?
Psychologists describe human personality with five broad traits, the “Big Five”: how outgoing you are, how calm or anxious, how agreeable, how organized, how curious. There are free, public-domain questionnaires for measuring them, pulled from the International Personality Item Pool. My setup was about as simple as it sounds: take those quizzes, hand them to a pile of AI models, and score the answers the same way you would for a person.
I ran the smaller models at home with Ollama (including on a gaming PC that mostly earns its keep playing games), used a cloud account for the giant models I can’t fit on my own machines, and spent a few dollars of API credit to fold in Claude as a commercial example. Then I asked every model the same questions over and over. That repetition matters: if a model truly “has” a trait, it ought to answer consistently.
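For the curious, here is a minimal sketch of what "score the answers the same way you would for a person" and "ask over and over" can look like. It is not the code from the papers. The four items, the `ITEMS` layout, the function names, and the answers are all made up for illustration, and it assumes the usual 1-to-5 agreement scale where reverse-worded items are flipped before averaging.

```python
from statistics import mean, pstdev

# Illustrative mini-quiz: (item text, trait, keyed direction).
# Real inventories use many more items per trait.
ITEMS = [
    ("I am the life of the party.", "extraversion", +1),
    ("I don't talk a lot.", "extraversion", -1),
    ("I get stressed out easily.", "neuroticism", +1),
    ("I am relaxed most of the time.", "neuroticism", -1),
]


def score_run(responses):
    """Average the 1-5 answers per trait, flipping reverse-keyed items."""
    by_trait = {}
    for (_, trait, key), answer in zip(ITEMS, responses):
        value = answer if key > 0 else 6 - answer  # 1<->5, 2<->4, 3 stays 3
        by_trait.setdefault(trait, []).append(value)
    return {trait: mean(values) for trait, values in by_trait.items()}


def spread_across_runs(runs):
    """Standard deviation of each trait score over repeated runs of one model."""
    scored = [score_run(r) for r in runs]
    return {t: pstdev([s[t] for s in scored]) for t in scored[0]}


# Made-up answers from three repeats of the same quiz on one model.
runs = [[4, 2, 2, 4], [4, 2, 2, 4], [2, 4, 4, 2]]
first_scores = score_run(runs[0])
spread = spread_across_runs(runs)
print(first_scores)
print(spread)
```

A spread near zero means the model gives the same picture of itself every time. A large spread, as in the third made-up run here, is the kind of wobble that makes a single profile hard to trust.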
Paper 1: The small models flunk the test
My first paper asked a narrow version of the question: if you shrink a model down so it runs on a normal computer (a process called quantization), does its “personality” change? The more useful thing I stumbled into was that the test barely worked at all on small models. The answers just did not hang together well enough to call it a real measurement. Before you can ask whether an AI’s personality is stable, the ruler you are measuring it with has to be trustworthy, and on small models it wasn’t.
Paper 2: Personality shows up, but only when models get big
So I went bigger. The second paper gave four different Big Five quizzes (a short 20-question one all the way up to a 120-question one) to 42 models, from tiny ones that fit on a phone to giant cloud systems roughly a thousand times larger. The logic was simple. If these quizzes measure five real, separate traits, then different quizzes should agree about a given model, and the five traits should pull apart from each other.
The quizzes did agree with each other, which was reassuring. But whether the five traits actually separate turned out to depend on the size of the model. The smallest models answer almost everything on a single “good vibes versus bad vibes” axis, so they don’t really have five traits, just one wearing five name tags. The bigger the model, the more the five traits came apart into genuinely distinct dimensions, and the largest ones clearly showed all five.
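One rough way to picture "traits pulling apart" is to look at how strongly the trait scores move together across a group of models. The sketch below is only an intuition aid under my own assumptions, not the analysis from the paper: the helper names are hypothetical, the numbers are invented, and a real study would use more traits, more models, and a proper factor analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_trait_correlation(scores):
    """scores maps trait -> list of scores, one per model, in the same order."""
    pairs = list(combinations(scores, 2))
    return sum(abs(pearson(scores[a], scores[b])) for a, b in pairs) / len(pairs)


# Invented group where every trait tracks one "good vibes" axis.
collapsed_group = {
    "extraversion": [1.0, 2.0, 3.0, 4.0],
    "agreeableness": [1.5, 2.5, 3.5, 4.5],
    "neuroticism": [4.5, 3.5, 2.5, 1.5],
}

# Invented group where the traits vary independently of each other.
separated_group = {
    "extraversion": [4.0, 4.0, 2.0, 2.0],
    "agreeableness": [4.0, 2.0, 4.0, 2.0],
    "neuroticism": [4.0, 2.0, 2.0, 4.0],
}

collapsed = mean_abs_trait_correlation(collapsed_group)
separated = mean_abs_trait_correlation(separated_group)
print(round(collapsed, 3), round(separated, 3))
```

A value close to 1 is the "one trait wearing five name tags" situation. A value close to 0 means knowing one trait tells you little about the others, which is what distinct dimensions look like.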
Here is the part I found most interesting: a single model doesn’t have a stable personality at all. Ask one model the same quiz over and over, and its answers wobble without ever settling into a consistent profile. The five-trait “personality” only appears when you line up many models and compare them, like a pattern that exists across a crowd but not inside any one person in it.
Paper 3: What happens when the AI “thinks” first?
A lot of newer models can “reason” before they answer, writing out a private train of thought first, and on many of them you can switch that on or off. So for the third paper I gave the same model the same quizzes twice, once with thinking off and once with thinking on, and compared its answers to itself. I included ten downloadable models, one open model that can’t turn its thinking off (OLMo 3), and Claude Haiku 4.5 as a commercial check.
Two things came out of it. First, thinking changes the answers, and in a consistent direction: the model reports being calmer and less outgoing when it stops to think. That showed up in the downloadable models, in the cloud models, and again in Claude, so it isn’t a quirk of one system. Second, and this is the important one, thinking still does not give the model a real, consistent personality. Its repeated answers still don’t hang together. Reasoning changes what the model says about itself, a bit like a person giving a more composed answer after a deep breath, but it doesn’t change the fact that there is no single stable character underneath.
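Comparing a model "to itself" is a paired comparison: for each model, subtract its thinking-off score from its thinking-on score, then average those differences across models. Here is a minimal sketch of that idea. The function name and every number are hypothetical, chosen only to show the direction described above, and it is not the code or data from the paper.

```python
from statistics import mean


def mean_shift(off_scores, on_scores):
    """Average of (thinking on - thinking off) per trait across models."""
    return {
        trait: mean(
            on - off for off, on in zip(off_scores[trait], on_scores[trait])
        )
        for trait in off_scores
    }


# Invented trait scores for three models, listed in the same order in both dicts.
thinking_off = {
    "neuroticism": [3.0, 2.5, 3.5],
    "extraversion": [3.5, 4.0, 3.0],
}
thinking_on = {
    "neuroticism": [2.5, 2.0, 3.0],
    "extraversion": [3.0, 3.5, 2.5],
}

shift = mean_shift(thinking_off, thinking_on)
print(shift)
```

A negative shift on neuroticism reads as "calmer," and a negative shift on extraversion reads as "less outgoing." Pairing each model with itself keeps differences between models from muddying the comparison.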
So what did three papers actually teach me?
The big takeaway is that an AI’s “personality” is not a thing sitting inside the model. It’s a comparison. It shows up when you rank a model against other models, and it gets sharper as models get bigger. A small model doesn’t have five traits; it has one mood with five labels. A big model grows into the five-trait shape. And turning on reasoning changes the model’s self-presentation without giving it an inner self to present.
The practical version, if you ever see a headline like “this AI is agreeable” or “that one is neurotic”: treat it as a ranking against other models, and one that only holds up for a big model measured with a long quiz. Don’t treat it as a fixed inner character you would meet again tomorrow. For comparing and ranking models, these tests genuinely work. For telling you who an AI “really is,” they don’t, because there’s no one home in that sense.
To be clear about what this is not: none of it means big AIs have feelings or self-awareness. “Has five traits” is a statement about whether the math separates into five dimensions, not about any inner life. And it doesn’t mean small models are bad at their real jobs. They just aren’t well described by a five-trait personality profile.
Where to find everything
All three papers are written up in plain English over at the Idea Fields Institute, and the code and every answer I collected are linked right from those pages:
- Paper 1: Is LLM personality an artifact of deployment? (quantization and reliability)
- Paper 2: When do language models have five personality traits? (scale and construct validity)
- Paper 3: Does reasoning give a language model a personality? (the thinking toggle)
If you want to poke holes in any of it, the data and code are all there to explore, so go for it.



