Learning before Reasoning

How do we learn before we can reason? How do babies learn? They seem to learn about the world, and to learn language, starting from almost nothing. They learn long before they can apply logic, and long before they can understand what we are saying. So how do our brains do that? The short answer is: our brains minimize surprise. They learn both what can be predicted and what is difficult to predict¹, and they slowly learn to do so more abstractly² and over longer periods of time.
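
The following minimal sketch makes "minimizing surprise" more concrete under simple assumptions. Surprise is measured as self-information: -log2(p) bits for an outcome the learner thought had probability p. The learner only counts how often each outcome has occurred, using add-one smoothing. The function names are hypothetical, and this is an illustration, not a claim about how brains actually do it. On a 90/10 biased coin, the average surprise falls toward that coin's entropy of about 0.47 bits as the counts accumulate. On a fair coin, it stays near 1 bit however long the learner watches, which is the part that is difficult to predict.

```python
import math
import random
from collections import Counter


def surprise_bits(p: float) -> float:
    """Self-information of an outcome the learner gave probability p, in bits."""
    return -math.log2(p)


def learn_by_surprise(outcomes, alphabet):
    """Toy learner (hypothetical): predict from counts seen so far, then update.

    Add-one smoothing means no outcome in the alphabet is ever treated as
    impossible. Returns the surprise felt at each step.
    """
    counts = Counter({symbol: 1 for symbol in alphabet})
    surprises = []
    for x in outcomes:
        p = counts[x] / sum(counts.values())  # prediction made before seeing x
        surprises.append(surprise_bits(p))
        counts[x] += 1  # learn from what was actually observed
    return surprises


rng = random.Random(42)
fair = [rng.choice("HT") for _ in range(2000)]
biased = ["H" if rng.random() < 0.9 else "T" for _ in range(2000)]

for name, seq in [("fair", fair), ("biased 90/10", biased)]:
    s = learn_by_surprise(seq, "HT")
    late = sum(s[-1000:]) / 1000  # average surprise once the counts have settled
    print(f"{name}: first toss {s[0]:.2f} bits, late average {late:.2f} bits")
```

Both learners start equally surprised, at 1 bit on the first toss. Only the biased coin has structure that counting can pick up, so only its surprise goes down.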

Minimizing surprise works well for learning statistical or behavioral models, but it has two potential pitfalls when learning causal or intentional models. First, some correct ideas are unintuitive, unfamiliar, or at odds with other ideas we hold, and each of these is a cause for surprise. Second, we might minimize surprise by finding reasons to explain away what we observe. Our brains like such explanations, but they ignore that a model is only as good as its predictions.

We Learn Mental Models

What we learn are mental models, and we can separate out some broad categories that we use to reason about the world. While I’ll present them clearly delineated, in reality they are much more fluid. As we reason, we’ll draw from all of them and do so seamlessly³.

model	mode	examples
statistical	intuitive	When you have listened to the same playlist so many times that you know the melody of the next song before it starts. Or how people can become good at sorting chicks or identifying planes without being able to explain how they do it.
behavioral	intuitive	Drawn from experience interacting with the world, including intuitive physics: for example, that things fall to the ground, or knowing what burns, how to behave safely around a fire, and how to put it out.
logical	reasoning	About the properties objects have and the rules that govern them; about logic and math. Going from the instance to the general, like learning that roundness is what matters for how a ball rolls, or that color, smoothness, hardness, and weight are independent properties an object might have.
causal	reasoning	About what causes what, about counterfactuals, about what could have happened. Related to behavioral models, but focused not on what happens but on the mechanism behind it: why it happens. Knowing what burns and what doesn’t lets you predict when a fire is safe. Knowing that fire is a sustained chemical reaction with oxygen lets you predict, from first principles, what might burn and what might not.
intentional⁴	reasoning	Reasoning about agency: knowing that other minds have goals and desires, and guessing the intentions behind observed behavior. When we see two shapes moving on a screen, we might say one is chasing the other, and we are a bit too quick to do so.
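
The playlist example in the statistical row can be made concrete with a toy sketch. It assumes a "model" that does nothing but count which song followed which on past listens, which is essentially a first-order Markov chain. The class, its methods, and the song titles are hypothetical, for illustration only.

```python
from collections import Counter, defaultdict


class NextSongPredictor:
    """Hypothetical toy statistical model: it remembers which song followed
    which, and predicts the most frequent successor. It has no idea *why*."""

    def __init__(self):
        # song -> Counter of songs that came right after it
        self.transitions = defaultdict(Counter)

    def listen(self, playlist):
        # Count every consecutive (current, following) pair in one listen.
        for current, following in zip(playlist, playlist[1:]):
            self.transitions[current][following] += 1

    def predict_next(self, song):
        followers = self.transitions.get(song)  # .get does not create entries
        if not followers:
            return None  # never heard anything after this song: no expectation
        return followers.most_common(1)[0][0]


model = NextSongPredictor()
playlist = ["Intro", "Anthem", "Ballad", "Finale"]
for _ in range(10):
    model.listen(playlist)

print(model.predict_next("Anthem"))  # Ballad
print(model.predict_next("Encore"))  # None
```

The model never represents why "Ballad" follows "Anthem". It only knows that it always has. This is the kind of prediction we can make without being able to explain it.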

We are Surrounded by Models

Reality is big and messy, with so much detail that we must represent it in simplified form, focusing on the key parts, if we are to navigate it well. There are many examples of this:

A map is a model of a city;
A light switch is a model of the lights in a room;
A menu is a model of the food served in a restaurant;
A key is a model of the lock it belongs to;
A history book is a model of past events;
A business plan is a model of a business;
A fossil is a model of a once-living organism;
A contract is a model of how two or more parties must behave;
Any scientific theory is a model of how some aspect of the world behaves.

“There can no longer be question about whether the brain models its environment: it must.” — Conant & Ashby

If we wish to have any grip on reality, we need a good model of it. This can even be shown mathematically, as Conant & Ashby do in their excellent paper “Every Good Regulator of a System Must Be a Model of That System”. The idea is elaborated further in this paper.

The Mind’s Eye

All we observe is light coming in through our eyes; sounds vibrating in our ears; and urges bubbling up from our body. We learn to model these by minimizing surprise. That is how we come to understand and actually see the world around us. We also model ourselves, so we also see ourselves.

¹ Think about the mathematical model for fair coin tosses: we can predict the overall trend, about half heads and half tails, but not each individual toss. (See also Wikipedia on prediction.)
² For example, it’s not just that particular favorite red ball that rolls well; all round things roll easily. Roundness is abstracted from the particulars as what matters for rolling.
³ I’m a computer scientist, not a psychologist; this is my model of what we do inside our heads, based on various sources and some AI research. See also infant cognitive development, mental models, causal models, attribution theory, and theory of mind.
⁴ We are quick to recognize actors in this world even when there are none. See also this on agenticity.