Notes · LMM Technologies

The second half of the architecture

If today's models are manufactured instinct, the part still missing is learning — and the measure of it is how little data it takes.

A companion note argued that much of what we call machine learning is better understood as manufactured instinct: the same compression of environmental structure into priors that evolution performs for living things, run at great expense on a shorter clock. If that is right, it describes only the bottom layer. Instinct is what an organism is born with. It is not the whole of a mind, and it is not even the most interesting part. The part that does the work we actually admire — adjusting to a world that keeps changing, after only a little experience of it — is learning, and it sits on top.

The two are easy to blur because the field calls both of them learning. But they are different faculties with different economies. Instinct is paid for once, in bulk, before the individual arrives — billions of examples, enormous compute, a long offline process that yields a fixed prior. Learning is paid for continuously, in tiny installments, from whatever the world happens to present. A child touches a hot stove once. An athlete revises a movement after a single failed repetition. The remarkable thing is not that they learn but how little they need to — and the reason they need so little is that the instinct is already there. Learning is cheap precisely because it is not starting from nothing.

This reframes a problem that usually arrives wearing the costume of a defect. In applied machine learning the complaint is always the same: not enough data. The clinical recordings are too few, the rare condition has a dozen examples, the decisive event has happened twice. We treat this as a shortfall to engineer around. But scarcity is not an accident of particular datasets. It is the signature of the regime where learning, rather than instinct, is the thing being asked for. Where data is abundant, you are manufacturing a prior. Where it is scarce, you are being asked to learn — to do the hot-stove thing, to pull a rule from almost nothing by leaning on structure you already hold. The data was never going to be plentiful. If it were, it would not be learning.


It helps to separate three faculties rather than two. There is instinct — the structure pressed into a model's weights by large-scale training. There is memory — holding specific past episodes and consulting them when a similar moment returns. And there is learning proper — folding new experience back into the prior so the next response is permanently different. These are not the same thing, and the field is unevenly good at them. We have become very good at manufacturing instinct. We are passable at memory, mostly by retrieving past examples and setting them beside the present one. We are still poor at the third — sample-efficient updating of a strong prior from a handful of examples, the one thing biological learning makes look effortless.

In motion this is concrete rather than abstract. A foundation model trained on a large, diverse corpus of human movement has the instinct: it knows, in the only sense a model can, that gait is periodic, that joints have limits, that balance is continuous. Asking it to recognize one patient's particular compensation from a few recorded sessions is not asking it to learn how bodies move. It is asking it to learn this body, quickly, on top of everything it already holds. That is the hot-stove problem in clinical clothing, and it is the part that is genuinely unsolved — not the prior, but the adaptation on top of it.
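One way to make the economics of that adaptation concrete is a toy conjugate Gaussian update. This is only a sketch under stated assumptions, not the architecture described here: the "prior" is a single population mean and variance for one gait parameter, the "patient" is three noisy sessions, and every number and name below (`adapt`, the stride times, the variances) is hypothetical.

```python
from statistics import fmean


def adapt(prior_mean, prior_var, observations, noise_var):
    """Conjugate Gaussian update; returns (posterior_mean, posterior_var)."""
    prior_precision = 1.0 / prior_var
    data_precision = len(observations) / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + sum(observations) / noise_var
    )
    return post_mean, post_var


# Hypothetical values: stride time in seconds.
population_mean, population_var = 1.10, 0.01  # stand-in for the "instinct"
noise_var = 0.04                              # assumed per-session measurement noise
sessions = [1.32, 1.18, 1.41]                 # three sessions from one patient

post_mean, post_var = adapt(population_mean, population_var, sessions, noise_var)

# Baseline with no prior: the plain sample mean and its variance.
sample_mean = fmean(sessions)
no_prior_var = noise_var / len(sessions)

print(f"with prior:    mean={post_mean:.3f}, var={post_var:.4f}")
print(f"without prior: mean={sample_mean:.3f}, var={no_prior_var:.4f}")
```

With the same three sessions, the estimate that leans on the prior is less uncertain than the one that starts from nothing, and it moves toward this patient without discarding what the population already implies. A real system would be adapting a high-dimensional model rather than one scalar, which is where the problem stays unsolved.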

There is a clean test of whether instinct can stand in for learning, and it is the rare event. We met it most sharply not in motion but in another domain we have been applying the same architecture to: a structured signal that unfolds in time, where the moments that matter most are the ones with the fewest examples in the record. The temptation there is to manufacture the missing experience: train a generator on what you have and synthesize the rest. It works for the typical case and fails exactly where it counts. A generator reproduces an ordinary stretch convincingly because it has seen thousands of them. It produces a smoothed average of the rare, decisive event because it has seen it twice. The downstream model then validates beautifully, because it is being scored against synthetic data in which the very thing you needed it to know was quietly hallucinated.
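The smoothing can be seen in a deliberately crude sketch. Assume, purely for illustration, that the "generator" is nothing more than a per-position mean over the examples it has seen, and that the rare event is a sharp spike that occurred at a different moment each time. Real generative models are far more capable than this stand-in, and the names `spike` and `mean_template` are hypothetical, but the mechanism is the same in miniature.

```python
def spike(length, position, height=1.0):
    """A window that is flat except for one sharp spike."""
    return [height if i == position else 0.0 for i in range(length)]


def mean_template(examples):
    """Crude stand-in for a generator: the per-position mean of its examples."""
    n = len(examples)
    return [sum(values) / n for values in zip(*examples)]


# The decisive event has happened twice, at different moments in the window.
rare_events = [spike(10, 3), spike(10, 7)]

synthetic_rare = mean_template(rare_events)

real_peak = max(max(event) for event in rare_events)
synthetic_peak = max(synthetic_rare)

print("real peak:     ", real_peak)
print("synthetic peak:", synthetic_peak)
print("synthetic window:", synthetic_rare)
```

Each real event peaks at full height once. The synthetic one has two half-height bumps, a shape that never occurred. A model trained and validated on windows like that will score well on an event that does not exist.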

That failure is not a nuisance. It is the boundary itself, drawn sharply. Instinct interpolates the typical; it cannot conjure the rare. Only exposure — actual experience of the unusual case — supplies it. This is the precise line between the two faculties: a manufactured prior can give you everything common in the world it was built from, and nothing genuinely new. The rare event is where learning has to be real, because no amount of prior cleverness substitutes for having been there.

So the frontier is not quite where the headlines put it. It is not larger priors, more parameters, more of the instinct we already know how to manufacture. It is the second half of the architecture: the learning layer — the part that adapts from little because it stands on a great deal. And it has the useful property of being indifferent to domain. A model that can learn one body from a few sessions, and a model that can learn one rare regime from a few instances, are solving the same problem in different clothing — structured signal, scarce where it counts, a strong prior underneath. Instinct was the half we knew how to build. Learning is the half worth building next.
