Sapient Class Problems

Gustav Svalander ยท

There are two kinds of work, and an agent treats them completely differently. With one, it can take a shot, see whether the shot landed, and keep trying until something works. With the other, it can hand you an answer that looks right and have no way to tell. What separates the two isn't how hard the work is.

We build infrastructure for agent systems, so this is the line we think about most. It marks what you can safely hand to an agent and what still has to stay with you. The test for which side a piece of work falls on holds up better than "easy" versus "hard": ask whether the answer can be checked.

Two kinds of problem

Some problems come with a built-in answer key. You can confirm a solution is right without having to be the genius who found it. A key opens the lock or it doesn't. A puzzle piece fits or it doesn't. A sum can be added up again. A word can be looked up in the dictionary. When a check like that exists, and the check is cheaper than solving the problem from scratch, an agent can throw many attempts at the problem, keep the ones that pass, and discard the rest. Cheap to try, cheap to verify, easy to scale. This is most of what people mean when they say agents will automate a task.

Other problems have no answer key. You can produce an answer, but nothing tells you it was the right one, no matter how capable you are. These are the problems where checking is as hard as solving, or impossible in principle. They don't get easier when you add more compute, because the thing you're missing isn't effort. It's a way to know you're right.

That second kind is the work that stays scarce. It's worth being precise about why, because the reasons are specific and they recur everywhere.

Why some answers can't be checked

There are four situations where the check simply isn't available. You'll recognize all of them from real work.

The goal itself is unclear or contested. Someone asks you to cook the best dinner. Best by what, taste, cost, how healthy it is, how fast it's on the table? Until that's settled, there's no target to hit, and every choice you make was already shaped by the answer you quietly assumed. You can cook a flawless meal and still have made the wrong one. Nothing about the food will tell you, because what counts as good came from your own choice of what to measure. Call it the Specification property.

The answer depends on something that did not happen. Was canceling the trip because of the clouds the right call? The version of the day where you went anyway doesn't exist. You can't observe it, so you can't compare against it. You can reason about it, but the reasoning is exactly the thing in question. The honest answer is a judgment, not a measurement, and it always will be. This is the Counterfactual property.

Someone benefits from misleading you. When the information you're working from comes from a party who gains if you get it wrong, more information doesn't help. What you're reading isn't a neutral sample of reality, it's a move chosen to steer you. Gathering more of it just gives the other side more room to shape what you see. Haggling over a price, a poker bluff, someone covering their tracks, they all live here. That's the Adversarial property.

The act of deciding changes the situation. Point to one person in the group as the source of the tension and the group reacts, that person adapts, and the thing you were trying to assess is now different because you assessed it. There's no neutral way to "just report" your conclusion, because reporting it is itself an action with consequences that feed back into what you were measuring. Name this the Reflexive property.

None of these is a tooling gap. You can't sense your way out of them with a better instrument, because the missing information doesn't exist yet, was generated by the assumption you're testing, is controlled by an adversary, or is changed by the act of looking. We call this whole region the sapient class: the problems where competence can be exercised but never confirmed from the outside.

This is also the line between narrow and general

The terms "narrow" and "general" intelligence get thrown around with almost no shared meaning. A reader has heard "AGI" mean everything from a chatbot to a superintelligence. Here's a definition that earns its keep, because it's anchored to something concrete: the reach of verification.

A capability is narrow, Artificial Narrow Intelligence (ANI), to the degree a sound, affordable check can score its work, against a clear success criterion, on observable outcomes, through a channel nobody is gaming. A capability is general, Artificial General Intelligence (AGI), to the degree it works beyond where checking reaches, where it has to set its own criterion, weigh outcomes it can't observe, and stay sound when the channel is adversarial or when judging changes what's judged. Narrow doesn't mean weak. A capability can be narrow and still vastly superhuman, because narrow marks where verification reaches, not how good the work gets. Both are matters of degree, not a yes/no, and the line moves as our checks get better.

The consequence is the part worth holding onto. Piling up checkable competence does not carry you across that line. A system can be superhuman at every scorable task in existence and still produce nothing in the sapient class, because that class is defined by the absence of the very thing every checkable advance relies on. Narrow and general are opposite sides of that boundary, not points on one road. More benchmark wins move you further along the narrow side. They don't move the line.

Designing for abundance

So here's the shape of the next few years. Checkable work gets abundant. Agents will saturate it, because it's exactly the work you can attempt cheaply and confirm cheaply. The sapient class does not get abundant, because no amount of cheap attempts helps when nothing can score them.

That reshapes where human attention is worth spending, and how you build with agents.

Point agents at the checkable, and let them exhaust it. Wherever a real check exists, hand it over. Many attempts against a sound check is precisely what agents are good at, and it's a poor use of a person's time.

Spend your own judgment on the sapient class. Deciding what to optimize, weighing a path against the one you didn't take, reading an adversary, making a call that changes the room: this is the work that stays yours. It's not a gap waiting to be automated. It's the part that was always the point.

Don't let a confident answer hide an unanswerable question. The most expensive mistake, for a person or an agent, is a polished answer to a question that had no check behind it. The answer looks right and nobody can tell otherwise until it's too late. The discipline is simple: before trusting an answer on a hard problem, ask whether anything could have told you it was wrong. If yes, push toward that. If no, treat the answer as a judgment, name the assumption it rests on, and keep the uncertainty visible instead of papering over it.

This is foundational for the agents themselves, not just for the people directing them. A narrow system that can tell a checkable problem from an uncheckable one knows when to keep trying and when to stop. Without that line it grinds on a question no effort can settle, or dresses a guess as an answer and reports it with full confidence. Knowing where verification runs out is how an ANI recognizes the edge of its own competence instead of mistaking it for more room to push. The definition isn't only how we describe its limits from the outside. It's what the system needs to see them itself.

So look at what's left once the checkable work is gone. Choosing what's worth doing in the first place. Making the call when there's no precedent and no way to test it beforehand. Reading a person who's trying to read you. Acting in a situation that shifts the moment you act. Taking responsibility for an outcome you can't prove was right until long after you've committed to it. This is the work that has no answer key, and that's exactly why it stays with us.

It's also, plainly, the best kind of work. It's the part that takes judgment instead of grinding, the part where being right is an act of nerve rather than a passed test, the part you'd actually want credit for. The checkable work was never the satisfying part; it was the part we tolerated to get to this. Agents clearing it away doesn't shrink what's left for people. It concentrates us on the only work that was ever really ours.