← Back to Blog

Sapient Class Problems

Gustav Svalander · June 5, 2026

A general intelligence knows it already

A definition of general intelligence should feel intuitive. If the line between a narrow mind and a general one is real, it ought to show up in an ordinary day: cooking dinner, canceling a trip, reading the person across the table. You shouldn't need a benchmark or a parameter count to point at it. You should be able to explain it with examples anyone already knows.

Here is one that holds up. The line is whether the answer can be checked, and at what price. Some work tells you when you've got it right, so an agent can try, check, and try again until it lands. Some work could tell you, if you paid for the check. And some work never tells you, at any price. The first kind you can hand to an agent today. The second is a purchasing decision. The third stays with you, and not because it's the harder of the two. We build infrastructure for agent systems, so this is the line we think about most.

Three kinds of problem

Some problems come with a built-in answer key. You can confirm a solution is right without having to be the genius who found it. A key opens the lock or it doesn't. A puzzle piece fits or it doesn't. A sum can be added up again. A word can be looked up in the dictionary. When a check like that exists, and the check is cheaper than solving the problem from scratch, an agent can throw many attempts at the problem, keep the ones that pass, and discard the rest. Cheap to try, cheap to verify, easy to scale. This is most of what people mean when they say agents will automate a task.

Other problems have an answer key that costs something. Was the used car a good buy? A full inspection would have told you; you just didn't pay for it. Does the new recipe actually taste better, or do you only think so? A blind taste test with friends settles it, at the cost of an evening. These problems feel uncheckable in the moment, but they aren't. The check exists, it has a price, and the honest question is whether it's worth buying. Most of what gets called unverifiable in daily life and in companies is this kind: unverified at the price anyone was willing to pay.

The third kind has no answer key at any price. You can produce an answer, but nothing could ever tell you it was the right one, no matter how capable you are or how much you spend. These problems don't get easier when you add more compute or more budget, because the thing you're missing isn't effort or money. It's a way to know you're right.

That third kind is the work that stays scarce. It's worth being precise about why, because the reasons are specific and they recur everywhere.

Why some answers can't be checked

Every check is the same small machine. The world gives off evidence, someone or something produces it, it travels to you, and you read it against a standard for what counts as right. Sometimes your verdict even feeds back into the world. A check fails when one part of that machine fails, and walking through the parts gives you the complete list of ways it can happen. You'll recognize all five from real life.

The goal itself is unclear or contested. Someone asks you to cook the best dinner. Best by what, taste, cost, how healthy it is, how fast it's on the table? Until that's settled, there's no standard to read the evidence against, and every choice you make was already shaped by the answer you quietly assumed. You can cook a flawless meal and still have made the wrong one. Nothing about the food will tell you, because what counts as good came from your own choice of what to measure. Call it the Specification problem.

The answer depends on something that did not happen. Was canceling the trip because of the clouds the right call? The version of the day where you went anyway doesn't exist. The world never emitted the evidence, so there is nothing to read. You can reason about it, but the reasoning is exactly the thing in question. The honest answer is a judgment, not a measurement, and it always will be. This is the Counterfactual problem.

Someone benefits from misleading you. When the information you're working from comes from a party who gains if you get it wrong, more information doesn't help. What you're reading isn't a neutral sample of reality, it's a move chosen to steer you. Gathering more of it just gives the other side more room to shape what you see. Haggling over a price, a poker bluff, someone covering their tracks, they all live here. That's the Adversarial problem.

The act of deciding changes the situation. Point to one person in the group as the source of the tension and the group reacts, that person adapts, and the thing you were trying to assess is now different because you assessed it. There's no neutral way to "just report" your conclusion, because reporting it is itself an action with consequences that feed back into what you were measuring. Name this the Reflexive problem.

The evidence existed, and it's gone. What did the two of you actually say in that argument last spring? The messages are deleted and both memories have drifted. The world emitted the evidence honestly; it just decayed before anyone could read it, and no spend brings it back. Old disputes, cold trails, unrecorded meetings. This is the Erosion problem.

None of these yields to a better instrument, at any price. The missing information doesn't exist yet, was generated by the assumption you're testing, is controlled by an adversary, is changed by the act of looking, or has been destroyed. We call this whole region the sapient class: the problems where no spend buys the check, so competence can be exercised but never confirmed from the outside. Notice how much that definition excludes. The used car and the taste test are not in it. Declaring a problem sapient-class is a strong claim, and it should cost an argument, not a shrug.

This is also the line between narrow and general

The terms "narrow" and "general" intelligence get thrown around with almost no shared meaning. A reader has heard "AGI" mean everything from a chatbot to a superintelligence. Here's a definition that earns its keep, because it's anchored to something concrete: the reach of verification.

A capability is narrow, Artificial Narrow Intelligence (ANI), to the degree a sound check can score its work at a price below producing it, against a clear success criterion, on observable outcomes, through a channel nobody is gaming. A capability is general, Artificial General Intelligence (AGI), to the degree it works beyond where checking reaches, where it has to set its own criterion, weigh outcomes it can't observe, and stay sound when the channel is adversarial or when judging changes what's judged. Narrow doesn't mean weak. A capability can be narrow and still vastly superhuman, because narrow marks where verification reaches, not how good the work gets. Both are matters of degree, not a yes/no, and both are indexed to today's checks.

The consequence is worth stating carefully, because it's easy to overclaim in either direction. Narrow competence does not automatically become general competence by getting bigger. Stacking wins on checkable tasks is progress along the narrow side, and crossing into the sapient class requires something checkable wins don't supply, since that class is defined by the absence of the very thing every checkable advance relies on. At the same time, the line between them is not fixed. It moves, and one of the things that moves it is narrow capability itself, because narrow systems are how better checks get built. The line is real, and the line is mobile, and both facts matter.

The line moves

Checks get cheaper, and every time one does, work crosses the line. A dashcam settles who drifted out of their lane, an argument that used to be one word against another. A sleep tracker settles whether you actually slept badly or just feel like it. An A/B test settles which subject line works, a question marketing teams once decided by taste and seniority. Each of these took a question that lived on judgment and gave it an answer key.

That is the quiet engine under everything else in this post: verification getting cheaper moves problems out of the expensive regime and shrinks what people mistake for the sapient class. It's also a place to invest. Buying better checks, instrumentation, evals, ground truth, records that don't erode, moves your own line outward, and everything that crosses it becomes work an agent can take. The questions that survive every such advance, the ones no instrument and no budget will ever settle, are the sapient class proper, and they are what remains when the line has moved as far as it can.

Designing for abundance

So here's the shape of the next few years. Checkable work gets abundant. Agents will saturate it, because it's exactly the work you can attempt cheaply and confirm cheaply. The sapient class does not get abundant, because no amount of cheap attempts helps when nothing can score them.

That reshapes where human attention is worth spending, and how you build with agents.

Price the check before calling anything impossible. When a problem feels unverifiable, the first question is whether a check could be bought. A trial, an audit, a measurement, a record. If yes, it's a cost-benefit call, not an epistemic crisis. Reserve the strong claim, no check at any price, for the problems that have actually earned it.

Point agents at the checkable, and let them exhaust it. Wherever a real check exists at a real price worth paying, hand it over. Many attempts against a sound check is precisely what agents are good at, and it's a poor use of a person's time.

Spend your own judgment on the sapient class, and know what judgment is. It isn't seeing the unseeable. Good judgment is mostly restructuring: choosing the reversible version of a decision so the road not taken stays observable, naming the standard out loud so the goal stops being contested in silence, distrusting exactly the evidence the other side controls, and being precisely as confident as what's left deserves. Good judges don't perceive what others can't. They move the line, and they tell the truth about where it sits.

Don't let a confident answer hide an unanswerable question. The most expensive mistake, for a person or an agent, is a polished answer to a question that had no check behind it. The answer looks right and nobody can tell otherwise until it's too late. The discipline is simple: before trusting an answer on a hard problem, ask whether anything could have told you it was wrong. If yes, push toward that. If no, treat the answer as a judgment, name the assumption it rests on, and keep the uncertainty visible instead of papering over it.

This is foundational for the agents themselves, not just for the people directing them. A narrow system that can tell a checkable problem from an uncheckable one knows when to keep trying, when a check could be bought, and when to stop. Without that line it grinds on a question no effort can settle, or dresses a guess as an answer and reports it with full confidence. Knowing where verification runs out is how an ANI recognizes the edge of its own competence instead of mistaking it for more room to push. The definition isn't only how we describe its limits from the outside. It's what the system needs to see them itself.

So look at what's left once the checkable work is gone. Choosing what's worth doing in the first place. Making the call when there's no precedent and no way to test it beforehand. Reading a person who's trying to read you. Acting in a situation that shifts the moment you act. Taking responsibility for an outcome you can't prove was right until long after you've committed to it. This is the work that has no answer key, and that's exactly why it stays with us.

It's also, plainly, the best kind of work. It's the part that takes judgment instead of grinding, the part where being right is an act of nerve rather than a passed test, the part you'd actually want credit for. The checkable work was never the satisfying part; it was the part we tolerated to get to this. Agents clearing it away doesn't shrink what's left for people. It concentrates us on the only work that was ever really ours.