Philosophy

The case for boring automation

Why Pollen uses heuristics and a sprinkle of AI — not a browser agent — and ends up faster, cheaper, and more reliable than the hype.

If you've been anywhere near the "AI agents" conversation in the last year, you've heard this pitch: an autonomous agent looks at your screen, reads the page, reasons about what to do, and clicks its way through any website you point it at. Natural language in, task completed out. No selectors. No scripts. No integrations.

It's a compelling demo. It's also, for the most common automation task on the web — filling a form — almost always the wrong tool.

Pollen takes the opposite bet. We wrote a Chrome extension that captures form data on one site and fills it into a form on another, across arbitrary pages we've never seen before. We considered building it as an autonomous agent. We decided not to. This post explains why.

The one-line summary: use AI exactly where it earns its keep, and no further.


What autonomous browser agents promise

The pitch writes itself. You tell an agent, "Fill the onboarding form on site B with the same data I just submitted on site A." The agent:

  1. Looks at your screen (or reads the DOM)
  2. Reasons about which fields mean what
  3. Clicks, types, scrolls, submits

Under the hood, it's an LLM in a tight loop with a browser driver — Claude's computer use API, Browser Use, Skyvern, OpenAI Operator, Stagehand, Lavague. Pick your flavor. The interface is natural language; the backend is a model that generates actions.

When it works, it looks like magic.

What autonomous browser agents actually cost you

When it doesn't work — which is most of the time, at scale — here's what you're paying:

Latency. A single agent run is an LLM loop. Each step is: screenshot → tokenize → reason → output an action → wait for the page → repeat. A single form takes 30 to 120 seconds. Multiply that by 100 submissions and you're watching paint dry.

Money. Every step bills tokens. Computer-use-style agents send large screenshots into the context window; DOM-reading agents send chunks of serialized HTML. Real-world cost per form fill lands somewhere between $0.10 and $1.00, depending on model and page complexity.

Non-determinism. LLMs are probabilistic. Run the same prompt twice, get different actions. An agent might fill a field it skipped on the last run, swap two similar-looking fields, or paraphrase a value instead of copying it verbatim.

You don't find out until the submission is corrupted on the other end.

Silent failure. When a script fails, it throws. When an agent fails, it returns a plausible-looking report of success. You discover the problem downstream, when someone asks why half your records are blank.

Debuggability. Ask "why did the agent do that?" and the answer is, best case, a chain-of-thought transcript. Worst case, a shrug. You can't breakpoint a probability distribution.

Bot detection. Agents type at superhuman speeds, don't move their mouse naturally, and fetch the same page in the same pattern every run. CAPTCHA providers are getting better at flagging this — and every blocked or retried run still bills tokens. Good luck when the monthly invoice shows up.

Security. Most agent architectures route your credentials through a cloud service that then automates a browser to log in as you. That's two extra hops of trust for every password.

Zero caching by default. Every run is a fresh LLM trip, even for the same site you've automated a thousand times. There's nothing to learn from past runs without extra engineering.

Put together: you're paying a lot to run a slow, unpredictable, unaudited, security-questionable process that often produces the wrong answer. For form filling.

The boring alternative

Here's what Pollen does instead.

The extension is a Chrome Manifest V3 side panel plus a content script. When you visit your source site, you click Capture. A deterministic DOM walk extracts every input, its label (resolved via <label for>, aria-label, wrapping <label>, placeholder, and a few more heuristics — all written down as code you can read), its type, its options if any, and its current value.
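The label-resolution cascade can be sketched roughly like this. This is a condensed illustration, not Pollen's actual code: `resolveLabel` is a hypothetical name, the heuristic order mirrors the list above, and `doc` is injected so the function can be exercised outside a browser.

```javascript
// Illustrative sketch of a label-resolution cascade, in the order the
// post describes. Each step is a plain, readable heuristic — no model call.
function resolveLabel(input, doc) {
  // 1. <label for="..."> pointing at this input's id
  if (input.id) {
    const lab = doc.querySelector(`label[for="${input.id}"]`);
    if (lab && lab.textContent.trim()) return lab.textContent.trim();
  }
  // 2. explicit aria-label attribute
  const aria = input.getAttribute("aria-label");
  if (aria && aria.trim()) return aria.trim();
  // 3. a wrapping <label> ancestor
  let node = input.parentElement;
  while (node) {
    if (node.tagName === "LABEL") return node.textContent.trim();
    node = node.parentElement;
  }
  // 4. placeholder text as a last resort
  const ph = input.getAttribute("placeholder");
  if (ph && ph.trim()) return ph.trim();
  return null;
}
```

Because every step is ordinary code, a surprising label is a breakpoint away from an explanation.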

Then you navigate to the target site. You click Analyze. The extension reads the target form the same way.

Now we have two structured lists: source fields with values, target fields waiting to be filled. This is the moment AI earns its keep — we send these two lists (not screenshots, not full HTML, just the structured schemas) to Claude with a single classification prompt:

Here are N source fields. Here are M target fields. Return a JSON array of {sourceFieldIndex, targetSelector, confidence} pairs.

Claude is excellent at this. Semantic matching between "E-mail" and "contact email address" is exactly what LLMs are good at. It's a one-shot task. Tokens are ~500 in, ~200 out. Cost: about $0.005. Time: about 2 seconds.
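To make the token math concrete, here is a sketch of the kind of compact payload that gets sent: structured field lists, not screenshots or raw HTML. The function name and field shapes are illustrative, not Pollen's actual schema.

```javascript
// Illustrative sketch: build a compact classification prompt from two
// structured field lists. Note how little text this is compared to a
// screenshot or a serialized DOM.
function buildMappingPrompt(sourceFields, targetFields) {
  const src = sourceFields.map((f, i) => ({ index: i, label: f.label, type: f.type }));
  const tgt = targetFields.map((f) => ({ selector: f.selector, label: f.label, type: f.type }));
  return [
    "Here are the source fields:",
    JSON.stringify(src),
    "Here are the target fields:",
    JSON.stringify(tgt),
    "Return a JSON array of {sourceFieldIndex, targetSelector, confidence} pairs.",
  ].join("\n");
}
```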

We cache the result. Next time you go from the same source to the same target, no AI call at all.
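The cache can be as simple as keying mappings by the (source origin, target origin) pair. The key scheme below is an illustrative assumption; in an extension the result would be persisted with Chrome's `chrome.storage.local` API.

```javascript
// Illustrative cache key: one mapping per source-origin → target-origin pair.
// Query strings and paths are deliberately dropped so the mapping is reused
// across pages of the same site — an assumption, not Pollen's exact scheme.
function mappingCacheKey(sourceUrl, targetUrl) {
  const src = new URL(sourceUrl).origin;
  const tgt = new URL(targetUrl).origin;
  return `mapping:${src}->${tgt}`;
}

// In the extension, persistence would look something like:
//   chrome.storage.local.set({ [mappingCacheKey(srcUrl, tgtUrl)]: mapping });
```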

Then we fill. This part is completely deterministic:

// Deterministic fill: look up the cached selector, set the value,
// and fire the events the page's scripts listen for.
const el = document.querySelector(mapping.targetSelector);
if (el) {
  el.value = value;
  el.dispatchEvent(new Event("input", { bubbles: true }));
  el.dispatchEvent(new Event("change", { bubbles: true }));
  el.dispatchEvent(new Event("blur", { bubbles: true }));
} else {
  console.warn(`Pollen: no element matches ${mapping.targetSelector}`);
}

That's it. No LLM, no reasoning, no uncertainty. If the selector matches, the field fills. If it doesn't, we log it. Filling 50 fields takes 50 to 200 milliseconds.

Where heuristics are actually better than AI

The naive take is "AI is the smart way; heuristics are the dumb way." That's backwards for the specific problems we're solving. Consider:

Detecting multi-entity forms. Some forms contain data for N users in one submission: users[0].name, users[0].email, users[1].name, users[1].email. An agent would have to infer this structure from the page each time and probably get it wrong on edge cases. Pollen's detector is three regexes. It's right every time, in microseconds.
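A detector of that shape can be sketched in a few lines. The exact patterns below are illustrative guesses at common indexed-name conventions, not Pollen's actual regexes:

```javascript
// Illustrative multi-entity field detector: match indexed naming
// conventions like users[0].email, users[0][email], user_0_email.
const ENTITY_PATTERNS = [
  /^(\w+)\[(\d+)\]\.(\w+)$/,     // users[0].email
  /^(\w+)\[(\d+)\]\[(\w+)\]$/,   // users[0][email]
  /^(\w+)[._-](\d+)[._-](\w+)$/, // user_0_email
];

function parseEntityField(name) {
  for (const re of ENTITY_PATTERNS) {
    const m = name.match(re);
    if (m) return { group: m[1], index: Number(m[2]), field: m[3] };
  }
  return null; // not a multi-entity field
}
```

No inference, no sampling: either a name matches a known convention or it doesn't.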

Reading listing pages. A table of 500 rows or a grid of 50 cards? An agent would try to scroll and reason its way through. Pollen walks <table> and <tbody>, or finds parents with 3+ same-shaped children, and extracts in one pass.
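The "3+ same-shaped children" heuristic is easy to show. Here elements are represented as plain `{tag, className, children}` objects so the logic is visible without a live DOM; the real walk would run on `Element` nodes.

```javascript
// Illustrative card-grid detector: a container is a listing when one
// child "shape" (tag + class + child count) repeats three or more times.
function childSignature(el) {
  return `${el.tag}.${el.className}[${(el.children || []).length}]`;
}

function isListingContainer(el) {
  const kids = el.children || [];
  if (kids.length < 3) return false;
  const counts = new Map();
  for (const k of kids) {
    const sig = childSignature(k);
    counts.set(sig, (counts.get(sig) || 0) + 1);
  }
  return Math.max(...counts.values()) >= 3;
}
```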

Checkbox parsing. When a target checkbox receives "yes", "true", "on", "1", "checked", what does it mean? An agent will get it right most of the time, wrong sometimes. Pollen has a 10-line function with a hardcoded list of falsy strings. Covered by 25 unit tests.
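A parser of that shape looks roughly like this. The exact falsy-string list is an illustrative guess, not Pollen's actual list:

```javascript
// Illustrative checkbox value parser: everything not on the hardcoded
// falsy list counts as checked. Deterministic, trivially unit-testable.
const FALSY_STRINGS = new Set([
  "", "false", "no", "off", "0", "unchecked", "null", "undefined", "none",
]);

function parseCheckboxValue(value) {
  if (typeof value === "boolean") return value;
  if (value == null) return false;
  return !FALSY_STRINGS.has(String(value).trim().toLowerCase());
}
```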

Triggering React forms. React tracks input state internally. Setting .value isn't enough — you have to dispatch synthetic input/change events the right way. Every AI agent stumbles on this eventually. Pollen does it the same way every time.
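The widely-known workaround is to set the value through the native prototype setter, so React's internal value tracker registers the change, then dispatch the events. A sketch, with the prototype injectable for testability (in a browser it would be `HTMLInputElement.prototype`):

```javascript
// Illustrative React-safe value set: bypass any instance-level setter by
// calling the prototype's native value setter, then dispatch the events
// React listens for. Not Pollen's exact code, but the standard technique.
function setNativeValue(el, value, proto) {
  proto = proto || Object.getPrototypeOf(el);
  const desc = Object.getOwnPropertyDescriptor(proto, "value");
  if (desc && desc.set) {
    desc.set.call(el, value); // native setter: React's tracker sees this
  } else {
    el.value = value;
  }
  el.dispatchEvent(new Event("input", { bubbles: true }));
  el.dispatchEvent(new Event("change", { bubbles: true }));
}
```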

Reliable selectors. #username is a stable ID. div > div > form > div:nth-child(3) > input:nth-child(2) isn't. An agent has no idea which path it's picking. Pollen prefers the ID, falls back to a name attribute, and only uses a structural path as a last resort — a choice made once, cached forever.
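The preference order is a three-line decision. The descriptor shape below is illustrative — the idea is that stability is chosen once, by code you can read, and then cached:

```javascript
// Illustrative selector preference: most stable option first.
// desc: { id, name, structuralPath } recorded during the DOM walk.
function preferredSelector(desc) {
  if (desc.id) return `#${desc.id}`;             // stable, human-assigned
  if (desc.name) return `[name="${desc.name}"]`; // usually stable on forms
  return desc.structuralPath;                    // brittle nth-child path, last resort
}
```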

These are not exotic edge cases. These are the bread and butter of form automation. And in each one, a deterministic implementation beats a model call on every axis: correctness, speed, cost, debuggability.

Where AI is actually better than heuristics

The one place we can't beat AI is arbitrary semantic matching across sites we've never seen. Example: a source field labeled "E-mail" needs to land on a target field labeled "Contact email address", "work_email", or "Correo electrónico".

A hard-coded rule would need to anticipate every language, every phrasing, every synonym. Claude just knows these mean the same thing. That's what we use it for. Once. Per source-target pair.

If a power user wants to adjust a mapping — say, "actually point our email field at the target's work_email, not personal_email" — that's a dropdown in the UI, not a re-prompt.

The numbers

For a typical form fill with ~10 fields:

                        Pollen                       Autonomous browser agent
First fill (with AI)    ~2s + 200ms                  ~60s
Subsequent fills        ~200ms                       ~60s (every time)
First fill cost         ~$0.005                      ~$0.25
Subsequent fill cost    $0.00                        ~$0.25 (every time)
Deterministic?          Yes, after mapping           No
Cacheable?              Fully                        Not without rearchitecting
Debuggable?             Every decision is in code    LLM chain-of-thought
Fails loudly?           Yes (selector miss logs)     Rarely (silent wrong fill)

After ten runs of the same flow, Pollen has made one AI call. An agent has made ten and will make ten more tomorrow.

The reliability number

We wouldn't claim "100% reliable for every website on the internet" — that's marketing talk, and the internet is a hostile environment (iframes, shadow DOM, anti-automation heuristics, custom React widgets, CSP restrictions). What we will claim is this:

Once a mapping is established, the fill step itself is deterministic. The same inputs produce the same DOM mutations, every time, measured in milliseconds.

That's a much stronger guarantee than an agent can ever give, because an agent's fill step is itself a sampled token sequence. Pollen's variance lives entirely in the mapping step — where you can see it (we show confidence scores), override it (click to re-map), and freeze it (it's cached).

You don't have to trust the model at runtime. You just have to trust it for thirty seconds during setup.

When would you still want a full agent?

Honestly: when the task is genuinely open-ended. A one-off errand on a site you'll never visit again, a workflow whose steps change on every run, a job whose structure you can't specify in advance — there, an agent's flexibility can be worth its cost.

But if you're filling forms, submitting data, scraping listings into a target flow, or repeatedly doing the same multi-site handoff — every one of those problems is bounded, structured, and repeatable. And structure is what heuristics eat for breakfast.

The takeaway

The debate is not "AI vs no AI." It's "how much AI, used where?"

Autonomous browser agents say: all AI, everywhere, every time. Pay in time, money, and trust.

Pollen says: heuristics for the mechanical work, a sprinkle of AI for the one classification step that genuinely needs it, and caching so you never pay twice. Fast enough to be imperceptible. Cheap enough to be free after the first run. Deterministic enough to trust unattended.

The hype cycle will swing back. When it does, boring automation will still be here — quietly finishing in 200 milliseconds while the agent in the next tab is still reading the page.

Want to try it?

Pollen is launching soon. Join the waitlist for early access.
