TL;DR: Agent execution comes down to one question — how do you sensibly parallelize a single task across multiple agents? That’s the Bitter Lesson playing out at the collaboration layer, and it’s what 24 hours at a hackathon made obvious to me.

I recently spent 24 hours at a hackathon, and the more I think about it, the more convinced I am that the collaboration model in the agentic era is genuinely different from the pre-agent one. This post is less a conclusion than a list of directions I think are worth working on next.

Why a hackathon is a good stress test

The old collaboration model was human ↔ human. Add agents to the loop, and as long as agents are pure assistants — their context strictly a subset of the human’s — nothing really breaks. That’s roughly the state of most day-to-day work over the past year.

A 24-hour hackathon is a different beast:

  • Delivery is priority zero; nobody has time to audit every agent decision.
  • Agent context and human context have already diverged in practice, and the collaboration still has to keep going.
  • Latency matters more than throughput — you can run as many tasks as you want, but you ship exactly one demo.

These constraints surface problems that day-to-day work usually hides. Everything below comes from sitting inside that pressure.

An 8-hour demo gone in two hours: context loss in multi-hop collaboration

We got burned by a classic version of this. Teammate A got a good demo working and told their agent “commit everything to GitHub, no filtering, just make sure it’s reproducible.” They sent me the PR branch directly; I handed it to my own agent and asked it to reproduce the demo. It couldn’t. Then A’s environment and GPU instance were reclaimed because the account had not been topped up, and the four of us spent nearly two hours failing to bring the demo back. That single failure cost us our chance to submit a working demo and effectively threw away the first 8 hours of the hackathon.

In hindsight the failure has three clean hops:

  1. The prompt “make it reproducible” was misinterpreted at hop 1: A’s agent took it to mean “commit every file I see,” but the implicit dependencies in A’s environment (GPU type, local paths, untracked config) were never written down anywhere the agent could see.
  2. The A → me hop carried no context handoff: A sent a PR link and that was it. The “things my agent didn’t capture but I personally know” never made it across.
  3. The me → my agent hop just re-executed the broken state: my agent inherited a branch whose implicit assumptions were already wrong, and no amount of cleverness was going to recover that.

In the pure human ↔ human era, information friction happened at exactly one hop. Now the path is agent → human → human → agent, which is three hops, and any assumption that breaks at one layer gets amplified by the layers downstream.

Design docs and READMEs aren’t less important in the agent era — they’re more important. What they need to encode is no longer just “what the human wants to do,” but also “what the agent saw, what assumptions it made, and which environment variables it depended on.”
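One way to picture such a handoff note is a small manifest the sender generates before passing a branch along. The sketch below is only an illustration: the function names, the manifest fields, and the example variable names (CUDA_VISIBLE_DEVICES, DATA_DIR, API_TOKEN) are assumptions of mine, not something we actually ran at the hackathon.

```python
import os
import platform
import sys


def build_handoff_manifest(required_env, assumptions, environ=None):
    """Record what the sender's environment provides, for the next hop.

    Hypothetical helper: the field names are illustrative only.
    """
    environ = os.environ if environ is None else environ
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Store names only, because values may contain secrets.
        "env_present": sorted(n for n in required_env if n in environ),
        "env_missing": sorted(n for n in required_env if n not in environ),
        # Free-text notes: the things the human knows but the agent never saw.
        "assumptions": list(assumptions),
    }


def missing_on_receiver(manifest, environ=None):
    """Return variables the sender had set that the receiver lacks."""
    environ = os.environ if environ is None else environ
    return [n for n in manifest["env_present"] if n not in environ]


if __name__ == "__main__":
    sender_env = {"CUDA_VISIBLE_DEVICES": "0", "DATA_DIR": "/data/demo"}
    manifest = build_handoff_manifest(
        required_env=["CUDA_VISIBLE_DEVICES", "DATA_DIR", "API_TOKEN"],
        assumptions=["needs a GPU instance", "config file is untracked"],
        environ=sender_env,
    )
    print(manifest["env_missing"])
    print(missing_on_receiver(manifest, environ={"DATA_DIR": "/tmp/demo"}))
```

The receiving agent can run the second function first and stop early if anything is missing, instead of re-executing a branch whose implicit assumptions no longer hold.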

Why is there nothing humans can do in crunch time?

The most counterintuitive moment of the whole hackathon: in the last two hours before the deadline, the four of us were sitting around chatting. The agents were doing all the work, and there was genuinely nothing we could do to speed things up. Setting Claude to ultra-fast mode was the entire menu of acceleration tactics available to me. In the pre-AI era, this would have been unimaginable: crunch time meant people grabbing keyboards and pair-debugging. Now crunch time means watching the agents’ spinners.

Thinking about it in LLM-inference language clarifies the picture:

  • Agents raise team throughput: in a given window, N tasks can be in flight simultaneously, which is a real and significant multiplier on total team output.
  • But single-request latency barely moves: the wall-clock time from “I give the agent the task” to “I get a result back” is bounded by the agent’s per-step latency and reasoning depth. The acceleration here is much smaller than the throughput gain.
  • Hackathons are latency-bound, not throughput-bound: you ship one demo. The thing blocking you in the final hour isn’t “how many more tasks are queued,” it’s “when does this one finish.”

So crunch time doesn’t mean there’s nothing to do — it means you’ve already run out of ways to make a single task go faster.
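The throughput versus latency split can be put into rough numbers. The sketch below is a toy model under stated assumptions: tasks are independent, each task is handled by exactly one agent, and every task takes the same time. The 30-minute latency and the agent counts are made-up figures, not measurements from the hackathon.

```python
def throughput_per_hour(task_latency_min, num_agents):
    """Tasks completed per hour when each agent works one task at a time."""
    return num_agents * 60 / task_latency_min


def batch_wall_clock(num_tasks, task_latency_min, num_agents):
    """Minutes to finish num_tasks independent tasks, one agent per task."""
    waves = -(-num_tasks // num_agents)  # ceiling division
    return waves * task_latency_min


if __name__ == "__main__":
    latency = 30  # assumed minutes per task
    # Throughput scales with the number of agents.
    print(throughput_per_hour(latency, 1), throughput_per_hour(latency, 8))
    # A batch of 8 tasks finishes much sooner with 8 agents.
    print(batch_wall_clock(8, latency, 1), batch_wall_clock(8, latency, 8))
    # The single demo task takes just as long either way.
    print(batch_wall_clock(1, latency, 1), batch_wall_clock(1, latency, 8))
```

In this model, adding agents changes the first two lines of output but not the third, which is the situation the final hours of a hackathon put you in.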

A side observation: as agentic capability gets stronger, the “human is the owner, agent is the assistant” posture has effectively inverted. The agent does the work and the human assists. This isn’t a value judgment, just an observation; crunch time is where you feel it most clearly.

The only lever left for latency: parallelize a single task

If single-task speedup from a single agent has limits, the only remaining lever is raising parallelism within a single task — multiple agents pushing different facets of the same task forward at the same time. The prerequisite for this is that context can flow between the agents involved.

Features like Claude’s btw help with the inside-one-agent version of this problem — the ask/execute mutex where asking a question stops execution. But getting two agents to develop the same codebase concurrently and merge cleanly at the end is a different and much harder problem. At its core, this is just Amdahl’s Law at the collaboration layer:

  • The larger the parallelizable fraction, the higher the speedup ceiling.
  • Making context flow between agents is the only way to shrink the serial-only portion.
  • “One agent holding all the context” is the worst possible serial bottleneck: in that state, a second agent contributes only merge conflicts, not speed.
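For reference, Amdahl's Law says that if a fraction p of the work can be parallelized across n workers, the speedup is 1 / ((1 - p) + p / n), with a ceiling of 1 / (1 - p) as n grows. The sketch below applies that formula to agents; treating "fraction of the task whose context is shared" as p is my own loose analogy, and the numbers are illustrative.

```python
def amdahl_speedup(parallel_fraction, num_agents):
    """Speedup from Amdahl's Law for a given parallelizable fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_agents)


def speedup_ceiling(parallel_fraction):
    """Upper bound on speedup as the number of agents grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # One agent holds all the context: nothing parallelizes, no speedup.
    print(amdahl_speedup(0.0, 4))
    # Half the task parallelizes across 4 agents.
    print(amdahl_speedup(0.5, 4))
    # Even with unlimited agents, the serial half caps the gain.
    print(speedup_ceiling(0.5))
```

Note that the formula bottoms out at a speedup of 1 and ignores coordination cost. The merge-conflict case is worse than the model suggests, because the second agent adds overhead on top of contributing no speed.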

So doing an explicit checkpoint plus context offload at the right moments isn’t just about reproducibility; its real value is giving parallel agents common ground to build on. I hit this wall myself during the hackathon: while I was asking my agent questions, it couldn’t make progress; while it was making progress, I couldn’t deepen my own understanding. Human comprehension and agent execution couldn’t even overlap, let alone parallelize.

The side effect: humans get trapped in the agent’s information bubble

The default shape of the new collaboration model looks like this:

agent offloads context → another agent reads that context → human selectively absorbs context from the agents

If the human under-absorbs, a new failure mode emerges: whatever the agent judges looks correct to you, and you start saying “continue” to everything. The agent isn’t lying or slacking. But you’ve quietly handed away your judgment.

Agentic capability makes this worse, not better. The stronger the agent, the cheaper it is to defer to it, and the higher the relative cost of stepping back to grab the global picture. As the human, you have to actively and periodically break out of the bubble the agent has built around you — and that habit deserves higher priority than it usually gets.

The same diminishing-returns curve also applies one level up. A project has a finite number of axes that genuinely parallelize. Pile more humans on and they race against each other; pile more agents on and the same thing happens, just with tokens instead of headcount. How you decompose a task to be parallelizable, when to deliberately race, and when to keep things serial: that trade-off used to live only at the “between humans” layer. Now it lives at both the “between humans” and “between agents” layers simultaneously.

The next problem the Bitter Lesson hands us

Back to The Bitter Lesson. Sutton was talking about AI methods: over the long run, the general approach that scales with compute wins, and the carefully hand-designed approach eventually loses.

I think this rule applies one level up — to collaboration patterns themselves. The agent itself is already a beneficiary of the Bitter Lesson; it is, after all, the thing that scaled. The next question is:

How do we scale a collaboration system made out of agents so that multiple agents push the same single task forward together — and how do we keep agent count, parallelism, and context flow scaling without being bottlenecked by hand-designed “process”?

There is no mature answer yet. How does context propagate efficiently between agents? When do you checkpoint? Who reconciles conflicts? How does a human stay in the loop with real judgment rather than degenerating into a “continue” button? These are all open questions.

If anything, the hackathon left me more convinced than before: this is the thing worth working on next.