Jelly48

We shipped a win condition that was geometrically impossible

June 11, 2026

We're building a second game on the Jelly48 engine: a soft-body Suika-like called Jellygon, coming to iOS. Polygon pieces — triangle through hendecagon, one side per tier — drop into a cup, squish, and fuse when same-tier pieces touch. Making the top-tier piece, two tier-7 decagons fusing and bursting, is the game's watermelon moment. We had tuned the piece sizes for feel, validated stability with soak tests, and built the celebration animation. Then a simple question came up in review: is there actually enough room in the cup to do this?

The answer was no. Not "very hard" — geometrically impossible. No human or bot would ever have seen the win screen. This post is the procedure that caught it, the fix, and the experiment that turned the tuning constant into a dose-response curve we can now adjust with data instead of vibes.

Figure: Jellygon screenshot of two tier-5 jellies caught mid-fusion by the screenshot harness, fused into a single peanut-shaped blob inside the cup. This is the merge the whole question hangs on.

The napkin math said "tight but fine" — the napkin was wrong

Piece sizes grow geometrically: scale = 0.62 · g^tier, with growth g = 1.26 at the time. Measuring the actual spawned rings (not circumradius estimates — flat-sided polygons are narrower than their bounding circle):

tier   width   area   cumulative 0..tier
5      3.80    10.9   25.4
6      4.82    17.7   43.1
7      5.95    28.6   71.6
8      7.72    45.9   -

The cup is 8.4 wide with 9.8 of usable height: 82.3 area. One of every tier 0–7 — the worst-case build inventory — is 71.6, i.e. 87% occupancy. Tight but achievable, we concluded, since soft bodies pack better than rigid ones and every merge in a cascade frees ~19% of the merging pair's area.

That analysis was arithmetically right and practically wrong, because the binding constraint isn't total area — it's that the 7→8 leg concentrates the load into three enormous, badly-packing blobs. To fuse the second tier-7 you must hold the first one (28.6) plus two tier-6s (17.7 each) at the moment they chain: 64 of 82 area in three pieces, before counting any of the working inventory that built them. Three giant rounded polygons in a rectangle pack far worse than the 87% headline suggested.
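The numbers above can be re-derived in a few lines. This is a minimal sketch that uses only the measured figures quoted in this post; the constant and function names are illustrative and are not taken from the engine.

```rust
// Measured values quoted in this post (lengths in cup units, areas in units squared).
const CUP_WIDTH: f64 = 8.4;
const CUP_HEIGHT: f64 = 9.8; // usable height
const AREA_TIER_6: f64 = 17.7;
const AREA_TIER_7: f64 = 28.6;
const CUMULATIVE_0_TO_7: f64 = 71.6; // one piece of every tier 0..=7

/// Fraction of the cup's usable area covered by `area`.
fn occupancy(area: f64) -> f64 {
    area / (CUP_WIDTH * CUP_HEIGHT)
}

/// Fraction of a merging pair's area that disappears when two pieces of
/// area `child` fuse into one piece of area `parent`.
fn freed_by_merge(child: f64, parent: f64) -> f64 {
    1.0 - parent / (2.0 * child)
}

/// Area held at the 7 -> 8 bottleneck: one tier-7 plus two tier-6s.
fn bottleneck_area() -> f64 {
    AREA_TIER_7 + 2.0 * AREA_TIER_6
}

fn main() {
    println!("headline occupancy: {:.0}%", 100.0 * occupancy(CUMULATIVE_0_TO_7));
    println!(
        "freed by a 6+6 merge: {:.0}%",
        100.0 * freed_by_merge(AREA_TIER_6, AREA_TIER_7)
    );
    println!(
        "bottleneck: {:.1} area in three pieces ({:.0}% of the cup)",
        bottleneck_area(),
        100.0 * occupancy(bottleneck_area())
    );
}
```

The first two lines reproduce the 87% and ~19% figures. The third shows why the headline was misleading: the three-piece load is already most of the cup before any working inventory is counted, and area alone says nothing about how badly those three shapes pack.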

Figure: the settled pile at the original tuning, with pieces labelled 7, 6, 5, 4, 3 and the lose line marked.
Not an illustration — the engine itself, settling one piece of every tier 0–7 (the minimum inventory a top-tier piece can be assembled from) at the original tuning, largest first. The five smallest pieces have nowhere to go but up: the pile rides 2.3 units above the lose line. Note there are no same-tier pairs here — touching same-tier pieces fuse, so a real board needs separation room on top of this.

Bots as the falsifier

Paper math can't settle a packing question; play can. We wrote a headless player against the real engine — same physics, same rules, 60 Hz — that aims and drops deliberately: target the highest same-tier partner (merges fire on contact), otherwise park in a size-sorted slot. One game runs in about two seconds.
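The drop rule described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Piece` type, the coordinate convention (x from 0 to the cup width, larger y meaning higher in the cup), the reading of "highest" as "sitting highest in the pile", and the largest-on-the-left slot order. The real bot's types and slot layout are not shown in this post.

```rust
/// A settled piece on the board (hypothetical layout, not the engine's type).
#[derive(Clone, Copy, Debug)]
struct Piece {
    tier: u8,
    x: f64, // horizontal centre, 0.0 = left wall
    y: f64, // vertical centre, larger = higher in the cup
}

const CUP_WIDTH: f64 = 8.4;
const TOP_TIER: u8 = 8;

/// Choose the x position for the next drop.
fn choose_drop_x(board: &[Piece], next_tier: u8) -> f64 {
    // 1. Prefer the same-tier piece sitting highest in the pile:
    //    merges fire on contact, so landing on it fuses immediately.
    let partner = board
        .iter()
        .filter(|p| p.tier == next_tier)
        .max_by(|a, b| a.y.total_cmp(&b.y));
    if let Some(p) = partner {
        return p.x;
    }
    // 2. Otherwise park in a size-sorted slot: one slot per tier,
    //    largest tiers on the left (assumed ordering).
    let slots = f64::from(TOP_TIER) + 1.0;
    let slot = f64::from(TOP_TIER - next_tier.min(TOP_TIER));
    (slot + 0.5) * CUP_WIDTH / slots
}

fn main() {
    let board = [
        Piece { tier: 2, x: 1.0, y: 0.5 },
        Piece { tier: 2, x: 6.0, y: 3.0 },
        Piece { tier: 4, x: 3.5, y: 1.2 },
    ];
    println!("tier 2 -> x = {:.2}", choose_drop_x(&board, 2));
    println!("tier 0 -> x = {:.2}", choose_drop_x(&board, 0));
}
```

A policy this simple has no lookahead, which is consistent with treating bot results as a floor on what a deliberate player can do.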

First experiment, 12 runs across three strategy variants: every run reached tier 7 in ~45 drops, and every run died 10–20 drops later while building the second 7 — exactly at the predicted bottleneck. Zero wins.

So we built the real harness: 11 strategies × 100 seeds, with bug oracles watching every frame (particle-velocity explosions, NaN positions, pieces tunneling out of the cup, merge-reject storms). The strategies deliberately span competent to abusive:

STRATS=greedy,flat SEEDS=1-100 cargo run --release --example playtest

At the original growth value the verdict was unanimous: 0 wins, with a characteristic tier-histogram wall at 7. The win condition was dead code.
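For readers who want a picture of what a per-frame bug oracle can look like, here is a minimal sketch covering three of the four checks named above (non-finite positions, velocity explosions, pieces outside the cup). The particle layout, the coordinate origin, and both thresholds are assumptions chosen for the example, not the harness's actual values; merge-reject storms are left out because they depend on engine events this post does not describe.

```rust
/// One soft-body particle (hypothetical layout, not the engine's).
#[derive(Clone, Copy)]
struct Particle {
    pos: [f64; 2],
    vel: [f64; 2],
}

#[derive(Debug, PartialEq)]
enum Violation {
    NonFinitePosition,
    VelocityExplosion,
    OutsideCup,
}

const CUP_HALF_WIDTH: f64 = 4.2; // 8.4-wide cup, assumed centred on x = 0 with floor at y = 0
const MAX_SPEED: f64 = 40.0; // assumed threshold, units/s
const TUNNEL_MARGIN: f64 = 0.5; // assumed slack for squish against walls and floor

/// Check one simulation frame and report the first violation found.
fn check_frame(particles: &[Particle]) -> Option<Violation> {
    for p in particles {
        if !p.pos.iter().all(|c| c.is_finite()) {
            return Some(Violation::NonFinitePosition);
        }
        let speed = p.vel[0].hypot(p.vel[1]);
        // A NaN speed is treated as an explosion too.
        if speed.is_nan() || speed > MAX_SPEED {
            return Some(Violation::VelocityExplosion);
        }
        let through_wall = p.pos[0].abs() > CUP_HALF_WIDTH + TUNNEL_MARGIN;
        let through_floor = p.pos[1] < -TUNNEL_MARGIN;
        if through_wall || through_floor {
            return Some(Violation::OutsideCup);
        }
    }
    None
}

fn main() {
    let frame = [
        Particle { pos: [0.0, 1.0], vel: [1.0, -2.0] },
        Particle { pos: [0.3, 2.0], vel: [55.0, 0.0] },
    ];
    println!("{:?}", check_frame(&frame));
}
```

Running a check like this on every frame of every game is what lets a sweep double as a stability test: a strategy does not need to win to be useful, it only needs to push the simulation somewhere a scripted test would not.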

The fix is one constant — and the response curve is steep

The honest lever is the growth ratio itself: smaller tier-to-tier growth shrinks the late-game blobs relative to the cup. We stepped it down and re-swept. First attempt, 1.26 → 1.24: still 0 wins in 200 runs. We mispredicted this — the napkin said 1.24 would open the door. It barely moved it.

At 1.22 the first wins in the project's history appeared — and fittingly, the first strategy to crack it was noisy, the one that plays like a human, misses included. The deterministic bots kept tiling themselves into the same doomed layouts; aim error explored better boards.

That steepness deserved a proper experiment instead of one more guess. We made the growth factor an env hook and swept seven values × 4 win-capable strategies × 50 seeds — 1,400 games, ~30 minutes on a laptop, 7 parallel jobs:
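A sketch of what such an env hook and sweep grid might look like. The variable name `JELLY_GROWTH`, the fallback behaviour, the use of strategy indices instead of names, and seeds numbered from 1 are all assumptions for illustration; only the grid shape (seven values, four strategies, 50 seeds) comes from the experiment described above.

```rust
use std::env;

const SHIPPED_GROWTH: f64 = 1.22;

/// Parse a growth override, falling back to the shipped value when the
/// variable is unset, unparsable, or not a finite number above 1.0.
fn growth_from(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|g| g.is_finite() && *g > 1.0)
        .unwrap_or(SHIPPED_GROWTH)
}

/// Enumerate the sweep as (growth, strategy index, seed) triples.
fn sweep_jobs() -> Vec<(f64, usize, u32)> {
    let mut jobs = Vec::new();
    for step in 0..7u32 {
        // 1.24, 1.23, ..., 1.18. Deriving each value from an integer
        // avoids accumulating floating-point error across steps.
        let growth = f64::from(124 - step) / 100.0;
        for strategy in 0..4 {
            for seed in 1..=50 {
                jobs.push((growth, strategy, seed));
            }
        }
    }
    jobs
}

fn main() {
    // JELLY_GROWTH is a hypothetical variable name.
    let growth = growth_from(env::var("JELLY_GROWTH").ok().as_deref());
    println!("growth = {growth}");
    println!("sweep size = {}", sweep_jobs().len());
}
```

Keeping the parsing in a pure function makes the hook testable without touching the process environment, and a flat job list splits naturally across parallel workers.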

Figure: bar chart of bot win rate (y-axis, 0% to 40%) against tier-size growth factor (x-axis, 1.24 down to 1.18; ★ = shipped value, 1.22).
Bot win rate vs growth factor — 200 games per bar, roughly doubling per 0.01 step.
growth        bot win rate (200 runs each)
1.24          0%
1.23          1%
1.22 (ship)   4%
1.21          8%
1.20          21%
1.19          21%
1.18          38%

A monotonic dose-response curve, roughly doubling per 0.01 step (the one flat spot is 1.20 to 1.19, both at 21%). Two practical consequences. First, difficulty is now a calibrated dial: when human playtest data says "too hard," we know roughly what one notch buys. Second, the steepness is a warning: a constant you'd happily nudge by 0.05 "to taste" swings the win rate by an order of magnitude. Tune in 0.01 steps, against human data. Bots are a lower bound on skill: they do no lookahead and no inventory sequencing, so a deliberate human should multiply that 4% several-fold. That lands engaged players at a win every handful of good games, which makes the win a repeatable climax rather than a one-time trophy (in Jellygon the top piece bursts and the run continues; the score game is chasing repeat bursts).
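The "roughly doubling" reading can be checked against the sweep results directly. A minimal sketch, assuming the rows are evenly spaced in growth and using the rounded percentages as published; the function name is illustrative.

```rust
/// Bot win rates from the sweep, as (growth, win rate in percent).
const SWEEP: [(f64, f64); 7] = [
    (1.24, 0.0),
    (1.23, 1.0),
    (1.22, 4.0),
    (1.21, 8.0),
    (1.20, 21.0),
    (1.19, 21.0),
    (1.18, 38.0),
];

/// Geometric-mean multiplier per step between the first and last nonzero
/// win rates. Returns None when fewer than two rows are nonzero.
fn mean_step_ratio(sweep: &[(f64, f64)]) -> Option<f64> {
    let nonzero: Vec<f64> = sweep
        .iter()
        .map(|&(_, rate)| rate)
        .filter(|&rate| rate > 0.0)
        .collect();
    if nonzero.len() < 2 {
        return None;
    }
    let steps = (nonzero.len() - 1) as f64;
    Some((nonzero[nonzero.len() - 1] / nonzero[0]).powf(1.0 / steps))
}

fn main() {
    // Step-by-step multipliers, skipping the 0% row (no ratio is defined from zero).
    for pair in SWEEP.windows(2) {
        let ((_, prev), (growth, cur)) = (pair[0], pair[1]);
        if prev > 0.0 {
            println!("{:.2}: x{:.1} over the previous step", growth, cur / prev);
        }
    }
    if let Some(ratio) = mean_step_ratio(&SWEEP) {
        println!("geometric mean per 0.01 step: x{:.2}", ratio);
    }
}
```

The individual steps are uneven, which is expected at this sample size: 1% of 200 games is two wins, so the low end of the curve is noisy. The average multiplier from 1% to 38% over five steps still lands close to two.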

The same harness found a real bug in this game

To check the methodology generalizes, we ported the harness to Jelly48 — the soft-body 2048 you can play right now — with swipe-policy strategies instead of aim positions: corner discipline, input-mashing every 12 frames, gravity sloshing, lateral pendulum. 350 runs.

It flagged something the test suite never had: velocity spikes of 47–96 units/s — pieces visibly snapping when pinched in packed boards under brisk input — in ~5% of runs. Steady play never exceeds ~34. The existing regression test runs 25 seconds; the spikes first appear around sim-minute four.
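The gap between a 25-second regression test and a bug that first shows up around minute four is easy to see with a toy trace. The sketch below is illustrative only: the speed trace is synthetic, the 60 Hz step is assumed to match the Jellygon bot runs, and `first_spike_secs` is a hypothetical helper, not part of the harness.

```rust
const SIM_HZ: u32 = 60; // assumed: same 60 Hz step as the Jellygon bot runs
const STEADY_PEAK: f64 = 34.0; // steady play never exceeds ~34 units/s

/// Sim time, in seconds, of the first frame whose peak particle speed
/// exceeds `threshold`, or None if no frame does.
fn first_spike_secs<I: IntoIterator<Item = f64>>(peak_speeds: I, threshold: f64) -> Option<f64> {
    peak_speeds
        .into_iter()
        .position(|speed| speed > threshold)
        .map(|frame| frame as f64 / f64::from(SIM_HZ))
}

/// Synthetic per-frame peak speeds: calm play with a single 47 units/s
/// spike at the four-minute mark. Not real engine output.
fn synthetic_trace(frames: usize) -> impl Iterator<Item = f64> {
    let spike_frame = 4 * 60 * SIM_HZ as usize;
    (0..frames).map(move |frame| if frame == spike_frame { 47.0 } else { 20.0 })
}

fn main() {
    let short_run = 25 * SIM_HZ as usize; // the existing regression test length
    let long_run = 10 * 60 * SIM_HZ as usize; // a ten-minute soak
    println!("25 s test: {:?}", first_spike_secs(synthetic_trace(short_run), STEADY_PEAK));
    println!("10 min soak: {:?}", first_spike_secs(synthetic_trace(long_run), STEADY_PEAK));
}
```

The short run ends thousands of frames before the spike, so it reports nothing however tight the threshold is. Recording the time of the first violation, not just a pass or fail, also tells you how long a regression test needs to be.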

Two things made this finding actionable within the hour:

Takeaways

Jellygon ships on iOS soon. Its sibling — same engine, same bots watching over it — is free in your browser right now.

Play Jelly48 →