Beyond Coordinate Roulette
UI automation should not feel like a flat sequence of positional guesses. It should know what it is touching, why that action makes sense, and whether that is still true right now.
flowchart TB
subgraph A["Coordinate roulette"]
A1["Looks clickable"]
A2["Guess position"]
A3["Wrong target"]
A1 --> A2 --> A3
end
subgraph B["Beyond Coordinate Roulette"]
B1["See entities"]
B2["Affordances"]
B3["Lease trust"]
B4["Guard action"]
B5["Semantic diff"]
B1 --> B2 --> B3 --> B4 --> B5
end
A3 -. "move beyond this" .- B1
classDef old fill:#fde2e2,stroke:#c0392b,color:#5c1f1f;
classDef new fill:#e8f1ff,stroke:#3b6db3,color:#183257;
class A1,A2,A3 old;
class B1,B2,B3,B4,B5 new;
Operate on meaning, not on blind guesses over coordinates.
Beyond Coordinate Roulette is the public-facing name for a simple idea: interface automation should be grounded in entities, affordances, and semantic outcomes, not just rough positional clicks.
Too much automation still behaves like this.
Guess
Click somewhere around here.
Hope
Hope this is still the same button.
Patch
Patch the failure with another guess.
That loop can look automated, but it is not truly grounded.
A different model of interaction.
See entities
Treat the UI as a world containing things, not just a bitmap.
See affordances
Know not only where something is, but what can be done with it.
Keep trust bounded
Do not assume a target is valid forever. Trust should be temporary and revocable.
Compare semantically
After action, ask what changed in meaning, not only whether pixels moved.
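The first three ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the desktop-touch-mcp API: the names `Entity`, `Lease`, and `guarded_act` are hypothetical, and time is passed in as a plain number so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A UI thing with meaning, not a pixel position (hypothetical model)."""
    entity_id: str
    role: str               # e.g. "button"
    label: str              # e.g. "Save"
    affordances: frozenset  # what can be done with it, e.g. {"click"}


@dataclass
class Lease:
    """Temporary, revocable trust in an entity (hypothetical model)."""
    entity: Entity
    expires_at: float       # time on a caller-supplied clock
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        return not self.revoked and now < self.expires_at


def guarded_act(lease: Lease, action: str, now: float) -> str:
    """Act only if the lease still holds and the entity affords the action."""
    if not lease.is_valid(now):
        return "refused: lease expired or revoked"
    if action not in lease.entity.affordances:
        return f"refused: {lease.entity.label!r} does not afford {action!r}"
    # A real implementation would dispatch the action here.
    return f"ok: {action} on {lease.entity.label!r}"


save = Entity("e1", "button", "Save", frozenset({"click"}))
lease = Lease(save, expires_at=5.0)

r_ok = guarded_act(lease, "click", now=1.0)       # lease valid, action afforded
r_wrong = guarded_act(lease, "type", now=1.0)     # a button does not afford typing
r_expired = guarded_act(lease, "click", now=9.0)  # trust has run out
```

The point of the sketch is that a refusal is an explicit outcome with a reason, where a coordinate click would simply have landed on whatever happened to be there.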
Not a single feature. A direction.
Inside desktop-touch-mcp, this shows up through pieces like:
- desktop_see
- desktop_touch
- entity leases
- guarded execution
- semantic diffs
- event-first invalidation
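To make "semantic diff" concrete, here is one possible shape for it. This sketch assumes a snapshot is a dictionary mapping entity ids to property dictionaries. That representation and the `semantic_diff` function are illustrative assumptions, not what desktop-touch-mcp actually returns.

```python
def semantic_diff(before: dict, after: dict) -> dict:
    """Compare two snapshots keyed by entity id (hypothetical format).

    Reports which entities appeared, which disappeared, and which
    properties changed, as (old, new) pairs.
    """
    appeared = sorted(after.keys() - before.keys())
    disappeared = sorted(before.keys() - after.keys())
    changed = {}
    for eid in before.keys() & after.keys():
        old, new = before[eid], after[eid]
        delta = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if delta:
            changed[eid] = delta
    return {"appeared": appeared, "disappeared": disappeared, "changed": changed}


# Snapshots taken before and after clicking "Save" (made-up example data).
before = {
    "save_button": {"role": "button", "label": "Save", "enabled": True},
    "title": {"role": "text", "value": "draft.txt*"},
}
after = {
    "save_button": {"role": "button", "label": "Save", "enabled": False},
    "title": {"role": "text", "value": "draft.txt"},
    "toast": {"role": "status", "value": "Saved"},
}

diff = semantic_diff(before, after)
```

Here the diff says that a status message appeared, the title lost its unsaved marker, and the button became disabled, which is evidence the save happened. A pixel comparison would only say that some regions changed.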
Do not hide coordinate guessing behind a more polished API.
Shape of interaction versus survival over time.
Beyond Coordinate Roulette
Focuses on the shape of interaction: entities, affordances, and semantic outcomes.
Reactive Perception Graph
Focuses on the time dimension: provisional state, dirty/stale tracking, leases, and guards.
Beyond Coordinate Roulette asks: what should UI interaction be grounded in?
The Reactive Perception Graph (RPG) asks: how should that grounding survive time and change?
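The time dimension can be sketched as a small freshness tracker. This is a hedged illustration of dirty/stale tracking with event-first invalidation, not the Reactive Perception Graph implementation: `PerceptionGraph`, `Freshness`, and every method name below are hypothetical, and the clock is passed in explicitly.

```python
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"
    DIRTY = "dirty"  # an event says it may have changed; re-observe before use
    STALE = "stale"  # too old to trust, even though no event arrived


class PerceptionGraph:
    """Tracks how far each observed entity can still be trusted (sketch)."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self._observed_at = {}  # entity id -> time of last observation
        self._dirty = set()

    def observe(self, entity_id: str, now: float) -> None:
        """Record a fresh observation and clear any dirty mark."""
        self._observed_at[entity_id] = now
        self._dirty.discard(entity_id)

    def on_event(self, affected_ids) -> None:
        """Event-first: invalidate as soon as an event arrives, not on a timer."""
        self._dirty.update(i for i in affected_ids if i in self._observed_at)

    def freshness(self, entity_id: str, now: float) -> Freshness:
        if entity_id in self._dirty:
            return Freshness.DIRTY
        if now - self._observed_at[entity_id] > self.max_age:
            return Freshness.STALE
        return Freshness.FRESH

    def guard(self, entity_id: str, now: float) -> bool:
        """Allow an action only on a known, fresh entity."""
        return (
            entity_id in self._observed_at
            and self.freshness(entity_id, now) is Freshness.FRESH
        )


graph = PerceptionGraph(max_age=2.0)
graph.observe("save_button", now=0.0)
allowed_initially = graph.guard("save_button", now=1.0)

graph.on_event(["save_button"])  # e.g. the window was redrawn
allowed_after_event = graph.guard("save_button", now=1.0)

graph.observe("save_button", now=1.5)  # look again before acting
allowed_after_reobserve = graph.guard("save_button", now=1.6)
allowed_when_old = graph.guard("save_button", now=4.0)
```

In this sketch an event invalidates trust immediately, and the age limit acts only as a fallback for changes that produced no event.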
Good automation knows more than where to click.
Good automation does not just know where to click. It knows what it is touching, why it can touch it, and whether that is still true right now.