Meaning-first interaction

Beyond Coordinate Roulette

UI automation should not feel like a flat sequence of positional guesses. It should know what it is touching, why the action makes sense, and whether both still hold right now.

flowchart TB
    subgraph A["Coordinate roulette"]
        A1["Looks clickable"]
        A2["Guess position"]
        A3["Wrong target"]
        A1 --> A2 --> A3
    end

    subgraph B["Beyond Coordinate Roulette"]
        B1["See entities"]
        B2["Affordances"]
        B3["Lease trust"]
        B4["Guard action"]
        B5["Semantic diff"]
        B1 --> B2 --> B3 --> B4 --> B5
    end

    A3 -. "move beyond this" .- B1

    classDef old fill:#fde2e2,stroke:#c0392b,color:#5c1f1f;
    classDef new fill:#e8f1ff,stroke:#3b6db3,color:#183257;
    class A1,A2,A3 old;
    class B1,B2,B3,B4,B5 new;

In one sentence

Operate on meaning, not on blind guesses over coordinates.

Beyond Coordinate Roulette is the public-facing name for a simple idea: interface automation should be grounded in entities, affordances, and semantic outcomes, not just rough positional clicks.

The failure mode

Too much automation still behaves like this.

Guess

Click somewhere around here.

Hope

Hope this is still the same button.

Patch

Patch the failure with another guess.

That loop can look automated, but it is not truly grounded.

What this pushes toward

A different model of interaction.

See entities

Treat the UI as a world containing things, not just a bitmap.

See affordances

Know not only where something is, but what can be done with it.
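To make the first two ideas concrete, here is a minimal sketch of an entity that carries its own affordances. The `Entity` class, its field names, and the action strings are all hypothetical illustrations, not the data model used by desktop-touch-mcp.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """One thing in the UI, described by meaning rather than by pixels (hypothetical model)."""
    entity_id: str                     # stable identifier for this thing
    role: str                          # what kind of thing it is, e.g. "button"
    name: str                          # human-readable label
    bounds: tuple[int, int, int, int]  # x, y, width, height: a detail, not the identity
    affordances: frozenset[str] = field(default_factory=frozenset)

    def affords(self, action: str) -> bool:
        """Return True if this entity supports the given action."""
        return action in self.affordances


# Two example entities; the values are made up for illustration.
save = Entity("e1", "button", "Save", (40, 300, 80, 24), frozenset({"invoke"}))
title = Entity("e2", "textbox", "Title", (40, 120, 400, 24), frozenset({"focus", "type"}))
```

With a model like this, a caller asks `save.affords("invoke")` before acting, and the screen position becomes one attribute of the target rather than its definition.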

Keep trust bounded

Do not assume a target is valid forever. Trust should be temporary and revocable.
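One way to picture bounded trust is a lease: a short-lived, revocable permission tied to a single entity. The sketch below is an assumption about how such a lease could look; `Lease`, `grant`, and the time-to-live parameter are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Time-limited, revocable permission to act on one entity (hypothetical)."""
    entity_id: str
    expires_at: float
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        """A lease is valid only if it has not been revoked and has not expired."""
        return not self.revoked and now < self.expires_at

    def revoke(self) -> None:
        """Withdraw trust immediately, regardless of the remaining time."""
        self.revoked = True


def grant(entity_id: str, ttl_seconds: float, now: float) -> Lease:
    """Issue a lease that expires ttl_seconds after the given timestamp."""
    return Lease(entity_id, expires_at=now + ttl_seconds)


# The clock is passed in explicitly so the behaviour is easy to reason about.
lease = grant("e1", ttl_seconds=2.0, now=100.0)
```

In real use the timestamps would come from a monotonic clock such as `time.monotonic()`. Passing `now` explicitly keeps the sketch deterministic.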

Compare semantically

After an action, ask what changed in meaning, not only whether pixels moved.
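A semantic comparison can be sketched as a diff over entity state rather than over pixels. The snapshot shape used here (entity id mapped to a dictionary of properties) and the function name `semantic_diff` are assumptions made for illustration only.

```python
def semantic_diff(before: dict[str, dict], after: dict[str, dict]) -> dict[str, list]:
    """Compare two snapshots of entity state and report what changed in meaning."""
    appeared = sorted(after.keys() - before.keys())
    disappeared = sorted(before.keys() - after.keys())
    changed = []
    # For entities present in both snapshots, report each property that differs.
    for entity_id in sorted(before.keys() & after.keys()):
        old_state, new_state = before[entity_id], after[entity_id]
        for key in sorted(old_state.keys() | new_state.keys()):
            old, new = old_state.get(key), new_state.get(key)
            if old != new:
                changed.append((entity_id, key, old, new))
    return {"appeared": appeared, "disappeared": disappeared, "changed": changed}


# Hypothetical snapshots taken before and after invoking a "Save" button.
before = {"save": {"enabled": True}, "doc": {"modified": True}}
after = {
    "save": {"enabled": False},
    "doc": {"modified": False},
    "toast": {"text": "Saved"},
}
diff = semantic_diff(before, after)
```

The result says that the document is no longer modified, the button became disabled, and a confirmation appeared. A pixel comparison would report only that some region of the screen changed.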

Inside this project

Not a single feature. A direction.

Inside desktop-touch-mcp, this shows up through pieces like:

  • desktop_see
  • desktop_touch
  • entity leases
  • guarded execution
  • semantic diffs
  • event-first invalidation

Do not hide coordinate guessing behind a more polished API.
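As a rough illustration of guarded execution, the sketch below refuses to act unless the target's lease is still valid and the requested action is one the entity affords. Every name here (`Target`, `GuardRejected`, `guarded_touch`, the `perform` callback) is hypothetical; this is not the actual interface of `desktop_see` or `desktop_touch`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Target:
    """An observed entity together with the lease it was granted (hypothetical)."""
    entity_id: str
    affordances: frozenset[str]
    lease_expires_at: float
    revoked: bool = False


class GuardRejected(Exception):
    """Raised when a guard refuses to act instead of guessing."""


def guarded_touch(
    target: Target,
    action: str,
    now: float,
    perform: Callable[[str, str], str],
) -> str:
    """Run perform(entity_id, action) only if every guard passes."""
    if target.revoked:
        raise GuardRejected("lease revoked; observe again")
    if now >= target.lease_expires_at:
        raise GuardRejected("lease expired; observe again")
    if action not in target.affordances:
        raise GuardRejected(f"{target.entity_id} does not afford {action!r}")
    return perform(target.entity_id, action)


# A stand-in for the real input layer: it only records what it was asked to do.
log: list[tuple[str, str]] = []


def fake_perform(entity_id: str, action: str) -> str:
    log.append((entity_id, action))
    return "ok"


button = Target("e1", frozenset({"invoke"}), lease_expires_at=102.0)
result = guarded_touch(button, "invoke", now=101.0, perform=fake_perform)
```

The design choice is that a failed guard raises rather than falling back to a position, so the caller has to observe again instead of patching the failure with another guess.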

Relationship to RPG (Reactive Perception Graph)

Shape of interaction versus survival over time.

Beyond Coordinate Roulette

Focuses on the shape of interaction: entities, affordances, and semantic outcomes.

Reactive Perception Graph

Focuses on the time dimension: provisional state, dirty/stale tracking, leases, and guards.

Beyond Coordinate Roulette asks: what should UI interaction be grounded in?
RPG asks: how should that grounding survive time and change?
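The time dimension can be pictured as a small cache in which UI events mark entries stale and stale entries are never handed out as fresh. This is a sketch under stated assumptions: `PerceptionCache` and its methods are hypothetical and do not describe how the Reactive Perception Graph is actually implemented.

```python
from typing import Optional


class PerceptionCache:
    """Remembers observed entity state and tracks which entries are stale (hypothetical)."""

    def __init__(self) -> None:
        self._state: dict[str, dict] = {}
        self._stale: set[str] = set()

    def observe(self, entity_id: str, state: dict) -> None:
        """Record a fresh observation and clear any stale mark."""
        self._state[entity_id] = state
        self._stale.discard(entity_id)

    def on_event(self, entity_id: str) -> None:
        """A UI event reported a change: invalidate first, re-observe later."""
        if entity_id in self._state:
            self._stale.add(entity_id)

    def get_fresh(self, entity_id: str) -> Optional[dict]:
        """Return state only if it is known and not stale; otherwise None."""
        if entity_id not in self._state or entity_id in self._stale:
            return None
        return self._state[entity_id]


cache = PerceptionCache()
cache.observe("save", {"enabled": True})
fresh_before_event = cache.get_fresh("save")
cache.on_event("save")
fresh_after_event = cache.get_fresh("save")
```

After the event, the cache declines to answer until the entity is observed again, which is the provisional-state behaviour described above in miniature.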

One line to keep

Good automation knows more than where to click.

It knows what it is touching, why it can touch it, and whether that is still true right now.