
Reactive Perception Graph

By the time an LLM acts, the world may already be different. Reactive Perception Graph (RPG) is one way to deal with that.

flowchart LR
    subgraph A["Snapshot-and-Act"]
        A1["Observe screen"]
        A2["Think"]
        A3["Act on old assumption"]
        A1 --> A2 --> A3
        AX["World changed"] -.-> A3
    end

    subgraph B["Reactive Perception Graph"]
        B1["Observe target"]
        B2["Store provisional state"]
        B3["Dirty / stale signals"]
        B4["Validate lease"]
        B5["Run guards"]
        B6["Execute or block"]
        B1 --> B2 --> B3 --> B4 --> B5 --> B6
    end

    A3 --> C["Unsafe action"]
    B6 --> D["Safer action contract"]

    classDef bad fill:#fde2e2,stroke:#c0392b,color:#5c1f1f;
    classDef good fill:#e5f6ea,stroke:#2e8b57,color:#123d28;
    class C bad;
    class D good;

The problem

Seeing and touching are separated by time.

Many LLM agents implicitly follow a loop like this: observe the interface, think, then act. That sounds harmless. In a dynamic interface, it is fragile.

What can change?

  • the user focuses another window
  • a modal appears
  • the UI re-renders
  • the target moves or disappears

What breaks?

The model still acts on the assumptions formed at observation time, even though those assumptions are no longer valid at action time.

Tiny accident story

Correct intent, wrong target.

Suppose an LLM wants to type "hello" into Notepad. It observes Notepad and decides where to type. Before it acts, another window comes to the front, and the agent sends "hello" anyway.

The agent may execute the intended action correctly, but on the wrong target.

That is not mainly an intelligence failure. It is a stale-assumption failure.
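
The accident can be reproduced in a few lines. This is a minimal sketch, assuming a made-up in-memory `desktop` object rather than any real windowing or automation API. `typeBlind` and `typeChecked` are hypothetical names: the first acts on the observation-time assumption, the second re-checks that assumption at action time.

```javascript
// Hypothetical in-memory desktop: not a real OS or automation API.
const desktop = {
  focused: "Notepad",
  text: { Notepad: "", Chat: "" },
  // Keys always go to whichever window is focused right now.
  sendKeys(keys) { this.text[this.focused] += keys; },
};

// Snapshot-and-act: trust what was seen at observation time.
function typeBlind(keys) {
  desktop.sendKeys(keys);
}

// Re-check the assumption "the expected window is focused" at action time.
function typeChecked(expectedWindow, keys) {
  if (desktop.focused !== expectedWindow) {
    return { ok: false, reason: `focus moved to ${desktop.focused}` };
  }
  desktop.sendKeys(keys);
  return { ok: true };
}

const observed = desktop.focused; // the agent observes Notepad
desktop.focused = "Chat";         // the world changes while the model is thinking

typeBlind("hello");                                   // lands in Chat
const checkedResult = typeChecked(observed, "hello"); // refused, nothing typed
```

The intent is identical in both calls. Only the second one notices that the target it was formed against is no longer the live target.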

Core claim

External state should be treated as provisional.

Reactive Perception Graph is a layer that treats external state as provisional and re-checks the assumptions behind action before the action fires.

Important clarification

RPG is not a screenshot cache. It is a different contract between the agent and the world.

Four ideas

The parts that make the contract work.

flowchart TB
    L["Lens\nWhat am I watching?"]
    P["Provisional state\nWhat do I currently believe?"]
    G["Guard\nIs this action still safe?"]
    T["Lease\nCan I still trust this target?"]
    X["Action"]

    L --> P
    P --> G
    P --> T
    T --> G
    G --> X

    W["World changes"] -. "marks dirty" .-> P
    W -. "can revoke trust" .-> T

    classDef core fill:#eef4ff,stroke:#3b6db3,color:#183257;
    classDef edge fill:#fff6df,stroke:#b8860b,color:#5a4300;
    class L,P,G,T core;
    class X edge;

Provisional state

Keep not only what the agent believes, but how trustworthy that belief still is.
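
One way to picture this is a belief record that carries a trust status next to its value. The sketch below is an assumption for illustration: the field names, the split between "dirty" (a change signal arrived) and "stale" (too old to trust), and the `maxAgeMs` knob are not defined by the project.

```javascript
// Hypothetical shape for a provisional belief; field names are illustrative.
function makeBelief(value, observedAt) {
  return { value, observedAt, status: "fresh" }; // "fresh" | "dirty" | "stale"
}

// A change signal arrived: the value might be wrong, but is not known to be wrong.
function markDirty(belief) {
  return { ...belief, status: "dirty" };
}

// Too old to trust regardless of signals (maxAgeMs is an assumed policy knob).
function ageOut(belief, now, maxAgeMs) {
  return now - belief.observedAt > maxAgeMs ? { ...belief, status: "stale" } : belief;
}

// Only fresh beliefs may be acted on without re-observing.
function isTrusted(belief) {
  return belief.status === "fresh";
}

const seen = makeBelief({ window: "Notepad", focused: true }, 1000);
const afterSignal = markDirty(seen);        // status becomes "dirty"
const afterWait = ageOut(seen, 7000, 5000); // 6000 ms old, limit 5000: "stale"
```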

Lens

A watchpoint on something the agent currently cares about.
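
As a rough sketch, a lens can be modeled as a filter over world events that flips a dirty flag for one target. The `Lens` class, its event shape (`targetId`, `type`), and the event names are hypothetical.

```javascript
// Hypothetical Lens: watches one target and flags it dirty on relevant events.
class Lens {
  constructor(targetId, relevantEvents) {
    this.targetId = targetId;
    this.relevantEvents = new Set(relevantEvents);
    this.dirty = false;
  }

  // Feed every world event here; events for other targets or types are ignored.
  notify(event) {
    if (event.targetId === this.targetId && this.relevantEvents.has(event.type)) {
      this.dirty = true;
    }
  }

  // Call after the target has been re-observed.
  refresh() {
    this.dirty = false;
  }
}

const lens = new Lens("notepad-editor", ["focus-lost", "moved", "closed"]);

lens.notify({ targetId: "browser-tab", type: "closed" }); // other target: ignored
const dirtyAfterUnrelated = lens.dirty;

lens.notify({ targetId: "notepad-editor", type: "focus-lost" });
const dirtyAfterRelevant = lens.dirty;
```

The point of scoping the watch to one target is that unrelated changes do not force a full re-observation.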

Guard

A safety check that runs before an action, in case the environment has drifted.
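
A guard can be sketched as a small predicate over the live world that either passes or returns a reason for blocking. The two guards below, and the `world` and `action` shapes they read, are assumptions chosen to match the desktop examples on this page.

```javascript
// Hypothetical guards: each returns null when satisfied, or a reason string.
const guards = [
  (world, action) =>
    world.focusedWindow === action.window ? null : "target window is not focused",
  (world) => (world.modalOpen ? "a modal is covering the UI" : null),
];

// Run every guard against the live world and collect all failures.
function runGuards(world, action) {
  const failures = guards.map((g) => g(world, action)).filter((r) => r !== null);
  return { pass: failures.length === 0, failures };
}

const action = { window: "Notepad", keys: "hello" };
const okResult = runGuards({ focusedWindow: "Notepad", modalOpen: false }, action);
const blockedResult = runGuards({ focusedWindow: "Chat", modalOpen: true }, action);
```

Collecting every failure, rather than stopping at the first, gives a recovery step more to work with.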

Lease

A temporary trust contract for an external target.
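
A lease might be sketched as a record that ties trust to both a time limit and the target's identity at observation time. Everything here is an assumption for illustration: the `generation` counter (bumped whenever the target is re-rendered, moved, or closed), the `ttlMs` parameter, and the function signatures.

```javascript
// Hypothetical lease: trust is bounded by time and by the target's generation.
function issueLease(target, now, ttlMs) {
  return { targetId: target.id, generation: target.generation, expiresAt: now + ttlMs };
}

function validateLease(lease, target, now) {
  if (target.id !== lease.targetId) return { valid: false, reason: "different target" };
  if (now >= lease.expiresAt) return { valid: false, reason: "expired" };
  if (target.generation !== lease.generation) return { valid: false, reason: "target changed" };
  return { valid: true };
}

const editor = { id: "notepad-editor", generation: 3 };
const lease = issueLease(editor, 0, 2000);

const early = validateLease(lease, editor, 500);         // still valid
editor.generation += 1;                                   // the UI re-rendered
const afterRerender = validateLease(lease, editor, 600);  // revoked by the change
const late = validateLease(lease, { id: "notepad-editor", generation: 3 }, 2500); // timed out
```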

Before action

Execute should be the final step, not the default step.

flowchart TD
    S["See target"] --> P["Issue lease"]
    P --> Q["Keep state as provisional"]
    Q --> R{"World changed?"}
    R -- "No" --> U["Action proposed"]
    R -- "Yes" --> T["Mark dirty / stale"]
    T --> U
    U --> V{"Lease valid?"}
    V -- "No" --> W["Refresh view"]
    V -- "Yes" --> X{"Guards pass?"}
    X -- "No" --> Y["Block or recover"]
    X -- "Yes" --> Z["Execute action"]

    classDef safe fill:#e7f8ec,stroke:#2e8b57,color:#173d29;
    classDef risk fill:#fff4d6,stroke:#b8860b,color:#5a4300;
    classDef stop fill:#fde8e8,stroke:#c0392b,color:#5c1f1f;
    class Z safe;
    class T,W risk;
    class Y stop;
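
// Pre-action check. The helper functions are placeholders for the RPG layer.
function actOn(target) {
  const lease = issueLease(target);            // temporary trust contract
  const state = rememberAsProvisional(target); // a belief, not a fact

  // ...the model thinks here; the world may change...

  if (!validateLease(lease, state)) {
    return refresh(); // target no longer trusted: re-observe
  }

  if (!guardsPass(state)) {
    return block(); // environment drifted: do not act
  }

  return execute(); // the final step, not the default step
}
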
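
The snippet above leaves its helpers undefined. To see the three outcomes (refresh, block, execute) actually occur, here is a self-contained sketch against a simulated world. The `world` object, the version counter used as a lease, and the single focus guard are simplifying assumptions, not the project's implementation.

```javascript
// Simulated world; every name here is illustrative, not a real RPG API.
const world = { focused: "Notepad", version: 1, typed: [] };

// Lease and provisional state are captured together at observation time.
function observe(targetName) {
  return {
    target: targetName,
    lease: { version: world.version },
    state: { focused: world.focused },
  };
}

function worldChanges(newFocus) {
  world.focused = newFocus;
  world.version += 1; // any change invalidates leases issued earlier
}

function act(view, keys) {
  if (view.lease.version !== world.version) {
    return { outcome: "refresh" }; // lease invalid: re-observe first
  }
  if (world.focused !== view.target) {
    return { outcome: "block" };   // guard failed: wrong window is focused
  }
  world.typed.push({ window: world.focused, keys });
  return { outcome: "execute" };   // only reached after both checks
}

const view1 = observe("Notepad");
worldChanges("Chat");               // another window comes to the front
const first = act(view1, "hello");  // lease is out of date

const view2 = observe("Notepad");
const second = act(view2, "hello"); // lease is current, but Chat is still focused

worldChanges("Notepad");
const third = act(observe("Notepad"), "hello"); // both checks pass
```

Only the third attempt types anything, and it types into the window the agent intended.
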
Beyond the desktop

This is a broader contract problem.

Browser agents

A DOM observed earlier may no longer match the live page.

Workflow or API agents

A previously fetched resource handle may no longer be valid.
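
For API agents the same contract resembles optimistic concurrency, similar in spirit to HTTP conditional requests with `ETag` and `If-Match`. The sketch below uses an in-memory `Map` as a stand-in for a remote service; `fetchResource`, `updateIfUnchanged`, and the `version` field are hypothetical.

```javascript
// In-memory stand-in for a remote API; not a real HTTP client.
const store = new Map([["order-42", { version: 7, status: "open" }]]);

function fetchResource(id) {
  const r = store.get(id);
  return r ? { id, version: r.version, status: r.status } : null;
}

// The update applies only if the caller's version still matches the live one.
function updateIfUnchanged(handle, patch) {
  const live = store.get(handle.id);
  if (!live) return { ok: false, reason: "gone" };
  if (live.version !== handle.version) return { ok: false, reason: "changed" };
  store.set(handle.id, { ...live, ...patch, version: live.version + 1 });
  return { ok: true };
}

const handle = fetchResource("order-42");                 // agent fetches, then thinks
store.set("order-42", { version: 8, status: "shipped" }); // someone else updates it

const staleWrite = updateIfUnchanged(handle, { status: "cancelled" });  // refused
const freshWrite = updateIfUnchanged(fetchResource("order-42"), { status: "archived" });
```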

Embodied agents

An object seen a moment ago may no longer be where the agent assumes it is.

Validation

The next thing that matters is evidence.

To validate this direction, the project still needs to measure:

  • unsafe action rate
  • re-observation count
  • token-heavy observation count
  • task success rate
  • recovery steps
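
As a starting point, the five measures could be computed from per-episode event logs. The log format below (event `type` values and the `unsafe`, `repeat`, and `heavy` flags) is an assumption made for this sketch, and the sample episodes are made-up inputs, not results.

```javascript
// Hypothetical event log format; event names and flags are assumptions.
function summarize(episodes) {
  const events = episodes.flatMap((ep) => ep.events);
  const count = (pred) => events.filter(pred).length;
  const ratio = (a, b) => (b === 0 ? 0 : a / b);
  const actions = count((e) => e.type === "action");
  return {
    unsafeActionRate: ratio(count((e) => e.type === "action" && e.unsafe === true), actions),
    reObservations: count((e) => e.type === "observe" && e.repeat === true),
    tokenHeavyObservations: count((e) => e.type === "observe" && e.heavy === true),
    taskSuccessRate: ratio(episodes.filter((ep) => ep.success).length, episodes.length),
    recoverySteps: count((e) => e.type === "recover"),
  };
}

// Made-up sample input to show the shape of the data.
const report = summarize([
  { success: true, events: [{ type: "observe", heavy: true }, { type: "action" }] },
  {
    success: false,
    events: [
      { type: "observe", heavy: true },
      { type: "observe", repeat: true },
      { type: "action", unsafe: true },
      { type: "recover" },
      { type: "action" },
    ],
  },
]);
```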