Reactive Perception Graph
By the time an LLM acts, the world may already be different. Reactive Perception Graph (RPG) is one way to deal with that.
flowchart LR
subgraph A["Snapshot-and-Act"]
A1["Observe screen"]
A2["Think"]
A3["Act on old assumption"]
A1 --> A2 --> A3
AX["World changed"] -.-> A3
end
subgraph B["Reactive Perception Graph"]
B1["Observe target"]
B2["Store provisional state"]
B3["Dirty / stale signals"]
B4["Validate lease"]
B5["Run guards"]
B6["Execute or block"]
B1 --> B2 --> B3 --> B4 --> B5 --> B6
end
A3 --> C["Unsafe action"]
B6 --> D["Safer action contract"]
classDef bad fill:#fde2e2,stroke:#c0392b,color:#5c1f1f;
classDef good fill:#e5f6ea,stroke:#2e8b57,color:#123d28;
class C bad;
class D good;
Seeing and touching are separated by time.
Many LLM agents implicitly follow a loop like this: observe the interface, think, then act. That sounds harmless. In a dynamic interface, it is fragile.
What can change?
- the user focuses another window
- a modal appears
- the UI re-renders
- the target moves or disappears
What breaks?
The model still acts on the assumptions formed at observation time, even though those assumptions are no longer valid at action time.
Correct intent, wrong target.
Suppose an LLM wants to type "hello" into Notepad. It observes Notepad and decides where to type. Then another window comes to the front, and the agent sends "hello" anyway.
The agent may execute the intended action correctly, but on the wrong target.
That is not mainly an intelligence failure. It is a stale-assumption failure.
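The Notepad scenario can be reproduced in a few lines. The sketch below is only an illustration: `world`, `sendKeys`, and `snapshotAndAct` are hypothetical names, and the "desktop" is an in-memory object rather than a real windowing API. It shows an action that is correct in intent but lands in whatever window holds focus at send time.

```javascript
// Hypothetical in-memory stand-in for a desktop; not a real windowing API.
const world = {
  foreground: "Notepad",
  buffers: { Notepad: "", Chat: "" },
};

// Keystrokes go to whatever is in the foreground at send time.
function sendKeys(text) {
  world.buffers[world.foreground] += text;
}

// Snapshot-and-act: observe once, then act without re-checking.
function snapshotAndAct(text, worldChangesMeanwhile) {
  const observed = world.foreground; // observation time
  worldChangesMeanwhile();           // the model is still "thinking"
  sendKeys(text);                    // action time
  return { intended: observed, actual: world.foreground };
}

const result = snapshotAndAct("hello", () => {
  world.foreground = "Chat"; // focus theft between seeing and touching
});
console.log(result, world.buffers);
```

Nothing in `snapshotAndAct` is "wrong" in isolation; the failure comes entirely from the gap between the two points in time.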
External state should be treated as provisional.
Reactive Perception Graph is a layer that treats external state as provisional and re-checks the assumptions behind action before the action fires.
Important clarification
RPG is not a screenshot cache. It is a different contract between the agent and the world.
The parts that make the contract work.
flowchart TB
L["Lens\nWhat am I watching?"]
P["Provisional state\nWhat do I currently believe?"]
G["Guard\nIs this action still safe?"]
T["Lease\nCan I still trust this target?"]
X["Action"]
L --> P
P --> G
P --> T
T --> G
G --> X
W["World changes"] -. "marks dirty" .-> P
W -. "can revoke trust" .-> T
classDef core fill:#eef4ff,stroke:#3b6db3,color:#183257;
classDef edge fill:#fff6df,stroke:#b8860b,color:#5a4300;
class L,P,G,T core;
class X edge;
Provisional state
Track not only what the agent believes, but also how trustworthy that belief still is.
Lens
A watchpoint on something the agent currently cares about.
Guard
A safety check before action when the environment may have drifted.
Lease
A temporary trust contract for an external target.
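One way to picture the four parts is as small plain objects. The sketch below is a minimal illustration, not the project's API: every function name is hypothetical, and bounding a lease with a time-to-live is an assumption made here for concreteness.

```javascript
// All names here are illustrative assumptions, not the project's API.

// Lens: names the thing the agent is watching.
function createLens(targetId) {
  return { targetId };
}

// Provisional state: a belief plus how trustworthy it still is.
function createProvisionalState(lens, value) {
  return { lens, value, status: "fresh" }; // "fresh" or "dirty"
}

// A world change does not overwrite the belief; it only marks it dirty.
function markDirty(state) {
  state.status = "dirty";
}

// Lease: temporary trust in a target, time-bounded (assumed) and revocable.
function createLease(lens, ttlMs, now) {
  return { targetId: lens.targetId, expiresAt: now + ttlMs, revoked: false };
}

function leaseIsValid(lease, now) {
  return !lease.revoked && now < lease.expiresAt;
}

// Guard: a predicate over the current belief, checked right before acting.
function freshnessGuard(state) {
  return state.status === "fresh";
}

const lens = createLens("notepad-editor");
const state = createProvisionalState(lens, { foreground: true });
const lease = createLease(lens, 500, 0); // issued at t=0, trusted for 500 ms
```

Keeping the belief and its trust level in separate fields means a world change can downgrade trust without discarding what the agent last saw.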
Execute should be the final step, not the default step.
flowchart TD
S["See target"] --> P["Issue lease"]
P --> Q["Keep state as provisional"]
Q --> R{"World changed?"}
R -- "No" --> U["Action proposed"]
R -- "Yes" --> T["Mark dirty / stale"]
T --> U
U --> V{"Lease valid?"}
V -- "No" --> W["Refresh view"]
V -- "Yes" --> X{"Guards pass?"}
X -- "No" --> Y["Block or recover"]
X -- "Yes" --> Z["Execute action"]
classDef safe fill:#e7f8ec,stroke:#2e8b57,color:#173d29;
classDef risk fill:#fff4d6,stroke:#b8860b,color:#5a4300;
classDef stop fill:#fde8e8,stroke:#c0392b,color:#5c1f1f;
class Z safe;
class T,W risk;
class Y stop;
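// Pseudocode: act on a target only after the lease and the guards both hold.
function actOn(target) {
  const lease = issueLease(target);
  const state = rememberAsProvisional(target);
  // Lease no longer trusted: re-observe instead of acting.
  if (!validateLease(lease, state)) {
    return refresh();
  }
  // Fail closed if any guard rejects the current state.
  if (!guardsPass(state)) {
    return block();
  }
  return execute();
}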
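The helper names in the snippet above are left undefined. The following sketch fills them in with hypothetical stand-ins so the three outcomes (execute, refresh, block) can be observed end to end. The in-memory `world`, the version counter used to invalidate leases, and the `attempt` wrapper are all assumptions for illustration.

```javascript
// In-memory world; every helper below is a hypothetical stand-in.
const world = { foreground: "Notepad", version: 1 };

function changeWorld(foreground) {
  world.foreground = foreground;
  world.version += 1; // any change bumps the version
}

const issueLease = (target) => ({ target, version: world.version });
const rememberAsProvisional = (target) => ({ target, foreground: world.foreground });

// The lease holds only while the world version it was issued under is current.
const validateLease = (lease, state) =>
  lease.target === state.target && lease.version === world.version;

// Guard: the target must be in the foreground right now.
const guardsPass = (state) => world.foreground === state.target;

function attempt(target, betweenSeeAndAct = () => {}) {
  const lease = issueLease(target);
  const state = rememberAsProvisional(target);
  betweenSeeAndAct(); // time passes between seeing and touching
  if (!validateLease(lease, state)) return "refresh";
  if (!guardsPass(state)) return "block";
  return "execute";
}

const calm = attempt("Notepad");                               // nothing changed
const interrupted = attempt("Notepad", () => changeWorld("Chat")); // change after the lease
const wrongWindow = attempt("Notepad");                        // Chat is still in front
console.log(calm, interrupted, wrongWindow);
```

The lease catches changes that happen after observation, while the guard catches a world that was already wrong when the lease was issued.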
This is a broader contract problem.
Browser agents
A DOM observed earlier may no longer match the live page.
Workflow or API agents
A previously fetched resource handle may no longer be valid.
Embodied agents
An object seen a moment ago may no longer be where the agent assumes it is.
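For the workflow or API case, the same contract resembles a conditional write: the handle remembers which version it observed, and the write is refused if the live resource has moved on. This is a sketch under stated assumptions; `store`, `fetchHandle`, and `updateIfUnchanged` are hypothetical, and the version number merely plays the role an ETag would play over HTTP.

```javascript
// Hypothetical in-memory resource store; the version plays the role of an ETag.
const store = new Map([["doc-1", { version: 3, body: "draft" }]]);

// Observation returns a handle that remembers the version it saw.
function fetchHandle(id) {
  const res = store.get(id);
  return res ? { id, version: res.version } : null;
}

// Conditional write: applies only if the handle still matches the live resource.
function updateIfUnchanged(handle, body) {
  const live = handle ? store.get(handle.id) : undefined;
  if (!live || live.version !== handle.version) {
    return { ok: false, reason: "stale-handle" };
  }
  store.set(handle.id, { version: live.version + 1, body });
  return { ok: true };
}

const handle = fetchHandle("doc-1");
store.set("doc-1", { version: 4, body: "edited elsewhere" }); // world changes
const outcome = updateIfUnchanged(handle, "agent edit");      // refused
const retry = updateIfUnchanged(fetchHandle("doc-1"), "agent edit"); // re-observe, then act
console.log(outcome, retry);
```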
Proving the reflex arc.
Before building the full "nervous system," we spent considerable time in a trial-and-error phase with a Minimum Viable Product (MVP). The goal was to prove the reflex arc, the immediate low-level loop that protects an action, without the overhead of a complex graph.
The MVP Scope
- Cheap Fluents: High-signal, low-cost facts (window presence, foreground status, rect stability).
- Basic Guards: "Fail-closed" predicates for identity and coordinate validity.
Intentional Omissions
We deliberately excluded "heavy" sensors such as full UIA tree traversals and continuous screenshot diffing.
This stage was crucial for finding the right balance between latency and safety. It taught us that most "accidents" could be prevented by checking a few Win32-level fluents right before the motor command fires.
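A reflex check of this kind might look like the sketch below. It assumes the cheap fluents have already been read into plain snapshot objects; the field names (`hwnd`, `exists`, `isForeground`, `rect`) and the function `reflexCheck` are hypothetical, and no real Win32 calls are made here. "Fail-closed" is interpreted as: missing data, a thrown error, or any failed predicate blocks the action.

```javascript
// Snapshot shape is an assumption, e.g.
// { hwnd: 42, exists: true, isForeground: true, rect: { x: 0, y: 0, w: 800, h: 600 } }

function sameRect(a, b) {
  return a.x === b.x && a.y === b.y && a.w === b.w && a.h === b.h;
}

// Is the click point inside the window as it is now?
function pointInRect(p, r) {
  return p.x >= r.x && p.x < r.x + r.w && p.y >= r.y && p.y < r.y + r.h;
}

// Fail-closed: anything short of a full pass blocks the action.
function reflexCheck(observed, current, clickPoint) {
  try {
    if (!observed || !current) return { ok: false, reason: "no-data" };
    if (!current.exists) return { ok: false, reason: "window-gone" };
    if (observed.hwnd !== current.hwnd) return { ok: false, reason: "identity-changed" };
    if (!current.isForeground) return { ok: false, reason: "not-foreground" };
    if (!sameRect(observed.rect, current.rect)) return { ok: false, reason: "rect-moved" };
    if (!pointInRect(clickPoint, current.rect)) return { ok: false, reason: "point-outside" };
    return { ok: true };
  } catch {
    // A malformed snapshot must never be read as permission to act.
    return { ok: false, reason: "check-error" };
  }
}

const seen = { hwnd: 42, exists: true, isForeground: true, rect: { x: 0, y: 0, w: 800, h: 600 } };
const point = { x: 100, y: 100 };
console.log(reflexCheck(seen, seen, point));
console.log(reflexCheck(seen, { ...seen, rect: { x: 50, y: 0, w: 800, h: 600 } }, point));
```

Each predicate is a comparison over a handful of fields, which is what keeps the check cheap enough to run immediately before every action.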
Why this exists.
RPG is motivated by concrete failure modes that share the same structure: assumptions valid at observation time are no longer valid at action time.
Common issues
- Focus theft: Another window pops up and takes focus.
- Modal insertion: A dialog box blocks the path.
- Window drift: The target moves slightly.
Identity risks
- Entity replacement: The process restarts.
- Delayed action: The agent waits too long to click.
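Because all five failure modes share the same structure, they can be detected by comparing an observation-time snapshot with an action-time snapshot. The sketch below is illustrative only: the snapshot fields (`targetId`, `foregroundId`, `modalOpen`, `rect`, `processStart`, `at`) and the `maxDelayMs` threshold are assumptions, not part of the project.

```javascript
// Compare what was seen with what is true now; return the failure modes found.
// Field names and the delay threshold are assumptions for illustration.
function detectDrift(seen, now, maxDelayMs) {
  const modes = [];
  if (now.foregroundId !== seen.targetId) modes.push("focus-theft");
  if (now.modalOpen) modes.push("modal-insertion");
  if (now.rect.x !== seen.rect.x || now.rect.y !== seen.rect.y) modes.push("window-drift");
  // Same target id but a different process start time means a new entity.
  if (now.processStart !== seen.processStart) modes.push("entity-replacement");
  if (now.at - seen.at > maxDelayMs) modes.push("delayed-action");
  return modes;
}

const seen = { targetId: "notepad", rect: { x: 10, y: 10 }, processStart: 1000, at: 0 };
const stillFine = { foregroundId: "notepad", modalOpen: false, rect: { x: 10, y: 10 }, processStart: 1000, at: 200 };
const drifted = { ...stillFine, foregroundId: "chat", rect: { x: 12, y: 10 }, at: 5000 };

console.log(detectDrift(seen, stillFine, 1000));
console.log(detectDrift(seen, drifted, 1000));
```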
The next thing that matters is evidence.
To validate this direction, the project still needs to measure:
- unsafe action rate
- re-observation count
- token-heavy observation count
- task success rate
- recovery steps
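These measurements could be computed from a simple event log. The sketch below is one possible shape, not a defined methodology: the event fields (`type`, `unsafe`, `repeat`, `heavy`), the task records, and the metric definitions themselves (for example, unsafe action rate as unsafe actions divided by all actions) are assumptions, and the sample log holds toy values rather than real measurements.

```javascript
// Event shapes and metric definitions are assumptions for illustration.
function summarize(events, tasks) {
  const actions = events.filter((e) => e.type === "action");
  const observations = events.filter((e) => e.type === "observe");
  const ratio = (n, d) => (d === 0 ? 0 : n / d); // avoid dividing by zero
  return {
    unsafeActionRate: ratio(actions.filter((e) => e.unsafe).length, actions.length),
    reObservationCount: observations.filter((e) => e.repeat).length,
    tokenHeavyObservationCount: observations.filter((e) => e.heavy).length,
    taskSuccessRate: ratio(tasks.filter((t) => t.succeeded).length, tasks.length),
    recoverySteps: events.filter((e) => e.type === "recovery").length,
  };
}

// Toy log, not real measurements.
const events = [
  { type: "observe", repeat: false, heavy: true },
  { type: "action", unsafe: false },
  { type: "observe", repeat: true, heavy: false },
  { type: "action", unsafe: true },
  { type: "recovery" },
  { type: "action", unsafe: false },
  { type: "action", unsafe: false },
];
const tasks = [{ succeeded: true }, { succeeded: false }];
const summary = summarize(events, tasks);
console.log(summary);
```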