Project Evolution

The v1.0 Milestone

The release of v1.0 marks a significant shift in how LLMs connect to the Windows desktop. We have condensed the tool surface, hardened the trust contract, and moved closer to meaning-first interaction.

```mermaid
graph LR
    subgraph v015["v0.15: Cluttered Surface"]
        A[65 Specialized Tools]
        B[Coordinate Guessing]
        C[Manual Perceptions]
    end

    subgraph v100["v1.0: Semantic Core"]
        D[28 Unified Dispatchers]
        E[World-Graph by Default]
        F[Auto-Perception Guards]
    end

    v015 --> v100

    classDef old fill:#fde2e2,stroke:#c0392b,color:#5c1f1f;
    classDef new fill:#e8f1ff,stroke:#3b6db3,color:#183257;
    class A,B,C old;
    class D,E,F new;
```

The shift

From implementation to intention.

In earlier versions, the agent had to reason about how the UI was implemented: which of dozens of specialized tools to call, and which screen coordinates to click. v1.0 removes this friction, so the agent can focus on its goal.

Key Evolution: Semantic Condensation

Shrinking the surface (65 → 28 tools) for better LLM UX.

We have merged dozens of specialized functions into semantic dispatchers. This is a major LLM UX improvement. Every tool definition occupies space in the model's context, so a smaller catalogue reduces both "Tool Fatigue" (too many near-identical choices) and context bloat. The result is a cleaner interface for AI agents, more robust tool selection, fewer hallucinated calls, and lower token overhead on every interaction.

Dispatcher approach

Tools such as keyboard, scroll, and browser_eval each cover a whole family of related actions that previously needed separate tools.
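The dispatcher pattern can be sketched as one tool that routes on an action argument. This is a minimal illustration, not the actual keyboard tool: the action names ("type", "press", "hotkey"), the handler functions, and the simplified schema are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical handlers standing in for what used to be separate tools.
def _type_text(text: str) -> str:
    return f"typed {text!r}"

def _press_key(key: str) -> str:
    return f"pressed {key}"

def _hotkey(keys: List[str]) -> str:
    return "hotkey " + "+".join(keys)

# The single top-level tool: the "action" argument picks the handler.
KEYBOARD_ACTIONS: Dict[str, Callable[..., str]] = {
    "type": _type_text,
    "press": _press_key,
    "hotkey": _hotkey,
}

def keyboard(action: str, **params: Any) -> str:
    """Dispatch one keyboard request to the handler named by `action`."""
    handler = KEYBOARD_ACTIONS.get(action)
    if handler is None:
        valid = ", ".join(sorted(KEYBOARD_ACTIONS))
        raise ValueError(f"unknown keyboard action {action!r}; expected one of: {valid}")
    return handler(**params)

# Simplified, illustrative schema: the model sees one tool with an enum,
# instead of three separate tool definitions.
KEYBOARD_SCHEMA = {
    "name": "keyboard",
    "parameters": {
        "type": "object",
        "properties": {"action": {"type": "string", "enum": sorted(KEYBOARD_ACTIONS)}},
        "required": ["action"],
    },
}
```

For example, `keyboard("hotkey", keys=["ctrl", "s"])` returns `"hotkey ctrl+s"`. Unknown actions fail with a message listing the valid ones, which gives the model a concrete correction instead of a silent miss.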

Reduced Fatigue

Fewer top-level choices mean fewer hallucinations and more precise tool selection.

Key Evolution: World-Graph Standardisation

Entity-based interaction by default.

Anti-Fukuwarai v2 (World-Graph) is now the primary surface. The agent no longer clicks coordinates; it acts on the affordances that semantic discovery exposes for each entity.

desktop_discover

See the desktop as a collection of interactive entities, not just pixels.
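To make the contrast with coordinates concrete, here is a minimal sketch of selecting a target by meaning. The Entity fields, the find helper, and the sample snapshot are hypothetical; they are not the actual output schema of desktop_discover.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class Entity:
    entity_id: str                     # hypothetical server-assigned identifier
    role: str                          # e.g. "button", "textbox"
    name: str                          # accessible name or visible label
    affordances: Tuple[str, ...] = ()  # actions the entity supports, e.g. ("click",)

def find(entities: Iterable[Entity], *, role: Optional[str] = None,
         name: Optional[str] = None, affordance: Optional[str] = None) -> List[Entity]:
    """Select entities by role, label, or capability instead of x/y position."""
    return [
        e for e in entities
        if (role is None or e.role == role)
        and (name is None or e.name == name)
        and (affordance is None or affordance in e.affordances)
    ]

# A made-up discovery result for a save dialog.
snapshot = [
    Entity("e1", "button", "Save", ("click",)),
    Entity("e2", "textbox", "File name", ("click", "type")),
    Entity("e3", "image", "Logo"),
]
```

`find(snapshot, affordance="type")` returns only the "File name" textbox, and that answer does not change if the dialog moves or is rendered at a different scale.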

desktop_act

Interact using short-lived, server-guarded leases.
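The source only says that leases are short-lived and server-guarded. One way such a guard could work, sketched under assumptions: each lease carries an expiry time and a fingerprint of the entity taken at discovery, and the server refuses to act if either check fails. The Lease and LeaseTable names, the 5-second TTL, and the fingerprint idea are all hypothetical.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Lease:
    token: str
    entity_id: str
    fingerprint: str   # hypothetical digest of the entity's identity at discovery time
    expires_at: float  # clock value after which the lease is void

class LeaseTable:
    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._leases: Dict[str, Lease] = {}

    def issue(self, entity_id: str, fingerprint: str) -> str:
        """Called at discovery time: hand the agent an opaque, expiring token."""
        token = uuid.uuid4().hex
        self._leases[token] = Lease(token, entity_id, fingerprint, self._clock() + self._ttl)
        return token

    def validate(self, token: str, current_fingerprint: str) -> Optional[str]:
        """Called at act time: return the entity id if the lease is usable, else None."""
        lease = self._leases.get(token)
        if lease is None:
            return None
        if self._clock() >= lease.expires_at:
            del self._leases[token]  # expired leases are dropped
            return None
        if lease.fingerprint != current_fingerprint:
            return None              # the entity changed since it was discovered
        return lease.entity_id
```

The injectable clock keeps expiry deterministic in tests. Using a monotonic clock by default means a wall-clock adjustment cannot extend or cut short a lease.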

Key Evolution: Auto-Perception

Enforcing the RPG contract.

Reactive Perception Graph (RPG) principles are now enforced automatically. The server tracks target identity and watches for modal obstructions on its own, so the agent's actions stay grounded in the current state of the desktop.
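The source does not describe how these checks are implemented. As an illustration only, a guard of this kind can be sketched as a fresh observation taken immediately before each action, compared against what the agent expected to act on. WindowState, check, run_guarded, and GuardError are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class WindowState:
    hwnd: int                          # native window handle
    process_name: str                  # owning process, e.g. "notepad.exe"
    title: str                         # may change legitimately (e.g. "*" for unsaved)
    modal_title: Optional[str] = None  # title of a blocking modal dialog, if any

class GuardError(RuntimeError):
    """Raised when an action would land on the wrong or an obstructed target."""

def check(expected: WindowState, current: WindowState) -> Optional[str]:
    """Return a reason to refuse the action, or None if it is safe to proceed."""
    # Identity is handle plus process; the title alone is not trusted.
    if (current.hwnd, current.process_name) != (expected.hwnd, expected.process_name):
        return "target identity changed"
    if current.modal_title is not None:
        return f"blocked by modal dialog {current.modal_title!r}"
    return None

def run_guarded(expected: WindowState,
                observe: Callable[[], WindowState],
                act: Callable[[], T]) -> T:
    """Observe fresh state right before acting; refuse if the guard fails."""
    reason = check(expected, observe())
    if reason is not None:
        raise GuardError(reason)
    return act()
```

Returning a reason string from check, rather than a bare boolean, lets the server tell the agent why it refused, which the agent can act on (for example, by dismissing the dialog first).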

Operational Wins

Background Capabilities and Robustness.

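v1.0 introduces background input injection via WM_CHAR messages, allowing agents to type into a window without stealing foreground focus. This makes workflows smoother and more resilient to user interference, because keystrokes are addressed to the target window rather than to whichever window currently has focus.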
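A minimal sketch of the idea, assuming text is delivered by posting one WM_CHAR message per UTF-16 code unit to a known window handle. This is not necessarily how v1.0 does it; wm_char_messages and post_text are hypothetical helpers. Whether a given application accepts posted WM_CHAR input depends on how it processes keyboard input, and the handle usually needs to be the edit control itself rather than its top-level window.

```python
import struct
import sys
from typing import List, Tuple

WM_CHAR = 0x0102  # Win32 message constant

def wm_char_messages(text: str) -> List[Tuple[int, int, int]]:
    """Build (msg, wParam, lParam) triples, one per UTF-16 code unit."""
    data = text.encode("utf-16-le")
    # Characters outside the BMP become two code units (a surrogate pair).
    units = struct.unpack(f"<{len(data) // 2}H", data)
    # lParam bits 0-15 carry the repeat count; use 1 per message.
    return [(WM_CHAR, unit, 1) for unit in units]

def post_text(hwnd: int, text: str) -> None:
    """Post WM_CHAR messages to hwnd without changing the foreground window."""
    if sys.platform != "win32":
        raise OSError("post_text requires Windows")
    import ctypes
    from ctypes import wintypes
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.PostMessageW.argtypes = [wintypes.HWND, wintypes.UINT,
                                    wintypes.WPARAM, wintypes.LPARAM]
    user32.PostMessageW.restype = wintypes.BOOL
    for msg, wparam, lparam in wm_char_messages(text):
        if not user32.PostMessageW(hwnd, msg, wparam, lparam):
            raise ctypes.WinError(ctypes.get_last_error())
```

Keeping message construction separate from posting lets the encoding logic be checked on any platform, while only post_text touches the Win32 API.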