Project Evolution

The v1.8 Milestone

v1.4 made delivery trustworthy ("did my input arrive?"). The v1.5–v1.8 series pushes the same idea two steps further: from delivery to completion ("did the command actually finish, and did it succeed?"), and into apps that have almost no clickable surface at all.

graph LR
    subgraph v120["v1.2: Response deepened"]
        B[as_of, confidence]
    end

    subgraph v130["v1.3: Memory added"]
        D[working / episodic]
        E[semantic / procedural]
    end

    subgraph v140["v1.4: Delivery verified"]
        G[Typed error codes]
        I[Delivery verification]
    end

    subgraph v1518["v1.5–v1.8: Completion & reach"]
        J[excel VBA bridge]
        K[terminal exit code]
        L[race-free verify]
        M[idle dormancy]
    end

    v120 --> v130 --> v140 --> v1518

    classDef stable fill:#e8f1ff,stroke:#3b6db3,color:#183257;
    classDef fresh fill:#d7f2ed,stroke:#1f8a70,color:#0e3b30;
    class B,D,E,G,I stable;
    class J,K,L,M fresh;

The shift

From "my input arrived" to "the work is done".

v1.4 closed the silent-failure gaps in sending: a click reached a live listener, a fill landed its value, a background keystroke entered the target. But the input arriving is not the same as the work finishing. Sending npm run build is delivery; knowing that it finished, and whether it exited 0 or 1, is completion. And some applications, like a spreadsheet, keep their real logic in cells and macros, not in buttons you can click at all. The v1.5–v1.8 series addresses both.

Key Evolution: Reach (v1.5)

The `excel` tool — run VBA, not just write formulas.

v1.5 introduces the excel tool: author and run VBA macros against a live Excel instance over COM — the headline differentiator against assistants that write formulas but cannot execute VBA.

excel({ action: "run_vba",
        code: 'Sub Demo()\n  Range("A1").Value = "Hello"\nEnd Sub' })

action='run_vba' writes a Sub into a managed Trusted Location and runs it; action='check_access_vbom' is a read-only preflight for the one-time trust setup. The bridge structurally bypasses the VBA Editor UI — no UIA tree walk, no menu navigation, no coordinate clicks. After a one-time setup (node scripts/enable-access-vbom.mjs), macro execution is a single tool call. The point is reach: operate the application through the interface it actually exposes to code.

Key Evolution: Completion (v1.8)

`terminal until:{mode:'exit'}` — real completion, real exit code.

The older completion modes infer the answer: quiet waits for output to fall silent, pattern waits for a marker you embed. Both struggle with the everyday idiom "some-task; echo DONE" matched on the pattern DONE: the marker also appears in the echoed command line, so a run could report "done" before the command produced a single byte.
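
The false positive described above can be reproduced with a few lines of TypeScript. This is only an illustrative model, not the server's code: patternSeen, the prompt text, and the screen contents are all hypothetical.

```typescript
// Hypothetical model of a pattern-based completion check (not the server's code).
function patternSeen(screenText: string, marker: string): boolean {
  return screenText.includes(marker);
}

// What the terminal shows right after the command line is typed,
// before some-task has produced any output. The prompt is made up.
const typedCommand = "some-task; echo DONE";
const screenAfterTyping = `PS C:\\repo> ${typedCommand}\n`;

// The shell echoes the command line, and that echo already contains the
// marker, so the check fires before the task has done anything.
const falsePositive = patternSeen(screenAfterTyping, "DONE"); // true
```

Any marker that is typed verbatim has this problem, which is why a different printed form is needed.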

terminal({ action: "run", windowTitle: "pwsh", input: "npm run build",
           until: { mode: "exit", shell: "powershell" } })
// → completion: { reason: "exited", exitCode: 0, elapsedMs: … }

mode:'exit' removes the guesswork: the server appends a completion marker whose printed form differs from its typed form, so it can never match the echoed command (even across multiple lines), and it returns the real process exit code. Pass shell explicitly — auto-detection cannot see a shell inside SSH or WSL and returns ExitModeShellAmbiguous rather than guess wrong. On classic console (conhost) shells, exit mode now uses a native console paste instead of the clipboard, so your copied content is left intact and multi-line commands no longer drop characters.
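
One way to get a marker whose printed form differs from its typed form is to split it into two arguments that the shell joins only when it prints. The sketch below assumes a POSIX-style shell with printf and $?; the marker format, the token, and the names wrapWithExitMarker and parseExitCode are hypothetical, and the server's actual marker (and its PowerShell variant) may look quite different.

```typescript
// Hypothetical sketch: marker format, function names, and the POSIX printf
// form are assumptions, not the server's actual implementation.
interface MarkedCommand {
  typed: string;   // text sent to the shell
  pattern: RegExp; // matches only the printed marker, never the typed one
}

function wrapWithExitMarker(command: string, token: string): MarkedCommand {
  if (!/^[A-Za-z0-9]+$/.test(token)) {
    throw new Error("token must be alphanumeric so it is safe in a RegExp");
  }
  const head = "__DONE_";
  const tail = `${token}__`;
  // Typed form: the two halves are separate quoted arguments.
  const typed = `${command}; printf '%s%s:%d\\n' '${head}' '${tail}' $?`;
  // Printed form: printf joins the halves, so only real output contains
  // head and tail back to back, followed by the exit code.
  const pattern = new RegExp(`${head}${tail}:(\\d+)`);
  return { typed, pattern };
}

function parseExitCode(screenText: string, pattern: RegExp): number | null {
  const match = pattern.exec(screenText);
  return match ? Number(match[1]) : null;
}
```

Because the typed text always has quote characters between the two halves, the pattern cannot match the echoed command even if the terminal wraps it across lines.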

Key Evolution: Acting reliably (v1.6–v1.7)

Verification that does not race, and a direct keyboard path.

Race-free visual verification. desktop_act confirms a touch by checking whether the target window repainted. When another visual tool ran on the same monitor, Windows could refuse the second concurrent capture and the signal degraded to indeterminate. All screen capture now flows through one shared owner per monitor, so the verifier and browser/visual tools share a subscription instead of racing, and the observation now populates on the ordinary desktop_discover → desktop_act flow.

A first-class keyboard executor. Text inputs that expose a value pattern advertise keyboard alongside uia, so the agent can inject text directly into RichEdit / Document controls where a name-based UIA requery would fail.
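
The "one shared owner per monitor" idea from the race-free verification paragraph can be sketched as a small registry that opens at most one capture session per monitor and fans frames out to every subscriber. All names here (CaptureRegistry, MonitorCapture, sessionsOpened) are hypothetical, and frames are published by hand instead of coming from a real capture API.

```typescript
type FrameListener = (frame: Uint8Array) => void;

// Hypothetical stand-in for a single OS capture session on one monitor.
class MonitorCapture {
  private listeners = new Set<FrameListener>();
  add(listener: FrameListener): void { this.listeners.add(listener); }
  remove(listener: FrameListener): void { this.listeners.delete(listener); }
  get size(): number { return this.listeners.size; }
  emit(frame: Uint8Array): void { this.listeners.forEach((l) => l(frame)); }
}

// Sketch of a shared owner: subscribers on the same monitor reuse one session.
class CaptureRegistry {
  private sessions = new Map<number, MonitorCapture>();
  public sessionsOpened = 0; // counts how many sessions were ever created

  subscribe(monitorId: number, listener: FrameListener): () => void {
    let session = this.sessions.get(monitorId);
    if (!session) {
      session = new MonitorCapture();
      this.sessions.set(monitorId, session);
      this.sessionsOpened++;
    }
    const owner = session;
    // Wrap the listener so each subscription has its own identity.
    const entry: FrameListener = (frame) => listener(frame);
    owner.add(entry);
    let active = true;
    return () => {
      if (!active) return;
      active = false;
      owner.remove(entry);
      // Drop the session once the last subscriber has left.
      if (owner.size === 0) this.sessions.delete(monitorId);
    };
  }

  // In a real system the capture source would call this; here it is manual.
  publish(monitorId: number, frame: Uint8Array): void {
    this.sessions.get(monitorId)?.emit(frame);
  }
}
```

With this shape, a verifier and a visual tool on the same monitor never request a second concurrent capture; they both receive frames from the one session that already exists.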

Key Evolution: Operational trust (v1.7)

Calmer to leave running.

Idle-aware dormancy and a diagnostic log (v1.7.1): when nothing is happening, background sensor work winds down to keep idle CPU (and fan noise) low, and a diagnostic event log makes sudden-death and slow-path issues observable after the fact.
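
As a rough illustration of what "winding down" can mean, the sketch below stretches a polling interval the longer the system has been idle. The doubling policy, the option names, and the numbers in the usage note are assumptions made for this example; the source does not describe how the server actually schedules its sensor work.

```typescript
// Hypothetical options; none of these names or values come from the project.
interface DormancyOptions {
  activeIntervalMs: number; // polling interval while there is recent activity
  maxIntervalMs: number;    // slowest interval allowed when dormant
  idleThresholdMs: number;  // idle time before the interval starts to grow
}

// Returns the delay before the next background poll, given the idle time.
function nextPollInterval(idleForMs: number, o: DormancyOptions): number {
  if (idleForMs < o.idleThresholdMs) return o.activeIntervalMs;
  // Double the interval for each full threshold period spent idle, up to the cap.
  const periods = Math.floor(idleForMs / o.idleThresholdMs);
  const scaled = o.activeIntervalMs * Math.pow(2, periods);
  return Math.min(scaled, o.maxIntervalMs);
}
```

With activeIntervalMs 100, maxIntervalMs 5000, and idleThresholdMs 10000, the interval stays at 100 ms during activity, grows to 800 ms after 30 seconds of idleness, and settles at the 5000 ms cap after long idle periods.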

A deliberate-dwell emergency stop (v1.7.2): the move-to-corner failsafe now requires the cursor to dwell, so a fast drive-by no longer kills the server by accident — while the deliberate gesture still stops it instantly.
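
The dwell requirement can be sketched as a small state machine over cursor samples: the stop fires only once the cursor has stayed inside the corner zone for a minimum time. The top-left corner, the pixel and millisecond thresholds, and the DwellFailsafe name are assumptions for illustration, not the server's actual values or code.

```typescript
interface CursorSample {
  x: number;
  y: number;
  t: number; // timestamp in milliseconds
}

// Hypothetical dwell gate for a top-left corner failsafe.
class DwellFailsafe {
  private enteredAt: number | null = null;
  private readonly cornerPx: number;
  private readonly dwellMs: number;

  constructor(cornerPx: number, dwellMs: number) {
    this.cornerPx = cornerPx;
    this.dwellMs = dwellMs;
  }

  // Returns true when the emergency stop should fire for this sample.
  observe(sample: CursorSample): boolean {
    const inCorner = sample.x <= this.cornerPx && sample.y <= this.cornerPx;
    if (!inCorner) {
      // Leaving the corner resets the timer, so a drive-by never accumulates.
      this.enteredAt = null;
      return false;
    }
    if (this.enteredAt === null) this.enteredAt = sample.t;
    return sample.t - this.enteredAt >= this.dwellMs;
  }
}
```

A cursor that clips the corner for a single sample resets the timer as soon as it leaves, while a cursor parked there triggers the stop on the first sample at or past the dwell time.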

Key Evolution: A nudge toward the safer path (v1.8)

A `desktop_act` hint after typing into native fields.

Type into a native UI text field with keyboard(action='type') and the response now carries an additive advisory pointing at the lease-based desktop_act flow, with a ready-to-run example. desktop_act adds lease verification, modal-blocking detection, and an attention diff that a bare keyboard(action='type') skips. The hint is suppressed for browser/web content and for surfaces where keyboard is genuinely the right tool, so it only appears when desktop_act is the better path.

Compatibility

Existing callers stay untouched.

Completion modes are opt-in on terminal(action='run'); the default behaviour is unchanged. The desktop_act advisory is additive — responses are identical when no hint applies. The excel tool is new surface, not a change to any existing tool. The new capabilities are paid for only by callers who ask for them.


Why these together

v1.4 made sending trustworthy. v1.5–v1.8 make finishing trustworthy and extend the agent's reach: a terminal command reports when it finished and what it returned; Excel is driven through VBA over COM, not by poking at the ribbon; visual verification stops racing; and idle dormancy plus a dwell-gated failsafe make the server calmer to leave running.

Release span

The work landed across v1.5.0, v1.6.0, the v1.7 series, and v1.8.0. The full sequence is in the changelog.