The v1.8 Milestone
v1.4 made delivery trustworthy — "did my input arrive?". The v1.5–v1.8 series push the same idea two steps further: from delivery to completion ("did the command actually finish, and did it succeed?"), and into apps that have almost no clickable surface at all.
graph LR
subgraph v120["v1.2: Response deepened"]
B[as_of, confidence]
end
subgraph v130["v1.3: Memory added"]
D[working / episodic]
E[semantic / procedural]
end
subgraph v140["v1.4: Delivery verified"]
G[Typed error codes]
I[Delivery verification]
end
subgraph v1518["v1.5–v1.8: Completion & reach"]
J[excel VBA bridge]
K[terminal exit code]
L[race-free verify]
M[idle dormancy]
end
v120 --> v130 --> v140 --> v1518
classDef stable fill:#e8f1ff,stroke:#3b6db3,color:#183257;
classDef fresh fill:#d7f2ed,stroke:#1f8a70,color:#0e3b30;
class B,D,E,G,I stable;
class J,K,L,M fresh;
From "my input arrived" to "the work is done".
v1.4 closed the silent-failure gaps in sending: a click reached a live listener, a fill landed its value, a background keystroke entered the target. But the input arriving is not the same as the work finishing. Sending npm run build is delivery; knowing it finished — and whether it exited 0 or 1 — is completion. And some applications, like a spreadsheet, keep their real logic in cells and macros, not in buttons you can click at all. The v1.5–v1.8 series address both.
The excel tool — run VBA, not just write formulas.
v1.5 introduces the excel tool: author and run VBA macros against a live Excel instance over COM — the headline differentiator against assistants that write formulas but cannot execute VBA.
excel({ action: "run_vba",
code: 'Sub Demo()\n Range("A1").Value = "Hello"\nEnd Sub' })
action='run_vba' writes a Sub into a managed Trusted Location and runs it; action='check_access_vbom' is a read-only preflight for the one-time trust setup. The bridge structurally bypasses the VBA Editor UI — no UIA tree walk, no menu navigation, no coordinate clicks. After a one-time setup (node scripts/enable-access-vbom.mjs), macro execution is a single tool call. The point is reach: operate the application through the interface it actually exposes to code.
terminal until:{mode:'exit'} — real completion, real exit code.
The older completion modes infer the answer: quiet waits for output to fall silent, pattern waits for a marker you embed. Both struggle with the everyday idiom some-task; echo DONE matched by DONE — the marker also appears in the echoed command line, so a run could report "done" before the command produced a single byte.
terminal({ action: "run", windowTitle: "pwsh", input: "npm run build",
until: { mode: "exit", shell: "powershell" } })
// → completion: { reason: "exited", exitCode: 0, elapsedMs: … }
mode:'exit' removes the guesswork: the server appends a completion marker whose printed form differs from its typed form, so it can never match the echoed command (even across multiple lines), and it returns the real process exit code. Pass shell explicitly — auto-detection cannot see a shell inside SSH or WSL and returns ExitModeShellAmbiguous rather than guess wrong. On classic console (conhost) shells, exit mode now uses a native console paste instead of the clipboard, so your copied content is left intact and multi-line commands no longer drop characters.
Verification that does not race, and a direct keyboard path.
- Race-free visual verification.
desktop_actconfirms a touch by checking whether the target window repainted. When another visual tool ran on the same monitor, Windows could refuse the second concurrent capture and the signal degraded to indeterminate. All screen-capture now flows through one shared owner per monitor, so the verifier and browser/visual tools share a subscription instead of racing — and the observation now populates on the ordinarydesktop_discover→desktop_actflow. - A first-class
keyboardexecutor. Text inputs that expose a value pattern advertisekeyboardalongsideuia, so the agent can inject text directly into RichEdit / Document controls where a name-based UIA requery would fail.
Calmer to leave running.
Idle-aware dormancy and a diagnostic log (v1.7.1): when nothing is happening, background sensor work winds down to keep idle CPU (and fan noise) low, and a diagnostic event log makes sudden-death and slow-path issues observable after the fact.
A deliberate-dwell emergency stop (v1.7.2): the move-to-corner failsafe now requires the cursor to dwell, so a fast drive-by no longer kills the server by accident — while the deliberate gesture still stops it instantly.
A desktop_act hint after typing into native fields.
Type into a native UI text field with keyboard(action='type') and the response now carries an additive advisory pointing at the lease-based desktop_act flow, with a ready-to-run example. desktop_act adds lease verification, modal-blocking detection, and an attention diff that a bare keyboard:type skips. The hint is suppressed for browser/web content and for surfaces where keyboard is genuinely the right tool, so it only appears when desktop_act is the better path — and it is purely additive.
Existing callers stay untouched.
Completion modes are opt-in on terminal(action='run'); the default behaviour is unchanged. The desktop_act advisory is additive — responses are identical when no hint applies. The excel tool is new surface, not a change to any existing tool. The new capabilities are paid for only by callers who ask for them.