dotdotduck

Press Cmd/Ctrl+K to do anything in your product.

Command palette + voice + long-press selection + inline AI + DOM-grounded agent in one SDK. The opposite of a chatbot widget — your product's verbs sit where users already work, not behind a sidebar.

繁體中文 →

01 · Command palette — every feature, behind one panel

Cmd+K palette open: /introduce, /theme, /language, /immersive_translate, #find-on-page, docs: search, Go to entries — all in one list

Ctrl/⌘+K opens it. Your registered commands sit alongside Ask AI in the same list — switch theme, change language, open billing, find a customer, all addressable from one place.
Prefix routing — /command, @entity, order:, #tag — gives the user one convenient entry point that solves whatever they're stuck on, no matter where they are in the product.
Layered customisation — CSS-variable theming, Skill SDK (Script / Prompt for most hosts; Action / Surface / Panel for advanced cases), or wire your existing host features straight in as palette items.
Zero built-in commands — what shows up in the palette is entirely yours to decide. The SDK ships infrastructure, you ship the vocabulary.

02 · WebAgent — operates the page, not a sidebar chatbot

agent narrating its next step in the subtitle bar with space-continue / double-tap-exit / esc-cancel hint and confirm buttons

DOM-grounded autonomous loop. Reads the visible page, picks one tool at a time, narrates each step in the subtitle bar before it runs.
Built-in action catalog — navigate, click, fill_input, ask_user_choice, border, highlight, and friends. Add your own; the LLM picks them.
Space-gated every step. Single tap accept · double-tap reject · Esc cancel. Users see what's about to happen before it happens.
Asks back when ambiguous. ask_user_choice for 2-4 options, ask_user for free text. No silent decisions, no guessing.
Bring your own keys. LLM via OpenAI, Google AI Studio, or a server-side ProxyProvider; per-role routing keeps cheap models on cleanup and the flagship on the agent loop. STT defaults to the browser's Web Speech for zero-setup; swap to Whisper or any vendor via one transcribe(audio) callback.

03 · Inline Agent — select text, AI without leaving the input

floating Edit with AI menu next to a textarea selection: Translate / Improve writing / Fix spelling & grammar / Make shorter / Make longer / Change to professional tone / Explain this

Highlight any text in any <input> / <textarea> / [contenteditable] — a floating toolbar appears below the selection. Pick an action, the result streams back in place of the selection.
Default action set out of the box — Translate, Improve writing, Fix spelling & grammar, Make shorter, Make longer, Change to professional tone, Explain this. Drop the defaults, add your own (/translate-with-glossary, /rewrite-as-email).
Two-column layout option for editor hosts that want a Format column next to an AI column. Optional keyboard shortcuts (e.g. Ctrl+Shift+R to rewrite without opening the menu).

04 · Direct manipulation — gestures you already know

Several physical entry points to send context into dddk. No new vocabulary to learn.

	A · Hold Space — voice in Focus inside an input → fills the input. Anywhere else → goes to the agent. Optional LLM cleanup pass — fillers + punctuation in one shot. STT swappable — defaults to the browser's Web Speech (no SLA, Firefox unsupported). One `VoiceConfig.transcribe` callback swaps in Whisper or any vendor.
	B · Long-press anything — Dwell Long-press any element for ~1s → frame pins around it. Next `Ctrl+K` opens the palette with that element as context. Visual elements (charts, images) ship an auto-screenshot.
	C · Drag a screenshot Click the camera in the palette → drag a rectangle on the page. Captured region attaches to your next Ask AI / agent question. Charts, dashboards, maps — show the AI exactly what you mean.
	D · `/introduce` — guided tour Declarative tours — list of `page` + `subtitle` + `action(tools)` steps. Space advances · Esc / double-Space exits. User reads at their pace. Write your onboarding / feature tour once · replay any time (palette command, proactive prompt, or programmatic).

05 · Mobile — FAB + your own buttons

mobile portrait view: dotdotduck FAB at bottom-right corner, 'Listening — release to send' indicator shown above the FAB while voice is active

Floating action button. Ships out of the box on mobile breakpoints — tap to open the palette, long-press to hold-talk into the agent.
Touch gestures. The same Space-hold / long-press / multi-choice patterns work via touch — tap → palette, long-press → voice, digit keys → option pick.
Your own button. Replace the duck FAB with any host element. Pass a selector or HTMLElement and dddk wires the open / hold-talk handlers onto it — the FAB lives wherever your design wants it (header bar, side rail, brand asset, etc.).
Responsive chrome. Subtitle bar auto-offsets above the on-screen keyboard; palette becomes full-width below 640px; tap targets respect the 44×44 touch-spec.

06 · Proactive — read the signal, ask the right question

subtitle bar showing yes/no prompt 'Your Monday order just shipped — want me to pull the tracking?' and multi-choice 'How should I handle this return?' with options

The agent subscribes to page signals — scroll depth, Dwell time, time-on-page, last interaction — and surfaces an offer in the subtitle bar when conditions match.
Yes / no resolves with Space. Single tap accepts, double-tap rejects. No popup chrome, no layout shift — the whole exchange stays in the subtitle bar.
Multi-choice with 1-9 number keys plus a trailing Other slot that always accepts free text. The user gets out without typing if your options covered it; if not, they can still answer.
Every response emits a typed intent (proactive_accepted / agent_choice with the picked value) so you measure what lands. Not big-data fishing — direct asking and recorded answers.
Customer-service plays out of the box — order just shipped → "Want me to pull the tracking?"; user lingers on the returns page → list three common actions.

07 · Intent stream — every yes / no is a signal, the dashboard writes itself

dotdotduck dashboard: 3 sessions, 2 visitors, 577 events, 118 palette opens, geography panel, top palette items table

dotdotduck dashboard — LLM streaming perf tile (avg TTFT, tok/s, duration) + by-model breakdown table + agent runs summary

Every interaction emits a typed event. Palette opens, voice transcripts, agent answers, accept / reject gestures, Dwell selections, multi-choice picks, even per-LLM-call streaming perf — all flow through one structured stream.
Event types include palette_activated · voice_captured · agent_asked / agent_answered (with latencyMs) · agent_run_started / completed / stopped · agent_pause_decision · agent_llm_call (TTFT, tokens/sec, model) · confirm_action · selection_used · skill_started / finished · agent_feedback. Add your own and they ride the same channel.
Clean behavioural data, no big-data fishing. You learn what users want from what they actually asked + answered, not from inferring through clickstreams.
Bundled dashboard route turns the stream into charts out of the box — yes-rate over time, TTFT / tok-per-sec by model, agent-run completion rate, top palette items, geography — or subscribe in code and pipe to Mixpanel / Amplitude / your own BI.

Why adopt — eight concrete plays

Most customer-service tickets are page-solvable. "How do I X" / "where do I Y" / "track my order" / "change my plan" — the answers all live on your site already; the gap is discoverability. A DOM-grounded agent that operates the page closes that gap. Deflect the easy 70% before they reach a human queue.
Proactive offers convert. Watching scroll · Dwell · time-on-page · last interaction lets the agent ask "want me to pull the tracking?" / "want a recommendation based on what you're looking at?" before the user thinks to ask. Subtitle-bar yes/no resolves in one keystroke — friction is the lowest physically possible. Same surface for cross-sell and upsell plays.
The palette is a UI surface, not just a text list. Each row's detail pane (and PanelSkills inside the palette) can render any Pieces tree — charts, tables, forms, mini-dashboards. That makes the palette a real productivity surface, not just a launcher:
- Finance — AAPL in the palette pulls a live price card + sparkline alongside the row.
- Customer service — type a question; the palette shows the matching FAQ entry with formatted answer inline, not a link to click.
- Tool-type SaaS — pack utilities (regex tester, JSON formatter, unit converter, internal lookup) straight into the palette so users never tab out. Same Ctrl+K, different verbs per product.
Long-press beats "screenshot + describe". With Dwell, the user holds an element, the agent gets selector + auto-screenshot in one gesture — chart, dashboard panel, table row, whatever. Users stop interrupting themselves to take a screenshot, paste it into chat, and write a paragraph explaining what they meant. Intent flows straight from finger to LLM.
Break the language wall with one palette command. Built-in immersive translate renders every paragraph of the current page bilingually side by side — one keystroke turns your English-only docs / knowledge base / product copy into a Chinese / Japanese / Korean / Spanish-readable surface. Batched into a handful of LLM calls per page (a 200-paragraph article costs ~7 calls). For cross-border SaaS, content platforms, or any product serving multiple regions, that's one fewer translation-engineering project on the roadmap.
One SDK instead of stitching six vendors. Palette + agent + inline AI + voice + Dwell + proactive + analytics + immersive translate ship as one install. The conventional alternative is Algolia for search, Intercom for chat, Mixpanel for analytics, Whisper for voice, plus the brittle glue code between them. dddk is one dependency, one theme system, one intent stream.
Yes / no / multi-choice = free RL labels. Every Space-accept and double-Space-reject is a clean, intentional signal — what the user actually wanted vs didn't, said by the user, recorded with the original prompt. No more inferring from clickstream noise. The training set for whatever you fine-tune or evaluate next is already collected.
Voice doesn't stop at the browser. The same Voice + utility LLM shape powers IoT panels, kiosk terminals, service machines, and accessibility-first surfaces for elderly users or anyone who'd rather not type. One mental model across every device that has a microphone.

Status — early stage, read before evaluating

dotdotduck is in active development. It works, but expect rough edges. A few things up front:

Clone the repo to evaluate properly. The bundled docs are useful as a map, but the source is the source of truth. git clone https://github.com/PerhapxinLab/dotdotduck into your project directory and read the code alongside the online docs — that's the recommended way to understand what's actually implemented.
The docs are AI-drafted. They're written and maintained with Claude Code. They stay close to the code by convention, but if something looks wrong, grep the repo before assuming the docs are right.
Found a bug or unclear behaviour? Open an issue at github.com/PerhapxinLab/dotdotduck/issues — one-liners help shape the roadmap.

What the live demo runs (not bundled with the package)

dddk.perhapxin.com doubles as dotdotduck's official landing page AND as the real-world test bed for the package — every release ships first to this site and gets exercised end-to-end before being tagged. The standing challenge: serve the demo well using the smallest viable model at each role, so the same recipe holds up when other teams adopt dddk on a cost budget. Expect the model picks below to keep shifting as smaller checkpoints catch up.

Current stack:

4-axis LLM router (webagent / vision / utility / plan) — host configures one model per role; the bundled demo runs OpenAI gpt-5.4-nano for the main agent loop and planner, gpt-5.4-mini for InlineAgent + voice cleanup.
Speech-to-text → the browser's Web Speech API (the SDK default; fine for demo, no SLA — production hosts wire transcribe with Whisper / Deepgram / etc.)

None of this is baked into @perhapxin/dddk. The package itself ships LLM provider adapters (OpenAI / Google / proxy, plus any OpenAI-compatible vendor via baseURL — e.g. DeepSeek, Qwen, OpenRouter) and a transcribe(audio) extension point. Bring your own keys, models, and ASR vendor — the SDK doesn't lock you in.

Documentation

What's new in v0.1.3 → release notes · migration guide
Full docs → dddk.perhapxin.com/docs
Agent (DOM-grounded loop + InlineAgent + sitemap + Memory) → /dddk/agent
LLM providers + router + adapter registry → /dddk/llm
Skills system + evals → /dddk/skills
Modules (voice / Dwell / inline / immersive translate / proactive / analytics) → /dddk/modules
Toolbox (search + recommend) → /dddk/toolbox
Theming → /dddk/theming

Install

pnpm add @perhapxin/dddk
# or: npm i @perhapxin/dddk

import { DotDotDuck, OpenAIProvider } from '@perhapxin/dddk';
import '@perhapxin/dddk/styles.css';

const dddk = new DotDotDuck({
  llm: new OpenAIProvider({
    apiKey: import.meta.env.VITE_OPENAI_KEY,
    model: 'gpt-5.4-mini',
  }),
  siteName: 'YourSaaS',
  skills: [
    {
      id: 'introduce',
      type: 'script',
      name: 'Tour the app',
      steps: [
        { subtitle: 'Welcome!', action: (t) => t.spotlight('.hero') },
        { subtitle: 'Here is pricing.', action: (t) => t.highlight('.pricing'), waitForUser: true },
      ],
    },
  ],
});

dddk.mount();

Press Ctrl/⌘+K, type /introduce, watch it run. The full quickstart guide covers React / Vue / Svelte / Solid wiring.

Theming

Everything visual reads from CSS custom properties — --dddk-bg, --dddk-accent, --dddk-radius, --dddk-font, and friends. Override at :root or scope inside any wrapper.

:root {
  --dddk-accent: #6366f1;       /* your brand colour */
  --dddk-radius: 10px;
  --dddk-font: 'Inter', system-ui, sans-serif;
}

Dark mode is automatic: [data-theme="dark"] anywhere up the tree, OR @media (prefers-color-scheme: dark) — whichever fires first. Custom modes (sepia, high-contrast, brand-specific) work by overriding the same variables under a new selector.

License

AGPL-3.0-or-later — free	Commercial — paid
✓ Open-source projects ✓ Internal tools ✓ Personal projects ✓ Anything AGPL-compatible Use it, modify it, ship it — publish your modifications under AGPL.	✓ Closed-source products ✓ Commercial SaaS ✓ Anything that can't satisfy AGPL's network-copyleft clause See LICENSE for the full text or reach the maintainer through the repo.

Built by Perhapxin Team