docx-cli

agent
Security Audit
Fail
Health Warn
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 7 GitHub stars
Code Fail
  • rm -rf — Recursive force deletion command in .claude/skills/weak-agent-test/scripts/stage-competitor.sh
Permissions Pass
  • Permissions — No dangerous permissions requested

No AI report is available for this listing yet.

SUMMARY

CLI for AI agents (Claude, Codex) to read, edit, and comment on .docx files with full format fidelity.

README.md

docx-cli

Watch: Claude filling out and redlining an NDA

A .docx CLI built for AI agents. Leave comments, suggest redlines, and edit Word documents without breaking the formatting or losing content — a human accepts or rejects in Word afterward.

  • Hand a .docx to Claude or Codex and get back a redlined copy with comments — open it in Word, accept or reject as usual.
  • Agents address text by stable locators with character offsets (p3:5-20); humans see normal Word formatting on disk.
  • Custom styles, theme colors, embedded objects — all of it survives. The CLI mutates XML in place rather than re-emitting from a lossy model.

Install

Standalone binary (no Bun required):

curl -fsSL https://raw.githubusercontent.com/kklimuk/docx-cli/main/install.sh | sh

Honors PREFIX (default $HOME/.local/bin) and VERSION (default latest):

PREFIX=/usr/local sh -c "$(curl -fsSL https://raw.githubusercontent.com/kklimuk/docx-cli/main/install.sh)"
VERSION=v0.2.0 sh -c "$(curl -fsSL https://raw.githubusercontent.com/kklimuk/docx-cli/main/install.sh)"

Pre-built binaries are published for linux/x64, linux/arm64, darwin/x64, darwin/arm64, windows/x64.

npm (requires Bun >= 1.3):

bun add -g bun-docx
# or
bunx bun-docx read doc.docx

Quick example: filling out an NDA

The repo includes a Common Paper Mutual NDA template at tests/fixtures/mnda.docx. Below are the primitives an agent would compose to fill in the cover page and leave redline edits — the same flow shown in the video above. Every command was verified end-to-end against the fixture:

# Make a copy first — there's no undo (git is the history; the CLI overwrites in place)
cp tests/fixtures/mnda.docx mnda-filled.docx

# Read the cover-page table so the agent knows what placeholders exist
docx read mnda-filled.docx --from t1 --to t1

# Fill the yellow-highlighted bracketed placeholders
docx replace mnda-filled.docx "Fill in: today's date" "May 6, 2026"
docx replace mnda-filled.docx "fill in state and/or county" "California"
docx replace mnda-filled.docx "fill in state" "California"
docx replace mnda-filled.docx "Fill in, if any." "None."

# Verify nothing's left to fill (bare locator lines, one per match; nothing → exit 0)
docx find mnda-filled.docx '\[(Fill|fill)[^]]*\]' --regex --all

# Flip on tracked changes for the redline pass
docx track-changes mnda-filled.docx on

# Tighten "having a reasonable need to know" in the Use & Protection clause
docx replace mnda-filled.docx \
    "having a reasonable need to know" \
    "with a documented need to know"

# Leave a comment for the human reviewer — addresses an existing span with --at
docx comments add mnda-filled.docx --at p7:0-30 \
    --text "Should we narrow 'representatives' to a named list?"

Open mnda-filled.docx in Word: tracked changes and comments appear in the review pane, ready to accept, reject, or reply. Or run docx track-changes accept mnda-filled.docx --all to bake them in from the CLI.

Use as an agent skill

docx-cli ships as an Agent Skill — one SKILL.md that works across Claude Code, Codex, Pi, and the other harnesses that read the open skill format. The skill teaches the locator model and the redline / comment / fill workflows, then defers to docx <command> --help at runtime, so it can't go stale.

Why a skill? docx-cli is built for the weakest, cheapest agents. In our weak-agent benchmark — 6 real document tasks (fill a contract, redline, comment, restyle, author from scratch), graded against Word renders, 3 runs each — Haiku driving docx-cli completed 4.3/6 tasks versus 0.7/6 for the default Claude skill, at roughly 2.5× fewer tokens; with Sonnet it's 6/6 vs 4/6, with roughly 2x fewer tokens. And every docx-cli output opened cleanly in Word on the first try — it never emits a file the renderer rejects. (Methodology and harness: .claude/skills/weak-agent-test.)

Install

Claude Code — one-line plugin install:

/plugin marketplace add kklimuk/docx-cli
/plugin install docx-cli@docx-cli

Codex — add the marketplace (the plugin's skills auto-discover):

codex plugin marketplace add kklimuk/docx-cli

Pi — one-command install (the pi manifest in package.json pulls in the skill), then invoke /skill:docx-cli:

pi install git:github.com/kklimuk/docx-cli      # global; add -l for a project (team-shared) install
# manual alternative: pi --skill /path/to/docx-cli/skills/docx-cli

Any harness / manual — drop skills/docx-cli/ into your agent's skills directory (e.g. ~/.claude/skills/ or the cross-tool ~/.agents/skills/). On first activation the skill's scripts/bootstrap.sh installs the docx binary (and self-updates a stale one).

Keeping the skill current

The binary is the source of truth: docx info skill prints the canonical SKILL.md for the installed version, and a CI test fails if the committed copy drifts. Regenerate after any change with:

docx info skill > skills/docx-cli/SKILL.md

docx <command> --help is the authoritative contract

Agents: run docx <command> --help before composing a call. Every command's --help is the source of truth for its flags, locator forms, and exact output shape — this README is a map, not the territory. Two more must-reads:

  • docx info locators — the canonical locator grammar (--json for a machine-readable form). The top-level docx --help says it outright: "It is highly recommended to agents to run docx info locators to understand their capabilities."
  • docx info schema — the AST type definitions (--ts for TypeScript source) that read --ast emits.

Command reference

docx <verb> and docx <noun> <verb>. Every command has --help. Two groups: read/query commands print data to stdout; mutate commands change the file (and accept --dry-run, -o/--output PATH, -v/--verbose).

Read & query (print to stdout, never write the file)

docx read    FILE [--from LOC] [--to LOC] [--accepted | --baseline | --current] [--comments]
docx read    FILE --ast                  # JSON-AST instead of Markdown (disables the markdown-only flags)
docx find    FILE QUERY [--regex] [--ignore-case] [--all] [--nth N] [--current | --baseline] [--exact] [--json]
docx find    FILE (--highlight COLOR|any | --color HEX | --bold | --italic | --underline) [--all] [--json]   # find by formatting (no QUERY)
docx wc      FILE [LOCATOR] [--accepted | --baseline | --current] [--json]
docx outline FILE [--style-prefix S] [--json]
docx styles  FILE [--used] [--at STYLEID] [--json]   # the style catalog (not in the body) — what --style NAMEs exist
docx styles  --catalog [--json]                      # built-in styles you can apply on demand (Title, Heading1–9, Quote, …), no FILE needed
docx styles  set    FILE --at STYLEID [--bold --color HEX --size PT --font NAME --space-before PT --indent-left IN …]   # restyle every paragraph/run that uses the style
docx styles  create FILE STYLEID [--type paragraph|character] [--name "…"] [--based-on STYLEID] [--next STYLEID] [formatting]   # define a new custom style
docx render  FILE [--out DIR] [--engine word|libreoffice|auto] [--dpi N] [--pages 1-N] [--format png|jpg]

docx comments      list FILE [--include-resolved] [--thread cN]
docx footnotes     list FILE
docx endnotes      list FILE
docx headers       list FILE
docx footers       list FILE
docx images        list FILE
docx hyperlinks    list FILE
docx track-changes list FILE

docx info schema   [--ts]
docx info locators [--json]

docx read surfaces structural facts the Markdown body can't show as HTML-comment
annotations (<!-- docx:TYPE … -->). These are read-time visibility hints — the
agent can SEE the structure, but the importer drops them (the structure survives
normal edits in place, read --ast is the lossless view, and docx sections /
docx tables … manage it). They're emitted deviation-only
(only when a value differs from the document default, so a plain document stays
clean):

  • Per-paragraph style/spacing/indent — the most common annotation — rides a
    <!-- docx:p pN style="Caption" align="center" space-after="6pt" line-spacing="1" indent-left="0.25in" -->
    note, emitted deviation-only (only the attrs that differ from the style/document
    default). Each attribute maps to the matching edit/insert flag (--style,
    --alignment, --space-before/--space-after, --line-spacing,
    --indent-left/--indent-right/--first-line/--hanging), so an agent reads a
    value and re-applies it. The paragraph's locator rides this note as its leading pN
    token, so an annotated paragraph does NOT also get a bare <!-- pN --> (only
    undeviating paragraphs get the bare locator). Full properties are in read --ast.
  • Section breaks render as <!-- docx:section sN cols="2" type="continuous" -->
    on their own line — never a bare --- (that's a thematic break, and emitting it
    for a section silently turned layout into border paragraphs). A hand-authored
    --- now unambiguously means a thematic break.
  • Page geometry rides a leading <!-- docx:page sN orientation="landscape" size="…in" margins="…in" text-width="…in" --> note when the page deviates from
    US-Letter-portrait-1″ — text-width is the usable column width, and the leading
    sN is the section to re-apply against. A varies="by-section" attribute is added
    when a later section's page setup differs from the leading one — and in that case
    the note fires even if page 1 is plain default Letter-portrait-1″ (it then shows
    just text-width + varies="by-section"), warning that the geometry shown describes
    only the leading section; use read --ast for every section's exact geometry.
    Exact twips are in read --ast (on each
    section break: pageWidth/pageHeight/pageOrientation/margin*). Set it for the
    WHOLE document with docx sections --orientation/--size/--margins (no --at
    every section gets it, so a multi-section doc doesn't leave the trailing section
    behind), one section with docx sections --at sN …, or at create time; under
    track-changes it records as one <w:sectPrChange> per section (accept/reject in Word).
    Changing margins/size also auto-realigns right-edge tab columns (résumé
    dates/locations): a LEFT tab calibrated to the old margins would overflow and wrap
    at the new width, so page setup converts each to a RIGHT tab flush at the new
    margin and reports how many it fixed — no second --tabs right step needed.
  • Tables carry a leading <!-- docx:table t0 widths="1,2,3in" borders="double" -->
    when columns are uneven or borders deviate from the default, plus a per-cell
    <!-- docx:cell t0:r0c0 gridSpan="2" vMerge="continue" shading="FFE699" -->
    note on merged/shaded cells — so structure invisible in GFM is visible
    (Table.borders / TableCell.shading in read --ast).
  • Images trail a <!-- docx:image img0 size="6.2x4.1in" float="yes" wrap="square" align="center" overflow="yes" -->
    note: size always (the ![](hash) alone doesn't say "6in wide"), and
    float/wrap/align/overflow only when they deviate (an inline, in-bounds
    image shows just its size). overflow flags an image wider than the usable text
    column (ImageRun.floating/wrap/align + EMU extents in read --ast).
  • Headers / footers surface as <!-- docx:header text="Quarterly Report" -->
    / <!-- docx:footer text="Page {page} of {pages}" --> notes (the type attr
    appears only for first/even). Fields read as tokens — {page} {pages}
    {date} {time} {styleref:NAME} {filename} {title} {author} ({time}
    read-only). A marginal that's
    the same on every section rides the top; one that differs by section renders at
    that section's start (alongside the docx:section note, which also renders at
    the section's start with applies-to="… (below)"), so each hint reads right before
    the content it governs. The text lives in the comment attribute
    so the importer drops it (it can't re-inject into the body); full entries are in
    read --ast under headers/footers (Marginal[]). Set with docx headers/docx footers.
  • Track-changes state rides a head <!-- docx:track-changes on --> line when the
    document's tracking toggle is enabled (deviation-only — off emits nothing), so an
    agent sees that subsequent edits will be redlined without inspecting settings.xml.
    Toggle it with docx track-changes FILE on|off; the three tracked-change read views
    (--accepted/--current/--baseline) are covered under the review loop below.

Mutate (change FILE in place; --dry-run, -v everywhere; -o PATH on every mutator except create, whose positional FILE is already the output)

docx create FILE [--title T] [--author A] [--text "..." | --text-file PATH | --from PATH.md | --from -] [--orientation O] [--size SIZE] [--margins M] [--header "..."] [--footer "..." | --page-numbers] [--force]
docx insert FILE (--after | --before) LOCATOR <content>   # LOCATOR = pN | tN | sN | tN:rRcC:pK
docx insert FILE (--at-start | --at-end) <content>        # no locator — prepend / append to the document
docx edit   FILE --at LOCATOR <content>                   # LOCATOR = pN | pN:S-E | pN-pM | sN | eqN | tN:rRcC:pK[:S-E]
docx delete FILE --at LOCATOR                             # LOCATOR = pN | pN-pM | tN | sN | tN:rRcC:pK (cell paragraph)
docx sections FILE [--at LOCATOR] [--columns N] [--type T] [--orientation O] [--size SIZE] [--margins M]   # LOCATOR = pN-pM | pN (wrap a range in N columns) | sN (edit one section's columns/type/page geometry). Multi-column layout AND page setup live HERE. PAGE GEOMETRY (margins/orientation/size) with NO --at applies to the WHOLE document (every section); --at sN targets one. Columns/type need --at.
docx styles set-default-font FILE "Font Name" [--size N] [--all]   # document-wide font: sets styles.xml docDefaults + theme major/minor; --all also repoints styles/runs that pin their own font
docx replace FILE PATTERN REPLACEMENT [--at pN] [--regex] [--ignore-case] [--all] [--limit N] [--current | --baseline] [--exact] [--track] [--dry-run]
#   Keeps the run's formatting (bold/font) and any tabs — the no-rebuild way to fill a
#   formatted/tabbed template line (e.g. "**Org Name**⇥Date"); don't hand-build --runs to refill it.
#   --at pN (or a cell paragraph tT:rRcC:pN) CONFINES the replace to one paragraph — use it when the
#   SAME placeholder repeats across the doc (a résumé's "City, State" in every entry) and you want THE
#   one in a specific paragraph, instead of find → edit --at pN:S-E span surgery. Batch entries take "at" too.

# Batch — apply many changes from ONE read (no re-reading between edits). Keys
# on each JSONL line mirror the command's flags; all locators address the doc as
# read. insert/edit also accept --batch - to read JSONL from stdin.
docx edit    FILE --batch fills.jsonl       # { at, <one of: text|clear|markdown|runs|code|task>, style?, … }
docx insert  FILE --batch additions.jsonl   # { after|before, <content>, style?, color?, … }
docx replace FILE --batch script.jsonl      # { pattern, replacement, at?, regex?, all?, limit?, … } applied in order ("at" scopes that entry to one paragraph)
docx delete  FILE --batch drop.jsonl        # { at } per line — whole blocks (pN/tN/cell), resolved live-first

# All four of insert/edit/delete/replace accept --track to record that one
# invocation as a tracked change even when the doc's track-changes toggle is off.
#
# insert/edit content selectors (run "docx insert --help" / "docx edit --help" for the full list):
#   --text "..." [--style NAME] [--alignment A] [--color HEX] [--bold] [--italic] [--url URL]
#       (a newline in --text becomes a line break <w:br/>, a tab becomes <w:tab/> — verse/addresses stay line-per-line)
#   paragraph spacing/indent (insert + edit, alone or with content, per-entry in --batch, across a range):
#       --space-before PT --space-after PT --line-spacing N(=1|1.5|2|single|double, or 15pt / "15pt atLeast")
#       --indent-left IN --indent-right IN --first-line IN --hanging IN  (points / inches; first-line ⊥ hanging;
#       left/right/first-line accept a negative value to outdent into the margin; hanging stays non-negative)
#       Under track-changes these record a tracked <w:pPrChange> (accept/reject in Word) — even when they ride
#       along with --text; read surfaces them as a deviation-only <!-- docx:p … space-after="6pt" --> hint.
#   edit --tabs right   fix a line whose tabbed-over content WRAPS (read flags it as `docx:layout … warn`,
#       and prints ONE consolidated fix-all summary at the top): swaps the fragile LEFT tab for a RIGHT tab
#       flush at the margin so a long value (e.g. a city) never wraps. Rides along with --text, works
#       per-entry in --batch, and on a RANGE (edit --at pN-pM --tabs right) cures every tab line at once.
#   edit --text ""      REMOVES the line (same as `delete`; a table cell's last paragraph is blanked, not
#       deleted, so the cell stays valid). In --batch, `{"at":"pN","text":""}` or `{"at":"pN","delete":true}`
#       removes a line — so a form-fill is ONE sweep: fill the cells with values, drop the leftover
#       placeholder lines. Use `--runs '[]'` to blank a paragraph but keep an empty spacer. (Empty
#       `--text` can't ride along with --clear/run-formatting/--style/--alignment/--tabs — those exit
#       with a USAGE error; use `--runs '[]'` to keep a formatted empty spacer instead.) A SPAN's
#       `--text ""` (pN:S-E) still deletes just those characters.
#   --runs '[{"type":"text","text":"X","bold":true}]'
#   --text-file PATH                               # (insert/create) LITERAL multi-paragraph text, NOT parsed — every char verbatim,
#       each newline = a new paragraph. For prose GFM would corrupt: "3. note" stays "3.", *x* / [t](u) / bare URLs / {++x++} untouched.
#   --markdown "..." | --markdown-file PATH        # GFM + math + CriticMarkup + inline HTML formatting → blocks
#   --code "..." | --code-file PATH [--language LANG]
#   --equation "x^2 + y^2" [--display]   (insert; edit also accepts --inline)
#   --clear bold,italic,highlight,color,size,font,…|all   (edit; strip run formatting, keep text)
#   --bold --italic --underline --strike --color HEX --highlight NAME --shade HEX --font NAME --size PT
#       --caps --smallcaps --superscript --subscript   (edit; SET run formatting on EXISTING text —
#       the inverse of --clear. Alone they format a span/paragraph/range in place; with --text they
#       fill AND format. Like --clear, applied directly — not recorded as a tracked change.)
#       NOTE: in a single no-content call (or one --batch entry) these run-format SET flags and the
#       paragraph properties (--style/--alignment/--space-*/--line-spacing/--indent-*/--first-line/
#       --hanging/--tabs) can't ride together — use separate calls/entries, or add --text to set both.
#   --task checked|unchecked | --list bullet|ordered [--list-level N]   (insert)
#   --task checked|unchecked                                            (edit, flip in place)
#   --table --rows N --cols N [--widths "A,B,C"] [--table-width V] [--borders S] [--layout L]   (insert)
#   --image SRC [--alt T] [--width IN] [--height IN] [--caption "Figure 1: …"]   (insert; SRC = path, data: URI, or http(s) URL; --caption adds a Word "Caption"-styled line under the figure)
#   --page-break | --column-break | --section [--columns N] [--type T]   (insert)

docx comments add     FILE --at LOCATOR --text "..." [--author NAME] [--current | --baseline]
docx comments add     FILE --anchor "phrase" --text "..." [--occurrence N]
docx comments add     FILE --batch reviews.jsonl                    # JSONL: { at | anchor (+occurrence), text, author? }
docx comments reply   FILE --at cN --text "..."
docx comments resolve FILE --at cN [--at cM ...] [--unset] | --batch resolutions.jsonl
docx comments delete  FILE --at cN [--at cM ...]          | --batch removals.jsonl

docx footnotes add    FILE --at pN[:offset] (--text "..." | --runs JSON | --markdown TEXT)
docx footnotes edit   FILE --at fnN (--text "..." | --runs JSON | --markdown TEXT)
docx footnotes delete FILE --at fnN
docx endnotes  add    FILE --at pN[:offset] (--text "..." | --runs JSON | --markdown TEXT)
docx endnotes  edit   FILE --at enN (--text "..." | --runs JSON | --markdown TEXT)
docx endnotes  delete FILE --at enN

# Headers & footers (one shared impl — "marginals"). Default placement is every
# page, all sections (--at sN targets one). Content: ONE primary source, except
# --text + one field = two-zone (text left, field right at a content-edge tab).
docx headers set   FILE [--at sN] [--type default|first|even | --first-page | --even | --odd] \
                        [--text "..."] [--align left|center|right] \
                        [--page-number [--of-pages] | --date [--date-format FMT] | --style-ref STYLE | --field filename|title|author] \
                        [--track] [--author NAME]
docx headers clear FILE [--at sN] [--type T | --first-page | --even | --odd]
docx footers set   FILE …   # identical flags, kind=footer (e.g. --page-number --of-pages → "Page X of Y")
docx footers clear FILE …

docx images extract FILE --to DIR [--at imgN]            # --to = output directory; --at picks one image
docx images replace FILE --at imgN --with ./new.png
docx images delete  FILE --at imgN

docx hyperlinks add     FILE --at pN:S-E --url URL
docx hyperlinks replace FILE --at linkN --with URL
docx hyperlinks delete  FILE --at linkN

docx tables insert-row    FILE --at tN [--position INDEX] [--cells "a,b,c"]
docx tables delete-row    FILE --at tN:rR
docx tables insert-column FILE --at tN [--position INDEX] [--width TWIPS]
docx tables delete-column FILE --at tN:cC
docx tables set-widths    FILE --at tN --widths "25%,25%,50%" | "1440,..." | auto
docx tables merge         FILE --at tN:rR1cC1-rR2cC2
docx tables unmerge       FILE --at tN:rRcC
docx tables borders       FILE --at tN [--style single|double|none] [--size N] [--color HEX]
docx tables format        FILE --at LOCATOR [--shade HEX|NAME] [--valign top|center|bottom]
                          [--halign left|center|right|justify] [--cell-borders SIDES]
                          [--align left|center|right] [--style ID] [--row-height M] [--repeat-header]

docx lists set            FILE --at pN [--start N] [--format FMT] [--restart] [--continue]
                          # renumber a NUMBERED list (--at = any item; applies to the whole list).
                          # FMT = decimal | lower-alpha | upper-alpha | lower-roman | upper-roman.
                          # --restart splits a fresh list off here; --continue picks up the previous
                          # list's numbering instead of restarting. Untracked (Word records no revision).

docx track-changes on|off FILE
docx track-changes list   FILE [--json]
docx track-changes accept FILE (--at tcN [--at tcM ...] | --at revN | --all)
docx track-changes reject FILE (--at tcN [--at tcM ...] | --at revN | --all)
docx track-changes apply  FILE [--accept H ...] [--reject H ...]
# `list` defaults to a text table, one LOGICAL change per line (revN collapses a del+ins
# pair onto one line); `--json` for the raw array. A del+ins REPLACE pair shares a
# "group": "revN"; `--at revN` accepts/rejects both halves in one call.
# To FINALIZE a review (accept some, reject the rest), use `apply` — it takes both decision
# lists in ONE call, resolved against the original ids, so nothing renumbers mid-operation
# and the file is never left half-finalized. Doing it as separate accept then reject calls
# renumbers the ids between them. After a subset accept/reject/apply, the confirmation
# re-lists what remains with its renumbered handles.

One rule to memorize: addressing an existing thing is always --at.
comments reply/resolve/delete, footnotes/endnotes edit/delete, images extract/replace/delete, hyperlinks replace/delete, tables *, track-changes accept/reject, edit, and delete all take --at LOCATOR. The exceptions are positional or directional by nature: insert uses --after/--before LOCATOR (or --at-start/--at-end for the document boundaries, no locator); read slices with --from/--to LOCATOR; wc takes a positional [LOCATOR]; find/replace take a positional QUERY/PATTERN (and replace accepts an optional --at pN to confine the substitution to one paragraph). images extract --to DIR is an output directory, not a locator.

Output contract

The CLI is built for non-interactive agents. Exit code is the success signal, output is data:

Exit Meaning Error codes
0 success
2 usage / bad locator USAGE, INVALID_LOCATOR
3 addressed thing not found FILE_NOT_FOUND, PART_NOT_FOUND, BLOCK_NOT_FOUND, COMMENT_NOT_FOUND, IMAGE_NOT_FOUND, HYPERLINK_NOT_FOUND, TRACKED_CHANGE_NOT_FOUND, MATCH_NOT_FOUND
1 general failure NOT_A_ZIP, TRACKED_CHANGE_CONFLICT, TABLE_STRUCTURE, IMAGE_SOURCE, RENDER_ENGINE, RENDER_FAILED, UNHANDLED

Errors print {code, error, hint?} JSON to stdout with a nonzero exit — note there is no ok field; the exit code plus code are the unambiguous signal.

The ok field appears in exactly one place: the --verbose success ack ({ok:true, operation, path, …}). Without -v, success output is shaped for the next command:

Command class Default stdout on success --verbose
Mutator that mints a new handlecomments addcN, comments replycN, footnotes/endnotes addfnN/enN, hyperlinks addlinkN, insert→the new pN the bare locator(s), one per line (a multi-block --markdown insert prints several) full {ok:true,…} ack
Mutator with no new handleedit, delete, replace, create, comments resolve/delete, images replace/delete, hyperlinks replace/delete, footnotes/endnotes edit/delete, headers/footers set/clear, tables *, track-changes accept/reject & toggle one-line confirmation<operation> <target> (e.g. edit t1:r0c1:p0, edit 7 changes, replace 0 occurrences replaced) (exit 0) full {ok:true,…} ack
find matched span locators, one per line (no matches → nothing, exit 0) --json{ totalMatches, query, view, matches:[…], normalizedQuery? }
wc the bare count (whole-doc adds a tab-separated sections column, like wc) --json{ words, scope, view, sections? }
outline indented LOCATOR⇥TEXT tree (two spaces per level) --json → nested [{ id, locator, level, style, text, children }]
read GFM Markdown; each paragraph carries its pN locator once — a trailing bare <!-- pN --> on plain paragraphs, or the leading token of its <!-- docx:p pN … --> note when one is emitted --ast → the JSON AST body (docx info schema)
render image paths, one per line --verbose{ok, operation, path, engine, output, pages}
* list (all eight list verbs) a bare JSON array; each item's id is its --at handle

--dry-run always prints a preview object (no ok) and writes nothing; it wins over -o/--output.

Discovering ids

Locators come in two flavors. Positional block ids (pN, tN, sN) are derived from document order and shift after structural edits — re-read between non-trivial mutations. Entity ids (cN, imgN, linkN, fnN, enN, tcN, eqN) are surfaced by a list verb (or read --ast) and are what you pass to --at:

Id Discover with Used by
pN / tN / sN (block ids) docx read FILE (the <!-- pN --> trailers), docx read FILE --ast, docx outline FILE (heading pNs), or docx render page images read, edit, insert, delete, wc, find results
cN (comment) docx comments list FILE comments reply/resolve/delete --at
fnN / enN (foot/endnote) docx footnotes list FILE / docx endnotes list FILE footnotes/endnotes edit/delete --at
hdrN / ftrN (header/footer) docx headers list FILE / docx footers list FILE (or read --ast) addressed by section+type, not the id: headers/footers set --at sN --type T
imgN (image) docx images list FILE images extract/replace/delete --at
linkN (hyperlink) docx hyperlinks list FILE hyperlinks replace/delete --at
tcN (tracked change) docx track-changes list FILE track-changes accept/reject --at
eqN (equation) docx read FILE --ast (run latex field) edit --at eqN --equation

Each list verb prints a bare JSON array where every item's id is exactly the handle you feed back to --at — pipe through jq to filter (docx comments list doc.docx | jq '.[] | select(.author=="Jane")').

Locators

docx info locators (--json for machine-readable) is the canonical reference. The grammar in brief:

pN              paragraph N                  pN:S-E          chars S..E within paragraph N
pN-pM           whole-paragraph range        pN:S-pM:E       cross-paragraph character range
sN              section break N              tN              table N
tN:rRcC         cell at row R, col C         tN:rRcC:pK      paragraph K of that cell (chainable)
tN:rR / tN:cC   table row R / column C       tN:rR1cC1-rR2cC2  rectangular cell region (merge)
cN  imgN  linkN  fnN  enN  tcN  eqN          entity ids (comment / image / hyperlink /
                                             footnote / endnote / tracked-change / equation)

Offset semantics: character offsets are 0-based, start-inclusive, end-exclusivep3:5-20 is the 15 characters at indices 5..19 of paragraph 3. Offsets count the visible text of the paragraph in the selected view (accepted by default).

Nested tables chain the same syntax arbitrarily deep — t0:r2c1:t0:r0c0:p0 is the first paragraph of the (0,0) cell of the first table nested inside the (2,1) cell of the document's first table.

Not every command accepts every form — each command's --at/--from/positional help lists exactly what it takes. The shapes:

Form Accepted by
pN, tN, sN, tN:rRcC:pK (blocks) read --from/--to, insert --after/--before, wc, comments add
pN, pN:S-E, pN-pM, sN, eqN, tN:rRcC:pK, tN:rRcC:pK:S-E edit --at (span/cell forms strip or replace just that range)
pN, pN-pM, tN, sN, tN:rRcC:pK delete --at
pN:S-E, pN:S-pM:E, tN:rRcC:pK:S-E (spans) comments add --at, hyperlinks add --at (single paragraph), find/wc results
pN[:offset] (point) footnotes/endnotes add --at
cN / fnN / enN / imgN / linkN / tcN (entities) the matching noun's --at (the c/fn/en/img/link/tc prefix is optional)
tN, tN:rR, tN:cC, tN:rRcC, tN:rR1cC1-rR2cC2 the tables verbs

Common workflows

find → comment. find emits bare locators that drop straight into comments add --at (same default view, so offsets line up — no coordinate translation):

docx comments add doc.docx --at "$(docx find doc.docx 'fatally flawed' | head -1)" \
    --text "Cite a source here?"
# or anchor by phrase directly:
docx comments add doc.docx --anchor "fatally flawed" --text "Cite a source here?"

read → edit markdown round-trip. read emits a markdown dialect that edit --markdown re-parses, so render → LLM-rewrite → splice-back is lossless for paragraphs/lists/quotes:

docx read doc.docx --from p3 --to p3              # → markdown (with <!-- p3 --> trailer)
# … hand to an LLM, get a revised block back …
docx edit doc.docx --at p3 --markdown-file revised.md   # multi-block source expands naturally

Use --markdown-file (not --markdown TEXT) when the source starts with - — Node's parseArgs rejects leading-dash flag values.

track-changes review loop. Toggle tracking on, make edits (they auto-emit <w:ins>/<w:del>), then inventory and resolve:

docx track-changes doc.docx on
docx replace doc.docx "old phrasing" "new phrasing" --all
docx track-changes list doc.docx                  # → JSON array of { id:tcN, kind, author, text, … }
docx read doc.docx --current                       # → CriticMarkup {++ins++}[^tcN] / {--del--}[^tcN]
docx track-changes accept doc.docx --at tc0 --at tc2   # or --all

read has three tracked-change views: default --accepted renders clean text — drops subtractive edits and inlines additive ones (the post-accept document); --current shows CriticMarkup with [^tcN] footnotes; --baseline does the reverse of accepted (the pre-change document). find, replace, wc, and comments add honor the same --accepted/--baseline/--current flags so offsets stay consistent across commands. Add --comments to read to append [^cN] footnotes for comment spans.

How It Works

In-place XML mutation. The AST returned by read is a view over the parsed XML tree, not a separate model. When you edit or comments add, the CLI mutates the underlying XML nodes directly and serializes back. Anything not modeled in the AST (custom styles, theme colors, schema extensions) survives because untouched regions are never re-emitted. Never delete a relationship something still references — that corrupts the file — so part/relationship pruning is gated on a reference scan; unreferenced orphans are left in place.

JSX for emitters. Constructing OOXML fragments imperatively (<w:rPr><w:b/><w:color w:val="800080"/>) is verbose, so fresh XML is authored in JSX with a custom factory: <w.rPr><w.b/><w.color w-val="800080"/></w.rPr> becomes the right XmlNode tree.

Span-aware comments & hyperlinks. comments add --at p3:5-20 (and hyperlinks add) find the runs containing offsets 5 and 20, split them at the boundaries (preserving <w:rPr> on both halves), and insert markers between the slices. Comments authored by older tools that lack w14:paraId (required by commentsExtended.xml) get a fresh paraId injected automatically on resolve/reply.

Tracked changes. With <w:trackChanges/> set, insert/edit/delete/replace emit native <w:ins>/<w:del> (attributed via --author, $DOCX_AUTHOR, or Reviewer); pass --track to one of those commands (or the tables verbs / images delete) to track just that invocation even when the doc toggle is off. edit --at pN --text runs a word-level diff so unchanged words keep their formatting and only changed words are wrapped — the same shape Word produces mid-tracking. accept/reject handle run-level ins/del/moveFrom/moveTo, sectPrChange, paragraph-mark ins/del, and the table-structural revisions (rowIns/rowDel, cellIns/cellDel, tblGridChange, tcPrChange). OOXML has no tracked-change construct for hyperlink edits or image swaps, so under tracking those emit a [docx-cli] audit comment instead of a fake revision (image deletion is honest removal — it wraps a real <w:del>).

Rich content. Images insert from a path, data: URI, or http(s) URL (bounded fetch; HEIC→JPEG transcode; SVG sanitized; non-public/metadata addresses refused at every redirect hop). Equations round-trip OOXML <m:oMath> ↔ LaTeX (reconstructed, not legacy plaintext) — authored via temml (LaTeX→MathML) plus an in-house MathML→OMML adapter, no LGPL deps. Code blocks emit one CodeBlock-styled paragraph per line with optional lowlight syntax highlighting (37 bundled languages); they collapse back to a GFM fenced block on read. GFM task lists round-trip Word's checkbox content control (and the Word-for-Web Wingdings-glyph variant), surfacing as taskState in the AST. Tables operate on a merge-aware logical grid so gridSpan/vMerge cells map onto physical <w:tc>, and structural edits refuse to bisect an existing merge.

Markdown dialect. create --from, insert/edit --markdown, and the note bodies all parse the same GFM + math + CriticMarkup + inline-HTML-formatting dialect (remark + remark-gfm + remark-math + an in-house inline-surgery transform), composing the existing OOXML emitters. read emits a compatible dialect, so the read → edit → write loop round-trips (lossless for paragraphs, lists, and nested blockquotes; code/tables/math/headings inside a blockquote intentionally escape to top level on import). read --ast is the fully lossless JSON form.

Literal text — the parser-free channel. When you want prose inserted exactly — reviewer notes, quoted excerpts, anything where Markdown would misfire — use create --text-file PATH / insert --text-file PATH (or - for stdin). Every character lands verbatim and each newline starts a new paragraph; nothing is interpreted, so 3. note stays 3. (no ordered-list renumber), and *x*, [t](u), bare URLs, and {++x++} are kept as written. This exists because GFM corruption isn't always escapable: bare URLs autolink with no escape sequence at all, and CriticMarkup eats {++…++} regardless of backslashes — so a literal path is the only safe way to author untouched prose.

Document-wide font. docx styles set-default-font FILE "Times New Roman" sets the font in the two places a font actually lives — word/styles.xml <w:docDefaults> (the formal default) and the theme font scheme (word/theme/theme1.xml, major + minor <a:latin>), since real Word docs resolve their fonts through the theme and touching only one silently loses to the other. Body text and theme-following headings both adopt it; styles or runs that pin their own font (a code block's monospace, a deliberately-Arial run) are preserved and named in the ack, with --all to repoint even those. --size N sets the default size on the same write.

Edit & create styles. docx styles set FILE --at Heading1 --color 1F4E79 --size 16 --bold rewrites the style definition in word/styles.xml, so every paragraph or run that uses the style picks up the change at once ("make all Heading 1s green") — the same run-/paragraph-formatting flags as edit (color/font/size/highlight/underline/caps + alignment/spacing/indentation for paragraph styles). docx styles create FILE Callout --color C00000 --bold mints a new paragraph or character style that insert/edit --style Callout can then apply. Editing an un-materialized built-in (--at Heading3 on a doc that never used it) provisions it first; a paragraph with its own direct formatting keeps it (the override wins — the definition edit never touches the body). Style edits are not tracked changes even under track-changes — matching Word, which applies style-definition edits to styles.xml directly with no redline.

List numbering. docx lists set FILE --at p12 --start 5 makes a numbered list begin at 5; --format upper-roman (or lower-alpha/upper-alpha/lower-roman) switches the glyph; --restart splits a fresh list off at that item; --continue makes a list pick up the previous list's numbering instead of restarting. --at names any item — the change applies to the whole list. The start round-trips through the markdown ordinal (the body reads 5. 6. 7.); the glyph and any continue link, which GFM can't express, surface as a deviation-only <!-- docx:list p12 format="upper-roman" --> / <!-- docx:list p20 continues --> hint (dropped on import, like every docx: note — read --ast carries list.start/list.format losslessly). Untracked, matching Word, which records no revision for a list-numbering change.

Run formatting beyond bold/italic. Properties markdown has no native syntax for — text color, theme color, highlight, shading, underline (all 18 styles + color), super/subscript, small/all caps, font, and size — are emitted as the HTML a markdown reader actually renders, so the output looks right in GitHub, VS Code, Obsidian, and browsers (Pandoc [text]{…} spans render as literal brackets in all of those). read emits semantic tags where they exist — <mark>overdue</mark>, <sup>x</sup>, <sub>2</sub> — a <span style="color:#C00000">…</span> for the CSS-expressible properties, and data-* attributes for the OOXML-only ones CSS can't express (theme colors, underline styles); insert/edit --markdown parses them back losslessly, and a leading <!-- docx:base font="Arial" size="8pt" --> note declares the document's dominant font/size once so the body isn't buried in per-run repetition. Bold/italic/strike/code/links stay native (**/*/~~/`/[](…)). Because the inline-surgery transform scans whole sibling sequences, a CriticMarkup marker or span can straddle other formatting — {++**bold insertion**++} is tracked correctly. An invalid enum value (e.g. a bogus highlight name) fails with a clear error rather than silently vanishing. Inserted plain content inherits the surrounding paragraph's font/size so it blends in.

Visual verification. docx render is the only command that needs an external runtime: it drives Microsoft Word (macOS via osascript, Windows via PowerShell COM — the ground-truth renderer) or LibreOffice (soffice, cross-platform) to produce a PDF, then rasterizes in-process via the bundled @hyzyla/pdfium WASM package — no poppler/pdftoppm/ImageMagick needed. Agents that consume PNGs use this to verify edits, diff accept/reject before-vs-after, or generate screenshots.

Stack

  • Runtime: Bun (node:util parseArgs, JSX with custom factory, native zlib)
  • Parser: jszip + fast-xml-parser + fast-xml-builder
  • Markdown: unified + remark-parse + remark-gfm + remark-math
  • Math: temml (MIT) compiles LaTeX → MathML; an in-house MathML → OMML adapter handles the OOXML side bidirectionally
  • Render: @hyzyla/pdfium (MIT wrapper + Apache-2.0 PDFium-as-WASM) for the PDF → PNG/JPG step, plus pngjs / jpeg-js for image encoding
  • Images: heic-convert (wasm libheif) transcodes HEIC/HEIF input to JPEG on insert
  • Quality: Biome + Knip + tsc; LibreOffice headless for round-trip integration tests
  • Standard: ECMA-376 Part 1 §17 (WordprocessingML), Transitional profile

Contributing

See CONTRIBUTING.md for development setup, architecture overview, and CI.

Reviews (0)

No results found