file-search-on



Content-type aware file search with CEL-powered attribute filtering.

file-search-on walks a directory tree and returns files matching a CEL expression evaluated over each file's metadata and content-type-specific attributes. Instead of grepping by name, ask things like:

file-search-on 'is_pdf && page_count > 10 && author == "Jane Doe"'
file-search-on 'is_image && gps_lat > 51.4 && gps_lat < 51.6'        # photos near home
file-search-on 'is_audio && artist == "Radiohead" && year < 2000'
file-search-on 'is_video && video_height >= 2160 && video_codec == "h265"'
file-search-on 'is_office && language == "fr"'
file-search-on 'is_markdown && "longread" in tags && word_count > 1000'

# Or match fuzzily — typos in the data are no longer fatal:
file-search-on 'is_audio && levenshtein(artist, "Radiohead") <= 2'                # catches "Radiohad", "Radiohea"
file-search-on 'is_image && soundex(camera_make) == soundex("Nikon")'             # phonetic match across capitalisation / spelling
file-search-on 'is_markdown && ngram_similarity(title, "kubernetes", 2) > 0.6'    # substring-tolerant title match

file-search-on covers 74 file formats organised into thirteen content-type families (documents, data, images, audio, video, office, ebooks, plain text, archives, compiled binaries, email, source code, notebooks), with format-specific metadata extraction.

Built in the open — issues, PRs, and feature requests warmly welcomed. See Contributing.

Features

  • Pluggable content-type detection — extension-first with magic-byte fallback. New formats are a single registration call.

  • Thirteen content-type families, each with its own metadata extractors:

    | Family | Formats | Attributes |
    | --- | --- | --- |
    | Documents | PDF, EPUB | title, author, language, page_count |
    | Markup | Markdown, HTML, XML | title, word_count, frontmatter, language, root_element |
    | Data | JSON, YAML, TOML, CSV, TSV | json_kind, yaml_kind, yaml_document_count, column_count, csv_columns |
    | Plain text | TXT, log, … | line_count, word_count |
    | Images | JPEG, PNG, GIF, WebP, TIFF, BMP, SVG, HEIC, RAW (Canon CR2 / CR3, Nikon NEF, Sony ARW, Adobe DNG, Fujifilm RAF, Olympus ORF, Panasonic RW2) — predicates is_raw_photo, is_cr2, is_cr3, is_nef, is_arw, is_dng, is_raf, is_orf, is_rw2. HEIC + sibling MOV → Apple Live Photo pairing (is_live_photo, is_live_photo_video). | dimensions + EXIF: camera, lens, GPS, ISO, focal_length, taken_at; RAW adds raw_kind, raw_vendor; Live Photo adds live_photo_video_path, live_photo_video_size, live_photo_image_path |
    | Audio | MP3, M4A, FLAC, OGG | tags (artist, album, genre, year, …) + duration, bitrate / nominal_bitrate, sample_rate, channels, bit_depth, ReplayGain |
    | Video | MP4, MOV, MKV, WebM, AVI | duration, bitrate / nominal_bitrate, video_codec, audio_codec, video_width/height, frame_rate, rotation, HDR / colour-space, subtitles |
    | Office | DOCX, XLSX, PPTX, ODT | title, author, language (Dublin Core) |
    | Archives | ZIP (incl. JAR / WAR / EAR), TAR, TAR.GZ, GZIP | entry_count, uncompressed_size, top_level_entries, has_root_dir |
    | Binaries | ELF (Linux/BSD), Mach-O (macOS, incl. universal), PE (Windows). Mach-O code signature parsing surfaces team ID + entitlements. | architectures, bitness, binary_format, binary_type, is_dynamically_linked, is_stripped, entry_point, is_codesigned, is_apple_signed, is_third_party_signed, codesign_identifier, codesign_team_id, codesign_hash_type, codesign_hardened_runtime, codesign_library_validation, codesign_killed, codesign_adhoc, entitlements, entitlement_app_sandbox, entitlement_full_disk_access, entitlement_network_client, entitlement_network_server |
    | Email | RFC 5322 (.eml), Unix mbox (.mbox) | title (subject), author (from), email_to, email_cc, sent_at, attachment_count, email_count |
    | Source code | Go, Python, JS/TS, Rust, C/C++, Java, Ruby, Swift, Kotlin, Scala, Shell, Lua, Elixir, Clojure, Haskell, OCaml, Zig, C#, PHP, Perl, R, Ada, SQL, Visual Basic, Fortran, MATLAB, Assembly, Pascal/Delphi | language, line_count, loc, comment_loc, blank_loc, functions / type_names / imports (Go / Python / Java / C# / PHP / Perl / R / MATLAB only) |
    | Notebooks | Jupyter .ipynb, Apache Zeppelin .zpln | cell_count, code_cell_count, markdown_cell_count, kernel, language, title |
    | Disk images | DMG (UDIF), ISO 9660, VHD, VHDX, VMDK (sparse), QCOW2, WIM | disk_image_format, virtual_size, disk_type, volume_label, disk_image_created_at, cluster_bits, is_encrypted, image_count |
    | Install packages | macOS .pkg (XAR), Debian .deb, Red Hat .rpm, Linux .appimage | package_format, package_name, package_version, package_release, package_arch, package_kind, appimage_version |
    | VM bytecode | Java .class (JVM), Python .pyc / .pyo, WebAssembly .wasm | bytecode_format, runtime_version, class_name (JVM), super_class (JVM), interfaces (JVM), method_count (JVM), field_count (JVM), access_flags (JVM), python_version, source_mtime, wasm_version, section_count, import_count, export_count |
    | Science data | FITS (Flexible Image Transport System), VOTable (IVOA astronomical tables), HDF5 (Hierarchical Data Format v5 — LSST, LIGO, NetCDF4, scientific simulations), PDS3 + PDS4 (NASA Planetary Data System — Voyager, Mars rovers, Perseverance, Lucy), CDF (NASA Common Data Format — heliophysics: ACE, Wind, MMS, Parker Solar Probe) | science_format, telescope, instrument, object (→ title), observer (→ author), date_obs (→ taken_at), exptime, filter, airmass, ra, dec, bitpix, naxis, naxis1, naxis2, hdu_count, fits_kind, votable_version, table_count, total_rows, field_names, field_units, field_ucds, votable_data_format, hdf5_format_version, hdf5_size_of_offsets, hdf5_size_of_lengths, pds_version, mission_name, spacecraft_name, instrument_name, target_name, product_id, start_time (→ taken_at), cdf_version, cdf_encoding, cdf_majority, variable_count, attribute_count |
    | Databases | SQLite v3 + WAL / SHM sidecars + FTS3/4/5 body extraction (the most-deployed database in the world — every iOS / Android app, every browser, every CLI with a local store) | database_format, sqlite_page_size, sqlite_format_version, sqlite_page_count, sqlite_schema_version, sqlite_text_encoding, sqlite_user_version, sqlite_application_id, sqlite_application_name, sqlite_fts_table_count, sqlite_fts_table_names, sqlite_wal_format_version, sqlite_wal_page_size, sqlite_wal_checkpoint_seq, sqlite_wal_frame_count, sqlite_wal_byte_order |
    | Apple property lists | Binary (bplist00) + XML .plist — Info.plist, LaunchAgents, LaunchDaemons, Preferences, .webloc | plist_format, plist_root_kind, plist_kind, plist_bundle_identifier, plist_bundle_name, plist_bundle_version, plist_bundle_short_version, plist_executable, plist_min_os_version, plist_label, plist_program, plist_program_arguments, plist_run_at_load, plist_keep_alive |
    | Browser bookmarks | Chromium-family (Chrome / Brave / Edge / Chromium / Opera / Vivaldi / Arc) Bookmarks JSON + Safari Bookmarks.plist | bookmark_count, bookmark_folder_count, bookmark_folders, bookmark_urls, bookmark_titles, browser_vendor, bookmark_profile |
    | Chat exports | Slack workspace exports, Discord (DiscordChatExporter) dumps, signal-cli --json — detected by JSON shape (is_chat_export / is_slack_export / is_discord_export / is_signal_export) | chat_message_count, chat_participants, chat_channel, chat_workspace, chat_start_at, chat_end_at |
    | Fonts | TTF, OTF, TTC / OTC collections, WOFF1, WOFF2 (brotli decompression — full attribute extraction) | font_format, font_outline_kind, font_family, font_subfamily, font_full_name, font_version, font_postscript_name, font_manufacturer, font_designer, font_license, font_license_url, font_typographic_family, font_weight, font_width, font_embedding, font_panose, font_unicode_ranges, font_revision, font_units_per_em, font_mac_style, font_italic_angle, font_glyph_count, font_axis_count, font_axes, font_collection_count, font_collection_families, woff2_total_sfnt_size, woff2_total_compressed_size |
    | 3D models | STL (ASCII + binary), Wavefront OBJ, glTF 2.0 (.gltf + .glb) — predicates is_3d_model, is_stl, is_obj, is_gltf | model3d_format, vertex_count, face_count, has_normals, has_textures, materials, bounding_box |

    Type predicates (is_pdf, is_image, is_audio, is_video, is_office, is_epub, …) light up automatically from the registered content type. See examples/ for recipes by family.

  • Exact-name content types for common repo files — Dockerfile, Makefile, LICENSE, .gitignore, go.mod, package.json, Cargo.toml, Pipfile, Gemfile, requirements.txt, Procfile, Vagrantfile, and more — with per-type predicates (is_dockerfile, is_gomod, is_node_manifest, …) plus family predicates (is_build, is_repo_meta, is_ignore, is_manifest, is_platform). Predicates cross-fire: package.json is both is_node_manifest and is_json. See examples/repo-files.md.

  • OS-generated metadata files — .DS_Store / .localized (macOS), Thumbs.db / Desktop.ini (Windows), .directory (KDE) — with per-type predicates (is_ds_store, is_localized, is_thumbs_db, is_desktop_ini, is_kde_directory), OS-specific family predicates (is_macos_metadata, is_windows_metadata, is_linux_metadata), and the cross-OS is_system_metadata. Lets agents answer "find every macOS leftover under ~/Code" or "what platform-cruft is in this archive?" in one query.

  • Apple property lists (.plist) — binary (bplist00) and XML variants. Surfaces is_plist plus a typed attribute set (plist_format, plist_root_kind, plist_kind, plist_bundle_identifier, plist_bundle_name, plist_bundle_version, plist_bundle_short_version, plist_executable, plist_min_os_version, plist_label, plist_program, plist_program_arguments, plist_run_at_load, plist_keep_alive). Path-based plist_kind registry labels Info.plist / LaunchAgents / LaunchDaemons / Preferences / .webloc files. Lets agents answer "which LaunchAgents run on login?", "what apps require macOS 14+?", or "find the Info.plist for com.example.bundle" in one query.

  • Browser bookmarks — Chromium-family Bookmarks (Chrome / Brave / Edge / Chromium / Opera / Vivaldi / Arc) and Safari Bookmarks.plist. Surfaces is_bookmark_file / is_chromium_bookmarks / is_safari_bookmarks plus bookmark_count, bookmark_folder_count, bookmark_folders, bookmark_urls, bookmark_titles, browser_vendor (chrome / chromium / edge / brave / opera / vivaldi / arc / safari), and bookmark_profile. With --body, the body CEL variable carries title\turl lines so body.contains("kubernetes") answers "did I bookmark anything about kubernetes?" across every profile in one query.

  • Chat exports — offline Slack workspace exports, Discord (DiscordChatExporter) JSON dumps, and signal-cli --json output. All three are plain .json files with arbitrary names, so they're detected by a streaming top-level-JSON-shape discriminator rather than by extension. Surfaces is_chat_export plus per-format is_slack_export / is_discord_export / is_signal_export, and a shared attribute set: chat_message_count, chat_participants (distinct authors), chat_channel, chat_workspace (guild for Discord; empty for Signal), chat_start_at, and chat_end_at. With --body, the body CEL variable carries one {timestamp}\t{author}\t{text} line per message so is_chat_export && body.contains("kubernetes") greps the conversation text across an entire export. See examples/chat-exports.md.

  • Screenshot OCR — --ocr runs OCR over image/* files via the registered provider (macOS Vision today; Linux Tesseract / Windows.Media.Ocr deferred under the same hook). The recognized text populates the body CEL variable so body.contains("kubernetes") queries work over ~/Desktop screenshots the same way they do over markdown files. It also adds three attributes: ocr_confidence (0..1 average across recognized lines), ocr_language (BCP-47 dominant language), ocr_provider (registered engine name). On macOS the OCR helper is bundled in the Homebrew cask; for local dev, make ocr-helper builds it. On platforms without a registered provider, --ocr is a clean no-op. Results are cached in the body cache (bodies_v1) so subsequent walks are free. See examples/ocr.md.

  • Fonts — TrueType (.ttf), OpenType (.otf), TTC / OTC collections, WOFF1 (.woff), and WOFF2 (.woff2). WOFF2 attribute extraction runs the brotli decompression hop, then slices the metadata tables (name / OS/2 / head / post / maxp / fvar) from the decompressed stream and dispatches to the same per-table decoders as the bare-sfnt path — font_family, font_designer, font_weight, font_axes all populate for .woff2 collections in modern frontend projects. Surfaces format-family predicates (is_font, is_ttf, is_otf, is_font_collection, is_woff, is_woff2) plus trait predicates (is_variable_font, is_color_font, is_monospace_font, is_italic_font, is_bold_font). Extracted attributes cover the name table (family, designer, version, manufacturer, license), OS/2 (weight, width, embedding permissions, panose, Unicode ranges), head (revision, units-per-em, mac style), post (italic angle), maxp (glyph count), and fvar (variable-font axes — wght / wdth / slnt / ital / opsz). Lets agents answer "find every variable font with an optical-size axis", "license audit — fonts without OFL", or "find Adobe-designed bold fonts" in one query. See examples/fonts.md.

  • Project-type detection — detect-project / find-projects / which-project subcommands identify Go / Node / Rust / Python / Ruby / Java / .NET / Terraform / Docker Compose / Hugo / Jekyll / Eleventy / Astro / Gatsby / MkDocs / Docusaurus / Pelican projects (8 SSG types + 10 others). Pair with --resolve-projects (file-level project_type filter) and --prune-build-artefacts (skip vendor/node_modules/target/__pycache__/public/_site etc. automatically). The is_static_site CEL predicate addresses any SSG as a group. Define custom project types via CEL in YAML — see examples/projects.md.

  • First-class Markdown front-matter — YAML (---), TOML (+++), and JSON ({ ... }) are recognised by leading bytes. Common keys (title, author, language, tags, categories, draft, date) become top-level CEL variables; everything else lives in a generic frontmatter map. See examples/markdown.md.

  • CEL expressions — the full Common Expression Language: comparisons, &&/||, string functions, list membership, timestamp arithmetic. Composes naturally with structural attributes.

  • Fuzzy, phonetic, and geographic matching — built-in levenshtein, soundex, ngrams, ngram_similarity, and point_in_polygon (for GPS bounding boxes / city outlines) let you write typo-tolerant and "sounds-like" queries against any string attribute. EXIF camera make recorded as Nikkon instead of Nikon? Artist tag mistyped as Radiohad? The same query catches both. See examples/fuzzy-search.md.

  • Multiple output formats — bare (paths only), default, verbose (multi-line), json (NDJSON), or a Go text/template via --format.

  • MCP server mode — same binary doubles as a Model Context Protocol server (stdio, HTTP, or SSE). Twenty tools exposed: search, search_semantic, read_attributes, read_lines, stats, find_duplicates, find_near_duplicates, diff_trees, find_matches, watch_search, list_archive_contents, read_file_in_archive, detect_project, find_projects, resolve_project_for_path, list_attributes, list_presets, query_preset, index_stats, monitor_info.

  • Pure Go, no CGO — cross-compiles cleanly to all six release targets. No image/audio/video decoder dependencies.

  • Parallel walking — files are evaluated across a worker pool (defaults to NumCPU).
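
The front-matter recognition described in the Features list (format decided by the leading bytes) can be illustrated with a short Python sketch. This is not the tool's Go implementation: the function name detect_frontmatter_format and its return labels are assumptions made for illustration, and a real parser would also need to find the closing delimiter and tell a leading --- apart from a horizontal rule.

```python
from typing import Optional


def detect_frontmatter_format(text: str) -> Optional[str]:
    """Guess the front-matter format from the first characters of a Markdown file.

    Hypothetical helper: the labels "yaml", "toml" and "json" are illustrative.
    """
    head = text.lstrip("\ufeff")  # tolerate a leading byte-order mark
    if head.startswith("---"):
        return "yaml"
    if head.startswith("+++"):
        return "toml"
    if head.startswith("{"):
        return "json"
    return None  # no recognisable front-matter opener
```

Only the opening bytes are inspected here, which is why detection is cheap enough to run on every Markdown file during a walk.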

Install

Homebrew (macOS / Linux)

brew install richardwooding/tap/file-search-on

The cask is published from this repo on every tagged release to richardwooding/homebrew-tap.

macOS note: the binary isn't signed with an Apple Developer ID (yet — happy to accept a sponsor!). The Homebrew cask's post-install hook strips the quarantine xattr automatically. If macOS still blocks it on first run:

sudo xattr -dr com.apple.quarantine $(brew --prefix)/bin/file-search-on

Container (Docker / Podman)

OCI images are published to GitHub Container Registry on every tag, with linux/amd64 and linux/arm64 manifests:

docker run --rm -v "$PWD:/work" ghcr.io/richardwooding/file-search-on:latest \
  'is_markdown && draft' -d /work

Pin to a specific version with :vX.Y.Z. The base image is cgr.dev/chainguard/static, so the container has the binary and nothing else (no shell).

Pre-built binaries

Pre-built archives for Linux, macOS, and Windows on amd64 and arm64 are attached to every GitHub Release, along with a checksums.txt you should verify.

From source

Requires Go 1.26.2 or newer.

go install github.com/richardwooding/file-search-on/cmd/file-search-on@latest

Or build from a clone:

git clone https://github.com/richardwooding/file-search-on.git
cd file-search-on
go build -o file-search-on ./cmd/file-search-on

Usage

search is the default subcommand. Pass a CEL expression and a directory:

file-search-on 'is_markdown && word_count > 500' -d ./docs
file-search-on 'is_image && iso > 1600' -d ~/Pictures -o json
file-search-on 'is_video && duration > 1800 && video_height >= 2160' -d ~/Movies
file-search-on -d .                                   # empty expression matches every file

Subcommands

| Command | Purpose | Deep dive |
| --- | --- | --- |
| search (default) | CEL expression over file metadata | every page in examples/ |
| preset [name] | Run a named search recipe — recent_changes, large_files, suspicious_files, etc. Without args, lists all presets. | examples/presets.md |
| attrs <path> | Print attributes for one file (no walk, no CEL) | examples/cookbook.md |
| stats [expr] | Histogram + totals, bucketed by group_by | examples/group-by.md |
| duplicates [expr] | Byte-identical files by sha256 | examples/duplicates.md |
| near-duplicates [expr] | Similar files by SimHash fingerprint of extracted body | examples/near-duplicates.md |
| archive-contents <path> [--expr] | List or filter entries inside ZIP / TAR / TAR.GZ / GZIP — full CEL vocabulary on per-entry attributes | examples/archive-search.md |
| archive-read <path> <entry> | Read a single entry's bytes out of an archive without extracting | examples/archive-search.md |
| find-matches <re> --expr <cel> -C N | Line-level regex hits with context | examples/find-matches.md |
| watch [expr] -d <dir> | Continuously watch directories; emit each new / changed file that matches — the inverse of search | examples/watch.md |
| diff <tree-a> <tree-b> --op <set-op> | Cross-tree set operations by sha256 — what's in A but not B, the intersection, content drift between same-named files | examples/diff.md |
| organize <expr> --link-into <template> | Build a templated symlink / copy tree from results — {raw_vendor}/{taken_at_year}/{basename} etc. | examples/organize.md |
| lines <path> --start --end | Print a line range | examples/read-lines.md |
| detect-project [dir] | Identify project type(s) of a directory | examples/projects.md |
| find-projects [root] | Walk a tree listing every project subdirectory | examples/projects.md |
| which-project <path> | Walk UP from a file/dir to its nearest enclosing project root | examples/projects.md |
| config-paths | Print platform-specific project-type config paths | examples/projects.md |
| monitors | List the dashboard URLs of every running instance (mcp / watch started with --monitor) | examples/monitoring.md |
| mcp | Run as a Model Context Protocol server | MCP server mode |

file-search-on --list prints the canonical schema (every attribute, every built-in function, every registered content type) — useful for "what can I filter on?" exploration.

Output formats

file-search-on '...' -o bare        # paths only — pipes well into xargs / fzf
file-search-on '...' -o default     # path \t [content-type] \t size
file-search-on '...' -o verbose     # multi-line per match with every attribute
file-search-on '...' -o json        # NDJSON, one match per line
file-search-on '...' --format '{{.Path}} ({{.WordCount}} words)'
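
Because -o json emits NDJSON (one JSON object per line), downstream scripts can parse it line by line. The Python sketch below shows that pattern only. The keys in the sample ("path", "size") are hypothetical stand-ins, so check the real output of -o json for the actual field names.

```python
import json


def parse_ndjson(stream_text):
    """Parse newline-delimited JSON: one object per non-empty line."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


# Hypothetical sample; the real field names may differ.
sample = '{"path": "a.md", "size": 10}\n{"path": "b.md", "size": 20}\n'
records = parse_ndjson(sample)
total_size = sum(record["size"] for record in records)
```

In practice the text would come from the command's standard output, for example via subprocess or a shell pipe into the script's stdin.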

Content search

CEL's standard string methods (contains, startsWith, endsWith, matches) work on every string attribute. Pass --body to populate the body variable from text-based files (markdown, source, csv, json, xml, html, plus is_text) and filter on full file content:

file-search-on 'is_source && body.contains("panic")' --body -d ./internal
file-search-on 'is_source && body.matches("(?i)\\bTODO\\b")' --body
file-search-on '...' --sort word_count --order desc --limit 5

Top-K queries (--sort + --limit) buffer the full result set, sort, then truncate. Without --sort, --limit returns the first N in walk order.
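
The difference between the two modes can be sketched in Python. This is an illustration of the behaviour described above, not the tool's code; top_k and its parameters are hypothetical names.

```python
from itertools import islice


def top_k(results, sort_key=None, descending=False, limit=None):
    """Apply --sort / --limit semantics to an iterable of result dicts."""
    if sort_key is not None:
        # Sorting needs every result, so the whole set is buffered first.
        ordered = sorted(results, key=lambda r: r[sort_key], reverse=descending)
        return ordered[:limit] if limit is not None else ordered
    # Without a sort key, take the first `limit` results in arrival (walk) order.
    return list(islice(results, limit))
```

The unsorted path can stop consuming results early, whereas the sorted path cannot know the top entries until the walk has finished.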

For custom ranking — combining multiple attributes or semantic similarity into a single score — pass a CEL expression to --rank:

# Hybrid semantic + recency: weight similarity at 70%, fresh files at 30%
file-search-on 'is_pdf' \
  --semantic-query "Q4 revenue forecast" \
  --embedding-model nomic-embed-text \
  --rank 'similarity * 0.7 + (mod_time > timestamp("2025-01-01T00:00:00Z") ? 0.3 : 0.0)' \
  --limit 10

# Promote PDFs to the top of a mixed result set
file-search-on 'is_pdf || is_office || is_markdown' --rank 'is_pdf' --limit 20

The rank expression evaluates per file (after the filter). Higher values rank first; --order asc flips. See examples/ranking.md for the full cookbook.
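
As a rough Python equivalent of the hybrid expression above: each file that passed the filter gets a score, and results are ordered by that score, highest first unless the order is flipped. The dictionary keys and function names are assumptions for illustration; the tool evaluates the CEL expression itself.

```python
from datetime import datetime, timezone

# Same cutoff as the --rank example: timestamp("2025-01-01T00:00:00Z")
CUTOFF = datetime(2025, 1, 1, tzinfo=timezone.utc)


def hybrid_score(similarity, mod_time):
    """similarity * 0.7, plus a flat 0.3 bonus for files modified after the cutoff."""
    return similarity * 0.7 + (0.3 if mod_time > CUTOFF else 0.0)


def rank(files, ascending=False):
    """Order files by score; higher scores first unless ascending is requested."""
    return sorted(
        files,
        key=lambda f: hybrid_score(f["similarity"], f["mod_time"]),
        reverse=not ascending,
    )
```

With these weights a recent file with similarity 0.6 outranks an older file with similarity 0.9, which is the intended effect of mixing recency into the score.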

Stats and reconnaissance

file-search-on stats -d ~/Downloads                                    # by content_type (default)
file-search-on stats 'is_image' -d ~/Pictures --group-by camera_make
file-search-on stats 'is_source' -d ./src --group-by language
file-search-on stats 'is_image' -d ~/Pictures --group-by taken_at_year
file-search-on stats --dir ~/docs --dir ~/posts --group-by ext         # multi-root aggregation

group_by keys: content_type (default), ext, dir, language, camera_make, camera_model, lens, artist, album, genre, kernel, binary_format, binary_type, frontmatter_format, plus time-bucket keys (mtime_year/month/day, taken_at_*, sent_at_*, date_*). Unrecognised keys silently fall back to content_type.
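
The fallback rule is easy to miss, so here is a minimal Python sketch of it. KNOWN_KEYS is a deliberately small, illustrative subset of the keys listed above, and how the tool buckets files that lack the chosen attribute is not stated in this README; the empty-string bucket below is an assumption.

```python
from collections import Counter

# Illustrative subset of the documented group_by keys.
KNOWN_KEYS = {"content_type", "ext", "language"}


def histogram(files, group_by="content_type"):
    """Count files per bucket, silently falling back to content_type for unknown keys."""
    key = group_by if group_by in KNOWN_KEYS else "content_type"
    # Assumption: files without the attribute land in an empty-string bucket.
    return Counter(f.get(key, "") for f in files)
```

A practical consequence is that a typo in --group-by produces a content-type histogram rather than an error, so it is worth checking the key name when the buckets look unexpected.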

Project-type detection

file-search-on detect-project ~/my-app
file-search-on find-projects ~/Code --type go --type rust
file-search-on 'is_source && project_type == "go"' \
    --resolve-projects --prune-build-artefacts -d ~/Code
file-search-on config-paths                       # where to drop user-wide / per-project YAML

--resolve-projects walks up from each file's directory to the nearest project root and sets project_type (string), project_types (list), and is_static_site (bool — fires for hugo / jekyll / eleventy / astro / gatsby / mkdocs / docusaurus / pelican). --prune-build-artefacts does a pre-walk to discover all project subdirectories under the search root and skips their canonical artefact directories (vendor, node_modules, target, __pycache__, .venv, bin, obj, .terraform, public, _site, dist, …). Custom project types are user-definable via CEL — drop a YAML at the path printed by config-paths. Full guide: examples/projects.md.
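
The walk-up step can be pictured with the following Python sketch. It assumes a project root is recognised by a marker file; the MARKERS table is a hypothetical three-entry stand-in, not the tool's detection rules, which cover many more project types and are user-extensible via CEL.

```python
from pathlib import Path

# Hypothetical marker table for illustration only.
MARKERS = {"go.mod": "go", "package.json": "node", "Cargo.toml": "rust"}


def nearest_project(path):
    """Walk up from a file or directory to the nearest directory holding a marker file."""
    start = Path(path).resolve()
    directory = start if start.is_dir() else start.parent
    for candidate in (directory, *directory.parents):
        for marker, project_type in MARKERS.items():
            if (candidate / marker).is_file():
                return candidate, project_type
    return None  # reached the filesystem root without finding a project
```

Because the search stops at the first match, a file inside a nested project resolves to the inner project rather than an enclosing one.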

Duplicates and disk-eaters

file-search-on duplicates -d ~/Pictures                   # all duplicates under a tree
file-search-on duplicates 'is_image' -d ~/Pictures        # scope to photos
file-search-on duplicates -d /Volumes/backup --min-size 1048576  # skip files < 1 MiB
file-search-on duplicates -d ~/Downloads -o json

Two-pass: files with unique sizes are skipped before any hashing. With --index-path, hashes are cached alongside (size, mtime) so repeat runs are free.
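
A minimal Python sketch of the two-pass idea: group by size first, then hash only the files whose size is shared. The function names are illustrative, and the index cache (--index-path) is left out.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=0):
    """Return groups of byte-identical files under root."""
    # Pass 1: bucket by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size >= min_size:
                    by_size[size].append(path)
    # Pass 2: hash only the buckets that hold more than one file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

On a typical tree most files have a unique size, so the first pass removes the bulk of the hashing work before any file content is read.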

For SIMILAR (not identical) files — catching typo edits, regenerated headers, template copies that exact-hash dedup misses — use the SimHash-based near-duplicates subcommand:

file-search-on near-duplicates -d ~/notes                          # 0.85 similarity default
file-search-on near-duplicates 'is_markdown' -d ~/notes --threshold 0.95   # whitespace/typo only
file-search-on near-duplicates 'is_source && language == "go"' -d ./src --threshold 0.75

Fingerprints cache via --index-path alongside the exact hash; repeat runs skip body extraction AND SimHash compute. See examples/near-duplicates.md.
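
For readers unfamiliar with SimHash, here is a compact Python sketch of the general technique. It is a textbook variant under stated assumptions: whitespace tokens, SHA-256 truncated to 64 bits as the token hash, and similarity defined as one minus the normalised Hamming distance. The tool's tokenisation, hash function, and threshold semantics may differ.

```python
import hashlib

BITS = 64


def simhash(text):
    """64-bit SimHash: each token votes +1 / -1 on every bit of the fingerprint."""
    votes = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits
        for i in range(BITS):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    return sum(1 << i for i, vote in enumerate(votes) if vote > 0)


def similarity(a, b):
    """1.0 for identical fingerprints, lower as more bits differ."""
    return 1.0 - bin(a ^ b).count("1") / BITS
```

Small edits change only a few token votes, so the fingerprints of near-identical documents differ in few bits, which is what makes a threshold such as 0.85 or 0.95 meaningful.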

Common flags

-d <dir> (repeatable for multi-root walks), --exclude <glob> (basename, repeatable), --respect-gitignore, --timeout 30s (partial results returned on expiry), --workers N, --index-path <file.db> (override the per-cwd default index — see examples/indexing.md), --no-index (opt out of on-disk caching for hermetic runs).

Pointing at a non-default Ollama

For semantic search and search_semantic (MCP), the embedding HTTP endpoint resolves in this order:

  1. --embedding-server <url> flag (CLI or mcp subcommand)
  2. $OLLAMA_HOST environment variable
  3. http://localhost:11434 (built-in default)

So a remote Ollama box on the LAN works without a per-invocation flag: export OLLAMA_HOST=http://gpu-box:11434. See examples/semantic-search.md for the full setup.
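
The precedence order above amounts to a three-step fallback, sketched here in Python. The function name and parameters are hypothetical; only the order (flag, then $OLLAMA_HOST, then the built-in default) comes from this README.

```python
import os

DEFAULT_ENDPOINT = "http://localhost:11434"


def resolve_embedding_server(flag_value=None, env=None):
    """Pick the embedding endpoint: explicit flag, then OLLAMA_HOST, then the default."""
    env = os.environ if env is None else env
    if flag_value:
        return flag_value
    return env.get("OLLAMA_HOST") or DEFAULT_ENDPOINT
```

Passing env explicitly keeps the function easy to test; in normal use it reads the process environment.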

Recipes

Focused recipe collections live under examples/:

| Recipe file | What's in it |
| --- | --- |
| examples/markdown.md | Front-matter (YAML / TOML / JSON), draft flags, tag membership, custom keys |
| examples/images.md | EXIF camera/lens, GPS bounding boxes, ISO / aperture / focal length, taken-at ranges |
| examples/ocr.md | Screenshot OCR via macOS Vision — body.contains(...) queries against screenshots (macOS only; Linux / Windows providers are deferred under the same hook) |
| examples/audio.md | Artist / album / genre / year, bitrate, sample rate, hi-res filtering |
| examples/video.md | Codec, resolution, frame rate, duration, MKV vs MP4 |
| examples/3d-models.md | STL / OBJ / glTF — vertex / face counts, materials, bounding box, printability triage |
| examples/office.md | DOCX / XLSX / PPTX / ODT — title, author, language |
| examples/epub.md | EPUB books — title, author, language; XMP fallback |
| examples/data.md | JSON arrays vs objects, CSV column membership, XML root elements |
| examples/text.md | Plain text / log files — line count, word count, big-line caps |
| examples/notebooks.md | Jupyter (.ipynb) and Apache Zeppelin (.zpln) — cell_count, code_cell_count, kernel, language |
| examples/projects.md | Project type detection — detect-project / find-projects for go / node / rust / python / terraform / docker-compose / … |
| examples/cookbook.md | Cross-cutting recipes — dedupe, mixed media filters, pipeline integration |
| examples/fuzzy-search.md | Fuzzy / phonetic / n-gram similarity matching — levenshtein, soundex, ngrams, ngram_similarity; perceptual image similarity (image_similar_to) |
| examples/secret-scan.md | Credential / token triage — has_secrets(body) + secret_kinds(body) over file content |
| examples/indexing.md | Persistent attribute index (--index-path) — cold/warm CLI runs, MCP auto-on cache, refresh + inspection |
| examples/timeouts.md | Timeouts and partial results — CLI --timeout, MCP timeout_seconds, exit codes, cancellation semantics |
| examples/top-k.md | Top-K queries — --sort + --limit for "biggest 5 videos", "10 most recent photos", etc. |
| examples/snippets.md | Body previews — --snippet returns the first N lines of text files alongside metadata |
| examples/exclude.md | Pruning the walk — --exclude basename globs and --respect-gitignore |
| examples/body-search.md | Content filters — --body exposes file body to CEL; pair with contains / matches (RE2) / startsWith |
| examples/stats.md | Directory reconnaissance — file-search-on stats aggregates a content-type histogram with totals |
| examples/group-by.md | Stats bucketed by any attribute — --group-by camera_make, --group-by language, --group-by taken_at_year, etc. |
| examples/read-lines.md | Print a specific line range from a file — pairs with search to fetch match context |
| examples/duplicates.md | Find byte-identical files by sha256 — file-search-on duplicates [--min-size N] |
| examples/near-duplicates.md | Find SIMILAR files by SimHash fingerprint — file-search-on near-duplicates --threshold 0.85 |
| examples/organize.md | Organize by query — templated symlink / copy trees from search results (organize … --link-into '{raw_vendor}/{taken_at_year}/{basename}') |

A handful of representative one-liners:

# All Markdown files with more than 500 words
file-search-on 'is_markdown && word_count > 500' -d ./docs

# 4K HEVC videos longer than 30 minutes
file-search-on 'is_video && video_height >= 2160 && video_codec == "h265" && duration > 1800' -d ~/Videos

# Photos taken in 2024 with a Sony camera at high ISO
file-search-on 'is_image && camera_make == "SONY" && iso > 1600 && taken_at >= timestamp("2024-01-01T00:00:00Z") && taken_at < timestamp("2025-01-01T00:00:00Z")' -d ~/Pictures

# CSVs with a "revenue" column
file-search-on 'is_csv && csv_columns.exists(c, c == "revenue")' -d ./reports

# French-language office documents
file-search-on 'is_office && language == "fr"' -d ~/Documents

# Audio tracks ≥ 96 kHz (hi-res)
file-search-on 'is_audio && sample_rate >= 96000' -d ~/Music

# Fuzzy: artist tag within 2 edits of "Radiohead" (catches typos)
file-search-on 'is_audio && levenshtein(artist, "Radiohead") <= 2' -d ~/Music

# Phonetic: any author whose name sounds like "Smith"
file-search-on 'is_markdown && soundex(author) == soundex("Smith")' -d ./posts
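
To make the last two one-liners concrete, here is a Python sketch of the two classic algorithms behind the names: Levenshtein edit distance and American Soundex. These are standard textbook versions written for illustration; the tool's built-in levenshtein and soundex functions may differ in details such as case handling or treatment of non-ASCII letters.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]


_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(word):
    """American Soundex: first letter plus three digits, e.g. 'Robert' gives 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (out + "000")[:4]
```

With these definitions, "Radiohad" is one edit away from "Radiohead", and "Nikkon" and "Nikon" share the Soundex code N250, which is why the queries above tolerate such typos.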

Combine paths and types: find HTML files whose directory path contains "build" (contains is a substring test, so a directory such as rebuild/ also matches):

file-search-on 'is_html && dir.contains("build")'

Available attributes

file-search-on --list prints the canonical schema with descriptions. The summary below names every attribute so you know what you can reach in a CEL expression; for recipes and detailed semantics see the per-family pages under examples/.

On every file

name, path, dir, size, ext, content_type, mod_time, created_at (filesystem birth time / btime — modern fs only), metadata_changed_at (ctime — last permission / ownership change), is_btime_anomaly (true when created_at > mod_time).

Type predicates

By format — is_markdown, is_json, is_yaml, is_toml, is_xml, is_html, is_pdf, is_csv, is_text, is_image, is_audio, is_video, is_office, is_epub, is_archive, is_binary, is_email, is_source, is_notebook, is_disk_image, is_dmg, is_iso, is_vhd, is_vhdx, is_vmdk, is_qcow2, is_wim, is_install_package, is_pkg, is_deb, is_rpm, is_appimage, is_test_file, is_generated_code, is_symlink, is_broken_symlink, is_bytecode, is_class, is_pyc, is_wasm, is_science_data, is_fits, is_votable, is_hdf5, is_pds, is_pds3, is_pds4, is_cdf, is_database, is_sqlite, is_sqlite_wal, is_sqlite_shm.

By exact filename — is_dockerfile, is_makefile, is_justfile, is_rakefile, is_license, is_changelog, is_contributing, is_codeowners, is_gitignore, is_dockerignore, is_gomod, is_node_manifest, is_cargo_manifest, is_pipfile, is_python_reqs, is_gemfile, is_procfile, is_vagrantfile, is_ds_store, is_localized, is_thumbs_db, is_desktop_ini, is_kde_directory, is_plist.

By family — is_build, is_repo_meta, is_ignore, is_manifest, is_platform, is_macos_metadata, is_windows_metadata, is_linux_metadata, is_system_metadata. These fire alongside the per-type predicate (a Dockerfile is both is_dockerfile and is_build; a .DS_Store is is_ds_store, is_macos_metadata, AND is_system_metadata), in the same way that is_image covers every image/* subtype.

Cross-firing: a package.json matches is_node_manifest AND is_json; Cargo.toml matches is_cargo_manifest AND is_toml; LICENSE / CHANGELOG / CONTRIBUTING / requirements.txt match their per-type predicate AND is_text.

Per-family attributes

Family Attributes
Documents / markup title, author, language, word_count, line_count, page_count, column_count
Data json_kind, yaml_kind, yaml_document_count, csv_columns, root_element
Markdown frontmatter tags, categories, draft, date, frontmatter, frontmatter_format (plus the document title/author/language keys are promoted)
Body filter body (text content types; opt-in via --body CLI / include_body MCP). Use CEL string methods: body.contains(...), body.matches(...) (RE2), body.startsWith(...), size(body). With --ocr (CLI) / ocr_images: true (MCP), body is also populated for image/* files via the registered OCR provider (macOS Vision); see ocr_confidence, ocr_language, ocr_provider below.
OCR (image text) ocr_confidence (0..1 average per-line confidence), ocr_language (BCP-47 detected dominant language), ocr_provider (registered provider name: vision-macos today). Populated only when --ocr is set AND an OCR provider is available on the platform. macOS Vision via a bundled Swift helper; Linux Tesseract / Windows.Media.Ocr are future providers under the same hook. Issue #189.
Images img_width, img_height, camera_make, camera_model, lens, taken_at, orientation, gps_lat, gps_lon, iso, focal_length, f_stop, exposure_time. RAW photos additionally stamp raw_kind (cr2 / cr3 / nef / arw / dng / raf / orf / rw2) and raw_vendor (canon / nikon / sony / adobe / fujifilm / olympus / panasonic) — the camera EXIF fields populate via the same imagemeta path as JPEG / TIFF. HEIC files paired with a sibling MOV (Apple Live Photos) surface live_photo_video_path + live_photo_video_size; the MOV side surfaces live_photo_image_path and is_live_photo_video. With --with-phash (CLI) or with_phash: true (MCP) — auto-enabled when image_similar_to(...) appears in the expression — every image gets a 16-char hex phash attribute for perceptual-similarity queries.
Audio artist, album, album_artist, composer, year, track, genre, duration, bitrate, nominal_bitrate, sample_rate, channels, bit_depth, replaygain_track_gain, replaygain_album_gain
Video video_codec, audio_codec, video_width, video_height, frame_rate, rotation, duration, bitrate, nominal_bitrate, is_hdr, color_primaries, color_transfer, subtitles, subtitle_languages
Archives entry_count, uncompressed_size, top_level_entries, has_root_dir
Binaries architectures, bitness, binary_format, binary_type, is_dynamically_linked, is_stripped, entry_point. Mach-O code signature (macOS-specific): is_codesigned, is_apple_signed, is_third_party_signed, codesign_identifier, codesign_team_id, codesign_hash_type, codesign_hardened_runtime, codesign_library_validation, codesign_killed, codesign_adhoc, entitlements, entitlement_app_sandbox, entitlement_full_disk_access, entitlement_network_client, entitlement_network_server
Email email_to, email_cc, email_message_id, email_in_reply_to, sent_at, attachment_count, email_count (plus shared title / author)
Source code language, line_count, loc, comment_loc, blank_loc, functions, type_names, imports (last three populated for Go via stdlib AST + Python / Java / C# / PHP / Perl / R / MATLAB via regex — agents querying "where is X defined?" / "which files import Y?" hit cached attributes instead of grep)
Notebooks cell_count, code_cell_count, markdown_cell_count, kernel (plus shared language / title)
Disk images disk_image_format, virtual_size, disk_type (VHD / VMDK), volume_label (ISO), disk_image_created_at (VHD / ISO; in-header creation time, distinct from filesystem created_at), cluster_bits (QCOW2), is_encrypted (QCOW2), image_count (WIM)
Install packages package_format, package_name (RPM), package_version (RPM), package_release (RPM), package_arch (RPM), package_kind, appimage_version
Repo metadata license_id (SPDX id detected from LICENSE / LICENCE / COPYING / UNLICENSE body)
Symlinks is_symlink, is_broken_symlink, target_path (raw ln -s target; relative or absolute as recorded on disk)
Forensic hashes md5, sha1, sha256 — populated only when --with-hashes (CLI) or compute_hashes: true (MCP) is set. Single io.MultiWriter pass over the file; cached alongside (size, mtime). Forensic / NSRL / VirusTotal / threat-intel-feed interop.
Disguise detection magic_content_type, extension_content_type, is_disguised — populated only when --check-disguised (CLI) or check_disguised: true (MCP) is set. is_disguised fires when the bytes disagree with the extension (classic "this .txt contains a PE binary" indicator). Cached alongside (size, mtime).
Hash allowlist / denylist is_known_good, is_known_bad — populated when --hash-allowlist / --hash-denylist (CLI) or hash_allowlist_path / hash_denylist_path (MCP) is set. Both auto-detect text vs pre-built bbolt format. NSRL / VirusTotal / threat-intel-feed interop; combine with !is_known_good && is_binary to cut forensic disk-image review surfaces by 80-95%.
Extended attributes (macOS) xattr_keys, xattr_count, is_xattr_rich, is_quarantined, quarantine_agent, quarantine_event_id, quarantine_source_url, quarantine_referrer_url, quarantine_download_date, quarantine_user_approved, finder_tags, finder_color, has_finder_comment — populated only when --with-xattrs (CLI) or with_xattrs: true (MCP) is set. Darwin-only; non-Darwin walks silently leave these empty. Forensic-grade — quarantine carries the source URL + download date + Gatekeeper approval state for every file downloaded from the web. Compose with is_codesigned for malware-triage one-liners: binary_format == "mach-o" && !is_codesigned && is_quarantined.
Semantic similarity similarity (double, 0-1) — populated when --semantic-query (CLI) / search_semantic tool (MCP) is set. Cosine similarity between the file's body embedding and the query embedding, computed via local Ollama. Compose with type predicates: is_pdf && similarity > 0.7 finds PDFs conceptually related to the query. Vectors cache in the index alongside (size, mtime).
VM bytecode bytecode_format, runtime_version, class_name (JVM), super_class (JVM), interfaces (JVM), method_count (JVM), field_count (JVM), access_flags (JVM), python_version, source_mtime, wasm_version, section_count, import_count, export_count
Science data — FITS science_format, telescope, instrument, object, observer, date_obs, exptime, filter, airmass, ra, dec, bitpix, naxis, naxis1, naxis2, hdu_count, fits_kind (plus shared title ← OBJECT, author ← OBSERVER, taken_at ← parsed DATE-OBS)
Science data — VOTable votable_version, table_count, total_rows, field_names, field_units, field_ucds, votable_data_format (plus shared title ← root DESCRIPTION, author ← INFO[@name='creator'])
Science data — HDF5 hdf5_format_version, hdf5_size_of_offsets, hdf5_size_of_lengths (v1 scope is superblock-only; recursive hierarchy walk — group_count, dataset_count, top_level_groups — is a follow-up)
Science data — PDS pds_version (PDS3 or PDS4), mission_name, spacecraft_name, instrument_name, target_name, product_id, start_time (plus shared title ← composed from instrument + target, or PDS4 explicit title; taken_at ← parsed start_time)
Science data — CDF cdf_version, cdf_encoding, cdf_majority (row / column), variable_count (NrVars + NzVars), attribute_count. v1 surfaces CDR + GDR header fields; the ISTP global-attribute walk for title / author / taken_at is a follow-up.
Fonts font_format (ttf / otf / ttc / otc / woff / woff2), font_outline_kind (truetype / cff / cff2), font_family, font_subfamily, font_full_name, font_version, font_postscript_name, font_manufacturer, font_designer, font_license, font_license_url, font_typographic_family, font_weight (100–900), font_width (1–9), font_embedding (installable / restricted / preview-print / editable — informational, not enforced), font_panose (10-byte hex), font_unicode_ranges, font_revision, font_units_per_em, font_mac_style, font_italic_angle, font_glyph_count, font_axis_count, font_axes (variable-font axes — wght / wdth / slnt / ital / opsz), font_collection_count, font_collection_families. WOFF2 surfaces the full set above plus the header byte counts woff2_total_sfnt_size, woff2_total_compressed_size for compression-ratio queries. The shared font_family and font_designer also dual-surface to the cross-family title and author variables.
Databases — SQLite Header: database_format, sqlite_page_size, sqlite_format_version (1 legacy / 2 WAL), sqlite_page_count, sqlite_schema_version, sqlite_text_encoding (utf-8 / utf-16le / utf-16be), sqlite_user_version, sqlite_application_id, sqlite_application_name (curated human-readable label from a known-app registry — firefox-places, chrome-history, apple-imessage, apple-keychain, macos-libcache, fossil-scm, …). Schema (via hand-rolled sqlite_master b-tree walker): sqlite_table_count, sqlite_view_count, sqlite_index_count, sqlite_trigger_count, sqlite_table_names (sorted, capped at 100), sqlite_schema_fingerprint (SHA256 of sorted CREATE statements). FTS3/4/5 detection: sqlite_fts_table_count, sqlite_fts_table_names. With --body, the body CEL variable is populated with the concatenated text from every FTS _content shadow table — body.contains("transformer") works inside browser history, chat archives, and any other FTS-backed store. Pure-Go via the modernc.org/sqlite driver in read-only immutable=1 mode (no journal / WAL touches). WAL sidecar (is_sqlite_wal): sqlite_wal_format_version, sqlite_wal_page_size, sqlite_wal_checkpoint_seq, sqlite_wal_frame_count, sqlite_wal_byte_order (be / le — checksum byte order). SHM sidecar (is_sqlite_shm): extension-only detection, no extra fields. Sidecars deliberately do NOT fire is_sqlite / is_database — they accompany a database, they aren't one.
3D models model3d_format (stl / obj / gltf), vertex_count, face_count, has_normals, has_textures, materials (list — OBJ usemtl / glTF materials[].name), bounding_box ([minX, minY, minZ, maxX, maxY, maxZ]). Binary STL reads counts O(1) from the header; glTF reads counts + bbox from the accessor table (no buffer decode). Predicates: is_3d_model (umbrella), is_stl, is_obj, is_gltf.
Project context module, go_version, base_image, project_types, project_type (the last two populated by --resolve-projects)
Git metadata git_last_commit_time, git_last_commit_author, git_last_commit_subject, git_first_seen, git_commit_count, is_git_tracked, is_git_ignored — populated when --with-git (CLI) / with_git: true (MCP) is set AND the walk root is inside a git working tree. One git log pass per walk root via the gitmeta package — cheap up front, free per-file lookup. Use for repo-aware queries that filesystem mod_time can't answer on a fresh clone (every file's mtime is checkout time). Examples: git_last_commit_time > timestamp("2026-05-01T00:00:00Z") (recently edited), is_source && git_commit_count > 50 (high-churn / hot files), is_source && is_git_tracked && !is_test_file (production code only). Silent no-op when the root isn't a git tree or when git isn't on PATH. Issue #271.
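The Forensic hashes row above describes computing md5, sha1, and sha256 in a single pass over the file. The tool does this in Go with io.MultiWriter; the Python sketch below shows the same idea under that assumption, feeding each chunk to all three digests so the file is read only once. The function name and chunk size are illustrative.

```python
import hashlib
import os
import tempfile


def forensic_hashes(path, chunk_size=1 << 20):
    """Compute md5, sha1 and sha256 while reading the file exactly once."""
    digests = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            for digest in digests.values():
                digest.update(chunk)  # same bytes fan out to every hash
    return {name: digest.hexdigest() for name, digest in digests.items()}


# Usage: hash a small temporary file and show one of the digests.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
print(forensic_hashes(tmp.name)["sha256"])
os.remove(tmp.name)
```

Reading once matters on large disk images, where I/O dominates the cost of the hash functions themselves.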

Built-in CEL functions

Function Returns What it does
levenshtein(a, b) int Edit distance, rune-aware
soundex(s) string NARA-standard phonetic 4-char code
ngrams(s, n) list<string> Character n-grams as a list
ngram_similarity(a, b, n) double Jaccard similarity over n-gram sets, 0.0–1.0
point_in_polygon(lat, lon, polygon) bool Ray-casting; polygon is a flat lat,lon,lat,lon,… list
image_similar_to(phash, ref_path, threshold) bool Perceptual image similarity via pHash Hamming distance; auto-enables --with-phash
has_secrets(body) bool True when the body contains a credential / token / key (AWS, GitHub, Slack, Stripe, PEM, JWT, …). Requires --body
secret_kinds(body) list<string> The secret categories matched in the body — ["aws-access-key", "private-key-pem", …]. Requires --body
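For intuition about what the fuzzy functions in the table compute, here is a small Python sketch of edit distance and Jaccard similarity over character n-gram sets. It is a reference illustration, not the tool's Go implementation: details such as case handling, padding, and the result for two empty n-gram sets are assumptions here.

```python
def levenshtein(a, b):
    """Edit distance over code points (Python strings are already rune-aware)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free on match)
            ))
        previous = current
    return previous[-1]


def ngrams(s, n):
    """All character n-grams of s, in order; empty when s is shorter than n."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(a, b, n):
    """Jaccard similarity |A & B| / |A | B| over the two n-gram sets."""
    set_a, set_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not set_a and not set_b:
        return 0.0  # assumption: two empty sets score 0.0
    return len(set_a & set_b) / len(set_a | set_b)


print(levenshtein("Radiohead", "Radiohad"))
print(ngram_similarity("night", "nacht", 2))
```

A distance threshold such as levenshtein(artist, "Radiohead") <= 2 tolerates a fixed number of typos, while ngram_similarity scales with string length, which tends to suit longer fields like titles.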

CEL's standard string methods (contains, startsWith, endsWith, matches, size) work on every string attribute. Recipes: examples/fuzzy-search.md.
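The point_in_polygon function in the table takes the polygon as a flat lat,lon,lat,lon,… list and uses ray casting. The Python sketch below shows that technique on the same input shape. It treats coordinates as planar and ignores antimeridian wrap-around, which is an assumption; the tool's handling of those edge cases is not documented here.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray casting: count polygon edges crossed by a ray from the point."""
    # Unflatten [lat0, lon0, lat1, lon1, ...] into (lat, lon) vertex pairs.
    vertices = list(zip(polygon[0::2], polygon[1::2]))
    inside = False
    j = len(vertices) - 1  # index of the previous vertex, starting with the last
    for i in range(len(vertices)):
        lat_i, lon_i = vertices[i]
        lat_j, lon_j = vertices[j]
        # Only edges whose endpoints lie on opposite sides of the point's
        # longitude can be crossed by the ray.
        if (lon_i > lon) != (lon_j > lon):
            # Latitude at which the edge reaches the point's longitude.
            crossing_lat = lat_i + (lon - lon_i) * (lat_j - lat_i) / (lon_j - lon_i)
            if lat < crossing_lat:
                inside = not inside  # each crossing flips the parity
        j = i
    return inside


# A small box in flat lat,lon order (coordinates are illustrative).
box = [51.4, -0.2, 51.4, 0.0, 51.6, 0.0, 51.6, -0.2]
print(point_in_polygon(51.5, -0.1, box))
```

In a query, the same flat list would be passed alongside the image attributes, for example point_in_polygon(gps_lat, gps_lon, [51.4, -0.2, 51.4, 0.0, 51.6, 0.0, 51.6, -0.2]).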

MCP server mode

The same binary can run as a Model Context Protocol server, exposing the search to any MCP-compatible client (Claude Desktop, IDE plugins, agents). Three transports:

file-search-on mcp                                       # stdio (default; for desktop clients)
file-search-on mcp --transport http --addr :8080         # Streamable HTTP (MCP 2025-03-26)
file-search-on mcp --transport sse  --addr :8080         # HTTP+SSE (DEPRECATED — MCP 2024-11-05)
file-search-on mcp --timeout 90s                         # raise the per-call default (60s out of the box)
Transport Spec version When to use
stdio all Desktop clients (Claude Desktop, IDE plugins) — the agent spawns the binary as a subprocess.
http 2025-03-26 Network-accessible servers, multi-client, or Docker deployments.
sse 2024-11-05 Legacy clients only. The HTTP+SSE transport was deprecated in the 2025-03-26 spec; new deployments should pick http.

For HTTP and SSE, --addr (default :8080) is the bind address and --path (default /) is the URL prefix. --timeout (default 60s) sets the per-tool-call deadline; per-call timeout_seconds on the search tool input overrides it.
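As a rough sketch of how --addr, --path, and the per-call timeout_seconds fit together, the Python below builds the endpoint URL and a JSON-RPC tools/call request body for the search tool. The tools/call envelope follows the MCP specification; the argument names expr and dir are assumptions made for illustration (timeout_seconds is the only one documented above), so check the server's published tool schema before relying on them. A real client also performs the MCP initialize handshake first, which an MCP SDK handles.

```python
import json


def endpoint(host, port, path="/"):
    """URL for the Streamable HTTP transport, given --addr port and --path prefix."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{host}:{port}{path}"


def search_call(expr, directory, timeout_seconds=None, request_id=1):
    """JSON-RPC body for a `search` tool call.

    `expr` and `dir` are assumed argument names; `timeout_seconds`
    overrides the server-wide --timeout for this one call.
    """
    arguments = {"expr": expr, "dir": directory}
    if timeout_seconds is not None:
        arguments["timeout_seconds"] = timeout_seconds
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


print(endpoint("localhost", 8080))
print(json.dumps(search_call("is_pdf && page_count > 10", ".", timeout_seconds=120), indent=2))
```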

Twenty tools are exposed, grouped by family:

Search & inspect

Tool What it does
search CEL expression over a directory tree. Supports sort_by / limit (top-K), rank (custom CEL sort key), include_body (full body filter), include_snippet (preview), ocr_images (run OCR before evaluating), with_phash (perceptual hash + image_similar_to function), compute_hashes, check_disguised, with_xattrs, resolve_projects, prune_build_artefacts, fields (token-saving projection — path / content_type / size always-on). Returns matches with the full attribute set + partial-result fields.
search_semantic Natural-language similarity search via local Ollama embeddings. Pre-prunes with an optional expr, embeds the query, ranks files by cosine similarity, applies a threshold cap. Embeddings cache per file.
read_attributes Attributes for a single path — same shape as one search match. Accepts fields for token-saving projection.
read_lines A specific line range of a file — pairs with search for context around matches.

Aggregate

Tool What it does
stats Histogram + totals for a directory tree, bucketed by group_by (default content_type; recognised: ext, dir, language, camera_make, camera_model, lens, artist, album, genre, time buckets like taken_at_month, …).

Dedup & diff

Tool What it does
find_duplicates Byte-identical files keyed by sha256 — two-pass (size-bucket then hash). Sorted by wasted_bytes desc.
find_near_duplicates Similar files by SimHash fingerprint of extracted body. Catches typo edits, regenerated headers, template copies. Configurable similarity threshold (default 0.85).
diff_trees Cross-tree set operations by sha256 content hash — a-minus-b, b-minus-a, intersect, union, mismatch (same relative path, different content). Read-only; never mutates either tree.
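The two-pass strategy described for find_duplicates (bucket by size, then hash only the buckets with more than one file) can be sketched in Python as follows. This is an illustration of the technique rather than the tool's code; in particular, defining wasted_bytes as size × (copies − 1) and the shape of the returned records are assumptions.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    # Pass 1: bucket regular files by size; files with a unique size
    # cannot have a byte-identical twin, so they are never hashed.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the size buckets that hold two or more files.
    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for digest, dupes in by_hash.items():
            if len(dupes) > 1:
                groups.append({
                    "sha256": digest,
                    "paths": sorted(dupes),
                    "wasted_bytes": size * (len(dupes) - 1),  # assumed definition
                })
    groups.sort(key=lambda group: group["wasted_bytes"], reverse=True)
    return groups


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name, data in [("a.txt", b"hello"), ("b.txt", b"hello"), ("c.txt", b"world")]:
            with open(os.path.join(root, name), "wb") as handle:
                handle.write(data)
        for group in find_duplicates(root):
            print(group["wasted_bytes"], [os.path.basename(p) for p in group["paths"]])
```

The size pass is a cheap stat per file, so the expensive hashing is confined to files that could plausibly be duplicates.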

Archive

Tool What it does
list_archive_contents Per-entry CEL filtering inside ZIP / TAR / TAR.GZ / GZIP without extracting. Same vocabulary as top-level search; cache-aware.
read_file_in_archive Read one named entry's bytes out of an archive. Returns content + content_type + attributes.

Pattern + watch

Tool What it does
find_matches Line-level regex (RE2) hits across a tree with context_before / context_after windows. CEL pre-prune (e.g. is_source && language == "go") keeps the regex pass narrow. Replaces the search-then-read_lines dance with one call.
watch_search Bounded "tell me when X appears" subscription — block up to duration_seconds (default 30, capped at 600), return every new / changed file that matches the CEL filter.

Project + introspection + monitoring

Tool What it does
detect_project Project type(s) of one directory.
find_projects Walk a tree, list every project subdirectory.
resolve_project_for_path Walk UP from a file/dir path to the nearest enclosing project root. Useful when an agent has a stray path and needs to know the project context.
list_attributes The full canonical schema (common, type_specific, frontmatter, functions) plus registered content types.
list_presets Discover the eight built-in named search recipes (recent_changes, recent_photos, old_drafts, large_files, large_binaries, suspicious_files, failed_tests, system_metadata).
query_preset Run a named preset; per-call overrides for dir, limit, excludes, etc.
index_stats Cache counters for the running server (hits, misses, puts, stales, errors; same for body + embedding caches).
monitor_info This server's monitoring-dashboard URL + the registry of sibling instances. Pass enable: true to start the dashboard on demand if it isn't already running.

Every walking tool (search, stats, find_duplicates, find_near_duplicates, find_matches, find_projects, diff_trees) honours the same partial-result contract: on timeout the call returns cancelled=true with the results gathered so far, never an error. Agents inspect the flag rather than catching exceptions.
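On the client side, the partial-result contract means checking a flag instead of wrapping the call in error handling. A minimal Python sketch, assuming the decoded tool result is a dict with a cancelled boolean (named above) and a matches list (a hypothetical field name used for illustration):

```python
def summarise(result):
    """Describe a walking-tool result, treating a timeout as a partial success."""
    matches = result.get("matches", [])  # "matches" is an assumed field name
    if result.get("cancelled", False):
        # Timed out: the matches gathered so far are still valid and usable.
        return f"partial: {len(matches)} matches before the deadline"
    return f"complete: {len(matches)} matches"


print(summarise({"cancelled": True, "matches": ["a.pdf", "b.pdf"]}))
print(summarise({"cancelled": False, "matches": ["a.pdf"]}))
```

An agent that sees the partial case can narrow the CEL expression or retry with a larger timeout_seconds instead of discarding the work already done.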

Since v0.64.0 the on-disk index is on by default. The MCP server (like every other long-running subcommand) auto-creates a per-cwd bbolt cache at <UserCacheDir>/file-search-on/indexes/<basename>-<sha1[:6]>.db — repeated search / read_attributes calls against unchanged files skip parsing entirely. The default path is per-cwd so concurrent agents in different projects never collide; same-cwd contention falls back gracefully to in-memory (logged on stderr, surfaced on the dashboard as index_fallback_reason: "lock_contention"). Override with --index-path; opt out with --no-index for hermetic CI runs:

file-search-on mcp                                                       # default: per-cwd persistent cache
file-search-on mcp --index-path /var/lib/fso.db                          # explicit path (e.g. shared across cwd)
file-search-on mcp --no-index                                            # in-memory only (process lifetime)
file-search-on mcp --transport http --addr :8080
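To show why the per-cwd default path keeps agents in different projects apart, the Python sketch below derives a path of the documented shape, <UserCacheDir>/file-search-on/indexes/<basename>-<sha1[:6]>.db. Two things are assumptions: that the SHA-1 is taken over the absolute working-directory path, and that <UserCacheDir> follows the conventions of Go's os.UserCacheDir. The real binary may differ on both points, so use this for intuition only and not to locate actual cache files.

```python
import hashlib
import os
import sys
from pathlib import Path


def user_cache_dir():
    """Approximation of Go's os.UserCacheDir (assumed to be what the tool uses)."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches"
    if sys.platform == "win32":
        return Path(os.environ["LocalAppData"])
    return Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")


def default_index_path(cwd):
    """Per-cwd index path: <basename>-<first 6 hex chars of sha1>.db."""
    cwd = os.path.abspath(cwd)
    # Assumption: the hash input is the absolute cwd path.
    short_hash = hashlib.sha1(cwd.encode("utf-8")).hexdigest()[:6]
    file_name = f"{os.path.basename(cwd)}-{short_hash}.db"
    return user_cache_dir() / "file-search-on" / "indexes" / file_name


print(default_index_path("."))
```

Under these assumptions, two checkouts that share a basename, such as ~/work/api and ~/scratch/api, hash to different suffixes and therefore get separate cache files.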

Example Claude Desktop entry in claude_desktop_config.json (stdio):

{
  "mcpServers": {
    "file-search-on": {
      "command": "file-search-on",
      "args": ["mcp"]
    }
  }
}

For HTTP-based clients, point at http://<host>:<port>/ after starting the server with --transport http.

Built on github.com/modelcontextprotocol/go-sdk.

Monitoring dashboard

Both long-running modes (mcp and watch) expose a read-only monitoring dashboard. Since v0.65.0 it's on by default on a dynamic OS-assigned localhost port — many concurrent stdio agents each get their own dashboard without colliding. The server binds 127.0.0.1 only (the host part of any address is ignored — only the port is used), needs no auth, and adds no dependencies — the UI is a single embedded page that polls a small JSON API.

file-search-on mcp                                            # default: dashboard on dynamic port
file-search-on mcp --monitor-addr :9090                       # pin a fixed port instead
file-search-on mcp --no-monitor                               # opt out (hermetic CI / sandboxed runs)
file-search-on mcp --transport http --addr :8080              # dashboard still auto-starts
file-search-on watch 'is_image' -d ~/Screenshots              # default: dashboard auto-starts
file-search-on monitors                                       # list active dashboards across all instances

Find the URL in the stderr log line (monitor dashboard: http://127.0.0.1:<port>/), via the monitors subcommand, or, for an mcp server, by calling the monitor_info MCP tool, which also reports sibling instances. The legacy --monitor boolean flag is kept as a no-op for backward compatibility (same effect as passing no flag).

Open the URL. Five panels:

  • Overview — version, uptime, run mode, PID / Go version / GOMAXPROCS, default worker count, index backend (🔒 persistent path / 🧠 in-memory with reason — --no-index opt-out or lock-contention fallback), body-cache cap.
  • Cache — the attribute / body / embedding cache counters as live cards with derived hit-rate % and sparklines; body evictions / oversize rejects / embed model-mismatches flagged.
  • Activity — live MCP tool-call feed (tool, elapsed, outcome, result count), per-tool call / error / cancel counts and p50 / p95 / max latency, and an in-flight gauge. (Watch mode has no MCP calls, so this panel shows a notice.)
  • Capabilities — registered content types grouped by family, project types, OCR provider availability, embedder model / server + a reachability check.
  • Peer switcher — when more than one instance is running, a header dropdown lists every sibling dashboard (mode · working dir · port) and switches to it. Instances discover each other through a shared registry under the user cache dir; crashed instances self-prune.

Multiple concurrent instances

Each instance with a dashboard registers itself, so they're mutually discoverable. For mcp servers, the monitor_info tool is the entry point: it returns this server's dashboard URL + the peer list, and monitor_info{enable:true} starts the dashboard on demand (a dynamic port) even if the server was launched without a monitor flag. That makes monitoring reachable per-agent without editing every launch config.

The JSON API is scriptable too: curl -s localhost:<port>/api/cache | jq, plus /api/overview, /api/activity, /api/capabilities, /api/peers, and /healthz (liveness). See examples/monitoring.md.
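For scripting beyond curl, a small Python sketch using only the standard library is shown below. The /api/cache path comes from the text above; the shape of the returned JSON is not documented here, so the sketch returns it undecoded beyond JSON parsing, and the hit-rate formula hits / (hits + misses) is an assumption about how the dashboard derives its percentage.

```python
import json
from urllib.request import urlopen


def fetch_cache_stats(port, timeout=2.0):
    """GET /api/cache from a local dashboard and decode the JSON body."""
    with urlopen(f"http://127.0.0.1:{port}/api/cache", timeout=timeout) as response:
        return json.load(response)


def hit_rate(hits, misses):
    """Hit rate as a percentage; 0.0 when there have been no lookups yet."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Example with made-up counters; call fetch_cache_stats(<port>) against a
# running instance to obtain real ones.
print(f"{hit_rate(hits=300, misses=100):.1f}%")
```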

Contributing

The project is small enough to read in an afternoon and welcoming to first-time contributors. See CONTRIBUTING.md for setup, branch/commit conventions, the local CI matrix, and PR expectations. A few quick entry points:

  • Open issues filtered by good first issue, help wanted, enhancement.
  • New content type or CEL function? CLAUDE.md has step-by-step recipes — search for "Adding a new content type" and "Adding a CEL function".
  • Security issue? Please don't open a public issue — see SECURITY.md for the private reporting channel.

Local CI matrix:

go build ./...
go test -race ./...
go vet ./...
golangci-lint run
go fix -diff ./...   # CI enforces empty diff

That's the whole CI matrix locally. Tests run in under 10 seconds; the race detector is on by default.

Architecture map

CLAUDE.md is the canonical architecture map — five internal packages, the CEL evaluator's data shape, the walker's cancellation contract, the MCP server's tool surface, the release pipeline, and where every gotcha is documented. Written for both human and LLM contributors; either audience should find it readable.

The repo also ships with .claude/skills/ — step-by-step templates for the repetitive contributions: adding a content type, extending the CEL schema, adding an MCP tool, cutting a release. Useful whether you're working solo or pairing with an LLM agent.

Releases

Tag-driven via GoReleaser v2 + ko. Pushing vX.Y.Z to main triggers six platform archives, an OCI image at ghcr.io/richardwooding/file-search-on:X.Y.Z, and an auto-commit to the Homebrew tap. Full pipeline documented in CLAUDE.md § Releases.

License

MIT
