monkeSearch

Offline-First Semantic Search for Your Local Files

Abstract

monkeSearch is an open-source, offline-first desktop search tool that lets you find local files using natural language. Instead of relying on exact filenames, regex, or folder browsing, you just describe what you're looking for and when — the system finds it. Nothing leaves your machine.

Any natural-language file-search query can be broken down into three constituents: file type (what kind of file: pdf, image, code, etc.), temporal context (when: 3 days ago, last week, etc.), and miscellaneous keywords (the remaining context: project name, topic, etc.). The first implementation used a local LLM to extract these constituents and convert them directly into macOS Spotlight query arguments. The current main branch performs the same task using vector databases instead, making monkeSearch cross-platform. Both approaches are fully offline.

The First Implementation: LLM → Spotlight

The original vision behind the project: use a local LLM to convert a natural language query directly into arguments for macOS's built-in Spotlight search — no vector database, no embeddings index, no metadata dump. Just natural language in, structured OS-level query out, instant results back.

A user writes a query like "python scripts from 3 days ago". Stop words are stripped, then a local LLM (Qwen3-0.6B running via llama.cpp) parses the cleaned query and extracts structured components using constrained JSON output:

{
  "file_types": ["py"],
  "time_unit": "days",
  "time_unit_value": "3",
  "is_specific": true,
  "source_text": {
    "file_types": "python scripts",
    "time_unit": "3 days ago"
  }
}

The LLM understands that "python scripts" → .py, "images" → jpg,png, "yesterday" → days,1, "last week" → weeks,1, etc. These extracted components are then converted into NSMetadataQuery predicates — the same API that powers Spotlight and mdfind. File types are mapped to UTIs via utitools and used as kMDItemContentTypeTree predicates; temporal data becomes kMDItemFSContentChangeDate date predicates; remaining keywords match against kMDItemTextContent and kMDItemFSName.

All predicates are combined with NSCompoundPredicate, and the compound query runs against Spotlight's existing index — results come back instantly since macOS already maintains it. There's no index to build, no embeddings to generate. The LLM is the only "intelligence" layer, and the approach is safe by design (read-only, scoped through Spotlight's own access controls).
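As a rough illustration of that mapping, here is a hypothetical sketch that assembles a Spotlight query string in the textual syntax accepted by mdfind (the real parser.py builds NSPredicate objects through the Cocoa APIs; the small UTI table and the function name are invented for this example):

```python
from datetime import datetime, timedelta

def build_spotlight_query(file_types, time_unit, time_value, keywords):
    """Assemble a textual Spotlight metadata query from extracted components."""
    parts = []
    if file_types:
        # Simplified UTI lookup; the project uses utitools for this mapping.
        utis = {"py": "public.python-script", "pdf": "com.adobe.pdf"}
        preds = [f'kMDItemContentTypeTree == "{utis[t]}"'
                 for t in file_types if t in utis]
        parts.append("(" + " || ".join(preds) + ")")
    if time_unit and time_value:
        # "3 days ago" -> a lower bound on the content-change date.
        cutoff = datetime.now() - timedelta(**{time_unit: int(time_value)})
        parts.append(
            f'kMDItemFSContentChangeDate >= $time.iso({cutoff:%Y-%m-%d})')
    for kw in keywords:
        # Match remaining keywords against file content and file name.
        parts.append(
            f'(kMDItemTextContent == "{kw}*"cd || kMDItemFSName == "{kw}*"cd)')
    return " && ".join(parts)

q = build_spotlight_query(["py"], "days", "3", [])
print(q)
```

On macOS the resulting string could be handed to `mdfind` directly; NSMetadataQuery evaluates the equivalent compound predicate against the same index.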

Two LLM-based branches exist: one uses LangExtract with a local Llama server for structured extraction, and the other (the legacy-main-llm-implementation branch) uses llama_cpp.Llama directly with constrained JSON output via a Pydantic schema. Both share the same parser.py to convert the structured output into NSMetadataQuery predicates.

Current Implementation: Vector DB

The current main branch achieves the same functionality using vector databases instead of a live LLM at query time. This was built to make monkeSearch cross-platform (the LLM → Spotlight approach is macOS-only) and to make search faster, since no LLM needs to run per query. The tradeoff: you must build and maintain an index, but search is sub-second. Platform-specific metadata extraction feeds into embedding generation (default model: facebook/contriever), with Mac/Linux using LEANN (a graph-based vector index with 97% storage savings) and Windows using ChromaDB. Temporal expressions are parsed via regex into ISO timestamp ranges and applied as filters on the semantic search results.
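The repository's actual regex patterns aren't reproduced here, but the temporal-filter step can be sketched along these lines, with `parse_temporal` as a hypothetical helper that turns a phrase into an ISO timestamp range:

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for relative expressions like "3 days ago".
UNIT_SECONDS = {"day": 86400, "week": 604800, "hour": 3600}

def parse_temporal(query, now=None):
    """Return an (start, end) ISO-8601 range for a temporal phrase, or None."""
    now = now or datetime.now(timezone.utc)
    # "N days/weeks/hours ago" -> a range from that moment until now.
    m = re.search(r"(\d+)\s+(day|week|hour)s?\s+ago", query)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        start = now - timedelta(seconds=n * UNIT_SECONDS[unit])
        return start.isoformat(), now.isoformat()
    # "last week" -> the trailing seven days, as a simple approximation.
    if re.search(r"\blast\s+week\b", query):
        return (now - timedelta(weeks=1)).isoformat(), now.isoformat()
    return None

print(parse_temporal("python scripts from 3 days ago"))
```

The returned range can then be applied as a metadata filter on top of whatever the semantic search (LEANN or ChromaDB) returns.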

For Agentic Use: The LLM-based implementations are particularly well suited to integration into larger AI pipelines and agentic systems. They provide a direct LLM-to-filesystem bridge through natural language without modifying any files, and access is read-only, scoped through Spotlight's own controls. If you're building autonomous agents or LLM orchestration systems that need file discovery, these branches provide it without the overhead of maintaining a separate index.
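A minimal sketch of what such an integration could look like, assuming a function-calling agent framework; the tool name, schema, and the use of mdfind's plain-text interpretation mode as a stand-in for the full LLM parser are all illustrative:

```python
import subprocess

# Hypothetical tool descriptor an agent framework could register.
FILE_SEARCH_TOOL = {
    "name": "search_local_files",
    "description": "Find local files from a natural-language description.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def search_local_files(query, limit=10):
    """Resolve a natural-language query to a list of file paths (macOS)."""
    # monkeSearch would route the query through its LLM parser first;
    # here we forward it to Spotlight's own plain-text interpreter.
    out = subprocess.run(
        ["mdfind", "-interpret", query],
        capture_output=True, text=True, check=False,
    )
    return out.stdout.splitlines()[:limit]

print(FILE_SEARCH_TOOL["name"])
```

Because the call is read-only and mediated by Spotlight, the agent gains file discovery without any write access to the filesystem.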

Implementation Versions: Multiple implementations exist across different branches for evaluation purposes. Rigorous testing will be done before finalizing a single approach for the main release.

Cross-Platform Architecture

Performance: macOS vs. Windows

[Chart: Query Time Comparison]

Query Time: macOS demonstrates significantly faster average search times due to its LEANN backend.

[Chart: Index Size Comparison]

Index Size: On-disk size of the embedding database for different numbers of files.

[Chart: Indexing Speed Comparison]

Indexing Speed: How many files per second each system can process during the initial build.

macOS Deep Dive: The Recompute Trade-Off

[Chart: Query Time with Recompute]

Search Speed: Disabling recompute yields lightning-fast searches (milliseconds), while enabling it dramatically slows down queries (seconds).

[Chart: Index Size with Recompute]

Space Savings: Enabling recompute creates a tiny index (over 97% smaller), saving significant disk space.

[Chart: Indexing Speed with Recompute]

Build Speed: The initial indexing speed is nearly identical, as the main workload (embedding generation) is the same in both modes.

The "Recompute" feature on macOS offers a clear trade-off: enable it to save a massive amount of disk space at the cost of much slower search performance. Disable it for instant results, but with a larger storage footprint.

Support the Project

monkeSearch is an open source project. If you find it useful, please consider starring our repository on GitHub to show your support!

https://github.com/monkesearch/monkeSearch