monkeSearch is an open-source, offline-first desktop search tool that lets you find local files using natural language. Instead of relying on exact filenames, regex, or folder browsing, you just describe what you're looking for and when — the system finds it. Nothing leaves your machine.
Any natural language file search query can be broken down into three constituents:
- file type (what kind of file: pdf, image, code, etc.),
- temporal context (when: 3 days ago, last week, etc.), and
- misc keywords (remaining context: project name, topic, etc.).
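This three-way decomposition can be sketched as a small data structure (the class and field names below are illustrative; the actual implementations extract these constituents with an LLM or embeddings, not by hand):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QueryConstituents:
    # What kind of file: extensions such as ["py"] or ["jpg", "png"]
    file_types: list[str] = field(default_factory=list)
    # When: a unit ("days", "weeks") and a count ("3")
    time_unit: Optional[str] = None
    time_unit_value: Optional[str] = None
    # Remaining context: project names, topics, etc.
    keywords: list[str] = field(default_factory=list)

# "python scripts from 3 days ago" decomposes into:
q = QueryConstituents(file_types=["py"], time_unit="days",
                      time_unit_value="3", keywords=[])
```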
The first implementation used a local LLM to extract these constituents and convert them directly into macOS Spotlight
query arguments. The current main branch achieves the same task using vector databases instead, making monkeSearch
cross-platform. Both approaches are fully offline.
The original vision behind the project: use a local LLM to convert a natural language query directly into arguments for macOS's built-in Spotlight search — no vector database, no embeddings index, no metadata dump. Just natural language in, structured OS-level query out, instant results back.
A user writes a query like "python scripts from 3 days ago". Stop words are stripped, then a local LLM
(Qwen3-0.6B running via llama.cpp) parses the cleaned query and extracts structured components using constrained JSON output:
```json
{
  "file_types": ["py"],
  "time_unit": "days",
  "time_unit_value": "3",
  "is_specific": true,
  "source_text": {
    "file_types": "python scripts",
    "time_unit": "3 days ago"
  }
}
```
The LLM understands that "python scripts" → .py, "images" → jpg,png,
"yesterday" → days,1, "last week" → weeks,1, etc. These extracted components are then
converted into NSMetadataQuery predicates — the same API that powers Spotlight and mdfind.
File types are mapped to UTIs via utitools and used as kMDItemContentTypeTree predicates;
temporal data becomes kMDItemFSContentChangeDate date predicates; remaining keywords match against
kMDItemTextContent and kMDItemFSName. All predicates are combined with
NSCompoundPredicate and the compound query runs against Spotlight's existing index — results come
back instantly since macOS already maintains the index. There's no index to build, no embeddings to generate.
The LLM is the only "intelligence" layer, and the approach is safe by design (read-only, scoped
through Spotlight's own access controls).
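A minimal sketch of how the extracted components could be combined into a single Spotlight-style predicate string (the metadata attribute names come from the text above; the helper function, its signature, and the exact predicate syntax are illustrative — the real parser.py builds NSMetadataQuery/NSCompoundPredicate objects through the Objective-C bridge):

```python
from datetime import datetime, timedelta

def build_predicate(file_utis, time_unit, time_value, keywords):
    """Combine file-type, temporal, and keyword clauses into one
    compound predicate format string (sketch, not the project's code)."""
    clauses = []
    # File types: match anywhere in the content-type tree (UTI hierarchy)
    for uti in file_utis:
        clauses.append(f'kMDItemContentTypeTree == "{uti}"')
    # Temporal context: files changed within the window.
    # Note: only units timedelta accepts ("days", "weeks", ...) work here.
    if time_unit and time_value:
        delta = timedelta(**{time_unit: int(time_value)})
        cutoff = (datetime.now() - delta).strftime("%Y-%m-%d %H:%M:%S")
        clauses.append(
            f'kMDItemFSContentChangeDate >= CAST("{cutoff}", "NSDate")')
    # Remaining keywords: match the filename or extracted text content
    for kw in keywords:
        clauses.append(f'(kMDItemFSName CONTAINS[cd] "{kw}" || '
                       f'kMDItemTextContent CONTAINS[cd] "{kw}")')
    return " && ".join(clauses)

pred = build_predicate(["public.python-script"], "days", "3", ["monkey"])
```

Because the predicate runs against Spotlight's existing index, the query itself is cheap; all the latency lives in the LLM extraction step.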
Two LLM-based branches exist: the LangExtract implementation, which uses LangExtract with a local Llama server for structured extraction, and the llama.cpp direct implementation (legacy-main-llm-implementation branch), which uses llama_cpp.Llama directly with constrained JSON output via a Pydantic schema. Both use the same parser.py to convert the structured output into NSMetadataQuery predicates.
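The constrained JSON shape shown earlier maps naturally onto a Pydantic model, roughly like the sketch below (field names are taken from the example output; the schema in the actual branch may differ):

```python
from pydantic import BaseModel

class SourceText(BaseModel):
    # The original query fragments each component was extracted from
    file_types: str = ""
    time_unit: str = ""

class QueryComponents(BaseModel):
    file_types: list[str] = []
    time_unit: str = ""
    time_unit_value: str = ""
    is_specific: bool = False
    source_text: SourceText = SourceText()

# llama_cpp can constrain generation to this shape by converting
# QueryComponents.model_json_schema() into a grammar, so the model
# can only emit JSON that validates against the schema.
parsed = QueryComponents.model_validate_json(
    '{"file_types": ["py"], "time_unit": "days", "time_unit_value": "3",'
    ' "is_specific": true,'
    ' "source_text": {"file_types": "python scripts",'
    ' "time_unit": "3 days ago"}}'
)
```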
The current main branch achieves the same functionality using vector databases instead of a live LLM
at query time. This was built to make monkeSearch cross-platform (the LLM → Spotlight approach is macOS-only) and
to make search faster since it doesn't need an LLM running. The tradeoff: you need to build and maintain an index,
but search is sub-second. Platform-specific metadata extraction feeds into embedding generation
(default: facebook/contriever), with Mac/Linux using LEANN (graph-based vector index with 97% storage savings)
and Windows using ChromaDB. Temporal expressions are parsed via regex into ISO timestamp ranges and applied as filters
on the semantic search results.
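The regex-based temporal parsing might look roughly like this (the pattern and function name are illustrative, not the project's actual code; this sketch only handles "N units ago" phrases, not words like "yesterday"):

```python
import re
from datetime import datetime, timedelta, timezone

# Matches expressions like "3 days ago", "1 week ago", "2 months ago"
PATTERN = re.compile(r"(\d+)\s+(day|week|month)s?\s+ago", re.IGNORECASE)

def parse_temporal(query: str):
    """Return an ISO-8601 (start, end) range for the first temporal
    expression found in the query, or None if there is none."""
    m = PATTERN.search(query)
    if not m:
        return None
    value, unit = int(m.group(1)), m.group(2).lower()
    # Approximate months as 30 days for filtering purposes
    days = {"day": 1, "week": 7, "month": 30}[unit] * value
    now = datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return start.isoformat(), now.isoformat()

rng = parse_temporal("python scripts from 3 days ago")
```

The resulting timestamp range is applied as a metadata filter on top of the semantic search results, so temporal precision does not depend on the embedding model.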
For Agentic Use: The LLM-based implementations are particularly suitable for integration into larger AI pipelines and agentic systems. They provide a direct LLM-to-filesystem bridge through natural language without modifying any files, leveraging OS-level scoped safety through Spotlight. If you're building autonomous agents or LLM orchestration systems that need file discovery capabilities, these branches give you that without the overhead of maintaining a separate index.
Implementation Versions: Multiple implementations exist across different branches for evaluation purposes. Rigorous testing will be done before finalizing a single approach for the main release.
- llama.cpp rewrite (legacy-main-llm-implementation): deprecated, but useful for direct LLM integration
- llama.cpp feature branch: a variation with a more detailed response model
- os.walk file system crawl with ChromaDB for vector storage and retrieval
Query Time: macOS demonstrates significantly faster average search times due to its LEANN backend.
Index Size: On-disk size of the embedding database for different numbers of files.
Indexing Speed: How many files per second each system can process during the initial build.
Search Speed: Disabling recompute yields lightning-fast searches (milliseconds), while enabling it dramatically slows down queries (seconds).
Space Savings: Enabling recompute creates a tiny index (over 97% smaller), saving significant disk space.
Build Speed: The initial indexing speed is nearly identical, as the main workload (embedding generation) is the same in both modes.
The "Recompute" feature on macOS offers a clear trade-off: enable it to save a massive amount of disk space at the cost of much slower search performance. Disable it for instant results, but with a larger storage footprint.
monkeSearch is an open source project. If you find it useful, please consider starring our repository on GitHub to show your support!
https://github.com/monkesearch/monkeSearch