Simeon
Trading Learned Embeddings for Determinism and Speed in YAMS #
YAMS started with a goal: experiment with a local, CLI-first memory system that reduced the number of tokens models spent finding materials for a task. It then evolved into a learning endeavor as I attempted to ship a hybrid search engine fast enough to run locally, predictable enough to tune, and general enough to fit how I use models. To get there quickly, we initially leaned on a conventional learned-embedding stack, using ONNX Runtime for embedding generation and re-ranking. It worked, and it let us prove out ingestion, storage, and retrieval end-to-end with a simple CLI and MCP scheme.
However, as we recently started to experiment with topology and our benchmark harnesses matured, a pattern became unavoidable: the learned-embedding path was the least deterministic component, the heaviest operational dependency, and the slowest lever in our retrieval tuning loop. This was especially noticeable when we ingested large corpora.
In parallel, new research posted on arXiv—specifically NUMEN—demonstrated that “training-free n-gram hashing into high-dimensional space” can provide a legitimate retrieval signal. The usual counterargument, as with learned systems, is that vector retrieval does not scale; we did not need it to, since we believe we have an answer to that problem. This matched exactly the shape of the problem YAMS cares about: local-first retrieval where throughput-per-watt and operational simplicity are paramount.
Our first internal attempts to implement this were “ship fast” prototypes—useful, but lacking the measurement discipline needed for a default. We iterated hard, extracted the result into a vendorable library, and built Simeon.
Simeon is now the default retrieval embedding backend in YAMS, or at least will be once it is better tested for v0.14.0.
Quick Links:
- Simeon Repository: github.com/trvon/simeon
- YAMS Repository: github.com/trvon/yams
What is Simeon? (And what it is not) #
Simeon maps arbitrary text to a fixed-width vector without any learned model weights. The pipeline looks like this:
text → char/word n-grams → hashed count-sketch → (optional) random projection → L2 normalize → (optional) PQ/ADC
This pipeline is entirely deterministic. Given the same config, seed, and text, it produces identical bytes across architectures, utilizing NEON/AVX2 kernels when available.
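As a rough illustration of the count-sketch stage (not Simeon's actual code; the dimension, n-gram size, and hash choice here are assumptions, and the optional projection/PQ steps are omitted), a seeded char-n-gram hashing embedder might look like:

```python
import hashlib
import math

def embed(text: str, dim: int = 256, n: int = 3, seed: int = 42) -> list[float]:
    """Hash character n-grams into a fixed-width count sketch, then L2-normalize.

    Deterministic: the same (text, dim, n, seed) always yields identical floats.
    """
    vec = [0.0] * dim
    key = seed.to_bytes(4, "little")  # seed the hash so configs are reproducible
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        h = int.from_bytes(
            hashlib.blake2b(gram.encode(), digest_size=8, key=key).digest(),
            "little",
        )
        bucket = h % dim                      # which counter this n-gram lands in
        sign = 1.0 if (h >> 32) & 1 else -1.0 # count-sketch sign bit
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Because every step is a pure function of the seed and the input bytes, the output is bit-identical run to run, which is the property the real library exploits.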
The Honest Positioning #
Simeon captures lexical and topical similarity, not paraphrase-level semantics.
It is not a drop-in semantic replacement for a heavy learned bi-encoder. Instead, it is a training-free retrieval backend that composes exceptionally well with BM25-style signals and serves as a high-performance first-stage recall layer. More details can be found in our GitHub repository.
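The post does not specify how YAMS fuses the two signals; one common way a first-stage embedding layer composes with BM25-style ranking is reciprocal rank fusion, sketched here with hypothetical document IDs:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge ranked lists (e.g., BM25 and vector recall).

    Each list contributes 1 / (k + rank + 1) per document; documents ranked
    highly by either stage float to the top of the fused list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion like this is attractive precisely because it needs no score calibration between a lexical stage and a hashed-embedding stage.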
Why the Switch? #
The move to Simeon isn’t a “models are bad” take; it’s an engineering decision driven by three practical realities of local-first search.
1. Determinism for Benchmark Discipline #
YAMS is pushing toward deterministic search: fewer “it depends” outcomes and reproducible benchmark runs. This is critical for topology-aware retrieval and multi-run ablation studies. When your embedding backend is a learned model loaded via a runtime (subject to provider selection, threading, and GPU EPs), reproducibility is friction. Simeon removes that friction entirely.
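One way a benchmark harness can exercise that determinism (a sketch; `fingerprint` is a hypothetical helper, not YAMS code) is to hash the raw embedding bytes and assert the digest is stable across runs and machines:

```python
import hashlib
import struct

def fingerprint(vectors: list[list[float]]) -> str:
    """Digest embedding bytes so benchmark runs can assert bit-identical output."""
    h = hashlib.sha256()
    for vec in vectors:
        # Fixed little-endian float32 layout so the digest is platform-stable.
        h.update(struct.pack(f"<{len(vec)}f", *vec))
    return h.hexdigest()[:16]
```

With a learned model behind a runtime, this kind of check routinely fails across providers and thread counts; with a deterministic backend it becomes a cheap regression gate.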
2. Throughput-per-Watt vs. nDCG #
YAMS’ “accuracy-first” tuning loop requires running controlled experiments and promoting winners. You cannot iterate quickly if every single run takes minutes of embedding time. During our experiments, we roughly measured embedding-generation throughput for a simple configuration on an M4 Max as follows:
- ONNX Path: ~25–100ms per document (CPU).
- Simeon Path: 3,000+ documents per second.

Even if your final config requires a learned model for paraphrase-heavy queries, using Simeon in the inner loop changes the pace of generating embeddings usable in a hybrid system.
3. Operational Simplicity as a Feature #
By making Simeon the default, the “normal” semantic search path no longer requires a model runtime, freeing up VRAM to run local models. In YAMS, the ONNX plugin remains available for tasks where learned models are worth the complexity (e.g., GLiNER, ColBERT, or custom rerankers), but it is no longer a prerequisite for basic vector retrieval.
Simeon’s research notes are intentionally explicit about negative results and transfer limits. We will continue to post notes as we work through bug fixes and profiling. Please reach out if you have any interest in our project.