# Trading Learned Embeddings for Determinism and Speed in YAMS

YAMS started with a goal: experiment with a CLI-first local memory system that reduced the number of tokens models needed to find materials for a task. It then evolved into a learning endeavor as I attempted to ship a hybrid search engine that is fast enough to run locally, predictable enough to tune, and a good fit for how I actually use models. To get there quickly, we initially leaned on a conventional learned-embedding stack, using ONNX Runtime for embedding generation and re-ranking. It worked, and it allowed us to prove out ingestion, storage, and retrieval end-to-end with a simple CLI and MCP scheme.
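YAMS's actual fusion logic isn't shown here, but the core idea of a hybrid engine, blending a deterministic keyword score (e.g. SQLite FTS5/BM25) with a vector-similarity score, can be sketched as follows. `hybrid_rank`, `alpha`, and the min-max normalization are illustrative assumptions, not the project's implementation, and the sketch is in Python rather than YAMS's C++ for brevity:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(keyword_scores, vector_scores, alpha=0.6):
    """Blend keyword (e.g. FTS5/BM25) and vector scores per document.

    Both inputs map doc_id -> raw score; alpha weights the keyword side.
    Scores are min-max normalized so the two scales are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    docs = set(kw) | set(vec)
    fused = {d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

The appeal of this shape is that the keyword half is fully deterministic and cheap, while the learned half can be swapped out (or dropped) without changing the ranking interface.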
It’s been about seven months since I last posted here (about YAMS), so I wanted to share a quick recap of what I’ve been up to. If you measure that gap in AI progress, we’ve seen an incredible acceleration in both performance and the pace of advancement.
In the time that I’ve been absent, I’ve been working on pushing YAMS forward. I’ve learned a lot of hard-won lessons about building a memory system for AI in C++. In terms of where I want to be, I’m about 75% of the way to getting the project to a place where it’s both usable and beneficial for others.
YAMS (Yet Another Memory System) started as a practical need: I wanted a dead-simple way to store and retrieve files, snippets, and research for LLM-driven workflows — without losing context between sessions. What I use daily is now ready to share.
Note: YAMS is v0.7.x - experimental software under active development, not yet production-ready.
Updated docs (2025-10-13):
- Site: https://yamsmemory.ai
- CLI: https://yamsmemory.ai/user_guide/cli/
- MCP server: https://yamsmemory.ai/user_guide/mcp/
- Deployment: https://yamsmemory.ai/operations/deployment/

## Why YAMS?

- Persistent memory for LLMs and tools
- Content-addressed storage (SHA-256) with block-level deduplication (Rabin)
- Compression (zstd/LZMA), crash safety (WAL)
- Fast search: SQLite FTS5 and semantic vector search
- Simple CLI and TUI; MCP server for Claude/Desktop and other MCP clients
- Now with alpha plugin support

## Quick install

Docker:
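Stepping back to the feature list above, the content-addressed storage idea can be illustrated with a toy sketch: the SHA-256 digest of the bytes is the key, so identical content is stored exactly once. `ContentStore` is a hypothetical name, the sketch is Python rather than YAMS's C++, and it only shows whole-object dedup, whereas YAMS also deduplicates at the block level via Rabin chunking:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store (hypothetical, not YAMS's API).

    The SHA-256 digest of the content is its address, so storing the
    same bytes twice is a no-op and retrieval is by digest.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Idempotent: identical bytes always map to the same key.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]
```

Because the address is derived from the content itself, integrity checking comes for free: re-hashing a retrieved blob must reproduce its key.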
So I wanted to post a follow-up to my previous introduction to the Umbrix platform, where I was using DSPy for entity extraction in cyber threat intelligence. If you missed that, you can find it here. This recap comes a week after I launched the platform.
Two days into running Umbrix at full tilt, I started hitting problems. I had been extracting a significant volume of entities, relationships, and nodes from my agents to populate the graph, and I quickly realized the system was burning through resources, both rate limits and agent costs.
Over the past month, I’ve been working on a project for the Google ADK agent hackathon. This post provides an overview of my current multi-agent system for threat intelligence gathering, processing, and analysis.
The motivation for Umbrix emerged when I was using a small language model to find a large number of pcaps. I seeded the model with instructions on how to Google dork and fed it search terms to expand, hunting for publicly reachable network security datasets. From that experiment, it dawned on me that there are many interesting applications for LLMs, and timing and motivation were on my side. I set out to build this system with a simple thesis: *if the future is truly agentic, there are small building blocks and systems that need to be built to efficiently gather and organize sources into a graph.* From there, I set out to design (and vibe-code) a system that improves how we fundamentally access information from security feeds, using agents to gather and organize sources into that graph.