home

Yams

YAMS (Yet Another Memory System) started as a practical need: I wanted a dead-simple way to store and retrieve files, snippets, and research for LLM-driven workflows — without losing context between sessions. What I use daily is now ready to share.

Note: YAMS is at v0.3.x. It is experimental software under active development and not yet production-ready.

Why YAMS?

  • Persistent memory for LLMs and tools
  • Content-addressed storage (SHA-256) with block-level deduplication (Rabin)
  • Compression (zstd/LZMA), crash safety (WAL)
  • Fast search: SQLite FTS5 and semantic vector search
  • Simple CLI and TUI; MCP server for Claude Desktop and other MCP clients

Quick install

Docker:

docker run --rm -it ghcr.io/trvon/yams:latest --version

Or download native binary:

# Example for macOS ARM64 (see releases for other platforms)
curl -L https://github.com/trvon/yams/releases/latest/download/yams-macos-arm64.zip -o yams.zip
unzip yams.zip && sudo mv yams /usr/local/bin/

Five-minute tour

Initialize storage and add content from stdin:

yams init --non-interactive
echo "Important: vector search todo" | yams add - --tags notes,todo

Search and retrieve with clean outputs for scripting:

yams search "vector search" --limit 5 --json | jq .
yams list --format minimal | head -1 | xargs yams get

Batch operations with standard tools:

yams list --format minimal | tail -5 | while read h; do yams get "$h"; done

What’s under the hood

Content-Addressed Storage with SHA-256

The core storage layer implements immutable content addressing where every piece of data is identified by its SHA-256 hash. Here’s how the storage result tracking works:

// From include/yams/api/content_store.h
struct StoreResult {
    std::string contentHash;               // SHA-256 hash of the content
    uint64_t bytesStored = 0;             // Total bytes stored
    uint64_t bytesDeduped = 0;            // Bytes saved by deduplication
    std::chrono::milliseconds duration{0}; // Operation duration

    // Calculate deduplication ratio
    [[nodiscard]] double dedupRatio() const noexcept {
        return bytesStored > 0 ?
            static_cast<double>(bytesDeduped) / bytesStored : 0.0;
    }
};

This approach guarantees content integrity and enables automatic deduplication: identical content produces the same hash, so it is stored only once.

Rabin Fingerprinting for Block-Level Deduplication

Beyond document-level deduplication, YAMS uses Rabin fingerprinting to identify common subsequences across different documents. The implementation delivers ~187 MB/s chunking throughput (benchmarked on 1 MB files) and achieves 30-40% deduplication on typical development datasets, fast enough for real-time chunking at gigabit ingestion speeds.

// From src/chunking/rabin_chunker.cpp
static constexpr uint64_t POLYNOMIAL = 0xbfe6b8a5bf378d83ULL;
static constexpr size_t MIN_CHUNK_SIZE = 2048;    // 2KB minimum
static constexpr size_t AVG_CHUNK_SIZE = 8192;    // 8KB average
static constexpr size_t MAX_CHUNK_SIZE = 65536;   // 64KB maximum

The chunker creates stable boundaries across document revisions, ensuring that small edits don’t invalidate large portions of stored chunks. This is particularly effective for academic papers, code repositories, and documentation that evolves over time.

Vector Search Architecture

The semantic search system uses a modular pipeline that processes text into vector embeddings:

┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
├─────────────────────────────────────────────────────────────┤
│                     Semantic Search API                     │
├────────────┬────────────┬────────────┬────────────┬─────────┤
│  Document  │ Embedding  │   Vector   │   Model    │ Hybrid  │
│  Chunker   │ Generator  │  Database  │  Registry  │ Search  │
├────────────┴────────────┴────────────┴────────────┴─────────┤
│                    Vector Index Manager                     │
├─────────────────────────────────────────────────────────────┤
│                Storage Layer (LanceDB/Arrow)                │
└─────────────────────────────────────────────────────────────┘

The hybrid search engine balances full-text and semantic results:

// From include/yams/search/hybrid_search_engine.h
struct HybridSearchConfig {
    // Weight distribution (should sum to 1.0)
    float vector_weight = 0.6f;      // Semantic similarity
    float keyword_weight = 0.35f;    // Keyword/FTS5 matches  
    float kg_entity_weight = 0.03f;  // Knowledge graph entities
    float structural_weight = 0.02f; // Graph structure score
    
    // Search parameters
    size_t vector_top_k = 50;        // Number of vector results
    size_t keyword_top_k = 50;       // Number of keyword results
};

Compression and Storage Efficiency

The system employs zstd for fast compression (up to 20 GB/s on 1 MB blocks at level 1, with level 3 balancing speed and ratio at 13.5 GB/s) and LZMA for archival paths where maximum compression ratio matters. Combined with block-level deduplication, this creates a storage pipeline optimized for both speed and efficiency.

Write-Ahead Logging for Durability

Every operation goes through a WAL that ensures crash safety:

// From include/yams/wal/wal_entry.h
struct WALEntry {
    uint64_t sequence_number;        // Unique sequence ID
    OperationType operation;         // Type of operation
    std::string content_hash;        // Content identifier
    std::vector<uint8_t> data;       // Operation data
    std::chrono::system_clock::time_point timestamp; // When logged

    // Ensures durability before returning to caller
    bool sync_required = true;
};

Operations are logged before execution, ensuring that incomplete writes can be rolled back or completed during recovery.

The codebase is C++20 with a focus on performance. Conan + CMake drive portable builds across Linux, macOS, and Windows.

For LLMs and agents

MCP Server Integration

YAMS includes a Model Context Protocol (MCP) server that provides standardized tool interfaces for AI agents. The transport layer supports both stdio and WebSocket connections:

// From include/yams/mcp/mcp_server.h
/**
 * Transport interface for MCP communication
 */
class ITransport {
public:
    virtual ~ITransport() = default;
    virtual void send(const json& message) = 0;
    virtual json receive() = 0;
    virtual bool isConnected() const = 0;
    virtual void close() = 0;
};

/**
 * Standard I/O transport (default for MCP)
 */
class StdioTransport : public ITransport {
    // Implementation for Claude Desktop integration
};

This enables seamless integration with Claude Desktop and other MCP-compatible tools, allowing agents to search, store, and retrieve knowledge without manual CLI operations.

Workflow Patterns

In practice, YAMS becomes the central memory system for development workflows. Here’s how I use it daily:

Knowledge-First Development

# Always search YAMS before external research
yams search "vector database performance" --limit 20
yams search "optimization techniques" --fuzzy --similarity 0.7

# Cache external findings immediately
curl -s https://docs.example.com/api | yams add - \
  --tags "api-docs,external" \
  --metadata "source=example-api,fetched=$(date -Iseconds)"

Code Evolution Tracking

# Remember decisions and diffs with context
git diff | yams add - --tags "git-diff,$(date +%Y%m%d)" \
  --metadata "branch=$(git branch --show-current)"

echo "Chose WAL over fsync for perf; revisit thresholds" | \
  yams add - --tags decision,storage

# Re-index after major changes (until folder watching is available)
yams add src/ --recursive --include="*.cpp,*.hpp,*.h" --tags "code,source"
yams add include/ --recursive --include="*.hpp,*.h" --tags "code,headers"

Task and PBI Management

# Track project backlog items with structured metadata
yams add pbi-001-enhanced-search.md \
  --tags "pbi,backlog,feature" \
  --metadata "status=active" \
  --metadata "priority=high" \
  --metadata "sprint=2024-Q1"

# Find related work before starting new features
yams search "search optimization" --paths-only
yams grep "class.*SearchEngine" --include="**/*.hpp,**/*.cpp"

Efficient LLM Context Management

# Get only file paths for context efficiency
yams search "SearchCommand" --paths-only
yams search "#include" --paths-only | head -10

# Restore specific versions by content hash
yams get --name src/indexing/pipeline.cpp -o ./restored/pipeline.cpp

The hybrid search (by default 60% semantic, 35% keyword, 3% knowledge-graph entities, and 2% graph structure) adapts to different query types automatically. Whether you’re looking for “performance optimization ideas” or specific function signatures like dedupRatio(), the system finds relevant content without requiring perfect keyword matches.

Using YAMS from code

C++ API Integration

The content store API provides both synchronous and asynchronous interfaces for embedding YAMS into applications:

#include <yams/api/content_store.h>

#include <cstdlib>

// std::getenv may return nullptr; fall back to a local directory if
// YAMS_STORAGE is unset.
const char* root = std::getenv("YAMS_STORAGE");
auto store = yams::api::createContentStore(root ? root : "./yams_storage");

// Store with deduplication tracking
yams::api::ContentMetadata meta{.tags = {"code", "v1.0"}};
auto result = store->store("file.txt", meta);

// Monitor storage efficiency
std::cout << "Stored: " << result.bytesStored << " bytes\n";
std::cout << "Deduped: " << result.bytesDeduped << " bytes\n";
std::cout << "Ratio: " << result.dedupRatio() << "\n";

// Hybrid search with configurable weights
auto searchConfig = yams::search::HybridSearchConfig{
    .vector_weight = 0.6f,       // Semantic similarity
    .keyword_weight = 0.35f,     // Keyword matches
    .kg_entity_weight = 0.03f,   // Knowledge graph
    .structural_weight = 0.02f   // Graph structure
};
auto results = store->search("query", 10, searchConfig);
if (!results) {
    std::cerr << "Search failed: " << results.error().message << "\n";
    return;
}

// Retrieve with integrity verification
store->retrieve(result.contentHash, "output.txt");

Performance Characteristics

The system is designed for real-world workloads (benchmarked on Apple M3 Max):

  • Hashing: 2.66 GB/s sustained for large files (SHA-256 with hardware acceleration)
  • Chunking: 187 MB/s with Rabin fingerprinting for 1MB files
  • Compression: 20 GB/s (zstd level 1), 13.5 GB/s (level 3 balanced)
  • Parallel: Linear scaling to 16 threads, 41.2 GB/s peak throughput
  • Storage: 30-40% deduplication on typical development datasets
  • Search: Sub-second response with FTS5 and vector search

Performance Context: Benchmarks from v0.1.5+ testing (August 2025) on Apple M3 Max with 16 cores and hardware SHA acceleration. Real-world performance varies by hardware and workload.

Or just script via CLI — it’s designed to be composable and plays well with existing Unix toolchains.

Roadmap

  • Stable v1.0 release with production hardening
  • Fix vector database operations (currently improving test coverage)
  • Better semantic indexing defaults
  • Folder watching and incremental indexing
  • Multi-server sync and replication
  • Comprehensive test coverage (target: >90%)

Thanks

This project grew from day-to-day usage needs. If you find it useful, stars and issues help shape the roadmap. PRs welcome.

  • Repo: https://github.com/trvon/yams
  • Docs: https://yamsmemory.ai
  • License: Apache-2.0