HybridTM - Semantic Translation Memory Engine


HybridTM is a semantic translation memory engine that stores bilingual content in LanceDB and scores matches by combining semantic embeddings (Xenova/Transformers.js) with the built-in MatchQuality fuzzy metric.
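As a rough illustration of what combining a semantic score with a fuzzy score can look like, the sketch below blends cosine similarity between two embedding vectors with a character-level similarity. It is not HybridTM's implementation: this README does not describe the MatchQuality algorithm or the weighting the engine uses, so the Levenshtein-based `fuzzySimilarity`, the `hybridScore` function, and the 50/50 default weight are all assumptions made for the example.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / Math.sqrt(normA * normB);
}

// Levenshtein edit distance, computed one row at a time.
function editDistance(s: string, t: string): number {
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Stand-in fuzzy metric (ASSUMPTION, not MatchQuality): 0..100 from edit distance.
function fuzzySimilarity(s: string, t: string): number {
  const longest = Math.max(s.length, t.length);
  if (longest === 0) {
    return 100;
  }
  return (100 * (longest - editDistance(s, t))) / longest;
}

// Hypothetical blend: weighted average of both signals on a 0..100 scale.
// Negative cosine values are clamped to 0 before scaling.
function hybridScore(semantic: number, fuzzy: number, semanticWeight = 0.5): number {
  const semanticPct = Math.max(0, semantic) * 100;
  return semanticWeight * semanticPct + (1 - semanticWeight) * fuzzy;
}

// Toy vectors; real embeddings come from the configured model.
const semantic = cosineSimilarity([0.2, 0.7, 0.1], [0.25, 0.65, 0.1]);
const fuzzy = fuzzySimilarity("Save the file", "Save this file");
console.log(hybridScore(semantic, fuzzy));
```

A blend like this lets a paraphrase with little character overlap still rank well on the semantic side, while near-identical strings keep a high fuzzy contribution.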

The source code is available on GitHub under the Eclipse Public License v1.0. Developers can clone, adapt, and ship the library under the terms of that license, or contact Maxprograms for commercial arrangements.

Highlights

  • Imports XLIFF 2.x, TMX 1.4b, and SDLTM files, preserving metadata, notes, and custom properties
  • Generates semantic vectors with any Xenova-compatible text model (the default, HybridTM.QUALITY_MODEL, is LaBSE)
  • Provides semanticTranslationSearch, semanticSearch, and concordanceSearch APIs with metadata-aware filtering
  • Streams data into LanceDB through a JSONL-based batch importer to keep memory usage predictable
  • Prevents duplicate segments by giving every entry a deterministic ID (fileId:unitId:segmentIndex:lang), so re-importing a segment overwrites the existing entry instead of adding a copy
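The deterministic ID pattern in the last item can be sketched as follows. This is a minimal illustration, not the library's code: the `SegmentKey` interface and the `buildSegmentId` and `upsert` helpers are hypothetical names, and a `Map` stands in for the LanceDB table.

```typescript
// Hypothetical shape holding the four components named in the ID pattern.
interface SegmentKey {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
}

// Join the components in the documented order: fileId:unitId:segmentIndex:lang.
function buildSegmentId(key: SegmentKey): string {
  return [key.fileId, key.unitId, String(key.segmentIndex), key.lang].join(":");
}

// The Map plays the role of the database table in this sketch: writing the
// same ID twice replaces the earlier entry instead of adding a second one.
const store = new Map<string, string>();

function upsert(key: SegmentKey, text: string): void {
  store.set(buildSegmentId(key), text);
}

// Importing the same English segment twice keeps a single entry;
// the Spanish segment gets its own ID because the language differs.
upsert({ fileId: "demo.xlf", unitId: "u1", segmentIndex: 0, lang: "en" }, "Hello world");
upsert({ fileId: "demo.xlf", unitId: "u1", segmentIndex: 0, lang: "en" }, "Hello, world!");
upsert({ fileId: "demo.xlf", unitId: "u1", segmentIndex: 0, lang: "es" }, "¡Hola, mundo!");

console.log(store.size); // 2
console.log(store.get("demo.xlf:u1:0:en")); // "Hello, world!"
```

Because the ID depends only on where the segment came from, repeated imports of the same file are idempotent. The sketch does not escape colons inside the components, which a real implementation would need to consider.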

Models download automatically the first time you initialize an instance and are cached in the standard Hugging Face directory.

Requirements

  • Node.js 22 LTS or later
  • npm 11+
  • Disk space for both the LanceDB directory you choose and the embedding model cache