HybridTM - Semantic Translation Memory Engine

HybridTM is a semantic translation memory engine that stores bilingual content in LanceDB and scores matches by combining semantic embeddings (Xenova/Transformers.js) with the built-in MatchQuality fuzzy metric.

The source code is available on GitHub under the Eclipse Public License v1.0. Developers can clone, adapt, and ship the library under the terms of that license, or contact Maxprograms for commercial arrangements.

Highlights

Imports XLIFF 2.x, TMX 1.4b, and SDLTM files, preserving metadata, notes, and custom properties
Generates semantic vectors with any Xenova-compatible text model (default: HybridTM.QUALITY_MODEL, LaBSE)
Provides semanticTranslationSearch, semanticSearch, and concordanceSearch APIs with metadata-aware filtering
Streams data into LanceDB through a JSONL-based batch importer to keep memory usage predictable
Prevents duplicate segments by rewriting entries with deterministic IDs (fileId:unitId:segmentIndex:lang)
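
The deterministic ID scheme in the last item can be illustrated with a minimal sketch. This is not HybridTM's actual implementation: the SegmentKey interface, the segmentId and upsert helpers, and the in-memory Map standing in for LanceDB are all hypothetical names chosen for illustration. Only the fileId:unitId:segmentIndex:lang pattern comes from the description above.

```typescript
// Hypothetical key shape; the field names mirror the ID pattern
// fileId:unitId:segmentIndex:lang described in the Highlights list.
interface SegmentKey {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
}

// Build the deterministic ID by joining the four parts with colons.
function segmentId(key: SegmentKey): string {
  return [key.fileId, key.unitId, String(key.segmentIndex), key.lang].join(":");
}

// In-memory stand-in for the database (an assumption, not LanceDB itself):
// writing the same ID twice replaces the earlier entry.
const store = new Map<string, string>();

function upsert(key: SegmentKey, text: string): void {
  store.set(segmentId(key), text);
}

const key: SegmentKey = { fileId: "manual.xlf", unitId: "u1", segmentIndex: 0, lang: "en" };
upsert(key, "Hello world");
upsert(key, "Hello, world!"); // re-importing the same segment overwrites it
```

Because the ID depends only on where the segment sits in its source file, importing the same file a second time rewrites the existing entries instead of adding duplicates.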

Models download automatically the first time you initialize an instance and are cached in the standard Hugging Face directory.

Requirements

Node.js 22 LTS or later
npm 11+
Disk space for both the LanceDB directory you choose and the embedding model cache

Installation

HybridTM is available as an npm package:

npm install hybridtm
