Multimodal Embedding & Reranker Models with Sentence Transformers
Sentence Transformers v5.4 introduces native multimodal embedding support, enabling text, images, audio, and video to share a unified vector space for cross-modal retrieval.
Hugging Face Blog · 9 April 2026
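
To make the idea of a unified vector space concrete, here is a minimal sketch of cross-modal retrieval by cosine similarity. It does not call Sentence Transformers at all: the three-dimensional vectors, the item names, and the helper functions `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins for whatever embeddings a multimodal model would produce. The only assumption carried over from the summary above is that items of different modalities land in the same space, so a single similarity measure can compare them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (item_id, score) pairs sorted from most to least similar."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written vectors standing in for model outputs.
# In practice each would come from embedding a text, image, audio clip, or video.
text_query = [0.9, 0.1, 0.0]
corpus = {
    "image:dog.jpg": [0.8, 0.2, 0.1],
    "audio:rain.wav": [0.0, 0.1, 0.9],
    "video:surf.mp4": [0.1, 0.9, 0.2],
}

ranking = rank_by_similarity(text_query, corpus)
for item_id, score in ranking:
    print(f"{item_id}: {score:.3f}")
```

Because every item is just a vector, the ranking step never needs to know which modality an item came from; that is what makes a text query against images, audio, and video possible in one pass.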