Available models
Voyage offers a collection of specialized models for embedding text from different domains: financial, legal (and large documents), code, and medical. It also offers a highly ranked general-purpose embedding model that can be used for a variety of tasks, a general model optimized for retrieval, and a smaller, cost-efficient retrieval model. We include a subset here for reference:

Model | Dimensions | Max Tokens | Cost | MTEB Avg Score | Similarity Metric |
---|---|---|---|---|---|
voyage-large-2-instruct | 1024 | 16000 | $0.12 / 1M tokens | 68.28 | cosine, dot product, L2 |
voyage-2 | 1024 | 4000 | $0.10 / 1M tokens | | cosine, dot product, L2 |
voyage-code-2 | 1536 | 16000 | $0.12 / 1M tokens | | cosine, dot product, L2 |
voyage-law-2 | 1024 | 16000 | $0.12 / 1M tokens | | cosine, dot product, L2 |
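Every model in the table lists cosine, dot product, and L2 as supported similarity metrics. The sketch below shows why these are interchangeable for ranking, assuming the embeddings come back normalized to unit length (Voyage's documentation describes its embeddings this way, but verify before relying on it). The helper functions and the toy 3-dimensional vectors are hypothetical stand-ins for real 1024- or 1536-dimensional embeddings.

```typescript
// Hypothetical helpers for comparing embedding vectors. This is plain vector
// math, not a Voyage or LangChain API.

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean length of a vector.
function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: dot product divided by the product of the lengths.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

// L2 (Euclidean) distance: smaller means more similar.
function l2(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Scale a vector to unit length, mimicking normalized embeddings.
function normalize(a: number[]): number[] {
  const n = norm(a);
  return a.map((x) => x / n);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const query = normalize([1, 2, 3]);
const docA = normalize([1, 2, 2.5]);
const docB = normalize([3, -1, 0]);

// For unit vectors, cosine equals the dot product, and the squared L2
// distance equals 2 - 2 * cosine, so all three produce the same ranking.
const docs: Array<[string, number[]]> = [
  ["docA", docA],
  ["docB", docB],
];
for (const [label, doc] of docs) {
  console.log(
    label,
    `cosine=${cosine(query, doc).toFixed(4)}`,
    `dot=${dot(query, doc).toFixed(4)}`,
    `l2=${l2(query, doc).toFixed(4)}`,
  );
}
```

When the vectors are already normalized, the dot product is the cheapest of the three to compute, since it skips the length calculations.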
Usage
Voyage has a Python SDK but no JavaScript SDK. Their REST API is mostly compatible with OpenAI's API, but unfortunately their powerful general-purpose model, `voyage-large-2-instruct`, requires an `inputType` parameter, which OpenAI's SDK does not support. Fortunately, LangChain has a good community-contributed JS library for Voyage that supports the `inputType` parameter, so we use the LangChain community library in the example below.
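To see where the OpenAI-compatible API diverges, here is a minimal sketch that only builds the raw HTTP request for Voyage's embeddings endpoint; it does not send it. The function name `buildVoyageRequest` is hypothetical, and the endpoint URL and JSON field names (`input`, `model`, `input_type` with values `"query"` or `"document"`) are taken from Voyage's public REST docs as we understand them, so check them against the current API reference.

```typescript
// Hypothetical helper that assembles a fetch() request for Voyage's
// embeddings endpoint. Assumption: URL and field names follow Voyage's REST docs.

type VoyageInputType = "query" | "document";

interface EmbeddingRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildVoyageRequest(
  texts: string[],
  inputType: VoyageInputType,
  apiKey: string,
  model = "voyage-large-2-instruct",
): EmbeddingRequest {
  return {
    url: "https://api.voyageai.com/v1/embeddings",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // input_type is the REST-level field behind the inputType parameter
      // that OpenAI's SDK does not support.
      body: JSON.stringify({ input: texts, model, input_type: inputType }),
    },
  };
}

const req = buildVoyageRequest(["What is the capital of France?"], "query", "YOUR_API_KEY");
console.log(req.url);
console.log(req.init.body);
// Sending it requires network access and a real key, e.g.:
// const res = await fetch(req.url, req.init);
```

Apart from `input_type`, the body has the same `input` and `model` fields as an OpenAI embeddings request, which is what makes the two APIs nearly interchangeable.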