pgvector

The pgvector extension in PostgreSQL is used to efficiently store and query vector data. It gives PostgreSQL the ability to store and perform operations on vectors directly within the database. Nile supports pgvector out of the box on the latest version, 0.8.0.
Pgvector lets you store and query vectors directly within your usual Postgres database - with the rest of your data. This is both convenient and efficient. It supports:
- Exact and approximate nearest neighbor search (with optional HNSW and IVFFlat indexes)
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- Any language with a Postgres client

Plus ACID compliance, point-in-time recovery, JOINs, and all of the other great features of Postgres.
Create tenant table with vector type
Vector types work like any other standard type. You can make them the type of a column in a tenant table, and Nile will take care of isolating the embeddings per tenant.
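For example, a minimal sketch of such a table (the todos table, its columns, and the 3-dimensional vector are hypothetical; real embedding models typically produce hundreds or thousands of dimensions):

```sql
-- Hypothetical example: a tenant-aware table with an embedding column
CREATE TABLE todos (
    tenant_id uuid,
    id uuid DEFAULT gen_random_uuid(),
    title text,
    complete boolean DEFAULT false,
    embedding vector(3),  -- use your model's dimension, e.g. 1536
    PRIMARY KEY (tenant_id, id)
);
```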
Store vectors per tenant

Once you have the table defined, you will want to populate the embeddings. Typically, this is done by querying a large language model (e.g. OpenAI, HuggingFace), retrieving the embeddings, and storing them in the vector column. Once stored, the embeddings follow the standard tenant rules: they can be isolated, sharded, and placed based on the tenant they belong to.
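A minimal sketch, continuing the hypothetical todos table above (the tenant id and embedding values are made up; in practice the embedding comes from your model):

```sql
-- Hypothetical example: store an embedding for a specific tenant
INSERT INTO todos (tenant_id, title, embedding)
VALUES ('018ade1a-7843-7e60-9686-714bab650998', 'feed the cat', '[0.012, -0.078, 0.341]');
```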
Query vectors

Pgvector supports 6 types of vector similarity operators:

Operator | Name | Description | Use Cases |
---|---|---|---|
<-> | vector_l2_ops | L2 distance. Measure of the straight-line distance between two points in a multi-dimensional space. It calculates the length of the shortest path between the points, which corresponds to the hypotenuse of a right triangle. | Used in clustering, k-means clustering, and distance-based classification algorithms |
<#> | vector_ip_ops | Inner product. The inner product, also known as the dot product, measures the similarity or alignment between two vectors. It calculates the sum of the products of corresponding elements in the vectors. | Used in similarity comparison or feature selection. Note that for normalized vectors, inner product will result in the same ranking as cosine distance, but is more efficient to calculate. So this is a good choice if you use an embedding algorithm that produces normalized vectors (such as OpenAI’s) |
<=> | vector_cosine_ops | Cosine distance. Cosine distance, often used as cosine similarity when measuring similarity, quantifies the cosine of the angle between two vectors in a multi-dimensional space. It focuses on the direction rather than the magnitude of the vectors. | Used in text similarity, recommendation systems, and any context where you want to compare the direction of vectors |
<+> | vector_l1_ops | L1 distance. The L1 distance, also known as the Manhattan distance, measures the distance between two points in a grid-like path (like a city block). It is the distance between two points measured along axes at right angles. | Less sensitive to outliers than L2 distance and, according to some research, better for high-dimensional data. |
<~> | bit_hamming_ops | Hamming distance. The Hamming distance measures the number of positions at which the corresponding symbols are different. | Used with binary vectors. Mostly for discrete data like categories. Also used for error-correcting codes and data compression. |
<%> | bit_jaccard_ops | Jaccard distance. Measures similarity between sets by calculating the ratio of the intersection to the union of the two sets (how many positions are the same out of the total positions). | Used with binary vectors. Useful for comparing customers’ purchase histories, recommendation systems, similarities in terms used in different texts, etc. |
(<#> returns the negative inner product.)
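For example, a minimal sketch of a nearest-neighbor query against the hypothetical todos table above, using cosine distance and scoped to a single tenant:

```sql
-- Hypothetical example: the 5 most similar todos for one tenant
SELECT title, embedding <=> '[0.1, 0.2, 0.3]' AS distance
FROM todos
WHERE tenant_id = '018ade1a-7843-7e60-9686-714bab650998'
ORDER BY distance
LIMIT 5;
```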
Vector Indexes
pgvector supports two types of indexes:
- HNSW
- IVFFlat
HNSW
An HNSW index creates a multilayer graph. It has slower build times and uses more memory than IVFFlat, but has better query performance (in terms of the speed-recall tradeoff). There’s no training step like IVFFlat, so the index can be created without any data in the table. When creating an HNSW index, you can specify the maximum number of connections in a layer (m) and the number of candidate vectors considered when building the graph (ef_construction). More connections and more candidate vectors will improve recall but will increase build time and memory. If you don’t specify the parameters, the default values are m = 16 and ef_construction = 64.
Add an index for each distance function you want to use.
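A minimal sketch, assuming the hypothetical todos table above and cosine distance:

```sql
-- Hypothetical example: HNSW index with the default parameters spelled out
CREATE INDEX ON todos USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```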
At query time, you can also control the number of candidate vectors considered during a search with the hnsw.ef_search setting.
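For example (the value here is illustrative; the default is 40):

```sql
-- Hypothetical example: widen the candidate list for better recall in this session
SET hnsw.ef_search = 100;
```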
IVFFlat

An IVFFlat index divides vectors into lists, and then searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance (in terms of the speed-recall tradeoff). Three keys to achieving good recall are:

- Create the index after the table has some data
- Choose an appropriate number of lists - a good place to start is rows / 1000 for up to 1M rows and sqrt(rows) for over 1M rows.
- When querying, specify an appropriate number of probes (higher is better for recall, lower is better for speed) - a good place to start is sqrt(lists).
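A minimal sketch, assuming the hypothetical todos table above holds roughly 100,000 rows (so rows / 1000 suggests lists = 100, and sqrt(lists) suggests 10 probes):

```sql
-- Hypothetical example: IVFFlat index, created after the table has data
CREATE INDEX ON todos USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- At query time, probe more lists for better recall
SET ivfflat.probes = 10;
```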
Filtering
Typically, vector search is used to find the nearest neighbors, which means that you would limit the number of results after ordering by distance. When you add WHERE filters to such a query, an approximate index scan can return fewer rows than expected, because the filters are applied after the index returns its nearest candidates.
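For example, a minimal sketch against the hypothetical todos table above, combining a filter with distance ordering:

```sql
-- Hypothetical example: nearest neighbors for one tenant, with an extra filter
SELECT title
FROM todos
WHERE tenant_id = '018ade1a-7843-7e60-9686-714bab650998'
  AND complete = false
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```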
Starting with pgvector 0.8.0, you can enable iterative index scans, which will automatically scan more of the index when needed.

Iterative Index Scans
Using iterative index scans, Postgres will scan the approximate index for nearest neighbors, apply additional filters and, if the number of neighbors after filtering is insufficient, it will continue scanning until sufficient results are found. Each index has its own configuration (GUC) for iterative scans: hnsw.iterative_scan and ivfflat.iterative_scan. By default, both configurations are set to off.
HNSW indexes support both relaxed and strict ordering for the iterative scans. Strict order guarantees that the returned results are ordered by exact distance.
Relaxed order allows results that are slightly out of order, but provides better recall (i.e. fewer missed results due to the approximate nature of the index).
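A minimal sketch of enabling iterative scans for the current session:

```sql
-- Hypothetical example: enable iterative scans
SET hnsw.iterative_scan = relaxed_order;    -- HNSW supports strict_order or relaxed_order
SET ivfflat.iterative_scan = relaxed_order; -- IVFFlat supports relaxed_order only
```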
Quantization
Introduced in pgvector 0.7.0.

Quantization is a technique of optimizing vector storage and query performance by using fewer bits to store the vectors. By default, pgvector’s vector type uses 32-bit floating point format. The halfvec data type uses 16-bit floating point format, which has the following benefits:
- Reduced storage requirements (half the memory)
- Faster query performance
- Reduced index size (both in disk and memory)
- Can index vectors with up to 4096 dimensions (which covers the most popular embedding models)
To use halfvec, you can create a table with the halfvec type and use halfvec values in your queries. When creating an index on a halfvec column, you need to specify the distance function as halfvec_l2_ops or halfvec_cosine_ops.
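A minimal sketch, assuming a hypothetical documents table (the names, dimensions, and values are illustrative):

```sql
-- Hypothetical example: half-precision embeddings
CREATE TABLE documents (
    tenant_id uuid,
    id uuid DEFAULT gen_random_uuid(),
    content text,
    embedding halfvec(3),  -- real models use far more dimensions
    PRIMARY KEY (tenant_id, id)
);

-- Indexes on halfvec columns use the halfvec_* operator classes
CREATE INDEX ON documents USING hnsw (embedding halfvec_cosine_ops);

-- Queries look the same as with the vector type
SELECT content
FROM documents
WHERE tenant_id = '018ade1a-7843-7e60-9686-714bab650998'
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```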
Sparse Vectors
Introduced in pgvector 0.7.0.

Sparse vectors are vectors in which the values are mostly zero. These are common in text search algorithms, where each dimension represents a word and the value represents the relative frequency of the word in the document - BM25, for example. Some embedding models, such as BGE-M3, also use sparse vectors. Pgvector supports the sparse vector type sparsevec and the associated similarity operators.
Because sparse vectors can be extremely large but most of the values are zero, pgvector stores them in a compressed format: {index1:value1,index2:value2,...}/N, where N is the number of dimensions and the indices start from 1 (like SQL arrays).
Because the format is a bit unusual, it is recommended to use pgvector’s libraries for your favorite language to insert and query sparse vectors.
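A minimal sketch of the format in SQL (the table and values are hypothetical):

```sql
-- Hypothetical example: a sparse vector column and a nearest-neighbor query
CREATE TABLE sparse_items (
    tenant_id uuid,
    id uuid DEFAULT gen_random_uuid(),
    embedding sparsevec(5),
    PRIMARY KEY (tenant_id, id)
);

-- {index:value,...}/dimensions; indices start at 1
INSERT INTO sparse_items (tenant_id, embedding)
VALUES ('018ade1a-7843-7e60-9686-714bab650998', '{1:1.5,3:2.0,5:3.0}/5');

SELECT id
FROM sparse_items
WHERE tenant_id = '018ade1a-7843-7e60-9686-714bab650998'
ORDER BY embedding <=> '{1:1.0,5:2.0}/5'
LIMIT 3;
```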