Vector Databases: PGVector

锦通 | 2024-6-13 09:39:00

1. Introduction to PGVector

        PGVector is an extension for PostgreSQL that adds vector storage and vector querying:


  • Exact and approximate nearest neighbor search
  • Single-precision, half-precision, binary, and sparse vectors
  • L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance

  • ACID transactions, point-in-time recovery, JOINs, and all the other features of Postgres
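2. Installing PGVector

2.1 Installing PostgreSQL

        PGVector is a PostgreSQL extension, so PostgreSQL must be installed before PGVector can be used (Postgres 12 and later are supported). For the PostgreSQL installation steps, see: PostgreSQL Basic Operations.
2.2 Installing PGVector

   # 1. Download the source
   git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
   # 2. Enter the source directory
   cd pgvector
   # 3. Build and install (make install may need sudo)
   make && make install
2.3 Enabling PGVector

        Log in to the PostgreSQL database and run the following command to enable PGVector:
   CREATE EXTENSION IF NOT EXISTS vector;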
  

3. Everyday PGVector Usage

3.1 Storing Data

        Create a vector column:
   -- Create the vector column when creating a new table
   CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
   -- Or add a vector column to an existing table
   ALTER TABLE items ADD COLUMN embedding vector(3);
        Insert vectors:
   INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
        Update vectors:
   UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
        Delete vectors:
   DELETE FROM items WHERE id = 1;
3.2 Querying Data

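Distance functions

   Operator  Function                        Distance type
   <->       l2_distance                     L2 distance: length of the vector obtained by subtracting one vector from the other
   <#>       vector_negative_inner_product   Negative of the inner product of the two vectors
   <=>       cosine_distance                 Cosine distance: 1 minus the cosine of the angle between the two vectors
   <+>

  Get the nearest neighbors to a vector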
   SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
  Get the nearest neighbors to a row
   SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
  Get rows within a certain distance
   SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
  Get the distance
   SELECT embedding <-> '[3,1,2]' AS distance FROM items;
  For inner product, multiply by -1 (since <#> returns the negative inner product)
   SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
  For cosine similarity, use 1 - cosine distance
   SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
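The values these operators return can be reproduced by hand. The following is a minimal Python sketch of the arithmetic, not PGVector's implementation; the function names are chosen here for illustration and vectors are plain Python lists.

```python
import math

def l2_distance(a, b):
    """Length of the difference vector, the value <-> returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negated dot product, the value <#> returns."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b, the value <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

row = [1, 2, 3]     # a stored embedding
query = [3, 1, 2]   # the query vector used in the SQL examples

distance = l2_distance(row, query)                     # sqrt(4 + 1 + 1)
inner_product = negative_inner_product(row, query) * -1
cosine_similarity = 1 - cosine_distance(row, query)    # 11 / 14
```

Because `<#>` returns the negated inner product, ordering by it in ascending order puts the largest inner product first, which is why the SQL multiplies by -1 only when the actual inner product is wanted.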
  Average vectors
   SELECT AVG(embedding) FROM items;
  Average groups of vectors
   SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;
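As a rough illustration of what the two averaging queries compute, here is a small Python sketch of an element-wise mean, first over all rows and then per group. It assumes each row is a `(category_id, embedding)` pair; note that the `items` table created in section 3.1 has no `category_id` column, so the grouped query presumes one has been added. The helper names are hypothetical.

```python
def average_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        return None  # mirrors SQL, where AVG over zero rows is NULL
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def average_by_group(rows):
    """Group (category_id, embedding) pairs and average each group."""
    groups = {}
    for category_id, embedding in rows:
        groups.setdefault(category_id, []).append(embedding)
    return {cid: average_vectors(vecs) for cid, vecs in groups.items()}

overall = average_vectors([[1, 2, 3], [4, 5, 6]])
per_group = average_by_group([
    (1, [1, 2, 3]),
    (1, [3, 4, 5]),
    (2, [4, 5, 6]),
])
```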
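3.3 HNSW Index

        An HNSW index builds a multilayer graph. In terms of the speed-recall tradeoff it has better query performance than IVFFlat, but it is slower to build and uses more memory. Also, because there is no training step like the one IVFFlat has, the index can be created while the table is still empty.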
        Supported types are:


  • vector - up to 2,000 dimensions
  • halfvec - up to 4,000 dimensions (added in 0.7.0)
  • bit - up to 64,000 dimensions (added in 0.7.0)
  • sparsevec - up to 1,000 non-zero elements (added in 0.7.0)
  L2 distance
   CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
  Inner product
   CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
  Cosine distance
   CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
  L1 distance - added in 0.7.0
   CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
  Hamming distance - added in 0.7.0
   CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
  Jaccard distance - added in 0.7.0
   CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);
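To give a feel for why a graph index answers queries quickly but only approximately, here is a heavily simplified Python sketch: one layer only, a brute-force neighbor graph, and a greedy walk toward the query. Real HNSW adds a hierarchy of sparser layers and tracks more than one candidate, so this is an illustration of the idea rather than PGVector's algorithm; all names are hypothetical.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_graph(points, k):
    """Link every point to its k nearest other points (brute force)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: l2(p, points[j]))
        graph[i] = others[:k]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Move to the neighbor closest to the query until no neighbor is closer."""
    current = entry
    current_dist = l2(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            d = l2(points[j], query)
            if d < best_dist:
                best, best_dist = j, d
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist

# Ten points on a line: [0.0], [1.0], ..., [9.0]
points = [[float(i)] for i in range(10)]
graph = build_knn_graph(points, k=2)
nearest, dist = greedy_search(points, graph, [7.2])
```

The walk only compares the query against the neighbors of the nodes it visits instead of every row. On less regular data it can stop at a local optimum, which is the recall side of the speed-recall tradeoff.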
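3.4 IVFFlat Index

        An IVFFlat index divides the vectors into lists, then searches only the subset of lists closest to the query vector. It builds faster than HNSW and uses less memory, but its query performance (in terms of the speed-recall tradeoff) is lower.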
        Supported types are:


  • vector - up to 2,000 dimensions
  • halfvec - up to 4,000 dimensions (added in 0.7.0)
  • bit - up to 64,000 dimensions (added in 0.7.0)
  L2 distance
   CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
  Inner product
   CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
  Cosine distance
   CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
  Hamming distance - added in 0.7.0
   CREATE INDEX ON items USING ivfflat (embedding bit_hamming_ops) WITH (lists = 100);
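The "divide into lists, then search the closest lists" idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not PGVector's implementation: the centroids are hand-picked rather than learned from the data, the vectors are one-dimensional, and `probes` is this sketch's name for how many lists to scan.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_lists(vectors, centroids):
    """Put each vector in the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(v)
    return lists

def ivf_search(query, centroids, lists, probes=1, limit=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [v for c in order[:probes] for v in lists[c]]
    return sorted(candidates, key=lambda v: l2(query, v))[:limit]

vectors = [[0.0], [1.0], [2.0], [5.2], [9.0], [10.0]]
centroids = [[1.0], [9.0]]  # hand-picked for the example
lists = assign_to_lists(vectors, centroids)

one_probe = ivf_search([4.9], centroids, lists, probes=1)
two_probes = ivf_search([4.9], centroids, lists, probes=2)
```

With one probe the search scans only the first list and returns `[2.0]`, missing the true nearest neighbor `[5.2]`, which sits in the second list. Scanning both lists finds it at the cost of more comparisons, which is the speed-recall tradeoff described above.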
