[LLM Development Guide] LlamaIndex with DeepSeek, Jina embeddings, and ChromaDB ...


A few pitfalls first. I originally planned to use Milvus, but couldn't get it running on Windows (even with Docker already configured), so I switched to ChromaDB. Embedding models are usually deployed locally, but my machine is CPU-only, so that was out; I went with Jina's embedding service, which performs well (GLM's embeddings would also work, but the code needs changes). The last problem was making DeepSeek work with LlamaIndex: DeepSeek uses the OpenAI-style API, so I had to edit the openai utils source inside the llama_index package and add a configuration entry before everything ran end to end. If you're in China, feel free to copy this setup — and give me a like, thanks.
Environment:
  1. OS: Windows 11
  2. Python 3.10
  3. llama-index 0.11.20
  4. chromadb 0.5.15
  5. The data file is the official example (the Paul Graham essay); using your own text file works too
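Assuming pip on Windows, the stack above can be installed as follows. The integration package names follow LlamaIndex 0.11's split-package layout; pin the versions from the environment list if you want to reproduce it exactly:

```shell
pip install llama-index==0.11.20 chromadb==0.5.15
pip install llama-index-vector-stores-chroma llama-index-embeddings-jinaai llama-index-llms-openai
```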
The full source code:
# %%
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, Settings
from llama_index.llms.openai import OpenAI
import chromadb
# %%
import openai
openai.api_key = "sk"
# note: set base_url, not the legacy api_base — the v1 openai client ignores api_base
openai.base_url = "https://api.deepseek.com/v1"
llm = OpenAI(model="deepseek-chat", api_key=openai.api_key, base_url=openai.base_url)
Settings.llm = llm
# %%
import os
jinaai_api_key = "jina"
os.environ["JINAAI_API_KEY"] = jinaai_api_key
from llama_index.embeddings.jinaai import JinaEmbedding
text_embed_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v3",
    # choose `retrieval.passage` to get passage embeddings
    task="retrieval.passage",
)
# %%
# define the embedding model
embed_model = text_embed_model
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# save to disk
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)
# query data from the persisted index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print("response:", response)
1. Configuring DeepSeek in LlamaIndex

DeepSeek exposes an OpenAI-compatible API, but LlamaIndex's `OpenAI` class validates the model name against a built-in table of known models and their context windows. Find the openai utils module under llama_index and add the entry `"deepseek-chat": 128000` (the model's 128k context window).
Path: C:\Users\USER.conda\envs\workspace\lib\site-packages\llama_index\llms\openai\utils.py
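Editing site-packages by hand works but is fragile across reinstalls. As a stand-in illustration of why the edit is needed — the dictionary and function names below are simplified stand-ins, not LlamaIndex's actual internals — an unregistered model name fails the context-size lookup, and registering it fixes the lookup, which is exactly what the manual edit achieves:

```python
# Stand-in sketch of a model-name -> context-window table, as LlamaIndex keeps
# for OpenAI models (names here are illustrative, not the library's real ones).
ALL_AVAILABLE_MODELS = {"gpt-4o": 128000, "gpt-3.5-turbo": 16385}

def modelname_to_contextsize(model: str) -> int:
    # An unknown model name raises, which is the error seen with deepseek-chat
    # before the table is patched.
    if model not in ALL_AVAILABLE_MODELS:
        raise ValueError(f"Unknown model {model!r}; add it to the model table.")
    return ALL_AVAILABLE_MODELS[model]

# Registering deepseek-chat with its 128k context window fixes the lookup:
ALL_AVAILABLE_MODELS["deepseek-chat"] = 128000
print(modelname_to_contextsize("deepseek-chat"))
```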
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="deepseek-chat", base_url="https://api.deepseek.com/v1", api_key="sk-")
response = llm.complete("见到你很高兴")  # "Nice to meet you"
print(str(response))
2. Using Jina embeddings with LlamaIndex

# Initialise with your API key
import os
jinaai_api_key = "jina_"
os.environ["JINAAI_API_KEY"] = jinaai_api_key
from llama_index.embeddings.jinaai import JinaEmbedding

text_embed_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v3",
    # choose `retrieval.passage` to get passage embeddings
    task="retrieval.passage",
)
embeddings = text_embed_model.get_text_embedding("This is the text to embed")
print("Text dim:", len(embeddings))
print("Text embed:", embeddings[:5])

query_embed_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v3",
    # choose `retrieval.query` to get query embeddings, or choose your desired task type
    task="retrieval.query",
    # `dimensions` lets you control the embedding dimension with minimal
    # performance loss; the default is 1024. A number between 256 and 1024 is recommended.
    dimensions=512,
)
embeddings = query_embed_model.get_query_embedding(
    "This is the query to embed"
)
print("Query dim:", len(embeddings))
print("Query embed:", embeddings[:5])
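The two task types matter because retrieval works by comparing the query's vector against the stored passage vectors, typically by cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real Jina v3 vectors are 1024-dimensional by default):

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# toy 3-d stand-ins for a query embedding and two passage embeddings
query = [0.9, 0.1, 0.0]
passages = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
}
# rank passages by similarity to the query, best first
ranked = sorted(passages, key=lambda k: cosine(query, passages[k]), reverse=True)
print(ranked[0])  # doc_a — its vector points the same way as the query's
```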
3. Using ChromaDB with LlamaIndex

# %%
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, Settings
from llama_index.llms.openai import OpenAI
import chromadb
# %%
import openai
openai.api_key = "sk-"
# note: set base_url, not the legacy api_base — the v1 openai client ignores api_base
openai.base_url = "https://api.deepseek.com/v1"
Settings.llm = OpenAI(model="deepseek-chat", api_key=openai.api_key, base_url=openai.base_url)
# %%
import os
jinaai_api_key = "jina_"
os.environ["JINAAI_API_KEY"] = jinaai_api_key
from llama_index.embeddings.jinaai import JinaEmbedding
text_embed_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v3",
    # choose `retrieval.passage` to get passage embeddings
    task="retrieval.passage",
)
# %%
# create an in-memory client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("quickstart")
# %%
# define the embedding model
embed_model = text_embed_model
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# %%
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# %%
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# %%
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
# query data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print("response:", response)
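Under the hood, `as_query_engine()` does retrieve-then-synthesize: it embeds the question, pulls the top-k most similar chunks from the vector store, and stuffs them into a prompt for the LLM. A minimal stand-in of that flow with 2-d toy vectors — every name here is illustrative, not the LlamaIndex API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, store, top_k=2):
    # rank stored (text, vector) chunks by similarity to the query vector
    scored = sorted(store, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]

def build_prompt(question, chunks):
    # stuff the retrieved chunks into a context window for the LLM
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# a toy "vector store" of pre-embedded chunks
store = [
    ("The author wrote short stories as a kid.", [0.9, 0.1]),
    ("Chroma persists vectors on disk.", [0.1, 0.9]),
]
prompt = build_prompt("What did the author do growing up?",
                      retrieve([0.95, 0.05], store, top_k=1))
print(prompt)
```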
Disclaimer: if this post infringes your rights, please contact the site administrator and the infringing content will be removed promptly. More information on the homepage: qidao123.com: ToB enterprise-services community.
