Posted by 大号在练葵花宝典 on 2024-6-23 17:36:16

langchain: loading a model with HuggingFaceEmbeddings, splitting text with RecursiveCharacterTex

References:
https://github.com/TommyTang930/LangChain_LLM_ChatBot
https://python.langchain.com/docs/integrations/vectorstores/faiss
1. Text splitting with RecursiveCharacterTextSplitter

Here the class is rewritten so that it splits Chinese text more sensibly.
import re
from typing import List, Optional, Any
from langchain.text_splitter import RecursiveCharacterTextSplitter
import logging

logger = logging.getLogger(__name__)


def _split_text_with_regex_from_end(
    text: str, separator: str, keep_separator: bool
) -> List[str]:
    # Now that we have the separator, split the text
    if separator:
        if keep_separator:
            # The parentheses in the pattern keep the delimiters in the result,
            # so _splits alternates: piece, delimiter, piece, delimiter, ...
            _splits = re.split(f"({separator})", text)
            # Pair each piece with the delimiter that follows it.
            splits = ["".join(i) for i in zip(_splits[0::2], _splits[1::2])]
            if len(_splits) % 2 == 1:
                splits += _splits[-
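
The listing above breaks off partway through the helper. As a minimal sketch of the behavior it appears to implement (split on a regex and keep each delimiter attached to the end of the preceding piece), something like the following could work. The function name `split_keep_separator_at_end` is hypothetical, and the handling of an empty separator and of empty pieces is an assumption, not taken from the original post.

```python
import re
from typing import List


def split_keep_separator_at_end(text: str, separator: str) -> List[str]:
    """Hypothetical helper: split on a regex, keeping each match at the end of its piece."""
    if not separator:
        # Assumption: with no separator, fall back to single characters.
        return list(text)
    # The capture group makes re.split return the delimiters too:
    # [piece, delimiter, piece, delimiter, ..., last_piece]
    parts = re.split(f"({separator})", text)
    # Glue every piece to the delimiter that follows it.
    merged = ["".join(pair) for pair in zip(parts[0::2], parts[1::2])]
    if len(parts) % 2 == 1:
        # The final piece has no trailing delimiter; keep it on its own.
        merged += parts[-1:]
    # Assumption: empty pieces (e.g. text ending in a delimiter) are dropped.
    return [s for s in merged if s != ""]


sentences = split_keep_separator_at_end("今天天气很好。我们去公园吧!好的", "[。!?]")
print(sentences)
```

Keeping the punctuation at the end of each piece, rather than at the start of the next one, means every chunk reads as a complete Chinese sentence, which is presumably why the post describes the rewrite as friendlier to Chinese text.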