References:
https://github.com/TommyTang930/LangChain_LLM_ChatBot
https://python.langchain.com/docs/integrations/vectorstores/faiss
1. Text splitting with RecursiveCharacterTextSplitter
This class is rewritten here so that it splits Chinese text more cleanly.
import re
from typing import List, Optional, Any
from langchain.text_splitter import RecursiveCharacterTextSplitter
import logging

logger = logging.getLogger(__name__)


def _split_text_with_regex_from_end(
    text: str, separator: str, keep_separator: bool
) -> List[str]:
    # Now that we have the separator, split the text
    if separator:
        if keep_separator:
            # The parentheses in the pattern keep the delimiters in the result.
            _splits = re.split(f"({separator})", text)
            splits = ["".join(i) for i in zip(_splits[0::2], _splits[1::2])]
            if len(_splits) % 2 == 1:
                splits += _splits[-