def format_docs(docs):
    # Concatenate the page content of the retrieved documents into a
    # single string, with a numbered header separating each document.
    useful_content = [doc.page_content for doc in docs]
    return "\nRetrieved documents:\n" + "".join(
        [
            f"\n\n===== Document {str(i)} =====\n" + doc
            for i, doc in enumerate(useful_content)
        ]
    )
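For reference, with two retrieved documents this produces a single context string of the following form (the placeholders stand in for the actual document text):

Retrieved documents:

===== Document 0 =====
<text of the first document>

===== Document 1 =====
<text of the second document>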
def rag(query):
    # Retrieve the relevant documents, format them into a context string,
    # and pass both the query and the context to the LLM.
    docs = retriever.invoke(query)
    documents = format_docs(docs)
    answer = generate_answer(query, documents)
    return documents, answer
Putting it all together, we get:
query = "How did the response lengths change with training?"
docs, answer = rag(query)
print(answer)
And a corresponding response:
Based on the provided context, the response lengths during training for the DeepSeek-R1-Zero model showed a clear trend of increasing as the number of training steps progressed. This is evident from the graphs described in Document 0 and Document 1, which both depict the "average length per response" on the y-axis and training steps on the x-axis.
### Key Observations:
1. **Increasing Trend**: The average response length consistently increased as training steps advanced. This suggests that the model naturally learned to allocate more "thinking time" (i.e., generate longer responses) as it improved its reasoning capabilities during the reinforcement learning (RL) process.
2. **Variability**: Both graphs include a shaded area around the average response length, indicating some variability in response lengths during training. However, the overall trend remained upward.
3. **Quantitative Range**: The y-axis for response length ranged from 0 to 12,000 tokens, and the graphs show a steady increase in the average response length over the course of training, though specific numerical values at different steps are not provided in the descriptions.
### Implications:
The increase in response length aligns with the model's goal of solving reasoning tasks more effectively. Longer responses likely reflect the model's ability to provide more detailed and comprehensive reasoning, which is critical for tasks requiring complex problem-solving.
In summary, the response lengths increased during training, indicating that the model adapted to allocate more resources (in terms of response length) to improve its reasoning performance.
Elasticsearch offers several strategies to strengthen search, including hybrid search, which combines approximate semantic search with keyword-based search. This approach can improve the relevance of the top documents used as context in a RAG architecture. To enable it, you need to modify the vector_store initialization as follows:
from langchain_elasticsearch import DenseVectorStrategy

vector_store = ElasticsearchStore(
    es_url=os.environ['es_host'],
    index_name="demo-index",
    embedding=embeddings,
    es_api_key=os.environ['es_api_key'],
    query_field="text",
    vector_query_field="embeddings",
    strategy=DenseVectorStrategy(hybrid=True),  # <-- the change is here
)
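With hybrid mode enabled, the rest of the pipeline is unchanged: the store issues both a kNN vector query and a keyword query and blends the results (reciprocal rank fusion in recent Elasticsearch versions). As a quick sanity check, here is a minimal sketch that rebuilds the retriever on top of the hybrid store and re-runs the earlier question; the k=3 value is an arbitrary choice for illustration:

# Rebuild the retriever from the hybrid-enabled store.
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Re-run the same question through the unchanged rag() pipeline.
docs, answer = rag("How did the response lengths change with training?")
print(answer)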