论文[《Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond》])通过图1所示的树状图详细枚举了自2018年以来自然语言大模型(LLM)这一范畴的发展路线和相应的各大模型,其中一部分是在Transformer出现之前、不基于Transformer的大模型,例如AI2的ELMo,另一大部分是在Transfomer出现之后、基于Transformer的大模型,其又分为三个发展路线:
对于向量索引引擎,笔者使用Chroma;对于大语言模型,笔者使用之前已定义的ChatGLM2。对于问题和从向量索引返回的相关文本段,RetrievalQA按下述提示模板拼接提示:
Use the following pieces of context to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.
{context}
Question: {question} Helpful Answer:
retrieval_qa_demo.py运行结果如图10所示。
其中,对于大语言模型,先尝试使用之前已定义的ChatGLM2,背面会分析,从执行结果看,ChatGLM2-6B-INT4和ChatGLM2-6B并不能输出符合格式的答案,从而无法进一步从中提取出查询SQL,以是通过FakeListLLM直接使用固定的答案,而这些答案事先根据提示由OpenAI ChatGPT3.5给出。 对于数据库引擎,使用SQLite3(Macbook原生支持),对于数据库实例,使用[Chinook],可按照上述链接中的阐明下载“Chinook_Sqlite.sql”并在本地创建数据库实例。Chinook表示一个数字多媒体商店,包罗了顾客(Customer)、雇员(Employee)、歌曲(Track)、订单(Invoice)及其相关的表和数据,如图12所示。问题是“How many employees are there?”,即有多少雇员,期望模型先给出查询Employee表记录数的SQL,再根据查询结果给出最终的答案。
实际执行时,SQLDatabaseChain起首根据问题和数据库Schema天生如下的提示:
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table. Pay attention to use date(‘now’) function to get the current date, if the question involves “today”. Use the following format: Question: Question here SQLQuery: SQL Query to run SQLResult: Result of the SQLQuery Answer: Final answer here Only use the following tables: {数据库Schema,包罗全部表的建表语句和数据示例,受限于篇幅,这里略去} Question: How many employees are there? SQLQuery:
其中,提示的第一部分是指示,期望模型作为SQLite的专家,按照一定的要求举行推理,并按照一定的格式输出,第二部分是数据库Schema,第三部分是问题以及期望输出的开头“SQLQuery:”,预期模型按照提示续写,给出查询SQL。 若将提示输入ChatGPT3.5,可以返回预期的答案,SQLDatabaseChain进一步提取答案中“\nSQLResult”之前的部分,从而得到查询SQL:
SELECT COUNT() FROM Employee SQLResult: COUNT() 8 Answer: There are 8 employees.
若将提示输入自定义的ChatGLM2(使用ChatGLM2-6B-INT4),则无法返回预期的答案(答案合理、但不符合格式要求):
SQLite is a language for creating and managing databases. It does not have an SQL-specific version for getting the number of employees. However, I can provide you with an SQL query that you can run using a SQLite database to get the number of employees in the “Employee” table.
SQLite:
SELECT COUNT(*) as num_employees FROM Employee;
复制代码
This query will return the count of employees in the “Employee” table. The result will be returned in a single row with a single column, labeled “num_employees”.
SQLDatabaseChain的提示是针对ChatGPT逐步优化、确定的,因此适用于ChatGPT,LangChain官方示例中使用的大语言模型是OpenAI,即底层调用ChatGPT,而ChatGLM2-6B-INT4、ChatGLM2-6B相对于ChatGPT,模型规模较小,仅有60亿参数,对于上述的长文本提示无法给出预期的答案。由于没有OpenAI的Token,因此示例代码通过FakeListLLM直接使用由ChatGPT3.5给出的答案。 在获取查询SQL后,SQLDatabaseChain会执行该SQL获取查询结果,并继续根据问题、数据库Schema、查询SQL和查询结果天生如下的提示:
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table. Pay attention to use date(‘now’) function to get the current date, if the question involves “today”. Use the following format: Question: Question here SQLQuery: SQL Query to run SQLResult: Result of the SQLQuery Answer: Final answer here Only use the following tables: {数据库Schema,包罗全部表的建表语句和数据示例,受限于篇幅,这里略去} Question: How many employees are there? SQLQuery:SELECT COUNT(EmployeeId) FROM Employee SQLResult: [(8,)] Answer:
相比上次提示,本次提示只是在末了追加了查询SQL和查询结果,若将提示输入ChatGPT3.5,则可以续写“Answer”,给出精确的答案:
There are 8 employees.
这里也通过FakeListLLM直接使用由ChatGPT3.5给出的答案,从而在本地跑通SQLDatabaseChain的流程,运行结果如图13所示。