一文速学-零资本与数据沟通NL2SQL的概念和实现技术

打印 上一主题 下一主题

主题 854|帖子 854|积分 2562

前言

关于NL2SQL的技术,如果大家最近有关注AI圈的话,或多或少都有所了解。其实很多业务场景下,于用户而言更多的是想要获取到最终数据的出现效果,关于数据是怎样获取得到的学习资本,是尽可能越少越好。众所周知当学习资本越低,那么产品的获客率也越高,固然对于我们技术职员来说,更多的还是研发思维。最终我们开发的服务主要还是为了业务服务,NL2SQL肯定是以后数据开发的趋势地点,因此我们数据开发职员来说,临时不谈掌握这门技术,清楚理念还是十分必要的。
NL2SQL技术概念

NL2SQL(Natural Language to SQL)是自然语言处理和数据库查询相结合的一项技术,旨在将用户以自然语言输入的查询转换为SQL查询语句,从而实现自然语言问答与数据库之间的主动交互。就按照企业一样平常报表业务,按照研发思绪,我们起首可以通过UI大概页面前端获取到客户的文本信息,传输到后端举行落库,然后工单展示大概是直接举行数据库查询,由数据开发职员编写SQL语言,最终查询得到结果再举行数据可视化展示。
那么按照NL2SQL的理念来说,整个数据查询流程可以被大大简化。起首用户不必要再通过复杂的页面选择或表单提交,甚至不必要对数据库结构和SQL语言有任何了解。用户可以直接在前端输入自然语言的题目,例如“查询上个月的销售报表”或"显示2023年所有产品的库存情况",系统会主动将这些自然语言题目传递给后端。
在后端,NL2SQL技术会剖析用户输入的自然语言,辨认其中的意图、关键词,并结合详细的数据库模式,主动生成相应的SQL查询语句。这个过程涉及自然语言处理(NLP)、语义剖析、数据库模式映射等多项核心技术。通过这些技术,系统能够理解用户的需求,并生成正确的SQL语句,从而直接查询数据库,获取数据。
查询结果同样可以直接返回给前端举行展示,或通过数据可视化工具举行图表化出现。这种技术避免了传统方法中数据开发职员手动编写SQL查询的繁琐步骤,提升了响应速率。更紧张的是,用户不再必要依赖技术职员举行数据查询,而是可以及时获得自己想要的结果。这不但低落了学习资本,还能明显的提高客户的满意度和体验感。

可能光说概念大家还是有所不解,我们以一个现实场景来看:假设一位销售司理必要在月度会议前预备一份关于公司销售情况的陈诉,通常,整个过程可能会涉及多次向数据分析师或IT团队请求资助,以编写SQL查询,导出数据并生成陈诉。那么传统方法一般为:
传统方法


  • 自然语言需求:销售司理提出题目:“我想知道上个月每个产品的销售额是多少,并按销售额排序。”
  • 沟通与转化:销售司理将这一需求发给数据分析师或IT职员。
  • SQL查询编写:数据分析师根据需求,手动编写SQL查询:
  1. SELECT product_name, SUM(sales_amount) AS total_sales
  2. FROM sales
  3. WHERE sale_date BETWEEN '2023-07-01' AND '2023-07-31'
  4. GROUP BY product_name
  5. ORDER BY total_sales DESC;
复制代码

  • 数据提取与导出:查询结果被导出并整理为Excel或其他格式,交给销售司理。
  • 陈诉制作:销售司理将导出的数据整合进陈诉中,进一步处理可视化图表。
这种传统方法必要多个步骤,涉及差别脚色的协作,尤其是在多个查询需求下,可能会导致反复沟通和修改SQL查询的过程。
NL2SQL技术


  • 自然语言查询:销售司理直接在系统中输入题目:“显示上个月每个产品的销售额,并按销售额排序。”
  • NL2SQL技术处理:系统通过自然语言处理,将该需求剖析为SQL查询,主动生成如下SQL语句:
  1. SELECT product_name, SUM(sales_amount) AS total_sales
  2. FROM sales
  3. WHERE sale_date BETWEEN '2023-07-01' AND '2023-07-31'
  4. GROUP BY product_name
  5. ORDER BY total_sales DESC;
复制代码

  • 主动执行查询并显示结果:系统执行查询,并将结果以表格或图表形式直接出现给销售司理。
  • 即时反馈与可视化:销售司理立刻获取数据,并可以根据必要进一步调整查询,如“按产品类别分类显示销售额”或“检察去年同期的销售数据”。
效果对比就十分明显了,使用NL2SQL后查询过程主动化,大幅紧缩了数据获取的时间。销售司理无需学习SQL语言,仅凭自然语言就可以完成复杂查询,低落了技术门槛。而且可以基于效果及时验证,查询可以随时举行,不再依赖技术团队的及时对接,查询结果直接以可视化形式出现,用户能够更直观理解数据,从而快速做出业务决议。
NL2SQL技术支持

NL2SQL工作流程


用户输入自然语言题目,例如“查找2020年销售额最高的产品。系统起首必要理解用户输入的查询意图。自然语言本质上是含糊其词的,使其容易受到多种表明的影响。解决这种歧义是自然语言处理中的一项关键使命,以确保人与机器之间的正确通讯。在实践中,情况可能并非总是如此。最终用户可能不知道(全部或部门)列或表的语义。因此最大的题目在于怎样让AI能够正确定位到数据的详细表字段信息。以是说做NL2SQL不可避免的必要业务强绑定,肯定要对地点范畴的数据集有清楚的认知,才气训练好语义大模型。
目前比力火的英文数据集有WikiSQL、Spider、WikiTableQuestions、ATIS等,中文数据集有刚刚竣事的中文首届NL2SQL挑衅赛公开的数据,各个数据集都有各自的特点,这里不详细开展论述,感爱好的可以自行探索。

根据数据集中SQL涉及到的数据库表的个数差别,分为单标和多表;根据所生成的SQL结构中是否包含嵌套查询,将数据集分为有嵌套和无嵌套。有个十分有意思的比赛
大家感爱好的可以去看看这个比赛的获奖作品和数据集,会对NL2SQL工作和研究有较为清楚的了解。

我们以一句现实语言来看整个SQL生成逻辑。
比如自然语言输入
“查找2023年所有销售额超过1000万的产品,并按销售额降序分列。”
目标SQL查询
  1. SELECT product_name, sales_amount
  2. FROM sales
  3. WHERE year = 2023 AND sales_amount > 10000000
  4. ORDER BY sales_amount DESC;
复制代码
1. 文本预处理

起首,系统对输入的自然语言举行预处理,预备后续的语义剖析步骤。


  • 分词(Tokenization):将句子拆分为单独的词或词组。

    • 结果:[“查找”, “2023年”, “所有”, “销售额”, “超过”, “1000万”, “的”, “产品”, “并”, “按”, “销售额”, “降序”, “分列”]
    • 算法:基于规则或深度学习模型的分词工具,如SpaCy、NLTK。

  • 去除停用词:去掉“的”、“并”等不影响语义的词汇。

    • 结果:[“查找”, “2023年”, “销售额”, “超过”, “1000万”, “产品”, “按”, “销售额”, “降序”, “分列”]
    • 算法:通过停用词辞书或手工定义的停用词表举行过滤。

2. 定名实体辨认(NER, Named Entity Recognition)

系统必要辨认出句子中的关键实体,如时间、数字和产品等。


  • 辨认内容

    • 时间实体:2023年
    • 数值实体:1000万
    • 产品实体:产品
    • 操纵实体:销售额,降序分列
    • 算法:NER模型,如基于BiLSTM-CRF或预训练模型(如BERT)的NER工具。

  • 输出

    • 时间:2023年
    • 数值:10000000
    • 操纵:查找、超过、降序分列
    • 实体:产品、销售额

3. 意图辨认

系统辨认用户的查询意图,即“查询”操纵。这里的意图辨认是关键步骤,它决定了SQL语句的类型(SELECT查询)。


  • 辨认出查询意图
    :查询数据,并按条件过滤。

    • 算法:基于分类模型(如SVM、LSTM)或预训练模型(如BERT)举行意图分类。

  • 结果:查询操纵,按条件筛选并排序。
4. 句法剖析(Syntax Parsing)

通过句法剖析分析自然语言输入的结构,确定主语、谓语、宾语等元素之间的关系。这里的目标是辨认出“查找”的对象是“产品”,“销售额超过1000万”是一个过滤条件。


  • 剖析句法结构

    • 查找(谓语) -> 产品(宾语)
    • 销售额超过1000万(条件) -> 降序分列(排序方式)
    • 算法:依存句法分析(Dependency Parsing),使用工具如Stanford NLP或SpaCy。

  • 结果

    • 主谓宾关系:查找产品
    • 条件关系:销售额 > 1000万
    • 排序:按销售额降序分列

5. 数据库模式映射

系统必要将自然语言中的实体(如“产品”、“销售额”)映射到数据库中的详细表和字段。这个步骤要求系统理解数据库的模式(Schema)。


  • 映射自然语言到数据库字段

    • “产品” -> product_name
    • “销售额” -> sales_amount
    • “2023年” -> year
    • 算法:基于规则的模式映射或通过训练模型学习自然语言与数据库字段之间的对应关系。

  • 结果

    • product_name
    • sales_amount
    • year

6. SQL模板生成

系统将辨认出的信息填充到SQL模板中。这里,模板生成的使命是将自然语言剖析出的结构信息转换为SQL语句的各个子句(SELECT、FROM、WHERE、ORDER BY)。
SQL模板
  1. SELECT {字段1, 字段2, ...}
  2. FROM {表}
  3. WHERE {条件1} AND {条件2} ...
  4. ORDER BY {排序字段} {排序方式};
复制代码
填充模板


  • SELECT:product_name, sales_amount
  • FROM:sales
  • WHERE:year = 2023 AND sales_amount > 10000000
  • ORDER BY:sales_amount DESC
  • 算法:基于模板匹配或生成模型(如Seq2Seq、Transformer)生成SQL语句。
结果:
  1. SELECT product_name, sales_amount
  2. FROM sales
  3. WHERE year = 2023 AND sales_amount > 10000000
  4. ORDER BY sales_amount DESC;
复制代码
这里的prompt可以提供一个现在行业广泛认可的:
  1. system = """
  2. Given the database schema below, generate a MySQL query based on the user's question. Ensure to consider totals from line items, inclusive date ranges, and correct data aggregation for summarization. Remember to handle joins, groupings, and orderings effectively.
  3. Database schema:
  4. - Customer (CustomerID, FirstName, LastName, Email, Phone, BillingAddress, ShippingAddress, CustomerSince, IsActive)
  5. - Employee (EmployeeID, FirstName, LastName, Email, Phone, HireDate, Position, Salary)
  6. - InventoryLog (LogID, ProductID, ChangeDate, QuantityChange, Notes)
  7. - LineItem (LineItemID, SalesOrderID, ProductID, Quantity, UnitPrice, Discount, TotalPrice)
  8. - Product (ProductID, ProductName, Description, UnitPrice, StockQuantity, ReorderLevel, Discontinued)
  9. - SalesOrder (SalesOrderID, CustomerID, OrderDate, RequiredDate, ShippedDate, Status, Comments, PaymentMethod, IsPaid)
  10. - Supplier (SupplierID, CompanyName, ContactName, ContactTitle, Address, Phone, Email)
  11. Guidelines for SQL query generation:
  12. 1. **Ensure Efficiency and Performance**: Opt for JOINs over subqueries where possible, use indexes effectively, and mention any specific performance considerations to keep in mind.
  13. 2. **Adapt to Specific Analytical Needs**: Tailor WHERE clauses, JOIN operations, and aggregate functions to precisely meet the analytical question being asked.
  14. 3. **Complexity and Variations**: Include a range from simple to complex queries, illustrating different SQL functionalities such as aggregate functions, string manipulation, and conditional logic.
  15. 4. **Handling Specific Cases**: Provide clear instructions on managing NULL values, ensuring date ranges are inclusive, and handling special data integrity issues or edge cases.
  16. 5. **Explanation and Rationale**: After each generated query, briefly explain why this query structure was chosen and how it addresses the analytical need, enhancing understanding and ensuring alignment with requirements.
  17. -- 1. Average Order Total for Customers without a Registered Phone Number Within a Specific Period
  18. SELECT AVG(TotalPrice) FROM LineItem
  19. JOIN SalesOrder ON LineItem.SalesOrderID = SalesOrder.SalesOrderID
  20. JOIN Customer ON SalesOrder.CustomerID = Customer.CustomerID
  21. WHERE Customer.Phone IS NULL AND SalesOrder.OrderDate BETWEEN '2003-01-01' AND '2009-12-31';
  22. -- Rationale: Analyzes spending behavior of uncontactable customers within a set timeframe, aiding targeted marketing strategies.
  23. -- 2. List Top 10 Employees by Salary
  24. SELECT * FROM Employee ORDER BY Salary DESC LIMIT 10;
  25. -- Rationale: Identifies highest-earning employees for payroll analysis and salary budgeting.
  26. -- 3. Find the Total Quantity of Each Product Sold Last Month
  27. SELECT Product.ProductName, SUM(LineItem.Quantity) AS TotalQuantitySold
  28. FROM Product
  29. JOIN LineItem ON Product.ProductID = LineItem.ProductID
  30. JOIN SalesOrder ON LineItem.SalesOrderID = SalesOrder.SalesOrderID
  31. WHERE SalesOrder.OrderDate BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
  32. GROUP BY Product.ProductID;
  33. -- Rationale: Helps in inventory management by highlighting sales performance of products, informing restocking decisions.
  34. -- 4. Show Sales by Customer for the Current Year, Including Customer Details
  35. SELECT Customer.FirstName, Customer.LastName, SUM(LineItem.TotalPrice) AS TotalSales
  36. FROM Customer
  37. JOIN SalesOrder ON Customer.CustomerID = SalesOrder.CustomerID
  38. JOIN LineItem ON SalesOrder.SalesOrderID = LineItem.SalesOrderID
  39. WHERE YEAR(SalesOrder.OrderDate) = YEAR(CURDATE())
  40. GROUP BY Customer.CustomerID;
  41. -- Rationale: Identifies top customers based on yearly sales, supporting personalized customer service and loyalty programs.
  42. -- 5. Identify Products That Need Reordering (Stock Quantity Below Reorder Level)
  43. SELECT ProductName, StockQuantity, ReorderLevel FROM Product WHERE StockQuantity <= ReorderLevel;
  44. -- Rationale: Essential for inventory control, prompting restocking of products to meet demand efficiently.
  45. -- 6. Display All Suppliers That Have Not Supplied Any Products That Are Currently Discontinued
  46. SELECT Supplier.CompanyName FROM Supplier
  47. LEFT JOIN Product ON Supplier.SupplierID = Product.SupplierID
  48. WHERE Product.Discontinued = 0
  49. GROUP BY Supplier.SupplierID;
  50. -- Rationale: Evaluates supplier contributions to the supply chain by focusing on those with active product lines.
  51. Remember to adapt queries based on the actual question context, utilizing the appropriate WHERE clauses, JOIN operations, and aggregate functions to meet the specific analytical needs.
  52. Sample records for the Supplier table:
  53. - SupplierID: 29, CompanyName: Hogan-Anderson, ContactName: Sierra Carey, ContactTitle: Mining engineer, Address: 246 Johnny Fords Apt. 858, Williamsport, AK 96920, Phone: 232.945.6443, Email: rodney04@example.com
  54. - SupplierID: 30, CompanyName: Nixon, Woods and Pearson, ContactName: Lawrence Phillips, ContactTitle: Aid worker, Address: USS Osborn, FPO AE 24294, Phone: 001-462-571-0185x478, Email: jessica29@example.org
  55. Sample records for the Product table:
  56. - ProductID: 1, ProductName: Reflect Sea, Description: Factor country center price pretty foreign theory paper fact machine two., UnitPrice: 191.19, StockQuantity: 665, ReorderLevel: 46, Discontinued: 1
  57. - ProductID: 2, ProductName: Avoid American, Description: Skill environmental start set bring must job early per weight difficult someone., UnitPrice: 402.14, StockQuantity: 970, ReorderLevel: 15, Discontinued: 1
  58. - ProductID: 3, ProductName: Evening By, Description: Whether high bill though each president another its., UnitPrice: 12.81, StockQuantity: 842, ReorderLevel: 32, Discontinued: 1
  59. - ProductID: 4, ProductName: Certain Identify, Description: Spring identify bring debate wrong style hit., UnitPrice: 155.22, StockQuantity: 600, ReorderLevel: 27, Discontinued: 1
  60. - ProductID: 5, ProductName: Impact Agreement, Description: Whom ready entire meeting consumer safe pressure truth., UnitPrice: 368.72, StockQuantity: 155, ReorderLevel: 35, Discontinued: 0
  61. - ProductID: 6, ProductName: Million Agreement, Description: Glass why team yes reduce issue nothing., UnitPrice: 297.03, StockQuantity: 988, ReorderLevel: 36, Discontinued: 1
  62. - ProductID: 7, ProductName: Foot Vote, Description: Anyone floor movie maintain TV new age prove certain really dog., UnitPrice: 28.75, StockQuantity: 828, ReorderLevel: 24, Discontinued: 0
  63. - ProductID: 8, ProductName: Somebody Current, Description: Politics since exactly film idea Republican., UnitPrice: 202.9, StockQuantity: 317, ReorderLevel: 18, Discontinued: 0
  64. - ProductID: 9, ProductName: Somebody Character, Description: Long agreement history administration purpose conference including., UnitPrice: 300.38, StockQuantity: 242, ReorderLevel: 30, Discontinued: 1
  65. - ProductID: 10, ProductName: Low Idea, Description: Spend guess somebody spend fight director technology find between college skill., UnitPrice: 34.68, StockQuantity: 65, ReorderLevel: 27, Discontinued: 0
  66. Use the above schema and sample records to generate syntactically correct SQL queries. For example, to query the list of discontinued products, or to find products below a certain stock quantity.
  67. Sample records for the Employee table:
  68. - EmployeeID: 1, FirstName: Danny, LastName: Morales, Email: catherine08@example.com, Phone: 001-240-574-6687x625, HireDate: 2021-06-16, Position: Medical technical officer, Salary: 36293
  69. - EmployeeID: 2, FirstName: William, LastName: Spencer, Email: sthompson@example.com, Phone: (845)940-2095x693, HireDate: 2023-08-22, Position: English as a foreign language teacher, Salary: 51775
  70. - EmployeeID: 3, FirstName: Brian, LastName: Stark, Email: hughesmelissa@example.com, Phone: 780.299.1965x06374, HireDate: 2023-02-24, Position: Pharmacologist, Salary: 11963
  71. - EmployeeID: 4, FirstName: Sarah, LastName: Cannon, Email: brittney20@example.com, Phone: 512.717.8995x05793, HireDate: 2019-05-23, Position: Physiological scientist, Salary: 69878
  72. - EmployeeID: 5, FirstName: Lance, LastName: Bell, Email: patrick57@example.net, Phone: +1-397-320-2600x803, HireDate: 2019-06-22, Position: Scientific laboratory technician, Salary: 56499
  73. - EmployeeID: 6, FirstName: Jason, LastName: Larsen, Email: teresaharris@example.org, Phone: +1-541-955-5657x7357, HireDate: 2022-11-02, Position: Proofreader, Salary: 89756
  74. - EmployeeID: 7, FirstName: Kyle, LastName: Baker, Email: nathanielmiller@example.net, Phone: +1-863-658-3715x6525, HireDate: 2019-10-30, Position: Firefighter, Salary: 96795
  75. - EmployeeID: 8, FirstName: Jennifer, LastName: Hernandez, Email: sarah43@example.org, Phone: 267-588-3195, HireDate: 2021-01-10, Position: Designer, interior/spatial, Salary: 37584
  76. - EmployeeID: 9, FirstName: Shane, LastName: Meyer, Email: perrystanley@example.org, Phone: 001-686-918-6486, HireDate: 2021-04-14, Position: Retail manager, Salary: 69688
  77. - EmployeeID: 10, FirstName: Christine, LastName: Powell, Email: tanderson@example.org, Phone: 427.468.2131, HireDate: 2019-05-11, Position: Sports administrator, Salary: 39962
  78. Use the above schema and sample records to generate syntactically correct SQL queries. For example, to query the top 10 employees by salary, or to find employees hired within a specific period.
  79. Sample records for the Customer table:
  80. - CustomerID: 1, FirstName: Sandra, LastName: Cruz, Email: rhonda24@example.net, Phone: 511-949-6987x21174, BillingAddress: "18018 Kyle Streets Apt. 606, Shaneville, AZ 85788", ShippingAddress: "18018 Kyle Streets Apt. 606, Shaneville, AZ 85788", CustomerSince: 2023-05-02, IsActive: 0
  81. - CustomerID: 2, FirstName: Robert, LastName: Williams, Email: traciewall@example.net, Phone: 944-649-2491x60774, BillingAddress: "926 Mitchell Pass Apt. 342, Brianside, SC 83374", ShippingAddress: "926 Mitchell Pass Apt. 342, Brianside, SC 83374", CustomerSince: 2020-09-01, IsActive: 0
  82. - CustomerID: 3, FirstName: John, LastName: Greene, Email: travis92@example.org, Phone: 279.334.1551, BillingAddress: "36019 Bill Manors Apt. 219, Dominiquefort, AK 55904", ShippingAddress: "36019 Bill Manors Apt. 219, Dominiquefort, AK 55904", CustomerSince: 2021-03-15, IsActive: 0
  83. - CustomerID: 4, FirstName: Steven, LastName: Riley, Email: greennathaniel@example.org, Phone: +1-700-682-7696x189, BillingAddress: "76545 Hebert Crossing Suite 235, Forbesbury, MH 14227", ShippingAddress: "76545 Hebert Crossing Suite 235, Forbesbury, MH 14227", CustomerSince: 2022-12-05, IsActive: 0
  84. - CustomerID: 5, FirstName: Christina, LastName: Blake, Email: christopher87@example.net, Phone: 584.263.4429, BillingAddress: "8342 Shelly Fork, West Chasemouth, CT 81799", ShippingAddress: "8342 Shelly Fork, West Chasemouth, CT 81799", CustomerSince: 2019-11-12, IsActive: 0
  85. - CustomerID: 6, FirstName: Michael, LastName: Stevenson, Email: lynnwilliams@example.org, Phone: 328-637-4320x7025, BillingAddress: "7503 Mallory Mountains Apt. 199, Meganport, MI 81064", ShippingAddress: "7503 Mallory Mountains Apt. 199, Meganport, MI 81064", CustomerSince: 2024-01-01, IsActive: 1
  86. - CustomerID: 7, FirstName: Anna, LastName: Kramer, Email: steven23@example.org, Phone: +1-202-719-6886x844, BillingAddress: "295 Mcgee Fort, Manningberg, PR 93309", ShippingAddress: "295 Mcgee Fort, Manningberg, PR 93309", CustomerSince: 2022-03-06, IsActive: 1
  87. - CustomerID: 8, FirstName: Michael, LastName: Sullivan, Email: bbailey@example.com, Phone: 988.368.5033, BillingAddress: "772 Bruce Motorway Suite 583, Powellbury, MH 42611", ShippingAddress: "772 Bruce Motorway Suite 583, Powellbury, MH 42611", CustomerSince: 2019-03-23, IsActive: 1
  88. - CustomerID: 9, FirstName: Kevin, LastName: Moody, Email: yoderjennifer@example.org, Phone: 3425196543, BillingAddress: "371 Lee Lake, New Michaelport, CT 99382", ShippingAddress: "371 Lee Lake, New Michaelport, CT 99382", CustomerSince: 2023-12-03, IsActive: 1
  89. - CustomerID: 10, FirstName: Jeremy, LastName: Mejia, Email: spencersteven@example.org, Phone: 449.324.7097, BillingAddress: "90137 Harris Garden, Matthewville, IA 39321", ShippingAddress: "90137 Harris Garden, Matthewville, IA 39321", CustomerSince: 2019-05-20, IsActive: 1
  90. These sample records provide a clear representation of the data structure for customers within the database schema. Use these details to assist in generating queries that involve customer information, such as filtering active customers, summarizing sales by customer, or identifying long-term customers.
  91. Sample records for the InventoryLog table:
  92. - LogID: 1, ProductID: 301, ChangeDate: 2023-09-08, QuantityChange: 84, Notes: Inventory increased
  93. - LogID: 2, ProductID: 524, ChangeDate: 2023-08-09, QuantityChange: -84, Notes: Inventory decreased
  94. - LogID: 3, ProductID: 183, ChangeDate: 2023-04-17, QuantityChange: -51, Notes: Inventory decreased
  95. - LogID: 4, ProductID: 390, ChangeDate: 2023-02-27, QuantityChange: 80, Notes: Inventory increased
  96. - LogID: 5, ProductID: 737, ChangeDate: 2023-11-15, QuantityChange: 24, Notes: Inventory increased
  97. - LogID: 6, ProductID: 848, ChangeDate: 2023-11-22, QuantityChange: 69, Notes: Inventory increased
  98. - LogID: 7, ProductID: 534, ChangeDate: 2023-06-06, QuantityChange: -61, Notes: Inventory decreased
  99. - LogID: 8, ProductID: 662, ChangeDate: 2024-01-16, QuantityChange: 70, Notes: Inventory increased
  100. - LogID: 9, ProductID: 969, ChangeDate: 2024-01-07, QuantityChange: -25, Notes: Inventory decreased
  101. - LogID: 10, ProductID: 640, ChangeDate: 2023-08-08, QuantityChange: -13, Notes: Inventory decreased
  102. These sample records provide insights into the inventory adjustments for different products within the database schema. Utilize these details to assist in generating queries that track inventory changes, analyze stock levels, or evaluate inventory management efficiency.
  103. Sample records for the LineItem table:
  104. - LineItemID: 1, SalesOrderID: 280, ProductID: 290, Quantity: 3, UnitPrice: 84.59, Discount: NULL, TotalPrice: 253.77
  105. - LineItemID: 2, SalesOrderID: 94, ProductID: 249, Quantity: 6, UnitPrice: 88.7, Discount: NULL, TotalPrice: 532.2
  106. - LineItemID: 3, SalesOrderID: 965, ProductID: 247, Quantity: 1, UnitPrice: 43.44, Discount: NULL, TotalPrice: 43.44
  107. - LineItemID: 4, SalesOrderID: 173, ProductID: 16, Quantity: 10, UnitPrice: 26.3, Discount: NULL, TotalPrice: 263
  108. - LineItemID: 5, SalesOrderID: 596, ProductID: 191, Quantity: 9, UnitPrice: 59.44, Discount: NULL, TotalPrice: 534.96
  109. - LineItemID: 6, SalesOrderID: 596, ProductID: 308, Quantity: 8, UnitPrice: 33.11, Discount: NULL, TotalPrice: 264.88
  110. - LineItemID: 7, SalesOrderID: 960, ProductID: 758, Quantity: 5, UnitPrice: 64.47, Discount: NULL, TotalPrice: 322.35
  111. - LineItemID: 8, SalesOrderID: 148, ProductID: 288, Quantity: 5, UnitPrice: 65.21, Discount: NULL, TotalPrice: 326.05
  112. - LineItemID: 9, SalesOrderID: 974, ProductID: 706, Quantity: 3, UnitPrice: 59.86, Discount: NULL, TotalPrice: 179.58
  113. - LineItemID: 10, SalesOrderID: 298, ProductID: 998, Quantity: 2, UnitPrice: 75.79, Discount: NULL, TotalPrice: 151.58
  114. These sample records illustrate various line items associated with sales orders in the database. These details help in constructing queries to analyze sales performance, product popularity, pricing strategies, and overall sales revenue.
  115. Sample records for the SalesOrder table:
  116. - SalesOrderID: 1, CustomerID: 12, OrderDate: 2022-11-05, RequiredDate: 2022-12-02, ShippedDate: 2022-11-25, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
  117. - SalesOrderID: 2, CustomerID: 56, OrderDate: 2022-02-22, RequiredDate: 2022-03-08, ShippedDate: 2022-03-17, Status: Completed, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
  118. - SalesOrderID: 3, CustomerID: 63, OrderDate: 2023-03-20, RequiredDate: 2023-03-27, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
  119. - SalesOrderID: 4, CustomerID: 21, OrderDate: 2023-04-29, RequiredDate: 2023-05-26, ShippedDate: 2023-05-14, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
  120. - SalesOrderID: 5, CustomerID: 16, OrderDate: 2022-11-05, RequiredDate: 2022-11-30, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
  121. - SalesOrderID: 6, CustomerID: 46, OrderDate: 2023-10-06, RequiredDate: 2023-10-27, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
  122. - SalesOrderID: 7, CustomerID: 47, OrderDate: 2023-02-08, RequiredDate: 2023-02-25, ShippedDate: 2023-03-03, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
  123. - SalesOrderID: 8, CustomerID: 70, OrderDate: 2022-07-29, RequiredDate: 2022-08-18, ShippedDate: 2022-08-10, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
  124. - SalesOrderID: 9, CustomerID: 14, OrderDate: 2022-03-29, RequiredDate: 2022-04-15, ShippedDate: 2022-04-17, Status: Completed, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
  125. - SalesOrderID: 10, CustomerID: 31, OrderDate: 2024-01-12, RequiredDate: 2024-01-31, ShippedDate: 2024-02-07, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
  126. These sample records provide insights into sales order management within the database, including order status, shipping details, payment methods, and customer IDs. This information is crucial for analyzing sales processes, order fulfillment rates, customer engagement, and payment transactions.
  127. """
复制代码
7. SQL查询优化

在生成SQL语句之后,系统可以对SQL查询举行优化,以提高查询服从。例如,添加索引、优化查询操持等。


  • 优化步骤

    • 确定索引:在year和sales_amount字段上检查是否有适当的索引。
    • 重写查询:如果有必要,系统可以通过查询优化器重写查询以提高性能。
    • 算法:查询优化算法,如基于资本的查询优化器(Cost-Based Optimizer, CBO)。

  • 结果:优化后的SQL语句(此例中原SQL语句已经较为简单,无需进一步优化)。
8. SQL执行与结果展示

优化后的SQL语句被发送到数据库执行,数据库返回结果集。系统会将查询结果转换为用户友爱的形式举行展示,如表格、图表或其他可视化形式。


  • SQL执行:通过数据库连接,执行生成的SQL查询。
  • 结果展示:将数据库返回的结果集以表格或图表的形式出现给用户。

    • 算法:基于数据库的查询引擎举行执行,结果展示使用可视化工具(如Matplotlib、Plotly)。

  • 示例输出

    •      product_namesales_amount产品A15000000产品B12000000产品C11000000

评价该模型的两个指标:正确匹配率、执行正确率


  • Execution Accuracy

    • 定义:计算SQL执行结果正确的数量在数据集中的比例,结果存在高估的可能。

  • Exact Match

    • 定义:计算模型生成的SQL和标注SQL的匹配程度,结果存在低估的可能。


其中, N表现数据量,                                    S                         Q                                   L                            ′                                       SQL'                  SQL′和                                   S                         Q                         L                              SQL                  SQL分别代表猜测的SQL语句和真实的SQL语句,                                    S                         c                         o                         r                                   e                                       l                               f                                                 Score_{lf}                  Scorelf​表现                                   L                         o                         g                         i                         c                         F                         o                         r                         m                              LogicForm                  LogicForm正确率;                                             Y                            ′                                       Y'                  Y′和                                   Y                              Y                  Y分别代表猜测的SQL和真实的SQL的执行结果,                                    S                         c                         o                         r                                   e                                       e                               x                                                 Score_{ex}                  Scoreex​表现                                   E                         x                         e                         c                         u                         t                         i                         o                         n                              Execution                  Execution正确率。
总结

NL2SQL的出现,彻底改变了人与数据交互的方式。它通过将复杂的SQL查询隐藏在自然语言输入背后,极大地低落了数据获取的门槛,让业务职员无需依赖技术配景就能直接获取所需的信息。随着自然语言处理技术的不停进步,NL2SQL的应用场景将愈加广泛,覆盖从企业报表到智能客服等各个范畴。将来,随着模型的泛化本事增强和及时性能优化,我们可以期待NL2SQL技术在数据驱动的决议中饰演更加紧张的脚色,让“零资本与数据沟通”真正成为可能。

免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。

本帖子中包含更多资源

您需要 登录 才可以下载或查看,没有账号?立即注册

x
回复

使用道具 举报

0 个回复

倒序浏览

快速回复

您需要登录后才可以回帖 登录 or 立即注册

本版积分规则

我爱普洱茶

金牌会员
这个人很懒什么都没写!

标签云

快速回复 返回顶部 返回列表