LLM - Tutorial: Visualizing a GraphRAG-Built Knowledge Graph (KG) with Neo4j ...

Posted 2024-10-23 11:50:02

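Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/142938982
  Disclaimer: This article is based on personal knowledge and publicly available material. It is intended for academic exchange only. Discussion is welcome, but reposting is not permitted.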


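Neo4j is a high-performance graph database that stores and retrieves data as a graph. This model is well suited to complex relationships and network structures, and Neo4j is widely used for its strength in handling relationships between data, especially in social networks, recommendation systems, and network analysis.
   To build the GraphRAG knowledge graph itself, see: Configuring GraphRAG + Ollama to Build a Chinese Knowledge Graph (pitfall notes)

  • Docs: https://neo4j.com/docs/apoc/current/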

1. Configuring the Neo4j Service

Prepare the Docker image (see Docker - Neo4j):
    docker pull neo4j:5.24.1
Start the container (this launches the Neo4j service directly):
    docker run --network=host --gpus all --rm --name neo4j-apoc \
      -e NEO4J_apoc_export_file_enabled=true \
      -e NEO4J_apoc_import_file_enabled=true \
      -e NEO4J_apoc_import_file_use__neo4j__config=true \
      -e NEO4J_PLUGINS='["apoc"]' \
      --volume=[your folder]:[your folder] \
      neo4j:5.24.1
Alternatively, start the container with a shell and launch the service manually:
    docker run --network=host --gpus all -it --name neo4j-apoc \
      -e NEO4J_apoc_export_file_enabled=true \
      -e NEO4J_apoc_import_file_enabled=true \
      -e NEO4J_apoc_import_file_use__neo4j__config=true \
      -e NEO4J_PLUGINS='["apoc"]' \
      --volume=[your folder]:[your folder] \
      neo4j:5.24.1 /bin/bash
    # Inside the container:
    bin/neo4j start
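  Note: use a Neo4j container with the APOC plugin enabled. APOC (Awesome Procedures on Cypher) is a plugin for the Neo4j graph database that provides a set of powerful procedures and functions extending the Cypher query language. See: Neo4J and APOC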
  Log output:
    Installing Plugin 'apoc' from /var/lib/neo4j/labs/apoc-*-core.jar to /var/lib/neo4j/plugins/apoc.jar
    Applying default values for plugin apoc to neo4j.conf
    2024-10-15 01:40:54.429+0000 INFO  Logging config in use: File '/var/lib/neo4j/conf/user-logs.xml'
    2024-10-15 01:40:54.443+0000 INFO  Starting...
    2024-10-15 01:40:55.191+0000 INFO  This instance is ServerId{0350f51a} (0350f51a-ef80-414f-b82f-8e4b38fc369f)
    2024-10-15 01:40:56.078+0000 INFO  ======== Neo4j 5.24.1 ========
    2024-10-15 01:40:58.875+0000 INFO  Anonymous Usage Data is being sent to Neo4j, see https://neo4j.com/docs/usage-data/
    2024-10-15 01:40:58.910+0000 INFO  Bolt enabled on 0.0.0.0:7687.
    2024-10-15 01:40:59.325+0000 INFO  HTTP enabled on 0.0.0.0:7474.
    2024-10-15 01:40:59.326+0000 INFO  Remote interface available at http://localhost:7474/
    2024-10-15 01:40:59.328+0000 INFO  id: 3C118963730B6744966FCB5FC5D9D5795B11AD1F791A4DDC113D02D1F926441F
    2024-10-15 01:40:59.329+0000 INFO  name: system
    2024-10-15 01:40:59.329+0000 INFO  creationDate: 2024-10-15T01:40:57.342Z
    2024-10-15 01:40:59.329+0000 INFO  Started.
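Open the browser UI at http://[your ip]:7474/browser/. The default username and password are both neo4j. On first login you must set a new password (xxxxxx below), for example neo4j123 (your choice).
On the start page, note that the database is still empty: there are no entities or relationships yet.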
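If the browser page does not load, it can help to confirm that the two ports reported in the log (HTTP on 7474, Bolt on 7687) are reachable. The following is a minimal sketch using only the Python standard library. The helper name port_is_open and the host "localhost" are assumptions for illustration, not part of the original setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is an assumption; use [your ip] when checking a remote host.
    for name, port in (("HTTP", 7474), ("Bolt", 7687)):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name} port {port}: {state}")
```

A closed port usually means the container is not running or the service has not finished starting.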

2. Importing the Knowledge Graph Data

Neo4j stores its data under /var/lib/neo4j/data/databases/neo4j, where neo4j is the database name.
Read the knowledge graph data produced by GraphRAG:
    import os
    import pandas as pd

    rag_dir = "[your folder]/llm/graphrag/ragtest/output/"
    entities = pd.read_parquet(os.path.join(rag_dir, "create_final_entities.parquet"))
    relationships = pd.read_parquet(os.path.join(rag_dir, "create_final_relationships.parquet"))
    text_units = pd.read_parquet(os.path.join(rag_dir, "create_final_text_units.parquet"))
    communities = pd.read_parquet(os.path.join(rag_dir, "create_final_communities.parquet"))
    community_reports = pd.read_parquet(os.path.join(rag_dir, "create_final_community_reports.parquet"))
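Before reading the files, it may be worth checking that all five parquet files named above actually exist, since output file names may differ between GraphRAG versions. This is a small sketch under that assumption. The helper missing_outputs is hypothetical and uses only the standard library.

```python
from pathlib import Path

# File names taken from the read_parquet calls in this tutorial.
EXPECTED_FILES = [
    "create_final_entities.parquet",
    "create_final_relationships.parquet",
    "create_final_text_units.parquet",
    "create_final_communities.parquet",
    "create_final_community_reports.parquet",
]


def missing_outputs(rag_dir):
    """Return the expected parquet file names that are not present in rag_dir."""
    root = Path(rag_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling missing_outputs(rag_dir) returns an empty list when everything is in place, and the missing names otherwise.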
Inspect the data:
    # Print the first two rows of each table.
    print(entities.head(2))
    print(relationships.head(2))
    print(text_units.head(2))
    print(communities.head(2))
    print(community_reports.head(2))
Connect to the server:
    from neo4j import GraphDatabase

    NEO4J_URI = "neo4j://localhost:7687"
    NEO4J_USERNAME = "neo4j"
    NEO4J_PASSWORD = "xxxxxx"        # the password set earlier
    NEO4J_DATABASE = "neo4j"          # default database
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
  Note: the Community Edition cannot create new databases, so only the default neo4j database can be used. The creation command would be CREATE DATABASE my-database.
  Data import function:
    def import_data(cypher, df, batch_size=1000):
        """Run cypher once per batch of rows from df; each row is exposed as `value`."""
        for i in range(0, len(df), batch_size):
            batch = df.iloc[i:i + batch_size]
            result = driver.execute_query(
                "UNWIND $rows AS value " + cypher,
                rows=batch.to_dict("records"),
                database_=NEO4J_DATABASE,
            )
            # Print the write counters (nodes, relationships, properties) for this batch.
            print(result.summary.counters)
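The batching in import_data can be illustrated without a database. The sketch below uses a plain list of dicts in place of the DataFrame and shows how the rows are split into chunks of at most batch_size. The helper iter_batches is a hypothetical name introduced only for this illustration.

```python
def iter_batches(rows, batch_size=1000):
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        # Slicing past the end is safe: the last batch is simply shorter.
        yield rows[start:start + batch_size]


rows = [{"id": i} for i in range(2500)]
sizes = [len(batch) for batch in iter_batches(rows)]
print(sizes)  # [1000, 1000, 500]
```

Each batch becomes the $rows parameter of a single query, so a table with 2500 rows results in three round trips rather than 2500.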
Import the text_units:
    # Import text_units as __Chunk__ nodes and link them to their documents
    cypher_text_units = """
    MERGE (c:__Chunk__ {id:value.id})
    SET c += value {.text, .n_tokens}
    WITH c, value
    UNWIND value.document_ids AS document
    MATCH (d:__Document__ {id:document})
    MERGE (c)-[:PART_OF]->(d)
    """
    import_data(cypher_text_units, text_units)
On success, the log shows:
    {'_contains_updates': True, 'labels_added': 99, 'relationships_created': 235, 'nodes_created': 99, 'properties_set': 396}
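The text_units query matches __Document__ nodes, but this tutorial does not show a step that creates them. If no such nodes exist, the PART_OF relationships cannot be created. The following is a minimal sketch of a document import to run first. It assumes that GraphRAG also wrote a create_final_documents.parquet file with id and title columns. The file name, the columns, and the helper document_rows are all assumptions and are not confirmed by the original article.

```python
# Cypher in the same style as the other import statements in this tutorial.
# Assumption: each document row has an `id` and (optionally) a `title`.
CYPHER_DOCUMENTS = """
MERGE (d:__Document__ {id: value.id})
SET d += value {.title}
"""


def document_rows(records):
    """Keep only the id and title of each document record (title may be None)."""
    return [{"id": r["id"], "title": r.get("title")} for r in records]
```

With the article's helper this could be used as import_data(CYPHER_DOCUMENTS, documents), where documents is a DataFrame read from the assumed documents file, before importing text_units.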
Import the entities:
    # Import entities, set their embedding and type label, and link them to chunks
    cypher_entities = """
    MERGE (e:__Entity__ {id:value.id})
    SET e += value {.human_readable_id, .description, name:replace(value.name,'"','')}
    WITH e, value
    CALL db.create.setNodeVectorProperty(e, "description_embedding", value.description_embedding)
    CALL apoc.create.addLabels(e, case when coalesce(value.type,"") = "" then [] else [apoc.text.upperCamelCase(replace(value.type,'"',''))] end) yield node
    UNWIND value.text_unit_ids AS text_unit
    MATCH (c:__Chunk__ {id:text_unit})
    MERGE (c)-[:HAS_ENTITY]->(e)
    """
    import_data(cypher_entities, entities)
Import the relationships:
    # Import relationships between entities, matched by (quote-stripped) name
    cypher_relationships = """
    MATCH (source:__Entity__ {name:replace(value.source,'"','')})
    MATCH (target:__Entity__ {name:replace(value.target,'"','')})
    // not necessary to merge on id as there is only one relationship per pair
    MERGE (source)-[rel:RELATED {id: value.id}]->(target)
    SET rel += value {.rank, .weight, .human_readable_id, .description, .text_unit_ids}
    RETURN count(*) as createdRels
    """
    import_data(cypher_relationships, relationships)
Import the communities:
    # Import communities and attach the entities at both ends of each member relationship
    cypher_communities = """
    MERGE (c:__Community__ {community:value.id})
    SET c += value {.level, .title}
    /*
    UNWIND value.text_unit_ids as text_unit_id
    MATCH (t:__Chunk__ {id:text_unit_id})
    MERGE (c)-[:HAS_CHUNK]->(t)
    WITH distinct c, value
    */
    WITH *
    UNWIND value.relationship_ids as rel_id
    MATCH (start:__Entity__)-[:RELATED {id:rel_id}]->(end:__Entity__)
    MERGE (start)-[:IN_COMMUNITY]->(c)
    MERGE (end)-[:IN_COMMUNITY]->(c)
    RETURN count(distinct c) as createdCommunities
    """
    import_data(cypher_communities, communities)
Import the community_reports:
    # Import community reports onto existing communities and add their findings
    cypher_community_reports = """MATCH (c:__Community__ {community: value.community})
    SET c += value {.level, .title, .rank, .rank_explanation, .full_content, .summary}
    WITH c, value
    UNWIND range(0, size(value.findings)-1) AS finding_idx
    WITH c, value, finding_idx, value.findings[finding_idx] as finding
    MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
    SET f += finding"""
    import_data(cypher_community_reports, community_reports)
3. Testing the Result

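Open the Neo4j Browser page to visualize the knowledge graph. The sidebar now lists the imported Node labels and Relationship types, which can be clicked to explore the graph.

   For visualizing other knowledge graph elements, see the Neo4j documentation.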
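Besides the browser view, the import can also be sanity-checked from Python by counting nodes per label. This is a minimal sketch that assumes a driver object like the one created earlier. The function name count_nodes_by_label is hypothetical, and the Cypher query is an illustration rather than something taken from the original article.

```python
def count_nodes_by_label(driver, database="neo4j"):
    """Return a dict mapping each node label to the number of nodes carrying it."""
    query = (
        "MATCH (n) UNWIND labels(n) AS label "
        "RETURN label, count(*) AS count ORDER BY label"
    )
    # execute_query returns (records, summary, keys); only the records are needed.
    records, _, _ = driver.execute_query(query, database_=database)
    return {record["label"]: record["count"] for record in records}
```

For example, counts = count_nodes_by_label(driver) should contain labels such as __Chunk__, __Entity__, and __Community__ once the imports above have run. A missing label suggests that the corresponding import matched nothing.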
