Get Started with Amazon Web Services in 5 Minutes from Scratch: The NLP Text-Understanding AI Service

莱莱 · 2024-8-27 16:30:38


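Introduction:

Welcome to the new Amazon Web Services (AWS) cloud computing learning series by Xiao Li Ge (小李哥). It is written for developers with no background in cloud computing or AWS. The aim of this article is to walk you, starting from zero and in about 5 minutes, through one classic service development architecture on AWS.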
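Each day I introduce one cloud development or architecture solution built on AWS, to help you quickly learn AWS best practices and apply them in your daily work. This installment shows how to use Amazon Comprehend to train a custom entity recognition model from training data stored in S3, and how to expose that model through a real-time API endpoint so that applications can pick out domain-specific terms in text. The architecture diagram for this solution is as follows: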

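Background knowledge for this solution

What is Amazon Comprehend?

Amazon Comprehend is a natural language processing (NLP) service from AWS that helps users extract valuable insights and information from unstructured text. Using machine learning, Comprehend can automatically identify entities, key phrases, sentiment, language, and more, which helps businesses analyze customer feedback, social media content, articles, and other text data.
Comprehend has a wide range of applications. For example, it can be used for sentiment analysis, helping a business understand how customers feel about a product or service. It can also be used for text classification, automatically sorting documents or content into predefined categories. In addition, Comprehend can identify entities such as people, places, and organizations, and extract the main topics of a text, which supports information management and decision making.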
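
As a rough illustration of the built-in analyses described above, the sketch below calls two of Comprehend's pre-trained APIs through Boto3 and reduces the responses to a short summary. The region and the helper names `summarize` and `analyze_text` are assumptions made for this example, and `analyze_text` needs AWS credentials with Comprehend permissions to run.

```python
def summarize(sentiment_response, entities_response):
    """Reduce raw Comprehend responses to a small, readable summary."""
    return {
        "sentiment": sentiment_response["Sentiment"],
        "entities": [(e["Text"], e["Type"]) for e in entities_response["Entities"]],
    }


def analyze_text(text, region_name="us-east-1"):
    """Call the pre-trained sentiment and entity APIs (needs AWS credentials)."""
    import boto3  # imported here so summarize() can be used without the SDK

    comprehend = boto3.client("comprehend", region_name=region_name)
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    return summarize(sentiment, entities)
```

Keeping the response handling in `summarize` separate from the network call makes it possible to try the parsing logic on a saved response without touching AWS.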

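What is Amazon Comprehend's custom entity recognition feature?

Recognizing custom entities:

Custom Entity Recognition lets users train a model, based on their own business needs, to recognize custom entities in text such as product names, contract clauses, or industry-specific terms, so that key information can be extracted more precisely.

Improving the precision of data analysis:

With customized entity recognition, businesses can analyze and classify text data more accurately, which improves the precision of data processing and supports deeper business insight and decision making.

Integrating with existing workflows:

Custom Entity Recognition can be integrated into existing business processes and applications without complex setup or coding, so it can be deployed quickly and start delivering business value.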

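What this solution covers

1. Use Amazon Comprehend to create a custom text-understanding model that recognizes specific proprietary terms

2. Use the model's API endpoint to analyze text in real time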

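Detailed steps to build the project

1. First, open the AWS console and go to the S3 service.

2. Create an S3 bucket named "databucket-us-west-2-172330051" (bucket names are globally unique, so substitute a name of your own).

3. Upload the dataset used to train Amazon Comprehend.
The dataset contains two files. The first file, "documents.txt", is the raw text used to train the model.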

"Pearson Boosts Security and Productivity Using Amazon Elasticsearch Service"
"2020"
"Global educational media company Pearson needed a more efficient way to analyze and gain insights from its log data. With a number of teams in various locations using Elasticsearch\u2014the popular open-source tool for search and log analytics\u2014Pearson found that keeping track of log data and managing updates led to high operating costs. Faced with this, as well as increasingly complex security log management and analysis, the company found a solution on Amazon Web Services (AWS). Pearson quickly saw improvements by migrating from its self-managed open-source Elasticsearch architecture to Amazon Elasticsearch Service, a fully managed service that makes it easy to deploy, secure, and run Elasticsearch cost effectively at scale. Rather than spending considerable time and resources on managing the Elasticsearch clusters on its own, Pearson used the managed Amazon Elasticsearch Service as part of its initiative to modernize its products. "


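The second file, "annotations.csv", labels spans of the text file with entity types. For example, "AWS_SERVICE" marks a span as the name of an AWS service. These labels tell the model what to learn from the text file.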

File,Line,Begin Offset,End Offset,Type
documents.txt,0,47,75,AWS_SERVICE
documents.txt,2,167,180,AWS_SERVICE
documents.txt,2,453,479,AWS_SERVICE
documents.txt,2,590,610,AWS_SERVICE
documents.txt,2,860,888,AWS_SERVICE
documents.txt,5,17,45,AWS_SERVICE
documents.txt,7,0,26,JOB_TITLE
documents.txt,7,31,56,JOB_TITLE
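
To make the offsets concrete: each row points at a character span within one line of documents.txt, and slicing that line with [Begin Offset:End Offset] should return the annotated text. The sketch below checks this for the first row of the annotations. It assumes the surrounding double quotes shown in the excerpt above are not part of the stored line, since the offsets only line up without them. `annotated_spans` is a hypothetical helper name.

```python
import csv
import io


def annotated_spans(document_lines, annotations_csv):
    """Yield (entity_type, text) for each annotation row whose line is available."""
    for row in csv.DictReader(io.StringIO(annotations_csv)):
        line_no = int(row["Line"])
        if line_no >= len(document_lines):
            continue  # the excerpt may not contain every annotated line
        begin, end = int(row["Begin Offset"]), int(row["End Offset"])
        yield row["Type"], document_lines[line_no][begin:end]


# Line 0 of documents.txt, without the surrounding quotes (an assumption).
lines = ["Pearson Boosts Security and Productivity Using Amazon Elasticsearch Service"]
annotations = (
    "File,Line,Begin Offset,End Offset,Type\n"
    "documents.txt,0,47,75,AWS_SERVICE\n"
    "documents.txt,2,167,180,AWS_SERVICE\n"
)
spans = list(annotated_spans(lines, annotations))
print(spans)  # [('AWS_SERVICE', 'Amazon Elasticsearch Service')]
```

Running a check like this over the full files before training is a cheap way to catch offsets that have drifted by a character or two.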


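4. Next, go to the Amazon Comprehend service.

5. Click "Launch" to open Comprehend.

6. Click the "Custom entity recognition" page on the left, then click "Create New Model" on the right to create a custom model.

7. Name the model "aws-entity-recognizer", set the version to 1, and add the custom entity types defined in the annotations file: "AWS_SERVICE" and "JOB_TITLE".

8. Select the dataset type "Using annotations and training docs", then add the annotations file and the text file that we uploaded to S3.

9. Add an IAM role for the model so that it can access the data in the S3 bucket.

10. Click Create to start training. Wait until training completes and the model enters the Trained state, then click "aws-entity-recognizer" to open the model.

11. Click Performance to view the model's evaluation scores. All of the score metrics here are 100, which meets our needs.

12. Next, create an API endpoint for the model so that it can be called through the API.

13. Name the endpoint "aws-entity-recognizer-endpoint", select custom entity recognition as the model type, and select the model we just trained, "aws-entity-recognizer". Click Create.

14. Click "Real-time analysis" in the left menu, select Custom as the analysis type, select the model we just created, enter the text to analyze, and click Analyze to start the analysis.

15. The result shows that our custom Comprehend model successfully identified "Amazon HealthLake" as an AWS service with a confidence score of 0.99, so the accuracy of the NLP analysis is high.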
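
The same real-time analysis can be sketched in code. When `detect_entities` is called with an `EndpointArn`, Comprehend runs the custom model behind that endpoint and uses that model's language. The 0.9 threshold and the helper names below are assumptions for illustration, the endpoint ARN must be supplied by the caller, and the client region has to match the region where the endpoint was created.

```python
def confident_entities(response, min_score=0.9):
    """Keep only entities whose confidence score meets the threshold."""
    return [
        (e["Text"], e["Type"], e["Score"])
        for e in response["Entities"]
        if e["Score"] >= min_score
    ]


def detect_custom_entities(text, endpoint_arn, region_name="us-east-1"):
    """Send text to a custom entity recognizer endpoint (needs AWS credentials)."""
    import boto3  # imported lazily so the filter above works without the SDK

    comprehend = boto3.client("comprehend", region_name=region_name)
    # With EndpointArn set, the custom model's language is used.
    response = comprehend.detect_entities(Text=text, EndpointArn=endpoint_arn)
    return confident_entities(response)
```

Filtering on the score is optional; it simply mirrors the idea in step 15 that a high confidence value is what makes a detection trustworthy.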

How do you create an Amazon Comprehend custom entity recognition model with Python code?

The following Python example uses the AWS Boto3 SDK to create an Amazon Comprehend custom entity recognition model and then create an API endpoint for it.

import boto3
import time

# Create the Comprehend client
comprehend = boto3.client('comprehend', region_name='us-east-1')

# Define the S3 paths, model name, and IAM role used for training
training_data_s3_uri = 's3://your-bucket-name/training-data/'
model_name = 'MyCustomEntityModel'
data_access_role_arn = 'arn:aws:iam::YOUR_ACCOUNT_ID:role/YourComprehendRole'


# Create the custom entity recognition model
def create_entity_recognition_model():
    try:
        response = comprehend.create_entity_recognizer(
            RecognizerName=model_name,
            DataAccessRoleArn=data_access_role_arn,
            InputDataConfig={
                'EntityTypes': [{'Type': 'AWS_SERVICE'}, {'Type': 'JOB_TITLE'}],
                'Documents': {'S3Uri': training_data_s3_uri},
                'Annotations': {'S3Uri': 's3://your-bucket-name/annotations/'}
            },
            LanguageCode='en'
        )
        recognizer_arn = response['EntityRecognizerArn']
        print(f'Entity recognizer {model_name} created successfully. ARN: {recognizer_arn}')
        return recognizer_arn
    except Exception as e:
        print(f'Error creating entity recognizer: {str(e)}')
        return None


# Poll the training status until it reaches a terminal state
def check_training_status(recognizer_arn):
    # Recognizer training ends in one of these states (failure is IN_ERROR)
    terminal_states = ['TRAINED', 'TRAINED_WITH_WARNING', 'IN_ERROR', 'STOPPED']
    while True:
        response = comprehend.describe_entity_recognizer(
            EntityRecognizerArn=recognizer_arn
        )
        status = response['EntityRecognizerProperties']['Status']
        print(f'Model status: {status}')
        if status in terminal_states:
            return status
        time.sleep(600)  # Check every 10 minutes


# Create the endpoint
def create_endpoint(recognizer_arn):
    try:
        endpoint_name = 'MyComprehendEndpoint'
        response = comprehend.create_endpoint(
            EndpointName=endpoint_name,
            ModelArn=recognizer_arn,
            DesiredInferenceUnits=1
        )
        print(f'Endpoint {endpoint_name} created successfully. ARN: {response["EndpointArn"]}')
    except Exception as e:
        print(f'Error creating endpoint: {str(e)}')


if __name__ == '__main__':
    # Step 1: create the entity recognition model
    recognizer_arn = create_entity_recognition_model()
    if recognizer_arn:
        # Step 2: wait for training to finish
        status = check_training_status(recognizer_arn)
        # Step 3: create the endpoint only if training succeeded
        if status in ('TRAINED', 'TRAINED_WITH_WARNING'):
            create_endpoint(recognizer_arn)
        else:
            print(f'Training ended with status {status}; endpoint not created.')


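Code explanation

Creating the custom entity recognition model:

The custom create_entity_recognition_model function calls the create_entity_recognizer API to create the custom entity recognition model.
Input configuration: InputDataConfig contains the entity types, the S3 path of the documents, and the S3 path of the annotations.
LanguageCode specifies the language of the model; here we use English.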
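
Once training has finished, the scores shown on the console's Performance tab can also be read in code. The sketch below assumes the `describe_entity_recognizer` response carries the overall scores under `EntityRecognizerProperties` > `RecognizerMetadata` > `EvaluationMetrics`; `overall_metrics` is a hypothetical helper that returns `None` for any score that is absent, for example while the model is still training.

```python
def overall_metrics(describe_response):
    """Extract overall precision, recall, and F1 from a describe_entity_recognizer response."""
    props = describe_response["EntityRecognizerProperties"]
    metrics = props.get("RecognizerMetadata", {}).get("EvaluationMetrics", {})
    return {name: metrics.get(name) for name in ("Precision", "Recall", "F1Score")}
```

It could be applied to the response already fetched inside a status-polling loop, so no extra API call is needed.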

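Creating the endpoint:

The create_endpoint function calls the create_endpoint API to create an endpoint that processes text in real time and recognizes the custom entities in it.
DesiredInferenceUnits specifies the number of inference units, which controls the endpoint's processing capacity.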
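
An endpoint is not usable the moment it is created, and a provisioned endpoint generally keeps incurring charges until it is deleted. As a minimal sketch under those assumptions, the helpers below poll `describe_endpoint` until the endpoint is ready and then remove it with `delete_endpoint`. The function names, polling interval, and poll limit are illustrative choices, and `client` is expected to be a Boto3 Comprehend client.

```python
import time


def wait_for_endpoint(client, endpoint_arn, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint is IN_SERVICE or FAILED."""
    for _ in range(max_polls):
        props = client.describe_endpoint(EndpointArn=endpoint_arn)["EndpointProperties"]
        status = props["Status"]
        if status in ("IN_SERVICE", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint not ready after {max_polls} polls")


def delete_endpoint(client, endpoint_arn):
    """Remove the endpoint once it is no longer needed."""
    client.delete_endpoint(EndpointArn=endpoint_arn)
```

Passing the client in as a parameter keeps both helpers easy to exercise with a stand-in object before pointing them at a real account.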

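That covers all the steps for using Amazon Comprehend, the NLP AI service on AWS, to recognize custom entities, understand the key information in text more accurately, and build applications on top of it through the API. Follow the "get started with AWS in 5 minutes from scratch" series for more AWS cloud development and cloud architecture solutions.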
