FATAL: terminating connection due to session timeout [1022502] (ar_odb

熊熊出没 · 2024-8-20 02:27:10

问题描述：

AWS DMS将Redshift作为目标端，出现了报错，日志里是如下报错表现：

00885557: 2023-08-21T10:32:15 [TARGET_APPLY ]I: Truncate table statement: TRUNCATE TABLE "test"."test" (cloud_imp.c:3088)
00885557: 2023-08-21T10:32:15 [DMS_INTERNAL ]I: Failed (retcode -1) to execute statement: 'TRUNCATE TABLE "test"."test"' (ar_odbc_stmt.c:138)
.....
1.69261E+12,2023-08-21T10:32:15 [TARGET_APPLY ]E: Failed (retcode -1) to execute statement [1022502] (ar_odbc_stmt.c:4820)
1.69261E+12,2023-08-21T10:32:15 [TARGET_APPLY ]E: RetCode: SQL_ERROR SqlState: 57P01 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState 57P01] FATAL: terminating connection due to session timeout [1022502] (ar_odbc_stmt.c:4828)
1.69261E+12,2023-08-21T10:32:15 [TASK_MANAGER ]W: Table 'test'.'test' (subtask 0 thread 1) is suspended (replicationtask.c:2550)

复制代码

分析过程及办理方案：

1. 首先，报错中是实行了一个truncate table语句然后报错的。table error了，但是实在这个事件本身和truncate table语句没什么关系。就算实行一个别的DML也会发生这个报错。
2. 根据DMS文档[1], Amazon Redshift 的默认空闲会话超时时间为 4 小时。如果 DMS 复制使命中没有任何运动，Redshift 会在 4 小时后断开会话连接。 Errors can result from DMS being unable to connect and potentially needing to restart. As a workaround, set a SESSION TIMEOUT limit greater than 4 hours for the DMS replication user.
3. 报错日志中的这句话，就指向了超时相干：

1.69261E+12,2023-08-21T10:32:15 [TARGET_APPLY ]E: RetCode: SQL_ERROR SqlState: 57P01 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState 57P01] FATAL: terminating connection due to session timeout [1022502] (ar_odbc_stmt.c:4828)

复制代码

“FATAL: terminating connection due to session timeout” 的报错与文档中的4小时空闲会话会导致Redshift断开连接的限定相符。
4. 如果再深入查察，那么就查察日志中，Target_Apply大概Target_Load部分，上一条信息。看看上一次Redshfit目标端有DMS写入的时间，是不是恰恰timestamp是4小时前（甚至更早）。
我们的示例：

00885623: 2023-08-21T01:27:44 [TARGET_LOAD ]I: handling COPY from S3 via IAM Role (cloud_imp.c:2271)
00885625: 2023-08-21T01:27:45 [TARGET_LOAD ]I: Load finished for table 'metadata'.'sys_param' (Id = 8). 20 rows received. 0 rows skipped. Volume transferred 15728. (streamcomponent.c:3838)
00885625: 2023-08-21T01:27:45 [TARGET_LOAD ]I: handling COPY from S3 via IAM Role (cloud_imp.c:2271)

复制代码

这两条Target log信息之间仅有源端信息。也就是说，DMS使命在启动以后，在UTC时间01:27竣事了fullload，并完成了在Target Redshift的写入。而后在10:32开始实行truncate。1:27 - 10:32中间的相隔时间远超过4小时。这个时间差也验证了我们对于报错的分析。
5. 在AWS Redshift 中测试，一个空闲连结，在超过了 session timeout的设置的时长以后，如果再次运行指令，会出现如下报错：

dev=# select getdate();
FATAL: terminating connection due to session timeout //----> 与我们的DMS报错相符
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.

复制代码

6. 根据文档[1], 我们建议将DMS user 在Redshift端的timeout时间延长。这样可以制止log中出现的session timeout的报错[2]：
ALTER USER <DMS User> SESSION TIMEOUT 86400;
---> 将dms使用的user的session timeoout设置为86400秒，也就是24小时。
需要留意的是，在运行了类似 ALTER USER SESSION TIMEOUT 86400; 的指令以后，用DMS User 当前连结的用户session并不会收效，只有在运行了alter的时间之后新创建的session，才会接纳新的session timeout设置。
参考文档：
[1] Redshift作为DMS目标：
https://docs.amazonaws.cn/dms/latest/userguide/CHAP_Target.Redshift.html
[2] Redshift alter user 指令：
https://docs.amazonaws.cn/redshift/latest/dg/r_ALTER_USER.html

免责声明：如果侵犯了您的权益，请联系站长，我们会及时删除侵权内容，谢谢合作！更多信息从访问主页：qidao123.com:ToB企服之家，中国第一个企服评测及商务社交产业平台。

		自动登录	找回密码
密码			立即注册

FATAL: terminating connection due to session timeout [1022502] (ar_odb

0 个回复

快速回复

楼主热帖

标签云