[2023-09-08 10:37:21.028960] [90663][Y0-00007F8EB76544E0] [CONNECTION](trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:1031798785, ss_id:0, proxy_session_id:0, server_session_id:0, client_addr:"xxx.xxx.xxx.xxx:xxxx", server_addr:"*Not IP address [0]*:0", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10018, error_msg:"fail to check observer version, empty result", request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}{internal_sql:"SELECT ob_version() AS cluster_version"})
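Additional diagnostic information: internal_sql: the internal request that obproxy is currently executing.
User-side usage problems
| Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|
| Wrong cluster name | 4669 | cluster xxx does not exist | Make sure the cluster exists; you can confirm this by connecting directly to ObServer. |
| Wrong tenant name | 4043 | dummy entry is empty, please check if the tenant exists | Make sure the tenant exists; you can confirm this by connecting directly to ObServer. |
| ObProxy whitelist check failed | 8205 | user xxx@xxx can not pass white list | Use OCP to confirm that the ObProxy whitelist is configured correctly. |
| ObServer whitelist check failed | 1227 | Access denied | Confirm that the ObServer whitelist is configured correctly. |
| Client connections reached the client_max_connections limit | 5059 | too many sessions | Adjust ObProxy's global configuration item client_max_connections as a temporary workaround. |
| ObProxy is configured to require SSL, but the user sends a plain (non-SSL) request | 8004 | obproxy is configured to use ssl connection | Change the SSL configuration item enable_client_ssl, or connect using SSL. |
| Direct access as proxyro@sys | 10021 | user proxyro is rejected while proxyro_check on | Do not access the database directly as proxyro@sys. |
| A cloud user uses the three-part user name (user, tenant, cluster) while enable_cloud_full_user_name is disabled | 10021 | connection with cluster name and tenant name is rejected while cloud_full_user_name_check off | When a cloud user disables enable_cloud_full_user_name, ObProxy rejects three-part access. |
| A non-cloud user does not use the three-part user name while enable_full_user_name is enabled | 10021 | cluster name and tenant name is required while full_username_check on | When a non-cloud user enables enable_full_user_name, ObProxy rejects access that is not three-part. |

Deployment errors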
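The diagnosis log line above is a sequence of key:value pairs inside braces, so it can be split into fields before looking up the error code. The following is a minimal sketch, not part of ObProxy: the function name `parse_diagnosis_line` and the regular expression are assumptions derived only from the sample lines in this document, and the sample string is shortened from the log above.

```python
import re

# A key, then ':' or '=', then either a double-quoted string or a bare token.
# "total_time(us)" is accepted as a key because of the optional "(unit)" suffix.
_PAIR = re.compile(
    r'([A-Za-z_]+(?:\([a-z]+\))?)[:=]("(?:[^"\\]|\\.)*"|[^,{}()\s"]+)'
)


def parse_diagnosis_line(line):
    """Return a flat dict of the key/value pairs in one [CONNECTION] log line."""
    start = line.find("[CONNECTION]")
    if start < 0:
        return {}  # not a connection diagnosis line
    fields = {}
    for key, raw in _PAIR.findall(line[start:]):
        if raw.startswith('"'):
            fields[key] = raw[1:-1]        # strip the surrounding quotes
        elif re.fullmatch(r"-?\d+", raw):
            fields[key] = int(raw)         # numeric ids and error codes
        else:
            fields[key] = raw
    return fields


# Shortened copy of the LOGIN_TRACE sample line (some fields omitted).
SAMPLE = (
    '[2023-09-08 10:37:21.028960] [90663][Y0-00007F8EB76544E0] [CONNECTION]'
    '(trace_type="LOGIN_TRACE", connection_diagnosis={cs_id:1031798785, '
    'tenant_name:"sys", user_name:"root", error_code:-10018, '
    'error_msg:"fail to check observer version, empty result", '
    'request_cmd:"COM_SLEEP", sql_cmd:"COM_LOGIN"}'
    '{internal_sql:"SELECT ob_version() AS cluster_version"})'
)

fields = parse_diagnosis_line(SAMPLE)
print(fields["trace_type"], fields["error_code"], fields["internal_sql"])
```

Quoted values are consumed whole, so commas or braces inside error_msg do not split the field.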
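| trace_type | Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|---|
| LOGIN_TRACE | proxyro password is misconfigured | 10018 | fail to check observer version, proxyro@sys access denied, error resp { code:1045, msg:Access denied for user xxx } | With a default deployment the proxyro password does not cause problems. If you changed the proxyro user's password manually, make sure the ObProxy startup parameters are configured correctly. |
| LOGIN_TRACE | The rootservice_list configured when starting obproxy is unavailable | 10018 | fail to check observer version, empty result | Connect directly to ObServer to confirm that the server IP configured at ObProxy startup is reachable. |

OB-side errors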
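| trace_type | Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|---|
| LOGIN_TRACE | Cluster information query returned an empty result | 4669 | cluster info is empty | Connect directly to ObServer and run the SQL statement in the internal_sql field to check whether the cluster information returned by ObServer is empty. |
| LOGIN_TRACE | Cluster information query failed | 10018 | fail to check observer version; fail to check cluster info; fail to init server state | Connect directly to ObServer and run the SQL statement in the internal_sql field to check whether the cluster information returned by ObServer is empty. |
| LOGIN_TRACE | config_server information query failed | 10301 | fail to fetch root server list from config server; fail to fetch root server list from local | Manually fetch the config_server URL configured at startup and check whether the information returned by the config server is normal. |

Timeout disconnections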
[2023-08-10 23:35:00.132805] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="SERVER_VC_TRACE", connection_diagnosis={cs_id:838860809, ss_id:0, proxy_session_id:7230691830869983240, server_session_id:0, client_addr:"xxx.xxx.xxx.xxx:45765", server_addr:"", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10013, error_msg:"Fail to build connection to observer", request_cmd:"COM_QUERY", sql_cmd:"COM_HANDSHAKE"}{vc_event:"unknown event", total_time(us):2952626, user_sql:"select 1 from dual"})
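Additional diagnostic information: vc_event: the event associated with the disconnection; users generally do not need to pay attention to it. total_time: the request execution time. user_sql: the user request.
| trace_type | Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|---|
| SERVER_VC_TRACE | ObProxy failed to establish a connection to an ObServer node | 10013 | Fail to build connection to observer | Diagnose together with ObServer. |
| SERVER_VC_TRACE | The connection dropped while ObProxy was transferring the request to ObServer | 10016 | An EOS event received while proxy transferring request | Diagnose together with ObServer. |
| SERVER_VC_TRACE | The connection dropped while ObProxy was transferring the ObServer response | 10014 | An EOS event received while proxy reading response | Diagnose together with ObServer. |

When ObServer closes the connection, ObProxy cannot collect more detailed information. If the ObServer nodes configured for ObProxy are in a normal state, diagnose the problem together with the ObServer logs.

Client-initiated disconnections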
[2023-08-10 23:28:24.699168] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:838860807, ss_id:26, proxy_session_id:7230691830869983239, server_session_id:3221698209, client_addr:"xxx.xxx.xxx.xxx:44701", server_addr:"xxx.xxx.xxx.xxx:xxxx", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10010, error_msg:"An EOS event received from client while obproxy reading request", request_cmd:"COM_SLEEP", sql_cmd:"COM_END"}{vc_event:"VC_EVENT_EOS", total_time(us):57637, user_sql:""})
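Additional diagnostic information: vc_event: the event associated with the disconnection; users generally do not need to pay attention to it. total_time: the request execution time. user_sql: the user request.
| trace_type | Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|---|
| CLIENT_VC_TRACE | The client disconnected while ObProxy was sending or receiving packets | 10010 | An EOS event received from client while obproxy reading request | Diagnose together with the client. |
| CLIENT_VC_TRACE | The client disconnected while ObProxy was handling the request | 10011 | An EOS event received from client while obproxy handling response | Diagnose together with the client. |
| CLIENT_VC_TRACE | The client disconnected while ObProxy was returning the response | 10012 | An EOS event received from client while obproxy transferring response | Diagnose together with the client. |

When the client closes the connection, ObProxy cannot collect more detailed information; it can only report that the client side actively closed the connection. Common causes include a driver timeout that closes the connection, middleware such as Druid, HikariCP, or Nginx closing the connection, and network jitter.

ObProxy/ObServer internal errors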
[2023-08-10 23:26:12.558201] [32339][Y0-00007F74C9A244E0] [CONNECTION](trace_type="PROXY_INTERNAL_TRACE", connection_diagnosis={cs_id:838860805, ss_id:0, proxy_session_id:7230691830869983237, server_session_id:0, client_addr:"xxx.xxx.xxx.xxx:44379", server_addr:"", cluster_name:"undefined", tenant_name:"sys", user_name:"root", error_code:-10019, error_msg:"OBProxy reached the maximum number of retrying request", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{user_sql:"USE `ý<8f>ý<91>ý<92>`"})
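Additional diagnostic information: user_sql: the SQL of the user request.
| trace_type | Scenario | Diagnostic error code | Diagnostic message | Resolution |
|---|---|---|---|---|
| PROXY_INTERNAL_TRACE | Tenant partition information query failed | 4664 | dummy entry is empty, disconnect | Unexpected error scenario |
| PROXY_INTERNAL_TRACE | Some ObProxy internal requests failed to execute | 10018 | proxy execute internal request failed, received error resp, error_type: xxx | Unexpected error scenario |
| PROXY_INTERNAL_TRACE | ObProxy reached the request retry limit | 10019 | OBProxy reached the maximum number of retrying request | Unexpected error scenario |
| PROXY_INTERNAL_TRACE | The ObProxy target session was closed | 10001 | target session is closed, disconnect | Unexpected error scenario |
| PROXY_INTERNAL_TRACE | Other unexpected error scenarios | 10001 | (diagnostic message is empty) | Unexpected error scenario |
| SERVER_INTERNAL_TRACE | Checksum verification error | 10001 | ora fatal error | Unexpected error scenario |
| SERVER_INTERNAL_TRACE | Primary/standby switchover | 10001 | primary cluster switchover to standby, disconnect | Disconnections may occur during a primary/standby switchover; this is expected behavior. |

Other scenarios
Corresponding trace_type: PROXY_INTERNAL_TRACE. Example diagnostic log: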
[2023-08-10 23:27:15.107427] [32339][Y0-00007F74CAAE84E0] [CONNECTION](trace_type="PROXY_INTERNAL_TRACE", connection_diagnosis={cs_id:838860806, ss_id:21, proxy_session_id:7230691830869983238, server_session_id:3221695443, client_addr:"xxx.xxx.xxx.xxx:44536", server_addr:"xxx.xxx.xxx.xxx:xxxx", cluster_name:"undefined", tenant_name:"sys", user_name:"", error_code:-5065, error_msg:"connection was killed by user self, cs_id: 838860806", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{user_sql:"kill 838860806"})
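Additional diagnostic information: user_sql: the SQL of the user request.
| trace_type | Scenario | Diagnostic error code | Diagnostic message | Troubleshooting |
|---|---|---|---|---|
| PROXY_INTERNAL_TRACE | kill the current session | 5065 | connection was killed by user self, cs_id: xxx | Expected behavior; the diagnosis log is kept as a record. |
| PROXY_INTERNAL_TRACE | kill another session | 5065 | connection was killed by user session xxx | Expected behavior; the diagnosis log is kept as a record. |

Disconnection diagnosis example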
[2023-09-07 15:59:52.308553] [122701][Y0-00007F7071D194E0] [CONNECTION](trace_type="CLIENT_VC_TRACE", connection_diagnosis={cs_id:524328, ss_id:0, proxy_session_id:7230691833961840700, server_session_id:0, client_addr:"xxx.xxx.xxx.xxx:38877", server_addr:"xxx.xxx.xxx.xxx:50110", cluster_name:"ob1.changluo.cc.xxx.xxx.xxx.xxx", tenant_name:"mysql", user_name:"root", error_code:-10011, error_msg:"An unexpected connection event received from client while obproxy handling request", request_cmd:"COM_QUERY", sql_cmd:"COM_QUERY"}{vc_event:"VC_EVENT_EOS", total_time(us):5016353, user_sql:"select sleep(20) from dual"})
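Key diagnostic information:
trace_type: CLIENT_VC_TRACE, which indicates that the client actively closed the connection.
error_msg: An unexpected connection event received from client while obproxy handling request, which shows that the client disconnected while obproxy was handling the request.
total_time: 5016353, so the total request execution time was about 5 s; total_time can be matched against the client's timeout parameters.
The diagnostic information establishes that the client actively closed the connection, so the client side needs to be investigated.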
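The trace_type values described in this document each point to the side that should be examined first. The lookup below is a minimal sketch that only restates the tables in this document; the names `FIRST_LOOK` and `first_look` are hypothetical and are not part of ObProxy or obdiag.

```python
# Which side to examine first for each trace_type, summarized from the tables
# in this document. The wording of each hint is illustrative only.
FIRST_LOOK = {
    "LOGIN_TRACE": "login setup: cluster/tenant names, whitelists, "
                   "proxyro password, rootservice_list, config server",
    "SERVER_VC_TRACE": "ObServer: check node status and ObServer logs",
    "CLIENT_VC_TRACE": "client: driver timeouts, connection-pool or proxy "
                       "middleware, network jitter",
    "PROXY_INTERNAL_TRACE": "ObProxy internals (a kill statement is expected behavior)",
    "SERVER_INTERNAL_TRACE": "ObServer internals (a primary/standby switchover "
                             "is expected behavior)",
}


def first_look(trace_type):
    """Return a short hint for where to start, given a trace_type string."""
    return FIRST_LOOK.get(trace_type, "unknown trace_type: read error_msg directly")


print(first_look("CLIENT_VC_TRACE"))
```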
Following the obproxy connection diagnosis log, start the investigation from the client.
Inspect the JDBC stack trace:
The last packet successfully received from the server was 5,016 milliseconds ago. The last packet sent successfully to the server was 5,011 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1129)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3720)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3609)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4160)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2819)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2768)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:949)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:795)
at odp.Main.main(Main.java:12)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3163)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3620)
9 more
The last packet successfully received from the server was 5,013 milliseconds ago. The last packet sent successfully to the server was 5,012 milliseconds ago.
Caused by: java.net.SocketTimeoutException: Read timed out
From the stack trace and the packet send/receive times, we can roughly conclude that the disconnection here was triggered by socketTimeout.
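The comparison between the JDBC message and the obproxy total_time can be made explicit. The sketch below extracts the elapsed milliseconds from the exception text and checks whether it agrees with total_time (which the log reports in microseconds). The function names and the 100 ms tolerance are assumptions chosen for illustration; a close match only suggests, and does not prove, that a client read timeout of about that length (for example a Connector/J socketTimeout near 5000 ms) closed the connection.

```python
import re


def last_received_ms(jdbc_message):
    """Extract the 'last packet successfully received ... N milliseconds ago' value."""
    match = re.search(
        r"successfully received from the server was ([\d,]+) milliseconds ago",
        jdbc_message,
    )
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))  # "5,016" -> 5016


def matches_total_time(jdbc_message, total_time_us, tolerance_ms=100):
    """True if the client-side elapsed time agrees with obproxy's total_time(us)."""
    client_ms = last_received_ms(jdbc_message)
    if client_ms is None:
        return False
    return abs(client_ms - total_time_us / 1000) <= tolerance_ms


MESSAGE = (
    "The last packet successfully received from the server was 5,016 "
    "milliseconds ago. The last packet sent successfully to the server "
    "was 5,011 milliseconds ago."
)

# 5016353 is the total_time(us) value from the CLIENT_VC_TRACE log above.
print(last_received_ms(MESSAGE), matches_total_time(MESSAGE, 5016353))
```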
ObProxy disconnections
• obdiag download: https://www.oceanbase.com/softwarecenter
• obdiag official documentation: https://www.oceanbase.com/docs/obdiag-cn
• obdiag GitHub: GitHub - oceanbase/obdiag: obdiag (OceanBase Diagnostic Tool) is designed to help OceanBase users quickly gather necessary information and analyze the root cause of the problem.
• obdiag SIG community: [obdiag SIG] Diagnostic Tools Group · OceanBase Technical Exchange