Syncing Hive data to MySQL with DataX


Contents

1. Component environment
2. Install DataX
  2.1 Download and extract DataX
3. Install datax-web
  3.0 Download the datax-web source and build it
  3.1 Create the datax-web metadata database in MySQL
  3.2 Install datax-web
    3.2.1 Run install.sh to extract and deploy
    3.2.2 Manually edit the datax-admin configuration file
    3.2.3 Manually edit the datax-executor configuration file
    3.2.4 Replace the Python launcher files under datax
    3.2.5 Replace the MySQL JDBC jar
4. Create the MySQL and Hive tables
  4.1 Create the MySQL database
  4.2 Create the Hive database
5. Configure DataX and datax-web
  5.1 Start the datax-web services
  5.2 Web UI access and configuration
    5.2.1 Create a project
    5.2.2 Add the MySQL and Hive data sources
    5.2.3 Create a DataX task template
    5.2.4 Build the task
    5.2.5 View the created task
    5.2.6 Insert data in Hive
6. Verify the result


1. Component environment

Name        Version           Notes                      Download
hadoop      3.4.0             official binary release
hive        3.1.3             built from source
mysql       8.0.31
datax       0.0.1-SNAPSHOT                               http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
datax-web   datax-web-2.1.2   built from source          github.com
centos      CentOS 7 (x86)
java        1.8
 
2. Install DataX

2.1 Download and extract DataX

   http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
   tar -zxvf datax.tar.gz

  Extract it into the /cluster directory.
  Run the test command:
   ./bin/datax.py job/job.json
  It fails with:
    File "/cluster/datax/bin/./datax.py", line 114
    print readerRef
    ^^^^^^^^^^^^^^^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

  Explanation: both Python 2 and Python 3 are installed on this system and python defaults to Python 3, while the scripts under datax/bin only support Python 2.
  So either invoke them explicitly with python2, or back up and replace the three scripts with the Python 3 versions that datax-web ships under its doc directory (see 3.2.4 below).


  • Python (2.x): to support Python 3, the three Python files under datax/bin must be replaced; the replacement files are in doc/datax-web/datax-python3. Required, mainly used by the scheduler to launch the underlying DataX job. By default DataX is executed as a Java child process; users can switch to a Python-based launch for custom behaviour.
  • Reference: datax-web/doc/datax-web/datax-web-deploy.md at master · WeiYe-Jing/datax-web (github.com)
  python2 bin/datax.py job/job.json  
   
  
  DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
  
2024-10-13 21:04:41.543 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-10-13 21:04:41.553 [main] INFO  Engine - the machine info  => 
      osInfo:    Oracle Corporation 1.8 25.40-b25
    jvmInfo:    Linux amd64 5.8.13-1.el7.elrepo.x86_64
    cpu num:    8
      totalPhysicalMemory:    -0.00G
    freePhysicalMemory:    -0.00G
    maxFileDescriptorCount:    -1
    currentOpenFileDescriptorCount:    -1
      GC Names    [PS MarkSweep, PS Scavenge]
      MEMORY_NAME                    | allocation_size                | init_size                      
    PS Eden Space                  | 256.00MB                       | 256.00MB                       
    Code Cache                     | 240.00MB                       | 2.44MB                         
    Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
    PS Survivor Space              | 42.50MB                        | 42.50MB                        
    PS Old Gen                     | 683.00MB                       | 683.00MB                       
    Metaspace                      | -0.00MB                        | 0.00MB                         
  
2024-10-13 21:04:41.575 [main] INFO  Engine - 
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"string",
                            "value":"DataX"
                        },
                        {
                            "type":"long",
                            "value":19890604
                        },
                        {
                            "type":"date",
                            "value":"1989-06-04 00:00:00"
                        },
                        {
                            "type":"bool",
                            "value":true
                        },
                        {
                            "type":"bytes",
                            "value":"test"
                        }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":false
                }
            }
        }
    ],
    "setting":{
        "errorLimit":{
            "percentage":0.02,
            "record":0
        },
        "speed":{
            "byte":10485760
        }
    }
}
  2024-10-13 21:04:41.599 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-10-13 21:04:41.601 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-10-13 21:04:41.601 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-10-13 21:04:41.604 [main] INFO  JobContainer - Set jobId = 0
2024-10-13 21:04:41.623 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-10-13 21:04:41.625 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2024-10-13 21:04:41.626 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2024-10-13 21:04:41.627 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2024-10-13 21:04:41.649 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-10-13 21:04:41.654 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-10-13 21:04:41.657 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-10-13 21:04:41.666 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-10-13 21:04:41.671 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-10-13 21:04:41.672 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-10-13 21:04:41.685 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-10-13 21:04:41.986 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2024-10-13 21:04:41.987 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-10-13 21:04:51.677 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.677 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-10-13 21:04:51.680 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        
       [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
  2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-10-13 21:04:51.683 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.684 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2024-10-13 21:04:41
任务结束时刻                    : 2024-10-13 21:04:51
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0
  To print a job configuration template for an hdfsreader → mysqlwriter job:
  ./bin/datax.py -r hdfsreader -w mysqlwriter

3. Install datax-web

3.0 Download the datax-web source and build it

   git@github.com:WeiYe-Jing/datax-web.git
  mvn -U clean package assembly:assembly -Dmaven.test.skip=true


  • After a successful build, the packaged DataX sits under {DataX_source_code_home}/target/datax/datax/, with the following layout:
    $ cd {DataX_source_code_home}
    $ ls ./target/datax/datax/
    bin  conf  job  lib  log  log_perf  plugin  script  tmp
  
3.1 Create the datax-web metadata database in MySQL

   mysql -u root -p
  password: ******

  CREATE DATABASE dataxweb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  use dataxweb;

3.2 Install datax-web

3.2.1 Run install.sh to extract and deploy

  In the /cluster/datax-web-2.1.2/bin directory, run ./install.sh and follow the prompts.

When it finishes, the modules are extracted and the metadata database is initialized.
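  In concrete terms this step boils down to the following (the installer is interactive; answer its prompts):

  cd /cluster/datax-web-2.1.2/bin
  ./install.sh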
 

3.2.2 Manually edit the datax-admin configuration file

   /cluster/datax-web-2.1.2/modules/datax-admin/bin/env.properties
  Contents:
   # environment variables
  JAVA_HOME=/java/jdk
  WEB_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/logs
WEB_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/conf
  DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/data
SERVER_PORT=6895
  PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/dataxadmin.pid
  
# mail account
MAIL_USERNAME="example@qq.com"
MAIL_PASSWORD="*********************"
  
#debug
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7223
 
3.2.3 Manually edit the datax-executor configuration file

   /cluster/datax-web-2.1.2/modules/datax-executor/bin/env.properties
  Contents (the key setting is PYTHON_PATH=/cluster/datax/bin/datax.py):
   # environment variables
  JAVA_HOME=/java/jdk
  SERVICE_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/logs
SERVICE_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/conf
DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/data
  
## directory where DataX job JSON files are stored
JSON_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/json
  
## executor_port
EXECUTOR_PORT=9999
  
## must match the datax-admin port
DATAX_ADMIN_PORT=6895
  ## path of the DataX Python launcher script
PYTHON_PATH=/cluster/datax/bin/datax.py
  ## datax-web executor service port
SERVER_PORT=9504
  PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/service.pid
  
# debug: remote debugging port
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7224
 
3.2.4 Replace the Python launcher files under datax

  • Python (2.x): to support Python 3, the three Python files under datax/bin must be replaced; the replacement files are in doc/datax-web/datax-python3. This step is required because the scheduler uses these scripts to launch the underlying DataX job. By default DataX is executed as a Java child process; users can switch to a Python-based launch for custom behaviour.
  • The replacement files can be taken from the datax-web source tree; see the sketch below.
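  A minimal sketch of the swap, assuming the datax-web source was cloned to /cluster/datax-web (adjust to your path) and that the three launcher scripts are datax.py, dxprof.py and perftrace.py:

  # back up the Python 2 launchers and drop in the Python 3 versions shipped with datax-web
  cd /cluster/datax/bin
  mkdir -p python2-backup
  mv datax.py dxprof.py perftrace.py python2-backup/
  cp /cluster/datax-web/doc/datax-web/datax-python3/*.py .
  # sanity check with the bundled sample job
  python3 /cluster/datax/bin/datax.py /cluster/datax/job/job.json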
  

3.2.5 Replace the MySQL JDBC jar

   The MySQL reader and writer jars bundled with DataX are too old for a MySQL 8 database, so they need to be replaced (a sketch follows below). After the swap the paths are:
  /cluster/datax/plugin/writer/mysqlwriter/libs/mysql-connector-j-8.3.0.jar
  and
  /cluster/datax/plugin/reader/mysqlreader/libs/mysql-connector-j-8.3.0.jar
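  A hedged sketch of the swap, assuming mysql-connector-j-8.3.0.jar has already been downloaded to /tmp (the exact name of the old bundled 5.x jar may differ):

  # move the old driver aside and install the MySQL 8 connector in both plugins
  for plugin in reader/mysqlreader writer/mysqlwriter; do
    libs=/cluster/datax/plugin/$plugin/libs
    mkdir -p "$libs/old"
    mv "$libs"/mysql-connector-*.jar "$libs/old/" 2>/dev/null
    cp /tmp/mysql-connector-j-8.3.0.jar "$libs/"
  done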
  

4. Create the MySQL and Hive tables

4.1 Create the MySQL database

MySQL DDL:
-- m31094.mm definition
CREATE TABLE `mm` (
  `uuid` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `time` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `age` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sex` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `job` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
  Note: you can also create the table without the uuid primary key.

4.2 Create the Hive database

   Hive DDL:
create database m31094;
drop table m31094.mm;
CREATE TABLE m31094.mm (
 `uuid` STRING COMMENT '主键',
 `name` STRING COMMENT '姓名',
 `time` STRING COMMENT '时间',
 `age` STRING COMMENT '年龄',
 `sex` STRING COMMENT '性别',
 `job` STRING COMMENT '工作',
 `address` STRING COMMENT '地址'
) COMMENT '美女表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  To check the table definition in Hive:
    DESCRIBE FORMATTED m31094.mm;

   
5. Configure DataX and datax-web

5.1 Start the datax-web services

Start commands:
   /cluster/datax-web-2.1.2/modules/datax-admin/bin/datax-admin.sh restart
  tail -f /cluster/datax-web-2.1.2/modules/datax-admin/bin/console.out
  
/cluster/datax-web-2.1.2/modules/datax-executor/bin/datax-executor.sh restart
tail -f /cluster/datax-web-2.1.2/modules/datax-executor/bin/console.out
   Check the processes with jps -l.

To kill the processes:
   sudo kill -9 $(ps -ef|grep datax|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')
   datax-executor must start successfully; once it does, its auto-registered executor shows up in the web UI.


5.2 Web UI access and configuration

   http://ip:6895/index.html
  6895 is the custom port configured above; adjust it to your actual setup.
  Login username / password: admin / 123456

5.2.1 Create a project



5.2.2 Add the MySQL and Hive data sources


MySQL

HIVE 
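The data-source screenshots are not reproduced here. Before registering the two sources it helps to confirm connectivity from the datax-web host; a sketch, assuming MySQL at 10.7.215.33:3306 and HiveServer2 on its default port 10000 (the host matches the addresses used later in this post, the Hive port is an assumption):

  mysql -h 10.7.215.33 -P 3306 -u root -p -e 'SELECT 1;'
  beeline -u 'jdbc:hive2://10.7.215.33:10000/m31094' -e 'SHOW TABLES;'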

5.2.3 Create a DataX task template


5.2.4 Build the task

Step 1: configure the input (reader)

Step 2: configure the output (writer)

Step 3: map the fields

Step 4: build, select the task template, copy the generated JSON, and continue


Generated job JSON:

   
{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "/cluster/hive/warehouse/m31094.db/mm",
            "defaultFS": "hdfs://10.7.215.33:8020",
            "fileType": "text",
            "fieldDelimiter": ",",
            "skipHeader": false,
            "column": [
              {
                "index": "0",
                "type": "string"
              },
              {
                "index": "1",
                "type": "string"
              },
              {
                "index": "2",
                "type": "string"
              },
              {
                "index": "3",
                "type": "string"
              },
              {
                "index": "4",
                "type": "string"
              },
              {
                "index": "5",
                "type": "string"
              },
              {
                "index": "6",
                "type": "string"
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "yRjwDFuoPKlqya9h9H2Amg==",
            "password": "XCYVpFosvZBBWobFzmLWvA==",
            "column": [
              "`uuid`",
              "`name`",
              "`time`",
              "`age`",
              "`sex`",
              "`job`",
              "`address`"
            ],
            "connection": [
              {
                "table": [
                  "mm"
                ],
                "jdbcUrl": "jdbc:mysql://10.7.215.33:3306/m31094"
              }
            ]
          }
        }
      }
    ]
  }
}
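  For debugging, the same JSON can also be fed to DataX directly, e.g. saved as /cluster/datax/job/hive2mysql.json (a hypothetical path). Note that the username and password above appear to be in datax-web's encrypted form, so when running the file outside datax-web you would likely have to substitute the plain credentials first:

  python2 /cluster/datax/bin/datax.py /cluster/datax/job/hive2mysql.json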

5.2.5 View the created task


5.2.6 Insert data in Hive

   insert into m31094.mm values('9','hive数据使用datax同步到MySQL',from_unixtime(unix_timestamp()),'1000000000090101','北京','新疆','加油');
   The console output is shown in the execution log below.


6. Verify the result

Execution log:


  
  1. 2024-10-13 21:30:00 [JobThread.run-130] <br>----------- datax-web job execute start -----------<br>----------- Param:
  2. 2024-10-13 21:30:00 [BuildCommand.buildDataXParam-100] ------------------Command parameters:
  3. 2024-10-13 21:30:00 [ExecutorJobHandler.execute-57] ------------------DataX process id: 95006
  4. 2024-10-13 21:30:00 [ProcessCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.
  5. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  6. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
  7. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
  8. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  9. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  10. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.588 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
  11. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.597 [main] INFO  Engine - the machine info  =>
  12. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  13. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         osInfo:        Oracle Corporation 1.8 25.40-b25
  14. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         jvmInfo:        Linux amd64 5.8.13-1.el7.elrepo.x86_64
  15. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         cpu num:        8
  16. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  17. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         totalPhysicalMemory:        -0.00G
  18. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         freePhysicalMemory:        -0.00G
  19. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         maxFileDescriptorCount:        -1
  20. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         currentOpenFileDescriptorCount:        -1
  21. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  22. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         GC Names        [PS MarkSweep, PS Scavenge]
  23. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  24. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         MEMORY_NAME                    | allocation_size                | init_size                     
  25. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         PS Eden Space                  | 256.00MB                       | 256.00MB                       
  26. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         Code Cache                     | 240.00MB                       | 2.44MB                        
  27. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         Compressed Class Space         | 1,024.00MB                     | 0.00MB                        
  28. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         PS Survivor Space              | 42.50MB                        | 42.50MB                        
  29. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         PS Old Gen                     | 683.00MB                       | 683.00MB                       
  30. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         Metaspace                      | -0.00MB                        | 0.00MB                        
  31. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  32. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  33. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.622 [main] INFO  Engine -
  34. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] {
  35. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         "content":[
  36. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 {
  37. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "reader":{
  38. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 "name":"hdfsreader",
  39. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 "parameter":{
  40. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "column":[
  41. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  42. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"0",
  43. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  44. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  45. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  46. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"1",
  47. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  48. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  49. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  50. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"2",
  51. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  52. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  53. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  54. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"3",
  55. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  56. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  57. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  58. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"4",
  59. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  60. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  61. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  62. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"5",
  63. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  64. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 },
  65. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  66. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "index":"6",
  67. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "type":"string"
  68. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 }
  69. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         ],
  70. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "defaultFS":"hdfs://10.7.215.33:8020",
  71. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "fieldDelimiter":",",
  72. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "fileType":"text",
  73. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "path":"hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm",
  74. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "skipHeader":false
  75. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 }
  76. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         },
  77. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "writer":{
  78. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 "name":"mysqlwriter",
  79. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 "parameter":{
  80. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "column":[
  81. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`uuid`",
  82. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`name`",
  83. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`time`",
  84. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`age`",
  85. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`sex`",
  86. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`job`",
  87. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 "`address`"
  88. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         ],
  89. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "connection":[
  90. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 {
  91. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "jdbcUrl":"jdbc:mysql://10.7.215.33:3306/m31094",
  92. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         "table":[
  93. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                                 "mm"
  94. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                         ]
  95. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                                 }
  96. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         ],
  97. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "password":"******",
  98. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                         "username":"root"
  99. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                                 }
  100. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         }
  101. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 }
  102. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         ],
  103. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         "setting":{
  104. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 "errorLimit":{
  105. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "percentage":0.02,
  106. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "record":0
  107. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 },
  108. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 "speed":{
  109. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "byte":1048576,
  110. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                         "channel":3
  111. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]                 }
  112. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]         }
  113. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] }
  114. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53]
  115. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.645 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
  116. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.647 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
  117. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.648 [main] INFO  JobContainer - DataX jobContainer starts job.
  118. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.650 [main] INFO  JobContainer - Set jobId = 0
  119. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.667 [job-0] INFO  HdfsReader$Job - init() begin...
  120. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.993 [job-0] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":[]}
  121. 2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.994 [job-0] INFO  HdfsReader$Job - init() ok and end...
  122. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.580 [job-0] INFO  OriginalConfPretreatmentUtil - table:[mm] all columns:[
  123. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] uuid,name,time,age,sex,job,address
  124. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ].
  125. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.613 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
  126. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] INSERT INTO %s (`uuid`,`name`,`time`,`age`,`sex`,`job`,`address`) VALUES(?,?,?,?,?,?,?)
  127. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ], which jdbcUrl like:[jdbc:mysql://10.7.215.33:3306/m31094?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
  128. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
  129. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do prepare work .
  130. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - prepare(), start to getAllFiles...
  131. 2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - get HDFS all files in path = [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm]
  132. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.699 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]是[text]类型的文件, 将该文件加入source files列表
  133. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.709 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]是[text]类型的文件, 将该文件加入source files列表
  134. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.718 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]是[text]类型的文件, 将该文件加入source files列表
  135. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.728 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]是[text]类型的文件, 将该文件加入source files列表
  136. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.737 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]是[text]类型的文件, 将该文件加入source files列表
  137. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.759 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]是[text]类型的文件, 将该文件加入source files列表
  138. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]是[text]类型的文件, 将该文件加入source files列表
  139. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - 您即将读取的文件数为: [7], 列表为: [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
  140. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
  141. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.773 [job-0] INFO  JobContainer - jobContainer starts to do split ...
  142. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.774 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 1048576 bytes.
  143. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.775 [job-0] INFO  HdfsReader$Job - split() begin...
  144. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.777 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] splits to [7] tasks.
  145. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.779 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [7] tasks.
  146. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.797 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
  147. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.810 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
  148. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.813 [job-0] INFO  JobContainer - Running by standalone Mode.
  149. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.825 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [7] tasks.
  150. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.830 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
  151. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.831 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
  152. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.845 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
  153. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.883 [0-0-2-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  154. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.885 [0-0-2-reader] INFO  Reader$Task - read start
  155. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.886 [0-0-2-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]
  156. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.918 [0-0-2-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  157. 2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.925 [0-0-2-reader] INFO  Reader$Task - end read source files...
  158. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.247 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[403]ms
  159. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.250 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] attemptCount[1] is started
  160. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.286 [0-0-5-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  161. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - read start
  162. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]
  163. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.290 [0-0-5-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  164. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.292 [0-0-5-reader] INFO  Reader$Task - end read source files...
  165. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.351 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] is successed, used[101]ms
  166. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.354 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
  167. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.379 [0-0-4-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  168. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - read start
  169. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]
  170. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.384 [0-0-4-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  171. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.386 [0-0-4-reader] INFO  Reader$Task - end read source files...
  172. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.454 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[101]ms
  173. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.457 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
  174. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.486 [0-0-0-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  175. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - read start
  176. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]
  177. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.489 [0-0-0-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  178. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.491 [0-0-0-reader] INFO  Reader$Task - end read source files...
  179. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.558 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[101]ms
  180. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.561 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
  181. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.587 [0-0-1-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  182. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - read start
  183. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]
  184. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.592 [0-0-1-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  185. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.594 [0-0-1-reader] INFO  Reader$Task - end read source files...
  186. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.662 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[101]ms
  187. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.664 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
  188. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  189. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - read start
  190. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]
  191. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.694 [0-0-3-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  192. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.696 [0-0-3-reader] INFO  Reader$Task - end read source files...
  193. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.765 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[101]ms
  194. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.768 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] attemptCount[1] is started
  195. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
  196. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - read start
  197. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
  198. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.795 [0-0-6-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":""","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
  199. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.798 [0-0-6-reader] INFO  Reader$Task - end read source files...
  200. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.868 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] is successed, used[100]ms
  201. 2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.869 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
  202. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
  203. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
  204. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
  205. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do post work.
  206. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
  207. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.840 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
  208. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer -
  209. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]          [total cpu info] =>
  210. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                 averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
  211. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                 -1.00%                         | -1.00%                         | -1.00%
  212. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                        
  213. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]
  214. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]          [total gc info] =>
  215. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                  NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
  216. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                  PS MarkSweep         | 1                  | 1                  | 1                  | 0.040s             | 0.040s             | 0.040s            
  217. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                  PS Scavenge          | 1                  | 1                  | 1                  | 0.022s             | 0.022s             | 0.022s            
  218. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]
  219. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer - PerfTrace not enable!
  220. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.842 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
  221. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.843 [job-0] INFO  JobContainer -
  222. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务启动时刻                    : 2024-10-13 21:30:00
  223. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务结束时刻                    : 2024-10-13 21:30:12
  224. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务总计耗时                    :                 12s
  225. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务平均流量                    :               28B/s
  226. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 记录写入速度                    :              0rec/s
  227. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读出记录总数                    :                   7
  228. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读写失败总数                    :                   0
  229. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]
  230. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
  231. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 十月 13, 2024 9:30:01 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
  232. 2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  233. 2024-10-13 21:30:12 [JobThread.run-165] <br>----------- datax-web job execute end(finish) -----------<br>----------- ReturnT:ReturnT [code=200, msg=LogStatistics{taskStartTime=2024-10-13 21:30:00, taskEndTime=2024-10-13 21:30:12, taskTotalTime=12s, taskAverageFlow=28B/s, taskRecordWritingSpeed=0rec/s, taskRecordReaderNum=7, taskRecordWriteFailNum=0}, content=null]
  234. 2024-10-13 21:30:12 [TriggerCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.

   Query MySQL to confirm that the data has been synced.
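   For example, using the same connection details as earlier in this walkthrough:

   mysql -h 10.7.215.33 -P 3306 -u root -p m31094 \
     -e 'SELECT uuid, name, time, age, sex, job, address FROM mm;'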

