DataX-数据迁移Oracle到Mysql-ETL工具

一、安装

https://github.com/WeiYe-Jing/datax-web/blob/master/doc/datax-web/datax-web-deploy.md

  • 1、直接下载DataX工具包:DataX下载地址

    下载后解压至本地某个目录,进入bin目录,即可运行同步作业:

    $ cd  {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}

    自检脚本:    python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json

  • 2、Python (2.x) (支持Python3需要修改替换datax/bin下面的三个python文件,替换文件在doc/datax-web/datax-python3下) 必选,主要用于调度执行底层DataX的启动脚本,默认的方式是以Java子进程方式执行DataX,用户可以选择以Python方式来做自定义的改造

二、测试脚本

C:\DATAX\datax\bin>CHCP 65001
Active code page: 65001

C:\DATAX\datax\bin>python datax.py C:\DATAX\datax\job\test.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2024-03-14 15:02:01.361 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2024-03-14 15:02:01.363 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2024-03-14 15:02:01.375 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-03-14 15:02:01.381 [main] INFO  Engine - the machine info  =>

        osInfo: Windows 10 amd64 10.0
        jvmInfo:        Oracle Corporation 1.8 25.341-b10
        cpu num:        6

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2024-03-14 15:02:01.392 [main] INFO  Engine -
{
        "setting":{
                "speed":{
                        "channel":4
                }
        },
        "content":[
                {
                        "reader":{
                                "name":"oraclereader",
                                "parameter":{
                                        "username":"test",
                                        "password":"******",
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "table":[
                                                                "CAR.S_U"
                                                        ],
                                                        "jdbcUrl":[
                                                                "jdbc:oracle:thin:@//10.33.51.231:1521/helowin"
                                                        ]
                                                }
                                        ]
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "username":"car",
                                        "password":"******",
                                        "writeMode":"replace",
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://10.33.51.231:3306/car",
                                                        "table":[
                                                                "S_U"
                                                        ]
                                                }
                                        ]
                                }
                        }
                }
        ]
}

2024-03-14 15:02:01.417 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2024-03-14 15:02:01.418 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-03-14 15:02:01.420 [main] INFO  JobContainer - Set jobId = 0
2024-03-14 15:02:01.623 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@//10.33.51.231:1521/helowin.
2024-03-14 15:02:01.624 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置 读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024-03-14 15:02:07.748 [job-0] INFO  OriginalConfPretreatmentUtil - table:[S_BU] all columns:[
M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID
].
2024-03-14 15:02:07.748 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写 入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2024-03-14 15:02:07.752 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
replace INTO %s (M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID) VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://10.33.51.231:3306/carmen?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2024-03-14 15:02:07.753 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-03-14 15:02:07.753 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2024-03-14 15:02:07.754 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2024-03-14 15:02:07.755 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-03-14 15:02:07.755 [job-0] INFO  JobContainer - Job set Channel-Number to 4 channels.
2024-03-14 15:02:07.757 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2024-03-14 15:02:07.758 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2024-03-14 15:02:07.778 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-03-14 15:02:07.779 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-03-14 15:02:07.781 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-03-14 15:02:07.786 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-03-14 15:02:07.790 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-03-14 15:02:07.790 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-03-14 15:02:07.796 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-03-14 15:02:07.798 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from CARMEN.S_BU
] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
2024-03-14 15:02:16.306 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from CARMEN.S_BU
] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
2024-03-14 15:02:16.845 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[9050]ms
2024-03-14 15:02:16.847 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-03-14 15:02:17.798 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 6.239s |  All Task WaitReaderTime 1.988s | Percentage 100.00%
2024-03-14 15:02:17.800 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-03-14 15:02:17.805 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2024-03-14 15:02:17.805 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2024-03-14 15:02:17.806 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-03-14 15:02:17.807 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: C:\DATAX\datax\hook
2024-03-14 15:02:17.808 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 1                  | 1                  | 1                  | 0.018s             | 0.018s             | 0.018s
                 PS Scavenge          | 5                  | 5                  | 5                  | 0.028s             | 0.028s             | 0.028s

2024-03-14 15:02:17.808 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-03-14 15:02:17.808 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 6.239s |  All Task WaitReaderTime 1.988s | Percentage 100.00%
2024-03-14 15:02:17.810 [job-0] INFO  JobContainer -
任务启动时刻                    : 2024-03-14 15:02:01
任务结束时刻                    : 2024-03-14 15:02:17
任务总计耗时                    :                 16s
任务平均流量                    :            1.40MB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0


C:\DATAX\datax\bin>

三、datax-web: DataX管理调度GitHub - WeiYe-Jing/datax-web: DataX集成可视化页面,选择数据源即可一键生成数据同步任务,支持RDBMS、Hive、HBase、ClickHouse、MongoDB等数据源,批量创建RDBMS数据同步任务,集成开源调度系统,支持分布式、增量同步数据、实时查看运行日志、监控执行器资源、KILL运行进程、数据源信息加密等。

你可能感兴趣的:(【数据库】,etl,数据仓库)