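Background: the batch insert commonly used with MyBatis-Plus is inefficient and takes a long time.
Approach: insert with JDBC batching and process the data in parallel with a ThreadPoolTaskExecutor.
1. Database configuration (the JDBC URL must include rewriteBatchedStatements=true; this is essential)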
jdbc:mysql://127.0.0.1:3306/xxxx?serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&useSSL=false&allowPublicKeyRetrieval=true&rewriteBatchedStatements=true
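To see why this flag matters: with rewriteBatchedStatements=true, MySQL Connector/J can collapse a batch of single-row INSERT statements into multi-row INSERT statements, which cuts the number of round trips to the server. The sketch below only illustrates the shape of such a statement; it is not driver code, and the table and column names in it are made up for the example.

```java
import java.util.Collections;
import java.util.StringJoiner;

/** Illustrates the multi-row statement shape; this is not what the driver literally runs. */
final class MultiRowInsertSketch {

    /** Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rowCount rows. */
    static String multiRowInsert(String table, String[] columns, int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be >= 1");
        }
        // One "(?, ?, ...)" group per row, with one placeholder per column
        String placeholders = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        StringJoiner rows = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            rows.add(placeholders);
        }
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + rows;
    }

    public static void main(String[] args) {
        // Hypothetical table and columns, for illustration only
        System.out.println(multiRowInsert("car_kind", new String[] {"id", "name"}, 3));
        // prints: INSERT INTO car_kind (id, name) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```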
2. Thread pool configuration (more threads is not always better; a common rule of thumb found online is CPU cores * 2 + 2 threads)
# Async thread pool settings
# Core pool size
async.executor.thread.core_pool_size = 30
# Maximum pool size
async.executor.thread.max_pool_size = 30
# Queue capacity
async.executor.thread.queue_capacity = 99988
# Name prefix for the threads in the pool
async.executor.thread.name.prefix = async-importDB-
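import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {
    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;
    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;
    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;
    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.info("start asyncServiceExecutor");
        // VisiableThreadPoolTaskExecutor is a custom ThreadPoolTaskExecutor subclass that is
        // not shown here; a plain ThreadPoolTaskExecutor can be used in its place
        ThreadPoolTaskExecutor executor = new VisiableThreadPoolTaskExecutor();
        // Core pool size
        executor.setCorePoolSize(corePoolSize);
        // Maximum pool size
        executor.setMaxPoolSize(maxPoolSize);
        // Queue capacity
        executor.setQueueCapacity(queueCapacity);
        // Name prefix for the threads in the pool
        executor.setThreadNamePrefix(namePrefix);
        // Rejection policy: what to do with new tasks once the pool and queue are full.
        // CALLER_RUNS: the task runs on the caller's thread instead of a pool thread
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        // Initialize the executor
        executor.initialize();
        return executor;
    }
}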
// Note: @Async only takes effect when the method is called through the Spring proxy,
// i.e. from another bean; a call from inside the same class runs synchronously.
@Async("asyncServiceExecutor")
public void executeAsyncCarKind(List<TzCarKind> tzCarKinds, CountDownLatch countDownLatch) {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        DynamicDataSourceContextHolder.push("cloud");
        log.info("Async batch insert started");
        String sql = "INSERT INTO XXXXXX ( id, series_id, name, valid_tag, show_order, create_time, years, pinyin,short_letter,newcar_price, output_volume, gear_type, env_level, seat_number,min_reg_year, max_reg_year, version_number, version ,jzg_style_id,user_matching,vid,check_tag) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? ,?,?,?,?,?,?,?)";
        conn = dataSource.getConnection();
        conn.setAutoCommit(false); // disable auto-commit so the partition is committed once
        ps = conn.prepareStatement(sql);
        int i = 0;
        for (TzCarKind tzCarKind : tzCarKinds) {
            ps.setLong(1, tzCarKind.getId());
            ps.setLong(2, tzCarKind.getSeriesId());
            ps.setString(3, StringUtils.isNotEmpty(tzCarKind.getName()) ? tzCarKind.getName() : "");
            ps.setString(4, StringUtils.isNotEmpty(tzCarKind.getValidTag()) ? tzCarKind.getValidTag() : "");
            // setObject with a target SQL type binds SQL NULL when the value is null
            ps.setObject(5, tzCarKind.getShowOrder(), Types.INTEGER);
            // java.sql.Date drops the time of day; use setTimestamp if create_time is a DATETIME column
            ps.setDate(6, tzCarKind.getCreateTime() != null ? new Date(tzCarKind.getCreateTime().getTime()) : null);
            ps.setString(7, StringUtils.isNotEmpty(tzCarKind.getYears()) ? tzCarKind.getYears() : "");
            ps.setString(8, StringUtils.isNotEmpty(tzCarKind.getPinyin()) ? tzCarKind.getPinyin() : "");
            ps.setString(9, StringUtils.isNotEmpty(tzCarKind.getShortLetter()) ? tzCarKind.getShortLetter() : "");
            ps.setObject(10, tzCarKind.getNewcarPrice(), Types.INTEGER);
            ps.setString(11, StringUtils.isNotEmpty(tzCarKind.getOutputVolume()) ? tzCarKind.getOutputVolume() : "");
            ps.setString(12, StringUtils.isNotEmpty(tzCarKind.getGearType()) ? tzCarKind.getGearType() : "");
            ps.setString(13, StringUtils.isNotEmpty(tzCarKind.getEnvLevel()) ? tzCarKind.getEnvLevel() : "");
            ps.setString(14, StringUtils.isNotEmpty(tzCarKind.getSeatNumber()) ? tzCarKind.getSeatNumber() : "");
            ps.setString(15, StringUtils.isNotEmpty(tzCarKind.getMinRegYear()) ? tzCarKind.getMinRegYear() : "");
            ps.setString(16, StringUtils.isNotEmpty(tzCarKind.getMaxRegYear()) ? tzCarKind.getMaxRegYear() : "");
            ps.setString(17, StringUtils.isNotEmpty(tzCarKind.getVersionNumber()) ? tzCarKind.getVersionNumber() : "");
            ps.setObject(18, tzCarKind.getVersion(), Types.BIGINT);
            ps.setString(19, StringUtils.isNotEmpty(tzCarKind.getJzgStyleId()) ? tzCarKind.getJzgStyleId() : "");
            ps.setString(20, StringUtils.isNotEmpty(tzCarKind.getUserMatching()) ? tzCarKind.getUserMatching() : "");
            ps.setString(21, StringUtils.isNotEmpty(tzCarKind.getVid()) ? tzCarKind.getVid() : "");
            ps.setString(22, StringUtils.isNotEmpty(tzCarKind.getCheckTag()) ? tzCarKind.getCheckTag() : "");
            ps.addBatch();
            // Flush every 500 rows; counting before the check avoids a one-row batch on the first row
            if (++i % 500 == 0) {
                ps.executeBatch(); // send the queued statements
                ps.clearBatch();   // empty the batch for the next group
            }
        }
        // Flush the remaining rows when the total is not a multiple of 500
        ps.executeBatch();
        ps.clearBatch();
        conn.commit();
        log.info("Async batch insert finished");
    } catch (Exception e) {
        log.error("Batch insert failed", e);
        if (conn != null) {
            try {
                conn.rollback(); // discard the uncommitted rows of this partition
            } catch (Exception ex) {
                log.error("Rollback failed", ex);
            }
        }
    } finally {
        countDownLatch.countDown();
        DynamicDataSourceContextHolder.poll();
        close(conn, ps);
    }
}
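/**
 * Closes the JDBC statement and connection.
 *
 * @param conn the connection to close, may be null
 * @param ps   the prepared statement to close, may be null
 */
private void close(Connection conn, PreparedStatement ps) {
    try {
        if (ps != null) ps.close();
    } catch (Exception e) {
        log.error("Failed to close the prepared statement", e);
    }
    try {
        if (conn != null) conn.close();
    } catch (Exception e) {
        log.error("Failed to close the database connection", e);
    }
}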
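The flush-every-500 logic above is easy to get wrong by one, so it can help to look at it in isolation. The following is a minimal sketch without any database: the `flush` callback stands in for `ps.executeBatch()`, and the class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Database-free model of "addBatch, flush every N rows, flush the remainder". */
final class BatchFlushSketch {

    /** Hands rows to flush in groups of at most batchSize and returns the number of flushes. */
    static <T> int flushInBatches(List<T> rows, int batchSize, Consumer<List<T>> flush) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<T> pending = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T row : rows) {
            pending.add(row);                           // stands in for ps.addBatch()
            if (pending.size() == batchSize) {
                flush.accept(new ArrayList<>(pending)); // stands in for ps.executeBatch()
                pending.clear();                        // stands in for ps.clearBatch()
                flushes++;
            }
        }
        if (!pending.isEmpty()) {                       // remainder smaller than batchSize
            flush.accept(new ArrayList<>(pending));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            rows.add(i);
        }
        int flushes = flushInBatches(rows, 500, batch -> System.out.println("flushed " + batch.size()));
        System.out.println("flushes = " + flushes); // 1200 rows give batches of 500, 500 and 200
    }
}
```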
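Note: the approach in this article loads all the data first and then splits it into partitions. With large data sets this uses a lot of memory and can cause an out-of-memory error, so it is better to fetch the data with paginated queries and process it page by page.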
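A minimal sketch of that page-by-page idea, assuming the data source can return rows by offset and limit. `PageFetcher` is a hypothetical interface introduced only for this example; in a real project it would wrap a paginated query, and here it is backed by an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Processes data one page at a time so only one page is held in memory. */
final class PagedProcessingSketch {

    /** Hypothetical page source: returns up to limit rows starting at offset. */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Fetches pages until the source is exhausted and returns the number of rows handled. */
    static <T> int processInPages(PageFetcher<T> fetcher, int pageSize, Consumer<List<T>> handler) {
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            if (page.isEmpty()) {
                break;                      // nothing left
            }
            handler.accept(page);           // e.g. hand the page to the batch insert
            total += page.size();
            if (page.size() < pageSize) {
                break;                      // short page means this was the last one
            }
            offset += pageSize;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            source.add(i);
        }
        // In-memory stand-in for a paginated query
        PageFetcher<Integer> fetcher = (offset, limit) -> source.subList(
                Math.min(offset, source.size()), Math.min(offset + limit, source.size()));
        int total = processInPages(fetcher, 5, page -> System.out.println("page of " + page.size()));
        System.out.println("total = " + total);
    }
}
```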
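@Override
public void asyncKind() {
    DynamicDataSourceContextHolder.push("cloud");
    try {
        // The element type of this list is not shown in the original listing
        List<?> list = tfCKindService.list();
        List<TzCarKind> tzCarKinds1 = list();
        List<TzCarKind> tzCarKinds2 = BeanUtil.copyToList(list, TzCarKind.class);
        // Merge both sources; rows copied from tfCKindService win when ids collide
        // (assumes getId() returns Long)
        Set<Long> existingIds = tzCarKinds2.stream().map(TzCarKind::getId).collect(Collectors.toSet());
        List<TzCarKind> kindArrayList = new ArrayList<>(tzCarKinds2);
        for (TzCarKind tzCarKind : tzCarKinds1) {
            if (!existingIds.contains(tzCarKind.getId())) {
                kindArrayList.add(tzCarKind);
            }
        }
        tzCarKindMapper.removeAll();
        // Split into partitions of 5000 rows, one task per partition
        List<List<TzCarKind>> partition = ListUtil.partition(kindArrayList, 5000);
        CountDownLatch countDownLatch = new CountDownLatch(partition.size());
        for (List<TzCarKind> tzCarKinds : partition) {
            // If executeAsyncCarKind is defined in this same class, this call bypasses the
            // Spring proxy and runs synchronously; call it through an injected bean instead
            executeAsyncCarKind(tzCarKinds, countDownLatch);
        }
        // Wait until every partition has been processed
        countDownLatch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        log.error("Interrupted while waiting for the batch insert to finish", e);
    } finally {
        DynamicDataSourceContextHolder.poll();
    }
}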
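The partition, submit and wait pattern used by asyncKind can be tried out without Spring or a database. The sketch below is a plain java.util.concurrent version under that assumption: `worker` stands in for the JDBC insert of one partition, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Splits a list into partitions, runs one task per partition, and waits for all of them. */
final class PartitionAndAwaitSketch {

    /** Returns consecutive sublists of at most size elements (views on the original list). */
    static <T> List<List<T>> partition(List<T> rows, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            parts.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return parts;
    }

    /** Runs worker on each partition in a fixed pool and blocks until every task has finished. */
    static <T> void runAndAwait(List<T> rows, int partSize, int threads, Consumer<List<T>> worker) {
        List<List<T>> parts = partition(rows, partSize);
        CountDownLatch latch = new CountDownLatch(parts.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (List<T> part : parts) {
                pool.execute(() -> {
                    try {
                        worker.accept(part);   // stands in for the batch insert of one partition
                    } finally {
                        latch.countDown();     // count down even if the worker throws
                    }
                });
            }
            latch.await();                     // without this the caller returns before the work is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for partitions", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            rows.add(i);
        }
        AtomicInteger inserted = new AtomicInteger();
        runAndAwait(rows, 5000, 4, part -> inserted.addAndGet(part.size()));
        System.out.println("rows handled = " + inserted.get());
    }
}
```

Counting down in a finally block mirrors what executeAsyncCarKind does, so a failed partition cannot leave the caller waiting forever.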
Approach | Threading | Rows | Execution time |
---|---|---|---|
MyBatis-Plus | Single-threaded | 210,000 | 967.759s |
JDBC | Multi-threaded | 210,000 | 350.862s |
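The test above shows that the optimized batch insert performs better: the run time dropped from about 16 minutes to just under 6 minutes. The method can be optimized further. Fetching the data with paginated queries and processing it segment by segment should greatly reduce the processing time and make the batch insert faster still.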
Note: tune the thread pool settings to the machine that runs the job. More threads is not always better; a common rule of thumb found online is CPU cores * 2 + 2 threads.
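As a rough illustration of that rule of thumb, the sketch below derives the thread count from the number of available cores and builds a plain ThreadPoolExecutor with the same caller-runs rejection policy as the configuration earlier. The class and method names are invented for the example, and the formula is only a starting point to measure against, not a guaranteed optimum.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Sizes a pool with the "CPU cores * 2 + 2" rule of thumb quoted above. */
final class PoolSizingSketch {

    static int ruleOfThumbThreads(int cpuCores) {
        return cpuCores * 2 + 2;
    }

    /** Fixed-size pool with a bounded queue; overflow tasks run on the caller's thread. */
    static ThreadPoolExecutor newImportPool(int cpuCores, int queueCapacity) {
        int threads = ruleOfThumbThreads(cpuCores);
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = newImportPool(cores, 1000);
        System.out.println(cores + " cores -> " + pool.getCorePoolSize() + " threads");
        pool.shutdown();
    }
}
```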