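"Data scattered across 10 systems, and the monthly report costs you three all-nighters?" Don't panic! Today we'll use the trio of JDBC + Apache Spark + Thymeleaf to show how Java can turn the "data swamp" of an enterprise data warehouse (EDW) into a "report machine", so that instead of people hunting for data, the reports come to the people!
Question: How does Java connect to the EDW database and extract data? What "gear" does it need?
Solution: JDBC + a HikariCP connection pool + Spark SQL, fitting the data warehouse with a "vacuum cleaner"!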
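# Configuration file (application.properties)
# HikariDataSource exposes jdbcUrl rather than url, so the bound property is jdbc-url
spring.datasource.jdbc-url=jdbc:mysql://localhost:3306/edw_db
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
// HikariCP configuration class
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {
    @Bean
    @ConfigurationProperties(prefix = "spring.datasource")
    public HikariDataSource dataSource() {
        // Spring binds the spring.datasource.* properties onto this instance
        return new HikariDataSource();
    }
}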
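// Database utility class (JDBC)
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

public class JdbcUtil {
    // One shared pool for the whole application
    private static final HikariDataSource ds = new HikariDataSource();
    static {
        ds.setJdbcUrl("jdbc:mysql://localhost:3306/edw_db");
        ds.setUsername("root");
        ds.setPassword("root");
    }
    // Borrows a connection from the pool; closing it returns it to the pool
    public static Connection getConnection() throws SQLException {
        return ds.getConnection();
    }
}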
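Key points:
Use Connection, Statement, and ResultSet to operate on the database.
Question: How do we extract sales data from the EDW?
Solution: JDBC + precompiled SQL + paginated queries, so the data "takes the bait" on its own!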
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SalesDataExtractor {
    public List<SalesRecord> getMonthlySales(String yearMonth) {
        String sql = "SELECT product_id, SUM(amount) AS total " +
                "FROM sales " +
                "WHERE DATE_FORMAT(sale_date, '%Y-%m') = ? " +
                "GROUP BY product_id";
        try (Connection conn = JdbcUtil.getConnection();
             PreparedStatement pstmt = conn.prepareStatement(sql)) {
            pstmt.setString(1, yearMonth); // bound parameter guards against SQL injection
            List<SalesRecord> records = new ArrayList<>();
            // try-with-resources closes the ResultSet explicitly
            try (ResultSet rs = pstmt.executeQuery()) {
                while (rs.next()) {
                    SalesRecord record = new SalesRecord();
                    record.setProductId(rs.getInt("product_id"));
                    record.setTotal(rs.getDouble("total"));
                    records.add(record);
                }
            }
            return records;
        } catch (SQLException e) {
            throw new RuntimeException("Failed to extract sales data", e);
        }
    }
}
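The query above filters with DATE_FORMAT(sale_date, '%Y-%m'), and MySQL generally cannot use an ordinary index on sale_date when the column is wrapped in a function. One possible alternative, sketched here under the assumption that sale_date is a DATE or DATETIME column, is to turn the "yyyy-MM" string into a half-open date range and filter on the bare column. The MonthRange class and its members are illustrative names, not part of the original code.

```java
import java.time.LocalDate;
import java.time.YearMonth;

/** Hypothetical helper: converts "yyyy-MM" into a half-open [start, endExclusive) date range. */
final class MonthRange {
    final LocalDate start;        // first day of the month, inclusive
    final LocalDate endExclusive; // first day of the following month, exclusive

    /** Same aggregation as the original query, rewritten as a range predicate. */
    static final String SQL =
            "SELECT product_id, SUM(amount) AS total "
          + "FROM sales "
          + "WHERE sale_date >= ? AND sale_date < ? "
          + "GROUP BY product_id";

    private MonthRange(LocalDate start, LocalDate endExclusive) {
        this.start = start;
        this.endExclusive = endExclusive;
    }

    /** Parses a string such as "2023-09"; throws DateTimeParseException on malformed input. */
    static MonthRange of(String yearMonth) {
        YearMonth ym = YearMonth.parse(yearMonth);
        return new MonthRange(ym.atDay(1), ym.plusMonths(1).atDay(1));
    }
}
```

With a PreparedStatement built from MonthRange.SQL, the two parameters could then be bound with pstmt.setDate(1, java.sql.Date.valueOf(range.start)) and pstmt.setDate(2, java.sql.Date.valueOf(range.endExclusive)).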
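// Data model class (SalesRecord.java)
public class SalesRecord {
    private int productId;
    private double total;

    public int getProductId() { return productId; }
    public void setProductId(int productId) { this.productId = productId; }
    public double getTotal() { return total; }
    public void setTotal(double total) { this.total = total; }
}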
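The solution line mentions paginated queries, but the extractor above reads the whole result in one go. Here is a minimal sketch of a LIMIT/OFFSET loop that handles one page at a time. PageFetcher and PagedReader are hypothetical names; in a JDBC setting the fetcher would run a query ending in something like "ORDER BY product_id LIMIT ? OFFSET ?" (a stable ORDER BY is assumed so pages do not overlap) and map each row to an object.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical callback: returns at most `limit` rows, starting at row `offset`. */
interface PageFetcher<T> {
    List<T> fetch(int limit, int offset);
}

/** Illustrative LIMIT/OFFSET loop; not taken from the original article. */
final class PagedReader {
    private PagedReader() {}

    /**
     * Fetches page after page and hands each non-empty page to the handler,
     * so only one page is held in memory at a time.
     * Stops when a page comes back shorter than pageSize.
     *
     * @return the total number of rows seen
     */
    static <T> int forEachPage(PageFetcher<T> fetcher, int pageSize, Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> page = fetcher.fetch(pageSize, offset);
            if (!page.isEmpty()) {
                handler.accept(page);
                total += page.size();
            }
            if (page.size() < pageSize) {
                break; // a short or empty page means there is nothing left
            }
            offset += pageSize;
        }
        return total;
    }
}
```

One design caveat: with large offsets the database still has to walk past all the skipped rows, so for very deep paging a "WHERE product_id > last seen id" style of query may be the better fit.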
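Key points:
Paginate with LIMIT and OFFSET to avoid loading millions of rows in one go.
Add indexes on sale_date and product_id; query speed can improve by as much as 10x!
Question: How do we compute the mean and variance of sales?
Solution: Apache Commons Math + Spark, so the statistics come out "in seconds"!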
import java.util.List;
import org.apache.commons.math3.stat.descriptive.moment.Mean;
import org.apache.commons.math3.stat.descriptive.moment.Variance;

public class SalesAnalyzer {
    public static void analyze(List<SalesRecord> records) {
        // Per-product totals as a primitive array, the input Commons Math expects
        double[] totals = records.stream()
                .mapToDouble(SalesRecord::getTotal)
                .toArray();
        Mean mean = new Mean();
        double average = mean.evaluate(totals);
        // Variance is bias-corrected (sample variance) by default
        Variance variance = new Variance();
        double var = variance.evaluate(totals);
        System.out.println("Average sales: " + average);
        System.out.println("Sales variance: " + var);
    }
}
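To make it explicit what those two calls compute: Mean is the arithmetic mean, and Variance with its default settings is the bias-corrected sample variance, which divides by n - 1 rather than n. A plain-Java sketch of the same arithmetic follows; the SimpleStats class is a hypothetical name used only for illustration, and it is a naive two-pass version rather than the algorithm Commons Math uses internally.

```java
/** Illustrative re-implementation of the two statistics; the class name is hypothetical. */
final class SimpleStats {
    private SimpleStats() {}

    /** Arithmetic mean; returns NaN for an empty array. */
    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample variance with an (n - 1) denominator; NaN for empty input, 0 for a single value. */
    static double sampleVariance(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        if (values.length == 1) {
            return 0.0;
        }
        double m = mean(values);
        double sumOfSquares = 0.0;
        for (double v : values) {
            double d = v - m;
            sumOfSquares += d * d;
        }
        return sumOfSquares / (values.length - 1);
    }
}
```

For the totals 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations add up to 32, so the sample variance is 32 / 7, whereas a population variance (dividing by n) would be 4.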
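// Spark setup (SparkSession)
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

SparkSession spark = SparkSession.builder()
        .appName("EDWAnalysis")
        .master("local[*]")
        .getOrCreate();
// Read the sales table from the database into a DataFrame (Dataset<Row>)
Dataset<Row> salesDF = spark.read()
        .format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/edw_db")
        .option("dbtable", "(SELECT * FROM sales) AS sales_temp")
        .option("user", "root")
        .option("password", "root")
        .load();
// Total sales per product (Spark SQL); names and types are aligned with the SalesRecord bean
Dataset<Row> result = salesDF
        .groupBy("product_id")
        .agg(sum("amount").cast("double").alias("total"))
        .select(col("product_id").cast("int").alias("productId"), col("total"));
// Convert the result to Java objects
List<SalesRecord> records = result.as(Encoders.bean(SalesRecord.class))
        .collectAsList();
// Stop Spark
spark.stop();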
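For a small data set, the same group-by-and-sum can be cross-checked in plain Java without starting Spark. The sketch below shows what the aggregation computes, using an in-memory list; SaleRow and LocalAggregation are hypothetical stand-ins, not classes from the original article.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical row type standing in for one line of the sales table. */
final class SaleRow {
    final int productId;
    final double amount;

    SaleRow(int productId, double amount) {
        this.productId = productId;
        this.amount = amount;
    }
}

/** Illustrative in-memory counterpart of groupBy("product_id") followed by sum("amount"). */
final class LocalAggregation {
    private LocalAggregation() {}

    /** Returns the summed amount per product id, ordered by product id. */
    static Map<Integer, Double> totalByProduct(List<SaleRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (SaleRow r) -> r.productId,
                TreeMap::new,
                Collectors.summingDouble((SaleRow r) -> r.amount)));
    }
}
```

This only makes sense when the rows fit comfortably in memory; the point of Spark in the pipeline above is to run the same aggregation on data that does not.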
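Question: How do we turn the analysis results into a visual report?
Solution: a Thymeleaf template + PDF export, so the report is generated "with one click"!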
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <title>Monthly Sales Report</title>
</head>
<body>
    <h1>Product Sales Analysis Report</h1>
    <table>
        <tr>
            <th>Product ID</th>
            <th>Total Sales</th>
        </tr>
        <tr th:each="record : ${records}">
            <td th:text="${record.productId}"></td>
            <td th:text="${record.total}"></td>
        </tr>
    </table>
    <p>Average sales: [[${average}]]</p>
    <p>Sales variance: [[${variance}]]</p>
</body>
</html>
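// PDF export; the classes used here come from iText 7
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Table;
import java.io.IOException;
import java.util.List;

public class ReportGenerator {
    public void generatePdf(List<SalesRecord> records, double average, double variance) {
        try (PdfWriter writer = new PdfWriter("monthly_sales_report.pdf");
             PdfDocument pdfDoc = new PdfDocument(writer);
             Document document = new Document(pdfDoc)) {
            document.add(new Paragraph("Monthly Sales Report"));
            // Add the table: one header row, then one row per product
            Table table = new Table(2);
            table.addCell("Product ID");
            table.addCell("Total Sales");
            for (SalesRecord record : records) {
                table.addCell(String.valueOf(record.getProductId()));
                table.addCell(String.valueOf(record.getTotal()));
            }
            document.add(table);
            // Add the summary statistics
            document.add(new Paragraph("Average sales: " + average));
            document.add(new Paragraph("Sales variance: " + variance));
        } catch (IOException e) {
            throw new RuntimeException("Failed to generate PDF", e);
        }
    }
}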
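Question: How do we generate and send the report automatically every day?
Solution: the Quartz scheduler + email notification, so the report goes "on patrol" by itself!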
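// Scheduled job class
import java.time.YearMonth;
import java.util.List;
import org.apache.commons.math3.stat.descriptive.moment.Mean;
import org.apache.commons.math3.stat.descriptive.moment.Variance;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class ReportTask implements Job {
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // 1. Extract data for the previous calendar month, formatted like "2023-09"
        String yearMonth = YearMonth.now().minusMonths(1).toString();
        List<SalesRecord> records = new SalesDataExtractor().getMonthlySales(yearMonth);
        // 2. Analyze data (SalesAnalyzer.analyze only prints, so compute the values here)
        double[] totals = records.stream().mapToDouble(SalesRecord::getTotal).toArray();
        double average = new Mean().evaluate(totals);
        double variance = new Variance().evaluate(totals);
        // 3. Generate the PDF
        new ReportGenerator().generatePdf(records, average, variance);
        // 4. Send the email (omitted)
    }
}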
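// Scheduler configuration (Spring Boot)
import org.quartz.CronScheduleBuilder;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SchedulerConfig {
    @Bean
    public JobDetail reportJobDetail() {
        return JobBuilder.newJob(ReportTask.class)
                .withIdentity("reportJob")
                .storeDurably() // keep the job registered even when no trigger points at it
                .build();
    }
    @Bean
    public Trigger reportJobTrigger() {
        return TriggerBuilder.newTrigger()
                .forJob(reportJobDetail())
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?")) // every day at 2:00 AM
                .build();
    }
}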
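To make the schedule concrete: a Quartz cron expression lists seconds, minutes, hours, day-of-month, month, and day-of-week, so "0 0 2 * * ?" reads as second 0, minute 0, hour 2, on any day. The sketch below computes the next run time for that simple daily shape using java.time. DailyCron is a hypothetical helper that only understands expressions of the form "s m h * * ?"; it is not Quartz's parser.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical helper for daily expressions of the form "s m h * * ?" only. */
final class DailyCron {
    private DailyCron() {}

    /** Returns the first fire time strictly after `now`. */
    static LocalDateTime nextFireTime(String cron, LocalDateTime now) {
        String[] f = cron.trim().split("\\s+");
        if (f.length != 6 || !f[3].equals("*") || !f[4].equals("*") || !f[5].equals("?")) {
            throw new IllegalArgumentException("this sketch only supports 's m h * * ?': " + cron);
        }
        // Field order is seconds, minutes, hours; LocalTime.of takes hour, minute, second
        LocalTime at = LocalTime.of(
                Integer.parseInt(f[2]), Integer.parseInt(f[1]), Integer.parseInt(f[0]));
        LocalDateTime candidate = now.toLocalDate().atTime(at);
        // If today's slot has already passed (or is exactly now), move to tomorrow
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }
}
```

In a real project there is no need for a helper like this: Quartz ships org.quartz.CronExpression, whose getNextValidTimeAfter(Date) method answers the same question for arbitrary expressions.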
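Key points:
The cron expression 0 0 2 * * ? fires every day at 2:00 AM.
An e-commerce EDW before optimization:
After optimization:
Key operations:
Parallelize JOIN operations.
With these five steps, you now have the core techniques for data analysis and reporting with Java in an EDW! Remember: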