MySQL Advanced Queries: A Practical Guide to Aggregation and Grouped Analysis

I. Aggregate Functions in Depth

1. The Five Core Aggregate Functions

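-- Count employees
SELECT COUNT(*) AS total_employees FROM employees;

-- Average salary (AVG skips NULL salaries automatically)
SELECT AVG(salary) AS avg_salary FROM employees;

-- To treat a missing salary as 0 instead, convert NULL before averaging
SELECT AVG(IFNULL(salary, 0)) AS avg_salary_null_as_zero FROM employees;

-- Highest and lowest salary
SELECT 
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary 
FROM employees;

-- Total salary cost
SELECT SUM(salary) AS total_salary_cost FROM employees;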

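Function comparison

Function | Ignores NULL? | Applicable data types | Typical use
COUNT() | COUNT(*) counts every row; COUNT(expr) skips NULL | Any type | Counting rows
SUM() | Yes | Numeric | Totals
AVG() | Yes | Numeric | Averages
MAX() | Yes | Numeric / date / string | Largest value
MIN() | Yes | Numeric / date / string | Smallest value

When there are no non-NULL input values, COUNT() returns 0, while SUM(), AVG(), MAX(), and MIN() return NULL.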
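To see these NULL rules in action, here is a minimal sketch on a throwaway temporary table. The table name `emp_null_demo` and its three rows are hypothetical; the point is the difference between COUNT(*) and COUNT(column), and between AVG(column) and AVG(IFNULL(column, 0)).

```sql
-- Hypothetical demo table; temporary tables disappear when the session ends
CREATE TEMPORARY TABLE emp_null_demo (
    id     INT PRIMARY KEY,
    salary DECIMAL(10,2)               -- nullable on purpose
);

INSERT INTO emp_null_demo (id, salary) VALUES
    (1, 10000.00),
    (2, 20000.00),
    (3, NULL);                         -- salary not yet recorded

SELECT
    COUNT(*)               AS all_rows,          -- 3: counts every row
    COUNT(salary)          AS non_null_rows,     -- 2: skips the NULL
    AVG(salary)            AS avg_ignoring_null, -- 15000: (10000 + 20000) / 2
    AVG(IFNULL(salary, 0)) AS avg_null_as_zero   -- 10000: (10000 + 20000 + 0) / 3
FROM emp_null_demo;
```

Which average is "right" depends on whether a NULL salary means "unknown" (ignore it) or "paid nothing" (count it as 0).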

2. Advanced Aggregation Techniques

-- Several aggregates in one query (salary is assumed to be monthly, hence * 12)
SELECT 
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    SUM(salary) * 12 AS annual_cost
FROM employees
WHERE department = 'R&D';

-- Combine with DISTINCT to aggregate unique values only
SELECT 
    COUNT(DISTINCT department) AS dept_types,
    AVG(DISTINCT salary) AS unique_salary_avg
FROM employees;
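DISTINCT changes what gets aggregated, not just what gets displayed. This sketch uses a hypothetical temporary table `emp_distinct_demo` in which two employees share the same salary, so the plain and DISTINCT versions of each aggregate diverge.

```sql
-- Hypothetical demo table: employees 1 and 2 share a salary
CREATE TEMPORARY TABLE emp_distinct_demo (
    id         INT PRIMARY KEY,
    department VARCHAR(20) NOT NULL,
    salary     DECIMAL(10,2) NOT NULL
);

INSERT INTO emp_distinct_demo (id, department, salary) VALUES
    (1, 'R&D',   10000.00),
    (2, 'R&D',   10000.00),
    (3, 'Sales', 20000.00);

SELECT
    COUNT(department)          AS dept_rows,         -- 3: one per employee
    COUNT(DISTINCT department) AS dept_types,        -- 2: R&D and Sales
    AVG(salary)                AS avg_per_employee,  -- (10000 + 10000 + 20000) / 3, about 13333.33
    AVG(DISTINCT salary)       AS unique_salary_avg  -- (10000 + 20000) / 2 = 15000
FROM emp_distinct_demo;
```

AVG(DISTINCT ...) weights each distinct salary level once no matter how many people earn it, so it is rarely what a payroll report wants; COUNT(DISTINCT ...) is the more common use.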

II. Grouped Queries (GROUP BY) in Practice

1. Basic Grouping

-- Employee count and average salary per department
SELECT 
    department,
    COUNT(*) AS emp_count,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department;

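Logical execution order (the optimizer may physically reorder the work, but the result behaves as if the steps ran in this order)

  1. FROM: identify the data source

  2. WHERE: filter individual rows (if present)

  3. GROUP BY: form groups

  4. Compute the aggregate functions for each group

  5. HAVING: filter groups (if present)

  6. SELECT: choose output columns and assign aliases

  7. ORDER BY: sort the result (if present)

  8. LIMIT: cut the result down to the requested number of rows (if present)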
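One annotated query makes the order concrete. The comments number each clause in logical execution order; the table `emp_order_demo` and its rows are hypothetical and exist only for this sketch.

```sql
-- Hypothetical demo table for the execution-order walkthrough
CREATE TEMPORARY TABLE emp_order_demo (
    id         INT PRIMARY KEY,
    department VARCHAR(20) NOT NULL,
    salary     DECIMAL(10,2) NOT NULL,
    hire_date  DATE NOT NULL
);

INSERT INTO emp_order_demo (id, department, salary, hire_date) VALUES
    (1, 'R&D',   18000.00, '2021-03-01'),
    (2, 'R&D',   16000.00, '2022-07-15'),
    (3, 'Sales', 12000.00, '2021-05-20'),
    (4, 'Sales',  9000.00, '2019-11-02'),
    (5, 'Ops',    8000.00, '2021-08-09');

SELECT department,                   -- (6) output columns and aliases
       COUNT(*)    AS emp_count,     -- (4) aggregates computed per group
       AVG(salary) AS avg_salary
FROM emp_order_demo                  -- (1) data source
WHERE hire_date >= '2020-01-01'      -- (2) row filter: employee 4 is dropped,
                                     --     so Sales averages 12000, not 10500
GROUP BY department                  -- (3) groups: R&D, Sales, Ops
HAVING AVG(salary) > 10000           -- (5) group filter: Ops (8000) is dropped
ORDER BY avg_salary DESC;            -- (7) sort: the SELECT alias is visible here
```

Because WHERE runs at step 2, it can reference neither `AVG(salary)` nor the alias `avg_salary`; ORDER BY runs after SELECT, so it can use the alias.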

2. Grouping by Multiple Columns

-- Group by department and gender together
SELECT 
    department,
    gender,
    COUNT(*) AS emp_count,
    MAX(salary) AS max_salary
FROM employees
GROUP BY department, gender
ORDER BY department, gender;

3. HAVING vs. WHERE

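-- WHERE filters rows before grouping; HAVING filters groups after grouping
SELECT 
    department,
    AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'  -- first keep only employees hired after 2020-01-01
GROUP BY department
HAVING AVG(salary) > 15000;     -- then keep only departments averaging above 15000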

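Comparison summary

Aspect | WHERE | HAVING
When it runs | Before grouping | After grouping
What it can reference | Columns of the source rows (no aggregate functions) | Grouping columns, aggregate values, and (in MySQL) SELECT aliases
Index use | Can use indexes | Generally cannot, since it works on grouped results
Performance effect | Shrinks the data that has to be grouped | Only trims the final result set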
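A condition on a grouping column can be written in either clause and returns the same rows; what differs is how much work happens before the filter. A minimal sketch on a hypothetical temporary table `emp_filter_demo`:

```sql
-- Hypothetical demo table
CREATE TEMPORARY TABLE emp_filter_demo (
    id         INT PRIMARY KEY,
    department VARCHAR(20) NOT NULL,
    salary     DECIMAL(10,2) NOT NULL
);

INSERT INTO emp_filter_demo (id, department, salary) VALUES
    (1, 'R&D',   18000.00),
    (2, 'R&D',   16000.00),
    (3, 'Sales', 12000.00),
    (4, 'Ops',    8000.00);

-- Preferred: non-R&D rows are discarded before grouping,
-- and an index on department (if one existed) could serve the filter
SELECT department, AVG(salary) AS avg_salary
FROM emp_filter_demo
WHERE department = 'R&D'
GROUP BY department;

-- Same result, but logically every department is grouped and
-- aggregated first, then all groups except R&D are thrown away
SELECT department, AVG(salary) AS avg_salary
FROM emp_filter_demo
GROUP BY department
HAVING department = 'R&D';
```

A reasonable rule of thumb: put every condition that does not need an aggregate in WHERE, and reserve HAVING for conditions such as `AVG(salary) > 15000`.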

III. Case Study: Sales Data Analysis

1. Preparing the Data

CREATE TABLE sales (
    id INT PRIMARY KEY AUTO_INCREMENT,
    product_name VARCHAR(50) NOT NULL,
    region VARCHAR(20) NOT NULL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    salesperson_id INT
);

INSERT INTO sales (product_name, region, sale_date, amount, salesperson_id) VALUES
('Laptop',     'East China',  '2023-01-15', 8999.00, 101),
('Smartphone', 'North China', '2023-01-16', 5999.00, 102),
('Tablet',     'South China', '2023-01-16', 4999.00, 103),
('Laptop',     'East China',  '2023-01-17', 7999.00, 101),
('Smartwatch', 'North China', '2023-01-18', 2999.00, 102);

2. Multi-Dimensional Analysis Query

-- Sales by product and region for January 2023
SELECT 
    product_name,
    region,
    COUNT(*) AS sale_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY product_name, region
HAVING SUM(amount) > 5000
ORDER BY total_amount DESC;

IV. Performance Optimization Tips

1. Indexing Strategy

-- Index the columns used for filtering and grouping
ALTER TABLE sales ADD INDEX idx_region (region);                        -- filters or groups on region
ALTER TABLE sales ADD INDEX idx_date_product (sale_date, product_name); -- range filters on sale_date
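Because `idx_date_product` leads with `sale_date`, it serves a date range filter but does not hand rows to the server already grouped by `product_name`. For a query such as `GROUP BY product_name` with `AVG(amount)`, one option is a covering index that leads with the grouping column. The sketch below uses a hypothetical temporary table `sales_idx_demo` and a hypothetical index name `idx_product_amount`; the plan you actually get depends on the MySQL version and data volume.

```sql
-- Hypothetical copy of the sales table with only the columns this query needs
CREATE TEMPORARY TABLE sales_idx_demo (
    id           INT PRIMARY KEY,
    product_name VARCHAR(50) NOT NULL,
    amount       DECIMAL(10,2) NOT NULL
);

INSERT INTO sales_idx_demo (id, product_name, amount) VALUES
    (1, 'Laptop',     8999.00),
    (2, 'Smartphone', 5999.00),
    (3, 'Laptop',     7999.00),
    (4, 'Smartwatch', 2999.00);

-- Grouping column first, aggregated column second: the index alone can
-- answer the query, and its order already clusters rows by product_name
CREATE INDEX idx_product_amount ON sales_idx_demo (product_name, amount);

-- In MySQL, Extra ideally shows "Using index" without "Using temporary"
EXPLAIN
SELECT product_name, AVG(amount) AS avg_amount
FROM sales_idx_demo
GROUP BY product_name;
```

Every extra index slows down writes, so add one like this only for grouping queries that run often enough to justify it.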

2. Reading the Execution Plan

EXPLAIN 
SELECT product_name, AVG(amount)
FROM sales
GROUP BY product_name;

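Key columns to check

  • type: avoid ALL (a full table scan) where possible

  • key: confirm the intended index was chosen

  • rows: the estimated number of rows examined; lower is better

  • Extra: for GROUP BY queries, "Using temporary" means an internal temporary table is built, and "Using filesort" means an extra sort pass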

3. Temporary Table Tuning

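-- Raise the in-memory limit for internal temporary tables used by complex grouping.
-- For MEMORY-based internal temporary tables the effective limit is the smaller
-- of these two variables, so set both (without GLOBAL, this affects the session only)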
SET tmp_table_size = 64 * 1024 * 1024;
SET max_heap_table_size = 64 * 1024 * 1024;
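To confirm the settings took effect and to watch whether grouped queries still spill to disk, a session can inspect the variables and the temporary-table status counters. This assumes MySQL; `Created_tmp_tables` and `Created_tmp_disk_tables` are standard status counters, but on MySQL 8.0 the default TempTable engine also has its own memory limit (`temptable_max_ram`), so treat these numbers as a starting point rather than a guarantee.

```sql
-- Session scope: only the current connection is affected
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;

-- Confirm the values in effect (67108864 bytes = 64 MB)
SELECT @@SESSION.tmp_table_size, @@SESSION.max_heap_table_size;

-- Internal temporary tables created by this session;
-- a growing Created_tmp_disk_tables means groups are spilling to disk
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

Comparing the counters before and after running a heavy GROUP BY shows whether that particular query needed an on-disk temporary table.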

V. Common Problems and Solutions

1. Selecting Columns That Are Not Grouped

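-- Wrong: a non-grouped column appears directly in SELECT
SELECT 
    product_name,
    region,  -- Error: neither in GROUP BY nor aggregated
    SUM(amount) AS total_amount
FROM sales
GROUP BY product_name;
-- With ONLY_FULL_GROUP_BY (enabled by default since MySQL 5.7.5) this query is rejected;
-- without it, MySQL silently returns an arbitrary region for each product

-- Correct: aggregate the extra column
SELECT 
    product_name,
    GROUP_CONCAT(DISTINCT region) AS regions,  -- every region for the product, comma-separated
    SUM(amount) AS total_amount
FROM sales
GROUP BY product_name;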
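Besides GROUP_CONCAT, two other fixes are common: add the column to GROUP BY when one row per combination is really what you want, or wrap it in MySQL's ANY_VALUE() (available since 5.7) when any value from the group is acceptable. A sketch on a hypothetical temporary table `sales_group_demo`:

```sql
-- Hypothetical demo table: Laptop sells in two regions
CREATE TEMPORARY TABLE sales_group_demo (
    id           INT PRIMARY KEY,
    product_name VARCHAR(50) NOT NULL,
    region       VARCHAR(20) NOT NULL,
    amount       DECIMAL(10,2) NOT NULL
);

INSERT INTO sales_group_demo (id, product_name, region, amount) VALUES
    (1, 'Laptop',     'East China',  8999.00),
    (2, 'Laptop',     'North China', 7999.00),
    (3, 'Smartphone', 'North China', 5999.00);

-- Option 1: make region part of the group (3 rows: one per product and region)
SELECT product_name, region, SUM(amount) AS total_amount
FROM sales_group_demo
GROUP BY product_name, region;

-- Option 2: one row per product (2 rows); ANY_VALUE() accepts an arbitrary
-- region from each group, so Laptop may show either of its two regions
SELECT product_name, ANY_VALUE(region) AS sample_region, SUM(amount) AS total_amount
FROM sales_group_demo
GROUP BY product_name;
```

ANY_VALUE() only silences the ONLY_FULL_GROUP_BY check; it is safe when the column is known to be identical within each group, and misleading otherwise.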

2. Handling NULL in Groups

-- Label the NULL group explicitly (GROUP BY puts all NULLs into a single group)
SELECT 
    IFNULL(department, 'Unassigned') AS dept,
    COUNT(*) AS emp_count
FROM employees
GROUP BY department;
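A small sketch shows that all NULL departments collapse into one group, and that COUNT(column) reports 0 for that group while COUNT(*) does not. The table `emp_dept_demo` and its rows are hypothetical.

```sql
-- Hypothetical demo table: two employees have no department yet
CREATE TEMPORARY TABLE emp_dept_demo (
    id         INT PRIMARY KEY,
    department VARCHAR(20)                        -- NULL = not yet assigned
);

INSERT INTO emp_dept_demo (id, department) VALUES
    (1, 'R&D'), (2, NULL), (3, NULL), (4, 'Sales');

-- Three groups: R&D, Sales, and one group holding both NULL rows
SELECT
    IFNULL(department, 'Unassigned') AS dept,
    COUNT(*)                         AS emp_count,         -- 2 for the NULL group
    COUNT(department)                AS counted_by_column  -- 0 for the NULL group
FROM emp_dept_demo
GROUP BY department;
```

Grouping by the raw column rather than the `dept` alias keeps a real department that happens to be named 'Unassigned' separate from the NULL group, even though both would display the same label.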
