Retention Rate Problems (MySQL)

drop table if exists `question_practice_detail`;
-- One row per practice record: which device answered which question, the result, and the date
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id` int NOT NULL,
`result` varchar(32) NOT NULL,
`date` date NOT NULL
);

INSERT INTO question_practice_detail (id, device_id, question_id, result, date)
VALUES
-- User 1: high retention (logs in every day from 2023-01-01 to 2023-01-10, then again on 2023-01-31)
(1, 1001, 101, 'correct', '2023-01-01'),
(2, 1001, 102, 'wrong', '2023-01-02'),
(3, 1001, 103, 'correct', '2023-01-03'),
(4, 1001, 104, 'correct', '2023-01-04'),
(5, 1001, 105, 'wrong', '2023-01-05'),
(6, 1001, 106, 'correct', '2023-01-06'),
(7, 1001, 107, 'correct', '2023-01-07'),
(8, 1001, 108, 'wrong', '2023-01-08'),
(9, 1001, 109, 'correct', '2023-01-09'),
(10, 1001, 110, 'correct', '2023-01-10'),
(11, 1001, 111, 'wrong', '2023-01-31'),

-- User 2: next-day retained, but not 7-day or 30-day retained
(12, 1002, 101, 'correct', '2023-01-01'),
(13, 1002, 102, 'wrong', '2023-01-02'),

-- User 3: 7-day retained, but not next-day or 30-day retained
(14, 1003, 101, 'correct', '2023-01-01'),
(15, 1003, 103, 'correct', '2023-01-07'),

-- User 4: 30-day retained, but not next-day or 7-day retained
(16, 1004, 101, 'correct', '2023-01-01'),
(17, 1004, 104, 'correct', '2023-01-31'),

-- User 5: first login only, no retention at all
(18, 1005, 101, 'correct', '2023-01-01'),

-- User 6: 7-day and 30-day retained, but not next-day retained
(19, 1006, 101, 'correct', '2023-01-01'),
(20, 1006, 105, 'wrong', '2023-01-07'),
(21, 1006, 106, 'correct', '2023-01-31'),

-- User 7: next-day and 30-day retained, but not 7-day retained
(22, 1007, 101, 'correct', '2023-01-01'),
(23, 1007, 107, 'correct', '2023-01-02'),
(24, 1007, 108, 'wrong', '2023-01-31'),

-- User 8: next-day and 7-day retained, but not 30-day retained
(25, 1008, 101, 'correct', '2023-01-01'),
(26, 1008, 109, 'correct', '2023-01-02'),
(27, 1008, 110, 'wrong', '2023-01-07'),

-- User 9: logs in on some days only
(28, 1009, 101, 'correct', '2023-01-01'),
(29, 1009, 102, 'wrong', '2023-01-02'),
(30, 1009, 103, 'correct', '2023-01-05'),
(31, 1009, 104, 'correct', '2023-01-10'),
(32, 1009, 105, 'wrong', '2023-01-20'),

-- User 10: logs in at irregular intervals
(33, 1010, 101, 'correct', '2023-01-01'),
(34, 1010, 106, 'correct', '2023-01-03'),
(35, 1010, 107, 'wrong', '2023-01-15'),
(36, 1010, 108, 'correct', '2023-01-25');
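The retention labels in the comments above only make sense once the day numbering is fixed. Below is a minimal check that lists each device's login days as offsets from its first login. It assumes that day 0 is a device's first login date and that it runs on MySQL 8.0+ (needed for the window function). The alias `day_offsets` is illustrative.

```sql
-- Sketch (MySQL 8.0+): each device's login days as offsets from its first login (day 0),
-- used to check the retention labels in the INSERT comments.
SELECT device_id,
       GROUP_CONCAT(DISTINCT DATEDIFF(t.date, t.first_date)
                    ORDER BY DATEDIFF(t.date, t.first_date)) AS day_offsets
FROM (
    -- attach the device's first login date to every one of its rows
    SELECT device_id,
           date,
           MIN(date) OVER (PARTITION BY device_id) AS first_date
    FROM question_practice_detail
) AS t
GROUP BY device_id
ORDER BY device_id;
```

With this numbering, device 1003 should show `0,6`, so its only return falls in days 2-7. Device 1004 should show `0,30`, so its only return falls in days 8-30.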

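Calculating the next-day retention rate (avg_ret): the share of distinct (device_id, date) login days that are followed by a login from the same device on the next day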

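Method 1:

Not recommended: it relies on COUNT(DISTINCT ...) twice, which goes against common SQL optimization advice. The self-join runs on the raw rows. If a device answers several questions on the same day, the joined rows multiply, and DISTINCT then has to remove those duplicates. On a large table this gets expensive.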

select count(distinct t2.device_id, t2.date) / count(distinct t1.device_id, t1.date) avg_ret
from question_practice_detail t1
         left join question_practice_detail t2 on t1.device_id = t2.device_id
    and datediff(t1.date, t2.date) = 1;

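Method 2: first deduplicate the login days with GROUP BY. Then left-join each login day to the same device's login on the following day.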

with t1 as ( -- all distinct (device_id, date) login days
    select device_id, date
    from question_practice_detail
    group by device_id, date)
   , t2 as (-- the same login days, used as the next-day side of the join
    select device_id, date
    from question_practice_detail
    group by device_id, date)
   , t3 as (
-- for every login day in t1, the same device's login on the following day (NULL if none)
    select t1.device_id first_login, t2.device_id second_login
    from t1
             left join t2 on t2.device_id = t1.device_id
        and t2.date = date_add(t1.date, interval 1 day))
select count(second_login) / count(first_login) avg_ret
from t3;


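-- Alternatively, shift the dates inside the CTE instead of in the join condition:
with t1 as ( -- all distinct (device_id, date) login days
    select device_id, date
    from question_practice_detail
    group by device_id, date)
   , t2 as (-- every login day shifted back one day: it matches t1 when the device logged in on the day after t1.date
    select device_id, date_sub(date, interval 1 day) prev_date
    from question_practice_detail
    group by device_id, date)
   , t3 as (
-- for every login day in t1, the same device's login on the following day (NULL if none)
    select t1.device_id first_login, t2.device_id second_login
    from t1
             left join t2 on t2.device_id = t1.device_id
        and t1.date = t2.prev_date)
select count(second_login) / count(first_login) avg_ret
from t3;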
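A further variant, which is not one of the methods above, uses the `LEAD()` window function instead of a self-join. This is a sketch that assumes MySQL 8.0 or later, which the WITH clauses above already require. The CTE names `login_days` and `with_next` are illustrative.

```sql
-- Sketch (MySQL 8.0+): avg_ret via LEAD() instead of a self-join.
WITH login_days AS (            -- one row per device per login day
    SELECT device_id, date
    FROM question_practice_detail
    GROUP BY device_id, date
), with_next AS (               -- the same device's next distinct login day (NULL if none)
    SELECT device_id,
           date,
           LEAD(date) OVER (PARTITION BY device_id ORDER BY date) AS next_date
    FROM login_days
)
-- a login day counts as retained when the next login day is exactly one day later;
-- the comparison is NULL on a device's last login day, and SUM() skips NULLs
SELECT SUM(next_date = DATE_ADD(date, INTERVAL 1 DAY)) / COUNT(*) AS avg_ret
FROM with_next;
```

`login_days` holds one row per device per day. The next row's date therefore equals `date + 1` exactly when a next-day login exists, so no DISTINCT is needed. On the sample data, 13 of the 36 distinct login days are followed by a next-day login. This gives about 0.3611, the same value Methods 1 and 2 produce.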

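Computing next-day, 7-day and 30-day retention rates together:

In essence, a retention rate is "the share of users who come back within a later time window". Days are counted from each user's first login (day 0), so day n is the day where DATEDIFF(login date, first login date) = n. The three windows do not overlap:
Next-day retention rate = (users who return on day 1 / new users on day 0) × 100%
7-day retention rate = (users who return on any of days 2-7 / new users on day 0) × 100%
30-day retention rate = (users who return on any of days 8-30 / new users on day 0) × 100%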
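Below is a sketch of one query that applies these three definitions. It makes the following assumptions:

- The cohort is each device's first login date.
- Day n means DATEDIFF(login date, first login date) = n.
- The server is MySQL 8.0+.

The CTE names `login_days`, `first_login` and `user_flags` are illustrative. Unlike avg_ret, this is a cohort metric: users are grouped by the day they first appear. The query returns fractions rather than percentages.

```sql
-- Sketch (MySQL 8.0+): next-day, 7-day and 30-day retention per first-login cohort.
-- Windows: day 1, days 2-7, days 8-30, counted from the first login (day 0).
WITH login_days AS (            -- one row per device per login day
    SELECT device_id, date
    FROM question_practice_detail
    GROUP BY device_id, date
), first_login AS (             -- each device's first login date = its cohort (day 0)
    SELECT device_id, MIN(date) AS first_date
    FROM login_days
    GROUP BY device_id
), user_flags AS (              -- 1/0 per device: did it return in each window?
    SELECT f.device_id,
           f.first_date,
           MAX(DATEDIFF(l.date, f.first_date) = 1)              AS ret_1d,
           MAX(DATEDIFF(l.date, f.first_date) BETWEEN 2 AND 7)  AS ret_7d,
           MAX(DATEDIFF(l.date, f.first_date) BETWEEN 8 AND 30) AS ret_30d
    FROM first_login AS f
    JOIN login_days AS l ON l.device_id = f.device_id
    GROUP BY f.device_id, f.first_date
)
SELECT first_date,
       COUNT(*)                AS new_users,
       SUM(ret_1d)  / COUNT(*) AS next_day_rate,
       SUM(ret_7d)  / COUNT(*) AS rate_7d,
       SUM(ret_30d) / COUNT(*) AS rate_30d
FROM user_flags
GROUP BY first_date
ORDER BY first_date;
```

All ten sample users first log in on 2023-01-01, so the query should return one cohort row with 10 new users. The next-day rate should be 0.5 (users 1001, 1002, 1007, 1008, 1009). The 7-day and 30-day rates should both be 0.6.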
