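A/B testing is a statistical method for comparing two or more versions of something to determine which performs better on a chosen metric. The following are some best practices for running A/B tests:
Remember that A/B testing is not just a technical process; it also requires a solid understanding of the business goals and keen insight into user behavior.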
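A window function combines the grouping of the group by clause with the sorting of the order by clause that we covered earlier. So why use window functions at all?
The reason is that group by collapses the table when it aggregates, leaving one row per group, whereas partition by with a window function such as rank keeps every row of the original table.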
select *,
sum(成绩) over (order by 学号) as current_sum,
avg(成绩) over (order by 学号) as current_avg,
count(成绩) over (order by 学号) as current_count,
max(成绩) over (order by 学号) as current_max,
min(成绩) over (order by 学号) as current_min
from 班级表
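Use: on every row you can see the statistic (sum, average, count, maximum, minimum) computed over all rows up to and including that row, and therefore how each row changes the overall statistic.
Note: for sum and count the result is a running total. With no partition by, the whole table is treated as a single partition; its rows are ordered and then aggregated cumulatively. The number of rows in the original table does not change.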
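To make the row-count difference concrete, here is a minimal Python sketch that runs the same kind of query against an in-memory SQLite database (SQLite 3.25 or later is assumed, for window-function support). The table class_tb, its columns student_id and score, and the four sample rows are hypothetical stand-ins for 班级表, 学号 and 成绩.

```python
import sqlite3

# Hypothetical sample data; names are illustrative stand-ins for the notes' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_tb (student_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO class_tb VALUES (?, ?)",
    [(1, 80), (2, 95), (3, 70), (4, 90)],
)

# Plain aggregation collapses the table to a single row.
grouped = conn.execute("SELECT SUM(score), COUNT(score) FROM class_tb").fetchall()

# The window version keeps all four rows and adds running statistics.
running = conn.execute(
    """
    SELECT student_id,
           score,
           SUM(score)   OVER (ORDER BY student_id) AS current_sum,
           COUNT(score) OVER (ORDER BY student_id) AS current_count,
           MAX(score)   OVER (ORDER BY student_id) AS current_max
    FROM class_tb
    ORDER BY student_id
    """
).fetchall()

print(grouped)
for row in running:
    print(row)
```

The last row of the window result carries the same total as the single aggregated row, which is one way to check that the running values are cumulative.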
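The partition by clause can be omitted. Omitting it means no grouping is applied, and the rows are simply ranked by 成绩 from highest to lowest: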
select *,
rank() over (order by 成绩 desc) as ranking
from 班级表
For comparison, a plain order by sorts the rows by 成绩 but adds no ranking column:
select *
from 班级表
order by 成绩 desc
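Method 1: use window functions to add two columns holding each user's next and second-next login dates, then require both consecutive datediff values to equal 1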
SELECT user_id
FROM
(SELECT user_id
,DATE(log_time) AS log_date
,LEAD(DATE(log_time), 1) OVER(PARTITION BY user_id ORDER BY log_time) AS l1
,LEAD(DATE(log_time), 2) OVER(PARTITION BY user_id ORDER BY log_time) AS l2
FROM login_tb
) AS t
WHERE DATEDIFF(l1,log_date) = 1
AND DATEDIFF(l2,l1) = 1
AND user_id IN (SELECT user_id FROM register_tb)
ORDER BY user_id;
Method 2: use a window function to add a row-number column rk, compute date_sub(log_time, interval rk day) as difftime, then group by it and keep the groups having count(difftime) >= 3
select user_id
from (
select user_id,
date_sub(log_time,interval rk day) as difftime
from (
select user_id,
date(log_time) as log_time,
row_number()over(partition by user_id order by date(log_time)) as rk
from login_tb where user_id in (select user_id from register_tb)
) t1
) t2
group by user_id,difftime
having count(difftime)>=3
order by user_id
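The row-number trick in Method 2 works because consecutive dates minus consecutive row numbers all land on the same anchor date, so each streak becomes one group. Below is a minimal Python sketch of that idea on an in-memory SQLite database. The sample rows are hypothetical, SQLite's DATE(x, '-N day') stands in for MySQL's date_sub, the register_tb filter is left out, and the DISTINCT step (one row per user per day) is an added assumption that is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_tb (user_id INTEGER, log_time TEXT)")
conn.executemany(
    "INSERT INTO login_tb VALUES (?, ?)",
    [
        (1, "2024-01-30 08:00:00"),
        (1, "2024-01-31 09:00:00"),
        (1, "2024-02-01 10:00:00"),  # user 1: three consecutive days across a month end
        (2, "2024-01-01 08:00:00"),
        (2, "2024-01-02 08:00:00"),
        (2, "2024-01-04 08:00:00"),  # user 2: no login on Jan 3
    ],
)

rows = conn.execute(
    """
    WITH days AS (              -- one row per user per calendar day
        SELECT DISTINCT user_id, DATE(log_time) AS log_date
        FROM login_tb
    ),
    numbered AS (
        SELECT user_id, log_date,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY log_date) AS rk
        FROM days
    )
    SELECT user_id,
           DATE(log_date, '-' || rk || ' day') AS anchor,  -- SQLite spelling of date_sub
           COUNT(*) AS streak
    FROM numbered
    GROUP BY user_id, anchor
    ORDER BY user_id, anchor
    """
).fetchall()

# Users with at least one streak of three or more consecutive days.
streak_users = sorted({user_id for user_id, _, streak in rows if streak >= 3})

for row in rows:
    print(row)
print(streak_users)
```

User 2's logins split into two anchor groups because of the missing day, so only user 1 passes the streak >= 3 filter.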
Method 3: self-join twice on the same user with the dates shifted by one and two days, then sort
select a.user_id from login_tb as a
inner join login_tb as b
on a.user_id = b.user_id and datediff(b.log_time, a.log_time) = 1
# self-join: the user also has a login on the following day
inner join login_tb as c
on b.user_id = c.user_id and datediff(c.log_time, b.log_time) = 1
# second self-join: the user also has a login on the third day
where a.user_id in (select user_id from register_tb) # keep new users only
order by a.user_id # sort
Link: https://www.nowcoder.com/questionTerminal/16d41af206cd4066a06a3a0aa585ad3d?toCommentId=20619649
Method 1: left join
select round(count(t2.user_id)/COUNT(t1.user_id),3) p from
(select user_id,min(date) date1 from login
group by user_id) t1 # each user's first login date; the denominator
left join
(select user_id, date from login) t2 # every user/date pair; the numerator
on t1.user_id=t2.user_id
and datediff(t1.date1,t2.date)=-1
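Method 2: tuple lookup with where (user_id, date) in (...). All users form the denominator; users whose (user_id, first date + 1 day) pair also appears in login form the numerator. Note that min(date)+1 cannot be used directly, because + performs numeric rather than calendar arithmetic; use DATE_ADD instead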
select
ROUND(count(DISTINCT user_id)/(select count(DISTINCT user_id) FROM login),3) AS P
from login
where (user_id,date) in (select user_id, DATE_ADD(min(date), INTERVAL 1 DAY)
from login
group by user_id)
Method 3: window function
select round(count(distinct a.user_id)/(select count(distinct user_id) from login),3)
from
(select *,
min(date) over(partition by user_id) firstday from login
) a
where datediff(date,firstday)=1;
Link: https://www.nowcoder.com/questionTerminal/7cc3c814329546e89e71bb45c805c9ad?toCommentId=20616815
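As a check on the next-day retention logic, here is a minimal Python sketch of the left-join approach on an in-memory SQLite database. The sample rows are hypothetical, the column is named login_date rather than date for readability, and SQLite's DATE(x, '+1 day') stands in for MySQL's DATE_ADD. It assumes at most one row per user per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?)",
    [
        (1, "2020-10-12"), (1, "2020-10-13"),  # back the next day
        (2, "2020-10-12"), (2, "2020-10-14"),  # skipped a day
        (3, "2020-10-31"), (3, "2020-11-01"),  # back the next day, across a month end
        (4, "2020-10-13"),                     # never came back
    ],
)

(retention,) = conn.execute(
    """
    WITH first_day AS (          -- denominator: one row per user
        SELECT user_id, MIN(login_date) AS first_date
        FROM login
        GROUP BY user_id
    )
    SELECT ROUND(COUNT(l.user_id) * 1.0 / COUNT(f.user_id), 3)
    FROM first_day AS f
    LEFT JOIN login AS l         -- numerator: matches only on first date + 1 day
      ON l.user_id = f.user_id
     AND l.login_date = DATE(f.first_date, '+1 day')
    """
).fetchone()

print(retention)
```

Two of the four users return on the day after their first login, so the rate is 2/4. User 3 is counted because the date function handles the month boundary, which plain numeric addition would not.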
Method 1: window function
select u.name,c.name, l.date from
(select user_id,client_id,date,
rank () over (partition by user_id order by date desc) r
from login) l,
user u,client c
where l.r=1
and u.id=l.user_id
and c.id=l.client_id
order by 1
Method 2: join
select u.name,c.name,l.date from login l,user u,client c
where (user_id,date) in
(select user_id,max(date) from login
group by user_id)
and u.id=l.user_id
and c.id=l.client_id
order by 1
Daily retention rate (SQL264, Nowcoder: each person's most recent login date (5))
Method 1: join
select t0.date,
ifnull(round(count(t2.user_id)/count(t1.user_id),3) ,0)
from
(select min(date) md,user_id from login
group by user_id) t1 # denominator
left join
(select date,user_id from login ) t2 # numerator
on t1.user_id=t2.user_id and datediff(t2.date,t1.md)=1
right join (select date from login group by date ) t0 # every date
on t0.date=t1.md
group by 1
SELECT a.date,ROUND(COUNT(DISTINCT login.user_id)/ COUNT(a.user_id),3) AS p
FROM (SELECT user_id,MIN(date) AS date FROM login GROUP BY user_id) AS a
LEFT JOIN login
ON login.user_id=a.user_id
AND login.date=DATE_ADD(a.date,INTERVAL 1 DAY)
GROUP BY a.date
UNION
SELECT date,0.000 AS p
FROM login
WHERE date NOT IN(
SELECT MIN(date) FROM login GROUP BY user_id)
ORDER BY date;
Method 2: case when with a (user_id, date) tuple lookup
SELECT date,IFNULL(ROUND(
SUM(CASE WHEN
(user_id,date) IN (SELECT user_id,DATE_ADD(date,INTERVAL -1 DAY) FROM login)
AND
(user_id,date) IN (SELECT user_id,MIN(date) FROM login GROUP BY user_id) # numerator condition
THEN 1 ELSE 0 END)
/
SUM(CASE WHEN (user_id,date) IN (SELECT user_id,MIN(date) FROM login GROUP BY user_id) # denominator condition
THEN 1 ELSE 0 END),3),0) AS p
FROM login
GROUP BY date
ORDER BY date
Method 3: window function
select date,
round(ifnull(lead(sum(next_day),1)over(order by date)/sum(new),0),3)
from (select date,
if(date=(min(date)over(partition by user_id)),1,0) as new,
if(date=date_add(min(date)over(partition by user_id),interval 1 day),1,0) as next_day
from login) t
group by date
order by date
select
date,
round(ifnull(count(if(datediff(A.date_1,A.date) = 1,1,null))/count(if(A.num = 1,1,null)),0),3)
from(
select
user_id,
date,
lead(date,1) over(partition by user_id order by date asc) as date_1,
rank() over(partition by user_id order by date asc) as num
from login
) as A group by date
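The per-day version can be sketched the same way. The following minimal Python example runs a window-function query on an in-memory SQLite database; the sample rows are hypothetical, the column is named login_date rather than date, and CASE expressions replace MySQL's if(). It is one possible formulation under those assumptions, not a copy of either query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?)",
    [
        (1, "2020-10-12"), (1, "2020-10-13"),
        (2, "2020-10-12"), (2, "2020-10-14"),
        (3, "2020-10-13"), (3, "2020-10-14"),
        (4, "2020-10-13"),
        (5, "2020-10-12"), (5, "2020-10-13"),
    ],
)

daily = conn.execute(
    """
    WITH marked AS (
        SELECT user_id, login_date,
               MIN(login_date) OVER (PARTITION BY user_id) AS first_date,
               LEAD(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS next_date
        FROM login
    )
    SELECT login_date,
           ROUND(IFNULL(
               -- new users on this date who also logged in the following day
               SUM(CASE WHEN login_date = first_date
                         AND next_date = DATE(first_date, '+1 day')
                        THEN 1 ELSE 0 END) * 1.0
               -- all new users on this date; NULLIF avoids dividing by zero
               / NULLIF(SUM(CASE WHEN login_date = first_date THEN 1 ELSE 0 END), 0),
           0), 3) AS p
    FROM marked
    GROUP BY login_date
    ORDER BY login_date
    """
).fetchall()

for row in daily:
    print(row)
```

With this data, 2020-10-12 has three new users of whom two return the next day, 2020-10-13 has two new users of whom one returns, and 2020-10-14 has no new users, so its rate falls back to 0.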
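1. Date subtraction: date_sub(date, interval rn day)
2. SQL CONCAT() function: SELECT CONCAT(id, name, work_date)
3. group_concat function: group_concat( [distinct] column_to_join [order by sort_column asc/desc] [separator 'delimiter'] )
4. Turning multiple rows into one row: use group_concat, or join the columns together and combine them with concat
5. isnull(expr): returns 1 if expr is null, otherwise 0
6. ifnull(expr1, expr2): returns expr2 if expr1 is null, otherwise expr1
7. NVL() function (Oracle): nvl(expression, replacement_value) returns replacement_value if expression is null
8. nullif(expr1, expr2): returns NULL if expr1 = expr2, otherwise expr1
9. COALESCE() function: COALESCE(expression, value1, value2, ..., valuen) returns the first non-null expression among all its arguments, expression included
10. date_format(t.t_time,'%Y-%m')
11. Odd numbers: WHERE MOD(id, 2) = 1; emp_no % 2 = 1
12. Note: inequality can be written as <> or !=; IS NOT applies only to NULL and boolean tests (for example IS NOT NULL). limit 2,1 skips two rows and returns the third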
13.
insert into student_info values('1014', '张三', '2002-01-06', '男'); -- insert
DELETE FROM 商品 WHERE 价格>3000 -- delete
UPDATE books SET shelf = shelf - 2 WHERE type = 'tool'; -- update
UPDATE titles_test SET emp_no = REPLACE(emp_no, 10001, 10005) WHERE id = 5 -- replace a value
ALTER TABLE 职员 ADD 年末奖金 Money NULL; ALTER TABLE titles_test RENAME TO titles_2017 -- alter
14. Count the commas in a string: SELECT id, LENGTH(string) - LENGTH(REPLACE(string, ",", "")) AS cnt FROM strings
15. group_concat(emp_no)
16. Average salary excluding the highest and the lowest: SELECT (SUM(salary) - MAX(salary) - MIN(salary)) / (COUNT(1)-2) avg_salary FROM salaries where to_date = '9999-01-01';
17. Last two characters:
right(first_name,2)
substr(first_name,-2)
substring(first_name,-2)
18. Pagination usually uses order by + limit
limit 5,5; offset 5, return 5 rows
limit 5 offset 5; return 5 rows, offset 5
19.
With IN (the subquery is executed only once): select * from employees where emp_no not in (select emp_no from dept_emp )
With exists: select * from employees e where not exists (select emp_no from dept_emp d where d.emp_no = e.emp_no);
Without exists: select * from employees e left join dept_emp d on d.emp_no = e.emp_no where d.emp_no is null;
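20. inner join
21. date_format(date,'%Y-%m')
22. A comma-separated list such as FROM emp_bonus AS eb, employees AS e, salaries AS s, with the join conditions in where, is equivalent to inner join ... inner join ...; it returns the same rows as left join ... left join ... only when every row has a match
23.
ceil() / ceiling() round up; example: ceil(1.2) = 2
floor() rounds down; example: floor(1.2) = 1
round() rounds to the nearest value
24. An aggregate function cannot be used directly in ORDER BY in a query that is not aggregated; sort by it in a grouped query, or compute it in a subquery first
25. Subtraction
26. month(date); year(date)=2026; concat(year(date), '-', month(date)); substr(date,1,7) is equivalent to left(date,7) and mid(date,1,7)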
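27. Median:
1) where abs(t1.s_rank-(t1.num+1)/2)<1
2) where a>=total/2 and b>=total/2, where a and b are the ranks in ascending and descending order respectively
3) floor((count(id) + 1)/ 2), ceiling((count(id) + 1)/ 2)
If a job has 4 scores in total, the median positions are 2 and 3;
if a job has 5 scores in total, the median position is 3.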
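28. Ranking window functions (results for three tied values followed by a lower one):
rank() 1 1 1 4
dense_rank() 1 1 1 2
row_number() 1 2 3 4
29. When naming a column with as, avoid SQL keywords such as sum and rank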
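The difference between the three ranking functions in item 28 is easiest to see on tied values. Here is a minimal Python sketch on an in-memory SQLite database; the table score_tb and its rows are hypothetical. The aliases rk, dense_rk and row_num follow item 29 by avoiding keywords, and row_number gets student_id as an extra sort key (an added assumption) so that its order among ties is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_tb (student_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO score_tb VALUES (?, ?)",
    [(1, 90), (2, 90), (3, 90), (4, 80)],  # three tied scores, then a lower one
)

ranks = conn.execute(
    """
    SELECT student_id,
           score,
           RANK()       OVER (ORDER BY score DESC)             AS rk,        -- ties share a rank, then a gap
           DENSE_RANK() OVER (ORDER BY score DESC)             AS dense_rk,  -- ties share a rank, no gap
           ROW_NUMBER() OVER (ORDER BY score DESC, student_id) AS row_num    -- always unique
    FROM score_tb
    ORDER BY score DESC, student_id
    """
).fetchall()

for row in ranks:
    print(row)
```

The fourth row gets rank 4 from rank(), 2 from dense_rank(), and 4 from row_number(), matching the patterns listed in item 28.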