前段时间看了一号店的一道校招面试题,如下:
问题:(1)计算2012-8-23每个省份搜索量最高的10个关键词(本文查前三);
当时一看感觉很简单,一个星期之后写了下SQL语句才发现,我想象中的答案是错的,后来靠百度才解决的,其实就是一个排名函数的用法。当天晚上顺道把其他几个也一并学了下。今天有空将其一并整理出来,于是有此文。
首先创建了表,如下:
if not exists (select 1 from sysobjects where name = N'KeywordRecord' and xtype = 'U') begin create table KeywordRecord ( ID int identity(1,1) primary key not null, Province nvarchar(20) not null, Keywords nvarchar(20) not null, SearchDate datetime not null ); end加入了一些测试数据,之后先把省、商品、和次数算出来,语句如下:
select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords;得到结果:
这一步还是很容易的,关键就是排出各省份前三名的关键字。
最终做出来的答案:
with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province,Keywords,SearchCount from ( select Province,Keywords,SearchCount, RANK() OVER(PARTITION BY Province ORDER BY SearchCount desc) as [Order] FROM InitialStatistics ) tempTable where tempTable.[Order] between 1 and 3;
结果如下:
这里需要使用排名函数RANK。说来惭愧,这几个排名函数除了Row_Number,其他的平时都没有接触过,pivot和unpivot用过,不过现在又忘了。下面就讲讲SQL2005之后添加的几个排名函数以及透析与反透析关键字,主要有Row_Number、RANK、DENSE_RANK、NTILE函数,pivot以及unpivot关键字。增加一列,用以标识序号,可以指定排序的字段,同时还可以用PARTITION指定按某个字段分组排序。关键就是这个PARTITION,在这之前都没用过,略显遗憾。查询8月23号每个城市搜索量排前三个的关键字,示例如下(只取前三个,如果第三名有多个也只取一个,可与后面的RANK和DENSE_RANK相比较)。
--查询8月23号每个城市搜索量排前三个的关键字 --(只取前三个,如果第三名有多个也只取一个,可与后面的RANK和DENSE_RANK相比较) with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province,Keywords,SearchCount from ( select Province,Keywords,SearchCount, Row_Number() OVER(PARTITION BY Province ORDER BY SearchCount desc) as [Order] FROM InitialStatistics ) tempTable where tempTable.[Order] between 1 and 3;
结果如下:
Row_Number常常被用于分页(我原来也只会用作分页,嘿嘿),示例如下:<pre name="code" class="sql">--查询第11到20条的记录 with temp as ( Select (Row_Number() Over(Order By ID ASC)) AS rownum,* From KeywordRecord ) Select * From temp Where rownum Between 11 and 20 ; --查询每个城市的第11到20条的记录 with temp as ( Select (Row_Number() Over(PARTITION BY Province Order By ID ASC)) AS rownum, * From KeywordRecord ) Select * From temp Where rownum Between 11 and 20;
结果如下:
<pre name="code" class="sql">with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province,Keywords,SearchCount, RANK() OVER(PARTITION BY Province ORDER BY SearchCount desc) as [RANKOrder] FROM InitialStatistics;结果如下:
这个函数跟RANK极为相似,唯一的区别是排名的序号是连续的,比如1、2、3、3、3、4、5。示例如下:
--DENSE_RANK排名时会出现1、2、2、3、3、4、5 with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province,Keywords,SearchCount, DENSE_RANK() OVER(PARTITION BY Province ORDER BY SearchCount desc) as [DENSE_RANKOrder] FROM InitialStatistics;结果如下:
--将各省的关键字按热度大概分为三类,注意‘大概’这个词 with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province,Keywords,SearchCount, NTILE(3) OVER(PARTITION BY Province ORDER BY SearchCount desc) as [NTILEOrder] FROM InitialStatistics;结果如下:
首先关于透析,先看个应用,比如这里的初步统计的详细信息如下:
现在想达到每个城市每个关键字的汇总信息,如下效果:
该怎么实现呢?在低版本的SQLServer或者MySQL等其他数据库采用如下方法:
--行列转换,获得汇总信息(低版本的SQLServer或者MySQL等其他数据库) with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select Province '省份', max(case Keywords when '毯子'then SearchCount else 0 end) '毯子', max(case Keywords when '帽子' then SearchCount else 0 end) '帽子', max(case Keywords when '裤子' then SearchCount else 0 end) '裤子', max(case Keywords when '靴子' then SearchCount else 0 end) '靴子', max(case Keywords when '袜子' then SearchCount else 0 end) '袜子', max(case Keywords when '鞋子' then SearchCount else 0 end) '鞋子' from InitialStatistics group by Province;结果如下:
在SQL2005中引入了关键字pivot,可以很方便的解决这个问题,实现相同的效果代码如下:
--使用pivot进行行列转换,注意后面in中的不要加单引号,加了就会报错 with InitialStatistics as ( select Province,Keywords,COUNT(1) SearchCount from dbo.KeywordRecord where SearchDate='2014-08-23' group by Province,Keywords ) select * from InitialStatistics pivot(max(SearchCount)for Keywords in (毯子,帽子,裤子,靴子,袜子,鞋子)) tempTable结果如下:
这边需要注意的是后面in中的关键字不要加单引号,加了就会报错
--反透析数据 select Province,Keywords,SearchCount from ( select * from dbo.GeneralStatisticsTable ) temp unpivot ( SearchCount for Keywords in(毯子,帽子,裤子,靴子,袜子,鞋子) ) TempTable结果如下:
相信看到这的人会发现,以上透析和反透析有个很恶心的地方,就是关键字毯子、帽子等出现在SQL语句中,而实际情况下关键字可能很多,此时就需要动态拼凑(动态反透析类似):
<pre name="code" class="sql">--动态透析(动态反透析类似) declare @colomnNames varchar(200); declare @sql varchar(200); set @colomnNames=''; select @colomnNames=@colomnNames+'['+Keywords+'],' from ( select distinct Keywords from InitialStatisticsTable ) temp; set @colomnNames=left(@colomnNames,len(@colomnNames)-1); set @sql='select *from InitialStatisticsTable' +' pivot(max(SearchCount)for Keywords in (@colomnNames)) tempTable '; set @sql=replace(@sql,'@colomnNames',@colomnNames); exec (@sql);结果如下:
至此全部完成,只是个人的一些总结,对自己是一个记录,同时希望也能对别人有些帮助,如果有什么错误,还望不吝指出,共同进步,转载请保留原文地址。
SQL脚本下载