hive中的开窗函数 - 代码先锋网


本站和网页 https://www.codeleading.com/article/1915335570/#count_66 的作者无关,不对其内容负责。快照谨为网络故障时之索引,不代表被搜索网站的即时页面。

hive中的开窗函数 - 代码先锋网
代码先锋网 代码片段及技术文章聚合
首页
联系我们
版权申明
隐私政策
首页
联系我们
版权申明
隐私政策
hive中的开窗函数
技术标签: hive 开窗函数
目录count开窗函数sum开窗函数min开窗函数max开窗函数avg开窗函数first_value开窗函数last_value开窗函数lag开窗函数、lead开窗函数cume_dist开窗函数排序开窗函数rank开窗函数dense_rank开窗函数ntile开窗函数row_number开窗函数percent_rank开窗函数
最近遇到了hive开窗函数的工作,并且面试时很有可能会问到hive的开窗函数,特整理加深印象,并方便以后复习
普通的聚合函数聚合的行集是组,开窗函数聚合的行集是窗口。因此,普通的聚合函数每组(Group by)只返回一个值,而开窗函数则可为窗口中的每行都返回一个值。简单理解,就是对查询的结果多出一列,这一列可以是聚合值,也可以是排序值。
开窗函数一般分为两类,聚合开窗函数和排序开窗函数。
表的名称:
rt_data -> 存放了当天每半小时的店铺销售数据
表的字段:
shop_id, stat_date, ordamt
字段stat_date格式yyMMddHHmm
表的数据:
100262019012300305170
100262019012301005669
100262019012301302396
100262019012302001498
100262019012302301997
100262019012303001188
10026201901230330598
10026201901230400479
100262019012304301587
10026201901230530799
100272019012300302170
100272019012301001623
100272019012301303397
100272019012302001434
100272019012302301001
100282019012303001687
100282019012303301298
10028201901230400149
100292019012304302587
10029201901230530589
设置本地模式
set hive.exec.mode.local.auto=true;
创建数据库
CREATE DATABASE IF NOT EXISTS db_window;
创建表
CREATE TABLE IF NOT EXISTS db_window.rt_data(
shop_id INT,
stat_date STRING,
ordamt DOUBLE
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
加载数据到表的指定位置
hdfs dfs -put rt_data.txt /user/hive/warehouse-3.1.1/db_window.db/rt_data/
开窗函数中指定行及值的语法
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
count开窗函数
主要场景算截止到某位置的数目
select shop_id,stat_date,ordamt,
-- 以符合条件的所有行作为窗口
count(shop_id) over() as count1,
-- 以按shop_id分组的所有行作为窗口
count(shop_id) over(partition by shop_id) as count2,
-- 以按shop_id分组、按stat_date排序的所有行作为窗口
count(shop_id) over(partition by shop_id order by stat_date) as count3,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口
count(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as count4,
-- 以按shop_id分组、按stat_date排序、按从起点到末尾,默认从起点到末尾和count2结果相同
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as count5,
-- 以按shop_id分组、按stat_date排序、按从起点到当前行的前一行
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as count6,
-- 以按shop_id分组、按stat_date排序、按从起点到当前行
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as count7,
-- 以按shop_id分组、按stat_date排序、按从当前行到末尾
count(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as count8,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到末尾
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as count9,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as count10
from rt_data;
OK
shop_idstat_dateordamtcount1count2count3count4count5count6count7count8count9count10
100262019012300305170.020101310011092
100262019012301005669.02010241012982
100262019012301302396.02010341023872
100262019012302001498.02010441034762
100262019012302301997.02010541045652
100262019012303001188.02010641056542
10026201901230330598.02010741067432
10026201901230400479.02010841078322
100262019012304301587.02010931089211
10026201901230530799.0201010210910100
100272019012300302170.020513501542
100272019012301001623.020524512432
100272019012301303397.020534523322
100272019012302001434.020543534211
100272019012302301001.020552545100
100282019012303001687.020313301322
100282019012303301298.020323312211
10028201901230400149.020332323100
100292019012304302587.020212201211
10029201901230530589.020222212100
sum开窗函数
主要场景统计截止到某个位置的值
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、统计每个商品截止到当前时间的销售总额,默认从起点到当前行
sum(ordamt) over(partition by shop_id order by stat_date) as sum_amt1,
-- 以按shop_id分组、按stat_date排序、统计每个商品前半小时到后一小时的销售额(按当前行+往前1行+往后2行的行作为窗口)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as sum_amt2,
-- 以按shop_id分组、按stat_date排序、统计每个商品的销售总额(从起点到末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as sum_amt3,
-- 以按shop_id分组、按stat_date排序、统计截止到前半小时的销售总额(从起点到当前行的前一行)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as sum_amt4,
-- 以按shop_id分组、按stat_date排序、统计每个商品截止到当前时间的销售总额,默认从起点到当前行(从起点到当前行)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as sum_amt5,
-- 以按shop_id分组、按stat_date排序、统计当前时间及之后的销售总额(从当前行的末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as sum_amt6,
-- 以按shop_id分组、按stat_date排序、统计当前时间的后半小时及之后的销售额(当前行后一行到末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as sum_amt7,
-- 以按shop_id分组、按stat_date排序、统计当前时间后半小时到后一小时之间的销售额(按从当前行往后一行到当前行往后2行)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as sum_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtsum_amt1sum_amt2sum_amt3sum_amt4sum_amt5sum_amt6sum_amt7sum_amt8
100262019012300305170.05170.013235.021381.0NULL5170.021381.016211.08065.0
100262019012301005669.010839.014733.021381.05170.010839.016211.010542.03894.0
100262019012301302396.013235.011560.021381.010839.013235.010542.08146.03495.0
100262019012302001498.014733.07079.021381.013235.014733.08146.06648.03185.0
100262019012302301997.016730.05281.021381.014733.016730.06648.04651.01786.0
100262019012303001188.017918.04262.021381.016730.017918.04651.03463.01077.0
10026201901230330598.018516.03852.021381.017918.018516.03463.02865.02066.0
10026201901230400479.018995.03463.021381.018516.018995.02865.02386.02386.0
100262019012304301587.020582.02865.021381.018995.020582.02386.0799.0799.0
10026201901230530799.021381.02386.021381.020582.021381.0799.0NULLNULL
100272019012300302170.02170.07190.09625.0NULL2170.09625.07455.05020.0
100272019012301001623.03793.08624.09625.02170.03793.07455.05832.04831.0
100272019012301303397.07190.07455.09625.03793.07190.05832.02435.02435.0
100272019012302001434.08624.05832.09625.07190.08624.02435.01001.01001.0
100272019012302301001.09625.02435.09625.08624.09625.01001.0NULLNULL
100282019012303001687.01687.03134.03134.0NULL1687.03134.01447.01447.0
100282019012303301298.02985.03134.03134.01687.02985.01447.0149.0149.0
10028201901230400149.03134.01447.03134.02985.03134.0149.0NULLNULL
100292019012304302587.02587.03176.03176.0NULL2587.03176.0589.0589.0
10029201901230530589.03176.03176.03176.02587.03176.0589.0NULLNULL
min开窗函数
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、
min(ordamt) over(partition by shop_id order by stat_date) as min_amt1,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as min_amt2,
-- 以按shop_id分组、按stat_date排序、从起点到末尾的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as min_amt3,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的前一行的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as min_amt4,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as min_amt5,
-- 以按shop_id分组、按stat_date排序、从当前行的末尾的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as min_amt6,
-- 以按shop_id分组、按stat_date排序、当前行后一行到末尾的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as min_amt7,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行的最小数
min(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as min_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtmin_amt1min_amt2min_amt3min_amt4min_amt5min_amt6min_amt7min_amt8
100262019012300305170.05170.02396.0479.0NULL5170.0479.0479.02396.0
100262019012301005669.05170.01498.0479.05170.05170.0479.0479.01498.0
100262019012301302396.02396.01498.0479.05170.02396.0479.0479.01498.0
100262019012302001498.01498.01188.0479.02396.01498.0479.0479.01188.0
100262019012302301997.01498.0598.0479.01498.01498.0479.0479.0598.0
100262019012303001188.01188.0479.0479.01498.01188.0479.0479.0479.0
10026201901230330598.0598.0479.0479.01188.0598.0479.0479.0479.0
10026201901230400479.0479.0479.0479.0598.0479.0479.0799.0799.0
100262019012304301587.0479.0479.0479.0479.0479.0799.0799.0799.0
10026201901230530799.0479.0799.0479.0479.0479.0799.0NULLNULL
100272019012300302170.02170.01623.01001.0NULL2170.01001.01001.01623.0
100272019012301001623.01623.01434.01001.02170.01623.01001.01001.01434.0
100272019012301303397.01623.01001.01001.01623.01623.01001.01001.01001.0
100272019012302001434.01434.01001.01001.01623.01434.01001.01001.01001.0
100272019012302301001.01001.01001.01001.01434.01001.01001.0NULLNULL
100282019012303001687.01687.0149.0149.0NULL1687.0149.0149.0149.0
100282019012303301298.01298.0149.0149.01687.01298.0149.0149.0149.0
10028201901230400149.0149.0149.0149.01298.0149.0149.0NULLNULL
100292019012304302587.02587.0589.0589.0NULL2587.0589.0589.0589.0
10029201901230530589.0589.0589.0589.02587.0589.0589.0NULLNULL
max开窗函数
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、
max(ordamt) over(partition by shop_id order by stat_date) as max_amt1,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as max_amt2,
-- 以按shop_id分组、按stat_date排序、从起点到末尾的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as max_amt3,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的前一行的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as max_amt4,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as max_amt5,
-- 以按shop_id分组、按stat_date排序、从当前行的末尾的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as max_amt6,
-- 以按shop_id分组、按stat_date排序、当前行后一行到末尾的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as max_amt7,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行的最大数
max(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as max_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtmax_amt1max_amt2max_amt3max_amt4max_amt5max_amt6max_amt7max_amt8
100262019012300305170.05170.05669.05669.0NULL5170.05669.05669.05669.0
100262019012301005669.05669.05669.05669.05170.05669.05669.02396.02396.0
100262019012301302396.05669.05669.05669.05669.05669.02396.01997.01997.0
100262019012302001498.05669.02396.05669.05669.05669.01997.01997.01997.0
100262019012302301997.05669.01997.05669.05669.05669.01997.01587.01188.0
100262019012303001188.05669.01997.05669.05669.05669.01587.01587.0598.0
10026201901230330598.05669.01587.05669.05669.05669.01587.01587.01587.0
10026201901230400479.05669.01587.05669.05669.05669.01587.01587.01587.0
100262019012304301587.05669.01587.05669.05669.05669.01587.0799.0799.0
10026201901230530799.05669.01587.05669.05669.05669.0799.0NULLNULL
100272019012300302170.02170.03397.03397.0NULL2170.03397.03397.03397.0
100272019012301001623.02170.03397.03397.02170.02170.03397.03397.03397.0
100272019012301303397.03397.03397.03397.02170.03397.03397.01434.01434.0
100272019012302001434.03397.03397.03397.03397.03397.01434.01001.01001.0
100272019012302301001.03397.01434.03397.03397.03397.01001.0NULLNULL
100282019012303001687.01687.01687.01687.0NULL1687.01687.01298.01298.0
100282019012303301298.01687.01687.01687.01687.01687.01298.0149.0149.0
10028201901230400149.01687.01298.01687.01687.01687.0149.0NULLNULL
100292019012304302587.02587.02587.02587.0NULL2587.02587.0589.0589.0
10029201901230530589.02587.02587.02587.02587.02587.0589.0NULLNULL
avg开窗函数
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、
round(avg(ordamt) over(partition by shop_id order by stat_date),2) as avg_amt1,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following), 2) as avg_amt2,
-- 以按shop_id分组、按stat_date排序、从起点到末尾的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following), 2) as avg_amt3,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的前一行的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding), 2) as avg_amt4,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row), 2) as avg_amt5,
-- 以按shop_id分组、按stat_date排序、从当前行的末尾的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following), 2) as avg_amt6,
-- 以按shop_id分组、按stat_date排序、当前行后一行到末尾的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following), 2) as avg_amt7,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行的平均值
round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following), 2) as avg_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtavg_amt1avg_amt2avg_amt3avg_amt4avg_amt5avg_amt6avg_amt7avg_amt8
100262019012300305170.05170.04411.672138.1NULL5170.02138.11801.224032.5
100262019012301005669.05419.53683.252138.15170.05419.51801.221317.751947.0
100262019012301302396.04411.672890.02138.15419.54411.671317.751163.711747.5
100262019012302001498.03683.251769.752138.14411.673683.251163.711108.01592.5
100262019012302301997.03346.01320.252138.13683.253346.01108.0930.2893.0
100262019012303001188.02986.331065.52138.13346.02986.33930.2865.75538.5
10026201901230330598.02645.14963.02138.12986.332645.14865.75955.01033.0
10026201901230400479.02374.38865.752138.12645.142374.38955.01193.01193.0
100262019012304301587.02286.89955.02138.12374.382286.891193.0799.0799.0
10026201901230530799.02138.11193.02138.12286.892138.1799.0NULLNULL
100272019012300302170.02170.02396.671925.0NULL2170.01925.01863.752510.0
100272019012301001623.01896.52156.01925.02170.01896.51863.751944.02415.5
100272019012301303397.02396.671863.751925.01896.52396.671944.01217.51217.5
100272019012302001434.02156.01944.01925.02396.672156.01217.51001.01001.0
100272019012302301001.01925.01217.51925.02156.01925.01001.0NULLNULL
100282019012303001687.01687.01044.671044.67NULL1687.01044.67723.5723.5
100282019012303301298.01492.51044.671044.671687.01492.5723.5149.0149.0
10028201901230400149.01044.67723.51044.671492.51044.67149.0NULLNULL
100292019012304302587.02587.01588.01588.0NULL2587.01588.0589.0589.0
10029201901230530589.01588.01588.01588.02587.01588.0589.0NULLNULL
first_value开窗函数
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、
first_value(ordamt) over(partition by shop_id order by stat_date) as first_amt1,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as first_amt2,
-- 以按shop_id分组、按stat_date排序、从起点到末尾的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as first_amt3,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的前一行的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as first_amt4,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as first_amt5,
-- 以按shop_id分组、按stat_date排序、从当前行的末尾的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as first_amt6,
-- 以按shop_id分组、按stat_date排序、当前行后一行到末尾的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as first_amt7,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行的第一个值
first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as first_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtfirst_amt1first_amt2first_amt3first_amt4first_amt5first_amt6first_amt7first_amt8
100262019012300305170.05170.05170.05170.0NULL5170.05170.05669.05669.0
100262019012301005669.05170.05170.05170.05170.05170.05669.02396.02396.0
100262019012301302396.05170.05669.05170.05170.05170.02396.01498.01498.0
100262019012302001498.05170.02396.05170.05170.05170.01498.01997.01997.0
100262019012302301997.05170.01498.05170.05170.05170.01997.01188.01188.0
100262019012303001188.05170.01997.05170.05170.05170.01188.0598.0598.0
10026201901230330598.05170.01188.05170.05170.05170.0598.0479.0479.0
10026201901230400479.05170.0598.05170.05170.05170.0479.01587.01587.0
100262019012304301587.05170.0479.05170.05170.05170.01587.0799.0799.0
10026201901230530799.05170.01587.05170.05170.05170.0799.0NULLNULL
100272019012300302170.02170.02170.02170.0NULL2170.02170.01623.01623.0
100272019012301001623.02170.02170.02170.02170.02170.01623.03397.03397.0
100272019012301303397.02170.01623.02170.02170.02170.03397.01434.01434.0
100272019012302001434.02170.03397.02170.02170.02170.01434.01001.01001.0
100272019012302301001.02170.01434.02170.02170.02170.01001.0NULLNULL
100282019012303001687.01687.01687.01687.0NULL1687.01687.01298.01298.0
100282019012303301298.01687.01687.01687.01687.01687.01298.0149.0149.0
10028201901230400149.01687.01298.01687.01687.01687.0149.0NULLNULL
100292019012304302587.02587.02587.02587.0NULL2587.02587.0589.0589.0
10029201901230530589.02587.02587.02587.02587.02587.0589.0NULLNULL
last_value开窗函数
select
shop_id, stat_date, ordamt,
-- 以按shop_id分组、按stat_date排序、
last_value(ordamt) over(partition by shop_id order by stat_date) as last_amt1,
-- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as last_amt2,
-- 以按shop_id分组、按stat_date排序、从起点到末尾的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as last_amt3,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的前一行的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as last_amt4,
-- 以按shop_id分组、按stat_date排序、从起点到当前行的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as last_amt5,
-- 以按shop_id分组、按stat_date排序、从当前行的末尾的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as last_amt6,
-- 以按shop_id分组、按stat_date排序、当前行后一行到末尾的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as last_amt7,
-- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行的最后一个值
last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as last_amt8
from db_window.rt_data;
OK
shop_idstat_dateordamtlast_amt1last_amt2last_amt3last_amt4last_amt5last_amt6last_amt7last_amt8
100262019012300305170.05170.02396.0799.0NULL5170.0799.0799.02396.0
100262019012301005669.05669.01498.0799.05170.05669.0799.0799.01498.0
100262019012301302396.02396.01997.0799.05669.02396.0799.0799.01997.0
100262019012302001498.01498.01188.0799.02396.01498.0799.0799.01188.0
100262019012302301997.01997.0598.0799.01498.01997.0799.0799.0598.0
100262019012303001188.01188.0479.0799.01997.01188.0799.0799.0479.0
10026201901230330598.0598.01587.0799.01188.0598.0799.0799.01587.0
10026201901230400479.0479.0799.0799.0598.0479.0799.0799.0799.0
100262019012304301587.01587.0799.0799.0479.01587.0799.0799.0799.0
10026201901230530799.0799.0799.0799.01587.0799.0799.0NULLNULL
100272019012300302170.02170.03397.01001.0NULL2170.01001.01001.03397.0
100272019012301001623.01623.01434.01001.02170.01623.01001.01001.01434.0
100272019012301303397.03397.01001.01001.01623.03397.01001.01001.01001.0
100272019012302001434.01434.01001.01001.03397.01434.01001.01001.01001.0
100272019012302301001.01001.01001.01001.01434.01001.01001.0NULLNULL
100282019012303001687.01687.0149.0149.0NULL1687.0149.0149.0149.0
100282019012303301298.01298.0149.0149.01687.01298.0149.0149.0149.0
10028201901230400149.0149.0149.0149.01298.0149.0149.0NULLNULL
100292019012304302587.02587.0589.0589.0NULL2587.0589.0589.0589.0
10029201901230530589.0589.0589.0589.02587.0589.0589.0NULLNULL
lag开窗函数、lead开窗函数
这两个函数的应用场景的语法使用在我的第一篇博客上已经有了,就不重复写了,这两个函数可以求连续登录的用户数
https://blog.csdn.net/lz6363/article/details/82470278
cume_dist开窗函数
计算某个窗口或分区中某个值的累积分布。假定升序排序,则使用以下公式确定累积分布:
小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中,x 等于 order by 子句中指定的列的当前行中的值。
select
shop_id, stat_date, ordamt,
-- 统计小于等于当前销售额的订单占总订单的比例
cume_dist() over(order by ordamt) as cume_dist1,
-- 统计大于等于当前销售额的订单占总订单的比例
cume_dist() over(order by ordamt desc) as cume_dist2,
-- 统计分区内小于等于当前销售额的订单占总订单的比例
round(cume_dist() over(partition by shop_id order by ordamt), 2) as cume_dist3
from db_window.rt_data;
shop_idstat_dateordamtcume_dist1cume_dist2cume_dist3
10026201901230400479.00.10.950.1
10026201901230330598.00.20.850.2
10026201901230530799.00.250.80.3
100262019012303001188.00.350.70.4
100262019012302001498.00.50.550.5
100262019012304301587.00.550.50.6
100262019012302301997.00.70.350.7
100262019012301302396.00.80.250.8
100262019012300305170.00.950.10.9
100262019012301005669.01.00.051.0
100272019012302301001.00.30.750.2
100272019012302001434.00.450.60.4
100272019012301001623.00.60.450.6
100272019012300302170.00.750.30.8
100272019012301303397.00.90.151.0
10028201901230400149.00.051.00.33
100282019012303301298.00.40.650.67
100282019012303001687.00.650.41.0
10029201901230530589.00.150.90.5
100292019012304302587.00.850.21.0
排序开窗函数
rank开窗函数
rank 开窗函数基于 over 子句中的 order by 确定一组值中一个值的排名。
如果存在partition by ,则为每个分区组中的每个值排名。排名可能不是连续的。例如,如果两个行的排名为 1,则下一个排名为 3。
select
shop_id, stat_date, ordamt,
-- 以按shop_id排序
rank() over(order by shop_id) as rank_amt1,
-- 以按shop_id分区、按stat_date排序
rank() over(partition by shop_id order by stat_date) as rank_amt2
from db_window.rt_data;
OK
shop_idstat_dateordamtrank_amt1rank_amt2
100262019012300305170.011
100262019012301005669.012
100262019012301302396.013
100262019012302001498.014
100262019012302301997.015
100262019012303001188.016
10026201901230330598.017
10026201901230400479.018
100262019012304301587.019
10026201901230530799.0110
100272019012300302170.0111
100272019012301001623.0112
100272019012301303397.0113
100272019012302001434.0114
100272019012302301001.0115
100282019012303001687.0161
100282019012303301298.0162
10028201901230400149.0163
100292019012304302587.0191
10029201901230530589.0192
dense_rank开窗函数
dense_rank与rank有一点不同,当排名一样的时候,接下来的行是连续的。如两个行的排名为 1,则下一个排名为 2。
select
shop_id, stat_date, ordamt,
-- 以按shop_id排序
dense_rank() over(order by shop_id) as dense_amt1,
-- 以按shop_id分区、按stat_date排序
dense_rank() over(partition by shop_id order by stat_date) as dense_amt2
from db_window.rt_data;
OK
shop_idstat_dateordamtdense_amt1dense_amt2
100262019012300305170.011
100262019012301005669.012
100262019012301302396.013
100262019012302001498.014
100262019012302301997.015
100262019012303001188.016
10026201901230330598.017
10026201901230400479.018
100262019012304301587.019
10026201901230530799.0110
100272019012300302170.021
100272019012301001623.022
100272019012301303397.023
100272019012302001434.024
100272019012302301001.025
100282019012303001687.031
100282019012303301298.032
10028201901230400149.033
100292019012304302587.041
10029201901230530589.042
ntile开窗函数
将分区中已排序的行划分为大小尽可能相等的指定数量的排名的组,并返回给定行所在的组的排名。
select
shop_id, stat_date, ordamt,
-- 以按shop_id分成两组、按stat_date排序
ntile(2) over(partition by shop_id order by stat_date) as ntile_amt1,
-- 以按shop_id分成三组、按stat_date排序
ntile(3) over(partition by shop_id order by stat_date) as ntile_amt2
from db_window.rt_data;
OK
shop_idstat_dateordamtntile_amt1ntile_amt2
100262019012300305170.011
100262019012301005669.011
100262019012301302396.011
100262019012302001498.011
100262019012302301997.012
100262019012303001188.022
10026201901230330598.022
10026201901230400479.023
100262019012304301587.023
10026201901230530799.023
100272019012300302170.011
100272019012301001623.011
100272019012301303397.012
100272019012302001434.022
100272019012302301001.023
100282019012303001687.011
100282019012303301298.012
10028201901230400149.023
100292019012304302587.011
10029201901230530589.022
row_number开窗函数
从1开始对分区内的数据排序。
select
shop_id, stat_date, ordamt,
-- 以按shop_id分区、按stat_date排序
row_number() over(partition by shop_id order by stat_date) as row_amt
from db_window.rt_data;
OK
shop_idstat_dateordamtrow_amt
100262019012300305170.01
100262019012301005669.02
100262019012301302396.03
100262019012302001498.04
100262019012302301997.05
100262019012303001188.06
10026201901230330598.07
10026201901230400479.08
100262019012304301587.09
10026201901230530799.010
100272019012300302170.01
100272019012301001623.02
100272019012301303397.03
100272019012302001434.04
100272019012302301001.05
100282019012303001687.01
100282019012303301298.02
10028201901230400149.03
100292019012304302587.01
10029201901230530589.02
percent_rank开窗函数
计算给定行的百分比排名。可以用来计算超过了百分之多少的人。如360小助手开机速度超过了百分之多少的人。
(当前行的rank值-1)/(分组内的总行数-1)
select
shop_id, stat_date, ordamt,
-- 以按shop_id分区、按stat_date排序
row_number() over(partition by shop_id order by stat_date) as row_amt,
round(percent_rank() over(partition by shop_id order by stat_date), 2) as percent_amt
from db_window.rt_data;
OK
shop_idstat_dateordamtrow_amtpercent_amt
100262019012300305170.010.0
100262019012301005669.020.11
100262019012301302396.030.22
100262019012302001498.040.33
100262019012302301997.050.44
100262019012303001188.060.56
10026201901230330598.070.67
10026201901230400479.080.78
100262019012304301587.090.89
10026201901230530799.0101.0
100272019012300302170.010.0
100272019012301001623.020.25
100272019012301303397.030.5
100272019012302001434.040.75
100272019012302301001.051.0
100282019012303001687.010.0
100282019012303301298.020.5
10028201901230400149.031.0
100292019012304302587.010.0
10029201901230530589.021.0
版权声明:本文为lz6363原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/lz6363/article/details/86606379
智能推荐
Centos7安装Redis5.0.4,后台启动
一、安装gcc依赖 由于 redis 是用 C 语言开发,安装之前必先确认是否安装 gcc 环境(gcc -v),如果没有安装,执行以下命令进行安装 如果wget命令和vim命令不能用可用下面语句安装。 二、下载并解压安装包 三、cd切换到redis解压目录下,执行编译 四、安装并指定安装目录 五、启动服务 5.1前台启动(不推荐使用) [[email protected] redis-5.0.4]# ...
点云滤波——直通滤波
最简单的例子:按高度筛选点云...
假期使用Object-C制作了一个骰子游戏
首先蜜蜂祝大家狗年一帆风顺,因年底比较空闲,自己琢磨了一个骰子游戏 github地址 项目使用了Cocoapods,又设置了忽略文件,所以下载的时候请自行 pod install 一下 项目主要利用CATransform3DRotate改变view的perspective来实现骰子的3D旋转功能,再进行随机数来达到摇骰子的功能 因为使用了毛玻璃,所以在模拟器的情况下会崩溃,建议真机运行,或者将毛玻...
2020前端JavaScript面试题精选(不断更新)
1 .介绍JavaScript的基本数据类型 Number、String 、Boolean 、Null、Undefined Object 是 JavaScript 中所有对象的父对象 数据封装类对象:Object、Array、Boolean、Number 和 String 其他对象:Function、Arguments、Math、Date、RegExp、Error 新类型:Symbol 2.eva...
幂运算
1、整数的快速幂 a^b%m当b还不是很大的时候 (a%m)……%m 但是a,b太大可能存不下,所以这里介绍一种利用二分的思想 2、把指数b表示成相应的二进制形式 是0的位不给予考虑, 另外一种是递归的方法 ...
猜你喜欢
File
java.io.File 分隔符 static String pathSeparator 与系统有关的路径分隔符,为了方便,它被表示为一个字符串。 static char pathSeparatorChar 与系统有关的路径分隔符。 pathSeparator与pathSeparatorChar 的内容其实一样,不过前一个返回的是字符串(字符前多了个空格),后者返回的就是个字符 路径分隔符wind...
将两个从小到大排列好的数组再次排序于新数组中
...
[leetcode]791. Custom Sort String
[leetcode]791. Custom Sort String Analysis 周五ummmmmm—— [啊啊啊啊 paper结果要出来了,心塞] S and T are strings composed of lowercase letters. In S, no letter occurs more than once. S was sorted in some ...
让应用程序执行我们的代码?
伊始   当我们目标应用程序由于需要使用外部其他应用程序指定,启动一些函数,比如计算空间使用大小,显示当前空间内成员使用率时,开关打印日志时,需要手动触发计算时,可以利用这个方案。 主体 一切都是为了需求而做,而我们需要的就是落地!。 那我们开始吧!   首先,我们要有一个要API函数,我们要以此函数作为接口,实现我们的需求。这里我们就以最简单Message...
[1]、两数之和
题目 题目链接 给定一个整数数组 nums 和一个目标值 target,请你在该数组中找出和为目标值的那 两个 整数,并返回他们的数组下标。 你可以假设每种输入只会对应一个答案。但是,数组中同一个元素不能使用两遍。 示例1: 解法 第一种解法通过暴力,固定i,移动j扫描一遍数组,每次判断是否条件nums[i]+nums[j] == target,如果满足则保存,然后再i+1固定,j从头到尾再扫描一...
相关文章
Hive中开窗函数的案例
理解hive中的开窗函数
Hive的开窗函数
hive 开窗函数的使用
Hive开窗函数的理解
hive 开窗函数的例子
关于hive的开窗函数
hive 的开窗函数sum() over() 函数
hive开窗函数的案例-【sql练习】
Hive 开窗函数
热门文章
update5 动态绑定,Foundation内主要封装类以及UIKit部分类杂记
Flex支持滑轮滚动
php5.6 install
3. 无重复字符的最长子串
C++ 语言访问控制 - 友元 (friend)
整数分解为若干项之和
js获取行内以及非行内样式
Linux springboot项目 每天生产一个日志文件,如何配置?
hdu2588 GCD(欧拉函数)
vantui一个输入框验证两个条件
推荐文章
PTA (jmu-Java-02基本语法-08-ArrayList入门)
Kafka 0.9 新消费者API
Django框架(二):路由层
Kotlin学习(七):函数
Bzoj3211: 花神游历各国
el-select数据过多懒加载(loadmore)
16.Spark大型电商项目-用户访问session分析-Idea工程搭建以及工具类说明_
2019-填空题[年号字串]
Python - strings
JAVA处理日期(Date)时间(Time)以及相关类的介绍
相关标签
Hive
大数据
hive
sql
数据库
database
Redis
点云
滤波
前端面试硬核
2018-2022 All rights reserved by codeleading.com