Kylin in Practice

I. Data Preparation

Fact table: dwd_payment_info
Dimension tables: dwd_order_info and dwd_user_info

1.1 Create the tables

  • dwd_payment_info
hive (gmall)>
drop table if exists dwd_payment_info;
create external table dwd_payment_info(
    `id`   bigint COMMENT '',
    `out_trade_no`    string COMMENT '',
    `order_id`        string COMMENT '',
    `user_id`         string COMMENT '',
    `alipay_trade_no` string COMMENT '',
    `total_amount`    decimal(16,2) COMMENT '',
    `subject`         string COMMENT '',
    `payment_type`    string COMMENT '',
    `payment_time`    string COMMENT ''
   )  
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy")
;
  • dwd_order_info
hive (gmall)>
drop table if exists dwd_order_info;
create external table dwd_order_info (
    `id` string COMMENT '',
    `total_amount` decimal(10,2) COMMENT '',
    `order_status` string COMMENT ' 1 2 3 4 5',
    `user_id` string COMMENT 'id',
    `payment_way` string COMMENT '',
    `out_trade_no` string COMMENT '',
    `create_time` string COMMENT '',
    `operate_time` string COMMENT ''
) 
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy")
;
  • dwd_user_info
hive (gmall)>
drop table if exists dwd_user_info;
create external table dwd_user_info( 
    `id` string COMMENT 'id',
    `name` string COMMENT '', 
    `birthday` string COMMENT '',
    `gender` string COMMENT '',
    `email` string COMMENT '',
    `user_level` string COMMENT '',
    `create_time` string COMMENT ''
) 
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy")
;

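These three tables form a simple star schema.
Because dwd_order_info and dwd_user_info are partitioned by day and Kylin does not support partitioned dimension tables, the same id appears once per partition, so the join key is duplicated. The workaround is to use a temporary table or a view.
Create views over the dimension tables: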

  • dwd_order_view
hive (gmall)>
create view dwd_order_view as select * from dwd_order_info where dt=current_date;
  • dwd_user_view
hive (gmall)>
create view dwd_user_view as select * from dwd_user_info where dt=current_date;
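
Before loading the views into Kylin, it may be worth confirming that each id really is unique in them. The queries below are a minimal sanity check and are not part of the original walkthrough. They assume `id` is the join key on both views, as the join conditions later in this post suggest. An empty result means no duplicates were found.

```sql
-- Sanity check (an assumption, not from the original post):
-- list any id that occurs more than once in today's snapshot.
select id, count(*) as cnt
from dwd_order_view
group by id
having count(*) > 1;

select id, count(*) as cnt
from dwd_user_view
group by id
having count(*) > 1;
```

If either query returns rows, the view still contains duplicate keys and the join in the model is likely to run into the same problem as the partitioned tables.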

II. Kylin Operations

1. Create a project (analogous to a database)

Click Add Project -> gmall -> test


2. Import the tables

Data Source -> Load Table From Tree -> select the three prepared tables

Once selected, the table names turn bold.

The table metadata can now be seen below.


3. Create the model

3.1 Click New Model -> Model Name: module_payment

3.2 Select the fact table

3.3 Add the dimension tables

3.3.1 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_ORDER_INFO -> New Join Condition: ORDER_ID = ID

3.3.2 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_USER_INFO -> New Join Condition: USER_ID = ID
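
The two join conditions above describe an ordinary star join. As a rough preview in Hive (a sketch, not something Kylin generates), the query below runs the same joins. It assumes the views dwd_order_view and dwd_user_view created in the data preparation step stand in for the partitioned dimension tables.

```sql
-- Preview of the star join described by the model.
-- Assumption: the views replace the partitioned dimension tables.
select count(*) as joined_rows
from dwd_payment_info p
inner join dwd_order_view o on p.order_id = o.id
inner join dwd_user_view u on p.user_id = u.id;
```

Because both joins are INNER JOINs, payment rows without a matching order or user are dropped. Comparing joined_rows with the row count of dwd_payment_info shows how many rows the model will actually cover.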

3.4 Dimensions

1. DWD_PAYMENT_INFO : PAYMENT_TYPE
2. DWD_ORDER_INFO : PAYMENT_WAY
3. DWD_USER_INFO : GENDER, USER_LEVEL


3.5 Measures

1. DWD_PAYMENT_INFO : TOTAL_AMOUNT

3.6 Settings

3.6.1 Partition

Select Partition Table -> DWD_PAYMENT_INFO -> DT -> yyyy-MM-dd
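
The date format chosen here has to match the values actually stored in the dt partition column. A quick way to check this in Hive is sketched below. This is an added suggestion rather than a step from the original post.

```sql
-- List the existing partitions; values should look like dt=2019-02-10 (yyyy-MM-dd).
show partitions dwd_payment_info;

-- Row count per partition, to see which dates hold data worth building.
select dt, count(*) as cnt
from dwd_payment_info
group by dt;
```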


3.6.2 Filter

Set this according to your own business needs.

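4. Create the cube

4.1 Cube Info -> Model Name: module_payment -> Cube Name: Cube_payment

4.2 Dimensions

Add Dimensions -> DWD_PAYMENT_INFO [FactTable]: select PAYMENT_TYPE -> DWD_ORDER_INFO: select PAYMENT_WAY -> DWD_USER_INFO: select GENDER and USER_LEVEL
Also, choose Normal rather than Derived for these dimensions (Derived is an optimization option).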

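4.3 Measures

4.4 Refresh Setting

Keep the default values.
The cube is built once a day and the data is stored in HBase, so every daily build creates a new table in HBase. A query over one month of data would then have to read 30 tables, which is slow. This setting merges them automatically: a minor merge every 7 days (combining the daily segments) and a major merge every 28 days (combining the 7-day segments).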

4.5 Advanced setting

4.6 Configuration Overwrites

6. Build the cube
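
Once the build has finished, the cube can be queried from the Insight tab of the Kylin web UI. The query below is a sketch of what that might look like. It assumes a SUM(TOTAL_AMOUNT) measure was defined in step 4.3 (the original does not list the measures), that the tables were loaded under the names used in the model, and it uses a placeholder date.

```sql
-- Total payment amount per dimension combination for one day.
-- Assumptions: SUM(TOTAL_AMOUNT) is a cube measure; '2019-02-10' is a placeholder date.
select p.payment_type,
       o.payment_way,
       u.gender,
       u.user_level,
       sum(p.total_amount) as sum_amount
from dwd_payment_info p
inner join dwd_order_info o on p.order_id = o.id
inner join dwd_user_info u on p.user_id = u.id
where p.dt = '2019-02-10'
group by p.payment_type, o.payment_way, u.gender, u.user_level
```

The joins are written exactly as in the model (INNER JOIN on ORDER_ID = ID and USER_ID = ID), since Kylin only answers a query from the cube when its join structure matches the model definition.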