Implementing a Hive UDF

Step 1: Create the Hive project

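Create a Maven project with the following dependencies:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>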

Step 2: Write the UDF code

package UDF;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class LowerUDF extends UDF{
    /**
     * 1. Implement one or more methods named "evaluate" which will be called by Hive.
     *
     * 2. "evaluate" should never be a void method. However it can return "null" if needed.
     */
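    public Text evaluate(Text str) {
        // A NULL column value arrives as a null reference; return NULL for it
        if (null == str) {
            return null;
        }

        // Treat empty or whitespace-only input as NULL as well
        if (StringUtils.isBlank(str.toString())) {
            return null;
        }

        // Convert to lower case and wrap the result in a new Text
        return new Text(str.toString().toLowerCase());
    }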

    public static void main(String[] args) {
        System.out.println(new LowerUDF().evaluate(new Text("BBB")));
    }

}
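The null and blank handling in evaluate can be tried out without a Hive installation. The following is a minimal sketch in plain Java, assuming String in place of Text; the class LowerLogicSketch and its methods are hypothetical names introduced for illustration and are not part of the project above. It also passes Locale.ROOT to toLowerCase, which is an assumption of this sketch (the UDF above uses the JVM default locale).

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the original project.
final class LowerLogicSketch {

    private LowerLogicSketch() {}

    // Same rule as commons-lang StringUtils.isBlank:
    // null, empty, or whitespace-only strings count as blank.
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns null for blank input, otherwise the lower-cased string.
    static String lower(String s) {
        if (isBlank(s)) {
            return null;
        }
        // Assumption of this sketch: Locale.ROOT keeps the result independent
        // of the JVM default locale.
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lower("WWWWAA")); // wwwwaa
        System.out.println(lower("   "));    // null
        System.out.println(lower(null));     // null
    }
}
```

Keeping the string logic in a small static method like this makes it easy to unit test before wrapping it in a UDF class.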

Step 3: Package

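     Before packaging, check which JDK version the project is built with and whether it matches the JDK used by the Hadoop cluster. A mismatch can cause problems; for example, classes compiled for a newer JDK than the one the cluster runs fail to load with UnsupportedClassVersionError. To be safe, use the same version on both sides. The packaging steps are as follows: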

[Figure 1: screenshot of the packaging step]

Locate the jar that Maven produced (by default it is written to the project's target directory):

[Figure 2: screenshot of the packaged jar's location]
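The JDK concern raised at the start of this step can be checked from the compiled classes themselves, because every .class file records the class-file version it was compiled for. The sketch below reads that version from the first eight bytes of a class file; ClassVersionSketch and its methods are hypothetical names used only for illustration, and the byte array in main is a hand-written header rather than output from the project above.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the original project.
final class ClassVersionSketch {

    private ClassVersionSketch() {}

    // A class file starts with: u4 magic (0xCAFEBABE), u2 minor_version,
    // u2 major_version, all big-endian.
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    // Major version 52 corresponds to Java 8, 51 to Java 7, and so on.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // Header of a class compiled for Java 8 (major version 0x34 = 52).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        int major = majorVersion(header);
        System.out.println("major=" + major + ", Java " + javaRelease(major)); // major=52, Java 8
        // The JVM that runs this sketch, for comparison with the cluster's JDK.
        System.out.println("running on Java " + System.getProperty("java.version"));
    }
}
```

In practice the bytes would come from a class inside the packaged jar, and the resulting release number should not be higher than the Java release the cluster runs.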

Step 4: Register the UDF

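      First, upload the jar built in step 3 to the server and put it on HDFS. Then, in the Hive shell, add the jar and create a temporary function that points at the UDF's fully qualified class name: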

hadoop fs -put Hive_UDF_demo-1.0-SNAPSHOT.jar /user/hue/weisc/

hive> add jar hdfs://hadoop01:8020/user/hue/weisc/Hive_UDF_demo-1.0-SNAPSHOT.jar;
converting to local hdfs://hadoop01:8020/user/hue/weisc/Hive_UDF_demo-1.0-SNAPSHOT.jar
Added [/tmp/c1473e1f-6985-4699-b956-95962489759c_resources/Hive_UDF_demo-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs://hadoop01:8020/user/hue/weisc/Hive_UDF_demo-1.0-SNAPSHOT.jar]
hive> create temporary function lower_udf as 'UDF.LowerUDF';
OK
Time taken: 0.049 seconds

Step 5: Test

hive> create table b (id int ,name string);
OK
Time taken: 0.29 seconds
hive> insert into b values(1,'WWWWAA');
Query ID = hue_20180814080214_8592e828-113f-4126-9959-06066d28b7d9
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1533807733727_0109)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 6.35 s     
--------------------------------------------------------------------------------
Loading data to table test1.b
Table test1.b stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
OK
Time taken: 21.383 seconds
hive> select lower_udf(name) from b;
OK
wwwwaa
Time taken: 0.187 seconds, Fetched: 1 row(s)

 
