A Custom Flink Source for Generating Test Data

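In Flink, you define a custom non-parallel Source by implementing the SourceFunction interface, and a custom parallel Source by implementing the ParallelSourceFunction interface.

To support development and testing, the code below implements SourceFunction to build a custom Source that generates test data.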

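It generates traffic test data in the following format, one record per line, with fields separated by "|":

user id|city id|upstream traffic|downstream traffic|event time (epoch milliseconds)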
471|4|0.0745|0.3826|1609863101410
495|0|0.5707|0.3274|1609863101410
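
To make the record layout concrete, here is a minimal sketch of how a consumer might split one of these lines back into typed fields. It uses only the JDK; the class name TrafficRecordParser and its nested Parsed type are hypothetical names introduced for illustration and are not part of the original code.

```java
public class TrafficRecordParser {

    // Hypothetical holder for the five fields of one record.
    public static final class Parsed {
        public final int accountId;
        public final int cityId;
        public final double upTraffic;
        public final double downTraffic;
        public final long eventTime;

        Parsed(int accountId, int cityId, double upTraffic, double downTraffic, long eventTime) {
            this.accountId = accountId;
            this.cityId = cityId;
            this.upTraffic = upTraffic;
            this.downTraffic = downTraffic;
            this.eventTime = eventTime;
        }
    }

    // Splits "userId|cityId|up|down|eventTime" into typed fields.
    public static Parsed parse(String line) {
        // "|" is a regex metacharacter, so it has to be escaped for split().
        String[] fields = line.split("\\|");
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields but got " + fields.length + ": " + line);
        }
        return new Parsed(
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Double.parseDouble(fields[2]),
                Double.parseDouble(fields[3]),
                Long.parseLong(fields[4]));
    }

    public static void main(String[] args) {
        Parsed p = parse("471|4|0.0745|0.3826|1609863101410");
        System.out.println("user " + p.accountId + " in city " + p.cityId + " at " + p.eventTime);
    }
}
```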

Implementing SourceFunction requires overriding the following two methods:

run: emits elements
cancel: stops the source
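
The way these two methods cooperate can be sketched without Flink: run loops while a flag is set, and cancel clears the flag from another thread. The snippet below is only an illustration of that pattern in plain Java. The class CancellableLoopSketch and the Consumer-based run signature are assumptions made for this sketch and are not Flink's API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class CancellableLoopSketch {

    // volatile so that the write made by the cancelling thread is visible to the loop.
    private volatile boolean isRunning = true;

    // Emits records to the consumer until cancel() is called.
    public void run(Consumer<String> out, long sleepMillis) {
        int n = 0;
        while (isRunning) {
            out.accept("record-" + n++);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop emitting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Asks the loop in run() to exit after its current iteration.
    public void cancel() {
        isRunning = false;
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableLoopSketch source = new CancellableLoopSketch();
        AtomicInteger emitted = new AtomicInteger();
        Thread worker = new Thread(() -> source.run(r -> emitted.incrementAndGet(), 10L));
        worker.start();
        Thread.sleep(100L);
        source.cancel();   // called from a different thread than run()
        worker.join();
        System.out.println("emitted " + emitted.get() + " records before cancel");
    }
}
```

Without volatile, the running thread would not be guaranteed to see the updated flag, which is why the implementation declares isRunning as volatile.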

Implementation:

package com.upupfeng.source;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Random;

/**
 * @author mawf
 */
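public class FakeTrafficRecordSource implements SourceFunction<String> {

    private static final Random random = new Random();

    // Cleared by cancel() to stop the loop in run(); volatile because cancel() runs on another thread
    private volatile boolean isRunning = true;

    // Number of milliseconds to sleep between two emitted records
    private long sleepMillis;

    public FakeTrafficRecordSource() {
        // Delegate to the other constructor; "new FakeTrafficRecordSource(500L)" would
        // only create a throwaway object and leave sleepMillis at 0
        this(500L);
    }

    public FakeTrafficRecordSource(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }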

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Emit one record, then pause, until cancel() is called
        while (isRunning) {
            ctx.collect(fakeTrafficRecordString());
            Thread.sleep(sleepMillis);
        }
    }

    // Generates one traffic record.
    // Fields: user id, city id, upstream traffic, downstream traffic, event time
    public String fakeTrafficRecordString() {
        int accountId = random.nextInt(500); // 0 to 499
        int cityId = random.nextInt(5);      // 0 to 4
        // Random traffic values, rounded to 4 decimal places
        double upTraffic = new BigDecimal(random.nextDouble()).setScale(4, RoundingMode.HALF_UP).doubleValue();
        double downTraffic = new BigDecimal(random.nextDouble()).setScale(4, RoundingMode.HALF_UP).doubleValue();
        long eventTime = System.currentTimeMillis();

        TrafficRecord trafficRecord = new TrafficRecord(accountId, cityId, upTraffic, downTraffic, eventTime);
        return trafficRecord.toString();
    }

    @Override
    public void cancel() {
        // Signal the loop in run() to exit
        isRunning = false;
    }

    // Static nested class: it does not need a reference to the enclosing source instance
    static class TrafficRecord {
        public int accountId;
        public int cityId;
        public double upTraffic;
        public double downTraffic;
        public long eventTime;

        public TrafficRecord() {
        }

        public TrafficRecord(int accountId, int cityId, double upTraffic, double downTraffic, long eventTime) {
            this.accountId = accountId;
            this.cityId = cityId;
            this.upTraffic = upTraffic;
            this.downTraffic = downTraffic;
            this.eventTime = eventTime;
        }

        // Renders the record as a "|"-separated line
        @Override
        public String toString() {
            return accountId + "|" + cityId + "|" + upTraffic + "|" + downTraffic + "|" + eventTime;
        }
    }
}
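
One detail of the formatting is worth a note. Because the rounded traffic values are stored as double and concatenated into the string, Java renders them with Double.toString, which drops trailing zeros (0.5 rather than 0.5000) and switches to scientific notation for values below 0.001 (1.0E-4 rather than 0.0001). If a fixed four-decimal rendering like the sample rows is wanted, one possible approach is to keep the BigDecimal and call toPlainString. The following is a sketch under that assumption; the helper name formatTraffic is hypothetical and not part of the original class.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TrafficFormatSketch {

    // Rounds to 4 decimal places and renders without scientific notation.
    public static String formatTraffic(double value) {
        return BigDecimal.valueOf(value)
                .setScale(4, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        double small = 0.0001;
        System.out.println(Double.toString(small)); // prints 1.0E-4
        System.out.println(formatTraffic(small));   // prints 0.0001
        System.out.println(formatTraffic(0.5));     // prints 0.5000
    }
}
```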

Testing the Source

The custom Source can be used by passing it to env.addSource.

package com.upupfeng.source;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @author mawf
 */
public class Test01 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The no-argument constructor emits one record every 500 ms
        FakeTrafficRecordSource fakeTrafficRecordSource = new FakeTrafficRecordSource();
        env.addSource(fakeTrafficRecordSource)
                .print();
        env.execute();
    }
}
