Flink: WordCount on a bounded data stream

Data source (contents of input/words.txt):

hello world
hello flink
hello scala

Bounded-stream version:

package chapter02

import org.apache.flink.streaming.api.scala._

/**
 * ClassName: BoundedStreamWordCount
 * Package: chapter02
 * Description: Word count over a bounded stream read from a text file.
 *
 * @Author 小易日拱一卒
 * @Create 2025-06-27 2:37
 * @Version 1.0
 */
object BoundedStreamWordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create a streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. Read the text file (a bounded source)
    val lineDataStream = env.readTextFile("input/words.txt")
    // 3. Transform: split each line into words and pair each word with a count of 1
    val wordAndOne = lineDataStream.flatMap(_.split(" ")).map(word => (word, 1))
    // 4. Group by word
    val wordAndOneGroup = wordAndOne.keyBy(_._1) // _._1 selects the first tuple element (the word) as the key
    // 5. Aggregate within each group
    val sum = wordAndOneGroup.sum(1) // sum over tuple index 1 (the running count)
    // 6. Print the result
    sum.print()
    // 7. Submit and execute the job
    env.execute()
  }
  }

}
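
To compile and run this locally, the project needs the Flink Scala streaming API and the Flink client on the classpath. Below is a minimal build.sbt sketch; the Flink version (1.13.6) and Scala version (2.12) are assumptions and should be adjusted to match your environment.

// build.sbt (sketch; versions are assumptions)
ThisBuild / scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.13.6",
  "org.apache.flink" %% "flink-clients"         % "1.13.6"
)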

Result:

5> (hello,1)
13> (flink,1)
1> (scala,1)
9> (world,1)
5> (hello,2)
5> (hello,3)

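The number before each ">" is the index of the parallel subtask that emitted the record. Because keyBy routes all records with the same key to the same subtask, every "hello" is counted on subtask 5, and the running count grows 1, 2, 3; the interleaving of different words can vary between runs. If a single deterministic output stream is preferred, the parallelism can be forced to 1. A minimal sketch, inserted right after step 1 of the program above:

import org.apache.flink.streaming.api.scala._

// Sketch: run every operator with a single parallel subtask
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1) // with one subtask, print() typically omits the "N>" prefix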