【云星数据 --- Apache Flink in Practice Series (Premium Edition)】: Flink Streaming API Explained with Programming Practice 001 - Flink Stream-based WordCount Example 001

I. WordCount based on local strings

1. Program code

package code.book.stream.streamwc

import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}

object WordCountLocalStrings {

  def main(args: Array[String]): Unit = {
    //1. Create the stream execution environment
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    //2. Prepare the data
    val text = senv.fromElements(
      "To be, or not to be, that is the question",
      "Whether 'tis nobler in the mind to suffer",
      "The slings and arrows of outrageous fortune",
      "Or to take arms against a sea of troubles,")

    //3. Perform the computation
    val counts = text.flatMap(_.toLowerCase.split("\\W+")).map((_, 1)).keyBy(0).sum(1)

    //4. Print the results
    counts.print()

    //5. Trigger the streaming job
    senv.execute("Flink Streaming Wordcount")
  }
}
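
Note: keyBy(0) and sum(1) refer to tuple fields by position, which works on the Flink 1.x Scala DataStream API but is deprecated in later releases in favour of key selector functions. As a minimal sketch of the same job keyed on the word itself (assuming a Flink 1.x dependency; the object name WordCountKeySelector is made up for illustration):

package code.book.stream.streamwc

import org.apache.flink.streaming.api.scala._

object WordCountKeySelector {

  def main(args: Array[String]): Unit = {
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    val text = senv.fromElements(
      "To be, or not to be, that is the question",
      "Whether 'tis nobler in the mind to suffer")

    //Key by the word itself instead of the positional index 0
    val counts = text
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(_._1)
      .sum(1)

    counts.print()
    senv.execute("Flink Streaming Wordcount (key selector)")
  }
}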

2. Execution result

(Screenshot: console output of WordCountLocalStrings)
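
When the print() sink runs with parallelism greater than one (the default in a local environment with multiple cores), each record is typically prefixed with the index of the emitting subtask, so the console shows lines of the form 3> (question,1), one per incoming (word, count) update.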

II. WordCount based on an HDFS file

1. Prepare the data

1.1 Upload the data

hadoop fs -put $FLINK_HOME/README.txt /input/flink/README.txt


1.2 View the data

hadoop fs -text /input/flink/README.txt


2. Process the data

2.1 Program code

package code.book.stream.streamwc

import org.apache.flink.streaming.api.scala._

object WordCountHdfsFile {

  def main(args: Array[String]): Unit = {
    //1. Create the stream execution environment
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    //2. Prepare the data: read the text file from HDFS
    val text = senv.readTextFile("hdfs:///input/flink/README.txt")

    //3. Perform the computation
    val counts = text.flatMap(_.toLowerCase.split("\\W+")).map((_, 1)).keyBy(0).sum(1)

    //4. Print the results
    counts.print()

    //5. Trigger the streaming job
    senv.execute("Flink Streaming Wordcount")
  }
}
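
Printing to the console is handy for a quick check, but the counts can also be written back to HDFS. The sketch below is a variation of the program above; the output path /output/flink/wordcount and the object name are made up for illustration, and writeAsText is available on the Flink 1.x DataStream API (later releases deprecate it in favour of file sinks):

package code.book.stream.streamwc

import org.apache.flink.streaming.api.scala._

object WordCountHdfsFileToHdfs {

  def main(args: Array[String]): Unit = {
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    //Read the same README.txt that was uploaded earlier
    val text = senv.readTextFile("hdfs:///input/flink/README.txt")

    val counts = text.flatMap(_.toLowerCase.split("\\W+")).map((_, 1)).keyBy(0).sum(1)

    //Write the (word, count) tuples back to HDFS; the output path is illustrative only
    counts.writeAsText("hdfs:///output/flink/wordcount").setParallelism(1)

    senv.execute("Flink Streaming Wordcount to HDFS")
  }
}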

2.2 Execution result

(Screenshot: console output of WordCountHdfsFile)
