An error in the Spark 1.2.1 official documentation regarding writeAheadLog

While reading the Spark 1.2.1 source code, I found an error in the official documentation.

  • [Experimental in Spark 1.2] Configuring write ahead logs - In Spark 1.2, we have introduced a new experimental feature of write ahead logs for achieving strong fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into a write ahead log in the configuration checkpoint directory. This prevents data loss on driver recovery, thus ensuring zero data loss (discussed in detail in the Fault-tolerance Semantics section). This can be enabled by setting the configuration parameter spark.streaming.receiver.writeAheadLogs.enable to true. However, these stronger semantics may come at the cost of the receiving throughput of individual receivers. This can be corrected by running more receivers in parallel to increase aggregate throughput. Additionally, it is recommended that the replication of the received data within Spark be disabled when the write ahead log is enabled as the log is already stored in a replicated storage system. This can be done by setting the storage level for the input stream to StorageLevel.MEMORY_AND_DISK_SER.
The documentation says the write ahead log is enabled by setting spark.streaming.receiver.writeAheadLogs.enable to true. However, the createStream method in org.apache.spark.streaming.kafka.KafkaUtils actually reads the property spark.streaming.receiver.writeAheadLog.enable; the two names differ by a single 's'.
  def createStream[K: ClassTag, V: ClassTag, U <: Decoder[_]: ClassTag, T <: Decoder[_]: ClassTag](
      ssc: StreamingContext,
      kafkaParams: Map[String, String],
      topics: Map[String, Int],
      storageLevel: StorageLevel
    ): ReceiverInputDStream[(K, V)] = {
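    // note: the property read here is "writeAheadLog" (singular), not the
    // "writeAheadLogs" (plural) given in the 1.2.1 documentation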
    val walEnabled = ssc.conf.getBoolean("spark.streaming.receiver.writeAheadLog.enable", false)
    new KafkaInputDStream[K, V, U, T](ssc, kafkaParams, topics, walEnabled, storageLevel)
  }
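
For reference, here is a minimal sketch of enabling the WAL on 1.2.1 using the property name the source actually checks. It also follows the documentation's advice to drop in-Spark replication (MEMORY_AND_DISK_SER rather than MEMORY_AND_DISK_SER_2) since the log already lives in replicated storage. The ZooKeeper address, consumer group, topic name, and checkpoint path below are placeholders, not values from the original post.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object WalEnabledStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("wal-demo")
      // the singular form is what KafkaUtils.createStream reads in 1.2.1;
      // the plural form from the docs would be silently ignored
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // the WAL is written under the checkpoint directory, so one must be set
    // (placeholder path)
    ssc.checkpoint("hdfs:///tmp/wal-demo-checkpoint")

    // with the WAL enabled, use a non-replicated storage level as the docs
    // recommend (placeholder ZooKeeper quorum, group, and topic)
    val stream = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",
      "wal-demo-group",
      Map("my-topic" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER)

    stream.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}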

The official site corrected this error in the 1.3.0 documentation, but the 1.2.1 documentation apparently never received the fix.
