Saving data in Spark with SaveMode

Save Modes

Save operations can optionally take a SaveMode, which specifies how to handle existing data if present. These save modes do not use any locking and are not atomic. Additionally, when performing an Overwrite, the existing data is deleted before the new data is written out.
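The delete-then-write order described above has a practical consequence: if a job fails partway through an Overwrite, the old data is already gone and the new data is incomplete. The following is a toy, in-memory Scala model of that behavior, not Spark code. `ToyStore`, `overwrite`, and the `failAfter` parameter are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical in-memory "data source": path -> rows. Not a Spark API.
object ToyStore {
  val files: mutable.Map[String, Vector[String]] = mutable.Map.empty

  // Mimics the documented Overwrite order: delete existing data first,
  // then write the new rows one at a time. There is no lock and no rollback.
  def overwrite(path: String, rows: Seq[String], failAfter: Option[Int] = None): Unit = {
    files.remove(path) // step 1: the old data is deleted
    rows.zipWithIndex.foreach { case (row, i) =>
      // Simulated task failure before writing row i
      if (failAfter.contains(i)) throw new RuntimeException(s"task failed at row $i")
      files(path) = files.getOrElse(path, Vector.empty[String]) :+ row // step 2: write
    }
  }
}

// A successful overwrite replaces the old rows.
ToyStore.files("out") = Vector("old-1", "old-2")
ToyStore.overwrite("out", Seq("new-1", "new-2"))
println(ToyStore.files("out")) // Vector(new-1, new-2)

// A failure mid-write leaves neither the old data nor the complete new data.
ToyStore.files("out") = Vector("old-1", "old-2")
val attempt = Try(ToyStore.overwrite("out", Seq("new-1", "new-2", "new-3"), failAfter = Some(2)))
println(attempt.isFailure)     // true
println(ToyStore.files("out")) // Vector(new-1, new-2)
```

Real data sources write files in parallel rather than row by row. The model only shows why "not atomic" matters when combined with delete-first overwriting.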


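Each mode below is listed as its Scala/Java constant, then the equivalent string accepted from any language, followed by its meaning.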
SaveMode.ErrorIfExists (default) / "error" (default)

When saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown.

SaveMode.Append / "append"

When saving a DataFrame to a data source, if data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.

SaveMode.Overwrite / "overwrite"

When saving a DataFrame to a data source, if data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.

SaveMode.Ignore / "ignore"

When saving a DataFrame to a data source, if data already exists, the save operation is expected to neither save the contents of the DataFrame nor change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL.
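The four modes can be summarized as a small decision function. The following is a toy, in-memory Scala sketch of the documented behavior, not Spark's implementation. `Mode`, `parseMode`, `save`, and the map-backed `store` are hypothetical names. The sketch assumes that every mode simply writes when nothing exists at the path yet. It also assumes the string names may be matched case-insensitively.

```scala
import scala.collection.mutable

// Toy representation of the four modes (hypothetical, mirrors the list above).
sealed trait Mode
case object ErrorIfExists extends Mode
case object Append extends Mode
case object Overwrite extends Mode
case object Ignore extends Mode

// Maps the "any language" strings from the list above to a mode.
def parseMode(s: String): Mode = s.toLowerCase match {
  case "error"     => ErrorIfExists
  case "append"    => Append
  case "overwrite" => Overwrite
  case "ignore"    => Ignore
  case other       => throw new IllegalArgumentException(s"Unknown save mode: $other")
}

// Hypothetical in-memory store: path -> rows.
val store = mutable.Map.empty[String, Vector[String]]

def save(path: String, rows: Vector[String], mode: Mode): Unit =
  store.get(path) match {
    case None => store(path) = rows // nothing there yet: every mode just writes
    case Some(existing) =>
      mode match {
        case ErrorIfExists => throw new IllegalStateException(s"path $path already exists")
        case Append        => store(path) = existing ++ rows
        case Overwrite     => store(path) = rows
        case Ignore        => () // leave the existing data untouched
      }
  }

save("t", Vector("a"), parseMode("error")) // first write succeeds
save("t", Vector("b"), parseMode("append"))
println(store("t")) // Vector(a, b)
save("t", Vector("c"), parseMode("ignore"))
println(store("t")) // Vector(a, b)
save("t", Vector("d"), parseMode("overwrite"))
println(store("t")) // Vector(d)
```

A second save("t", ..., ErrorIfExists) at this point would throw, which corresponds to the default mode's behavior.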

Example: write accessDF as Parquet, partitioned by day, overwriting any existing output:
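import org.apache.spark.sql.SaveMode
accessDF.coalesce(1).write.format("parquet")   // one partition (one file per day directory), Parquet format
  .partitionBy("day")                          // one sub-directory per value of "day"
  .mode(SaveMode.Overwrite)                    // overwrite data already at the target path
  .save("src/data/clean")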
