Sorting in Spark with sortBy

1. Tuple types

val products = sc.parallelize(List("屏保 20 10","支架 20 1000","酒精棉 5 2000","吸氧机 5000 1000"))

    val productData = products.map(x=>{
     
      val splits = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      (name,price,amount)
    })

    /**
      * Sort by price, descending
      */
    productData.sortBy(_._2,false).collect().foreach(println)

    /**
      * To sort on several fields, return a tuple from the function passed to sortBy.
      * Here: price descending, then stock (amount) descending
      */
    productData.sortBy(x=>(-x._2,-x._3)).collect().foreach(println)
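
Negating each field works for numeric keys but not for a String key. A minimal sketch of an alternative, run on a plain Scala List rather than an RDD (both resolve an Ordering for the key type): build the tuple Ordering explicitly with Ordering.Tuple2 and reverse only the components that should be descending. The object name MultiKeySort and the ASCII product names are made up for illustration.

```scala
object MultiKeySort {
  // Hypothetical sample data: (name, price, amount)
  val items: List[(String, Double, Int)] = List(
    ("screen", 20.0, 10),
    ("stand", 20.0, 1000),
    ("swab", 5.0, 2000),
    ("oxygen", 5000.0, 1000)
  )

  // Key is (amount, name): amount descending, then name ascending.
  // Each component gets its own Ordering, so no negation is needed.
  val byAmountDescThenName: Ordering[(Int, String)] =
    Ordering.Tuple2(Ordering[Int].reverse, Ordering[String])

  val sorted: List[(String, Double, Int)] =
    items.sortBy(t => (t._3, t._1))(byAmountDescThenName)

  def main(args: Array[String]): Unit = sorted.foreach(println)
}
```

RDD.sortBy also takes its Ordering as an implicit parameter, so the same Ordering value could presumably be brought into scope as an implicit before calling it on productData.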

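2. Entity class (bean)
If the entity class does not extend Ordered and Serializable, two errors occur: a compile-time error because no implicit Ordering is found, and (below the separator) a runtime error because the task result cannot be serialized.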

Error:(20, 23) No implicit Ordering defined for com.bigdata.sort.Products.
    productData.sortBy(x=>x).collect().foreach(println)
Error:(20, 23) not enough arguments for method sortBy: (implicit ord: Ordering[com.bigdata.sort.Products], implicit ctag: scala.reflect.ClassTag[com.bigdata.sort.Products])org.apache.spark.rdd.RDD[com.bigdata.sort.Products].
Unspecified value parameters ord, ctag.
    productData.sortBy(x=>x).collect().foreach(println)
----------------------------------------------------------
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0 in stage 0.0 (TID 1) had a not serializable result: com.bigdata.sort.Products
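
The runtime failure can be reproduced without a cluster. The sketch below assumes Spark's default Java serializer and simply pushes an object through java.io.ObjectOutputStream; the classes PlainProduct and SerializableProduct and the helper canSerialize are hypothetical names, not part of the original code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical classes: identical except for the Serializable marker.
  final class PlainProduct(val name: String, val amount: Int)
  final class SerializableProduct(val name: String, val amount: Int) extends Serializable

  // Returns true if Java serialization accepts the object,
  // false if it throws NotSerializableException.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  val plainOk: Boolean = canSerialize(new PlainProduct("stand", 1000))
  val serializableOk: Boolean = canSerialize(new SerializableProduct("stand", 1000))

  def main(args: Array[String]): Unit =
    println(s"plain: $plainOk, serializable: $serializableOk")
}
```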

The entity class is defined as follows:

class Products(val name:String,val price:Double,val amount:Int) extends Ordered[Products] with Serializable {
     
  override def toString: String = name + "," + price + "," + amount
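   /**
    * Sort by stock (amount), descending.
    * @param that the product to compare against
    * @return a negative, zero, or positive value, as for any comparator
    */
  override def compare(that: Products): Int = {
    // Integer.compare avoids the overflow that that.amount - this.amount can hit
    Integer.compare(that.amount, this.amount)
  }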
}
val products = sc.parallelize(List("屏保 20 10","支架 20 1000","酒精棉 5 2000","吸氧机 5000 1000"))

val productData = products.map(x=>{
     
      val splits = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      new Products(name,price,amount)
    })

productData.sortBy(x=>x).collect().foreach(println)
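
A comparator written as a subtraction of two Int fields is compact, but it can overflow when the values are far apart. Stock counts will rarely be that extreme, so this is a sketch of the edge case rather than a bug report; the names CompareOverflow, bySubtraction, and byCompare are illustrative only.

```scala
object CompareOverflow {
  // Comparator written as a subtraction
  def bySubtraction(a: Int, b: Int): Int = a - b

  // Overflow-safe comparator from the Java standard library
  def byCompare(a: Int, b: Int): Int = Integer.compare(a, b)

  val small: Int = Int.MinValue
  val large: Int = 1

  // Int.MinValue - 1 wraps around to Int.MaxValue, so the sign is wrong:
  // the result claims that small > large.
  val wrong: Int = bySubtraction(small, large)

  // Integer.compare returns a negative value, as expected for small < large.
  val right: Int = byCompare(small, large)

  def main(args: Array[String]): Unit =
    println(s"subtraction: $wrong, Integer.compare: $right")
}
```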

3. case class

case class ProductsCaseClass(name:String,price:Double,amount:Int) extends Ordered[ProductsCaseClass] {
     
  override def toString: String ="case class" +  name + "," + price + "," + amount
  override def compare(that: ProductsCaseClass): Int = {
    // Stock (amount) descending; Integer.compare avoids subtraction overflow
    Integer.compare(that.amount, this.amount)
  }
}

val products = sc.parallelize(List("屏保 20 10","支架 20 1000","酒精棉 5 2000","吸氧机 5000 1000"))
val productData = products.map(x=>{
     
      val splits = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      ProductsCaseClass(name,price,amount)
    })

productData.sortBy(x=>x).collect().foreach(println)

Implicit conversions

case class ProductsInfo(name:String,price:Double,amount:Int)

val products = sc.parallelize(List("支架 20 10","屏保 20 1000","酒精棉 5 2000","吸氧机 5000 1000"))

    val productData = products.map(x=>{
     
      val splits = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      new ProductsInfo(name,price,amount)
    })

    /**
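      * Implicit object: ascending by amount
      */
implicit object productsInfo2OrderingObj extends Ordering[ProductsInfo] {
      override def compare(x: ProductsInfo, y: ProductsInfo): Int = {
        // Integer.compare is overflow-safe, unlike x.amount - y.amount
        Integer.compare(x.amount, y.amount)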
      }
    }

  /**
    * Implicit val: descending by amount
    */
implicit val productsInfo2Ordering:Ordering[ProductsInfo] = new Ordering[ProductsInfo]{
      override def compare(x: ProductsInfo, y: ProductsInfo): Int = {
        Integer.compare(y.amount, x.amount)
      }
 }

  /**
    * Implicit method: descending by amount
    */
implicit def productInfo2Ordered(productsInfo:ProductsInfo):Ordered[ProductsInfo] = new Ordered[ProductsInfo] {
      override def compare(that: ProductsInfo): Int = {
        Integer.compare(that.amount, productsInfo.amount)
      }
}


productData.sortBy(x=>x).collect().foreach(println)

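When all three are in scope, the precedence is: implicit object > implicit val > implicit method. In Scala 2 the object wins over the val because its type is more specific, and the implicit method only provides a conversion to Ordered, which the compiler falls back to when no Ordering is directly in scope.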
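
To avoid depending on implicit precedence at all, an Ordering can be kept as an ordinary named value and passed explicitly. A minimal sketch on a plain Scala List, where Item is a hypothetical stand-in for ProductsInfo and the names ascByAmount and descByAmount are made up:

```scala
object ExplicitOrdering {
  // Hypothetical stand-in for the post's ProductsInfo
  case class Item(name: String, amount: Int)

  // Named, non-implicit orderings: nothing is resolved behind the scenes.
  val ascByAmount: Ordering[Item] = Ordering.by((i: Item) => i.amount)
  val descByAmount: Ordering[Item] = ascByAmount.reverse

  val items: List[Item] =
    List(Item("stand", 10), Item("screen", 1000), Item("swab", 2000))

  // The ordering is handed over in the second parameter list.
  val ascending: List[String] = items.sorted(ascByAmount).map(_.name)
  val descending: List[String] = items.sorted(descByAmount).map(_.name)

  def main(args: Array[String]): Unit = {
    println(ascending)
    println(descending)
  }
}
```

On an RDD the second parameter list of sortBy takes both an Ordering and a ClassTag, so passing the Ordering explicitly there would mean supplying the ClassTag as well.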

4. Implicit Ordering for tuples (important)

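// Price descending, then stock (amount) descending; the explicit type annotation keeps implicit resolution predictable
implicit val ord: Ordering[(String, Double, Int)] = Ordering[(Double,Int)].on[(String,Double,Int)](x=>(-x._2,-x._3))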

val products = sc.parallelize(List("支架 20 10","屏保 20 1000","酒精棉 5 2000","吸氧机 5000 1000"))

val productData = products.map(x=>{
     
      val splits = x.split(" ")
      val name = splits(0)
      val price = splits(1).toDouble
      val amount = splits(2).toInt
      (name,price,amount)
    })
productData.sortBy(x=>x).collect.foreach(println)
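
The same idea can be tried on a local collection. The sketch below uses Ordering.by, which builds the same kind of derived Ordering as the .on call, and lets List.sorted pick it up implicitly. The object name TupleImplicitOrdering, the ASCII names, and the Int prices are assumptions made for the example.

```scala
object TupleImplicitOrdering {
  // An implicit Ordering for the whole tuple, derived from a (price, amount) key.
  // Both key components are negated, so both sort descending.
  implicit val byPriceDescThenAmountDesc: Ordering[(String, Int, Int)] =
    Ordering.by((t: (String, Int, Int)) => (-t._2, -t._3))

  // Hypothetical sample data: (name, price, amount)
  val items: List[(String, Int, Int)] = List(
    ("stand", 20, 10),
    ("screen", 20, 1000),
    ("swab", 5, 2000),
    ("oxygen", 5000, 1000)
  )

  // sorted finds the implicit above instead of the default tuple Ordering.
  val sorted: List[(String, Int, Int)] = items.sorted

  def main(args: Array[String]): Unit = sorted.foreach(println)
}
```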
