Spark SQL (5): Operating on MySQL Tables with Spark SQL

Goals:

1. Connect to MySQL over JDBC and load a MySQL table as a DataFrame

2. Run DSL and SQL queries against the DataFrame

3. Join two tables

4. Save a DataFrame back to MySQL as a table

 

Spark ships with a reference example at:

/examples/src/.../sql/SQLDataSourceExample.scala

 

JAR dependency:

The JDBC driver is mysql-connector-java-5.1.47.jar.

You can either drop it into Spark's jars directory (the approach used here) or pass it to spark-submit explicitly.
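If you go the spark-submit route instead of copying the driver into the jars directory, the invocation looks roughly like this. Note this is a sketch: the driver path, application jar name, and main class are placeholders, not taken from the original post.

```shell
# Attach the MySQL JDBC driver at submit time instead of
# copying it into $SPARK_HOME/jars (paths are illustrative)
spark-submit \
  --class mysql \
  --master local \
  --jars /path/to/mysql-connector-java-5.1.47.jar \
  my-spark-app.jar
```

The --jars flag distributes the driver to the driver and executor classpaths for this one submission.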

 

Preparing the data in MySQL:

[root@hadoop01 ~]# mysql -u root -p

mysql> create database MyDB;

mysql> use MyDB;

First table, StudentInfo:

mysql> create table StudentInfo(ID int(20),Name varchar(20),Gender char(1),birthday date);

mysql> insert into StudentInfo values(1,"A","F","1996-09-12"),(2,"b","M","1995-12-23"),(3,"C","M","1996-10-29"),(4,"D","M","1995-02-25"),(5,"E","F","1997-06-06");

mysql> select * from StudentInfo;

+------+------+--------+------------+
| ID   | Name | Gender | birthday   |
+------+------+--------+------------+
|    1 | A    | F      | 1996-09-12 |
|    2 | b    | M      | 1995-12-23 |
|    3 | C    | M      | 1996-10-29 |
|    4 | D    | M      | 1995-02-25 |
|    5 | E    | F      | 1997-06-06 |
+------+------+--------+------------+
5 rows in set (0.00 sec)

Second table, Score:

mysql> create table Score(ID int(20),Name varchar(20),score float(10));

mysql> insert into Score values(1,"A",91),(2,"B",87),(5,"E",88),(9,"H",89),(10,"P",97);

mysql> select * from Score;

+------+------+-------+
| ID   | Name | score |
+------+------+-------+
|    1 | A    |    91 |
|    2 | B    |    87 |
|    5 | E    |    88 |
|    9 | H    |    89 |
|   10 | P    |    97 |
+------+------+-------+
5 rows in set (0.03 sec)


 

 

Working with the MySQL table:

Steps:

1. Create a SparkSession

2. Connect over JDBC and load the table as a DataFrame

3. Query it in DSL style

 

import org.apache.spark.sql.SparkSession

object mysql {
  def main(args: Array[String]): Unit = {

    // Since Spark 2.0, SparkSession replaces SQLContext and HiveContext
    val spark = SparkSession.builder()
      .appName("mysql_test")
      .master("local")
      .getOrCreate()

    // implicit conversions; needed for toDF, the $"col" syntax, etc.
    import spark.implicits._

    // JDBC connection, with the options described above
    val mysqlDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/MyDB") // JDBC connection URL
      .option("driver", "com.mysql.jdbc.Driver")         // driver class
      .option("dbtable", "StudentInfo")                  // student info table
      .option("user", "root")                            // username
      .option("password", "root")                        // password
      .load()

    // quick check
    mysqlDF.show()

    // something more involved
    mysqlDF.select("Name", "Gender", "ID")
      .where("Gender='M'")
      .filter("ID>1")
      .sort($"ID".desc)
      .limit(3)
      .show()
  }
}

 

Results:

mysqlDF.show()

(screenshot in the original post: the full StudentInfo table)

mysqlDF.select("Name","Gender","ID").where("Gender='M'").filter("ID>1").sort($"ID".desc).limit(3).show()

(screenshot in the original post: the male students with ID > 1, sorted by ID descending, at most 3 rows)

 

SQL statements:

To run SQL, first register the DataFrame as a temporary view:

scoreDF.createOrReplaceTempView("scoreTable")

spark.sql("select * from scoreTable").show
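Once the view is registered, any SQL you could run against a table works against it. The sketch below is illustrative only: it assumes the scoreDF DataFrame loaded earlier and a live SparkSession, so it will not run outside that setup, and the queries themselves are not from the original post.

```scala
// SQL against the temp view works like SQL against any table.
scoreDF.createOrReplaceTempView("scoreTable")

// an aggregate: the average score across the table
spark.sql("SELECT avg(score) AS avg_score FROM scoreTable").show()

// a filtered, ordered query: scores above 88, highest first
spark.sql("SELECT Name, score FROM scoreTable WHERE score > 88 ORDER BY score DESC").show()
```

Note the view is session-scoped: it disappears when the SparkSession stops, and it never touches the underlying MySQL table.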

 

 

 

Join queries:

On inner vs. outer joins, see: https://blog.csdn.net/coding_hello/article/details/75452436

This is straightforward: load a second table exactly as above, then join the two DataFrames.

The name mysqlDF from the earlier code is no longer descriptive, so it is renamed studentDF.

Command: DF1.join(DF2, "colName").show()

Since studentDF and scoreDF share two columns, pass the join keys as a Seq:

studentDF.join(scoreDF,Seq("ID","Name")).select("*").show

 

Full code and result:

import org.apache.spark.sql.SparkSession

object mysql {
  def main(args: Array[String]): Unit = {

    // Since Spark 2.0, SparkSession replaces SQLContext and HiveContext
    val spark = SparkSession.builder()
      .appName("mysql_test")
      .master("local")
      .getOrCreate()

    // implicit conversions; needed for toDF, the $"col" syntax, etc.
    import spark.implicits._

    // JDBC connection for the student table
    val studentDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/MyDB") // JDBC connection URL
      .option("driver", "com.mysql.jdbc.Driver")         // driver class
      .option("dbtable", "StudentInfo")                  // student info table
      .option("user", "root")                            // username
      .option("password", "root")                        // password
      .load()

    // JDBC connection for the score table
    val scoreDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/MyDB") // JDBC connection URL
      .option("driver", "com.mysql.jdbc.Driver")         // driver class
      .option("dbtable", "Score")                        // score table
      .option("user", "root")                            // username
      .option("password", "root")                        // password
      .load()

    // join on the shared columns
    studentDF.join(scoreDF, Seq("ID", "Name")).select("Name", "score").show
  }
}

(screenshot in the original post: the join result, Name and score for the matching rows)

 

 

 

Saving a DataFrame as a table in MySQL:

import java.util.Properties

// a Properties object carries the connection credentials
val prop = new Properties()
prop.put("user", "root")
prop.put("password", "root")

// write out as table score_1
scoreDF.write.jdbc("jdbc:mysql://127.0.0.1:3306/MyDB", "score_1", prop)
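By default, write.jdbc fails with an error if the target table already exists. A save mode can be set explicitly; the sketch below reuses the prop object and connection URL from above and, like the rest of the code, needs a live Spark session and MySQL instance to actually run.

```scala
import org.apache.spark.sql.SaveMode

// SaveMode.Overwrite drops and recreates score_1 if it exists;
// SaveMode.Append would instead add rows to the existing table.
scoreDF.write
  .mode(SaveMode.Overwrite)
  .jdbc("jdbc:mysql://127.0.0.1:3306/MyDB", "score_1", prop)
```

ErrorIfExists (the default) and Ignore are the other two modes; pick Append for incremental loads and Overwrite for full refreshes.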

 

Check in MySQL:

mysql> select * from score_1;

+------+------+-------+
| ID   | Name | score |
+------+------+-------+
|    1 | A    |    91 |
|    2 | B    |    87 |
|    5 | E    |    88 |
|    9 | H    |    89 |
|   10 | P    |    97 |
+------+------+-------+
5 rows in set (0.00 sec)

 

 

The write side can also use the same option-based style as the load (this snippet, from the Spark documentation, targets PostgreSQL):

jdbcDF.write
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .save()

 

 

 

 

 

 

 

 

 

 
