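Add the Spark GraphX dependency to the project's pom file (the _2.10 suffix is the Scala binary version, so it must match the project's Scala version):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-graphx_2.10</artifactId>
    <version>1.6.0</version>
</dependency>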
2. Set up the runtime environment
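import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
// Set up the runtime environment
val conf = new SparkConf().setAppName("Simple GraphX").setMaster("spark://master:7077").setJars(Seq("E:\\Intellij\\Projects\\SimpleGraphX\\SimpleGraphX.jar"))
val sc = new SparkContext(conf)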
3. Constructing the graph
First define the vertices and the edges:
// Vertices: (VertexId, (name, age))
val vertexArray = Array(
(1L,("Alice", 38)),
(2L,("Henry", 27)),
(3L,("Charlie", 55)),
(4L,("Peter", 32)),
(5L,("Mike", 35)),
(6L,("Kate", 23))
)
// Edges: Edge(srcId, dstId, attr)
val edgeArray = Array(
Edge(2L, 1L, 5),
Edge(2L, 4L, 2),
Edge(3L, 2L, 7),
Edge(3L, 6L, 3),
Edge(4L, 1L, 1),
Edge(5L, 2L, 3),
Edge(5L, 3L, 8),
Edge(5L, 6L, 8)
)
Next, create an RDD from each of them:
// Build vertexRDD and edgeRDD
val vertexRDD:RDD[(Long,(String,Int))] = sc.parallelize(vertexArray)
val edgeRDD:RDD[Edge[Int]] = sc.parallelize(edgeArray)
Finally, build the graph from the two RDDs:
// Build the graph
val graph:Graph[(String,Int),Int] = Graph(vertexRDD, edgeRDD)
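For readers who want to see what Graph(vertexRDD, edgeRDD) actually holds, the following plain-Scala sketch models the same property graph on local collections, with no Spark dependency: a map from vertex ID to attribute, a list of edges, and triplets formed by joining each edge with the attributes of both endpoints. The object name PropertyGraphModel and the case classes Edge and Triplet are hypothetical stand-ins, not the GraphX classes; GraphX stores the same information in distributed RDDs.

```scala
// Plain-Scala model of the example property graph (no Spark involved).
object PropertyGraphModel {
  type VertexId = Long

  // Hypothetical stand-ins for the GraphX Edge and EdgeTriplet classes.
  final case class Edge(srcId: VertexId, dstId: VertexId, attr: Int)
  final case class Triplet(srcAttr: (String, Int), dstAttr: (String, Int), attr: Int)

  // Vertex ID -> (name, age), the same data as vertexArray.
  val vertices: Map[VertexId, (String, Int)] = Map(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  // Directed, weighted edges, the same data as edgeArray.
  val edges: Seq[Edge] = Seq(
    Edge(2L, 1L, 5), Edge(2L, 4L, 2), Edge(3L, 2L, 7), Edge(3L, 6L, 3),
    Edge(4L, 1L, 1), Edge(5L, 2L, 3), Edge(5L, 3L, 8), Edge(5L, 6L, 8))

  // A triplet joins each edge with the attributes of its source and destination.
  val triplets: Seq[Triplet] =
    edges.map(e => Triplet(vertices(e.srcId), vertices(e.dstId), e.attr))

  def main(args: Array[String]): Unit =
    triplets.foreach(t => println(s"${t.srcAttr._1} likes ${t.dstAttr._1}"))
}
```

Running main prints the same eight "likes" lines as the triplet listing in the output of section 4.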
4. Property operations
// Property operations
println("*************************************************************")
println("Property demo")
println("*************************************************************")
// Method 1: pattern matching on (id, (name, age))
println("Vertices with age > 30, method 1:")
graph.vertices.filter{case(id,(name,age)) => age>30}.collect.foreach {
case(id,(name,age)) => println(s"$name is $age")
}
// Method 2: tuple accessors (v._2 is the (name, age) attribute)
println("Vertices with age > 30, method 2:")
graph.vertices.filter(v => v._2._2>30).collect.foreach {
v => println(s"${v._2._1} is ${v._2._2}")
}
// Edge operations
println("Edges with attribute > 5:")
graph.edges.filter(e => e.attr>5).collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
println
// Triplet operations: each triplet carries the edge plus both endpoint attributes
println("All triplets:")
for(triplet <- graph.triplets.collect){
println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
}
println
println("Triplets with edge attribute > 5:")
for(triplet <- graph.triplets.filter(t => t.attr > 5).collect){
println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
}
println
// Degree operations
println("Maximum out-degree, in-degree and degree:")
def max(a:(VertexId,Int), b:(VertexId,Int)):(VertexId,Int) = {
if (a._2>b._2) a else b
}
println("Max of OutDegrees:" + graph.outDegrees.reduce(max))
println("Max of InDegrees:" + graph.inDegrees.reduce(max))
println("Max of Degrees:" + graph.degrees.reduce(max))
println
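The filters above can be reproduced on local collections to check which elements should pass. This sketch, with the hypothetical object name FilterModel and no Spark dependency, applies the same predicates (age > 30, edge attribute > 5) and builds the "likes" strings the way the triplet loop does, by looking up both endpoint names of each kept edge.

```scala
// Local re-run of the property filters from section 4 (no Spark involved).
object FilterModel {
  final case class Edge(srcId: Long, dstId: Long, attr: Int)

  val vertices: Map[Long, (String, Int)] = Map(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  val edges: Seq[Edge] = Seq(
    Edge(2L, 1L, 5), Edge(2L, 4L, 2), Edge(3L, 2L, 7), Edge(3L, 6L, 3),
    Edge(4L, 1L, 1), Edge(5L, 2L, 3), Edge(5L, 3L, 8), Edge(5L, 6L, 8))

  // Like graph.vertices.filter { case (id, (name, age)) => age > 30 }
  val olderThan30: Map[Long, (String, Int)] =
    vertices.filter { case (_, (_, age)) => age > 30 }

  // Like graph.edges.filter(e => e.attr > 5)
  val heavyEdges: Seq[Edge] = edges.filter(_.attr > 5)

  // Like filtering graph.triplets on attr > 5: keep the edge, then look up both names.
  val heavyLikes: Seq[String] =
    heavyEdges.map(e => s"${vertices(e.srcId)._1} likes ${vertices(e.dstId)._1}")

  def main(args: Array[String]): Unit = {
    olderThan30.values.foreach { case (name, age) => println(s"$name is $age") }
    heavyLikes.foreach(println)
  }
}
```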
Output:
*************************************************************
Property demo
*************************************************************
Vertices with age > 30, method 1:
Peter is 32
Alice is 38
Charlie is 55
Mike is 35
Vertices with age > 30, method 2:
Peter is 32
Alice is 38
Charlie is 55
Mike is 35
Edges with attribute > 5:
3 to 2 att 7
5 to 3 att 8
5 to 6 att 8
All triplets:
Henry likes Alice
Henry likes Peter
Charlie likes Henry
Charlie likes Kate
Peter likes Alice
Mike likes Henry
Mike likes Charlie
Mike likes Kate
Triplets with edge attribute > 5:
Charlie likes Henry
Mike likes Charlie
Mike likes Kate
Maximum out-degree, in-degree and degree:
Max of OutDegrees:(5,3)
Max of InDegrees:(1,2)
Max of Degrees:(2,4)
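Here is the degree computation on local data, with a hypothetical object DegreeModel and no Spark dependency. Counting IDs with groupBy leaves out vertices that never occur, mirroring how GraphX omits zero-degree vertices (Mike has no entry in inDegrees). Note the tie: vertices 1, 2 and 6 all have in-degree 2, and because max keeps its second argument on a tie, the winner depends on the reduction order. Sorted by ID, this sketch returns (6,2), while the Spark run above printed (1,2); in Spark the order follows the partitions, so only the degree value 2 is meaningful.

```scala
// Local model of outDegrees / inDegrees / degrees and the max reduce (no Spark involved).
object DegreeModel {
  type VertexId = Long

  // (srcId, dstId) pairs of the example edges; attributes do not matter for degrees.
  val edges: Seq[(VertexId, VertexId)] =
    Seq((2L, 1L), (2L, 4L), (3L, 2L), (3L, 6L), (4L, 1L), (5L, 2L), (5L, 3L), (5L, 6L))

  // Count occurrences per vertex ID, sorted by ID; IDs that never occur are simply absent.
  private def countIds(ids: Seq[VertexId]): Seq[(VertexId, Int)] =
    ids.groupBy(identity).map { case (id, occ) => (id, occ.size) }.toSeq.sortBy(_._1)

  val outDegrees: Seq[(VertexId, Int)] = countIds(edges.map(_._1))
  val inDegrees: Seq[(VertexId, Int)] = countIds(edges.map(_._2))
  val degrees: Seq[(VertexId, Int)] = countIds(edges.flatMap { case (s, d) => Seq(s, d) })

  // Same reducer as in the article: on a tie it keeps the second argument.
  def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 > b._2) a else b

  def main(args: Array[String]): Unit = {
    println("Max of OutDegrees:" + outDegrees.reduce(max))
    println("Max of InDegrees:" + inDegrees.reduce(max))
    println("Max of Degrees:" + degrees.reduce(max))
  }
}
```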
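5. Transformation operations
// Transformation operations
println("*************************************************************")
println("Transformation operations")
println("*************************************************************")
println("Vertex transformation, age + 1:")
// The lambda returns (id, (name, age + 1)), so each new vertex attribute repeats the id:
// v._2._1 is that id and v._2._2 is (name, age + 1); returning (name, age + 1) alone would be more idiomatic
graph.mapVertices{case(id,(name,age)) => (id,(name,age+1))}.vertices.collect.foreach(v => println(s"${v._2._1} is${v._2._2}"))
println("Edge transformation, attribute * 3:")
graph.mapEdges(e => e.attr*3).edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))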
Output:
*************************************************************
Transformation operations
*************************************************************
Vertex transformation, age + 1:
4 is(Peter,33)
6 is(Kate,24)
2 is(Henry,28)
1 is(Alice,39)
3 is(Charlie,56)
5 is(Mike,36)
Edge transformation, attribute * 3:
2 to 1 att 15
2 to 4 att 6
3 to 2 att 21
3 to 6 att 9
4 to 1 att 3
5 to 2 att 9
5 to 3 att 24
5 to 6 att 24
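mapVertices and mapEdges only rewrite attributes: vertex IDs and edge endpoints are left untouched, which is what lets GraphX reuse the existing graph structure. The sketch below states that contract on local Seqs with hypothetical helpers named mapVertices and mapEdges (plain functions, not the GraphX methods). Unlike the article's lambda, its vertex function returns just (name, age + 1), so the attribute keeps its (String, Int) type instead of nesting the ID.

```scala
// Local model of mapVertices / mapEdges: attributes change, structure does not.
object TransformModel {
  final case class Edge(srcId: Long, dstId: Long, attr: Int)

  val vertices: Seq[(Long, (String, Int))] = Seq(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  val edges: Seq[Edge] = Seq(
    Edge(2L, 1L, 5), Edge(2L, 4L, 2), Edge(3L, 2L, 7), Edge(3L, 6L, 3),
    Edge(4L, 1L, 1), Edge(5L, 2L, 3), Edge(5L, 3L, 8), Edge(5L, 6L, 8))

  // Hypothetical helper: the new attribute is computed from (id, old attribute); the id is kept.
  def mapVertices[A, B](vs: Seq[(Long, A)])(f: (Long, A) => B): Seq[(Long, B)] =
    vs.map { case (id, attr) => (id, f(id, attr)) }

  // Hypothetical helper: the new attribute is computed from the edge; endpoints are kept.
  def mapEdges[B](es: Seq[Edge])(f: Edge => B): Seq[(Long, Long, B)] =
    es.map(e => (e.srcId, e.dstId, f(e)))

  // Returning just (name, age + 1) keeps the attribute type (String, Int).
  val older: Seq[(Long, (String, Int))] =
    mapVertices(vertices)((_, attr) => (attr._1, attr._2 + 1))

  val tripled: Seq[(Long, Long, Int)] = mapEdges(edges)(_.attr * 3)

  def main(args: Array[String]): Unit = {
    older.foreach { case (id, (name, age)) => println(s"$id: $name is $age") }
    tripled.foreach { case (src, dst, w) => println(s"$src to $dst att $w") }
  }
}
```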
6. Structural operations
println("*************************************************************")
println("Structural operations")
println("*************************************************************")
println("Subgraph of vertices with age >= 25:")
val subGraph = graph.subgraph(vpred = (id,vd) => vd._2 >= 25)
println("All vertices of the subgraph:")
subGraph.vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
println
println("All edges of the subgraph:")
subGraph.edges.collect.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
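This selects the vertices whose age is at least 25. GraphX's subgraph also drops every edge that has an endpoint failing the vertex predicate, so Kate (23) disappears together with her two incoming edges. Output:
*************************************************************
Structural operations
*************************************************************
Subgraph of vertices with age >= 25:
All vertices of the subgraph:
Peter is 32
Henry is 27
Alice is 38
Charlie is 55
Mike is 35
All edges of the subgraph:
2 to 1 att 5
2 to 4 att 2
3 to 2 att 7
4 to 1 att 1
5 to 2 att 3
5 to 3 att 8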
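The subgraph rule can be sketched the same way on local collections. The hypothetical helper below (not the GraphX method) filters the vertices with the predicate and then keeps only edges whose source and destination both survived; with age >= 25 this removes Kate and the edges 3 to 6 and 5 to 6, leaving the six edges printed above.

```scala
// Local model of subgraph(vpred = ...): filter vertices, then drop dangling edges.
object SubgraphModel {
  final case class Edge(srcId: Long, dstId: Long, attr: Int)

  val vertices: Map[Long, (String, Int)] = Map(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  val edges: Seq[Edge] = Seq(
    Edge(2L, 1L, 5), Edge(2L, 4L, 2), Edge(3L, 2L, 7), Edge(3L, 6L, 3),
    Edge(4L, 1L, 1), Edge(5L, 2L, 3), Edge(5L, 3L, 8), Edge(5L, 6L, 8))

  // Hypothetical helper mirroring the vertex-predicate semantics of GraphX's subgraph.
  def subgraph(vpred: (Long, (String, Int)) => Boolean): (Map[Long, (String, Int)], Seq[Edge]) = {
    val keptVertices = vertices.filter { case (id, attr) => vpred(id, attr) }
    // An edge survives only if both of its endpoints survived.
    val keptEdges =
      edges.filter(e => keptVertices.contains(e.srcId) && keptVertices.contains(e.dstId))
    (keptVertices, keptEdges)
  }

  val (subVertices, subEdges) = subgraph((_, vd) => vd._2 >= 25)

  def main(args: Array[String]): Unit = {
    subVertices.values.foreach { case (name, age) => println(s"$name is $age") }
    subEdges.foreach(e => println(s"${e.srcId} to ${e.dstId} att ${e.attr}"))
  }
}
```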
7. Join operations
// Join operations
println("*************************************************************")
println("Join operations")
println("*************************************************************")
case class User(name:String, age:Int, inDeg:Int, outDeg:Int)
// Create a new graph whose vertex attribute type VD is User, converted from graph
val initialUserGraph:Graph[User, Int] = graph.mapVertices{case(id,(name,age)) => User(name,age,0,0)}
// Join initialUserGraph with its inDegrees and outDegrees (both VertexRDDs) to fill in inDeg and outDeg.
// A vertex missing from a degree RDD (degree 0) receives None, hence getOrElse(0).
val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees){
  case(id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
}.outerJoinVertices(initialUserGraph.outDegrees){
  case(id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
}
println("Vertex attributes of the joined graph:")
userGraph.vertices.collect.foreach(v => println(s"${v._2.name} inDeg:${v._2.inDeg} outDeg:${v._2.outDeg}"))
println("People whose in-degree equals their out-degree:")
userGraph.vertices.filter{
case(id, v) => v.inDeg==v.outDeg
}.collect.foreach{
case(id, property) => println(property.name)
}
println
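In effect, the graph is joined first with its own in-degree VertexRDD and then with its out-degree VertexRDD, so that each vertex's in-degree and out-degree are written into its attribute. Output:
*************************************************************
Join operations
*************************************************************
Vertex attributes of the joined graph:
Peter inDeg:1 outDeg:1
Kate inDeg:2 outDeg:0
Henry inDeg:2 outDeg:2
Alice inDeg:2 outDeg:0
Charlie inDeg:1 outDeg:2
Mike inDeg:0 outDeg:3
People whose in-degree equals their out-degree: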
Peter
Henry
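Mike is missing from inDegrees because GraphX's degree RDDs contain only vertices with a non-zero degree, which is why outerJoinVertices hands the function an Option and the code falls back with getOrElse(0). A local sketch of the two joins, assuming a hypothetical outerJoin helper over Maps (not a Spark API):

```scala
// Local model of the two outerJoinVertices calls (no Spark involved).
object JoinModel {
  final case class User(name: String, age: Int, inDeg: Int, outDeg: Int)

  val people: Seq[(Long, (String, Int))] = Seq(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  // (srcId, dstId) of every example edge.
  val edges: Seq[(Long, Long)] =
    Seq((2L, 1L), (2L, 4L), (3L, 2L), (3L, 6L), (4L, 1L), (5L, 2L), (5L, 3L), (5L, 6L))

  // Degree maps contain only vertices with a non-zero degree, like GraphX's inDegrees/outDegrees.
  val inDegrees: Map[Long, Int] = edges.groupBy(_._2).map { case (id, es) => (id, es.size) }
  val outDegrees: Map[Long, Int] = edges.groupBy(_._1).map { case (id, es) => (id, es.size) }

  val initialUsers: Map[Long, User] =
    people.map { case (id, (name, age)) => (id, User(name, age, 0, 0)) }.toMap

  // Hypothetical helper: every vertex is kept, and the joined value arrives as an Option.
  def outerJoin[U](vs: Map[Long, User], other: Map[Long, U])(
      f: (Long, User, Option[U]) => User): Map[Long, User] =
    vs.map { case (id, u) => (id, f(id, u, other.get(id))) }

  val userGraph: Map[Long, User] = outerJoin(
    outerJoin(initialUsers, inDegrees)((_, u, d) => u.copy(inDeg = d.getOrElse(0))),
    outDegrees)((_, u, d) => u.copy(outDeg = d.getOrElse(0)))

  // Names of people whose in-degree equals their out-degree.
  val balanced: Set[String] =
    userGraph.values.filter(u => u.inDeg == u.outDeg).map(_.name).toSet

  def main(args: Array[String]): Unit =
    userGraph.values.foreach(u => println(s"${u.name} inDeg:${u.inDeg} outDeg:${u.outDeg}"))
}
```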
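8. Aggregation operations
// Aggregation operations
println("*************************************************************")
println("Aggregation operations")
println("*************************************************************")
println("Oldest follower of each person:")
// mapReduceTriplets is deprecated since Spark 1.2 in favor of aggregateMessages, but is still available in 1.6.0
val oldestFollower:VertexRDD[(String,Int)] = userGraph.mapReduceTriplets[(String,Int)](
// Map step: send the source vertex's (name, age) to the destination vertex
edge => Iterator((edge.dstId,(edge.srcAttr.name,edge.srcAttr.age))),
// Reduce step: keep the older of two followers
(a,b) => if(a._2>b._2) a else b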
)
userGraph.vertices.leftJoin(oldestFollower){(id,user,optOldestFollower) =>
optOldestFollower match{
case None => s"${user.name} does not have any followers."
case Some(oldestAge) => s"The oldest age of ${user.name} \'s followers is ${oldestAge._2}(${oldestAge._1})."
}
}.collect.foreach{case(id,str) => println(str)}
println
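// Average age of each person's followers
println("Average age of each person's followers:")
val averageAge:VertexRDD[Double] = userGraph.mapReduceTriplets[(Int,Double)](
// Map step: send (1, age) from the source vertex to the destination vertex
edge => Iterator((edge.dstId,(1,edge.srcAttr.age.toDouble))),
// Reduce step: add up the follower count and the total age
(a,b) => ((a._1+b._1),(a._2+b._2))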
).mapValues((id,p) => p._2/p._1)
userGraph.vertices.leftJoin(averageAge){(id,user,optAverageAge) =>
optAverageAge match{
case None => s"${user.name} does not have any followers."
case Some(avgAge) => s"The average age of ${user.name} \'s followers is $avgAge."
}
}.collect.foreach{case(id,str) => println(str)}
println
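Both mapReduceTriplets calls follow one pattern: each edge emits a single message keyed by its destination vertex, and messages addressed to the same vertex are merged pairwise. A vertex that receives no message has no entry in the result, which is exactly the case the None branch of leftJoin handles. The sketch below reproduces that pattern on local collections with a hypothetical aggregate helper; it is not the Spark implementation.

```scala
// Local model of the map-then-reduce message pattern behind mapReduceTriplets.
object MessageAggregationModel {
  val people: Map[Long, (String, Int)] = Map(
    (1L, ("Alice", 38)), (2L, ("Henry", 27)), (3L, ("Charlie", 55)),
    (4L, ("Peter", 32)), (5L, ("Mike", 35)), (6L, ("Kate", 23)))

  // (srcId, dstId) of every example edge; the source "follows" the destination.
  val edges: Seq[(Long, Long)] =
    Seq((2L, 1L), (2L, 4L), (3L, 2L), (3L, 6L), (4L, 1L), (5L, 2L), (5L, 3L), (5L, 6L))

  // Hypothetical helper. Map phase: each edge produces one (target, message) pair.
  // Reduce phase: messages for the same target are merged pairwise.
  def aggregate[M](send: (Long, Long) => (Long, M), merge: (M, M) => M): Map[Long, M] =
    edges.map { case (s, d) => send(s, d) }
      .groupBy(_._1)
      .map { case (id, msgs) => (id, msgs.map(_._2).reduce(merge)) }

  // Oldest follower: send (name, age) of the source, keep the older one.
  val oldestFollower: Map[Long, (String, Int)] =
    aggregate[(String, Int)]((s, d) => (d, people(s)), (a, b) => if (a._2 > b._2) a else b)

  // Average follower age: send (1, age), sum both parts, then divide.
  val averageAge: Map[Long, Double] =
    aggregate[(Int, Double)]((s, d) => (d, (1, people(s)._2.toDouble)),
      (a, b) => (a._1 + b._1, a._2 + b._2))
      .map { case (id, (count, total)) => (id, total / count) }

  def main(args: Array[String]): Unit =
    people.keys.toSeq.sorted.foreach { id =>
      val name = people(id)._1
      oldestFollower.get(id) match {
        case Some((follower, age)) => println(s"The oldest follower of $name is $follower ($age)")
        case None                  => println(s"$name does not have any followers.")
      }
    }
}
```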
// Aggregation 2
println("*************************************************************")
println("Aggregation 2")
println("*************************************************************")
println("Shortest distance from vertex 3 to every vertex:")
// Source vertex
val sourceId:VertexId = 3L
// Initial distance: 0.0 for the source, +Infinity for every other vertex
val initialGraph = graph.mapVertices((id,_) => if(id==sourceId) 0.0 else Double.PositiveInfinity)
val sssp = initialGraph.pregel(Double.PositiveInfinity)(
// vprog: keep the smaller of the current and the incoming distance
(id,dist,newDist) => math.min(dist,newDist),
// sendMsg: relax each edge, using the edge attribute as its weight
triplet=>{
if(triplet.srcAttr + triplet.attr < triplet.dstAttr){
Iterator((triplet.dstId, triplet.srcAttr+triplet.attr))
} else{
Iterator.empty
}
},
// mergeMsg: keep the shortest incoming distance
(a,b) => math.min(a,b)
)
println(sssp.vertices.collect.mkString("\n"))
Output:
*************************************************************
Aggregation operations
*************************************************************
Oldest follower of each person:
The oldest age of Peter 's followers is 27(Henry).
The oldest age of Kate 's followers is 55(Charlie).
The oldest age of Henry 's followers is 55(Charlie).
The oldest age of Alice 's followers is 32(Peter).
The oldest age of Charlie 's followers is 35(Mike).
Mike does not have any followers.
Average age of each person's followers:
The average age of Peter 's followers is 27.0.
The average age of Kate 's followers is 45.0.
The average age of Henry 's followers is 45.0.
The average age of Alice 's followers is 29.5.
The average age of Charlie 's followers is 35.0.
Mike does not have any followers.
*************************************************************
Aggregation 2
*************************************************************
Shortest distance from vertex 3 to every vertex:
(4,9.0)
(6,3.0)
(2,7.0)
(1,10.0)
(3,0.0)
(5,Infinity)
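The pregel call computes single-source shortest paths in supersteps: sendMsg proposes shorter distances along edges, mergeMsg keeps the smallest proposal per vertex, and vprog folds it into the vertex value. The plain-Scala sketch below (hypothetical object PregelSsspModel, no Spark) follows the same three roles, simplified in that it rescans every edge in each superstep instead of only edges leaving vertices that changed. It stops when a superstep sends no messages, and vertex 5 stays at Infinity because no path from vertex 3 reaches it.

```scala
// Local model of the Pregel SSSP run: vprog, sendMsg and mergeMsg in supersteps.
object PregelSsspModel {
  // (srcId, dstId, weight) for every example edge; the weight is the edge attribute.
  val edges: Seq[(Long, Long, Int)] = Seq(
    (2L, 1L, 5), (2L, 4L, 2), (3L, 2L, 7), (3L, 6L, 3),
    (4L, 1L, 1), (5L, 2L, 3), (5L, 3L, 8), (5L, 6L, 8))

  val vertexIds: Seq[Long] = 1L to 6L

  def sssp(source: Long): Map[Long, Double] = {
    // Initial state: 0.0 at the source, +Infinity everywhere else.
    var dist: Map[Long, Double] =
      vertexIds.map(id => (id, if (id == source) 0.0 else Double.PositiveInfinity)).toMap
    var changed = true
    while (changed) {
      // sendMsg: offer dist(src) + weight to the destination when it is an improvement.
      val msgs = edges.collect {
        case (s, d, w) if dist(s) + w < dist(d) => (d, dist(s) + w)
      }
      // mergeMsg: keep the smallest offer per destination.
      val merged: Map[Long, Double] =
        msgs.groupBy(_._1).map { case (id, ms) => (id, ms.map(_._2).min) }
      // vprog: new value = min(current value, merged message).
      dist = dist.map { case (id, old) => (id, merged.get(id).fold(old)(m => math.min(old, m))) }
      // Like Pregel, stop once a superstep sends no messages.
      changed = merged.nonEmpty
    }
    dist
  }

  def main(args: Array[String]): Unit =
    println(sssp(3L).toSeq.sortBy(_._1).mkString("\n"))
}
```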