グラフ API

グラフの表現
グラフの生成
グラフのプロパティ
グラフの変換
グラフの変異
近傍メソッド
グラフの検証

グラフの表現

GellyではGraphは頂点のDataSetと辺のDataSetで表現されます。

Graph ノードはVertexの型によって表現されます。Vertex はユニークなIDと値によって定義されます。Vertex ID はComparable インタフェースを実装しなければなりません。値を持たない頂点は値の型をNullValueに設定することで表現されるかもしれません。

// create a new vertex with a Long ID and a String value
Vertex<Long, String> v = new Vertex<Long, String>(1L, "foo");

// create a new vertex with a Long ID and no value
Vertex<Long, NullValue> v = new Vertex<Long, NullValue>(1L, NullValue.getInstance());

// create a new vertex with a Long ID and a String value
val v = new Vertex(1L, "foo")

// create a new vertex with a Long ID and no value
val v = new Vertex(1L, NullValue.getInstance())

グラフの頂点はEdge の型によって表現されます。Edge は始点ID (始点のVertexのID)、終点ID (終点のVertexのID)および任意の値によって定義されます。始点および終点IDはVertex IDとして同じ型でなければなりません。値を持たない辺はNullValueの値の型を持ちます。

Edge<Long, Double> e = new Edge<Long, Double>(1L, 2L, 0.5);

// reverse the source and target of this edge
Edge<Long, Double> reversed = e.reverse();

Double weight = e.getValue(); // weight = 0.5

val e = new Edge(1L, 2L, 0.5)

// reverse the source and target of this edge
val reversed = e.reverse

val weight = e.getValue // weight = 0.5

GellyではEdge は常い始点から終点に向いています。各Edgeについてそれが始点から終点へ合致するEdgeを含む場合、Graph は向きが無いかもしれません。

上に戻る

グラフの生成

以下の方法でGraphを作成することができます:

辺のDataSetと任意の頂点のDataSetから:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Vertex<String, Long>> vertices = ...

DataSet<Edge<String, Double>> edges = ...

Graph<String, Long, Double> graph = Graph.fromDataSet(vertices, edges, env);

val env = ExecutionEnvironment.getExecutionEnvironment

val vertices: DataSet[Vertex[String, Long]] = ...

val edges: DataSet[Edge[String, Double]] = ...

val graph = Graph.fromDataSet(vertices, edges, env)

辺を表すTuple2のDataSetから。Gellyは各 Tuple2 を Edge に変換します。ここで、最初のフィールドは始点IDで、2つ目のフィールドは終点IDでしょう。頂点と辺の両方の値はNullValueに設定されるでしょう。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<String, String>> edges = ...

Graph<String, NullValue, NullValue> graph = Graph.fromTuple2DataSet(edges, env);

val env = ExecutionEnvironment.getExecutionEnvironment

val edges: DataSet[(String, String)] = ...

val graph = Graph.fromTuple2DataSet(edges, env)

Tuple3の DataSetと任意のTuple2のDataSetから。この場合、Gellyは各Tuple3を Edgeに変換するでしょう。ここで、最初のフィールドは始点IDで、2つ目のフィールドは終点IDで、3つ目のフィール尾は辺の値でしょう。同様に、各Tuple2 はVertexに変換されるでしょう。ここで、最初のフィールドは頂点IDで、2つ目のフィールドは頂点の値でしょう:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<String, Long>> vertexTuples = env.readCsvFile("path/to/vertex/input").types(String.class, Long.class);

DataSet<Tuple3<String, String, Double>> edgeTuples = env.readCsvFile("path/to/edge/input").types(String.class, String.class, Double.class);

Graph<String, Long, Double> graph = Graph.fromTupleDataSet(vertexTuples, edgeTuples, env);

辺データのCSVファイルと任意の頂点データのCSVファイルから。この場合、Gellyは辺のCSVファイルの各行をEdgeに変換するでしょう。ここで、最初のフィールドは始点IDで、2つ目のフィールドは終点IDで、3つ目のフィールド(存在する場合)は辺の値でしょう。同様に、任意の頂点のCSVファイルからの行は、Vertexに変換されるでしょう。ここで、最初のフィールドは頂点IDで、2つ目のフィールド(存在する場合)は頂点の値でしょう。GraphCsvReaderからGraphを取得するには、以下のメソッドのうちの1つを使って型を指定する必要があります:
types(Class<K> vertexKey, Class<VV> vertexValue,Class<EV> edgeValue): 頂点および辺の値の両方が存在します。
edgeTypes(Class<K> vertexKey, Class<EV> edgeValue): グラフは辺の値を持ちますが、頂点の値を持ちません。
vertexTypes(Class<K> vertexKey, Class<VV> vertexValue): グラフは頂点の値を持ちますが、辺の値を持ちません。
keyType(Class<K> vertexKey): グラフは頂点の値も辺の値も持ちません。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// create a Graph with String Vertex IDs, Long Vertex values and Double Edge values
Graph<String, Long, Double> graph = Graph.fromCsvReader("path/to/vertex/input", "path/to/edge/input", env)
					.types(String.class, Long.class, Double.class);


// create a Graph with neither Vertex nor Edge values
Graph<Long, NullValue, NullValue> simpleGraph = Graph.fromCsvReader("path/to/edge/input", env).keyType(Long.class);

val env = ExecutionEnvironment.getExecutionEnvironment

val vertexTuples = env.readCsvFile[String, Long]("path/to/vertex/input")

val edgeTuples = env.readCsvFile[String, String, Double]("path/to/edge/input")

val graph = Graph.fromTupleDataSet(vertexTuples, edgeTuples, env)

辺データのCSVファイルと任意の頂点データのCSVファイルから。この場合、Gellyは辺のCSV ファイルからの各行をEdgeに変換します。各行の最初のフィールドは始点IDで、2つ目のフィールドは終点IDで、3つ目のフィールド(存在する場合)は辺の値でしょう。辺が関連する値を持たない場合、辺の値の型のパラメータ(3つ目の型の引数)をNullValueに設定します。頂点が頂点の値で初期化されるように指定することもできます。pathVerticesを使ってCSVファイルへのパスを提供すると、このファイルの各行はVertexに変換されるでしょう。各行の3つ目のフィールドは頂点IDで、2つ目のフィールドは頂点の値でしょう。vertexValueInitializerを使って頂点値のイニシャライザ MapFunction を与えると、この関数は頂点値を生成するために使われます。頂点のセットは自動的に辺の入力から生成されるでしょう。頂点に関連する値が無い場合は、頂点値の型パラメータ(2つ目の引数)をNullValueに設定します。頂点は自動的に辺の入力から頂点値の型NullValueを使って生成されるでしょう。

val env = ExecutionEnvironment.getExecutionEnvironment

// create a Graph with String Vertex IDs, Long Vertex values and Double Edge values
val graph = Graph.fromCsvReader[String, Long, Double](
		pathVertices = "path/to/vertex/input",
		pathEdges = "path/to/edge/input",
		env = env)


// create a Graph with neither Vertex nor Edge values
val simpleGraph = Graph.fromCsvReader[Long, NullValue, NullValue](
		pathEdges = "path/to/edge/input",
		env = env)

// create a Graph with Double Vertex values generated by a vertex value initializer and no Edge values
val simpleGraph = Graph.fromCsvReader[Long, Double, NullValue](
        pathEdges = "path/to/edge/input",
        vertexValueInitializer = new MapFunction[Long, Double]() {
            def map(id: Long): Double = {
                id.toDouble
            }
        },
        env = env)

辺のCollection と任意の頂点のCollectionから:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

List<Vertex<Long, Long>> vertexList = new ArrayList...

List<Edge<Long, String>> edgeList = new ArrayList...

Graph<Long, Long, String> graph = Graph.fromCollection(vertexList, edgeList, env);

グラフの生成時に頂点の入力が与えられない場合、Gellyは辺の入力から自動的に Vertex DataSetを生成するでしょう。この場合、生成された頂点は値を持たないでしょう。別のやり方として、Vertex値を初期化するために生成メソッドの引数としてMapFunction を与えることができます:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// initialize the vertex value to be equal to the vertex ID
Graph<Long, Long, String> graph = Graph.fromCollection(edgeList,
				new MapFunction<Long, Long>() {
					public Long map(Long value) {
						return value;
					}
				}, env);

val env = ExecutionEnvironment.getExecutionEnvironment

val vertexList = List(...)

val edgeList = List(...)

val graph = Graph.fromCollection(vertexList, edgeList, env)

val env = ExecutionEnvironment.getExecutionEnvironment

// initialize the vertex value to be equal to the vertex ID
val graph = Graph.fromCollection(edgeList,
    new MapFunction[Long, Long] {
       def map(id: Long): Long = id
    }, env)

上に戻る

グラフのプロパティ

Gelly は様々なグラフプロパティとメトリクスを扱うために以下のメソッドを含みます:

// get the Vertex DataSet
DataSet<Vertex<K, VV>> getVertices()

// get the Edge DataSet
DataSet<Edge<K, EV>> getEdges()

// get the IDs of the vertices as a DataSet
DataSet<K> getVertexIds()

// get the source-target pairs of the edge IDs as a DataSet
DataSet<Tuple2<K, K>> getEdgeIds()

// get a DataSet of <vertex ID, in-degree> pairs for all vertices
DataSet<Tuple2<K, LongValue>> inDegrees()

// get a DataSet of <vertex ID, out-degree> pairs for all vertices
DataSet<Tuple2<K, LongValue>> outDegrees()

// get a DataSet of <vertex ID, degree> pairs for all vertices, where degree is the sum of in- and out- degrees
DataSet<Tuple2<K, LongValue>> getDegrees()

// get the number of vertices
long numberOfVertices()

// get the number of edges
long numberOfEdges()

// get a DataSet of Triplets<srcVertex, trgVertex, edge>
DataSet<Triplet<K, VV, EV>> getTriplets()

// get the Vertex DataSet
getVertices: DataSet[Vertex[K, VV]]

// get the Edge DataSet
getEdges: DataSet[Edge[K, EV]]

// get the IDs of the vertices as a DataSet
getVertexIds: DataSet[K]

// get the source-target pairs of the edge IDs as a DataSet
getEdgeIds: DataSet[(K, K)]

// get a DataSet of <vertex ID, in-degree> pairs for all vertices
inDegrees: DataSet[(K, LongValue)]

// get a DataSet of <vertex ID, out-degree> pairs for all vertices
outDegrees: DataSet[(K, LongValue)]

// get a DataSet of <vertex ID, degree> pairs for all vertices, where degree is the sum of in- and out- degrees
getDegrees: DataSet[(K, LongValue)]

// get the number of vertices
numberOfVertices: Long

// get the number of edges
numberOfEdges: Long

// get a DataSet of Triplets<srcVertex, trgVertex, edge>
getTriplets: DataSet[Triplet[K, VV, EV]]

上に戻る

グラフの変換

Map: Gellyは頂点値あるいは辺の値のmap変換を適用するための特別なメソッドを提供します。mapVertices およびmapEdges は新しいGraphを返します。この時頂点(あるいは辺)のIDは変更されませんが、値は与えられたユーザ定義のmap関数に応じて変換されます。map関数は頂点あるいは辺の値も変更することができます。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph<Long, Long, Long> graph = Graph.fromDataSet(vertices, edges, env);

// increment each vertex value by one
Graph<Long, Long, Long> updatedGraph = graph.mapVertices(
				new MapFunction<Vertex<Long, Long>, Long>() {
					public Long map(Vertex<Long, Long> value) {
						return value.getValue() + 1;
					}
				});

val env = ExecutionEnvironment.getExecutionEnvironment
val graph = Graph.fromDataSet(vertices, edges, env)

// increment each vertex value by one
val updatedGraph = graph.mapVertices(v => v.getValue + 1)

変換: Gellyは頂点および辺のIDの値および/あるいは型(translateGraphIDs)、頂点の値 (translateVertexValues)、あるいは辺の値(translateEdgeValues)の変換に特化したメソッドを提供します。変換はユーザ定義のmap関数によって行われ、それらの幾つかは org.apache.flink.graph.asm.translateパッケージの中で提供されます。3つ全ての変換メソッドの中で、同じMapFunctionを使うことができます。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph<Long, Long, Long> graph = Graph.fromDataSet(vertices, edges, env);

// translate each vertex and edge ID to a String
Graph<String, Long, Long> updatedGraph = graph.translateGraphIds(
				new MapFunction<Long, String>() {
					public String map(Long id) {
						return id.toString();
					}
				});

// translate vertex IDs, edge IDs, vertex values, and edge values to LongValue
Graph<LongValue, LongValue, LongValue> updatedGraph = graph
                .translateGraphIds(new LongToLongValue())
                .translateVertexValues(new LongToLongValue())
                .translateEdgeValues(new LongToLongValue())

val env = ExecutionEnvironment.getExecutionEnvironment
val graph = Graph.fromDataSet(vertices, edges, env)

// translate each vertex and edge ID to a String
val updatedGraph = graph.translateGraphIds(id => id.toString)

フィルタ: フィルタ変換はユーザ定義のフィルタ関数をGraphの頂点あるいは辺に適用します。filterOnEdges は与えられた述語を満足する辺のみを維持しながら、元のグラフのサブグラフを生成するでしょう。頂点のデータセットは修正されないことに注意してください。filterOnVertices はグラフの頂点それぞれにフィルターを適用します。頂点の叙述を満たさない始点および/あるいは終点を持つ辺は、結果の辺のデータセットから削除されます。subgraphメソッドはフィルタ関数を頂点と辺に同時に適用するために使うことができます。

Graph<Long, Long, Long> graph = ...

graph.subgraph(
		new FilterFunction<Vertex<Long, Long>>() {
			   	public boolean filter(Vertex<Long, Long> vertex) {
					// keep only vertices with positive values
					return (vertex.getValue() > 0);
			   }
		   },
		new FilterFunction<Edge<Long, Long>>() {
				public boolean filter(Edge<Long, Long> edge) {
					// keep only edges with negative values
					return (edge.getValue() < 0);
				}
		})

val graph: Graph[Long, Long, Long] = ...

// keep only vertices with positive values
// and only edges with negative values
graph.subgraph((vertex => vertex.getValue > 0), (edge => edge.getValue < 0))

フィルタ変換

Join: Gellyは他の入力データセットを使って頂点と辺のデータセットをjoinするための特別なメソッドを提供します。joinWithVertices は Tuple2 入力データセットを使って頂点をjoinします。join は頂点ID と Tuple2 入力の3つ目のフィールドをjoinキーとして使って行われます。メソッドは与えられたユーザ定義の変換関数に応じて頂点の値が更新された新しいGraphを返します。同様に、入力データセットは3つのメソッドのうちの1つを使って、辺をjoinすることができます。joinWithEdges はTuple3の入力DataSetを期待し、始点および終点IDの両方の複合キーでjoinします。joinWithEdgesOnSource はTuple2のDataSetを期待し、辺の開始キーと入力データセットの最初の属性をjoinします。joinWithEdgesOnTargetはTuple2のDataSetを期待し、辺の終了キーと入力データセットの最初の属性をjoinします3つ全てのメソッドは変と入力データセットの値に変換関数を適用します。もし入力データセットが複数回キーを含む場合、全てのGellyのjoinメソッドは最初に見つけた値のみを考慮するでしょう。

Graph<Long, Double, Double> network = ...

DataSet<Tuple2<Long, LongValue>> vertexOutDegrees = network.outDegrees();

// assign the transition probabilities as the edge weights
Graph<Long, Double, Double> networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees,
				new VertexJoinFunction<Double, LongValue>() {
					public Double vertexJoin(Double vertexValue, LongValue inputValue) {
						return vertexValue / inputValue.getValue();
					}
				});

val network: Graph[Long, Double, Double] = ...

val vertexOutDegrees: DataSet[(Long, LongValue)] = network.outDegrees

// assign the transition probabilities as the edge weights
val networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees, (v1: Double, v2: LongValue) => v1 / v2.getValue)

Reverse: reverse() メソッドは全ての辺の方向が逆になった新しいGraph を返します。
無向: GellyではGraphは常に有向です。無向グラフはグラフに対して全て反対向きの辺を追加することで表現することができます。この目的のために、GellyはgetUndirected()メソッドを提供します。
Union: Gellyの union() メソッドは指定されたグラフと現在のグラフの頂点と辺のセットにunion操作を行います。重複した頂点は結果のGraphから削除されます。一方でもし重複した辺が存在してもそれらは保持されるでしょう。

Union変換

Difference: Gellyの difference() メソッドは現在のグラフと指定されたグラフの頂点と辺のセットの difference を行います。
Intersect: Gellyの intersect() メソッドは現在のグラフと指定されたグラフの辺のセットのintersectを行います。結果は両方の入力のグラフに存在する全ての辺を含む新しいGraphです。もし2つの辺が同じ始点識別子、終点識別子、および辺の値を持つ場合は、2つの辺は等しいものと見なされます。結果のグラフの頂点は値を持ちません。もし頂点の値が必要な場合は、例えば joinWithVertices() メソッドを使って入力グラフのうちの1つからそれらを取り出すことができます。パラメータ distinct次第で、等しい辺は結果のGraphに1度だけ含まれるか、入力グラフの等しい辺のペアが現れる度に含まれます。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// create first graph from edges {(1, 3, 12) (1, 3, 13), (1, 3, 13)}
List<Edge<Long, Long>> edges1 = ...
Graph<Long, NullValue, Long> graph1 = Graph.fromCollection(edges1, env);

// create second graph from edges {(1, 3, 13)}
List<Edge<Long, Long>> edges2 = ...
Graph<Long, NullValue, Long> graph2 = Graph.fromCollection(edges2, env);

// Using distinct = true results in {(1,3,13)}
Graph<Long, NullValue, Long> intersect1 = graph1.intersect(graph2, true);

// Using distinct = false results in {(1,3,13),(1,3,13)} as there is one edge pair
Graph<Long, NullValue, Long> intersect2 = graph1.intersect(graph2, false);

val env = ExecutionEnvironment.getExecutionEnvironment

// create first graph from edges {(1, 3, 12) (1, 3, 13), (1, 3, 13)}
val edges1: List[Edge[Long, Long]] = ...
val graph1 = Graph.fromCollection(edges1, env)

// create second graph from edges {(1, 3, 13)}
val edges2: List[Edge[Long, Long]] = ...
val graph2 = Graph.fromCollection(edges2, env)


// Using distinct = true results in {(1,3,13)}
val intersect1 = graph1.intersect(graph2, true)

// Using distinct = false results in {(1,3,13),(1,3,13)} as there is one edge pair
val intersect2 = graph1.intersect(graph2, false)

- 上に戻る

グラフの変異

Gellyは入力のGraphから頂点と辺を追加と削除するために以下のメソッドを持ちます:

// adds a Vertex to the Graph. If the Vertex already exists, it will not be added again.
Graph<K, VV, EV> addVertex(final Vertex<K, VV> vertex)

// adds a list of vertices to the Graph. If the vertices already exist in the graph, they will not be added once more.
Graph<K, VV, EV> addVertices(List<Vertex<K, VV>> verticesToAdd)

// adds an Edge to the Graph. If the source and target vertices do not exist in the graph, they will also be added.
Graph<K, VV, EV> addEdge(Vertex<K, VV> source, Vertex<K, VV> target, EV edgeValue)

// adds a list of edges to the Graph. When adding an edge for a non-existing set of vertices, the edge is considered invalid and ignored.
Graph<K, VV, EV> addEdges(List<Edge<K, EV>> newEdges)

// removes the given Vertex and its edges from the Graph.
Graph<K, VV, EV> removeVertex(Vertex<K, VV> vertex)

// removes the given list of vertices and their edges from the Graph
Graph<K, VV, EV> removeVertices(List<Vertex<K, VV>> verticesToBeRemoved)

// removes *all* edges that match the given Edge from the Graph.
Graph<K, VV, EV> removeEdge(Edge<K, EV> edge)

// removes *all* edges that match the edges in the given list
Graph<K, VV, EV> removeEdges(List<Edge<K, EV>> edgesToBeRemoved)

// adds a Vertex to the Graph. If the Vertex already exists, it will not be added again.
addVertex(vertex: Vertex[K, VV])

// adds a list of vertices to the Graph. If the vertices already exist in the graph, they will not be added once more.
addVertices(verticesToAdd: List[Vertex[K, VV]])

// adds an Edge to the Graph. If the source and target vertices do not exist in the graph, they will also be added.
addEdge(source: Vertex[K, VV], target: Vertex[K, VV], edgeValue: EV)

// adds a list of edges to the Graph. When adding an edge for a non-existing set of vertices, the edge is considered invalid and ignored.
addEdges(edges: List[Edge[K, EV]])

// removes the given Vertex and its edges from the Graph.
removeVertex(vertex: Vertex[K, VV])

// removes the given list of vertices and their edges from the Graph
removeVertices(verticesToBeRemoved: List[Vertex[K, VV]])

// removes *all* edges that match the given Edge from the Graph.
removeEdge(edge: Edge[K, EV])

// removes *all* edges that match the edges in the given list
removeEdges(edgesToBeRemoved: List[Edge[K, EV]])

近傍メソッド

近傍メソッドを使って頂点はそれらの最初の隣接状での集約を行うことができます。reduceOnEdges() は頂点の隣接辺の値での集約を計算するために使うことができ、reduceOnNeighbors() は隣接頂点の値での集約を計算するために使うことができます。これらのメソッドは結合および可換な集約を前提とし、極めてパフォーマンスを改善しながら、内部的に結合器を利用します。隣接スコープはEdgeDirection パラメータによって定義されます。IN, OUT あるいは ALLを取ります。IN は頂点の全てのやってくる辺(隣接)を集め、OUT は全ての出ていく辺(隣接)を集めます。一方でALL は全ての辺(隣接)を集めるでしょう。

例えば、以下のグラフの各頂点について、全ての出ていく辺の最小の重みを選択したいとします:

reduceOnEdgesの例

以下のコードは各頂点について外向きの辺を集め、各結果の近傍にユーザ定義関数SelectMinWeight()を適用するでしょう:

Graph<Long, Long, Double> graph = ...

DataSet<Tuple2<Long, Double>> minWeights = graph.reduceOnEdges(new SelectMinWeight(), EdgeDirection.OUT);

// user-defined function to select the minimum weight
static final class SelectMinWeight implements ReduceEdgesFunction<Double> {

		@Override
		public Double reduceEdges(Double firstEdgeValue, Double secondEdgeValue) {
			return Math.min(firstEdgeValue, secondEdgeValue);
		}
}

val graph: Graph[Long, Long, Double] = ...

val minWeights = graph.reduceOnEdges(new SelectMinWeight, EdgeDirection.OUT)

// user-defined function to select the minimum weight
final class SelectMinWeight extends ReduceEdgesFunction[Double] {
	override def reduceEdges(firstEdgeValue: Double, secondEdgeValue: Double): Double = {
		Math.min(firstEdgeValue, secondEdgeValue)
	}
 }

reduceOnEdgesの例

同様に、各頂点について、全てのやってくる近傍の値の合計を計算したいとします。以下のコードは各頂点についてやってくる隣接を集め、各近傍にユーザ定義関数SumValues()を適用するでしょう。

Graph<Long, Long, Double> graph = ...

DataSet<Tuple2<Long, Long>> verticesWithSum = graph.reduceOnNeighbors(new SumValues(), EdgeDirection.IN);

// user-defined function to sum the neighbor values
static final class SumValues implements ReduceNeighborsFunction<Long> {

	    	@Override
	    	public Long reduceNeighbors(Long firstNeighbor, Long secondNeighbor) {
		    	return firstNeighbor + secondNeighbor;
	  	}
}

val graph: Graph[Long, Long, Double] = ...

val verticesWithSum = graph.reduceOnNeighbors(new SumValues, EdgeDirection.IN)

// user-defined function to sum the neighbor values
final class SumValues extends ReduceNeighborsFunction[Long] {
   	override def reduceNeighbors(firstNeighbor: Long, secondNeighbor: Long): Long = {
    	firstNeighbor + secondNeighbor
    }
}

reduceOnNeighbors の例

集約関数が結合性および可換性を持たないか、頂点ごとに1つ以上の値を返すことが望ましい場合は、もっと一般的なgroupReduceOnEdges() および groupReduceOnNeighbors() メソッドを使うことができます。これらのメソッドは頂点ごとに0、1つ以上の値を返し、近傍全体へのアクセスを提供します。

例えば、以下のコードは0.5以上の重みを持つ辺と接続している全ての頂点のペアを出力するでしょう:

Graph<Long, Long, Double> graph = ...

DataSet<Tuple2<Vertex<Long, Long>, Vertex<Long, Long>>> vertexPairs = graph.groupReduceOnNeighbors(new SelectLargeWeightNeighbors(), EdgeDirection.OUT);

// user-defined function to select the neighbors which have edges with weight > 0.5
static final class SelectLargeWeightNeighbors implements NeighborsFunctionWithVertexValue<Long, Long, Double,
		Tuple2<Vertex<Long, Long>, Vertex<Long, Long>>> {

		@Override
		public void iterateNeighbors(Vertex<Long, Long> vertex,
				Iterable<Tuple2<Edge<Long, Double>, Vertex<Long, Long>>> neighbors,
				Collector<Tuple2<Vertex<Long, Long>, Vertex<Long, Long>>> out) {

			for (Tuple2<Edge<Long, Double>, Vertex<Long, Long>> neighbor : neighbors) {
				if (neighbor.f0.f2 > 0.5) {
					out.collect(new Tuple2<Vertex<Long, Long>, Vertex<Long, Long>>(vertex, neighbor.f1));
				}
			}
		}
}

val graph: Graph[Long, Long, Double] = ...

val vertexPairs = graph.groupReduceOnNeighbors(new SelectLargeWeightNeighbors, EdgeDirection.OUT)

// user-defined function to select the neighbors which have edges with weight > 0.5
final class SelectLargeWeightNeighbors extends NeighborsFunctionWithVertexValue[Long, Long, Double,
  (Vertex[Long, Long], Vertex[Long, Long])] {

	override def iterateNeighbors(vertex: Vertex[Long, Long],
		neighbors: Iterable[(Edge[Long, Double], Vertex[Long, Long])],
		out: Collector[(Vertex[Long, Long], Vertex[Long, Long])]) = {

			for (neighbor <- neighbors) {
				if (neighbor._1.getValue() > 0.5) {
					out.collect(vertex, neighbor._2);
				}
			}
		}
   }

集約の計算が(集約が実施される)頂点の値へのアクセスを必要としない場合、ユーザ定義の関数のためにもっと効率的なEdgesFunction および NeighborsFunction を使うことをお勧めします。頂点の値へのアクセスが必要な場合、代わりにEdgesFunctionWithVertexValue と NeighborsFunctionWithVertexValue を使うべきです。

上に戻る

グラフの検証

Gellyは入力グラフ上で検証チェックを行うための簡単なユーティリティを提供します。アプリケーションのコンテキストに応じて、グラフは特定の条件に応じて有効あるいはそうでないかもしれません。例えば、グラフが重複した辺を含むかどうか、あるいはその構造が二分であるかどうかを検証する必要があるかもしれません。グラフを検証するために、独自のGraphValidatorを定義し、validate()メソッドを実装することができます。InvalidVertexIdsValidator は Gelly の事前定義されたvalidatorです。辺のセットが有効な頂点IDを含むか、つまり全ての辺のIDが頂点IDのセットの中に存在するかをチェックします。

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// create a list of vertices with IDs = {1, 2, 3, 4, 5}
List<Vertex<Long, Long>> vertices = ...

// create a list of edges with IDs = {(1, 2) (1, 3), (2, 4), (5, 6)}
List<Edge<Long, Long>> edges = ...

Graph<Long, Long, Long> graph = Graph.fromCollection(vertices, edges, env);

// will return false: 6 is an invalid ID
graph.validate(new InvalidVertexIdsValidator<Long, Long, Long>());

val env = ExecutionEnvironment.getExecutionEnvironment

// create a list of vertices with IDs = {1, 2, 3, 4, 5}
val vertices: List[Vertex[Long, Long]] = ...

// create a list of edges with IDs = {(1, 2) (1, 3), (2, 4), (5, 6)}
val edges: List[Edge[Long, Long]] = ...

val graph = Graph.fromCollection(vertices, edges, env)

// will return false: 6 is an invalid ID
graph.validate(new InvalidVertexIdsValidator[Long, Long, Long])

上に戻る