Gelly: Flink Graph API

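Gelly is a Graph API for Flink. It contains a set of methods and utilities that aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform, and modify graphs, as well as a library of graph algorithms.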

Using Gelly

Gelly is currently part of the libraries Maven project. All relevant classes are located in the org.apache.flink.graph package.

To use Gelly, add the following dependencies to your pom.xml. The first is for the Java API and the second is for the Scala API.

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly_2.10</artifactId>
    <version>1.3-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly-scala_2.10</artifactId>
    <version>1.3-SNAPSHOT</version>
</dependency>

Note that Gelly is not part of the binary distribution. See the linking documentation for instructions on packaging Gelly libraries into Flink user programs.

The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink DataSet API.

Running Gelly Examples

The Gelly library and examples jars are provided in the opt folder of the Flink distribution (for versions older than Flink 1.2, they can be downloaded manually from Maven Central).

To run the Gelly examples, the flink-gelly (for Java) or flink-gelly-scala (for Scala) jar must be copied to Flink’s lib directory.

cp opt/flink-gelly_*.jar lib/
cp opt/flink-gelly-scala_*.jar lib/
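
The copy step can also be wrapped in a small helper that reports clearly when a jar pattern matches nothing, for example when the command is run from the wrong directory. This is only a sketch: the function name copy_gelly_jars and its single argument (the Flink distribution directory) are made up for illustration and are not part of Flink.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flink): copy the Gelly jars from opt/ to
# lib/ inside a Flink distribution directory given as the first argument.
copy_gelly_jars() {
    flink_home=$1
    status=0
    for pattern in 'flink-gelly_*.jar' 'flink-gelly-scala_*.jar'; do
        found=0
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for jar in "$flink_home"/opt/$pattern; do
            # An unmatched glob stays literal, so check that the file exists.
            [ -e "$jar" ] || continue
            found=1
            cp "$jar" "$flink_home/lib/" || status=1
        done
        if [ "$found" -eq 0 ]; then
            echo "no jar matching opt/$pattern under $flink_home" >&2
            status=1
        fi
    done
    return "$status"
}
```

From the root of the Flink distribution, this would be called as copy_gelly_jars . and returns a non-zero status if either jar is missing or a copy fails.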

Gelly’s examples jar includes drivers for the library methods as well as additional example algorithms. After configuring and starting the cluster, list the available algorithm classes:

./bin/start-cluster.sh
./bin/flink run opt/flink-gelly-examples_*.jar

The Gelly drivers can generate RMat graph data or read an edge list from a CSV file. When reading from a file, each node in the cluster must have access to the input file. Calculate graph metrics on a generated directed graph:

./bin/flink run -c org.apache.flink.graph.drivers.GraphMetrics opt/flink-gelly-examples_*.jar \
    --directed true --input rmat

The size of the graph is adjusted by the --scale and --edge_factor parameters. The library generator provides access to additional configuration to adjust the power-law skew and random noise.
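
As a rough guide to how these two parameters interact, the sketch below assumes the usual R-MAT convention, in which --scale is the base-2 logarithm of the vertex count and --edge_factor is the number of edges generated per vertex. That convention is an assumption here, not something stated above, and the values 12 and 8 are arbitrary.

```shell
#!/bin/sh
# Assumed R-MAT sizing convention (not confirmed by the text above):
#   vertices = 2^scale
#   edges    = edge_factor * vertices
scale=12
edge_factor=8

vertices=$((1 << scale))
edges=$((vertices * edge_factor))

echo "scale=$scale edge_factor=$edge_factor -> $vertices vertices, $edges edges"
```

Under that assumption, appending --scale 12 --edge_factor 8 to the GraphMetrics command above would request a graph of roughly 4096 vertices and 32768 edges. Each increment of --scale doubles both numbers.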

Sample social network data is provided by the Stanford Network Analysis Project. The com-lj data set is a good starter size. Run a few algorithms and monitor the job progress in Flink’s Web UI:

wget -O - http://snap.stanford.edu/data/bigdata/communities/com-lj.ungraph.txt.gz | gunzip -c > com-lj.ungraph.txt

./bin/flink run -q -c org.apache.flink.graph.drivers.GraphMetrics opt/flink-gelly-examples_*.jar \
    --directed true --input csv --type integer --input_filename com-lj.ungraph.txt --input_field_delimiter '\t'

./bin/flink run -q -c org.apache.flink.graph.drivers.ClusteringCoefficient opt/flink-gelly-examples_*.jar \
    --directed true --input csv --type integer --input_filename com-lj.ungraph.txt  --input_field_delimiter '\t' \
    --output hash

./bin/flink run -q -c org.apache.flink.graph.drivers.JaccardIndex opt/flink-gelly-examples_*.jar \
    --input csv --type integer --simplify true --input_filename com-lj.ungraph.txt --input_field_delimiter '\t' \
    --output hash
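
To try the CSV input on something much smaller than com-lj, a hand-written edge list is enough. The sketch below writes a four-edge file with one edge per line as two tab-separated integer vertex IDs, matching the --type integer and --input_field_delimiter '\t' options used above. The file name tiny-edges.txt and the edges themselves are invented for illustration.

```shell
#!/bin/sh
# Write a tiny edge list: one edge per line, "source<TAB>target".
# The file name and the edges are illustrative only.
# printf reuses its format string for each pair of arguments.
printf '%s\t%s\n' \
    1 2 \
    1 3 \
    2 3 \
    3 4 > tiny-edges.txt

# Show the file and count its edges.
cat tiny-edges.txt
edge_count=$(wc -l < tiny-edges.txt | tr -d ' ')
echo "wrote $edge_count edges to tiny-edges.txt"
```

Passing --input_filename tiny-edges.txt in place of com-lj.ungraph.txt in the commands above should then run the same drivers on this file, which makes it easier to check the options before starting a long job.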

Please submit feature requests and report issues on the user mailing list or Flink Jira. We welcome suggestions for new algorithms and features as well as code contributions.
