This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.

実験的な機能 #

このセクションでは、DataStream APIの実験的な機能について説明します。実験的な機能はまだ開発中であり、不安定になったり、不完全になったり、将来のバージョンで大幅に変更される可能性があります。

事前にパーティション化されたデータストリームをキー付きストリームとして解釈する #

シャッフルを避けるために、事前にパーティション化されたデータストリームをキー付きのストリームとして再解釈することができます。

警告: 再解釈されたデータストリームは、FlinkのkeyByがシャッフルでデータを分割するのと正確に同じ方法で分割されている必要ｙがあります。キーグループ割り合て

このユースケースの1つは2つのジョブ間の具体化されたシャッフルです: 最初のジョブはkeyByシャッフルを行い、各出力をパーティションに具体化します。2つ目のジョブは、並列インスタンスごとに、最初のジョブによって作成された対応するパーティションから読み込むソースがあります。これらのソースは、キー付きストリームとして再解釈でき、例えばウィンドウを適用します。Notice that this trick makes the second job embarrassingly parallel, which can be helpful for a fine-grained recovery scheme.

この再解釈機能はDataStreamUtilsを通じて公開されます:

static <T, K> KeyedStream<T, K> reinterpretAsKeyedStream(
    DataStream<T> stream,
    KeySelector<T, K> keySelector,
    TypeInformation<K> typeInfo)

基本ストリーム、キーセレクタ、タイプ情報が与えられると、メソッドは基本ストリームからキー付きストリームを生成します。

コードの例:

Java

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<Integer> source = ...;
DataStreamUtils.reinterpretAsKeyedStream(source, (in) -> in, TypeInformation.of(Integer.class))
    .window(TumblingEventTimeWindows.of(Time.seconds(1)))
    .reduce((a, b) -> a + b)
    .addSink(new DiscardingSink<>());
env.execute();

Scala

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
val source = ...
new DataStreamUtils(source).reinterpretAsKeyedStream((in) => in)
  .window(TumblingEventTimeWindows.of(Time.seconds(1)))
  .reduce((a, b) => a + b)
  .addSink(new DiscardingSink[Int])
env.execute()