チェックポイント

概要

Checkpoints make state in Flink fault tolerant by allowing state and the corresponding stream positions to be recovered, thereby giving the application the same semantics as a failure-free execution.

See Checkpointing for how to enable and configure checkpoints for your program.

外部化されたチェックポイント

Checkpoints are by default not persisted externally and are only used to resume a job from failures. They are deleted when a program is cancelled. You can, however, configure periodic checkpoints to be persisted externally similarly to savepoints. These externalized checkpoints write their meta data out to persistent storage and are not automatically cleaned up when the job fails. このやり方で、ジョブが失敗した時に再開するためのチェックポイントをそのあたりに持つでしょう。

CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

ExternalizedCheckpointCleanup モードはジョブを取り消した時に外部化されたチェックポイントに何が起きるかを設定します:

  • ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: ジョブが取り消された時に外部化されたチェックポイントを維持します。この場合取り消しの後でチェックポイントの状態を手動で掃除する必要があることに注意してください。

  • ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: ジョブが取り消された時に外部化されたチェックポイントを削除します。チェックポイントの状態はジョブが失敗した時にのみ利用可能でしょう。

Directory Structure

Similarly to savepoints, an externalized checkpoint consists of a meta data file and, depending on the state back-end, some additional data files. The target directory for the externalized checkpoint’s meta data is determined from the configuration key state.checkpoints.dir which, currently, can only be set via the configuration files.

state.checkpoints.dir: hdfs:///checkpoints/

そしてこのディレクトリはチェックポイントを回復するために必要なチェックポイントのメタデータを含むでしょう。For the MemoryStateBackend, this meta data file will be self-contained and no further files are needed.

FsStateBackend and RocksDBStateBackend write separate data files and only write the paths to these files into the meta data file. These data files are stored at the path given to the state back-end during construction.

env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/");

Difference to Savepoints

Externalized checkpoints have a few differences from savepoints. They - use a state backend specific (low-level) data format, - may be incremental, - do not support Flink specific features like rescaling.

Resuming from an externalized checkpoint

A job may be resumed from an externalized checkpoint just as from a savepoint by using the checkpoint’s meta data file instead (see the savepoint restore guide). Note that if the meta data file is not self-contained, the jobmanager needs to have access to the data files it refers to (see Directory Structure above).

$ bin/flink run -s :checkpointMetaDataPath [:runArgs]

上に戻る

TOP
inserted by FC2 system