Flink源码剖析之:如何根据JobGraph生成ExecutionGraph
Flink源码剖析之:如何根据JobGraph生成ExecutionGraph在上一篇Flink源码剖析中,我们介绍了Flink如何根据StreamGraph生成JobGraph的流程,并着重分析了其算子链的归并过程和JobGraph的构造流程。
对于StreamGraph和JobGraph的生成来说,其都是在客户端生成的,本文将会报告JobGraph到ExecutionGraph的生成过程,而这一过程会在Flink JobManager的服务端来完成。当JobGraph从客户端提交到JobManager后,JobManager会根据JobGraph生成对应的ExecutionGraph,而ExecutionGraph就是Flink作业调度时使用的焦点数据结构。 本篇将会详细介绍JobGraph转换为ExecutionGraph的流程。
主体流程梳理
Flink在将JobGraph转换成ExecutionGraph后,便可以开始实行真正的任务。这一转换流程主要在Flink源码中的DefaultExecutionGraphBuilder类中的buildGraph方法中实现的。在转换过程中,涉及到了一些新的根本概念,先来简朴介绍一下这些概念,对于明白ExecutionGraph有较大的资助:
[*]ExecutionJobVertex: 在ExecutionGraph中表示实行顶点,与JobGraph中的JobVertex逐一对应。实际上,每个ExecutionJobVertex也是依赖JobVertex来创建的。
[*]ExecutionVertex: 在ExecutionJobVertex类中创建,每个并发度都对应了一个ExecutionVertex对象,每个ExecutionVertex都代表JobVertex在某个特定并行子任务中的实行。在实际实行时,每个ExecutionVertex实际上就是一个Task,是ExecutionJobVertex并行实行的一个子任务。
[*]Execution: Execution表示ExecutionVertex的一次实行。由于ExecutionVertex可以被实行多次(用于恢复、重新计算、重新分配),这个类用于跟踪该ExecutionVertex的单个实行状态和资源。
[*]IntermediateResult: 在JobGraph中用IntermediateDataSet表示上游JobVertex的输出数据流,而在ExecutionGraph中,则用IntermediateResult来表示ExecutionJobVertex的输出数据流。
[*]IntermediateResultPartition:这是IntermediateResult的一部分或一个分片。由于有多个并行任务(ExecutionVertex)实行雷同的操作,每个任务都会产生一部分IntermediateResult。这些结果在物理存储和计算过程中,大概会被进一步划分成多个分区,每个分区对应一个 IntermediateResultPartition对象。
从上面的根本概念也可以看出,在ExecutionGraph中:
[*]相比StreamGraph和JobGraph,ExecutionGraph是实际根据任务并行度来生成拓扑结构的,在ExecutionGraph中,每个并行子任务都对应一个ExecutionVertex顶点和IntermediateResultPartition输出数据流分区。
[*]在ExecutionGraph中,上下游节点之间的连接是通过ExecutionVertex -> IntermediateResultPartition -> ExecutionVertex 对象来完成的。
整体的实行流程图如下所示:
https://i-blog.csdnimg.cn/direct/90780b5fdb4142ec84dc6eb7377fe86f.png
入口方法:DefaultExecutionGraphBuilder.buildGraph
ExecutionGraph的生成是在DefaultExecutionGraphBuilder类的buildGraph方法中实现的:
public class DefaultExecutionGraphBuilder {
public static DefaultExecutionGraph buildGraph(
JobGraph jobGraph,
Configuration jobManagerConfig,
ScheduledExecutorService futureExecutor,
Executor ioExecutor,
ClassLoader classLoader,
CompletedCheckpointStore completedCheckpointStore,
CheckpointsCleaner checkpointsCleaner,
CheckpointIDCounter checkpointIdCounter,
Time rpcTimeout,
MetricGroup metrics,
BlobWriter blobWriter,
Logger log,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
TaskDeploymentDescriptorFactory.PartitionLocationConstraint partitionLocationConstraint,
ExecutionDeploymentListener executionDeploymentListener,
ExecutionStateUpdateListener executionStateUpdateListener,
long initializationTimestamp,
VertexAttemptNumberStore vertexAttemptNumberStore,
VertexParallelismStore vertexParallelismStore)
throws JobExecutionException, JobException {
checkNotNull(jobGraph, "job graph cannot be null");
final String jobName = jobGraph.getName();
final JobID jobId = jobGraph.getJobID();
// 创建JobInformation
final JobInformation jobInformation =
new JobInformation(
jobId,
jobName,
jobGraph.getSerializedExecutionConfig(),
jobGraph.getJobConfiguration(),
jobGraph.getUserJarBlobKeys(),
jobGraph.getClasspaths());
// Execution 保留的最大历史数
final int maxPriorAttemptsHistoryLength =
jobManagerConfig.getInteger(JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE);
// IntermediateResultPartitions的释放策略
final PartitionGroupReleaseStrategy.Factory partitionGroupReleaseStrategyFactory =
PartitionGroupReleaseStrategyFactoryLoader.loadPartitionGroupReleaseStrategyFactory(
jobManagerConfig);
// create a new execution graph, if none exists so far
final DefaultExecutionGraph executionGraph;
try {
// 创建默认的ExecutionGraph执行图对象,最后会返回该创建好的执行图对象
executionGraph =
new DefaultExecutionGraph(
jobInformation,
futureExecutor,
ioExecutor,
rpcTimeout,
maxPriorAttemptsHistoryLength,
classLoader,
blobWriter,
partitionGroupReleaseStrategyFactory,
shuffleMaster,
partitionTracker,
partitionLocationConstraint,
executionDeploymentListener,
executionStateUpdateListener,
initializationTimestamp,
vertexAttemptNumberStore,
vertexParallelismStore);
} catch (IOException e) {
throw new JobException("Could not create the ExecutionGraph.", e);
}
// set the basic properties
try {
executionGraph.setJsonPlan(JsonPlanGenerator.generatePlan(jobGraph));
} catch (Throwable t) {
log.warn("Cannot create JSON plan for job", t);
// give the graph an empty plan
executionGraph.setJsonPlan("{}");
}
// initialize the vertices that have a master initialization hook
// file output formats create directories here, input formats create splits
final long initMasterStart = System.nanoTime();
log.info("Running initialization on master for job {} ({}).", jobName, jobId);
for (JobVertex vertex : jobGraph.getVertices()) {
String executableClass = vertex.getInvokableClassName();
if (executableClass == null || executableClass.isEmpty()) {
throw new JobSubmissionException(
jobId,
"The vertex "
+ vertex.getID()
+ " ("
+ vertex.getName()
+ ") has no invokable class.");
}
try {
vertex.initializeOnMaster(classLoader);
} catch (Throwable t) {
throw new JobExecutionException(
jobId,
"Cannot initialize task '" + vertex.getName() + "': " + t.getMessage(),
t);
}
}
log.info(
"Successfully ran initialization on master in {} ms.",
(System.nanoTime() - initMasterStart) / 1_000_000);
// topologically sort the job vertices and attach the graph to the existing one
// 这里会先做一个排序,source源节点会放在最前面,接着开始遍历
// 必须保证当前添加到集合的节点的前置节点都已经添加进去了
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
if (log.isDebugEnabled()) {
log.debug(
"Adding {} vertices from job graph {} ({}).",
sortedTopology.size(),
jobName,
jobId);
}
// 构建执行图的重点方法。生成具体的ExecutionGraph
executionGraph.attachJobGraph(sortedTopology);
if (log.isDebugEnabled()) {
log.debug(
"Successfully created execution graph from job graph {} ({}).", jobName, jobId);
}
// configure the state checkpointing
// checkpoint的相关配置
if (isCheckpointingEnabled(jobGraph)) {
JobCheckpointingSettings snapshotSettings = jobGraph.getCheckpointingSettings();
// Maximum number of remembered checkpoints
int historySize = jobManagerConfig.getInteger(WebOptions.CHECKPOINTS_HISTORY_SIZE);
CheckpointStatsTracker checkpointStatsTracker =
new CheckpointStatsTracker(
historySize,
snapshotSettings.getCheckpointCoordinatorConfiguration(),
metrics);
// load the state backend from the application settings
final StateBackend applicationConfiguredBackend;
final SerializedValue<StateBackend> serializedAppConfigured =
snapshotSettings.getDefaultStateBackend();
if (serializedAppConfigured == null) {
applicationConfiguredBackend = null;
} else {
try {
applicationConfiguredBackend =
serializedAppConfigured.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId, "Could not deserialize application-defined state backend.", e);
}
}
// StateBackend配置
final StateBackend rootBackend;
try {
rootBackend =
StateBackendLoader.fromApplicationOrConfigOrDefault(
applicationConfiguredBackend,
snapshotSettings.isChangelogStateBackendEnabled(),
jobManagerConfig,
classLoader,
log);
} catch (IllegalConfigurationException | IOException | DynamicCodeLoadingException e) {
throw new JobExecutionException(
jobId, "Could not instantiate configured state backend", e);
}
// load the checkpoint storage from the application settings
final CheckpointStorage applicationConfiguredStorage;
final SerializedValue<CheckpointStorage> serializedAppConfiguredStorage =
snapshotSettings.getDefaultCheckpointStorage();
if (serializedAppConfiguredStorage == null) {
applicationConfiguredStorage = null;
} else {
try {
applicationConfiguredStorage =
serializedAppConfiguredStorage.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId,
"Could not deserialize application-defined checkpoint storage.",
e);
}
}
final CheckpointStorage rootStorage;
try {
rootStorage =
CheckpointStorageLoader.load(
applicationConfiguredStorage,
null,
rootBackend,
jobManagerConfig,
classLoader,
log);
} catch (IllegalConfigurationException | DynamicCodeLoadingException e) {
throw new JobExecutionException(
jobId, "Could not instantiate configured checkpoint storage", e);
}
// instantiate the user-defined checkpoint hooks
// 示例化用户自定义的cp hook
final SerializedValue<MasterTriggerRestoreHook.Factory[]> serializedHooks =
snapshotSettings.getMasterHooks();
final List<MasterTriggerRestoreHook<?>> hooks;
if (serializedHooks == null) {
hooks = Collections.emptyList();
} else {
final MasterTriggerRestoreHook.Factory[] hookFactories;
try {
hookFactories = serializedHooks.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId, "Could not instantiate user-defined checkpoint hooks", e);
}
final Thread thread = Thread.currentThread();
final ClassLoader originalClassLoader = thread.getContextClassLoader();
thread.setContextClassLoader(classLoader);
try {
hooks = new ArrayList<>(hookFactories.length);
for (MasterTriggerRestoreHook.Factory factory : hookFactories) {
hooks.add(MasterHooks.wrapHook(factory.create(), classLoader));
}
} finally {
thread.setContextClassLoader(originalClassLoader);
}
}
final CheckpointCoordinatorConfiguration chkConfig =
snapshotSettings.getCheckpointCoordinatorConfiguration();
// 创建CheckpointCoordinator对象
executionGraph.enableCheckpointing(
chkConfig,
hooks,
checkpointIdCounter,
completedCheckpointStore,
rootBackend,
rootStorage,
checkpointStatsTracker,
checkpointsCleaner);
}
// create all the metrics for the Execution Graph
// 添加metrics指标
metrics.gauge(RestartTimeGauge.METRIC_NAME, new RestartTimeGauge(executionGraph));
metrics.gauge(DownTimeGauge.METRIC_NAME, new DownTimeGauge(executionGraph));
metrics.gauge(UpTimeGauge.METRIC_NAME, new UpTimeGauge(executionGraph));
return executionGraph;
}
在这个方法里,会先创建一个 ExecutionGraph 对象,然后对 JobGraph 中的 JobVertex 列表做一下排序(先把有 source 节点的 JobVertex 放在最前面,然后开始遍历,只有当前 JobVertex 的前置节点都已经添加到集合后才华把当前 JobVertex 节点添加到集合中),最后通过过 attachJobGraph() 方法生成详细的ExecutionGraph。
在上面的代码中,最需要焦点关注的方法是:executionGraph.attachJobGraph(sortedTopology);。该方法是创建ExecutionGraph的焦点方法,包括了创建上面我们说的各种ExecutionGraph中涉及的对象,以及连接它们来形成ExecutionGraph拓扑结构。
接下来我们进入该方法来一探究竟。
生成ExecutionGraph:attachJobGraph
先来看下attachJobGraph方法的实现:
public void attachJobGraph(List<JobVertex> topologicallySorted) throws JobException {
assertRunningInJobMasterMainThread();
LOG.debug(
"Attaching {} topologically sorted vertices to existing job graph with {} "
+ "vertices and {} intermediate results.",
topologicallySorted.size(),
tasks.size(),
intermediateResults.size());
final long createTimestamp = System.currentTimeMillis();
// 遍历排序好的拓扑JobVertex
for (JobVertex jobVertex : topologicallySorted) {
if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
this.isStoppable = false;
}
// 获取节点并行度信息
VertexParallelismInformation parallelismInfo =
parallelismStore.getParallelismInfo(jobVertex.getID());
// create the execution job vertex and attach it to the graph
// 创建ExecutionJobVertex
ExecutionJobVertex ejv =
new ExecutionJobVertex(
this,
jobVertex,
maxPriorAttemptsHistoryLength,
rpcTimeout,
createTimestamp,
parallelismInfo,
initialAttemptCounts.getAttemptCounts(jobVertex.getID()));
// 重要方法!!!
// 构建ExecutionGraph,连接上下游节点
ejv.connectToPredecessors(this.intermediateResults);
ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
if (previousTask != null) {
throw new JobException(
String.format(
"Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
jobVertex.getID(), ejv, previousTask));
}
// 遍历ExecutionJobVertex的输出IntermediateResult
for (IntermediateResult res : ejv.getProducedDataSets()) {
IntermediateResult previousDataSet =
this.intermediateResults.putIfAbsent(res.getId(), res);
if (previousDataSet != null) {
throw new JobException(
String.format(
"Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
res.getId(), res, previousDataSet));
}
}
this.verticesInCreationOrder.add(ejv);
this.numVerticesTotal += ejv.getParallelism();
}
//将所有的执行顶点和结果分区注册到分布式资源管理系统中,以便能够进行分布式调度。
registerExecutionVerticesAndResultPartitions(this.verticesInCreationOrder);
// the topology assigning should happen before notifying new vertices to failoverStrategy
// 转换执行拓扑
executionTopology = DefaultExecutionTopology.fromExecutionGraph(this);
partitionGroupReleaseStrategy =
// 创建部分组释放策略的方法,依赖于当前的调度的拓扑结构,这决定了当何时释放特定的中间数据结果所需的策略。
partitionGroupReleaseStrategyFactory.createInstance(getSchedulingTopology());
}
在上面attchGraph方法中,起首遍历输入的排序后的JobVertex列表,对每一个JobVertex:
[*]判断是否停止: 对于单个 JobVertex,如果它是一个输入顶点且不可停止,则整个 Job 不可停止。这在流处理任务中是常见的,一些输入数据源大概无法停止(如Kafka)。
[*]获取并行信息并创建实行的顶点: 根据JobVertex的ID,从parallelismStore中获取并行信息。利用这些信息创建ExecutionJobVertex实例,它代表运行在特定TaskManager上的taskId,可以是待调度、运行或已完成的。
[*]判断新添加的顶点是否已经存在: 如果试图添加一个已经存在的顶点,这意味着存在程序错误,因为每个JobVertex应当有唯一的ID。这将抛出异常。
[*]判断数据集是否已经存在: 同样。如果试图添加一个已经存在的IntermediateResult,这将抛出异常。
[*]添加实行顶点到创建顺序列表和增加总的顶点数量: 记载创建顶点的顺序能够确保在实行时能够按照精确的依赖关系举行。并同时更新总的顶点数量。
遍历完成后, 注册实行顶点和结果分区,将所有的实行顶点和结果分区注册到分布式资源管理体系中,以便能够举行分布式调度。
利用DefaultExecutionTopology工具类将ExecutionGraph转换为SchedulingTopology,如许便于任务调度器举行处理。
最后,调用partitionGroupReleaseStrategyFactory.createInstance(getSchedulingTopology())根据当前的调度的拓扑结构来创建组开释策略,这决定了当何时开释特定的中心数据结果所需的策略。
上面流程中,最需要关注的方法就是new ExecutionJobVertex和ejv.connectToPredecessors(this.intermediateResults);
接下来,我们分别对其举行探究。
创建 ExecutionJobVertex 对象
进入到该方法的源码中:
@VisibleForTesting
public ExecutionJobVertex(
InternalExecutionGraphAccessor graph,
JobVertex jobVertex,
int maxPriorAttemptsHistoryLength,
Time timeout,
long createTimestamp,
VertexParallelismInformation parallelismInfo,
SubtaskAttemptNumberStore initialAttemptCounts)
throws JobException {
if (graph == null || jobVertex == null) {
throw new NullPointerException();
}
this.graph = graph;
this.jobVertex = jobVertex;
this.parallelismInfo = parallelismInfo;
// verify that our parallelism is not higher than the maximum parallelism
if (this.parallelismInfo.getParallelism() > this.parallelismInfo.getMaxParallelism()) {
throw new JobException(
String.format(
"Vertex %s's parallelism (%s) is higher than the max parallelism (%s). Please lower the parallelism or increase the max parallelism.",
jobVertex.getName(),
this.parallelismInfo.getParallelism(),
this.parallelismInfo.getMaxParallelism()));
}
this.resourceProfile =
ResourceProfile.fromResourceSpec(jobVertex.getMinResources(), MemorySize.ZERO);
this.taskVertices = new ExecutionVertex;
this.inputs = new ArrayList<>(jobVertex.getInputs().size());
// take the sharing group
this.slotSharingGroup = checkNotNull(jobVertex.getSlotSharingGroup());
this.coLocationGroup = jobVertex.getCoLocationGroup();
// create the intermediate results
this.producedDataSets =
new IntermediateResult;
for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);
this.producedDataSets =
new IntermediateResult(
result.getId(),
this,
this.parallelismInfo.getParallelism(),
result.getResultType());
}
// create all task vertices
for (int i = 0; i < this.parallelismInfo.getParallelism(); i++) {
ExecutionVertex vertex =
new ExecutionVertex(
this,
i,
producedDataSets,
timeout,
createTimestamp,
maxPriorAttemptsHistoryLength,
initialAttemptCounts.getAttemptCount(i));
this.taskVertices = vertex;
}
// sanity check for the double referencing between intermediate result partitions and
// execution vertices
for (IntermediateResult ir : this.producedDataSets) {
if (ir.getNumberOfAssignedPartitions() != this.parallelismInfo.getParallelism()) {
throw new RuntimeException(
"The intermediate result's partitions were not correctly assigned.");
}
}
final List<SerializedValue<OperatorCoordinator.Provider>> coordinatorProviders =
getJobVertex().getOperatorCoordinators();
if (coordinatorProviders.isEmpty()) {
this.operatorCoordinators = Collections.emptyList();
} else {
final ArrayList<OperatorCoordinatorHolder> coordinators =
new ArrayList<>(coordinatorProviders.size());
try {
for (final SerializedValue<OperatorCoordinator.Provider> provider :
coordinatorProviders) {
coordinators.add(
OperatorCoordinatorHolder.create(
provider, this, graph.getUserClassLoader()));
}
} catch (Exception | LinkageError e) {
IOUtils.closeAllQuietly(coordinators);
throw new JobException(
"Cannot instantiate the coordinator for operator " + getName(), e);
}
this.operatorCoordinators = Collections.unmodifiableList(coordinators);
}
// set up the input splits, if the vertex has any
try {
@SuppressWarnings("unchecked")
InputSplitSource<InputSplit> splitSource =
(InputSplitSource<InputSplit>) jobVertex.getInputSplitSource();
if (splitSource != null) {
Thread currentThread = Thread.currentThread();
ClassLoader oldContextClassLoader = currentThread.getContextClassLoader();
currentThread.setContextClassLoader(graph.getUserClassLoader());
try {
inputSplits =
splitSource.createInputSplits(this.parallelismInfo.getParallelism());
if (inputSplits != null) {
splitAssigner = splitSource.getInputSplitAssigner(inputSplits);
}
} finally {
currentThread.setContextClassLoader(oldContextClassLoader);
}
} else {
inputSplits = null;
}
} catch (Throwable t) {
throw new JobException(
"Creating the input splits caused an error: " + t.getMessage(), t);
}
}
在上面这段代码中,主要实现了ExecutionVertex的创建和IntermediateResult对象的创建:
[*]遍历当前JobVertex的输出IntermediateDataSet列表,并根据IntermediateDataSet来创建相应的IntermediateResult对象。每个IntermediateDataSet都会对应一个IntermediateResult。
[*]根据当前JobVertex的并发度,来创建雷同数量的ExecutionVertex对象,每个ExecutionVertex对象代表一个并行计算任务,在实际实行时就是一个Task任务。
创建ExecutionVertex对象
进一步地,我们观察创建ExecutionVertex对象的实现逻辑如下所示:
public ExecutionVertex(
ExecutionJobVertex jobVertex,
int subTaskIndex,
IntermediateResult[] producedDataSets,
Time timeout,
long createTimestamp,
int maxPriorExecutionHistoryLength,
int initialAttemptCount) {
this.jobVertex = jobVertex;
this.subTaskIndex = subTaskIndex;
this.executionVertexId = new ExecutionVertexID(jobVertex.getJobVertexId(), subTaskIndex);
this.taskNameWithSubtask =
String.format(
"%s (%d/%d)",
jobVertex.getJobVertex().getName(),
subTaskIndex + 1,
jobVertex.getParallelism());
this.resultPartitions = new LinkedHashMap<>(producedDataSets.length, 1);
// 根据IntermediateResult创建当前subTaskIndex分区下的IntermediateResultPartiton
for (IntermediateResult result : producedDataSets) {
IntermediateResultPartition irp =
new IntermediateResultPartition(
result,
this,
subTaskIndex,
getExecutionGraphAccessor().getEdgeManager());
// 记录当前分区的irp到ir中
result.setPartition(subTaskIndex, irp);
// 记录分区ip与irp的对应关系
resultPartitions.put(irp.getPartitionId(), irp);
}
this.priorExecutions = new EvictingBoundedList<>(maxPriorExecutionHistoryLength);
// 创建对应的Execution对象,初始化时initialAttempCount为0,如果后面重新调度这个task,它会自增加1
this.currentExecution =
new Execution(
getExecutionGraphAccessor().getFutureExecutor(),
this,
initialAttemptCount,
createTimestamp,
timeout);
getExecutionGraphAccessor().registerExecution(currentExecution);
this.timeout = timeout;
this.inputSplits = new ArrayList<>();
}
上述创建ExecutionVertex的过程主要实现了以下步骤:
[*]生成中心结果分区IntermediateResultPartition
中心结果分区代表一个并行任务产生的输出,同一并行任务大概会有多个输出(对应多个后续任务),也就是多个中心结果分区。
[*]基于 result,在相应的索引 subTaskIndex 上创建一个 IntermediateResultPartition 并给它赋值。IntermediateResultPartition 提供了并行任务的输出数据,对应于某个特定实行顶点 ExecutionVertex 的并行子任务。
[*]在创建过程中,需要使用 getExecutionGraphAccessor().getEdgeManager() 获取边管理器,边管理器是用于维护这个分区与其它 ExecutionVertex 之间的连接关系。
[*]记载这个 IntermediateResultPartition 到 result 中的相应索引位置,并在 resultPartitions 映射表中保存 IntermediateResultPartition。
[*]创建实行(Execution)对象:
这一过程是基于 Execution 的构造函数引发的。它用于代表该 ExecutionVertex 在某一特定点时间的一次尝试实行。创建 Execution 实例后,会将其注册到实行图(ExecutionGraph)中,以便于后续调度和实行任务。
通过以上流程,生成了中心结果分区,映射了每一个分区和其对应的任务关系,并且创建了 Execution 对象用于管理并跟踪任务的实行状态。
在创建好ExecutionVertex和IntermediateResultPartition后,根据上面的流程图,就是考虑如何将它们举行连接生成ExecutionGraph了。
这部分的实现逻辑就在attachJobGraph方法的ejv.connectToPredecessors(this.intermediateResults);方法中实现的。
生成ExecutionGraph
同样地,我们进入源码来深入观察一下实现逻辑:
public void connectToPredecessors(
Map<IntermediateDataSetID, IntermediateResult> intermediateDataSets)
throws JobException {
List<JobEdge> inputs = jobVertex.getInputs();
if (LOG.isDebugEnabled()) {
LOG.debug(
String.format(
"Connecting ExecutionJobVertex %s (%s) to %d predecessors.",
jobVertex.getID(), jobVertex.getName(), inputs.size()));
}
for (int num = 0; num < inputs.size(); num++) {
JobEdge edge = inputs.get(num);
if (LOG.isDebugEnabled()) {
if (edge.getSource() == null) {
LOG.debug(
String.format(
"Connecting input %d of vertex %s (%s) to intermediate result referenced via ID %s.",
num,
jobVertex.getID(),
jobVertex.getName(),
edge.getSourceId()));
} else {
LOG.debug(
String.format(
"Connecting input %d of vertex %s (%s) to intermediate result referenced via predecessor %s (%s).",
num,
jobVertex.getID(),
jobVertex.getName(),
edge.getSource().getProducer().getID(),
edge.getSource().getProducer().getName()));
}
}
// fetch the intermediate result via ID. if it does not exist, then it either has not
// been created, or the order
// in which this method is called for the job vertices is not a topological order
IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
if (ires == null) {
throw new JobException(
"Cannot connect this job graph to the previous graph. No previous intermediate result found for ID "
+ edge.getSourceId());
}
this.inputs.add(ires);
EdgeManagerBuildUtil.connectVertexToResult(this, ires, edge.getDistributionPattern());
}
}
这段代码主要完成了将当前的ExecutionJobVertex与其前置任务(predecessors)连接的流程。传入的参数intermediateDatasets包罗了JobGraph中所有的中心计算结果,这些结果是由上游前置任务产生的。
需要注意的是,该过程要求连接操作的实行顺序应遵循任务的拓扑顺序。Flink的计算任务通常由多个阶段构成,每个阶段的输出是下一个阶段的输入,每个阶段(JobVertex)都处理一种类型的计算,例如map或reduce。
流程大致如下:
[*]获取输入: 起首获取jobVertex的输入,输入是JobEdge列表,每一条JobEdge都代表一个上游产生的中心数据集和连接上下游的方式(例如HASH, BROADCAST)。
[*]循环处理每个输入: 然后遍历这些inputs,对于每一条JobEdge:
[*]基于edge.getSourceId()从intermediateDatasets获取IntermediateResult,这是一个中心计算结果。
[*]查抄该中心结果是否存在,如果不存在,则表示这不是一个拓扑排序,因为预期的情况是当你尝试访问一个中心结果时,它应该已经被创建了。如果找不到,抛出一个异常。
[*]如果存在(没有异常被抛出),将找到的IntermediateResult添加到ExecutionJobVertex的inputs(List类型)中,如许当前任务就知道它的输入来自哪些中心结果。
[*]调用EdgeManagerBuildUtil.connectVertexToResult方法来建立当前ExecutionJobVertex与找到的IntermediateResult之间的连接。 EdgeManager是Flink中负责管理输入输出边的组件,它表现地记载了发送端的分区和吸收端的分区对应关系。
这个流程紧张的是建立了Job中每个任务的实行依赖关系,并明确了数据传输的方式,让任务在实行时清楚本身的输入来自哪里,当任务实行完成后,它产生的输出会通过何种方式被发送到哪些任务。
详细的连接方式,我们需要继承进入到EdgeManagerBuildUtil.connectVertexToResult方法中。其源码如下所示:
/**
* Calculate the connections between {@link ExecutionJobVertex} and {@link IntermediateResult} *
* based on the {@link DistributionPattern}.
*
* @param vertex the downstream consumer {@link ExecutionJobVertex}
* @param intermediateResult the upstream consumed {@link IntermediateResult}
* @param distributionPattern the {@link DistributionPattern} of the edge that connects the
* upstream {@link IntermediateResult} and the downstream {@link IntermediateResult}
*/
static void connectVertexToResult(
ExecutionJobVertex vertex,
IntermediateResult intermediateResult,
DistributionPattern distributionPattern) {
switch (distributionPattern) {
// 点对点的连接方式
case POINTWISE:
connectPointwise(vertex.getTaskVertices(), intermediateResult);
break;
// 全连接的方式
case ALL_TO_ALL:
connectAllToAll(vertex.getTaskVertices(), intermediateResult);
break;
default:
throw new IllegalArgumentException("Unrecognized distribution pattern.");
}
}
会根据DistributionPattern选择不同的连接方式,这里主要分两种情况,DistributionPattern是跟Partitioner的配置有关。
这里以POINTWISE的连接方式来举例,看一下其是如安在构造ExecutionGraph时连接上下游节点的。
private static void connectPointwise(
ExecutionVertex[] taskVertices, IntermediateResult intermediateResult) {
final int sourceCount = intermediateResult.getPartitions().length;
final int targetCount = taskVertices.length;
if (sourceCount == targetCount) {
for (int i = 0; i < sourceCount; i++) {
ExecutionVertex executionVertex = taskVertices;
IntermediateResultPartition partition = intermediateResult.getPartitions();
ConsumerVertexGroup consumerVertexGroup =
ConsumerVertexGroup.fromSingleVertex(executionVertex.getID());
partition.addConsumers(consumerVertexGroup);
ConsumedPartitionGroup consumedPartitionGroup =
createAndRegisterConsumedPartitionGroupToEdgeManager(
partition.getPartitionId(), intermediateResult);
executionVertex.addConsumedPartitionGroup(consumedPartitionGroup);
}
} else if (sourceCount > targetCount) {
for (int index = 0; index < targetCount; index++) {
ExecutionVertex executionVertex = taskVertices;
ConsumerVertexGroup consumerVertexGroup =
ConsumerVertexGroup.fromSingleVertex(executionVertex.getID());
int start = index * sourceCount / targetCount;
int end = (index + 1) * sourceCount / targetCount;
List<IntermediateResultPartitionID> consumedPartitions =
new ArrayList<>(end - start);
for (int i = start; i < end; i++) {
IntermediateResultPartition partition = intermediateResult.getPartitions();
partition.addConsumers(consumerVertexGroup);
consumedPartitions.add(partition.getPartitionId());
}
ConsumedPartitionGroup consumedPartitionGroup =
createAndRegisterConsumedPartitionGroupToEdgeManager(
consumedPartitions, intermediateResult);
executionVertex.addConsumedPartitionGroup(consumedPartitionGroup);
}
} else {
for (int partitionNum = 0; partitionNum < sourceCount; partitionNum++) {
IntermediateResultPartition partition =
intermediateResult.getPartitions();
ConsumedPartitionGroup consumedPartitionGroup =
createAndRegisterConsumedPartitionGroupToEdgeManager(
partition.getPartitionId(), intermediateResult);
int start = (partitionNum * targetCount + sourceCount - 1) / sourceCount;
int end = ((partitionNum + 1) * targetCount + sourceCount - 1) / sourceCount;
List<ExecutionVertexID> consumers = new ArrayList<>(end - start);
for (int i = start; i < end; i++) {
ExecutionVertex executionVertex = taskVertices;
executionVertex.addConsumedPartitionGroup(consumedPartitionGroup);
consumers.add(executionVertex.getID());
}
ConsumerVertexGroup consumerVertexGroup =
ConsumerVertexGroup.fromMultipleVertices(consumers);
partition.addConsumers(consumerVertexGroup);
}
}
}
上面这段代码的目标是通过“点对点”的方式(即每个生产者产生的数据只被一个消费者消费)建立任务节点(ExecutionVertex)与中心结果集(IntermediateResultPartition)之间的连接关系。
这个方法的逻辑主要是根据上游任务产生的IntermediateResultPartition的数量(源)和下游ExecutionVertex节点数量(目标)的比例关系,做不同的操作:
[*]源和目标数量相等:方法会将每个源中心结果分区与对应的下游ExecutionVertex节点连接。这种情况下,每个任务都完全独立,只会消费一个特定的上游中心结果分区
[*]源数量大于目标数量:源中心结果分区会被尽大概平均地分配给下游ExecutionVertex节点,即每个ExecutionVertex大概会消费多个源中心结果分区数据。
[*]源数量小于目标数量:每个源中心结果分区大概会被分配给多个下游ExecutionVertex节点消费,即多个ExecutionVertex节点大概消费同一个源中心结果分区数据。
⠀在实行连接的过程中,会创建ConsumerVertexGroup和ConsumedPartitionGroup:
[*]ConsumerVertexGroup包罗一组吸收同一个中心结果分区(IntermediateResultPartition)的顶点集合。
[*]ConsumedPartitionGroup包罗顶点要消费的一组中心结果分区。
⠀注意,当源数量小于目标数量时,会有多个任务消费同一个源数据,以是需要使用ConsumerVertexGroup.fromMultipleVertices(consumers)来创建ConsumerVertexGroup。
几种连接情况的示例图如下所示:
https://i-blog.csdnimg.cn/direct/ecfdb4c7a20c44c6b049a7ed2cbca56e.png
到这里,这个作业的 ExecutionGraph 就创建完成了,有了 ExecutionGraph,JobManager 才华对这个作业做相应的调度。
总结
本文详细介绍了JobGraph生成ExecutionGraph的流程,介绍了ExecutionJobVertex、ExecutionVertex、IntermediateResult、IntermediateResultPartition相干概念的原理和生成过程。最后我们介绍了Flink在生成ExecutionGraph时是如何实现IntermediateResultPartition和ExecutionVertex的连接的。
到这里,StreamGraph、JobGraph和Execution的生成过程,在近来的三篇文章中都已经详细解说完成了,当然除了我们介绍的内容外,另有更多的实现细节没有介绍,有兴趣的读者可以参考文本来阅读源码,以此加深本身的明白和对更多实现细节的挖掘。
最后,再对StreamGraph、JobGraph和ExecutionGraph做一个总结:
[*]StreamGraph. StreamGraph 是表示 Flink 流计算的图模子,它是用户界说的计算逻辑的内部表示形式,是最原始的用户逻辑,一个没有做任何优化的DataFlow;
[*]JobGraph. JobGraph 由一个或多个 JobVertex 对象和它们之间的 JobEdge 对象构成,包罗并行任务的信息。在JobGraph中对StreamGraph举行了优化,将能够归并在同个算子链中的操作符举行归并,以此减少任务实行时的上下文切换,提任务实行性能。
[*]ExecutionGraph. ExecutionGraph是 JobGraph 的并发实行版本,由 ExecutionVertex 和 IntermediateResultPartition 构成。每个 JobVertex 会被转换为一个或多个 ExecutionVertex,ExecutorGraph 包罗了每个任务的全部实例,包罗它们的状态、位置、输入输出结果。ExecutionGraph 是 Flink 中最焦点的部分,它用于任务的调度、失败恢复等。
参考:
https://matt33.com/2019/12/20/flink-execution-graph-4/
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。
页:
[1]