Submitting Flink Jobs to a YARN Cluster Remotely with Java
Background
For business reasons, submitting Flink jobs from the command line is cumbersome: you either deploy the backend service onto the big-data cluster or set up a dedicated submission host, and neither option feels ideal. After some research, I found that jobs can be submitted remotely, so this post records the implementation. It uses Flink on YARN in Application mode.
Prerequisites
- A big-data cluster; having Hadoop is enough
- A backend server; Linux or macOS works, Windows does not
Getting started
1. Upload the Flink jars to HDFS
Download the version you need from the Flink website; I used flink-1.18.1. Upload the jars under Flink's lib directory to HDFS.
Note that flink-yarn-1.18.1.jar is one you have to download from the Maven repository yourself.
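If you would rather script this step than use the HDFS shell, here is a minimal sketch using the Hadoop FileSystem API. The namenode URI, HDFS user, and local Flink path are assumptions (they match the paths used later in this post), and the same approach works for the user jar in step 3.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.File;
import java.net.URI;

public class UploadLibs {
    public static void main(String[] args) throws Exception {
        // Assumed namenode URI and HDFS user -- replace with your cluster's values.
        FileSystem fs = FileSystem.get(URI.create("hdfs://node1.itcast.cn"), new Configuration(), "root");
        Path target = new Path("/flink/lib");
        fs.mkdirs(target);
        // Assumed local Flink installation path; upload every jar under lib/.
        File[] jars = new File("/export/server/flink-1.18.1/lib").listFiles((dir, name) -> name.endsWith(".jar"));
        for (File jar : jars) {
            fs.copyFromLocalFile(new Path(jar.getAbsolutePath()), target);
        }
        fs.close();
    }
}
```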
2. Write some Flink code
Any Flink job will do, since the goal is just to have something to test with:
```java
package com.azt;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // An unbounded source that emits a random word every second.
        DataStreamSource<String> source = env.addSource(new SourceFunction<String>() {
            @Override
            public void run(SourceContext<String> ctx) throws Exception {
                String[] words = {"spark", "flink", "hadoop", "hdfs", "yarn"};
                Random random = new Random();
                while (true) {
                    ctx.collect(words[random.nextInt(words.length)]);
                    TimeUnit.SECONDS.sleep(1);
                }
            }

            @Override
            public void cancel() {
            }
        });
        source.print();
        env.execute();
    }
}
```
3. Package the code from step 2 and upload it to HDFS (for example with mvn package; the upload works the same way as the lib jars in step 1)
4. Copy the configuration files
- Copy all files under Flink's conf directory into the Java project's resources
- Copy the Hadoop configuration files into the Java project's resources
(The original post shows screenshots of the resulting layout here.) Typically this means flink-conf.yaml and the log4j*.properties files on the Flink side, plus core-site.xml, hdfs-site.xml, and yarn-site.xml on the Hadoop side.
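A quick way to confirm the Hadoop files are actually picked up from the classpath is a tiny check like the sketch below; the property keys are standard Hadoop/YARN ones, but the expected values depend on your cluster.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfCheck {
    public static void main(String[] args) {
        // YarnConfiguration loads core-site.xml and yarn-site.xml from the classpath.
        YarnConfiguration conf = new YarnConfiguration();
        // Should print your ResourceManager address, not the 0.0.0.0 default.
        System.out.println(conf.get(YarnConfiguration.RM_ADDRESS));
        // Should print your namenode URI from core-site.xml.
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```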
5. Write the Java program that submits the job remotely
One thing to note here: if you are on a Windows machine like me, submitting locally from IDEA will fail; on macOS or Linux you can submit straight from IDEA. The program needs the flink-clients, flink-yarn, and Hadoop client dependencies on its classpath.
```java
package com.test;

import org.apache.flink.client.deployment.ClusterDeploymentException;
import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.deployment.application.ApplicationConfiguration;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.client.program.ClusterClientProvider;
import org.apache.flink.configuration.*;
import org.apache.flink.runtime.client.JobStatusMessage;
import org.apache.flink.yarn.YarnClientYarnClusterInformationRetriever;
import org.apache.flink.yarn.YarnClusterDescriptor;
import org.apache.flink.yarn.YarnClusterInformationRetriever;
import org.apache.flink.yarn.configuration.YarnConfigOptions;
import org.apache.flink.yarn.configuration.YarnDeploymentTarget;
import org.apache.flink.yarn.configuration.YarnLogConfigUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import static org.apache.flink.configuration.MemorySize.MemoryUnit.MEGA_BYTES;

/**
 * @date 2021/5/12 7:16 PM
 */
public class Main {
    public static void main(String[] args) throws Exception {
        // Submit to YARN as the root user.
        System.setProperty("HADOOP_USER_NAME", "root");
        // On Windows, point this at the conf directory inside the project instead:
        // String configurationDirectory = "C:\\project\\test_flink_mode\\src\\main\\resources\\conf";
        String configurationDirectory = "/export/server/flink-1.18.1/conf";

        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        // HDFS locations prepared in steps 1 and 3.
        String flinkLibs = "hdfs://node1.itcast.cn/flink/lib";
        String userJarPath = "hdfs://node1.itcast.cn/flink/user-lib/original.jar";
        String flinkDistJar = "hdfs://node1.itcast.cn/flink/lib/flink-yarn-1.18.1.jar";

        // The YarnClient picks up core-site.xml / yarn-site.xml from the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        YarnConfiguration yarnConfiguration = new YarnConfiguration();
        yarnClient.init(yarnConfiguration);
        yarnClient.start();
        YarnClusterInformationRetriever clusterInformationRetriever =
                YarnClientYarnClusterInformationRetriever.create(yarnClient);

        // Load the Flink configuration from the conf directory.
        Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
        flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
        // The user jar to run.
        flinkConfiguration.set(PipelineOptions.JARS, Collections.singletonList(userJarPath));
        YarnLogConfigUtil.setLogConfigFileInConfig(flinkConfiguration, configurationDirectory);
        // The Flink runtime jars are already on HDFS, so nothing is shipped from the client.
        Path remoteLib = new Path(flinkLibs);
        flinkConfiguration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(remoteLib.toString()));
        flinkConfiguration.set(YarnConfigOptions.FLINK_DIST_JAR, flinkDistJar);
        // Deploy in Application mode.
        flinkConfiguration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
        // YARN application name.
        flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "jobname");
        // Resource settings; many more options can be set the same way.
        flinkConfiguration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
        flinkConfiguration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
        flinkConfiguration.set(TaskManagerOptions.NUM_TASK_SLOTS, 4);
        flinkConfiguration.setInteger("parallelism.default", 4);

        ClusterSpecification clusterSpecification =
                new ClusterSpecification.ClusterSpecificationBuilder().createClusterSpecification();
        // Program arguments and the main class inside the user jar.
        ApplicationConfiguration appConfig = new ApplicationConfiguration(args, "com.azt.WordCount");

        YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(
                flinkConfiguration,
                yarnConfiguration,
                yarnClient,
                clusterInformationRetriever,
                true);
        ClusterClientProvider<ApplicationId> clusterClientProvider;
        try {
            clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(
                    clusterSpecification,
                    appConfig);
        } catch (ClusterDeploymentException e) {
            // Fail fast instead of continuing with a null provider.
            throw new RuntimeException("Failed to deploy application cluster", e);
        }

        ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
        // Web UI of the newly started JobManager, and the YARN application id.
        System.out.println(clusterClient.getWebInterfaceURL());
        ApplicationId applicationId = clusterClient.getClusterId();
        System.out.println(applicationId);

        // Poll for up to 30 seconds until the job shows up, then print its id.
        Collection<JobStatusMessage> jobStatusMessages = clusterClient.listJobs().get();
        int counts = 30;
        while (jobStatusMessages.isEmpty() && counts > 0) {
            Thread.sleep(1000);
            counts--;
            jobStatusMessages = clusterClient.listJobs().get();
        }
        if (!jobStatusMessages.isEmpty()) {
            List<String> jids = new ArrayList<>();
            for (JobStatusMessage jobStatusMessage : jobStatusMessages) {
                jids.add(jobStatusMessage.getJobId().toHexString());
            }
            System.out.println(String.join(",", jids));
        }
    }
}
```
Since my machine runs Windows, I packaged the program and ran it on the server instead.
Run it with:

```
java -cp test_flink_mode-1.0-SNAPSHOT.jar com.test.Main
```

If all goes well, it prints logs like the following:
```
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
http://node2:33811
application_1715418089838_0017
6d4d6ed5277a62fc9a3a274c4f34a468
```
Copy the printed URL and you can open the Flink web UI; the job also shows up in YARN's web interface.
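As a side note, the same ClusterClient can also manage the job after submission. Here is a minimal sketch of cancelling whatever is running, assuming clusterClient is the one obtained in Main above:

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.runtime.client.JobStatusMessage;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class CancelJobs {
    // clusterClient is assumed to be the one obtained in Main above.
    static void cancelAll(ClusterClient<ApplicationId> clusterClient) throws Exception {
        for (JobStatusMessage job : clusterClient.listJobs().get()) {
            JobID jobId = job.getJobId();
            // cancel() is asynchronous; get() blocks until the cancellation is acknowledged.
            clusterClient.cancel(jobId).get();
            System.out.println("Cancelled job " + jobId.toHexString());
        }
    }
}
```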