ToB Enterprise Services App Market: a ToB review and business-networking industry platform

Title: Submitting Flink jobs to a YARN cluster remotely from Java

Author: 缠丝猫    Date: 2024-7-27 00:38
Submitting Flink jobs to a YARN cluster remotely from Java

Background

For business reasons, submitting Flink jobs via the command line is cumbersome: you either have to deploy the backend service onto the big-data cluster or set up a dedicated submission machine, and neither option feels ideal. After some research, I found that jobs can be submitted remotely, so this post records the implementation. It uses the Application mode of Flink on YARN.
Environment preparation


Getting started

1. Upload the Flink jars to HDFS

Download the version you need from the Flink website (I use flink-1.18.1 here) and upload the jars under Flink's lib directory to HDFS.

Note that flink-yarn-1.18.1.jar is not shipped in the lib directory; you have to download it from the Maven repository yourself.
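Assuming the HDFS paths used later in this post, the upload might look like the following sketch (adjust the NameNode address and paths to your cluster):

```shell
# Create the target directory and upload the Flink lib jars (paths are assumptions).
hdfs dfs -mkdir -p hdfs://node1.itcast.cn/flink/lib
hdfs dfs -put -f $FLINK_HOME/lib/*.jar hdfs://node1.itcast.cn/flink/lib/
# flink-yarn-1.18.1.jar was downloaded separately from the Maven repository.
hdfs dfs -put -f flink-yarn-1.18.1.jar hdfs://node1.itcast.cn/flink/lib/
```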
2. Write some Flink code

Any Flink job will do; the goal is just to have something to test with.
package com.azt;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Random;
import java.util.concurrent.TimeUnit;

// Emits one random word per second and prints it.
public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> source = env.addSource(new SourceFunction<String>() {
            @Override
            public void run(SourceContext<String> ctx) throws Exception {
                String[] words = {"spark", "flink", "hadoop", "hdfs", "yarn"};
                Random random = new Random();
                while (true) {
                    ctx.collect(words[random.nextInt(words.length)]);
                    TimeUnit.SECONDS.sleep(1);
                }
            }

            @Override
            public void cancel() {
            }
        });
        source.print();
        env.execute();
    }
}
3. Package the code from step 2 and upload it to HDFS
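For a Maven project, packaging and uploading could look like this sketch (the jar name and HDFS path are the ones assumed by the submitter in step 5):

```shell
# Build the job jar, then push it to the user-lib directory on HDFS.
mvn -DskipTests clean package
hdfs dfs -mkdir -p hdfs://node1.itcast.cn/flink/user-lib
hdfs dfs -put -f target/original.jar hdfs://node1.itcast.cn/flink/user-lib/original.jar
```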


4. Copy the configuration files


The original post illustrated this step with screenshots. In short, the submitting machine needs the Flink conf directory (flink-conf.yaml, log4j.properties) that configurationDirectory points to in step 5, and the Hadoop client configuration (core-site.xml, hdfs-site.xml, yarn-site.xml) on its classpath so that YarnConfiguration can locate the ResourceManager and HDFS.

5. Write the Java program that submits the job remotely

One thing to note here: if, like me, you are on a Windows machine, submitting locally from IDEA will fail with an error; on Mac or Linux you can submit the job directly from IDEA.
package com.test;

import org.apache.flink.client.deployment.ClusterDeploymentException;
import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.deployment.application.ApplicationConfiguration;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.client.program.ClusterClientProvider;
import org.apache.flink.configuration.*;
import org.apache.flink.runtime.client.JobStatusMessage;
import org.apache.flink.yarn.YarnClientYarnClusterInformationRetriever;
import org.apache.flink.yarn.YarnClusterDescriptor;
import org.apache.flink.yarn.YarnClusterInformationRetriever;
import org.apache.flink.yarn.configuration.YarnConfigOptions;
import org.apache.flink.yarn.configuration.YarnDeploymentTarget;
import org.apache.flink.yarn.configuration.YarnLogConfigUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import static org.apache.flink.configuration.MemorySize.MemoryUnit.MEGA_BYTES;

/**
 * @date 2021/5/12 7:16 PM
 */
public class Main {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "root");
//        String configurationDirectory = "C:\\project\\test_flink_mode\\src\\main\\resources\\conf";
        String configurationDirectory = "/export/server/flink-1.18.1/conf";

        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        String flinkLibs = "hdfs://node1.itcast.cn/flink/lib";
        String userJarPath = "hdfs://node1.itcast.cn/flink/user-lib/original.jar";
        String flinkDistJar = "hdfs://node1.itcast.cn/flink/lib/flink-yarn-1.18.1.jar";

        YarnClient yarnClient = YarnClient.createYarnClient();
        YarnConfiguration yarnConfiguration = new YarnConfiguration();
        yarnClient.init(yarnConfiguration);
        yarnClient.start();
        YarnClusterInformationRetriever clusterInformationRetriever =
                YarnClientYarnClusterInformationRetriever.create(yarnClient);

        // load the Flink configuration
        Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
        flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
        flinkConfiguration.set(PipelineOptions.JARS, Collections.singletonList(userJarPath));
        YarnLogConfigUtil.setLogConfigFileInConfig(flinkConfiguration, configurationDirectory);

        Path remoteLib = new Path(flinkLibs);
        flinkConfiguration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(remoteLib.toString()));
        flinkConfiguration.set(YarnConfigOptions.FLINK_DIST_JAR, flinkDistJar);
        // use Application deployment mode
        flinkConfiguration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
        // YARN application name
        flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "jobname");
        // resource settings; many more options can be set here
        flinkConfiguration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
        flinkConfiguration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
        flinkConfiguration.set(TaskManagerOptions.NUM_TASK_SLOTS, 4);
        flinkConfiguration.setInteger("parallelism.default", 4);

        ClusterSpecification clusterSpecification =
                new ClusterSpecification.ClusterSpecificationBuilder().createClusterSpecification();
        // program arguments and main class of the user jar
        ApplicationConfiguration appConfig = new ApplicationConfiguration(args, "com.azt.WordCount");

        YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(
                flinkConfiguration,
                yarnConfiguration,
                yarnClient,
                clusterInformationRetriever,
                true);

        ClusterClientProvider<ApplicationId> clusterClientProvider;
        try {
            clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, appConfig);
        } catch (ClusterDeploymentException e) {
            // fail fast instead of hitting an NPE on the provider below
            throw new RuntimeException("Failed to deploy the application cluster", e);
        }

        ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
        System.out.println(clusterClient.getWebInterfaceURL());
        ApplicationId applicationId = clusterClient.getClusterId();
        System.out.println(applicationId);

        // poll until the job shows up, for at most 30 seconds
        Collection<JobStatusMessage> jobStatusMessages = clusterClient.listJobs().get();
        int counts = 30;
        while (jobStatusMessages.isEmpty() && counts > 0) {
            Thread.sleep(1000);
            counts--;
            jobStatusMessages = clusterClient.listJobs().get();
        }
        if (!jobStatusMessages.isEmpty()) {
            List<String> jids = new ArrayList<>();
            for (JobStatusMessage jobStatusMessage : jobStatusMessages) {
                jids.add(jobStatusMessage.getJobId().toHexString());
            }
            System.out.println(String.join(",", jids));
        }
    }
}
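As a follow-up, the same ClusterClient can also stop the job later. A minimal sketch, assuming the client and a JobID obtained from listJobs() as in the program above; the helper class and method names here are hypothetical, while cancel() and close() are part of Flink's ClusterClient API:

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.client.program.ClusterClient;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical helper: cancel a job by id, then release the client.
public class CancelJob {
    static void cancelAndClose(ClusterClient<ApplicationId> client, JobID jobId) throws Exception {
        client.cancel(jobId).get(); // waits for the JobManager to acknowledge the cancel
        client.close();
    }
}
```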
Since I am on a Windows machine, I packaged the program and ran it on a server instead.
Run:
   java -cp test_flink_mode-1.0-SNAPSHOT.jar com.test.Main
If all goes well, it prints logs like the following:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
http://node2:33811
application_1715418089838_0017
6d4d6ed5277a62fc9a3a274c4f34a468
Copy the printed URL to open the Flink web UI; the job also shows up in YARN's web UI.
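The same information is available from the standard YARN CLI as well:

```shell
# List running applications; the Flink job appears under the configured name.
yarn application -list -appStates RUNNING
# Stop it when finished (example id taken from the log above):
# yarn application -kill application_1715418089838_0017
```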




