ToB Enterprise Services App Market: ToB Review and Business Social Networking Industry Platform

Title: HiveStrategies in Spark

Author: 小秦哥    Posted: 2024-6-19 21:11
Code related to HiveTableRelation

HiveStrategies.scala
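When relation.tableMeta.stats.isEmpty is true, that is, when the catalog holds no statistics for a Hive table, the DetermineTableStats rule calls hiveTableWithStats. That method attaches an estimated size to the relation. For an unpartitioned table with conf.fallBackToHdfsForStatsEnabled turned on, it reads the total length of the table directory from the file system. Otherwise, or if that read throws an IOException, it uses conf.defaultSizeInBytes.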
  import java.io.IOException

  import org.apache.hadoop.fs.{FileSystem, Path}

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
  import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, LogicalPlan, Statistics}
  import org.apache.spark.sql.catalyst.rules.Rule
  import org.apache.spark.sql.execution.command.DDLUtils

  class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
    private def hiveTableWithStats(relation: HiveTableRelation): HiveTableRelation = {
      val table = relation.tableMeta
      val partitionCols = relation.partitionCols
      // For partitioned tables, the partition directory may be outside of the table directory,
      // so getting the table size is expensive. Please see how we implemented it in the AnalyzeTable.
      val sizeInBytes = if (conf.fallBackToHdfsForStatsEnabled && partitionCols.isEmpty) {
        try {
          // Total length of all files under the table location.
          val hadoopConf = session.sessionState.newHadoopConf()
          val tablePath = new Path(table.location)
          val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
          fs.getContentSummary(tablePath).getLength
        } catch {
          case e: IOException =>
            logWarning("Failed to get table size from HDFS.", e)
            conf.defaultSizeInBytes
        }
      } else {
        // Fallback disabled or table partitioned: use the configured default size.
        conf.defaultSizeInBytes
      }
      val stats = Some(Statistics(sizeInBytes = BigInt(sizeInBytes)))
      relation.copy(tableStats = stats)
    }

    override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
      case relation: HiveTableRelation
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
        hiveTableWithStats(relation)
      // handles InsertIntoStatement specially as the table in InsertIntoStatement is not added in its
      // children, hence not matched directly by previous HiveTableRelation case.
      case i @ InsertIntoStatement(relation: HiveTableRelation, _, _, _, _, _)
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
        i.copy(table = hiveTableWithStats(relation))
    }
  }
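The size decision in hiveTableWithStats can be tried outside Spark with a small sketch. It is an illustration under stated assumptions, not Spark's code: the object TableSizeEstimate, the StatsConf class, and the functions directorySize and estimateSizeInBytes are hypothetical names, and a recursive walk over a local directory with java.nio.file stands in for Hadoop's FileSystem.getContentSummary(path).getLength. The branching mirrors the rule: measure the directory only when the fallback is enabled and the table is unpartitioned, and return the default size when the fallback is off, the table is partitioned, or the read fails.

```scala
import java.io.{IOException, UncheckedIOException}
import java.nio.file.{Files, Path}

object TableSizeEstimate {
  // Hypothetical stand-in for the two SQLConf values the rule reads.
  final case class StatsConf(fallBackToHdfs: Boolean, defaultSizeInBytes: Long)

  // Sums the sizes of all regular files under root, recursively.
  // Local-filesystem analogue (an assumption) of FileSystem.getContentSummary(path).getLength.
  def directorySize(root: Path): Long = {
    val stream = Files.walk(root) // throws NoSuchFileException (an IOException) if root is missing
    try {
      var total = 0L
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p)) total += Files.size(p)
      }
      total
    } finally stream.close()
  }

  // Mirrors the branching in hiveTableWithStats.
  def estimateSizeInBytes(location: Path, isPartitioned: Boolean, conf: StatsConf): BigInt = {
    val size =
      if (conf.fallBackToHdfs && !isPartitioned) {
        try directorySize(location)
        catch {
          // Files.walk reports errors hit during iteration as UncheckedIOException.
          case e @ (_: IOException | _: UncheckedIOException) =>
            System.err.println(s"Failed to get table size, using default: $e")
            conf.defaultSizeInBytes
        }
      } else {
        conf.defaultSizeInBytes
      }
    BigInt(size)
  }
}
```

In Spark itself, conf.fallBackToHdfsForStatsEnabled reads spark.sql.statistics.fallBackToHdfs (false by default) and conf.defaultSizeInBytes reads spark.sql.defaultSizeInBytes (Long.MaxValue by default). Without the fallback, an unanalyzed Hive table is therefore treated as very large, which in practice keeps it from being chosen as the broadcast side of a join.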

  /**
   * A `LogicalPlan` that represents a hive table.
   *
   * TODO: remove this after we completely make hive as a data source.
   */
  case class HiveTableRelation(
      tableMeta: CatalogTable,
      dataCols: Seq[AttributeReference],
      partitionCols: Seq[AttributeReference],
      tableStats: Option[Statistics] = None,
      @transient prunedPartitions: Option[Seq[CatalogTablePartition]] = None)
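The comment on the InsertIntoStatement case says the target table is not among the statement's children, so the HiveTableRelation case alone never reaches it. The sketch below models that with hypothetical, heavily simplified stand-ins; Stats, Plan, TableRelation, Project, InsertInto and the hand-written transformDown are assumptions for illustration, not Catalyst classes. TableRelation mirrors the tableStats: Option[Statistics] = None field of HiveTableRelation and is filled in with copy, as hiveTableWithStats does with relation.copy(tableStats = stats). The traversal descends only into children, and InsertInto lists only its query as a child.

```scala
object DetermineStatsSketch {
  // Hypothetical simplified statistics: only a size estimate.
  final case class Stats(sizeInBytes: BigInt)

  sealed trait Plan { def children: Seq[Plan] }

  // Stand-in for HiveTableRelation: catalogStats plays the role of tableMeta.stats,
  // and tableStats starts as None until the rule fills it in.
  final case class TableRelation(
      name: String,
      catalogStats: Option[Stats],
      tableStats: Option[Stats] = None) extends Plan {
    def children: Seq[Plan] = Nil
  }

  final case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Stand-in for InsertIntoStatement: the target table is a field; only the query is a child.
  final case class InsertInto(table: TableRelation, query: Plan) extends Plan {
    def children: Seq[Plan] = Seq(query)
  }

  // Applies the rule to a node, then recurses into the children of the result.
  // The InsertInto target is never visited on its own because it is not a child.
  def transformDown(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan =
    rule.applyOrElse(plan, identity[Plan]) match {
      case r: TableRelation     => r
      case Project(child)       => Project(transformDown(child)(rule))
      case i @ InsertInto(_, q) => i.copy(query = transformDown(q)(rule))
    }

  // Counterpart of hiveTableWithStats: attach a size estimate via copy.
  def withStats(r: TableRelation, size: BigInt): TableRelation =
    r.copy(tableStats = Some(Stats(size)))

  // Only the relation case: an insert target keeps tableStats = None.
  def relationCaseOnly(plan: Plan, size: BigInt): Plan = transformDown(plan) {
    case r: TableRelation if r.catalogStats.isEmpty => withStats(r, size)
  }

  // Both cases, in the shape of DetermineTableStats.apply.
  def determineStats(plan: Plan, size: BigInt): Plan = transformDown(plan) {
    case r: TableRelation if r.catalogStats.isEmpty => withStats(r, size)
    case i @ InsertInto(r, _) if r.catalogStats.isEmpty => i.copy(table = withStats(r, size))
  }
}
```

Run on InsertInto(dst, Project(src)), relationCaseOnly fills in statistics for src but leaves dst untouched, while determineStats also covers dst through the explicit InsertInto case. That gap is why the real rule needs its second case.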