Spark: HiveStrategies

Code related to HiveTableRelation

HiveStrategies.scala
When the table is a Hive table and relation.tableMeta.stats.isEmpty is true, the rule calls hiveTableWithStats:
class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
  private def hiveTableWithStats(relation: HiveTableRelation): HiveTableRelation = {
    val table = relation.tableMeta
    val partitionCols = relation.partitionCols
    // For partitioned tables, the partition directory may be outside of the table directory.
    // Which is expensive to get table size. Please see how we implemented it in the AnalyzeTable.
    val sizeInBytes = if (conf.fallBackToHdfsForStatsEnabled && partitionCols.isEmpty) {
      try {
        val hadoopConf = session.sessionState.newHadoopConf()
        val tablePath = new Path(table.location)
        val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
        fs.getContentSummary(tablePath).getLength
      } catch {
        case e: IOException =>
          logWarning("Failed to get table size from HDFS.", e)
          conf.defaultSizeInBytes
      }
    } else {
      conf.defaultSizeInBytes
    }
    val stats = Some(Statistics(sizeInBytes = BigInt(sizeInBytes)))
    relation.copy(tableStats = stats)
  }
  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case relation: HiveTableRelation
      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      hiveTableWithStats(relation)
    // handles InsertIntoStatement specially as the table in InsertIntoStatement is not added in its
    // children, hence not matched directly by previous HiveTableRelation case.
    case i @ InsertIntoStatement(relation: HiveTableRelation, _, _, _, _, _)
      if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      i.copy(table = hiveTableWithStats(relation))
  }
}
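
The size-selection branch in hiveTableWithStats can be hard to follow inside the full rule, so here is a minimal sketch of just that decision, written without any Spark or Hadoop dependency. StatsConf, SizeEstimate and the contentLength parameter are hypothetical stand-ins introduced for illustration: contentLength plays the role of fs.getContentSummary(tablePath).getLength, and the two StatsConf fields mirror conf.fallBackToHdfsForStatsEnabled and conf.defaultSizeInBytes (backed in Spark by spark.sql.statistics.fallBackToHdfs and spark.sql.defaultSizeInBytes). This is a sketch of the logic shown above, not Spark's own code.

```scala
import java.io.IOException

// Hypothetical stand-in for the two SQLConf values the rule reads.
final case class StatsConf(
    fallBackToHdfsForStatsEnabled: Boolean,
    defaultSizeInBytes: Long)

object SizeEstimate {
  // Mirrors the branch structure of hiveTableWithStats.
  // `contentLength` is passed by name so it is only evaluated when the
  // file-system fallback is enabled and the table is not partitioned.
  def estimate(conf: StatsConf, isPartitioned: Boolean)(contentLength: => Long): BigInt = {
    val size =
      if (conf.fallBackToHdfsForStatsEnabled && !isPartitioned) {
        try contentLength
        catch {
          // On an I/O failure, fall back to the configured default size.
          case _: IOException => conf.defaultSizeInBytes
        }
      } else {
        conf.defaultSizeInBytes
      }
    BigInt(size)
  }
}
```

In this sketch a partitioned table always receives the default size, even with the fallback enabled, which matches the partitionCols.isEmpty condition in the original listing.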


  • HiveTableRelation
/**
 * A `LogicalPlan` that represents a hive table.
 *
 * TODO: remove this after we completely make hive as a data source.
 */
case class HiveTableRelation(
    tableMeta: CatalogTable,
    dataCols: Seq[AttributeReference],
    partitionCols: Seq[AttributeReference],
    tableStats: Option[Statistics] = None,
    @transient prunedPartitions: Option[Seq[CatalogTablePartition]] = None)
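
Because tableStats defaults to None, DetermineTableStats can attach an estimate with copy(tableStats = ...) and leave every other field untouched. The sketch below illustrates that pattern with simplified, hypothetical stand-ins (Stats, TableMeta, Relation, FillStats); the real Spark classes carry many more fields, and the estimate is passed in directly rather than computed. Note that the guard inspects the catalog metadata (tableMeta.stats), while the result is written to the relation's own tableStats field, as in the listing above.

```scala
// Hypothetical, simplified stand-ins for Statistics, CatalogTable and HiveTableRelation.
final case class Stats(sizeInBytes: BigInt)
final case class TableMeta(name: String, stats: Option[Stats] = None)
final case class Relation(tableMeta: TableMeta, tableStats: Option[Stats] = None)

object FillStats {
  // Same shape as the guard in DetermineTableStats.apply: only relations whose
  // catalog metadata has no statistics get an estimate attached.
  def fill(relation: Relation, estimate: BigInt): Relation = relation match {
    case r if r.tableMeta.stats.isEmpty =>
      // copy returns a new instance; the original relation is not modified.
      r.copy(tableStats = Some(Stats(estimate)))
    case r => r
  }
}
```

A relation whose metadata already carries statistics passes through unchanged, so the estimate never overrides statistics recorded in the catalog.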

小秦哥

金牌会员
这个人很懒什么都没写!

标签云

快速回复 返回顶部 返回列表