INFO allowed=true ugi=hdfs (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/user/sqoop2/.Trash/Current dst=null perm=null
This is how the HDFS audit logger's log message (from Figure 6-8) is written to the output.
Thread-Time-Category-Context Layout(TTCCLayout): This Layout outputs the invoking thread, time (in milliseconds since application started), the category or Logger used to create this logging event, and nested diagnostic context. All these properties are optional and if they are all disabled, the Layout will still write out the logging level and the message itself, just like Simple Layout. If you specify the following options in log4j.properties:
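A minimal sketch of such options, assuming an appender named CONSOLE (the appender name and option values are illustrative, not from the original):

log4j.appender.CONSOLE.layout=org.apache.log4j.TTCCLayout
# print the invoking thread and the nested diagnostic context
log4j.appender.CONSOLE.layout.ThreadPrinting=true
log4j.appender.CONSOLE.layout.ContextPrinting=true
# prefix each entry with the Logger (category) name
log4j.appender.CONSOLE.layout.CategoryPrefixing=true
# use an ISO8601 timestamp instead of milliseconds since startup
log4j.appender.CONSOLE.layout.DateFormat=ISO8601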
(The -v option of grep excludes records containing the keyword given after it; a sketch of such a filter appears after the log excerpt below.) The results included normal user activity plus the following suspicious activity by the user RogueITGuy:
2014-03-06 22:26:08,280 INFO FSNamesystem.audit: allowed=true ugi=RogueITGuy (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo
src=/Ticketing/Ticket_details_20140220
dst=null perm=null
2014-03-06 22:26:08,296 INFO FSNamesystem.audit: allowed=true ugi=RogueITGuy (auth:SIMPLE) ip=/127.0.0.1 cmd=rename
src=/Ticketing/Ticket_details_20140220
dst=/Ticketing/Ticket_stg/Ticket_details_20140220
perm=RogueITGuy:supergroup:rw-r--r--
2014-03-06 22:27:02,666 INFO FSNamesystem.audit: allowed=true ugi=RogueITGuy
The first entry (cmd=getfileinfo src=/Ticketing/Ticket_details_20140220) was RogueITGuy making sure he had uploaded the correct file from his own PC (the one he had modified to delete his girlfriend's ticket entry).
The third entry confirmed that the modified file was uploaded correctly to the staging location.
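A hedged sketch of the grep filter described earlier (the audit log path and the excluded system user are assumptions):

# hypothetical reconstruction: drop entries from the hdfs system user
# so that only interactive users' activity remains in the output
grep -v "ugi=hdfs" /var/log/hadoop-hdfs/hdfs-audit.log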
Hive log: If this user overwrote a partition with the modified file, he would have done so using Hive. So investigators next looked at the Hive logs (in /var/log/hive for Cloudera CDH4; the location may vary with your distribution and configuration):
2014-03-06 22:28:01,394 INFO org.apache.hadoop.mapred.JobInProgress: job_201403042158_0008: nMaps=1 nReduces=0 max=-1
2014-03-06 22:28:01,764 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201403042158_0008 = 74. Number of splits = 1
2014-03-06 22:28:01,765 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201403042158_0008_m_000000 has split on node:/default/localhost.localdomain
2014-03-06 22:28:01,765 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201403042158_0008 initialized successfully with 1 map tasks and 0 reduce tasks.
2014-03-06 22:28:02,089 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201403042158_0008_m_000002_0' to tip task_201403042158_0008_m_000002, for tracker 'tracker_localhost.localdomain:localhost.localdomain/127.0.0.1:47799'
Retrieving Job Details Using a Web Browser
You can also review JobTracker and TaskTracker log records easily through the browser interface, which is the best way to check a job's runtime statistics or its XML file. The URL for the records is composed of the tracker's host name and its web access port. For example, if your JobTracker host is named 'MyJobHost' and uses port 50030 for web access, the JobTracker logs can be viewed at http://MyJobHost:50030/logs/. Likewise, the logs of a TaskTracker running on host 'MyTaskHost' and using port 50060 can be viewed at http://MyTaskHost:50060/logs/. Check your configuration file (mapred-site.xml) for details of the hosts running particular daemons and the ports they use. File names may vary by distribution, but the log file names will contain TaskTracker or JobTracker, making them easy to identify.
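As a hedged sketch, these MapReduce 1.0 web-access addresses are set by properties along the following lines in mapred-site.xml (the host names and values shown are illustrative):

<!-- JobTracker web UI (host and port for http access) -->
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>MyJobHost:50030</value>
</property>
<!-- TaskTracker web UI; 0.0.0.0 binds to all interfaces on each node -->
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50060</value>
</property>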
Figure 6-12 shows the log directory and the various MapReduce log files for a cluster running MapReduce 1.0.
gmond: gmond needs to be installed on every host you want monitored. It interacts with the host operating system to acquire Metrics such as load Metrics (e.g., average cluster load), process Metrics (e.g., total running processes) or rpc Metrics (e.g., RpcAuthenticationFailures). It is modular and uses operating system–specific plugins to take measurements. Since only the necessary plugins are installed at compile time, gmond has a very small footprint and negligible overhead.
gmond is not invoked on demand by an external polling engine (to take measurements); instead, it polls on a schedule defined by a local configuration file. Measurements are shared with the other hosts in the cluster through a simple listen/announce protocol that broadcasts on the same multicast address. Each gmond host also records the Metrics it receives from the other hosts in the cluster.
As a result, every host in a Ganglia cluster knows the current value of every Metric recorded by every other host in the same cluster. This is why polling a single host per cluster is enough to obtain Metrics for the entire cluster, and why the failure of any single host does not affect the system! Moreover, this design greatly reduces the number of hosts that must be polled, so it scales easily to large clusters.
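A sketch of the corresponding pieces of gmond.conf, assuming Ganglia's default multicast address and port (the cluster name and collection schedule are illustrative):

cluster {
  name = "hadoop-prod"
}
# announce this host's Metrics on the shared multicast channel
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
}
# listen for Metrics announced by the other hosts in the cluster
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
# local polling schedule: collect load_one every 40 seconds
collection_group {
  collect_every = 40
  time_threshold = 90
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
}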
gweb: gweb is the visualization interface for Ganglia. It provides instant access to any Metric from any host in the cluster without specifying any configuration details. It visually summarizes the entire grid using graphs that combine Metrics by cluster and provides drop-downs for additional details. If you need details of a specific host or Metric, you can specify the details and create a custom graph of exactly what you want to see.
gweb lets you change the time period shown in a graph, supports extracting data in various textual formats (CSV, JSON, and so on), and provides a fully functional URL interface so that you can embed the graphs you need into other programs via specific URLs. In addition, gweb is a PHP program that runs under the Apache web server; it is usually installed on the same physical hardware as gmetad, because it needs access to the RRD databases that gmetad creates.
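For example, a URL along these lines (the gweb host and the cluster/host names are assumptions; graph.php with its c, h, m, and r parameters is gweb's standard URL interface) would return the one-minute load Metric for one host over the last hour as CSV:

http://gweb-host/ganglia/graph.php?c=hadoop-prod&h=node1.example.com&m=load_one&r=hour&csv=1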
set hive.encrypt.master.keyProviderParameters=keyStoreUrl=file:////root/bcl/BCLKeyStore.keystore&keyStoreType=JCEKS&password=bcl2601;
set hive.encrypt.keyProviderParameters=keyStoreUrl=file:////root/bcl/BCLKeyStore.keystore&keyStoreType=JCEKS&password=bcl2601;
set mapred.crypto.secrets.protector.class=com.intel.hadoop.mapreduce.cryptocontext.provider.AgentSecretsProtector;
set mapred.agent.encryption.key.provider=org.apache.hadoop.io.crypto.KeyStoreKeyProvider;
set mapred.agent.encryption.key.provider.parameters=keyStoreUrl=file:////keys/clusterpublic.TrustStore&keyStoreType=JKS&password=123456;
set mapred.agent.encryption.keyname=HIVEKEYCLUSTERPUBLICASYM;
Create an encrypted external table pointing to the encrypted data file created by Pig:
create external table bcl_encrypted_pig_data(name STRING, age INT, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/bcl/bcl_encrypted/'
TBLPROPERTIES("hive.encrypt.enable"="true", "hive.encrypt.keyName"="BCLKey");
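With the key provider settings above in effect, Hive should decrypt the data transparently at query time; a minimal check (the column list comes from the table definition above) might be:

select name, age, country from bcl_encrypted_pig_data;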