Java笔记 ·

Logstash6整合Hadoop-报错与解决方案

上接Logstash6整合Hadoop

报错与解决方案

WebHDFS::ServerError: Failed to connect to host 192.168.0.80:50070,No route to host

问题

 Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<WebHDFS::ServerError: Failed to connect to host 192.168.0.80:50070, No route to host - Failed to open TCP connection to 192.168.0.80:50070 (No route to host - SocketChannel.connect)>, :backtrace=>["/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:275:in `operate_requests'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:138:in `list'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs_helper.rb:49:in `test_client'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:155:in `register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:102:in `register'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:46:in `register'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:241:in `register_plugin'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:252:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:252:in `register_plugins'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:593:in `maybe_setup_out_plugins'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:262:in `start_workers'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:199:in `run'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:159:in `block in start'"], :thread=>"#<Thread:0x77eb0a57 run>"}[2018-09-27T10:03:36,647][ERROR][logstash.agent           ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

原因

196.168.0.79上未在/etc/hosts中配置192.168.0.80的记录

解决方案

编辑/etc/hosts

sudo vi /etc/hosts

添加以下记录即可:

192.168.0.80    hadoop

Hadoop获得的日志记录与预期不符

最开始Logstash中没设置codec,获得的记录是这样的:

2018-09-28T08:39:22.294Z {name=dev.windcoder.com} %{message}

之后设置

codec => plain {
                  format => "%{message}"
 }

又变成了

%{message}

最后设置

codec => "json"

得到了所需要的上面所示的格式。

原因就是Logstash 的Filter插件部分已将非结构化的数据进行了结构化操作,在输出时需要通过codec解码成相应的格式,对于这里就是json.

Maybe you should increase retry_interval or reduce number of workers

问题

[2018-09-28T15:57:43,341][WARN ][logstash.outputs.webhdfs ] webhdfs write caused an exception: {"RemoteException":{"exception":"RecoveryInProgressException","javaClassName":"org.apache.hadoop.hdfs.protocol.RecoveryInProgressException","message":"Failed to APPEND_FILE /user/parim/logstash-data/logstash-2018-09-28.log for DFSClient_NONMAPREDUCE_965601342_27 on 192.168.0.80 because lease recovery is in progress. Try again later.\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2443)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:117)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:759)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:437)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)\n"}}. Maybe you should increase retry_interval or reduce number of workers. Retrying...

原因

HDFS的读写模式为 "write-once-read-many",为了实现write-once,需要设计一种互斥机制,租约(Lease)应运而生

租约本质上是一个有时间约束的锁,即:在一定时间内对租约持有者(也就是客户端)赋予一定的权限。

这里的警告说Lease正在被另一个进程释放中,需要等会再试。这样就说明可能是我们Logstash写入HDFS的频率过快,导致HDFS来不及释放Lease。而且最开始Logstash中的webhdfs用的均是默认配置:

解决方案

在Logstash中的webhdfs中添加配置,做输出优化:

    flush_size => 5000
    idle_flush_time => 5
    retry_interval => 3

配置项含义:

  • flush_size : 如果event计数超出flush_size设置的值,即使未达到store_interval_in_secs,也会将数据发送到webhdfs,默认是500
  • idle_flush_time : 以x秒为间隔将数据发送到webhdfs,默认是1
  • retry_interval : 两次重试之间等待多长时间,默认是0.5

优化思路:
- 提高flush_size的值,来减少访问webhdfs的频率,同时提高HDFS的写入量
- 降低idle_flush_time的值,因为提高了flush_size,所以可以适当的减少数据发送到webhdfs的时间间隔
- 提高retry_interval的值,来减少高频重试带来的额外负载

参考:

Logstash学习(七)Logstash的webhdfs插件

Webhdfs output plugin

HDFS租约实践

HDFS租约机制

Failed to flush outgoing items...WebHDFS::ServerError

问题

Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in `operate_requests'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in `create'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:228:in `write_data'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:211 :in `block in flush'", "org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:199 :in `flush'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `block in buffer_flush'", "org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:216: in `buffer_flush'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/outputs/base.rb:89:in `block in multi_receive'", "org/jruby/RubyArray.java:1734:in `each'","/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/outputs/base.rb:89:in `multi_receive'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:114:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:97:in `multi_receive'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:372:in `block in output_batch'","org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:371:in `output_batch'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:323:in `worker_loop'","/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:285:in `block in start_workers'"]}

原因

最开始Hadoop的hdfs-site.xml没有配置

    <property>
        <name> dfs.datanode.hostname</name>
        <value>192.168.0.80</value>
    </property>

当在通过Hadoop的WebHDFS api测试读取文件:

curl -i -L http://192.168.0.80:50070/webhdfs/v1//user/parim/logstash/logs/hadoop-parim-namenode-localhost.localdomain.log?op=OPEN

发现192.168.0.79结果如下:

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Fri, 28 Sep 2018 04:35:55 GMT
Date: Fri, 28 Sep 2018 04:35:55 GMT
Pragma: no-cache
Expires: Fri, 28 Sep 2018 04:35:55 GMT
Date: Fri, 28 Sep 2018 04:35:55 GMT
Pragma: no-cache
Content-Type: application/octet-stream
X-FRAME-OPTIONS: SAMEORIGIN
Location: http://localhost:50075/webhdfs/v1//user/parim/logstash/hadoop-parim-datanode-localhost.localdomain.log?op=OPEN&namenoderpcaddress=192.168.0.80:54310&offset=0
Content-Length: 0

curl: (7) couldn't connect to host

192.168.0.80虽然也会重定向,但会正常返回数据。查询Open and Read a File得知其处理流程:

提交get请求,其自动跟踪重定向->请求被重定向到可以读取文件数据的datanode->客户端遵循重定向到datanode并接收文件数据

此处的DATANODE成了Hadoop默认的localhost,自然无法请求到192.168.0.80的datanode。

解决方案

在hdfs-site.xml中添加如下配置即可:

    <property>
        <name> dfs.datanode.hostname</name>
        <value>192.168.0.80</value>
    </property>

最开始在官方的hdfs-default.xml下并未找到该配置属性,后来通过搜索在webhdfs两个步骤上载文件中才得知这个属性。

另外,这个问题在搜索过程中发现涉及“WebHDFS::ServerError”的大部分都是在说“没有写入权限的问题”,特此也记录一下:

  1. HDFS访问账户问题;
  2. HDFS的主机解析问题;

HDFS访问账户问题

Logstash的输出插件中的webhdfs部分的user,Logstash解释是webhdfs的用户名。一般默认使用启动Hadoop的Username。

原则上只要该user对path中的根文件夹有读写,对其子文件夹和文件有创建、读写等必需权限即可,可设置user为path中的根文件夹的所有者(owner )。

HDFS的主机解析问题

直接将所有Hadoop的节点/IP映射放入/etc/hosts中。

参考:

HDFS Permissions Guide

logstash的webhdfs使用问题

Logstash使用webhdfs插件遇到写入HDFS权限问题

Hadoop与Java版本

Hadoop Java
2.7及以后版本 Java 7 +
2.6及以前版本 Java 6 +

HadoopJavaVersions

参与评论