Integrating Logstash 6 with Hadoop: Errors and Solutions
WebHDFS::ServerError: Failed to connect to host 192.168.0.80:50070, No route to host
Problem
Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<WebHDFS::ServerError: Failed to connect to host 192.168.0.80:50070, No route to host - Failed to open TCP connection to 192.168.0.80:50070 (No route to host - SocketChannel.connect)>, :backtrace=>["/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:275:in `operate_requests'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:138:in `list'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs_helper.rb:49:in `test_client'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:155:in `register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:102:in `register'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:46:in `register'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:241:in `register_plugin'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:252:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:252:in `register_plugins'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:593:in `maybe_setup_out_plugins'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:262:in `start_workers'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:199:in `run'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:159:in `block in start'"], :thread=>"#<Thread:0x77eb0a57 run>"}[2018-09-27T10:03:36,647][ERROR][logstash.agent ] Failed to execute action {:id=>:main, 
:action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
Cause
The host 192.168.0.79 had no entry for 192.168.0.80 in its /etc/hosts file.
Solution
Edit /etc/hosts:
sudo vi /etc/hosts
and add the following entry:
192.168.0.80 hadoop
Log records received by Hadoop do not match expectations
Initially no codec was set in Logstash, and the received records looked like this:
2018-09-28T08:39:22.294Z {name=dev.windcoder.com} %{message}
Then, after setting
codec => plain {
    format => "%{message}"
}
the output instead became
%{message}
Finally, setting
codec => "json"
produced the desired format.
The reason: Logstash's filter plugins have already turned the unstructured data into structured events, and on output a codec must encode them into the corresponding format, which here is JSON.
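A minimal webhdfs output with the JSON codec might look like the following sketch. The host and port match this setup; the path is inferred from the file name appearing in the logs above, and the user value is an assumption, not a verified copy of the original config:

```conf
output {
  webhdfs {
    host => "192.168.0.80"    # HDFS namenode (WebHDFS endpoint)
    port => 50070
    path => "/user/parim/logstash-data/logstash-%{+YYYY-MM-dd}.log"  # date-based file, inferred from the logs
    user => "parim"           # WebHDFS username (assumed)
    codec => "json"           # encode the structured event as JSON
  }
}
```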
Maybe you should increase retry_interval or reduce number of workers
Problem
[2018-09-28T15:57:43,341][WARN ][logstash.outputs.webhdfs ] webhdfs write caused an exception: {"RemoteException":{"exception":"RecoveryInProgressException","javaClassName":"org.apache.hadoop.hdfs.protocol.RecoveryInProgressException","message":"Failed to APPEND_FILE /user/parim/logstash-data/logstash-2018-09-28.log for DFSClient_NONMAPREDUCE_965601342_27 on 192.168.0.80 because lease recovery is in progress. Try again later.\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2443)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:117)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:759)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:437)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)\n"}}. Maybe you should increase retry_interval or reduce number of workers. Retrying...
Cause
HDFS follows a write-once-read-many access model. Enforcing write-once requires a mutual-exclusion mechanism, which is where the lease comes in.
A lease is essentially a time-bounded lock: for a limited period it grants certain permissions to the lease holder (the client).
The warning says a lease is still in the process of being released by another client and the operation should be retried later. This suggests Logstash was writing to HDFS so frequently that HDFS could not release leases in time. In addition, the webhdfs output in Logstash was initially running entirely on default settings.
Solution
Add the following settings to the webhdfs output in Logstash to optimize its output behavior:
flush_size => 5000
idle_flush_time => 5
retry_interval => 3
Meaning of the settings:
- flush_size: buffered events are sent to WebHDFS once their count exceeds flush_size, even if the time-based flush has not yet fired; default is 500
- idle_flush_time: send data to WebHDFS every x seconds; default is 1
- retry_interval: how long to wait between two retries; default is 0.5
Optimization approach:
- Raise flush_size to reduce how often WebHDFS is hit while increasing the amount written per request
- Raise idle_flush_time: with a larger flush_size, count-triggered flushes fire less often, so the time-based flush interval can be relaxed accordingly
- Raise retry_interval to reduce the extra load caused by high-frequency retries
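Put together with the connection settings from earlier, the tuned webhdfs output might look like this. Treat it as a sketch under the same assumptions as before (path inferred from the logs, user assumed), not the original file:

```conf
output {
  webhdfs {
    host => "192.168.0.80"
    port => 50070
    path => "/user/parim/logstash-data/logstash-%{+YYYY-MM-dd}.log"
    user => "parim"
    codec => "json"
    flush_size => 5000       # flush once 5000 events are buffered (default 500)
    idle_flush_time => 5     # or at most every 5 seconds (default 1)
    retry_interval => 3      # wait 3 s between retries (default 0.5)
  }
}
```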
Reference:
Logstash学习(七)Logstash的webhdfs插件
Failed to flush outgoing items…WebHDFS::ServerError
Problem
Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in `operate_requests'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in `create'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:228:in `write_data'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:211 :in `block in flush'", "org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:199 :in `flush'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `block in buffer_flush'", "org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:216: in `buffer_flush'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/home/parim/elk/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/outputs/base.rb:89:in `block in multi_receive'", "org/jruby/RubyArray.java:1734:in `each'","/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/outputs/base.rb:89:in `multi_receive'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:114:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:97:in `multi_receive'", 
"/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:372:in `block in output_batch'","org/jruby/RubyHash.java:1343:in `each'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:371:in `output_batch'", "/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:323:in `worker_loop'","/home/parim/elk/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:285:in `block in start_workers'"]}
Cause
Initially, Hadoop's hdfs-site.xml was missing the following property:
<property>
    <name>dfs.datanode.hostname</name>
    <value>192.168.0.80</value>
</property>
When testing a file read through Hadoop's WebHDFS API:
curl -i -L http://192.168.0.80:50070/webhdfs/v1//user/parim/logstash/logs/hadoop-parim-namenode-localhost.localdomain.log?op=OPEN
the result from 192.168.0.79 was as follows:
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Fri, 28 Sep 2018 04:35:55 GMT
Date: Fri, 28 Sep 2018 04:35:55 GMT
Pragma: no-cache
Expires: Fri, 28 Sep 2018 04:35:55 GMT
Date: Fri, 28 Sep 2018 04:35:55 GMT
Pragma: no-cache
Content-Type: application/octet-stream
X-FRAME-OPTIONS: SAMEORIGIN
Location: http://localhost:50075/webhdfs/v1//user/parim/logstash/hadoop-parim-datanode-localhost.localdomain.log?op=OPEN&namenoderpcaddress=192.168.0.80:54310&offset=0
Content-Length: 0
curl: (7) couldn't connect to host
On 192.168.0.80 the request is also redirected, but there it returns the data normally. Consulting Open and Read a File in the WebHDFS documentation shows the processing flow:
submit an HTTP GET request that automatically follows redirects -> the request is redirected to a datanode that can serve the file data -> the client follows the redirect to that datanode and receives the file data
Here the datanode address in the redirect defaulted to Hadoop's localhost, so a client on another machine could never reach the datanode at 192.168.0.80.
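The failure mode can be illustrated by parsing the Location header from the 307 response above and inspecting the host the client must contact next; the header value is copied from the response shown earlier, and the check is illustrative only:

```python
from urllib.parse import urlparse

# Location header returned by the namenode in the 307 redirect above
location = ("http://localhost:50075/webhdfs/v1//user/parim/logstash/"
            "hadoop-parim-datanode-localhost.localdomain.log"
            "?op=OPEN&namenoderpcaddress=192.168.0.80:54310&offset=0")

redirect = urlparse(location)
print(redirect.hostname, redirect.port)  # the datanode address the client must reach

# From 192.168.0.79, "localhost" resolves to 192.168.0.79 itself,
# where no datanode listens on 50075 -- hence "couldn't connect to host".
if redirect.hostname == "localhost":
    print("datanode advertised as localhost; set dfs.datanode.hostname")
```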
Solution
Add the following configuration to hdfs-site.xml:
<property>
    <name>dfs.datanode.hostname</name>
    <value>192.168.0.80</value>
</property>
At first this property could not be found in the official hdfs-default.xml reference; it only turned up later while searching, in the article webhdfs两个步骤上载文件 (uploading a file via WebHDFS in two steps).
In addition, while researching this problem I found that most results involving "WebHDFS::ServerError" were actually about missing write permissions, so those are recorded here as well:
- HDFS access-account issues;
- HDFS host-resolution issues;
HDFS access-account issues
The user option in the webhdfs section of Logstash's output plugins is, per Logstash, the WebHDFS username; generally the username that started Hadoop is used.
In principle, any user works as long as it can read and write the root folder in path and can create, read, and write its subfolders and files; setting user to the owner of that root folder is a safe choice.
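For example, assuming the output path from the logs above and a webhdfs user of parim (both assumptions for illustration), ownership of the output root could be granted roughly like this, run as the HDFS superuser:

```shell
# make the webhdfs user the owner of the Logstash output folder
hdfs dfs -chown -R parim /user/parim/logstash-data

# alternatively, widen the permissions on just that folder
hdfs dfs -chmod -R 775 /user/parim/logstash-data
```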
HDFS host-resolution issues
Simply put the hostname/IP mappings for all Hadoop nodes into /etc/hosts.
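Continuing this setup's example, the /etc/hosts on the Logstash machine (192.168.0.79) needs at minimum the Hadoop node; the hostname hadoop matches the first section, and any additional nodes shown are hypothetical:

```
192.168.0.80 hadoop          # namenode/datanode in this setup
# further datanodes would be listed the same way, e.g.:
# 192.168.0.81 hadoop-dn1    # hypothetical second datanode
```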
Reference:
Logstash使用webhdfs插件遇到写入HDFS权限问题
Hadoop and Java versions
Hadoop | Java |
---|---|
2.7 and later | Java 7+ |
2.6 and earlier | Java 6+ |
Unless otherwise noted, all articles on this site are original works by windcoder. When republishing, please credit the source: logstash6zhenghehadoop-baocuoyujiejuefangan
