Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
770 views
in Technique[技术] by (71.8m points)

hadoop - Flume HDFS sink keeps rolling small files

I'm trying to stream Twitter data into HDFS using Flume and the example project at https://github.com/cloudera/cdh-twitter-example/

Whatever I try, it keeps creating files in HDFS that range in size from 1.5 kB to 15 kB, whereas I would like to see large files (64 MB). Here is the agent configuration:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.keywords = test

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost.localdomain:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 67108864
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
TwitterAgent.sinks.HDFS.hdfs.idleTimeout = 0

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

EDIT: I looked into the log files and found this happening all the time:

9:11:27.526 AM WARN org.apache.flume.sink.hdfs.BucketWriter Block Under-replication detected. Rotating file.
9:11:37.036 AM ERROR org.apache.flume.sink.hdfs.BucketWriter Hit max consecutive under-replication rotations (30); will not continue rolling files under this path due to under-replication


1 Answer

0 votes
by (71.8m points)

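It turned out to be a problem with the HDFS replication factor. I am working on a virtual machine with a single virtual datanode, so with the default replication factor of 3 every block was reported as under-replicated, and the sink rotated the file each time it detected this (the "Block Under-replication detected. Rotating file." warning in the log). I had to set the replication factor to 1 for the rollSize setting to work as expected.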
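
A minimal sketch of what this can look like on the Flume side, assuming a single-datanode setup like the one described above. The replication factor itself is an HDFS setting (the dfs.replication property in hdfs-site.xml, set to 1 here). The hdfs.minBlockReplicas line below is an assumption on my part and was not in the original configuration: it is an HDFS sink option in newer Flume releases that tells the sink how many replicas to expect per block, so check that your Flume version supports it before relying on it.

```properties
# Assumption: single-datanode cluster, so only one replica per block can exist.
# Tell the HDFS sink to expect one replica per block instead of the Hadoop default,
# so that it stops treating every block as under-replicated and rotating the file.
TwitterAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
```

With under-replication rotations out of the way, the existing rollSize = 67108864 setting (with rollCount, rollInterval and idleTimeout left at 0) should be the only thing that triggers a roll.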

