The best way to know is to try it; see my results below.
But before trying, my guess is that even if you can only allocate 80 full blocks in your configuration, you can allocate more than 80 non-empty files. This is because I think HDFS does not consume a full block each time you allocate a non-empty file. Put another way, HDFS blocks are not a storage allocation unit, but a replication unit. I think the storage allocation unit of HDFS is the allocation unit of the underlying filesystem: if you use ext4 with a block size of 4 KB and you create a 1 KB file in a cluster with a replication factor of 3, you consume 3 times 4 KB = 12 KB of hard disk space.
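If you want to check this guess at the filesystem level, a rough way is to look at the block files directly on a DataNode's local disk. The sketch below assumes a Hadoop 1.x directory layout and uses /data/dfs/data as a placeholder for your dfs.data.dir:

# On a DataNode, list the block files backing HDFS data (path is a placeholder).
# A 1 KB HDFS file should appear as a ~1 KB blk_* file plus a small .meta file,
# each occupying one ext4 allocation unit on disk, not a full 64 MB block.
ls -lh /data/dfs/data/current/blk_*
du -h /data/dfs/data/current/blk_*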
Enough guessing and thinking, let's try it. My lab configuration is as follows:
- Hadoop version 1.0.4
- 4 DataNodes, each with a little less than 5.0 GB of available space, ext4 block size of 4 KB
- HDFS block size of 64 MB, default replication factor of 1
After starting HDFS, I have the following NameNode summary:
- 1 files and directories, 0 blocks = 1 total
- DFS Used: 112 KB
- DFS Remaining: 19.82 GB
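The same kind of summary is available from the command line, if you prefer it to the NameNode web UI:

# Prints Configured Capacity, DFS Used, DFS Remaining and a per-DataNode breakdown.
hadoop dfsadmin -report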
Then I run the following commands:
hadoop fs -mkdir /test
for f in $(seq 1 10); do hadoop fs -copyFromLocal ./1K_file /test/$f; done
With these results:
- 12 files and directories, 10 blocks = 22 total
- DFS Used: 122.15 KB
- DFS Remaining: 19.82 GB
So the 10 files did not consume 10 times 64 MB: "DFS Remaining" did not change, and "DFS Used" only grew by about 10 KB (roughly 1 KB per file).
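To double-check that each small file still gets its own HDFS block (just not a full 64 MB of disk), you can ask fsck for the block list:

# Lists each file under /test with its size and block count.
# Expect 10 files of ~1 KB, one block each.
hadoop fsck /test -files -blocks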