apache spark - Does Hive preserve file order when selecting data

Question


Does Hive preserve the file order when selecting data?

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:07:03+0000

Without ORDER BY, the order of the returned rows is not guaranteed.

Data is read in parallel by many processes (mappers). After the input splits are calculated, each process reads its own piece of a file, or a few files, depending on how the splits were computed.

These parallel processes can handle different volumes of data and run on different nodes, and the load is not the same from one run to the next. As a result, they start returning rows and finish at different times, depending on many factors such as node load, network load, and the volume of data per process.
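To make this concrete, here is a minimal Python simulation (not Hive or Spark code) of the behaviour described above. It assumes a single "file" held as a list of rows, cuts it into contiguous splits, and hands each split to a thread that sleeps for a random time before returning, standing in for mappers on nodes with different load. The names `compute_splits`, `read_split` and `parallel_read` are hypothetical, and real split calculation in Hive/Spark is considerably more involved.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def compute_splits(rows, split_size):
    """Cut the file's rows into contiguous splits (a stand-in for input splits)."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def read_split(split, max_delay=0.01):
    """Simulate one mapper: a random delay models node/network load, then return its rows."""
    time.sleep(random.uniform(0, max_delay))
    return split


def parallel_read(rows, split_size=3, workers=4):
    """Collect rows in the order the simulated mappers finish, not in file order."""
    result = []
    splits = compute_splits(rows, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_split, s) for s in splits]
        # as_completed yields futures as they finish, so split order varies run to run
        for fut in as_completed(futures):
            result.extend(fut.result())
    return result


if __name__ == "__main__":
    file_rows = [f"row{i}" for i in range(12)]
    for _ in range(3):
        # Same rows every time, but the split order can differ between runs
        print(parallel_read(file_rows))
```

Each run returns the same rows, and rows inside one split stay together, but the order of the splits depends on which simulated mapper finishes first.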

Removing all these factors makes the order more predictable. For example, a single-threaded sequential read of a file may return rows in the same order as they appear in the file. But this is not how the database works.
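The contrast between a sequential read and an explicit ORDER BY can be sketched in plain Python as follows. This is an illustration under stated assumptions, not engine code: each row is assumed to carry a hypothetical `position` value as its first field, standing in for whatever real column encodes the order you want; shuffling whole splits mimics mappers finishing in arbitrary order; and sorting on that key plays the role of ORDER BY. The helper names are hypothetical.

```python
import random


def sequential_read(rows):
    """One reader, one file, no parallelism: rows come back in file order."""
    return list(rows)


def scrambled_read(rows, split_size=3, seed=None):
    """Mimic parallel mappers finishing in arbitrary order by shuffling whole splits."""
    splits = [rows[i:i + split_size] for i in range(0, len(rows), split_size)]
    random.Random(seed).shuffle(splits)
    return [row for split in splits for row in split]


def order_by(rows, key):
    """Analogue of ORDER BY: an explicit sort key makes the output order deterministic."""
    return sorted(rows, key=key)


if __name__ == "__main__":
    # Hypothetical rows: (position, value); the position column is an assumption
    file_rows = [(pos, f"value{pos}") for pos in range(10)]
    arrived = scrambled_read(file_rows, seed=42)
    print(arrived)  # arrival order depends on the shuffle
    print(order_by(arrived, key=lambda r: r[0]) == file_rows)
```

The design point is that the order is restored by data in the rows (the sort key), not by how they were read; without such a column there is nothing for ORDER BY to recover the file order from.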

Also, according to Codd's relational theory, the order of columns and rows is immaterial.
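One practical consequence: two query results that differ only in row order should be treated as the same result. Codd's model treats a relation as a set of tuples; SQL engines, Hive included, actually return bags (duplicate rows are allowed unless DISTINCT is used), so a fair comparison counts rows and ignores their order. A minimal Python sketch, with a hypothetical helper `same_result` and made-up sample rows:

```python
from collections import Counter


def same_result(a, b):
    """True if both results contain the same rows with the same multiplicities, in any order."""
    return Counter(a) == Counter(b)


# Two hypothetical runs of the same query returning rows in different orders
run1 = [("alice", 3), ("bob", 5), ("bob", 5)]
run2 = [("bob", 5), ("alice", 3), ("bob", 5)]

print(run1 == run2)              # False: list comparison is order-sensitive
print(same_result(run1, run2))   # True: same rows, same counts
```

Using `Counter` rather than `set` keeps duplicate rows significant, which matches SQL's bag semantics instead of pure set semantics.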
