Without ORDER BY, the order of the returned rows is not guaranteed.
Data is read in parallel by many processes (mappers). After the splits are calculated, each process starts reading a piece of a file or several files, depending on how the splits were calculated.
The parallel processes can each handle a different volume of data and run on different nodes, and the load is not the same on every run. As a result, they start returning rows and finish at different times, depending on many factors such as node load, network load, and the volume of data per process.
By removing all these factors you can make the order more predictable. For example, a single-threaded sequential read of a file may return rows in the same order as they appear in the file. But this is not how the database works.
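The effect can be imitated outside the database. The following is only a minimal sketch, not how any particular engine is implemented: the three hard-coded splits, the random delay standing in for node and network load, and the function names `read_split`, `scan_without_order` and `scan_with_order` are all assumptions made for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical "file" divided into three splits, one per simulated mapper.
SPLITS = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def read_split(rows):
    # A random delay stands in for varying node load, network load, etc.
    time.sleep(random.uniform(0, 0.05))
    return rows


def scan_without_order():
    """Collect rows as each worker finishes; the arrival order is not fixed."""
    result = []
    with ThreadPoolExecutor(max_workers=len(SPLITS)) as pool:
        futures = [pool.submit(read_split, split) for split in SPLITS]
        for done in as_completed(futures):
            result.extend(done.result())
    return result


def scan_with_order():
    """Rough analogue of adding ORDER BY: sort explicitly after collecting."""
    return sorted(scan_without_order())


if __name__ == "__main__":
    print(scan_without_order())  # order of the splits may differ between runs
    print(scan_with_order())     # same order on every run
```

Running the script several times should show the unsorted result changing from run to run while the sorted one stays the same, which is the reason to add an explicit ORDER BY whenever the order matters.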
Also, according to Codd's relational theory, the order of columns and rows is immaterial.