Although it's good to wonder about how it might be explained that you often see the same order, I'd like to point out that it never a good idea to rely on implicit order caused by the particular implementation of the underlying database engine. In other words, its nice to know why, but you should never ever rely on it. For MS SQL, the only thing that reliably delivers the rows in a certain order, is an explicit ORDER BY
clause.
Not only do different RDMBS-es behave differently, one particular instance may behave differently due to an update (patch). Not only that, even the state of the RDBMS software may have an impact: a "warm" database behaves differently than a "cold" one, a small table behaves differently than a large one.
Even if you have background information about the implementation (ex: "there is a clustered index, thus it is likely the data will be returned by order of the clustered index"), there is always a possibility that there is another mechanism you don't know about that causes the rows to be returned in a different order (ex1: "if another session just did a full table scan with an explicit ORDER BY
the resultset may have been cached; a subsequent full scan will attempt to return the rows from the cache"; ex2: "a GROUP BY
may be implemented by sorting the data, thus impacting the order the rows are returned"; ex3: "If the selected columns are all in a secondary index that is already cached in memory, the engine may scan the secondary index instead of the table, most likely returning the rows by order of the secondary index").
Here's a very simple test that illustrates some of my points.
First, startup SQL server (I'm using 2008). Create this table:
create table test_order (
id int not null identity(1,1) primary key
, name varchar(10) not null
)
Examine the table and witness that a clusted index was created to support the primary key
on the id
column. For example, in sql server management studio, you can use the tree view and navigate to the indexes folder beneath your table. There you should see one index, with a name like: PK__test_ord__3213E83F03317E3D (Clustered)
Insert the first row with this statement:
insert into test_order(name)
select RAND()
Insert more rows by repeating this statement 16 times:
insert into test_order(name)
select RAND()
from test_order
You should now have 65536 rows:
select COUNT(*)
from test_order
Now, select all rows without using an order by:
select *
from test_order
Most likely, the results will be returned by order of the primary key (although there is no guarantee). Here's the result I got (which is indeed by order of primary key):
# id name
1 1 0.605831
2 2 0.517251
3 3 0.52326
. . .......
65536 65536 0.902214
(the # is not a column but the ordinal position of the row in the result)
Now, create a secondary index on the name
column:
create index idx_name on test_order(name)
Select all rows, but retrieve only the name
column:
select name
from test_order
Most likely the results will be returned by order of the secondary index idx_name, since the query can be resolved by only scanning the index (i.o.w. idx_name
is a covering index). Here's the result I got, which is indeed by order of name
.
# name
1 0.0185732
2 0.0185732
. .........
65536 0.981894
Now, select all columns and all rows again:
select *
from test_order
Here's the result I got:
# id name
1 17 0.0185732
2 18 0.0185732
3 19 0.0185732
... .. .........
as you can see, quite different from the first time we ran this query. (It looks like the rows are ordered by the secondary index, but I don't have an explanation why that should be so).
Anyway, the bottom line is - don't rely on implicit order. You can think of explanations why a particular order can be observed, but even then you can't always predict it (like in the latter case) without having intimate knowledge of implementation and runtime state.