I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
This will find an arbitrary percentile by replacing the COUNT(*)/2
with COUNT(*)*n
where n
is the percentile (.5 for median, .75 for 75th percentile, etc).
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…