Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

string - determining most used set of words php mysql

I'm trying to figure out how to go about determining the most used words on a mysql dataset.

Not sure how to go about this or if there's a simpler approach. Read a couple posts where some suggests an algorithm.

Example:

From 24,500 records, find out the top 10 used words.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Right, this runs like a dog and is limited to working with a single delimiter, but hopefully will give you an idea.

SELECT aWord, COUNT(*) AS WordOccuranceCount
FROM (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(concat(SomeColumn, ' '), ' ', aCnt), ' ', -1) AS aWord
FROM SomeTable
CROSS JOIN (
SELECT a.i+b.i*10+c.i*100 + 1 AS aCnt
FROM integers a, integers b, integers c) Sub1
WHERE (LENGTH(SomeColumn) + 1 - LENGTH(REPLACE(SomeColumn, ' ', ''))) >= aCnt) Sub2
WHERE Sub2.aWord != ''
GROUP BY aWord
ORDER BY WordOccuranceCount DESC
LIMIT 10

This relies on having a table called integers with a single column called i with 10 rows with the values 0 to 9. It copes with up to ~1000 words but can easily be altered to cope with more (but will slow down even more).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...