Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



How to list the top n frequent words in a string array in Java?

I need to analyse a text and find the top n most frequent words in it, where n is a number the user can specify. I used a HashMap for this, but for now I can only find the single most frequent word.

Suppose I have a hash map like this:

    cat: 4
    dog: 3
    sky: 10
    blue: 1

My code to find the most frequent word looks like:

        int compareValue = 0;
        String compareKey = "";

        for (Map.Entry<String, Integer> set : pairs.entrySet()) {
            if (set.getValue() > compareValue) {
                compareKey = set.getKey();
                compareValue = set.getValue();
            }
        }

Could you please advise how I can modify this code to find more than one of the most frequent words, with a variable that specifies how many words to return?



1 Answer


Count the words in a map, sort the entries by count in descending order, and keep the first n entries:

        String text = "Very very very good text to compare good text with cats, cats and dogs. " +
                "Very good dogs.";

        int mostFrequentWordsNumber = 5;

        Map<String, Integer> mapOfFrequentWords = new TreeMap<>();

        // Split on runs of whitespace; the backslash must be escaped in a Java string literal.
        String[] words = text.split("\\s+");

        for (String word : words) {
            if (!mapOfFrequentWords.containsKey(word)) {
                mapOfFrequentWords.put(word, 1);
            } else {
                mapOfFrequentWords.put(word, mapOfFrequentWords.get(word) + 1);
            }
        }

        Map<String, Integer> sorted = mapOfFrequentWords
                .entrySet()
                .stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(mostFrequentWordsNumber)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (e1, e2) -> e2,
                                LinkedHashMap::new));
        System.out.println(sorted);

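The result will be: {good=3, Very=2, dogs.=2, text=2, very=2}. Very and very are counted as different words because the comparison is case-sensitive, and dogs. keeps its trailing period because the text is split on whitespace only. Converting the text to lower case before counting merges Very and very into one entry.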
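
A minimal sketch of the case-insensitive variant follows. It assumes that punctuation should be stripped as well, which goes slightly beyond the original answer. The class name TopWords and the method topN are hypothetical names chosen for illustration.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TopWords {

    // Hypothetical helper (not part of the original answer): returns the n most
    // frequent words, ignoring case and punctuation.
    public static Map<String, Integer> topN(String text, int n) {
        // TreeMap keeps keys sorted, so words with equal counts come out alphabetically.
        Map<String, Integer> counts = new TreeMap<>();

        // Lower-case the text, then split on runs of characters that are
        // neither letters nor digits (assumption: punctuation is not part of a word).
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }

        // Sort by count, highest first, keep n entries, and preserve that order
        // by collecting into a LinkedHashMap.
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "Very very very good text to compare good text with cats, cats and dogs. "
                + "Very good dogs.";
        System.out.println(topN(text, 3)); // prints {very=4, good=3, cats=2}
    }
}
```

Here n plays the role of mostFrequentWordsNumber from the answer above. Because the stream sort is stable, words with the same count keep the alphabetical order of the TreeMap.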

