I'm having an issue with stemCompletion. Here is a reproducible example.
library("tm")
library("SnowballC")
text <- "communicate. communicate Communicates communicating Communication 1"
corpus <- Corpus(VectorSource(text))
toSpace <- content_transformer(function (x, pattern) gsub(pattern, " ", x))
corpus <- tm_map(corpus, toSpace, "/")
corpus <- tm_map(corpus, toSpace, "@")
corpus <- tm_map(corpus, toSpace, "\\|")
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, content_transformer(removeNumbers))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, content_transformer(removePunctuation))
corpus <- tm_map(corpus, content_transformer(stripWhitespace))
dictionary <- corpus # save this to use as a dictionary for stemCompletion
stemmed_corpus <- tm_map(corpus, content_transformer(stemDocument), language="english")
stemmed_corpus[[1]][1] # confirm words are stemmed properly
dictionary[[1]][1] # confirm the dictionary has complete words
# this is the part that's not working as expected:
completed_corpus <- tm_map(stemmed_corpus, content_transformer(stemCompletion), dictionary=dictionary, type=c("prevalent"))
inspect(completed_corpus)
inspect(completed_corpus) returns the stem ("communic") five times and one NA value.
What I'm aiming to get is the completed stems ("communicate" five times).
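For reference, here is a minimal sketch of the output I'm after, produced outside of tm_map. It relies on the documented behaviour that stemCompletion takes a character vector of individual stems (not one long document string) and accepts a character vector as the dictionary. The `cleaned` string is my assumption of what the document looks like after the preprocessing steps above; it is typed in by hand, not taken from the corpus.

```r
library(tm)
library(SnowballC)

# Assumed result of the preprocessing steps (lowercased, number and
# punctuation removed, whitespace collapsed). Hand-typed for illustration.
cleaned <- "communicate communicate communicates communicating communication"

# Split the document into single tokens so each stem is completed on its own.
tokens <- strsplit(cleaned, " ", fixed = TRUE)[[1]]

# Stem every token with the Snowball English stemmer.
stems <- wordStem(tokens, language = "english")

# Complete each stem against the unstemmed tokens, preferring the most
# frequent matching completion.
completed <- stemCompletion(stems, dictionary = tokens, type = "prevalent")

# Glue the completed tokens back into a single document string.
completed_text <- paste(completed, collapse = " ")
print(completed_text)
```

This may not be the right way to do it inside a corpus pipeline, but it shows the per-token result I would like tm_map to give me.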
Thanks in advance for any suggestions.
question from:
https://stackoverflow.com/questions/65924820/r-tm-package-stemcompletion-returns-stems-and-na-instead-of-completed-stems