I have a file like this:
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
I would like to generate a two-column list. The first column shows what words appear, the second column shows how often they appear, for example:
this@1
is@1
a@1
file@1
with@1
many@1
words@3
some@2
of@2
the@2
only@1
appear@2
more@1
than@1
one@1
once@1
time@1
- To make this simpler, I will remove all punctuation and change all text to lowercase before processing the list.
- Unless there is a simple way around it, "words" and "word" can count as two separate words.
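The preprocessing step described above could look roughly like this. It is only a sketch: the function name normalize is made up for illustration, and it assumes plain ASCII text and the POSIX character classes supported by tr.

```shell
# normalize (hypothetical helper): read text on stdin,
# write it back lowercased and with punctuation removed.
normalize() {
    tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]'
}

# Example: prints "some of the words appear more than once"
printf 'Some of the words, appear more than ONCE.\n' | normalize
```

Note that deleting punctuation also joins hyphenated or apostrophized words (for example "don't" becomes "dont"), which may or may not be what is wanted.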
So far, I have this:
sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
while read line
do
count="$(grep -c $line file1.txt)"
echo $line"@"$count >> file2.txt # add word and frequency to file
done < ./file1.txt
sort -u -d # remove duplicate lines
For some reason, this is only showing "0" after each word.
How can I generate a list of every word that appears in a file, along with frequency information?
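For reference, one possible way to get the word@count output is to let awk do the counting in a single pass instead of running grep once per word. This is a sketch, not a definitive solution: the function name word_freq is hypothetical, and it assumes the input has already been lowercased and stripped of punctuation as described earlier, with words separated by spaces or newlines.

```shell
# word_freq (hypothetical helper): read whitespace-separated words on stdin
# and print one "word@count" line per distinct word, in order of first appearance.
word_freq() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if (!($i in count)) order[++n] = $i   # remember first-seen order
                count[$i]++                           # tally every occurrence
            }
        }
        END {
            for (i = 1; i <= n; i++) print order[i] "@" count[order[i]]
        }
    '
}

# Example on a tiny input
printf 'some of the words\nsome of the words only\n' | word_freq
```

With a real file this would be called as word_freq < file1.txt > file2.txt. Because awk splits on any whitespace by default, there is no need to put each word on its own line first, and each word is counted exactly rather than by substring match.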