I created word embedding vectors for sentiment analysis, but I'm not sure about the code I wrote. If you see any mistakes in how I build the Word2Vec model or the embedding matrix, please let me know.
import os
import gensim
import numpy as np

EMBEDDING_DIM = 100
review_lines = [sub.split() for sub in reviews]  # tokenize each review
# Train a skip-gram (sg=1) Word2Vec model; 'size' is the gensim 3.x
# parameter name (renamed to 'vector_size' in gensim 4.x)
model = gensim.models.Word2Vec(sentences=review_lines, size=EMBEDDING_DIM,
                               window=6, workers=6, min_count=3, sg=1)
print('Words close to the given word:', model.wv.most_similar('film'))
words = list(model.wv.vocab)  # gensim 3.x; model.wv.key_to_index in 4.x
print('Words:', words)
file_name = 'embedding_word2vec.txt'
model.wv.save_word2vec_format(file_name, binary=False)  # plain-text format
embeddings_index = {}
f = open(os.path.join('', 'embedding_word2vec.txt'), encoding="utf-8")
next(f)  # skip the "vocab_size dim" header that save_word2vec_format writes
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
print("Number of word vectors found:", len(embeddings_index))

# Row 0 is reserved (Keras word_index is 1-based), hence the +1
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:  # words not in the Word2Vec vocab stay zero
        embedding_matrix[i] = embedding_vector
OUTPUT:
array([[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[ 0.1029947 , 0.07595579, -0.06583303, ..., 0.10382118,
-0.56950015, -0.17402627],
[ 0.13758609, 0.05489254, 0.0969701 , ..., 0.18532865,
-0.49845088, -0.23407038],
...,
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ]])
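For what it's worth, some all-zero rows are expected with this construction: row 0 is never assigned (the Keras word_index starts at 1, so index 0 is reserved for padding), and any word that Word2Vec dropped because of min_count=3 keeps its zero initialization. A self-contained sketch of the same fill loop with toy stand-ins (the words and the rare word 'zzz' are hypothetical):

```python
import numpy as np

EMBEDDING_DIM = 3
# Keras-style {word: index} map (1-based); 'zzz' is missing from
# embeddings_index, as if Word2Vec pruned it via min_count
word_index = {'film': 1, 'good': 2, 'zzz': 3}
embeddings_index = {'film': np.ones(3), 'good': np.full(3, 2.0)}

embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    vec = embeddings_index.get(word)
    if vec is not None:  # out-of-vocabulary words stay all-zero
        embedding_matrix[i] = vec

# Which rows were never filled?
zero_rows = np.where(~embedding_matrix.any(axis=1))[0]
print(zero_rows)  # [0 3]: the padding row and the pruned word
```

So zero rows per se are not a bug; you only need to worry if words you expect to be in the vocabulary come back all zero.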
question from:
https://stackoverflow.com/questions/65885615/why-are-there-rows-with-all-values-0-in-the-embedding-matrix