Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
327 views
in Technique by (71.8m points)

keras - Audio LSTM repeats sequences

I am trying to build a neural net that generates audio (something like Jukebox, but on a much smaller scale, since this is just for fun). I use an autoencoder to encode 512-sample chunks of audio into 64-dimensional vectors, and that part works fine. Now I want to generate new audio by training an LSTM on real music encoded as those 64-dimensional vectors, having it predict the next vector, and then decoding the generated vectors back into audio samples. I trained an LSTM, but the decoded output is just a drone, probably because the model keeps producing the same output over and over. Is the problem in my model or in my generation code?

Model (encoded_con is an array of encoded audio with shape (n, 64)):

import datetime
import os

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM

split = int(len(encoded_con) * 0.8)
train = encoded_con[:split]
test = encoded_con[split:]
train_gen = keras.preprocessing.sequence.TimeseriesGenerator(train, train, 256)
test_gen = keras.preprocessing.sequence.TimeseriesGenerator(test, test, 256)

%load_ext tensorboard
regressor = Sequential()

regressor.add(LSTM(units = 64, return_sequences = True, input_shape = (256, 64)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 128))
regressor.add(Dropout(0.2))
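# The second LSTM returns only its final state, shape (batch, 128), so a plain
# Dense layer maps it to the next 64-d vector; TimeDistributed needs 3-D input.
regressor.add(Dense(units = 64))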

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['mae', 'acc'])
print(regressor.summary())
earlystop = keras.callbacks.EarlyStopping(monitor='loss', 
                                    min_delta=0, 
                                    patience=250, 
                                    verbose=0, 
                                    mode='auto', 
                                    baseline=None, 
                                    restore_best_weights=True)
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = keras.callbacks.TensorBoard(logdir, histogram_freq=1)

%tensorboard --logdir logs
regressor.fit(train_gen, validation_data=test_gen, epochs=30, callbacks=[earlystop, tensorboard_callback])
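
To sanity-check what the model is trained on, the windows that TimeseriesGenerator(train, train, 256) yields can be reproduced in plain NumPy. This is a minimal sketch: make_windows and fake_encoded are hypothetical names, and the random array only stands in for encoded_con. Each input is a run of 256 consecutive vectors and each target is the single vector that follows it, so the network output should have shape (batch, 64).

```python
import numpy as np

def make_windows(series, length):
    """Pair each run of `length` consecutive vectors with the vector that follows it."""
    # series: (n, dim) -> x: (n - length, length, dim), y: (n - length, dim)
    x = np.stack([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    return x, y

# Stand-in for encoded_con: random data with the same (n, 64) layout.
rng = np.random.default_rng(0)
fake_encoded = rng.normal(size=(300, 64)).astype("float32")

x, y = make_windows(fake_encoded, length=256)
print(x.shape, y.shape)  # (44, 256, 64) (44, 64)
```

Comparing these shapes with regressor.output_shape is a quick way to confirm that the last layer matches the targets.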

Generation code (dataset_scaled is just a 1-d array of audio samples scaled between 0 and 1):

import keras
import numpy as np
from keras import layers
from sklearn.preprocessing import MinMaxScaler
from tqdm import tqdm
autoencoder = keras.models.load_model(r"/content/drive/My Drive/audio_autoencoder2", compile=False)
encoder = keras.models.load_model(r"/content/drive/My Drive/audio_autoencoder_enc", compile=False)
inputs = layers.Input(shape=(64,))  # (64) is just the int 64; a shape needs a tuple
decode = autoencoder.layers[2](inputs)
decode = autoencoder.layers[3](decode)
decoder = keras.Model(inputs, decode)

sc = MinMaxScaler(feature_range = (0, 1))
dataset_scaled = sc.fit_transform(dataset.reshape(-1,1))

generated = dataset_scaled[512000:1024000]
for i in tqdm(range(200)):
  encoded = encoder.predict(generated[-512*256:].reshape((256,1,512)))
  print(encoded.shape)
  predicted_sample = regressor.predict(encoded.reshape((1,256,64)))
  print(predicted_sample.shape)
  predicted_decoded = decoder.predict(predicted_sample)
  print(predicted_decoded.shape)
  generated = np.append(generated, predicted_decoded)
  generated = generated.flatten()
import soundfile
new_audio = sc.inverse_transform(generated.reshape(-1,1)).flatten()
new_audio_scaled = new_audio/np.max(np.abs(new_audio))
#soundfile.write("/content/new_out.wav", new_audio_scaled, 44100)
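
One possible variation on the loop above, offered as a sketch and not as a confirmed fix for the drone, is to keep the rolling window in latent space: encode the seed once, append each predicted vector to the window, and decode only at the end. generate_latents and mean_predictor are hypothetical names; mean_predictor is a stand-in for regressor.predict and simply averages the window, which also illustrates how a model that regresses toward the mean produces a nearly constant sequence.

```python
import numpy as np

def generate_latents(predict_next, seed, steps):
    """Autoregressively extend a (length, dim) seed window by `steps` vectors."""
    window = np.array(seed, dtype="float32")
    outputs = []
    for _ in range(steps):
        # predict_next maps a (1, length, dim) batch to a (1, dim) prediction.
        nxt = predict_next(window[np.newaxis])[0]
        outputs.append(nxt)
        # Drop the oldest vector and append the prediction.
        window = np.concatenate([window[1:], nxt[np.newaxis]])
    return np.stack(outputs)

# Hypothetical stand-in for regressor.predict: returns the mean of the window.
def mean_predictor(batch):
    return batch.mean(axis=1)

seed = np.random.default_rng(0).normal(size=(256, 64))
latents = generate_latents(mean_predictor, seed, steps=200)
print(latents.shape)  # (200, 64)

# Average step-to-step change; a small value means the output barely moves.
step_change = float(np.abs(np.diff(latents, axis=0)).mean())
print(step_change)
```

With the real models, regressor.predict would take the place of mean_predictor, and the resulting array could be passed to decoder.predict in a single call. Printing the step-to-step change of the real predictions is one way to check whether the LSTM itself has collapsed to a constant.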

EDIT: While training, I noticed that the accuracy looked odd: both training and validation accuracy hovered close to 1 (in the accuracy graph, blue is train and pink is validation). The loss curve looked normal, though.
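
A likely reason the accuracy sits near 1 is that accuracy is not a meaningful metric for regression. As far as I know, Keras resolves 'acc' for an MSE loss with a 64-wide output to categorical accuracy, which only checks whether the largest entry of the prediction sits at the same index as the largest entry of the target. The sketch below reimplements that check in NumPy on synthetic data; the data, and the assumption that one latent dimension is always the largest, are made up for illustration.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the largest entry sits at the same index."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

rng = np.random.default_rng(0)
# Synthetic 64-d targets in which dimension 0 is always the largest.
y_true = rng.uniform(0.0, 1.0, size=(1000, 64))
y_true[:, 0] += 5.0
# A poor prediction: the same constant vector (the target mean) for every row.
y_pred = np.tile(y_true.mean(axis=0), (1000, 1))

acc = categorical_accuracy(y_true, y_pred)
mse = float(np.mean((y_true - y_pred) ** 2))
print(acc)  # 1.0
print(mse)  # clearly above zero
```

A constant prediction scores perfect accuracy here while the error stays well above zero, so the loss and MAE curves are the ones worth watching.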



1 Answer

0 votes
by (71.8m points)
Waiting for an expert to answer.